

/*
The contents of this file contain text and code describing and 
implementing the 'DES' encryption algorithm. Despite the fact 
that this information is freely available overseas, it remains 
a violation of ITAR and/or EAR to export this information 
from inside the US or Canada to outside the US or Canada, or 
to pass it to a non-US or non-Canadian citizen within the US 
or Canada. The US Government evidently defines 'Export' to 
include placing this information on a non-restricted FTP server 
or Web site. Please do not do so, and be sure that any person you
pass this on to is made aware of this restriction.
									Peter Trei
									ptrei@acm.org

 * THIS SOFTWARE IS PROVIDED BY PETER TREI ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.

This software is copyright (c) 1997 Peter Trei (ptrei@acm.org), except for
those portions written by Phil Karn, which retain their original ownership.

This software may be redistributed freely for use in the RSA DES Challenge,
but please obey the restrictions imposed by the US Government, and make
sure that anyone you pass it to is also aware of them.

This software may not be used for commercial purposes without the written
permission of Peter Trei and the other owners.

Please redistribute only as a complete, unmodified package, including 
source code, and ptrei@acm.org's PGP signature file and key.

 */

Notes on my DES Key Recovery (DESKR) program.

DESKR is a piece of software designed to find single-DES, CBC mode
keys by a brute-force, known-plaintext attack. This means that we have a
short piece of encrypted text to which we believe we have the 
corresponding plain text, and we are trying to find the key which
transforms one to the other. This is a useful and realistic mode
of attack, since it is very common to have a good idea as to what
is in the header of a packet, the headers of a piece of email, etc. 

This is a 'brute-force' search because it relies on no clever
cryptanalysis - we simply try every key until we hit upon one which
decrypts the ciphertext to the expected plaintext. This sounds 
simple, but it's not.

DES has a 64 bit key, but 8 of those bits are for parity
only, and add nothing to the cryptological strength. The 56 
remaining bits give about 7.2x10^16 keys. This is a large 
number, so speed in all portions of the search is a primary goal.
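
To give a feel for the scale, here is a minimal C sketch of the arithmetic.
The function names are hypothetical and are not taken from the DESKR
sources; the search rate is whatever you measure on your own machine.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct DES keys: 56 effective bits, i.e. 2^56. */
static uint64_t des_keyspace(void)
{
    return (uint64_t)1 << 56;
}

/* Seconds needed to try every key at the given rate (keys per second).
 * On average the right key turns up after half this time. */
static double full_sweep_seconds(double keys_per_second)
{
    assert(keys_per_second > 0.0);
    return (double)des_keyspace() / keys_per_second;
}
```

At a hypothetical rate of one million keys per second,
full_sweep_seconds(1e6) comes to about 7.2x10^10 seconds, which is over
two thousand years for a single machine. This is why the search only makes
sense when it is both fast and spread over many machines.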

There are two phases to testing a key - the key's expansion into
a key schedule, and the actual test decryption.

---------------
Key expansion: 

In DES, the 56 'real' bits of a DES key are expanded into 16 
subkeys, each 48 bits long, before the en/decryption starts. This
expanded form is referred to as a 'key schedule'. Each subkey 
is XOR'd into half of the data being encrypted or decrypted 
during one of the 16 DES 'rounds'.

The expansion of the key into the key schedule is a rather complex
system of rotations and permutations. Since it is usually performed
only once (the same key schedule is used repeatedly for encrypting
or decrypting all of the material to which that key is applied),
little effort has been applied to making the key expansion fast
and efficient. Typically, key expansion takes far more time than
the encryption or decryption of each block to which it is applied.

Thus, if we perform a standard key expansion for each key we try, 
the effort of key expansion greatly dominates the time we expend.

Fortunately, I've discovered a way around this. It turns out that
in all of the rotates and permutations performed in the key 
expansion, the bits of the key operate independently - each bit
in each subkey depends solely on the value of one of the bits in
the original key. Therefore, when a given bit in the original key
changes, those subkey bits which derive from it also change. So
far as I know, I am the first to describe this property.

[I reported this property in late 1996. Since then, I've discovered
that at least one person discovered it prior to me, and posted a
note mentioning it in the comp.arch.fpga archive back in August 1996.]

This means that for each bit in the original key, we can set up a
mask for each subkey, which we XOR into the subkey when that bit
changes. Thus, updating a key schedule from one key to another
involves determining which bit(s) in the key changed, and XORing 
into the subkeys the appropriate mask(s). This is much faster than 
generating a key schedule de-novo via the normal method.
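
The mask idea can be sketched in C as follows. This is a minimal
illustration built on the textbook key schedule (the PC-1, PC-2 and shift
tables from the DES standard), not the code used in DESKR, which keeps its
schedule in a different, expanded layout. The names key_schedule,
build_masks and flip_key_bit are hypothetical. The sketch assumes the key
is held as a 64-bit value including the parity bits, with bit 1 of the
standard as the most significant bit, and each subkey as a 48-bit value.

```c
#include <assert.h>
#include <stdint.h>

static const unsigned char PC1[56] = {
    57, 49, 41, 33, 25, 17,  9,  1, 58, 50, 42, 34, 26, 18,
    10,  2, 59, 51, 43, 35, 27, 19, 11,  3, 60, 52, 44, 36,
    63, 55, 47, 39, 31, 23, 15,  7, 62, 54, 46, 38, 30, 22,
    14,  6, 61, 53, 45, 37, 29, 21, 13,  5, 28, 20, 12,  4
};
static const unsigned char PC2[48] = {
    14, 17, 11, 24,  1,  5,  3, 28, 15,  6, 21, 10,
    23, 19, 12,  4, 26,  8, 16,  7, 27, 20, 13,  2,
    41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48,
    44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32
};
static const unsigned char SHIFTS[16] = {
    1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1
};

/* Select n bits from `in` (in_bits wide, bit 1 = most significant). */
static uint64_t permute(uint64_t in, int in_bits,
                        const unsigned char *tbl, int n)
{
    uint64_t out = 0;
    for (int i = 0; i < n; i++)
        out = (out << 1) | ((in >> (in_bits - tbl[i])) & 1);
    return out;
}

/* Textbook (slow) expansion: 64-bit key -> 16 subkeys of 48 bits. */
static void key_schedule(uint64_t key, uint64_t ks[16])
{
    uint64_t cd = permute(key, 64, PC1, 56);
    uint32_t c = (uint32_t)(cd >> 28) & 0x0FFFFFFFu;
    uint32_t d = (uint32_t)cd & 0x0FFFFFFFu;
    for (int r = 0; r < 16; r++) {
        int s = SHIFTS[r];
        /* rotate each 28-bit half left by s */
        c = ((c << s) | (c >> (28 - s))) & 0x0FFFFFFFu;
        d = ((d << s) | (d >> (28 - s))) & 0x0FFFFFFFu;
        ks[r] = permute(((uint64_t)c << 28) | d, 56, PC2, 48);
    }
}

/* masks[b][r]: what flipping key bit b (0 = least significant)
 * does to subkey r. Parity bits get all-zero masks. */
static uint64_t masks[64][16];

static void build_masks(void)
{
    /* The schedule only moves bits around, so the schedule of a
     * one-bit key is exactly the mask for that bit. */
    for (int b = 0; b < 64; b++)
        key_schedule((uint64_t)1 << b, masks[b]);
}

/* Update an existing schedule in place after key bit b has flipped. */
static void flip_key_bit(uint64_t ks[16], int b)
{
    assert(b >= 0 && b < 64);
    for (int r = 0; r < 16; r++)
        ks[r] ^= masks[b][r];
}
```

Call build_masks() once at startup; after that, moving from one key to a
key that differs in a single bit costs 16 XORs instead of a full expansion.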

If we test keys in numerical order, then on average two bits
will change for each iteration. We can do better than this by
testing in 'Gray code' order. (Thanks to Perry Metzger for
suggesting this.)

'Gray codes' are a method of iterating through a set of binary 
numbers in such a way that only one bit changes during each
iteration.

Since only one bit changes, we need only ever XOR into the key
schedule one set of masks. However, it does mean that we are no
longer testing keys in numerical order. This is not important, so
long as we continue to test every key.
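
A minimal C sketch of the standard reflected Gray code, assuming the
search position is kept as a plain counter i. The function names are
hypothetical and this is not necessarily how DESKR's key_update is
written.

```c
#include <assert.h>
#include <stdint.h>

/* Gray code for step i: consecutive values differ in exactly one bit. */
static uint64_t gray(uint64_t i)
{
    return i ^ (i >> 1);
}

/* Index (0 = least significant) of the single bit that flips when
 * stepping from gray(i) to gray(i + 1). It equals the number of
 * trailing zero bits in i + 1. */
static int gray_flip_bit(uint64_t i)
{
    assert(i != UINT64_MAX);   /* i + 1 must not wrap to zero */
    uint64_t n = i + 1;
    int bit = 0;
    while ((n & 1) == 0) {
        n >>= 1;
        bit++;
    }
    return bit;
}
```

In a search loop, gray_flip_bit(i) tells you which one set of masks to XOR
into the key schedule for step i + 1; the key itself never needs to be
recomputed from scratch.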

--------------

DES decryption.

We perform the test by decrypting the cryptotext with the current
key schedule, and seeing if it matches the expected plaintext. 
There are several ways in which this process can be sped up
for a brute-force attack.

I will not describe the DES algorithm in detail - check Schneier
or one of the many other sources. I assume familiarity with the
algorithm from this point on.

I am not using the classical DES algorithm, but rather a
modification of it I found in an implementation by Phil Karn. 
This variant produces exactly the same output, but speeds things
up by storing the key schedule and the working data in an expanded
form. This avoids the expansion permutation altogether. It also 
combines the S-boxes and the P-permutation into a single step. 

There are a number of speedups that are applied in this program,
which are applicable only to this form of brute force search.

1. Strip off the initial permutation.

Since we are trying many keys on the same data, the initial
permutation can be done once, and the output stored and reused.
We do this once at the start of the test, and do not have to do
it again.

2. Strip off the final permutation.

Similarly, the output of the final DES round is normally sent
through a 'final permutation' to produce the final output. This
permutation is also fixed, and is the inverse of the initial 
permutation. For a known plaintext attack, we can run the expected
plaintext through the initial permutation, producing the expected
output from the final round of the decryption of the cryptotext.
By comparing this against the output of the DES rounds, we can 
determine if we've found the key without re-running the final
permutation.

3. (Usually) skip the 16th round.

Since we work on half of the data each round, the output of the
15th round is the final output for that half, and we can check it
before the 16th round is run. If it does not match the expected
value, we can skip running the 16th round. On average, we'll only
have to run round 16 one time in 2^32 keys.
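
The early exit can be sketched on a toy Feistel network. This is only an
illustration of the control flow: toy_f is a made-up stand-in for the DES
round function, the halves are plain 32-bit words rather than DESKR's
expanded form, the final swap of real DES is left out, and all names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy round function, standing in for the real DES f-function. */
static uint32_t toy_f(uint32_t half, uint32_t subkey)
{
    uint32_t x = half ^ subkey;
    x *= 2654435761u;
    return x ^ (x >> 15);
}

/* Reference version: always runs all 16 rounds. */
static void feistel16(uint32_t *l, uint32_t *r, const uint32_t ks[16])
{
    for (int i = 0; i < 16; i++) {
        uint32_t t = *l ^ toy_f(*r, ks[i]);
        *l = *r;
        *r = t;
    }
}

/* Returns 1 if the schedule ks maps (l, r) to (want_l, want_r).
 * Round 16 is run only when the half fixed by round 15 already
 * matches; *ran_round16 reports whether that happened. */
static int matches(uint32_t l, uint32_t r, const uint32_t ks[16],
                   uint32_t want_l, uint32_t want_r, int *ran_round16)
{
    assert(ran_round16 != NULL);
    for (int i = 0; i < 15; i++) {
        uint32_t t = l ^ toy_f(r, ks[i]);
        l = r;
        r = t;
    }
    /* Round 16 copies R15 into L16 unchanged, so L16 is known now. */
    *ran_round16 = 0;
    if (r != want_l)
        return 0;
    *ran_round16 = 1;
    return (l ^ toy_f(r, ks[15])) == want_r;
}
```

For a wrong key the 32-bit comparison after round 15 fails almost every
time, so the sixteenth call to the round function is almost never made.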

4. (Usually) skip the 1st round.

This is a strange one. Because of the way the key is expanded, only
48 of the 56 key bits contribute to the subkey for each round. For any
given round, there are thus 8 bits which do not contribute to its subkey. If
we make the 8 bits which do not contribute to the subkey for the first
round the most rapidly varying ones as we step through the keys, then
the output of the first round is invariant until we change another bit,
which happens only once in 256 keys.

To perform this magic, we need to rearrange the key bits to move those
particular bits to the least significant byte of the key. 
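
Which 8 bits are idle in a given round can be worked out from the
standard tables. The sketch below is an illustration under assumptions,
not DESKR's code: positions are numbered 1..56 within the C/D register
as it stands right after PC-1 (1..28 is the C half, 29..56 the D half),
and the function name is hypothetical. To turn a position p into a bit
number of the original 64-bit key, look up entry p of PC-1.

```c
#include <assert.h>

/* Slots of the 56-bit C/D register that PC-2 never selects. */
static const int PC2_OMITTED[8] = { 9, 18, 22, 25, 35, 38, 43, 54 };

/* Left-rotation amounts for rounds 1..16 of the key schedule. */
static const int SHIFTS[16] = {
    1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1
};

/* Fill out[8] with the post-PC-1 positions (1..56) that do not
 * reach the subkey of the given round (1..16). */
static void unused_positions(int round, int out[8])
{
    assert(round >= 1 && round <= 16);
    int total = 0;
    for (int r = 0; r < round; r++)
        total += SHIFTS[r];
    for (int i = 0; i < 8; i++) {
        int q = PC2_OMITTED[i];
        int base = (q <= 28) ? 0 : 28;   /* C half or D half */
        /* After rotating a half left by `total`, slot q holds the
         * bit that started out `total` places further along. */
        out[i] = base + ((q - base - 1 + total) % 28) + 1;
    }
}
```

The shifts add up to 28, a full rotation, by round 16. In textbook DES
decryption the first round applied uses subkey 16, so under that
assumption the idle positions for the first decryption round are simply
the eight slots PC-2 omits: 9, 18, 22, 25, 35, 38, 43 and 54.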

---------------

Installation:

Installation is pretty simple: Just 

1. Unzip the inner archive into a new directory.
2. Create a file 'desident.txt' which contains a brief, one-line identifier, e.g.:

Peter Trei (ptrei@acm.org)

DESKR looks for, or stores, data in four directories:

The 'Shared' directory, pointed to by the environment variable DESKRSHARE, if it exists.
The 'Local' directory, pointed to by the environment variable DESKRLOCAL, if it exists.
The 'program' directory, which is where the deskr.exe file lives.
The 'connected' directory, which is where the deskr program was started from.

The model here is that at many locations, people will be installing DESKR on a number
of machines with shared disks. Some information - the results being output, and 
the challenge data being worked on -  can be shared, as well as the deskr.exe executable
itself. Other info, such as the location in the keyspace of an individual machine, and 
(possibly) the perpetrator of the search, cannot. DESKRSHARE points to the common 
shared directory. DESKRLOCAL points to where to store local information. If 
either or both are not defined, the program tries to make reasonable guesses. 

DESKR looks for:

1. Challenge data. This should be in a file testchal.txt. I'll probably rename this when
we get the actual real challenge data. This is searched for in this order: DESKRSHARE,
DESKRLOCAL, program directory, connected directory. If it can't be found in any of them,
then the program uses an internally stored challenge. At the moment, this is RSA's test
vector.

2. Results file, deskr.out. It will write or append to this file in the first of these
directories which it can find: DESKRSHARE, DESKRLOCAL, connected dir. A 'lock file' 
deslock.txt exists briefly to prevent simultaneous access by more than one machine.

3. Checkpoint files.
DESKR checkpoints its status when it first starts, and then every 30 minutes - if the program 
is stopped, on restarting it will recommence from the point specified in the checkpoint file.
This file, 'chkpnt.des', is placed in DESKRLOCAL if it exists, otherwise in the connected dir.

4. Identity file. desident.txt
Each entry placed in the deskr.out file will include an indication of who the searcher was,
read from the desident file. This is searched for in this order: connected dir, DESKRLOCAL,
program dir, DESKRSHARE. The idea is that a locally defined value can override a
globally defined one. If no file is found, the string 'Unknown searcher' is used. (I may
delete this 'feature' - the user really should identify themselves).
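
The lookup order for the challenge file can be sketched like this. It is a
minimal illustration, not the code in DESUTIL.C; the function names, the
fixed path buffer and the way the separator is passed in are all
assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Collect the candidate directories for the challenge file in the
 * documented order, skipping any that are unset or empty.
 * Returns the number of entries written to out[]. */
static int challenge_dirs(const char *share, const char *local,
                          const char *program_dir,
                          const char *connected_dir,
                          const char *out[4])
{
    assert(out != NULL);
    const char *order[4] = { share, local, program_dir, connected_dir };
    int n = 0;
    for (int i = 0; i < 4; i++)
        if (order[i] != NULL && order[i][0] != '\0')
            out[n++] = order[i];
    return n;
}

/* Try each directory in turn and return the first file that opens,
 * or NULL so the caller can fall back to the built-in challenge. */
static FILE *open_first(const char *dirs[], int n,
                        const char *name, char sep)
{
    char path[1024];
    for (int i = 0; i < n; i++) {
        int len = snprintf(path, sizeof path, "%s%c%s",
                           dirs[i], sep, name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;               /* path too long, skip it */
        FILE *fp = fopen(path, "r");
        if (fp != NULL)
            return fp;
    }
    return NULL;
}
```

A caller would pass getenv("DESKRSHARE") and getenv("DESKRLOCAL") (from
<stdlib.h>) as the first two arguments; getenv returns NULL for an unset
variable, which challenge_dirs skips. The identity file uses a different
order, so it would get its own list built the same way.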

Ideally, people will set up DESKRSHARE and DESKRLOCAL at startup, and automatically kick off 
deskr.exe as a low-priority, background process.


---------------
I really want beta-testers to dig into the key updating code, and assure themselves
that every key in a 'chunk' is indeed being tested - if you compile with the TESTGRAY
symbol and either the call to do_lower_16 or do_upper_16 uncommented in main(), you
ought to be able to test this.
 
---------------



Code description and porting notes.

This code is written as a console app for Win95 and WinNT. The assembler
version has the following restrictions: It will not run under Windows 3.11,
or 16-bit MS-DOS since it needs 32 bit protected mode. It will not run on a 
386 (I'll fix this).

I expect porting to UNIX to be fairly straightforward. You'll have to deal
with the use of / vs \ in filenames, and some of the run-time functions will
need to be tweaked, as will the system #includes. Please use the symbol WIN32 
to #ifdef out the Microsoft specific code. There are also endian-specific
rearrangements in the IP/FP and the key expansion code.

After you have completed a port, PLEASE send the source back to me. I want all
'official' releases of the code to have my PGP signature, so people can
be sure they haven't received a version with a Trojan (or at least, they'll
know who to blame :-).

I'm not particularly proud of this code - most of it has been written
at great speed, and I have been in more of a 'get it working' mode
than a 'get it neat' mode. That said, there are some justifications
for what I've done.

You'll notice that a LOT of the variables are global. The less I have
to pass from routine to routine in argument lists, the faster the code
runs. It's also easier to get at globals from the assembly language
sections than it is to get stuff on and off of the
call frames.

I tend to favor obviousness over conciseness - I don't use a million
clever nested macros, unlike some authors.

File list.

DESKR.H
	Global variables, macros, and procedure prototypes.

DESKEY.C
	Mostly routines related to key and key schedule manipulation.

DESSPA.H
	Combined S-P permutation tables for Intel assembler.
	
DESSP.H 
	Combined S-P permutation tables for portable 'C'.

DESSTD5.C
	DES encrypt/decrypt for Microsoft Visual C++ inline assembler.

DESPORT.C
	DES encrypt/decrypt for portable 'C'.

DESUTIL.C
	Utility functions - data input and output, tests, etc.

DESKR.C
	Main module. User interface, setup, key schedule updates, etc.


The code should be built including all of these files, with the following 
exception: include either (DESSPA.H and DESSTD5.C) OR (DESSP.H and DESPORT.C).
The first includes inline Intel assembler, and the latter is generally portable
'C' code.

I've written this very much with a Wintel mindset - the following issues spring
to mind when porting it to other platforms:

1. Endianness. The data and key schedules get re-arranged to account for the
endian property of Intel processors. This should get commented out for platforms
with the opposite endianness.

2. Filepath formats. Some of the routines assume file path separators are 
backslashes - this will have to be fixed for UNIX and other operating
systems which use slashes.

3. Include files. Some include files are different for Wintel vs UNIX etc.
I suspect we'll need some code like:

#ifndef WIN32
#include <unistd.h>
#else
#include <io.h>
#endif

If your changes result in deskr -t reporting correct DES operation, you're probably
on the right path.
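
Points 1 and 2 above could be handled with small helpers along these
lines. This is a sketch only: PATH_SEP, join_path, is_little_endian and
swap32 are hypothetical names that do not appear in the DESKR sources, and
WIN32 is assumed to be defined by the compiler on Windows builds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#ifdef WIN32
#define PATH_SEP '\\'
#else
#define PATH_SEP '/'
#endif

/* Join a directory and a file name with the platform separator.
 * Returns 0 on success, -1 if the result does not fit in out. */
static int join_path(char *out, size_t out_size,
                     const char *dir, const char *name)
{
    assert(out != NULL && out_size > 0);
    int len = snprintf(out, out_size, "%s%c%s", dir, PATH_SEP, name);
    return (len < 0 || (size_t)len >= out_size) ? -1 : 0;
}

/* 1 on little-endian hosts (such as x86), 0 on big-endian ones. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}
```

With helpers like these, the Intel-specific rearrangement can be wrapped
in "if (is_little_endian())" instead of being commented out by hand, and
file names can be built with join_path rather than a hard-coded backslash.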

-------------
TO BE DONE, and OPEN ISSUES:

1. Faster code. I'm working on incorporating Svend Mikkelsen's code, which should
be at least 25% faster. I also still need to incorporate the key updates into the
assembly code, which should give a 10% speedup. I'm also looking at Biham's 
'DES on its side' implementation.

2. Key servers. The current code makes no use of key servers. This could be done,
but I don't have the resources. The deskr.out file is formatted in such a way as
to allow it to be parsed easily by a server.

3. Checksums. Two forms of checksums are kept - 'half-matches' and the 'round 15
checksum'. I'm not too sure of the value of these, since I've recently realized 
that different DES algorithms can provide different, valid results for the same
chunk of keyspace. The deskr.out file will need to report the exact algorithm 
used to find the half-match.

4. We need better docs. There's a LOT of comments in the source code.

5. Don't worry about the speed of your code, except in the sensitive areas:
key_update, do_lower_16, and the DES round along with the routines which use
it. None of the other code is called often enough to matter.

----------------
Code history:

The original DES key expansion code, the 'C' DES encrypt/decrypt, the x86
IP/FP, and the original version of the x86 DES round are all from Phil Karn's
des386 package. I ported the assembler to MSVC++ inline assembler in 32
bit protected mode, and tweaked the DES round extensively, about doubling
its speed. Phil got some parts of the IP/FP from other authors.

The rest is my own, though some of the timing code is derived from Eric Young's
libdes.
----------------

Peter Trei
ptrei@acm.org 


