Update of DFC Implementations

Fabrice Noilhan

The DFC code has changed since the version sent to NIST in June 1998.
New code was written for the API functions, and better implementations
of the inner encryption functions have been written. As NIST has noted,
we did not expect the API functions to be used for timings, so that
code was not heavily optimized. The new code is optimized and remains
portable from one architecture to another. The improvements in the
encryption functions come from different ways of computing
ax+b mod p mod 2**64, often tailored to specific architectures and
compilers.
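
To make the operation concrete, here is a minimal portable sketch of
ax+b mod p mod 2**64. It assumes p = 2**64 + 13, the prime given in the
DFC specification (the text above only names it p). The names mul64 and
axb_mod_p are illustrative and are not taken from the distributed code,
and this is only one of the many possible ways of computing the value,
not necessarily the one used in RefCode or AuxCode.

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit product, built from 32-bit halves. */
static void mul64(uint64_t a, uint64_t x, uint64_t *hi, uint64_t *lo)
{
    const uint64_t M = 0xFFFFFFFFu;
    uint64_t a0 = a & M, a1 = a >> 32;
    uint64_t x0 = x & M, x1 = x >> 32;
    uint64_t p00 = a0 * x0, p01 = a0 * x1;
    uint64_t p10 = a1 * x0, p11 = a1 * x1;
    uint64_t mid = (p00 >> 32) + (p01 & M) + (p10 & M);

    *lo = (mid << 32) | (p00 & M);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* (a*x + b) mod p mod 2^64, with p = 2^64 + 13 (assumed, see above). */
uint64_t axb_mod_p(uint64_t a, uint64_t x, uint64_t b)
{
    uint64_t hi, lo, h2, l2, s, diff;
    int carry;

    mul64(a, x, &hi, &lo);
    lo += b;
    if (lo < b)
        hi++;               /* cannot overflow: a*x + b < 2^128 */

    /* T = hi*2^64 + lo and 2^64 = -13 (mod p), so T = lo - 13*hi.
       Fold once more: 13*hi = h2*2^64 + l2 with h2 <= 12, which
       gives T = lo + 13*h2 - l2 (mod p). */
    mul64(hi, 13, &h2, &l2);
    s = lo + 13 * h2;
    carry = s < lo;         /* true sum is carry*2^64 + s */
    diff = s - l2;          /* wraps modulo 2^64 */

    if (carry)              /* value lies in (0, 2^64 + 156) */
        return (s >= l2 && diff >= 13) ? diff - 13 : diff;
    /* value lies in (-2^64, 2^64): add p once if it is negative */
    return (s >= l2) ? diff : diff + 13;
}
```

For example, axb_mod_p(UINT64_MAX, UINT64_MAX, 0) yields 196, because
2**64 - 1 is congruent to -14 modulo p. On a 32-bit processor the four
partial products in mul64 are themselves 32x32 -> 64 multiplications,
which is where hand-written assembly has the most room to help.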

The new Java version uses 64-bit signed integers. Java has no unsigned
64-bit type, but for most operations (such as addition and
multiplication) the Java standard guarantees that signed arithmetic
produces the same bits as unsigned arithmetic. This fact, combined with
the new compilers and JIT compilers, gives a huge speedup (by a factor
of 40). The code has been optimized for Sun's UltraSparc processors,
but it also performs well on Intel processors. A version dedicated to
Intel processors could be a bit faster.
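
Comparison is one of the operations where signed and unsigned 64-bit
integers do differ. The sketch below, written in C for consistency with
the rest of the distribution, shows a common way to obtain an unsigned
comparison from signed values: flip the sign bit of both operands
first. The function name is hypothetical, and nothing here claims that
the Java code uses this exact idiom.

```c
#include <stdint.h>

/* Returns 1 if a < b when both are read as unsigned 64-bit values,
   using only a signed comparison. Flipping the top bit maps the
   unsigned order onto the signed order. */
int unsigned_less_signed(int64_t a, int64_t b)
{
    return (a ^ INT64_MIN) < (b ^ INT64_MIN);
}
```

With this helper, -1 (all bits set, the largest unsigned value) compares
greater than 1, as it would with a true unsigned type.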

The C API has been rewritten to avoid many of the conversions made in
the prior version, so timings taken through the C API are now much
faster. Two versions are provided. The standard ANSI C version uses
32-bit integers and should work on all processors, regardless of
endianness, size of int, or alignment requirements. This is the default
when building. The second version uses 64-bit integers, whether they
are provided by the compiler (long long for gcc on Intel processors) or
by the processor itself. To build this version, specify the "INT_64"
compile-time definition. In addition, on a 64-bit processor, you must
also specify the "LONG_64" compile-time definition. The code for the
inner function is unchanged in the 32-bit case and has been modified in
the 64-bit case.
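
As an illustration of how two such definitions can select an integer
type, here is a hypothetical sketch. Only the macro names INT_64 and
LONG_64 come from the text above; the typedef dfc_word, the macro
DFC_WORD_BITS, and the helper function are assumptions made for this
example and may not match what RefCode actually does.

```c
#include <limits.h>

#if defined(INT_64) && defined(LONG_64)
typedef unsigned long dfc_word;        /* 64-bit processor: native long */
#define DFC_WORD_BITS 64
#elif defined(INT_64)
typedef unsigned long long dfc_word;   /* compiler-provided 64-bit type */
#define DFC_WORD_BITS 64
#else
typedef unsigned long dfc_word;        /* default: at least 32 bits in ANSI C */
#define DFC_WORD_BITS 32
#endif

/* Compile-time check: the array size becomes -1, and the build fails,
   if the selected type is narrower than the code expects. */
typedef char dfc_word_is_wide_enough[
    (sizeof(dfc_word) * CHAR_BIT >= DFC_WORD_BITS) ? 1 : -1];

/* Number of bits per word that the selected build relies on. */
int dfc_word_bits(void)
{
    return DFC_WORD_BITS;
}
```

With a layout like this, passing LONG_64 on a machine whose long is
only 32 bits stops the build instead of silently producing wrong
results.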

Timings based on C API versions should be used with caution: several
candidates' implementations assume that the processor is little-endian
and 32-bit, and use casts to convert an array of bytes to 32-bit
integers, which is not permitted on other processors. Using similar
tricks, we could gain some speed, but the code would not be portable
and would not produce the correct result on 64-bit processors, for
instance. So our implementation does not use these non-standard tricks.
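
The portable alternative to such a cast is to assemble each word from
individual bytes. The sketch below shows the idea. The function names
and the choice of big-endian byte order are assumptions made for the
example and are not taken from the distributed code.

```c
#include <stdint.h>

/* Read a 32-bit word from 4 bytes, most significant byte first.
   Works for any host byte order and any alignment of p, unlike
   the non-portable cast *(uint32_t *)p. */
uint32_t load32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit word as 4 bytes, most significant byte first. */
void store32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

The shifts cost a few instructions per word compared with a cast, which
is the price of getting the same result on every processor.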

Assembly-coded functions are also provided (Pentium, Pentium Pro). A C
version using one ASM opcode is provided for Alpha processors, and a C
version using floats is provided for Sparc processors (this version is
faster than the version using 64-bit integers). These implementations
are noticeably faster on 32-bit processors than the C implementations.
This is not surprising, given that DFC is 64-bit oriented and current
compilers do not optimize computations on 64-bit integers well. On
64-bit processors, timings of C code and assembly code are similar.

See the README file in each directory for implementation details.

Timings of the implementations provided (timings were made by direct
calls to the encryption function; for the ANSI C code and the 64-bit C
code, this is done by the dfc_bench program):


    Processor     Compiler     Cycles (all key sizes)        Author
                               encrypt/decrypt    key setup
    (rows 1 to 6 use the numbered compiler flags listed below the table)
    
    ANSI C code (32 bits) (see RefCode directory)
1.  Alpha 21164   DEC cc       2562               10248      Pornin
2.  Pentium II    GCC          2592               10368      Pornin
3.  UltraSparc    Sun CC-5.0   4160               16640      Pornin
    
    C code (64 bits) (see RefCode directory)
4.  Alpha 21164   DEC cc       564                2256       Noilhan
5.  Pentium II    GCC          1262               5048       Noilhan
6.  UltraSparc    Sun CC-5.0   875                3500       Noilhan
    
    Other implementations 
    Alpha 21164   DEC cc       526                2104       Harley
    Alpha 21164   DEC cc       323                1292       Harley (C code + one Opcode)
    Alpha 21164   GCC          310                1240       Harley (ASM)
    Pentium       NASM         609                2436       Behr/Harley/McCougan/Mathisen (ASM)
    Pentium II    NASM         392                1568       Behr/Harley/McCougan/Mathisen (ASM)
    UltraSparc    Sun CC-5.0   775                3100       Harley (C code using floats)
    StrongARM     GCC          440                1760       Harley/Seal (ASM)

Compiler flags for RefCode directory:
1. -w0 -arch ev56 -O4 -newc -fast -inline all -tune ev56 -speculate all
2. -Wall -O9 -finline-functions -fomit-frame-pointer -mpentiumpro
3. -fast -xO5 -xtarget=ultra2 -xcache=16/32/1:256/64/1 -xsafe=mem -xarch=v9a
4. -w0 -arch ev56 -O4 -newc -fast -inline all -tune ev56 -speculate all -DINT_64 -DLONG_64
5. -Wall -O9 -finline-functions -fomit-frame-pointer -mpentiumpro -DINT_64
6. -fast -xO5 -xtarget=ultra2 -xcache=16/32/1:256/64/1 -xsafe=mem -xarch=v9a -DINT_64 -DLONG_64

See files in AuxCode directory for other implementations.
Compiler flags are indicated in the header of these files.
