This report was included in a package sent to NIST.
Due to export control regulations and potential IPR, the corresponding source
code is not publicly available.
If you cannot get it from NIST, please contact us to obtain it under
NDA.
Update of DFC Implementations
Fabrice Noilhan
The DFC code has changed since the version sent to NIST in June 1998.
New code was written for the API functions, and better implementations
of the inner encryption functions have been written. As stated by NIST,
we did not expect the API functions to be used for timings, so our
original code was not heavily optimized. The new code has been optimized
and remains portable from one architecture to another. The improvements
in the encryption functions come from different ways of computing
ax+b mod p mod 2**64, often tailored to specific architectures and
compilers.
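As an illustration of the computation named above, here is a minimal sketch in C of two generic ways to evaluate ax+b mod p mod 2**64. It assumes p = 2**64 + 13 (the prime given in the DFC specification; this report does not restate it) and relies on the `__int128` extension of GCC and Clang on 64-bit targets, which is not standard C. The function names are hypothetical, and neither function is claimed to match the optimized code shipped in the package.

```c
#include <stdint.h>

/* Assumption: the compiler provides __int128 (GCC/Clang, 64-bit targets). */
typedef unsigned __int128 u128;
typedef __int128 s128;

/* Straightforward reference: (a*x + b) mod p, then mod 2^64.
   Assumption: p = 2^64 + 13, as in the DFC specification. */
static uint64_t dfc_affine_ref(uint64_t a, uint64_t x, uint64_t b)
{
    const u128 p = ((u128)1 << 64) + 13;
    u128 t = (u128)a * x + b;      /* at most 2^128 - 2^64: no overflow */
    return (uint64_t)(t % p);      /* the cast keeps the low 64 bits */
}

/* Division-free variant: since 2^64 = -13 (mod p), a 128-bit value
   hi*2^64 + lo is congruent to lo - 13*hi. Folding twice leaves a
   value that needs at most one correction by p. */
static uint64_t dfc_affine_fold(uint64_t a, uint64_t x, uint64_t b)
{
    const s128 p = ((s128)1 << 64) + 13;
    u128 t = (u128)a * x + b;
    uint64_t hi = (uint64_t)(t >> 64);
    uint64_t lo = (uint64_t)t;
    u128 q = (u128)hi * 13;                /* 13*hi < 2^68 */
    uint64_t qh = (uint64_t)(q >> 64);     /* qh <= 12 */
    uint64_t ql = (uint64_t)q;
    s128 v = (s128)lo - (s128)ql + 13 * (s128)qh;
    if (v < 0)
        v += p;                            /* now 0 <= v < p */
    else if (v >= p)
        v -= p;
    return (uint64_t)v;                    /* final "mod 2^64" */
}
```

For example, `dfc_affine_ref(UINT64_MAX, UINT64_MAX, 0)` yields 196 under the stated assumption on p, because 2**64 - 1 is congruent to -14 modulo p. Both functions should agree on every input.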
The new Java version uses 64-bit signed integers; under the Java
standard, these nonetheless behave like unsigned integers for most
operations (such as addition and multiplication). This fact, combined
with newer compilers and JIT compilers, gives a large speedup (by
a factor of 40). The code has been optimized for Sun's UltraSparc
processors but also performs well on Intel processors. A version
dedicated to Intel processors could be slightly faster.
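Addition and multiplication give the same bit patterns for signed and unsigned two's-complement integers, but comparison does not. The sketch below, written in C with `int64_t` standing in for Java's `long`, shows one common way to recover an unsigned comparison and a carry from signed values. The function names are hypothetical, and this is not taken from the Java code in the package.

```c
#include <stdint.h>

/* Unsigned "a < b" using only a signed comparison: flipping the top
   bit of both operands maps the unsigned order onto the signed one. */
static int ult64(int64_t a, int64_t b)
{
    return (a ^ INT64_MIN) < (b ^ INT64_MIN);
}

/* Wrap-around sum of two 64-bit patterns, plus the carry out of bit 63.
   The sum is formed on unsigned values because signed overflow is
   undefined in C (Java's long simply wraps). Converting the result back
   to int64_t is implementation-defined in older C standards; it wraps on
   the usual two's-complement compilers. */
static int64_t add64_carry(int64_t a, int64_t b, int *carry)
{
    int64_t sum = (int64_t)((uint64_t)a + (uint64_t)b);
    *carry = ult64(sum, a);    /* the sum wrapped iff it is below a */
    return sum;
}
```

With these helpers, `ult64(1, -1)` is true, since the bit pattern of -1 is the largest unsigned 64-bit value.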
The C API has been rewritten to avoid many of the conversions made
in the prior version. As a result, timings obtained through the C API
are much faster. Two versions are provided. The standard ANSI C version
uses 32-bit integers and should work on all processors, regardless of
endianness, size of ints, or alignment requirements. This is the default
when building. The second version uses 64-bit integers, whether they are
provided by the compiler (long long for gcc on Intel processors) or
by the processor itself. To build this version, you have to specify
the "INT_64" compile-time definition. In addition, if you have a 64-bit
processor, you have to use the "LONG_64" compile-time definition.
The code for the inner function is unchanged in the 32-bit case and has
been modified in the 64-bit case.
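The two compile-time definitions could drive type selection along the following lines. This is only a guess at the mechanism: `INT_64` and `LONG_64` are the names given above, but their exact effect in the real sources is an assumption, and the `word64` and `HAVE_WORD64` names are hypothetical.

```c
#include <limits.h>

#if defined(INT_64) && defined(LONG_64)
/* Assumed meaning: plain "long" is already 64 bits wide. */
typedef unsigned long word64;
#define HAVE_WORD64 1
#elif defined(INT_64)
/* Assumed meaning: the compiler supplies a 64-bit type (long long). */
typedef unsigned long long word64;
#define HAVE_WORD64 1
#else
/* Default ANSI C build: a 64-bit quantity held as two halves, each
   kept within 32 bits by explicit masking in the arithmetic code. */
typedef struct {
    unsigned long hi;
    unsigned long lo;
} word64;
#define HAVE_WORD64 0
#endif
```

Under this sketch, a build with `-DINT_64` alone picks `unsigned long long`, and adding `-DLONG_64` picks `unsigned long`.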
Timings based on C API versions should be treated with caution:
several candidates assume that the processor is little-endian and
32-bit, and use casts to convert an array of bytes to 32-bit integers,
which is not valid on other processors. Using similar tricks, we could
obtain a speedup, but the code would not be portable and would not
produce the correct result on 64-bit processors, for instance. Our
implementation therefore does not use these non-standard tricks.
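To make the contrast concrete, here is a minimal sketch of a portable conversion between bytes and 32-bit words that avoids pointer casts. It works for any byte order of the host, any alignment of the buffer, and any width of `unsigned long` of at least 32 bits. The function names are hypothetical, and the big-endian byte order chosen here is an assumption made for the example.

```c
/* Read four bytes as one 32-bit value, most significant byte first.
   No pointer cast is involved, so host endianness and buffer
   alignment do not matter. */
static unsigned long load32_be(const unsigned char *p)
{
    return ((unsigned long)(p[0] & 0xFF) << 24)
         | ((unsigned long)(p[1] & 0xFF) << 16)
         | ((unsigned long)(p[2] & 0xFF) << 8)
         |  (unsigned long)(p[3] & 0xFF);
}

/* Write the low 32 bits of v as four bytes, most significant first. */
static void store32_be(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}
```

By comparison, a cast such as `*(unsigned int *)buf` gives a result that depends on the host byte order and may fault on processors with strict alignment rules.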
Assembly-coded functions are also provided (Pentium, Pentium Pro). A C
version using one ASM opcode is provided for Alpha processors, and a C
version using floats is provided for Sparc processors (this version is
faster than the version using 64-bit integers). On 32-bit processors,
these implementations are noticeably faster than the C implementations.
This is not surprising, given that DFC is 64-bit oriented and current
compilers do not optimize computations on 64-bit integers well. On
64-bit processors, timings of C code and assembly code are similar.
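To give a sense of the work hidden behind one 64-bit multiplication when only 32-bit multiplies are cheap, here is a generic schoolbook sketch of a 64x64 to 128-bit product built from four 32x32 partial products. The function name is hypothetical, and this is not the code used in any of the implementations listed in this report.

```c
#include <stdint.h>

/* Full 128-bit product of a and x, returned as two 64-bit halves,
   using only 32x32 -> 64-bit multiplications. */
static void mul64_wide(uint64_t a, uint64_t x, uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = a & 0xFFFFFFFFu, a1 = a >> 32;
    uint64_t x0 = x & 0xFFFFFFFFu, x1 = x >> 32;

    uint64_t p00 = a0 * x0;    /* weight 2^0  */
    uint64_t p01 = a0 * x1;    /* weight 2^32 */
    uint64_t p10 = a1 * x0;    /* weight 2^32 */
    uint64_t p11 = a1 * x1;    /* weight 2^64 */

    /* Sum of everything landing in bits 32..63; it stays below 2^34,
       so the carries into the high half are in mid >> 32. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);

    *lo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

Four multiplications and several additions with carry handling replace what a 64-bit processor does in one or two instructions, which is consistent with the gap between C and assembly timings on 32-bit processors.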
See the README file in each directory for implementation details.
Timings of the provided implementations (timings were made by direct
calls to the encryption function; for the ANSI C code and the 64-bit C
code, this is done by the dfc_bench program):
Flags  Processor    Compiler    Cycles (all key sizes)      Author
                                encrypt/decrypt  key setup

ANSI C code (32 bits) (see RefCode directory)
1.     Alpha 21164  DEC cc      2562             10248      Pornin
2.     Pentium II   GCC         2592             10368      Pornin
3.     UltraSparc   Sun CC-5.0  4160             16640      Pornin

C code (64 bits) (see RefCode directory)
4.     Alpha 21164  DEC cc      564              2256       Noilhan
5.     Pentium II   GCC         1262             5048       Noilhan
6.     UltraSparc   Sun CC-5.0  875              3500       Noilhan

Other implementations
       Alpha 21164  DEC cc      526              2104       Harley
       Alpha 21164  DEC cc      323              1292       Harley (C code + one opcode)
       Alpha 21164  GCC         310              1240       Harley (ASM)
       Pentium      NASM        609              2436       Behr/Harley/McCougan/Mathisen (ASM)
       Pentium II   NASM        392              1568       Behr/Harley/McCougan/Mathisen (ASM)
       UltraSparc   Sun CC-5.0  775              3100       Harley (C code using floats)
       StrongARM    GCC         440              1760       Harley/Seal (ASM)

Compiler flags for RefCode directory:
1. -w0 -arch ev56 -O4 -newc -fast -inline all -tune ev56 -speculate all
2. -Wall -O9 -finline-functions -fomit-frame-pointer -mpentiumpro
3. -fast -xO5 -xtarget=ultra2 -xcache=16/32/1:256/64/1 -xsafe=mem -xarch=v9a
4. -w0 -arch ev56 -O4 -newc -fast -inline all -tune ev56 -speculate all -DINT_64 -DLONG_64
5. -Wall -O9 -finline-functions -fomit-frame-pointer -mpentiumpro -DINT_64
6. -fast -xO5 -xtarget=ultra2 -xcache=16/32/1:256/64/1 -xsafe=mem -xarch=v9a -DINT_64 -DLONG_64
See files in AuxCode directory for other implementations.
Compiler flags are indicated in the header of these files.