Introduction to Decorrelation Theory
On-Line Manual In order to make the decorrelation theory more easily accessable, here is an easy introduction. This "decorrelation manual" will be subject to frequent updates. (It is mainly under construction at this time.) For any feedback, please email to
Serge Vaudenay <>

Last update: July 15th, 2002.

Table of Content

Goals
d-wise Distribution Matrices
Decorrelation Distance and Decorrelation Bias
Matrix Norms and Decorrelation
The Four Problems of Decorrelation Theory
On the Choice of the Matrix Norm
Norms and Undistinguishability
Decorrelation Modules
A few Decorrelated Cipher Constructions
Differential and Linear Cryptanalysis
Iterated Attacks
References

1. Goals

The goals of decorrelation theory is

to make the theory on k-wise independence usable in the block cipher theory
to provide handy tools for dealing with undistinguishability
to make formal security results feasible on block ciphers
to provide design guidelines for new algorithms.

For this we consider block ciphers as random permutations in a mathematical sense: the random choice of the secret key K induce a distribution on the permutation C_K. In the following we will omit the reference to the key.

The goal of the designer of the block cipher is to make it "look like" a truly random permutation, i.e. like a random permutation C^* with a uniform distribution among the set of permutations. For this we need tools to compare C and C^*.

Intuitively, we can compare C and C^* at different level of depth. Namely, if for any fixed plaintext x the corresponding ciphertext C(x) is uniformly distributed, we say that C and C^* are similar "to the order 1". Similarly, if for any two different plaintexts x₁ and x₂ the corresponding ciphertext pair (C(x₁),C(x₂)) has a uniform distribution among all the pairs of different ciphertexts, we say that C and C^* are similar "to the order 2". And so on. This intuitively defined the notion of order of decorrelation.

2. d-wise Distribution Matrices

For any random function F (a block cipher is a random function with the special property that it is always a permutation) and any integer d we associate a huge matrix [F]^d with real number entries which is called the "d-wise distribution matrix".

Beginners do not really need to understand where this matrix come from. For the others, here is the formal definition.

Definition (d-wise distribution matrix).

Let F be a random function and d be en integer. Any d inputs x₁, ..., x_d define a d-multi-input
x = (x₁,...,x_d).
Any d outputs y₁, ..., y_d define a d-multi-output
y = (y₁,...,y_d).
To any d-multi-input x and any d-multi-output y we associate the probability
[F]^d_(x,y) = Pr[ C(x₁=y₁), ..., C(x_d=y_d) ].
This way we define the d-wise distribution matrix [F]^d in which each row corresponds to a d-multi-input and each colomn corresponds to a d-multi-output. In particular, the row of [F]^d corresponding to the d-multi-input (x₁,...,x_d) defines the distribution of the random d-multi-output (C(x₁),...,C(x_d)).

The most important property we need to know is the following.

Proposition 1. If F and G are random functions with independent distributions, we can consider the product FoG. For any integer d we have
[F o G]^d = [G]^d x [F]^d
with the regular matrix product.

Interestingly, if C^* denotes the ideal random permutation (which means it has a uniform distribution), we have

(Note that this does not hold for functions. Indeed if F is the constant function which is zero on all points, we have [F o F^*]^d = [F]^d and [F^* o F]^d = [G]^d where G is a random function such that G(x) = G(y) for any x and y and G(x) is uniformly distributed.) This implies the following property.

Proposition 2. If C₁, C₂ and C^* are random permutations with independent distributions, C^* being uniformly distributed, for any integer d we have
[C₁ o C₂]^d - [C^*]^d = ([C₂]^d - [C^*]^d) x ([C₁]^d - [C^*]^d)

(To prove this, we just develop the right hand term and use Proposition 1.)

In the introduction was addressed the problem of comparing a cipher C to the ideal cipher C^* "to a given order". Now we can be a little more precise. We will compare [C]^d and [C^*]^d.

3. Decorrelation Distance and Decorrelation Bias

Assuming we are given a norm ||.|| on the matrices set, a straightforward way to compare the two matrices [F]^d [G]^d is to compute the distance

This is called the d-wise decorrelation distance between F and G according to the norm ||.||.

Random functions F must be compared to an ideal random function F^* which has a uniform distribution. For this we define

_||.||

the d-wise decorrelation bias of function F.

Similarly, ciphers C must be compared to the ideal cipher C^*, and we define

_||.||

the d-wise decorrelation bias of permutation C.

The question on the choice of the norm remains.
What we can already notice is that a decorrelation distance of zero do not depend on this choice, but holds for any norm. Thus we say that F and G have the same decorrelation of order d if their d-wise decorrelation distance is zero. Similarly, we say that a function F is perfectly decorrelated to the order d if its d-wise decorrelation bias is zero. We say that a cipher C is perfectly decorrelated to the order d if its d-wise decorrelation bias is zero.

We can notice the link with well-known combinatorial properties. First of all, when a cipher C is perfectly decorrelated to the order 1, it obviously achieve perfect secrecy in Shannon's sense when used only once. Second we notice that a random hash function H is perfectly decorrelated to the order 2 if, and only if it is strong universal₂ hash function in Carter-Wegman's sense. (See Ref. [CW79,WC81].)

We have an equivalent to the Shannon Theorem.

Proposition 3. If F is a perfectly decorrelated function to the order d whose range is {0,1}ⁿ, then we have H(F) >= dn where H(F) denotes Shannon's entropy for F.
Similarly, if C is a perfectly decorrelated permutation over {0,1}ⁿ to the order d, then we have H(C) >= dn - O(d²/2ⁿ).

Hence we need a lot of random bits in order to construct perfectly decorrelated functions.

4. Matrix Norms and Decorrelation

A mapping from the set of matrices to the set of real numbers is a norm if

||A|| = 0 holds if and only if A=0
||u.A|| = |u|.||A|| for any real number u (|u| is the absolute value of u here)
||A+B|| <= ||A|| + ||B||

In addition "matrix norms" are norms such that

||AxB|| <= ||A||.||B||

If we measure decorrelation biases with matrix norms, from the property of matrix norms, the definitions and Proposition 2, we have the interesting property that

₂

₁

₂

This means that the decorrelation bias of an iterated cipher decreases exponentially with the number of rounds (as long as the decorrelation of a few rounds is less than one), which is quite a satisfactory property. Note that this property does not hold for functions (despite what was claimed in [Sac99]).

5. The Four Problems of Decorrelation Theory

Decorrelation theory faces four complementary problems.

How to choose the matrix norm?
Does the decorrelation of a cipher provide protection?
How to build simple decorrelated primitives (decorrelation modules)?
How to use them in order to build decorrelated ciphers?

For the fourth problem, the strategy consists of considering any regular conservative designs, and plugging into it a few decorrelated primitives which are called decorrelation modules. If we choose the right notion of decorrelation (the right norm), the rights decorrelation modules and plug them in the right positions, we obtain a decorrelated cipher which may be protected against the right attacks.

6. On the Choice of the Matrix Norm

The Euclidean norm (the L₂ norm) has first been considered. For any matrix A, its Euclidean norm ||A||₂ is the square root of the sum of the square of all entries. It is not obvious to see it is a matrix norm, but it is indeed. One problem is that it leads with hard to manipulate quantities. (See Ref. [Sac98] for more details.)

A much more convenient norm is the |||.|||_oo norm. For any matrix A, the norm |||A|||_oo is the maximum over all rows of the sum of the absolute values of all entries:

_oo

_(x,y)

The notation of this norm comes from the fact that it is precisely the matrix norm associated to the L_oo vector-norm. More precisely we define

_oo

for a vector and

_oo

This is the regular way for constructing a matrix norm from a vector norm.

Other interesting matrix norms are the ||.||_a norm and the ||.||_s norm defined by

_x₁

_y₁

_x₂

_y₂

_{((x₁,x₂,...),
(y₁,y₂,...))}

_{b₁,b₂,...}

_{x_1,0}

_{x_1,1}

_{x_2,0}

_{x_2,1}

_{((x_1,b₁,x_2,b₂,...),
(x_1,1-b₁,x_2,1-b₂,...))}

where b₁, b₂, ... are 0 or 1.

7. Norms and Undistinguishability

In the theory of randomness, we measure how much a function is pseudorandom by measuring the advantage of Turing tests. Actually, given two random functions F and G, we consider an oracle Turing machine A^O which outputs 0 or 1 after playing with the oracle O. We measure the advantage for distinguishing F from G by

Considering a class Cl of oracle Turing machines which output 0 or 1, we consider the best advantage

_Cl

_{A in Cl}

Here are important results which motivate the choice of the |||.|||_oo, ||.||_a and ||.||_s matrix norms.

Theorem 1 (Non-adaptive distinguishers). Let d be an integer. Let Cl^d_na be the class of non-adaptive distinguishers limited to d chosen plaintexts. Such an attack consists of choosing d plaintexts x₁, ..., x_d, querying the oracle with it, then choosing the final output 0 or 1. For any random function F, and the ideal random function F^*, we have
Adv_{Cl^d_na}(F,F^*) = (1/2).DecF^d_{|||.|||_oo}(F).
Similarly, for any cipher C and the ideal cipher C^*, we have
Adv_{Cl^d_na}(C,C^*) = (1/2).DecP^d_{|||.|||_oo}(C).

Theorem 2 (Adaptive distinguishers). Let d be an integer. Let Cl^d_a be the class of (adaptive) distinguishers limited to d chosen plaintexts. For any random function F, and the ideal random function F^*, we have
Adv_{Cl^d_a}(F,F^*) = (1/2).DecF^d_{||.||_a}(F).
Similarly, for any cipher C and the ideal cipher C^*, we have
Adv_{Cl^d_a}(C,C^*) = (1/2).DecP^d_{||.||_a}(C).

Theorem 3 (Super-pseudorandomness). Let d be an integer. Let Cl^d_s be the class of (adaptive) distinguishers limited to d chosen plaintexts or ciphertexts. Such an attack against a cipher C can choose to query C or it inverse C^-1. For any cipher C and the ideal cipher C^*, we have
Adv_{Cl^d_s}(C,C^*) = (1/2).DecP^d_{||.||_s}(C).

(See Ref. [Sac99] for more details.)

This formalism turns out to make trivial (from the matrix norm properties) the following intuitive results. If we consider r independent instances C₁, ..., C_r of the same cipher C, we consider the iterated cipher C_ro...oC₁. We have

_{Cl^d_na}

₁

^r-1

_{Cl^d_na}

_{Cl^d_a}

₁

^r-1

_{Cl^d_a}

_{Cl^d_s}

₁

^r-1

_{Cl^d_s}

The best advantage for distinguishing an iterated cipher from the ideal cipher C^* decreases exponentially with the number of rounds r.

8. Decorrelation Modules

Inspired by what has been done based on the Carter-Wegman universal hashing theory (see Ref. [CW79], Ref. [WC81] and Ref. [CG89]), it is quite straightforward to propose decorrelation modules.
Here are a few decorrelation modules for the corresponding norms.

(NUT-I: Perfectly decorrelated random function in GF(q))
First of all, making a random function on a finite field GF(q) which is perfectly decorrelated to the order d (which means with a decorrelation bias of 0) is easy. If we let
be random with a uniform distribution, the function
on GF(q) is such that DecF^d(F) = 0.
(NUT-II: Perfectly pairwise decorrelated permutation in GF(q))
Making a random permutation on a finite field GF(q) perfectly decorrelated to an arbitrary order d is still an open problem. For d=1, the trivial solution
satisfies DecP¹(C) = 0, where a is uniformly distributed in GF(q). Note that this corresponds to Vernam's cipher (also known as one-time pad) which is known to be unconditionally secure (from Shannon's theory) if used only once.
For d=2, the well known solution
satisfies DecP²(C) = 0, where (a,b) is uniformly distributed in GF(q)^*xGF(q).
In general, polynomials of larger degree are not permutations. Therefore it is not quite clear how to adapt the previous construction of perfectly decorrelated functions.
(NUT-III: Pairwise decorrelated function in GF(p) for the L₂ norm)
In general, practical applications require that we work with bit-strings, in {0,1}^m, which suggest that we use GF(2^m). It is a little frustrating however, for software, to have to implement the multiplication in this field when we have an optimized built-in hardware multiplication on integers. For this we can use GF(p) with a prime p close to 2^m.
The first possibility consists of choosing the maximal prime p smaller than 2^m. Let us write
Then we let
with
uniformly distributed. We obtain that
for delta < 1/14. Unfortunately, the decorrelation with respect to the |||.|||_oo, ||.||_a and ||.||_s are quite bad. Namely, we have
which is approximately 2, i.e. the worst decorrelation bias.
(See
Ref. [Sac98] for more details.)
(NUT-IV: Decorrelated function in GF(q) for the |||.|||_oo and ||.||_a norms)
The other possibility consists of choosing the minimal prime power q larger than 2^m. Let us write
Then we let
with
uniformly distributed, and where r is any injective mapping from {0,1,...,2^m-1} to GF(q) (a "representation" mapping) and pi is any surjective mapping from GF(q) to {0,1,...,2^m-1} (a "projection" mapping). For instance we can use
for a prime p when considering {0,1,...,2^m-1} as a subset of GF(p). We obtain that
(See
Ref. [Sac99] for more details.)

9. A few Decorrelated Cipher Constructions

Once we have a decorrelation module, we can plug it in regular conservative ciphers. In order to deduce an upper bound on the decorrelation bias of the cipher from an upper bound on the decorrelation bias of these modules, we use the following "meta-theorem".

Theorem 4 (Decorrelation modules to decorrelated ciphers).

Let
C = Omega(F₁,...,F_r,C₁,...,C_s)
be a computation network which defines a cipher from r independent random functions F₁,...,F_r and s random permutations C₁,...,C_s. We consider an ideal version of this construction
C' = Omega(F^*₁,...,F^*_r, C^*₁,...,C^*_s)
in which the F^*_i and C^*_j are uniformly distributed functions or permutations. We assume that a computation of C requires a_i computations of F_i for any i, and b_j computations of C_j for any j. For any d we have
DecP^d(C) <= DecP^d(C') + DecF^da₁(F₁) + ... + DecF^da_r(F_r) + DecP^db₁(C₁) + ... + DecP^db_s(C_s)
for the |||.|||_oo and ||.||_a norms. Similarly, assuming that a computation of C or C^-1 requires a'_i computations of F_i for any i, and b'_j computations of C_j or C^-1_j for any j, we have
DecP^d_{||.||_s}(C) <= DecP^d_{||.||_s}(C') + DecF^da'₁_{||.||_a}(F₁) + ... + DecF^da'_r_{||.||_a}(F_r) + DecP^db'₁_{||.||_s}(C₁) + ... + DecP^db'_s_{||.||_s}(C_s).

(See Ref. [Sac99] for more details.)
This theorem reduce the problem of estimating the decorrelation bias of C to the problem of estimating the decorrelation bias of C'.

It shall be outlined that it makes trivial the extension of Luby-Rackoff Theorem to pseudorandom functions instead of random ones: assuming that the advantage of any distinguisher which is limited to d chosen plaintext is at most epsilon against any round function F_i of a Feistel construction with at least three rounds, we obtain that any distinguisher which is limited to d chosen plaintext against the Feistel cipher is at most

^-m/2

This result extends to chosen plaintext and ciphertext with 4 instead of 3.

We can use the following constructions.

(Coconut design)
We let C₁ and C₃ be any ciphers, and C₂ be a decorrelation invertible module. We can use the
construction. For the ideal version with C₂ = C^*, we obtain that DecP^d(C') = 0. Therefore we have
for the |||.|||_oo, ||.||_a and ||.||_s norms.
In Ref. [Stacs98] was propose the Coconut98 cipher with a perfect pairwise decorrelation with
shown as the NUT-II decorrelation module.
(Peanut design)
In Peanut ciphers, we use a regular r-round Feistel cipher with decorrelation modules F₁, ..., F_r as round functions. With the standard notations we have
We recall that the Feistel scheme Psi is defined by
The decorrelation bias of the corresponding ideal function C' is given by Luby-Rackoff Theorem (see Ref. [LR88]):
Thus if we let epsilon denote the maximal decorrelation bias of the decorrelation modules we have
(See Ref. [Sac99] for more details.)
In Ref. [Stacs98] was propose the Peanut98 cipher with pairwise decorrelation with
In Ref. [Aes98] (see also Ref. [Aes99]) was propose the DFC cipher with pairwise decorrelation with
These are the NUT-IV decorrelation module. In Ref. [Sac98] this construction was adapted to the L₂ norm in order to propose the Peanut97 cipher with pairwise decorrelation with
which is the NUT-III decorrelation module.
(Walnut design)
We can also use the Lai-Massey Scheme which has been used in the IDEA cipher (see Ref. [Lai92] and Ref. [LM90]). More precisely we define
where sigma is a group alpha-orthomorphism. Similar bounds than for the Luby-Rackoff construction can be obtained. We need 3 rounds for pseudorandomness, and 4 rounds for super-pseudorandomness. In the Walnut ciphers, F₁, ..., F_r are decorrelation modules.
(See Ref. [Asiacrypt99] for more details.)

Other constructions can be adapted. In the literature we can find many theorems of constructions in the ideal cases. For instance Ref. [ZMI89] investigates generalized Feistel schemes with more than two branches. Ref. [NR99] improves the Luby-Rackoff construction by using the same round functions several times. Ref. [PRS99] investigates a new construction, ...

Note on the "nut" names: the first aim of this name was to claim that decorrelation modules are cheap and basically cost nuts (see [Liens-97-3]). Here NUT stands for "N-Universal Transformation" (which reminds the Carter-Wegman paradigm).
COCONUT stands for "Cipher Organized with Cute Operations and NUT".
PEANUT stands for "Pretty Encryption Algorithm and NUT".
WALNUT stands for "Wonderful Algorithm with Light NUT".
All other nuts are welcome!

10. Differential and Linear Cryptanalysis

Differential cryptanalysis against r-round block ciphers usually starts from a simple distinguisher between a (r-i)-round block cipher and the perfect cipher with a small i (typically i = 1, 2, or 3). This distinguisher uses a fixed differential characteristic. It can be formalized as follows.

Differential distinguisher

iterate n times the following process
1. pick a random X
2. query C(X) and C(X XOR a)
3. if C(X XOR a) = C(X) XOR b then output 1, exit the loop, and stop
output 0 and stop

The capability of a distinguisher A to distinguish two block ciphers is measured by the advantage

₁

₂

₁

₂

where the probabilities hold over the X and C samples spaces.

Given a function f we define

and given a block cipher C (defined by a random key) we define

_C,X

The link between the advantage of the differential distinguisher and this quantity is given by the following result proven in [Stacs98].

Proposition 4. Given a random permutation C over {0,1}^m and a differential distinguisher A characterized by n,a,b as above, we have

where C^* is a uniformly distributed random permutation over {0,1}^m.

There is also a link between EDP and the decorrelation to the order 2.

Proposition 5. Given a random permutation C over {0,1}^m, for any a and b we have

_{|||._oo}

This leads to the following theorem.

Theorem 5 (Decorrelation and differential distinguisher). Given a random permutation C over {0,1}^m and a differential distinguisher A which uses n iterations, the advantage for distinguishing C from C^* is at most

_{|||._oo}

where C^* is a uniformly distributed random permutation over {0,1}^m.

Other approach to guaranty low differential probabilities in block ciphers hold for a target group law like XOR. We can notice that the result here extends to differential attacks using any group law which is somewhat stronger.

There is a similar treatment for linear distinguishers. They are formalized by n,a,b and an acceptance set B:

Linear distinguisher

initialize a counter u to 0
iterate n times the following process
1. pick a random X
2. query C(X)
3. if a.X = b.C(X) then increment counter u
If u/n is in B then output 1 otherwise output 0

We define

and we have the following results.

Proposition 6. Given a random permutation C over {0,1}^m and a linear distinguisher A characterized by n,a,b as above, we have

where C^* is a uniformly distributed random permutation over {0,1}^m. Here the big O holds as n increases.

Proposition 7. Given a random permutation C over {0,1}^m, for any a and b we have

_{|||._oo}

Theorem 6 (Decorrelation and linear distinguisher). Given a random permutation C over {0,1}^m and a linear distinguisher A which uses n iterations, the advantage for distinguishing C from C^* is a big O (as n increases) of the cubir root of

_{|||._oo}

where C^* is a uniformly distributed random permutation over {0,1}^m.

This treatment is quite useful when desiging block ciphers: if we manage to construct one with low decorrelation of order 2, we can guaranty for free that no differential or linear characteristic has a useful bias on average over the distribution of the key.

11. Iterated Attacks

[to be completed]

12. References

[CW79]	*Universal Classes of Hash Functions*. By Carter and Wegman. In Journal of Computer and System Sciences vol. 18, pp. 143-154, 1979.
[WC81]	*New Hash Functions and their Use in Authentication and Set Equality*. By Wegman and Carter. In Journal of Computer and System Sciences vol. 22, pp. 265-279, 1981.
[LR88]	*How to Construct Pseudorandom Permutations from Pseudorandom Functions*. By Luby and Rackoff. In SIAM Journal on Computing vol. 17, pp. 373-386, 1988.
[CG89]	*On the Power of Two-Point Based Sampling*. By Chor and Goldreich. In Jounal of Complexity vol. 5, pp. 96-106, 1989.
[ZMI89]	*Impossibility and Optimality Results on Constructing Pseudorandom Permutations*. By Zheng, Matsumoto and Imai. In EUROCRYPT' 89, pp. 412-422, LNCS 434, Springer-Verlag, 1990.
[LM90]	*A Proposal for a New Block Encryption Standard*. By Lai and Massey. In EUROCRYPT' 90, pp. 389-404, LNCS 473, Springer-Verlag, 1991.
[Lai92]	*On the Design and Security of Block Ciphers*. By Lai. In ETH Series in Information Processing vol. 1, Hartung-Gorre Verlag Konstanz, 1992.
[Liens-97-3]	*A Cheap Paradigm for Block Cipher Security Strengthening*. Technical report LIENS-97-3, Ecole Normale Supérieure, 1997.
[Stacs98]	*Provable Security for Block Ciphers by Decorrelation*. In STACS' 98, pp. 249-275, LNCS 1373, Springer-Verlag, 1998.
[Sac98]	*Feistel Ciphers with L₂ Decorrelation*. In SAC' 98, pp. 1-14, LNCS 1556, Springer-Verlag, 1999.
[Aes98]	*Decorrelated Fast Cipher: an AES Candidate*. By Gilbert, Girault, Hoogvoorst, Noilhan, Pornin, Poupard, Stern and Vaudenay. In the Proceedings from the First Advanced Encryption Standard Candidate Conference, National Institute of Standards and Technology (NIST), 1998.
[NR99]	*On the Construction of Pseudorandom Permutations: Luby-Rackoff Revisited*. By Naor and Reingold. In Journal of Cryptology vol. 12, pp. 29-66, 1999.
[Aes99]	*DFC Update*. By Baudron, Gilbert, Granboulan, Handschuh, Harley, Joux, Nguyen, Noilhan, Pointcheval, Pornin, Poupard, Stern and Vaudenay. In the Proceedings from the Second Advanced Encryption Standard Candidate Conference, National Institute of Standards and Technology (NIST), 1999.
[PRS99]	*Towards Making Luby-Rackoff Ciphers Optimal and Practical*. By Patel, Ramzan, Sundaram. In Fast Software Encryption' 99, pp. 171-185, LNCS 1636, Springer-Verlag, 1999.
[Eurocrypt99]	*Resistance Against General Iterated Attacks*. In EUROCRYPT' 99, pp. 255-271, LNCS 1592, Springer-Verlag, 1999.
[Sac99]	*Adaptive-Attack Norm for Decorrelation and Super-Pseudorandomness*. Technical report LIENS-99-2, Ecole Normale Supérieure, 1999. To appear in SAC'99.
[Asiacrypt99]	*On the Lai-Massey Scheme*. Technical report LIENS-99-3, Ecole Normale Supérieure, 1999. To appear in ASIACRYPT'99.

Return to the Decorrelation Home Page