Format-preserving encryption
In cryptography, format-preserving encryption (FPE) refers to ciphers that encrypt (permute) arbitrarily sized sets. This is a generalization of standard block ciphers, such as AES, that permute fixed-size sets (for a 128-bit block cipher, the set being permuted is {0,...,2^128-1}).
The usual approach is to use a well-understood block cipher (such as AES) as a primitive, in a mode of operation that constructs a family of permutations on the desired domain. Thus the format of the encrypted data is the same as the format of the unencrypted data. The key of the new cipher is the set of keys used for the primitive block cipher operation.
There are ways to implement FPE that are provably as secure as the underlying block cipher. How to do this was first described in a paper by cryptographers John Black and Phillip Rogaway,[1] which described three ways to do this. FPE is NOT an approved MODE of the AES algorithm and hence its not FIPS Compliant. An approach based on one of these techniques has been accepted by NIST for consideration as an approved mode of the AES algorithm under the name "Feistel Finite Set Encryption Mode (FFSEM)".[2] The following discussion uses AES as the underlying block cipher, but the same results are also true for any other block cipher.
The motivation for FPE
The motivation for using FPE comes from the problems associated with integrating encryption into existing applications. Suppose that you want to encrypt the 16-digit credit card number 1234567812345670
. Using other modes of the AES algorithm like ECB or CBC will transform a credit card number into a large, fixed-length, binary value.
Using AES-128-ECB, this credit card number might get encrypted to the hexadecimal value 0x96a45cbcf9c2a9425cde9e274948cb67
, which contains many bytes that are considered invalid when compared to a typical credit card number. If a credit card number is stored in a column of a database whose entries are char
or varchar
data, then the encrypted data cannot be stored in same column without changing the format of the column. If the encryped data is Base64 encoded to ensure that it only contains valid characters, the size of the encrypted credit card number increases from 16 bytes to 24 bytes, changing the encrypted credit card number to lqRcvPnCqUJc3p4nSUjLZw==
, for example. In either case, applications that process the credit number may similarly be unable to handle an encrypted value without some modification.
Using AES-128-CBC, this credit card number might get encrypted to the hexadecimal value 0xde015724b081ea7003de4593d792fd8b695b39e095c98f3a220ff43522a2df02
. In addition to the problems caused by creating invalid characters and increasing the size of the data, data encrypted using the CBC mode of an encryption algorithm also changes its value when it is decrypted and encrypted again. This happens because the random seed value that is used to initialize the encryption algorithm and is included as part of the encrypted value is different for each encryption operation. Because of this, it is impossible to use data that has been encrypted with the CBC mode as a unique key to identify a row in a database.
Another use is to generate a pseudorandom sequence of data of a fixed format (again credit card numbers are a good example).
The FPE constructions of Black and Rogaway
Black and Rogaway described three ways to construct an FPE algorithm and proved that each of these techniques is as secure as the block cipher that is used to construct it. This means that if the AES algorithm is used to create an FPE algorithm, then the resulting FPE algorithm is as secure as AES because an adversary capable of defeating the FPE algorithm can also defeat the AES algorithm. So if we assume that AES is secure, then the FPE algorithms constructed from it are also secure. In all of the following, we use E to denote the AES encryption operation that is used to construct an FPE algorithm and F to denote the FPE encryption operation.
FPE from a prefix cipher
One easy way to create an FPE algorithm is to build a lookup table that implements the encryption. Black and Rogaway call this technique a "prefix cipher." Suppose that we want to be able to encrypt the n values 0, 1, ..., n-1. To do this, we can create the table I = (E(0),E(1),...E(n-1))
. From this table, we create a second table J
by replacing each element of the table I
with its ordinal position (the smallest element being assigned the value 0, the next smallest being assigned the value 1, etc.), starting at 0. Then we encrypt an integer m as the mth entry in the table J
.
For example, suppose that we want to create a way of encrypting the integers 0 through 3. We first build the table I
by encrypting 0 through 3 to get I = (E(0),...,E(3))
. Suppose that we use AES-128-ECB to do this and get the following, :
E(0) = 0x56c644080098fc5570f2b329323dbf62
E(1) = 0x08ee98c0d05e3dad3eb3d6236f23e7b7
E(2) = 0x47d2e1bf72264fa01fb274465e56ba20
E(3) = 0x077de40941c93774857961a8a772650d
Because E(3)
is the smallest of these, we get that J(0) = 3
. E(1)
is the next smallest, so we get J(1) = 1
, etc, so that J = (3,1,2,0)
. Then we would encrypt 0 through 3 using the following:
F(0) = 3
F(1) = 1
F(2) = 2
F(3) = 0
which is a way of using AES that encrypts a two-bit value into another two-bit value.
This method is only useful for small values of n. For larger values, the size of the lookup table and the required number of encryptions gets too big to be practical.
This would be a bad way to encrypt English text, as high frequency letters such as 'e' would be mapped to the same encrypted letter.
FPE from cycle walking
If we have a set M of allowed values, we can create an FPE algorithm from AES as follows. This iterates the AES encryption algorithm until the output is one of the allowed values.
CycleWalkingFPE(x)
{
if E(x) is an element of M
return E(x)
else
return CycleWalkingFPE(E(x))
}
Because AES encryption is a permutation of its inputs, this is guaranteed to eventually terminate.
For example, suppose that we want to encrypt 100-bit values with AES in a way that creates another 100-bit value. One way to do this is to create an FPE algorithm that iterates AES-128-ECB encryption until it reaches a value which has all of its 28 highest bits set to 0, which will take an average of 214 iterations to happen. In general, the closer the size of M is to the size of the output of the symmetric cipher, the faster the cycle-walking technique will be able to reach a valid output value. Because of this, FPE algorithms created using the cycle-walking technique may be too slow when the size of the set M is too far from the size of the output of the underlying block cipher.
FPE from a Feistel network
It is also possible to make a FPE algorithm using a Feistel network. A Feistel network needs a source of pseudo-random values for the sub-keys for each round, and the output of the AES algorithm can be used as these pseudo-random values. When this is done, the resulting Feistel construction is as secure as AES if enough rounds are used.[3]
One way to implement an FPE algorithm using AES and a Feistel network is to use as many bits of AES output as are needed to equal the length of the left or right halves of the Feistel network. If a 24-bit value is needed as a sub-key, for example, it is possible to use the lowest 24 bits of the output of AES for this value.
This may not result in the output of the Feistel network preserving the format of the input, but it is possible to iterate the Feistel network in the same way that the cycle-walking technique does to ensure that format can be preserved. Because it is possible to adjust the size of the inputs to a Feistel network, it is possible to make it very likely that this iteration ends very quickly on average. In the case of credit card numbers, for example, there are 1016 possible 16-digit credit card numbers, and because the 1016 = 253.1, using a 54-bit wide Feistel network along with cycle walking will create an FPE algorithm that encrypts fairly quickly on average.
The Thorp shuffle
A Thorp shuffle is like an idealized card-shuffle, or equivalently a maximally-unbalanced Feistel cipher where one side is a single bit. This method and its provable security bounds are discussed in [4].
VIL mode
For domain sizes that are a power of two, and an existing block cipher with a smaller block size, a new cipher may be created using VIL mode as described by Bellare, Rogaway[5].
Hasty Pudding Cipher
The Hasty Pudding Cipher uses custom constructions (not depending on existing block ciphers as primitives) to encrypt arbitrary finite small domains.
Other FPE constructions
The paper "Format-Preserving Encryption"[6] by Mihir Bellare and Thomas Ristenpart describes using "nearly balanced" Feistel networks to create secure FPE algorithms.
The paper "Format Controlling Encryption Using Datatype Preserving Encryption"[7] by Ulf Mattsson describes other ways to create FPE algorithms.
Section 8 of the FIPS 74, Federal Information Processing Standards Publication 1981 Guidelines for Implementing and Using the NBS Data Encryption Standard[8], describes a way to use the DES encryption algorithm in a way that preserves the format of data. This standard was withdrawn on May 19, 2005, so the technique should be considered obsolete.
The paper "Using Datatype-Preserving Encryption to Enhance Data Warehouse Security"[9] by Michael Brightwell and Harry Smith describes a way to use the DES encryption algorithm in a way that preserves the format of the plaintext.
The FFSEM mode of AES
The FFSEM mode of AES that has been accepted for consideration by NIST uses the Feistel network construction of Black and Rogaway with one slight modification: the FFSEM construction uses a tweaked mode of AES-ECB in which the tweak for each round of the Feistel network is the number of the round. The tweak 1 is used for the first round, the tweak 2 is used for the second round, etc.
FCEM and FFSEM
The support for data types is different in FCEM and FFSEM. You can define all the data types you need in FCEM. FFSEM is restricted to digits only and the field length must be between nine and nineteen digits.
References
- ^ John Black and Philip Rogaway, Ciphers with Arbitrary Domains, Proceedings RSA-CT, 2002, pp. 114-130. http://citeseer.ist.psu.edu/old/black00ciphers.html (http://www.cs.ucdavis.edu/~rogaway/papers/subset.pdf)
- ^ Terence Spies, Feistel Finite Set Encryption Mode http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/ffsem/ffsem-spec.pdf
- ^ Jacques Patarin, Luby-Rackoff: 7 Rounds Are Enough for 2n(1-epsilon) Security, Proceedings of CRYPTO 2003, Lecture Notes in Computer Science, Volume 2729, Oct 2003, pp. 513 - 529. http://www.iacr.org/archive/crypto2003/27290510/27290510.pdf
- ^ Rogaway, Phillip (2009), "How to Encipher Messages on a Small Domain" (PDF), Advances in Cryptology
- ^ Bellare, Mihir (1999), On the construction of Variable-Input-Length Ciphers (PDF)
- ^ Mihir Bellare and Thomas Ristenpart, Format-Preserving Encryption http://eprint.iacr.org/2009/251
- ^ Ulf Mattsson, Format Controlling Encryption Using Datatype Preserving Encryption http://eprint.iacr.org/2009/257
- ^ FIPS 74, Federal Information Processing Standards Publication 1981 Guidelines for Implementing and Using the NBS Data Encryption Standard http://www.itl.nist.gov/fipspubs/fip74.htm
- ^ Michael Brightwell and Harry Smith, "Using Datatype-Preserving Encryption to Enhance Data Warehouse Security, Proceedings of the 1997 National Information Systems Security Conference https://portfolio.du.edu/portfolio/getportfoliofile?uid=135556