Jump to content

Cyclic redundancy check

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Arghman (talk | contribs) at 16:18, 23 February 2006 (Implementation). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A cyclic redundancy check (CRC) is a type of hash function used to produce a checksum - which is a small, fixed number of bits - against a block of data, such as a packet of network traffic or a block of a computer file. The checksum is used to detect and correct errors after transmission or storage. A CRC is computed and appended before transmission or storage, and verified afterwards by recipient to confirm that no changes occurred on transit. Correction can also be done if information lost is lower than information held by the checksum. CRCs are popular because they are simple to implement in binary hardware, are easy to analyze mathematically, and are particularly good at detecting common errors caused by noise in transmission channels.

Introduction

A CRC "checksum" is the remainder of a binary division with no bit carry (XOR used instead of subtraction), of the message bit stream, by a predefined (short) bit stream of length , which represent the coefficients of a polynomial. Before the division, zeros are appended to the message stream.

CRCs are based on division in a finite field, namely , the field of polynomials over the integers modulo 2. In simpler terms, this is the set of polynomials where each coefficient is either zero or one, and arithmetic operations wrap around. For example:

Two becomes zero because addition of coefficients is performed modulo 2. Multiplication is similar:

We can also divide polynomials mod 2 and find the quotient and remainder. For example, suppose we're dividing x3 + x2 + x by x + 1. We would find that

In other words,

The division yields a quotient of x2 + 1 with a remainder of -1, which, since it is odd, has a last bit of 1.

Any string of bits can be interpreted as the coefficients of a polynomial of this sort, and to find the CRC, we divide by another fixed polynomial, after first multiplying by where is the degree of the fixed polynomial. The coefficients of the remainder polynomial are the CRC.

In the above equations, represents the original message bits 111, acts as the key, and the remainder (equivalently, ) is the CRC. The degree of the key is 1, so we first multiplied the message by to get .

In general form:

Here M(x) is the original message polynomial. K(x) is the key polynomial, with degree . The bits of are the original message with zeros added at the end. R(x) is the remainder polynomial, which can be used as a CRC 'checksum'. In communication, the sender attaches the bits of R after the original message bits of M and sends them out (in place of the zeros). The receiver takes M and R and checks whether . If it is, then the receiver assumes the received message bits are correct. Note that is exactly the string of bits the sender sent; this string is called the codeword.

CRCs are often referred to as "checksums," but such designations are not strictly accurate since, technically, a checksum is calculated through addition and not through division as is the case with CRCs.

Implementation

There are simple, efficient algorithms for computing the CRC of a block of data, such as those shown below.

One common bitwise algorithm is based on a shift register which has a size in bits equal to the degree of the chosen polynomial. The main portion of the algorithm can be expressed in pseudocode as follows:

 function crc(bit array bitString[1..len], int polynomial) {
     shiftRegister := initial value // commonly all 0 bits or all 1 bits
     for i from 1 to len {
         if most significant bit of shiftRegister xor bitString[i] = 1 {
             shiftRegister := (shiftRegister left shift 1) xor polynomial
         } else {
             shiftRegister := (shiftRegister left shift 1)
         }
     }
     return shiftRegister
 }

Another common algorithm uses a lookup table indexed by multiple most-significant bits of the shiftRegister to process multiple bits at once. A 256-entry lookup table is a particularly common choice.

Another implementation uses a shift register, but instead of changing multiple bits in the shiftRegister based on the xor of one bit of the shiftRegister and one bit of the bitString, it is possible to xor (compute the parity of) all the bits of the shiftRegister selected by the polynomial and the bitString and add that single bit to the shiftRegister. With suitable adjustments to the polynomial, this also produces the same remainder. This algorithm is usually inefficient in software, but used in some hardware implementations, and is often used when describing the close relative to a CRC, the linear feedback shift register.

The specific CRC produced is defined by the polynomial used. To produce an n-bit CRC requires a degree-n polynomial, of the form . This is naturally expressed as an (n + 1)-bit string, but the leading term is normally implicit, leaving an n-bit string Thus, depending on the bit-order convention used, the CRC-16-IBM polynomial , will be represented as the hexadecimal number 0xC005 or as 0xA001.

One of the most commonly encountered is known as CRC-32, used by (among others) Ethernet, FDDI, ZIP and other archive formats, and PNG image format. Its polynomial can be written 0x04C11DB7, or 0xEDB88320 by reversing the bit string derived from the polynomial as explained above. The W3C webpage on PNG includes an appendix with a short and simple table-driven implementation in C of CRC-32.

Variations

There are several variations on CRCs

  • The shiftRegister can be reversed, so its least-significant bit is tested and it is shifted to the right by 1 bit each step. This requires a polynomial with its bits reversed, and produces a bit-reversed result. This variant is actually the one most commonly in use.
  • The data bits may be read into the shift register either most significant first, or least significant bit first. When used in communication protocols, the CRC is usually calculated in the order the bits are sent over the physical layer, to preserve the CRC's burst error detection characteristics.
  • To check the CRC, instead of calculating the CRC on the message and comparing it to the CRC, a CRC calculation may be run on the entire codeword. If the result is zero, the check passes. This works because the codeword is , which is always divisible by .
  • The shift register may be initialized with ones instead of zeroes. Equivalently, the first bits of the message may be inverted before feeding them into the algorithm. The reason to do this is because an unmodified CRC does not distinguish between two messages which differ only in the number of leading zeros. When this inversion is done, the CRC does distinguish between such messages.
  • The CRC may be inverted before being appended to the message stream. When this is done, the CRC may be checked either by the straightforward method of computing the CRC on the message, inverting it, and comparing with the CRC in the message stream, or by calculating the CRC on the entire message. In the latter method, the result for a correct message will not be zero, but instead will be the result of dividing by . This result is called the check polynomial , and its hexadecimal representation is sometimes called a magic number.

When using the CRC-32 polynomial and the CRC-16-CCITT polynomial, by convention both inversions are performed. The check polynomial for CRC-32 is .

Reversed representations and reciprocal polynomials

Polynomial representations

All the well-known CRC polynomials of degree have two common hexadecimal representations:

  • A hexadecimal number with bits, the least significant bit of which is always 1, the most significant bit representing the coefficient and the least significant bit representing the coefficient.
  • A hexadecimal number with bits, the most significant bit of which is always 1, the most significant bit representing the coefficient and the least significant bit representing the coefficient.

The first of these is the normal representation, and gives the correct results when used in a CRC algorithm using a left shift. The second is called the reversed representation, because it is the bitwise reverse of the normal representation. When used in a CRC algorithm using a right shift, the reverse representation gives the bitwise reverse of the results that using the left shift algorithm with the normal representation does.

Using the normal representation in the right shift algorithm or the reversed one in the left shift algorithm gives incorrect results.

In both these representations the coefficient is omitted and understood to be 1.

There is a third representation in use, from a paper by P. Koopman and T. Chakravarty [1].

  • A hexadecimal number with bits, the most significant bit of which is always 1, the most significant bit representing the coefficient and the least significant bit representing the coefficient. The coefficient is omitted and understood to be 1.

This representation (call it Koopman) has the advantage that (like the normal representation) the coefficients are retained in their usual mathematical order, and (like the reversed representation) the order of the polynomial can be determined directly from the representation. Unfortunately, it will not produce correct results in either the left- or right- shift versions of the CRC algorithm.

Reciprocal polynomials

A reciprocal polynomial is created by assigning the through coefficients of one polynomial to the through coefficients of a new polynomial. Example: The reverse of is . The most interesting property of reciprocal polynomials, when used in CRCs, is that they have the exact same error-detecting strength as the polynomials they are reciprocals of. The reciprocal of a polynomial generates the same codewords — i.e., the bit strings consisting of a valid CRC appended to a message — as the original polynomial, only bit reversed. But the reciprocal polynomial is not the same as the original polynomial, and the CRCs generated using it are not the same (even modulo bit reversal) as those generated from the original polynomial.

Another interesting property of reciprocal polynomials is related to their representation. Consider again the CRC-16-IBM polynomial and its reciprocal. These are its representations

  • Normal original: 0x8005
  • Reversed original: 0xA001
  • Koopman original: 0xC002
  • Normal reciprocal: 0x4003
  • Reversed reciprocal: 0xC002
  • Koopman reciprocal: 0xA001

Note that the Koopman representation of the reciprocal polynomial is the same as the reversed representation of the original polynomial. This means that if you mistakenly use the Koopman represenation of a polynomial in a right-shift algorithm, you get a CRC that is just as strong as (but different from) the one you intended. This holds only for polynomials with a degree which is a multiple of 4.

Error detection strength

The error-detection ability of a CRC depends on the degree of its key polynomial and on the specific key polynomial used. The "error polynomial" is the exclusive-or of the received message codeword and the correct message codeword. An error will go undetected by a CRC algorithm if and only if the error polynomial is divisible by the CRC polynomial.

  • Because a CRC is based on division, no polynomial can detect errors consisting of a string of zeros prepended to the data, or of missing leading zeros. However, see Variations.
  • All single bit errors will be detected by any polynomial with at least two terms with non-zero coefficients. The error polynomial is , and is divisible only by polynomials where .
  • All two bit errors separated by a distance less than the order of the polynomial will be detected. The error polynomial in the two bit case is . As noted above, the term will not be divisible by the CRC polynomial, which leaves the term. By definition, the smallest value of such that a polynomial divides is the polynomial's order. The polynomials with the largest order are called primitive polynomials, and for polynomials of degree with binary coefficients, have order .
  • All errors in an odd number of bits will be detected by a polynomial which is a multiple of . This is equivalent to the polynomial having an even number of terms with non-zero coefficients.
  • All burst errors of length will be detected by any polynomial of degree or greater which has a non-zero term.

(as an aside, there is never reason to use a polynomial with a zero term. Recall that a CRC is the remainder of the message polynomial times x^n divided by the CRC polynomial. A polynomial with a zero term always has as a factor. So if is the original CRC polynomial and , then

That is, the CRC of any message with the polynomial is the same as that of the same message with the polynomial with a zero appended. It's just a waste of a bit.)

The combination of these factors means that good CRC polynomials are often primitive polynomials (which have the best 2-bit error detection) or polynomials whose factors are a primitive polynomial of degree and (which detects all odd numbers of bit errors, and has half the two-bit error detection ability of a primitive polynomial of degree ).

Polynomial CRC key specifications [ITU-IEEE syntax]

Name Polynomial Representations: Normal or Reverse (Normal of Reciprocal)
CRC-1 (Used in hardware, also known as parity bit) 0x1 or 0x1 (0x1)
CRC-5-USB (used in USB token packets) 0x05 or 0x14 (0x9)
CRC-7 (used in some telecom systems) 0x09 or 0x48 (0x11)
CRC-8 0x07 or 0xE0 (0xC1)
CRC-12 (used in telecom systems) 0x80F or 0xF01 (0xE03)
CRC-16-CCITT x16 + x12 + x5 + 1 0x1021 or 0x8408 (0x0811)
CRC-16-IBM x16 +x15 + x2 + 1 0x8005 or 0xA001 (0x4003)
CRC-32-IEEE 802.3 0x04C11DB7 or 0xEDB88320 (0xDB710641)
CRC-32C (Castagnoli) 0x1EDC6F41 or 0x82F63B78 (0x05EC76F1)
CRC-64-ISO (as described in ISO 3309) 0x000000000000001B or 0xD800000000000000 (0xB000000000000001)
CRC-64-ECMA-182 (as described in ECMA-182 p.63) 0x42F0E1EBA9EA3693 or 0xC96C5795D7870F42 (0x92D8AF2BAF0E1E85)
CRC-128 (IEEE? / ITU?) ?

CRCs and data integrity

While useful for error detection, CRCs cannot be safely relied upon to verify data integrity (that no changes whatsoever have occurred), since, because of the linear structure of CRC polynomials, it is extremely easy to intentionally change data without modifying its CRC. Cryptographic hash functions can be used to verify data integrity.


Computational costs of CRCs versus hashes

Checksum Algorithm Relative Cost
32-bit Adler 1.00

32-bit CRC (IEEE)

1.46

64-bit CRC (ISO)

2.23
128-bit CRC 2.29
128-bit MD4 4.48
128-bit MD5 4.51

Note that 128-bit MD4 is 1.96 times as expensive as 128-bit CRC.

See also

General category

Specific Technological References

  • Jacksum — by Dipl.-Inf. (FH) Johann N. Löfflmann in Java. Various message verification functions. Released under the GPL.