Chacha20

This page covers the basics of the ChaCha20 steam cipher invented by Daniel J Bernstein. I learnt about the cipher by reading the Salsa20 paper, the ChaCha20 paper and this video.

The ChaCha20 algorithm takes 16 4-byte words as input:

• 8 words: 256-bit key
• 2 words: IV/nonce (to be used only once with a key)
• 2 words are a counter (makes the stream cipher allow for 264 random access for use cases like videos)
• 4 words are constant defined in the ChaCha20 specification

Each application of the ChaCha20 function gives us 16 words (512 bits) of deterministic random-bit generation and can be assembled into a keystream like so:

ks := ChaCha20(key, nonce, 0) || ChaCha20(key, nonce, 1) || ... || ChaCha20(key, nonce, n)

It’s evident in the above formulation how the counter can be used to “seek” into up to 264parts of the keystream.

Algorithm to generate 512 DRBG bits

Let’s see how the ChaCha20 function works. All the inputs are loaded into a 16 words as shown below:

constant constant constant constant
key key key key
key key key key
input input input input

Then, a QUARTERROUND(a, b, c, d) function is applied to 4-words at a time. DJB chose 4-words to optimize for speed on various computing architectures.

QUARTERROUND(a, b, c, d) {
a += b; d ^= a; d <<<= 16;
c += d; b ^= c; b <<<= 12;
a += b; d ^= a; d <<<= 8;
c += d; b ^= c; b <<<= 7;
}

The QUARTERROUND function is applied to 4 columns in the matrix, comprising one round and then applied to 4 diagonals in the matrix, comprising another round. This is repeated 10 times, totalling 20 rounds. Hence, ChaCha20. Two rounds are depicted below where xi represents the ith word in the matrix if it were represented linearly as 16 words.

QUARTERROUND(x0, x4, x8, x12)
QUARTERROUND(x1, x5, x9, x13)
QUARTERROUND(x2, x6,x10, x14)
QUARTERROUND(x3, x7,x11, x15)
QUARTERROUND(x0, x5,x10, x15)
QUARTERROUND(x1, x6,x11, x12)
QUARTERROUND(x2, x7, x8, x13)
QUARTERROUND(x3, x4, x9, x14)

At the end of 20 rounds, we have 512 deterministically generated pseudo-random bits based on the key material and the inputs. To generate more bits, we increment the counter, giving us 264 * 64 bytes = 270 bytes. Per the recommendation in the TLS 1.3 spec, the key must be rotated after 270 bytes of keystream has been obtained. In practice, most AEAD specifications will codify re-keying at much smaller intervals (like the 1GB re-keying requirement in [email protected]) and the 4064 bytes re-keying requirement in [email protected]

Characteristics

• Faster than Salsa20 which is faster than AES in rand-ctr mode.
• More secure than AES baased on the cryptanalysis available to date
• Faster than Salsa20 and even AES which the AES-NI hardware acceleration is not available. It is only marginally slower when AES-NI is available.
• Produces higher diffusion on most benchmarks compared to Salsa20 and therefore DJB speculates it gives similar level of security at a lower numebr of rounds.
• Used in TLS 1.3 authenticated encryption with the Poly1305 Carter-Wegman MAC.