Leftover hash lemma

The leftover hash lemma is a lemma in cryptography first stated by Russell Impagliazzo, Leonid Levin, and Michael Luby.

Imagine that you have a secret key $\scriptstyle X$ that has $\scriptstyle n$ uniform random bits, and you would like to use this secret key to encrypt a message. Unfortunately, you were a bit careless with the key, and you know that an adversary was able to learn the values of $\scriptstyle t\;<\;n$ bits of that key, but you do not know which bits. Can you still use your key, or do you have to throw it away and choose a new one? The leftover hash lemma tells us that we can produce a key of almost $\scriptstyle n\,-\,t$ bits over which the adversary has almost no knowledge. Since the adversary knows all but $\scriptstyle n\,-\,t$ bits, this is almost optimal.

More precisely, the leftover hash lemma tells us that from a random variable $\scriptstyle X$ we can extract about $\scriptstyle H_{\infty }(X)$ bits that are almost uniformly distributed, where $\scriptstyle H_{\infty }(X)$ is the min-entropy of $\scriptstyle X$. In other words, an adversary who has some partial knowledge about $\scriptstyle X$ will have almost no knowledge about the extracted value. That is why this is also called privacy amplification (see the privacy amplification section in the article Quantum key distribution).
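As a rough illustration of "almost $\scriptstyle n\,-\,t$ bits", the sketch below evaluates the output-length bound $\scriptstyle m\;\leq \;H_{\infty }(X)\,-\,2\log(1/\varepsilon )$ from the lemma stated in this article. It assumes logarithms to base 2 and assumes that a uniform $\scriptstyle n$-bit key with $\scriptstyle t$ leaked bits retains $\scriptstyle n\,-\,t$ bits of min-entropy. The function name `extractable_bits` and the sample numbers are illustrative choices, not part of the lemma.

```python
import math

def extractable_bits(min_entropy_bits: float, epsilon: float) -> int:
    """Largest integer m with m <= H_min(X) - 2*log2(1/epsilon), floored at 0."""
    return max(0, math.floor(min_entropy_bits - 2 * math.log2(1 / epsilon)))

# Hypothetical scenario: a 256-bit uniform key of which 64 bits leaked.
n, t = 256, 64
epsilon = 2 ** -40                     # target statistical distance from uniform
m = extractable_bits(n - t, epsilon)   # assumes the remaining min-entropy is n - t
print(m)                               # 112
```

The gap between $\scriptstyle n\,-\,t\;=\;192$ and the 112 bits obtained here is the $\scriptstyle 2\log(1/\varepsilon )$ term: a smaller $\scriptstyle \varepsilon$ costs more output bits.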

Randomness extractors achieve the same result, but use (normally) less randomness.

Leftover hash lemma

Let $\scriptstyle X$ be a random variable over $\scriptstyle {\mathcal {X}}$ and let $\scriptstyle m\;>\;0$. Let $\scriptstyle h:\;{\mathcal {S}}\,\times \,{\mathcal {X}}\;\rightarrow \;\{0,\,1\}^{m}$ be a 2-universal hash function. If

$$m\leq H_{\infty }(X)-2\log \left({\frac {1}{\varepsilon }}\right)$$

then for $\scriptstyle S$ uniform over $\scriptstyle {\mathcal {S}}$ and independent of $\scriptstyle X$, we have

$$\delta [(h(S,X),S),(U,S)]\leq \varepsilon$$

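where $\scriptstyle U$ is uniform over $\scriptstyle \{0,\,1\}^{m}$ and independent of $\scriptstyle S$, and $\scriptstyle \delta [A,\,B]$ denotes the statistical distance (total variation distance) between the distributions of $\scriptstyle A$ and $\scriptstyle B$.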
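The statement can be checked exhaustively on a toy instance. The following is a minimal sketch, not a reference implementation. It assumes base-2 logarithms and uses one particular 2-universal family chosen for illustration: the seed is an $\scriptstyle m\,\times \,n$ binary matrix and the hash is the matrix-vector product over GF(2). The parameters ($\scriptstyle n\,=\,6$, $\scriptstyle m\,=\,2$, a source uniform on 32 points) and all function names are hypothetical. Because $\scriptstyle S$ is uniform and appears in both pairs, the joint distance equals the average over seeds of the distance between $\scriptstyle h(s,X)$ and uniform.

```python
import itertools
import math
import random

def parity(v: int) -> int:
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(v).count("1") & 1

def h(seed, x: int) -> int:
    """Multiply the binary matrix `seed` (one integer per row) by x over GF(2)."""
    out = 0
    for row in seed:
        out = (out << 1) | parity(row & x)
    return out

def distance_from_uniform(n: int, m: int, support) -> float:
    """Exact delta[(h(S,X),S),(U,S)] for X uniform on `support`.

    Computed as the average over all seeds s of the statistical
    distance between h(s, X) and the uniform distribution on m bits.
    """
    total = 0.0
    num_seeds = 0
    for seed in itertools.product(range(2 ** n), repeat=m):
        counts = [0] * (2 ** m)
        for x in support:
            counts[h(seed, x)] += 1
        total += 0.5 * sum(abs(c / len(support) - 2 ** -m) for c in counts)
        num_seeds += 1
    return total / num_seeds

n, m = 6, 2
# Illustrative source: uniform on 32 of the 64 possible 6-bit strings.
support = random.Random(0).sample(range(2 ** n), 32)
min_entropy = math.log2(len(support))      # 5 bits for a uniform 32-point source
epsilon = 2 ** (-(min_entropy - m) / 2)    # smallest epsilon allowed by the lemma
delta = distance_from_uniform(n, m, support)
print(f"delta = {delta:.4f}, epsilon = {epsilon:.4f}")
```

Enumerating all $\scriptstyle 2^{mn}$ seeds is only feasible for tiny parameters; the point is to see the measured distance fall below the bound, not to build a practical extractor.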

$\scriptstyle H_{\infty }(X)\;=\;-\log \max _{x}\Pr[X=x]$ is the min-entropy of $\scriptstyle X$, which measures the amount of randomness $\scriptstyle X$ has. The min-entropy is always less than or equal to the Shannon entropy. Note that $\scriptstyle \max _{x}\Pr[X=x]$ is the probability of correctly guessing $\scriptstyle X$ in a single attempt, since the best guess is the most probable value. Therefore, the min-entropy measures how difficult it is to guess $\scriptstyle X$.
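A small numerical example may make the comparison concrete. The sketch below assumes base-2 logarithms (entropy in bits); the helper names and the sample distribution are illustrative only.

```python
import math

def min_entropy(probs) -> float:
    """H_min(X) = -log2(max_x Pr[X = x])."""
    return -math.log2(max(probs))

def shannon_entropy(probs) -> float:
    """H(X) = -sum_x Pr[X = x] * log2(Pr[X = x]), skipping zero-probability values."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A skewed distribution over four values.
probs = [0.5, 0.25, 0.125, 0.125]
print(max(probs))               # best single-guess probability: 0.5
print(min_entropy(probs))       # 1.0 bit
print(shannon_entropy(probs))   # 1.75 bits

# For a uniform distribution the two entropies coincide.
uniform = [1 / 8] * 8
print(min_entropy(uniform), shannon_entropy(uniform))
```

The skewed distribution has 1.75 bits of Shannon entropy but only 1 bit of min-entropy, because an adversary guessing the most likely value succeeds half the time.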