# Small-bias sample space

In theoretical computer science, a small-bias sample space (also known as ${\displaystyle \epsilon }$-biased sample space, ${\displaystyle \epsilon }$-biased generator, or small-bias probability space) is a probability distribution that fools parity functions. In other words, no parity function can distinguish between a small-bias sample space and the uniform distribution with high probability, and hence, small-bias sample spaces naturally give rise to pseudorandom generators for parity functions.

The main useful property of small-bias sample spaces is that they need far fewer truly random bits than the uniform distribution to fool parities. Efficient constructions of small-bias sample spaces have found many applications in computer science, some of which are derandomization, error-correcting codes, and probabilistically checkable proofs. The connection with error-correcting codes is in fact very strong since ${\displaystyle \epsilon }$-biased sample spaces are equivalent to ${\displaystyle \epsilon }$-balanced error-correcting codes.

## Definition

### Bias

The bias of a probability distribution ${\displaystyle X}$ over ${\displaystyle \{0,1\}^{n}}$ with respect to a set of indices ${\displaystyle I\subseteq \{1,2,\ldots ,n\}}$ is defined as

${\displaystyle {\text{bias}}_{I}(X)=\left|\Pr _{x\sim X}\left(\sum _{i\in I}x_{i}=0\right)-\Pr _{x\sim X}\left(\sum _{i\in I}x_{i}=1\right)\right|=\left|2\cdot \Pr _{x\sim X}\left(\sum _{i\in I}x_{i}=0\right)-1\right|\,,}$

where the sum is taken over ${\displaystyle \mathbb {F} _{2}}$, the finite field with two elements. In other words, the sum ${\displaystyle \sum _{i\in I}x_{i}}$ equals ${\displaystyle 0}$ if the number of ones in the sample ${\displaystyle x\in \{0,1\}^{n}}$ at the positions defined by ${\displaystyle I}$ is even, and otherwise, the sum equals ${\displaystyle 1}$. For ${\displaystyle I=\emptyset }$, the empty sum is defined to be zero, and hence ${\displaystyle {\text{bias}}_{\emptyset }(X)=1}$.
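For small parameters, the bias can be computed by brute force over a sample multiset. The following sketch (the function name `bias` and the toy multiset are illustrative choices, not from the literature) confirms that the uniform distribution has zero bias on a non-empty index set and bias 1 on the empty set:

```python
from itertools import product

def bias(I, samples):
    """Bias of index set I for a uniform choice from a multiset of bit tuples."""
    # Pr[sum_{i in I} x_i = 0 (mod 2)] over the multiset.
    p0 = sum(1 for x in samples if sum(x[i] for i in I) % 2 == 0) / len(samples)
    return abs(2 * p0 - 1)

# Uniform distribution over {0,1}^3: every non-empty parity is perfectly balanced.
uniform = list(product([0, 1], repeat=3))
print(bias({0, 2}, uniform))   # 0.0
print(bias(set(), uniform))    # empty sum is 0 for every sample, so bias = 1.0
```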

### ϵ-biased sample space

A probability distribution ${\displaystyle X}$ over ${\displaystyle \{0,1\}^{n}}$ is called an ${\displaystyle \epsilon }$-biased sample space if ${\displaystyle {\text{bias}}_{I}(X)\leq \epsilon }$ holds for all non-empty subsets ${\displaystyle I\subseteq \{1,2,\ldots ,n\}}$.

### ϵ-biased set

An ${\displaystyle \epsilon }$-biased sample space that is generated by picking a uniform element from a multiset ${\displaystyle X\subseteq \{0,1\}^{n}}$ is called an ${\displaystyle \epsilon }$-biased set. The size ${\displaystyle s}$ of an ${\displaystyle \epsilon }$-biased set ${\displaystyle X}$ is the size of the multiset that generates the sample space.
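Whether a given multiset is ${\displaystyle \epsilon }$-biased can be verified exhaustively when ${\displaystyle n}$ is small. A minimal sketch (the helper `max_bias` and the size-3 example set are hypothetical choices for illustration):

```python
def max_bias(X, n):
    """Largest bias over all non-empty index sets, for a multiset X of n-bit tuples."""
    worst = 0.0
    for mask in range(1, 2 ** n):  # enumerate non-empty subsets of {0,...,n-1}
        I = [i for i in range(n) if mask >> i & 1]
        p0 = sum(1 for x in X if sum(x[i] for i in I) % 2 == 0) / len(X)
        worst = max(worst, abs(2 * p0 - 1))
    return worst

# A multiset of size s = 3 over {0,1}^2: every non-empty parity has bias 1/3,
# so this is an epsilon-biased set for epsilon = 1/3.
X = [(0, 0), (0, 1), (1, 0)]
print(max_bias(X, 2))  # 0.333...
```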

### ϵ-biased generator

An ${\displaystyle \epsilon }$-biased generator ${\displaystyle G:\{0,1\}^{\ell }\to \{0,1\}^{n}}$ is a function that maps strings of length ${\displaystyle \ell }$ to strings of length ${\displaystyle n}$ such that the multiset ${\displaystyle X_{G}=\{G(y)\;\vert \;y\in \{0,1\}^{\ell }\}}$ is an ${\displaystyle \epsilon }$-biased set. The seed length of the generator is the number ${\displaystyle \ell }$ and is related to the size of the ${\displaystyle \epsilon }$-biased set ${\displaystyle X_{G}}$ via the equation ${\displaystyle s=2^{\ell }}$.

## Connection with epsilon-balanced error-correcting codes

A linear code ${\displaystyle C:\{0,1\}^{n}\to \{0,1\}^{s}}$ of message length ${\displaystyle n}$ and block length ${\displaystyle s}$ is called ${\displaystyle \epsilon }$-balanced if the relative Hamming weight of every non-zero codeword lies in the interval ${\displaystyle [{\tfrac {1-\epsilon }{2}},{\tfrac {1+\epsilon }{2}}]}$. Then it holds that a multiset ${\displaystyle X\subset \{0,1\}^{n}}$ of size ${\displaystyle s}$ is ${\displaystyle \epsilon }$-biased if and only if the linear code ${\displaystyle C_{X}}$, whose generator matrix has exactly the elements of ${\displaystyle X}$ as its columns, is ${\displaystyle \epsilon }$-balanced.[2]
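The equivalence can be seen concretely for a tiny multiset: the maximal deviation of a non-zero codeword's relative weight from ${\displaystyle 1/2}$, doubled, equals the maximal bias of the set. A sketch under the generator-matrix convention above (the example set is chosen purely for illustration):

```python
from itertools import product

# Multiset X; the code C_X maps a message m in {0,1}^n to the s inner
# products <m, x> over F_2, one per element x of X (the elements of X are
# the columns of the generator matrix).
X = [(0, 0), (0, 1), (1, 0)]
n, s = 2, len(X)

def codeword(m):
    return [sum(mi * xi for mi, xi in zip(m, x)) % 2 for x in X]

# epsilon-balanced check: for every non-zero message, measure how far the
# relative Hamming weight of its codeword is from 1/2, scaled to a bias.
eps = max(abs(2 * sum(codeword(m)) / s - 1)
          for m in product([0, 1], repeat=n) if any(m))
print(eps)  # 0.333..., matching the maximum bias of X
```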

## Constructions of small epsilon-biased sets

Usually the goal is to find ${\displaystyle \epsilon }$-biased sets that have a small size ${\displaystyle s}$ relative to the parameters ${\displaystyle n}$ and ${\displaystyle \epsilon }$. This is because a smaller size ${\displaystyle s}$ means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.

### Theoretical bounds

The probabilistic method gives a non-explicit construction that achieves size ${\displaystyle s=O(n/\epsilon ^{2})}$.[2] The construction is non-explicit in the sense that finding the ${\displaystyle \epsilon }$-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness. However, this non-explicit construction is useful because it shows that such efficient codes exist. On the other hand, the best known lower bound for the size of ${\displaystyle \epsilon }$-biased sets is ${\displaystyle s=\Omega (n/(\epsilon ^{2}\log(1/\epsilon )))}$; that is, in order for a set to be ${\displaystyle \epsilon }$-biased, it must have at least that size.[2]
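The existence argument can be simulated by drawing a random multiset of size proportional to ${\displaystyle n/\epsilon ^{2}}$ and measuring its worst-case bias by brute force. The constant 4 and the parameters below are arbitrary illustrative choices, not from the source:

```python
import random

random.seed(0)  # fixed seed so the experiment is reproducible
n, eps = 8, 0.5
s = int(4 * n / eps ** 2)  # size O(n / eps^2); the constant 4 is ad hoc

# Draw a random multiset of s strings from {0,1}^n.
X = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(s)]

# Worst-case bias over all 2^n - 1 non-empty index sets.
worst = 0.0
for mask in range(1, 2 ** n):
    p0 = sum(1 for x in X
             if sum(x[i] for i in range(n) if mask >> i & 1) % 2 == 0) / s
    worst = max(worst, abs(2 * p0 - 1))
print(worst)  # typically well below eps = 0.5
```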

### Explicit constructions

There are many explicit, i.e., deterministic, constructions of ${\displaystyle \epsilon }$-biased sets with various parameter settings. These constructions are mutually incomparable: none of them yields the smallest ${\displaystyle \epsilon }$-biased sets for all settings of ${\displaystyle \epsilon }$ and ${\displaystyle n}$.

## Application: almost k-wise independence

An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.

### k-wise independent spaces

A random variable ${\displaystyle Y}$ over ${\displaystyle \{0,1\}^{n}}$ is a k-wise independent space if, for all index sets ${\displaystyle I\subseteq \{1,\dots ,n\}}$ of size ${\displaystyle k}$, the marginal distribution ${\displaystyle Y|_{I}}$ is exactly equal to the uniform distribution over ${\displaystyle \{0,1\}^{k}}$. That is, for all such ${\displaystyle I}$ and all strings ${\displaystyle z\in \{0,1\}^{k}}$, the distribution ${\displaystyle Y}$ satisfies ${\displaystyle \Pr _{Y}(Y|_{I}=z)=2^{-k}}$.
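The definition can be checked exhaustively for small spaces. A standard toy example is the XOR space ${\displaystyle (a,b,a\oplus b)}$ for uniform ${\displaystyle (a,b)}$, which is 2-wise but not 3-wise independent; the checker below is an illustrative sketch:

```python
from itertools import product, combinations
from collections import Counter

# The XOR space over {0,1}^3: the third bit is the parity of the first two.
space = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]

def is_k_wise_independent(space, n, k):
    for I in combinations(range(n), k):
        counts = Counter(tuple(x[i] for i in I) for x in space)
        # every z in {0,1}^k must appear with probability exactly 2^-k
        if any(counts[z] / len(space) != 2 ** -k
               for z in product([0, 1], repeat=k)):
            return False
    return True

print(is_k_wise_independent(space, 3, 2))  # True
print(is_k_wise_independent(space, 3, 3))  # False: the third bit is determined
```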

#### Constructions and bounds

k-wise independent spaces are fairly well-understood.

#### Joffe's construction

Joffe (1974) constructs a ${\displaystyle k}$-wise independent space ${\displaystyle Y}$ over the finite field ${\displaystyle \mathbb {F} _{n}}$ with some prime number ${\displaystyle n>k}$ of elements, i.e., ${\displaystyle Y}$ is a distribution over ${\displaystyle \mathbb {F} _{n}^{n}}$. The initial ${\displaystyle k}$ marginals of the distribution are drawn independently and uniformly at random:

${\displaystyle (Y_{0},\dots ,Y_{k-1})\sim \mathbb {F} _{n}^{k}}$.

For each ${\displaystyle i}$ with ${\displaystyle k\leq i<n}$, the marginal distribution of ${\displaystyle Y_{i}}$ is then defined as

${\displaystyle Y_{i}=Y_{0}+Y_{1}\cdot i+Y_{2}\cdot i^{2}+\dots +Y_{k-1}\cdot i^{k-1}\,,}$

where the calculation is done in ${\displaystyle \mathbb {F} _{n}}$. Joffe (1974) proves that the distribution ${\displaystyle Y}$ constructed in this way is ${\displaystyle k}$-wise independent as a distribution over ${\displaystyle \mathbb {F} _{n}^{n}}$. The distribution ${\displaystyle Y}$ is uniform on its support, and hence, the support of ${\displaystyle Y}$ forms a ${\displaystyle k}$-wise independent set. It contains all ${\displaystyle n^{k}}$ strings in ${\displaystyle \mathbb {F} _{n}^{k}}$ that have been extended to strings of length ${\displaystyle n}$ using the deterministic rule above.
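Joffe's construction is easy to implement and verify by brute force for tiny parameters. The sketch below (the function name `joffe_space` is illustrative) builds the space for a hypothetical choice of ${\displaystyle n=5}$, ${\displaystyle k=2}$ and checks that every pair of coordinates is uniform over ${\displaystyle \mathbb {F} _{5}^{2}}$:

```python
from itertools import product, combinations
from collections import Counter

def joffe_space(p, k):
    """Joffe's k-wise independent space over F_p^p: the seed (Y_0..Y_{k-1})
    is uniform, and Y_i for k <= i < p is the degree-(k-1) polynomial
    Y_0 + Y_1*i + ... + Y_{k-1}*i^{k-1} evaluated mod p."""
    space = []
    for seed in product(range(p), repeat=k):
        ext = [sum(c * pow(i, j, p) for j, c in enumerate(seed)) % p
               for i in range(k, p)]
        space.append(tuple(seed) + tuple(ext))
    return space

# p = 5, k = 2: 25 strings of length 5 over F_5.  Every pair of coordinates
# should take each value in F_5 x F_5 exactly once across the 25 samples.
space = joffe_space(5, 2)
ok = all(Counter(tuple(x[i] for i in I) for x in space)
         == Counter(product(range(5), repeat=2))
         for I in combinations(range(5), 2))
print(ok)  # True
```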

### Almost k-wise independent spaces

A random variable ${\displaystyle Y}$ over ${\displaystyle \{0,1\}^{n}}$ is a ${\displaystyle \delta }$-almost k-wise independent space if, for all index sets ${\displaystyle I\subseteq \{1,\dots ,n\}}$ of size ${\displaystyle k}$, the restricted distribution ${\displaystyle Y|_{I}}$ and the uniform distribution ${\displaystyle U_{k}}$ on ${\displaystyle \{0,1\}^{k}}$ are ${\displaystyle \delta }$-close in 1-norm, i.e., ${\displaystyle {\Big \|}Y|_{I}-U_{k}{\Big \|}_{1}\leq \delta }$.
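The 1-norm condition can be evaluated directly for small spaces. In the sketch below (the helper name is illustrative), a three-element sample space over ${\displaystyle \{0,1\}^{2}}$ that never outputs ${\displaystyle (1,1)}$ turns out to be at distance ${\displaystyle 1/2}$ from uniform on the full index set:

```python
from itertools import product
from collections import Counter

def l1_distance_to_uniform(space, I):
    """L1 distance between the marginal of `space` on index set I and the
    uniform distribution on {0,1}^len(I)."""
    k = len(I)
    counts = Counter(tuple(x[i] for i in I) for x in space)
    return sum(abs(counts[z] / len(space) - 2 ** -k)
               for z in product([0, 1], repeat=k))

# Three samples, each with probability 1/3; the string (1,1) never occurs.
X = [(0, 0), (0, 1), (1, 0)]
print(l1_distance_to_uniform(X, (0, 1)))  # 3*|1/3 - 1/4| + |0 - 1/4| = 0.5
```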

#### Constructions

Naor &amp; Naor (1993) give a general framework for combining small k-wise independent spaces with small ${\displaystyle \epsilon }$-biased spaces to obtain ${\displaystyle \delta }$-almost k-wise independent spaces of even smaller size. In particular, let ${\displaystyle G_{1}:\{0,1\}^{h}\to \{0,1\}^{n}}$ be a linear mapping that generates a k-wise independent space and let ${\displaystyle G_{2}:\{0,1\}^{\ell }\to \{0,1\}^{h}}$ be a generator of an ${\displaystyle \epsilon }$-biased set over ${\displaystyle \{0,1\}^{h}}$. That is, when given a uniformly random input, the output of ${\displaystyle G_{1}}$ is a k-wise independent space, and the output of ${\displaystyle G_{2}}$ is ${\displaystyle \epsilon }$-biased. Then ${\displaystyle G:\{0,1\}^{\ell }\to \{0,1\}^{n}}$ with ${\displaystyle G(x)=G_{1}(G_{2}(x))}$ is a generator of a ${\displaystyle \delta }$-almost ${\displaystyle k}$-wise independent space, where ${\displaystyle \delta =2^{k/2}\epsilon }$.[3]
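The composition can be sanity-checked at the distribution level with tiny, hypothetical parameters: take the linear XOR map ${\displaystyle (x_{1},x_{2})\mapsto (x_{1},x_{2},x_{1}\oplus x_{2})}$ as the 2-wise independent ${\displaystyle G_{1}}$ and a 1/3-biased three-element multiset in place of the output of ${\displaystyle G_{2}}$, then verify that every 2-coordinate marginal of the composed space is within ${\displaystyle \delta =2^{k/2}\epsilon }$ of uniform:

```python
from itertools import product, combinations
from collections import Counter

# G1: linear 2-wise independent map {0,1}^2 -> {0,1}^3 (the XOR space).
def g1(x):
    return (x[0], x[1], x[0] ^ x[1])

# A 1/3-biased sample space over {0,1}^2, standing in for the output of G2.
X = [(0, 0), (0, 1), (1, 0)]

# Composed space: feed the small-bias samples through G1.
space = [g1(x) for x in X]

k, eps = 2, 1 / 3
bound = 2 ** (k / 2) * eps  # delta = 2^(k/2) * eps = 2/3

def dist(space, I):
    counts = Counter(tuple(y[i] for i in I) for y in space)
    return sum(abs(counts[z] / len(space) - 2 ** -len(I))
               for z in product([0, 1], repeat=len(I)))

# Worst L1 distance to uniform over all size-k marginals of the composed space.
delta = max(dist(space, I) for I in combinations(range(3), k))
print(delta <= bound)  # True: the actual delta is 1/2 <= 2/3
```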

## Notes

1. cf., e.g., Template:Harvtxt
2. cf., e.g., p. 2 of Template:Harvtxt
3. Section 4 in Template:Harvtxt
