# Binomial distribution

*Figure: the binomial distribution for ${\displaystyle p=0.5}$, with n and k as in Pascal's triangle.*

The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is ${\displaystyle 70/256}$.

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation, and widely used.

## Specification

### Probability mass function

In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function:

${\displaystyle f(k;n,p)=\Pr(X=k)={n \choose k}p^{k}(1-p)^{n-k}}$

for k = 0, 1, 2, ..., n, where

${\displaystyle {n \choose k}={\frac {n!}{k!(n-k)!}}}$

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: we want exactly k successes (probability ${\displaystyle p^{k}}$) and n − k failures (probability ${\displaystyle (1-p)^{n-k}}$). However, the k successes can occur anywhere among the n trials, and there are ${\displaystyle {n \choose k}}$ different ways of distributing k successes in a sequence of n trials.

Reference tables for binomial probabilities are usually filled in only up to n/2, because for k > n/2 the probability can be calculated from its complement as

${\displaystyle f(k,n,p)=f(n-k,n,1-p).}$
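As a quick numerical sketch (the helper name `pmf` is illustrative, not from any particular library), the mass function and the complement identity above can be checked directly:

```python
from math import comb

def pmf(k, n, p):
    """Binomial probability mass function Pr(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Galton box example from the introduction: n = 8, k = 4 gives 70/256.
assert abs(pmf(4, 8, 0.5) - 70 / 256) < 1e-12

# Complement identity: f(k, n, p) = f(n - k, n, 1 - p).
n, p = 10, 0.3
for k in range(n + 1):
    assert abs(pmf(k, n, p) - pmf(n - k, n, 1 - p)) < 1e-12
```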

Looking at the expression f(k; n, p) as a function of k, there is a value of k that maximizes it. This value can be found by calculating

${\displaystyle {\frac {f(k+1,n,p)}{f(k,n,p)}}={\frac {(n-k)p}{(k+1)(1-p)}}}$

and comparing it to 1. There is always an integer M that satisfies

${\displaystyle (n+1)p-1\leq M<(n+1)p.}$

f(k; n, p) is monotone increasing for k < M and monotone decreasing for k > M, except when (n + 1)p is an integer, in which case f is maximal at the two values (n + 1)p and (n + 1)p − 1. M is the most probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of the mode occurring can still be fairly small.

### Cumulative distribution function

The cumulative distribution function can be expressed as:

${\displaystyle F(k;n,p)=\Pr(X\leq k)=\sum _{i=0}^{\lfloor k\rfloor }{n \choose i}p^{i}(1-p)^{n-i}}$

where ${\displaystyle \scriptstyle \lfloor k\rfloor \,}$ is the "floor" under k, i.e. the greatest integer less than or equal to k.

It can also be represented in terms of the regularized incomplete beta function, as follows:[1]

${\displaystyle {\begin{aligned}F(k;n,p)&=\Pr(X\leq k)\\&=I_{1-p}(n-k,k+1)\\&=(n-k){n \choose k}\int _{0}^{1-p}t^{n-k-1}(1-t)^{k}\,dt.\end{aligned}}}$

Some closed-form bounds for the cumulative distribution function are given below.

## Example

Suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of achieving 0, 1,..., 6 heads after six tosses?

${\displaystyle \Pr(0{\text{ heads}})=f(0)=\Pr(X=0)={6 \choose 0}0.3^{0}(1-0.3)^{6-0}\approx 0.1176}$
${\displaystyle \Pr(1{\text{ head }})=f(1)=\Pr(X=1)={6 \choose 1}0.3^{1}(1-0.3)^{6-1}\approx 0.3025}$
${\displaystyle \Pr(2{\text{ heads}})=f(2)=\Pr(X=2)={6 \choose 2}0.3^{2}(1-0.3)^{6-2}\approx 0.3241}$
${\displaystyle \Pr(3{\text{ heads}})=f(3)=\Pr(X=3)={6 \choose 3}0.3^{3}(1-0.3)^{6-3}\approx 0.1852}$
${\displaystyle \Pr(4{\text{ heads}})=f(4)=\Pr(X=4)={6 \choose 4}0.3^{4}(1-0.3)^{6-4}\approx 0.0595}$
${\displaystyle \Pr(5{\text{ heads}})=f(5)=\Pr(X=5)={6 \choose 5}0.3^{5}(1-0.3)^{6-5}\approx 0.0102}$
${\displaystyle \Pr(6{\text{ heads}})=f(6)=\Pr(X=6)={6 \choose 6}0.3^{6}(1-0.3)^{6-6}\approx 0.0007}$[2]
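The table above can be reproduced with a short script; the function name `pmf` is a local helper, not a library API:

```python
from math import comb

def pmf(k, n, p):
    # Pr(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Biased coin: heads with probability 0.3, six tosses.
probs = [pmf(k, 6, 0.3) for k in range(7)]
print([round(q, 4) for q in probs])
# → [0.1176, 0.3025, 0.3241, 0.1852, 0.0595, 0.0102, 0.0007]
```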

## Mean and variance

If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the probability of each experiment yielding a successful result, then the expected value of X is

${\displaystyle \operatorname {E} [X]=np,}$

(For example, if n = 100 and p = 1/4, then the average number of successes is 25.)

and the variance

${\displaystyle \operatorname {Var} [X]=np(1-p).}$
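A minimal check of these formulas by direct summation over the probability mass function (pure Python, no external libraries):

```python
from math import comb

def pmf(k, n, p):
    # Pr(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.25
mean = sum(k * pmf(k, n, p) for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf(k, n, p) for k in range(n + 1))
# Matches E[X] = np = 25 and Var[X] = np(1 - p) = 18.75 up to rounding.
assert abs(mean - 25) < 1e-9 and abs(var - 18.75) < 1e-8
```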

## Mode and median

Usually the mode of a binomial B(n, p) distribution is equal to ${\displaystyle \lfloor (n+1)p\rfloor }$, where ${\displaystyle \lfloor \cdot \rfloor }$ is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p equals 0 or 1, the mode is 0 or n, respectively. These cases can be summarized as follows:

${\displaystyle {\text{mode}}={\begin{cases}\lfloor (n+1)\,p\rfloor &{\text{if }}(n+1)p{\text{ is 0 or a noninteger}},\\(n+1)\,p\ {\text{ and }}\ (n+1)\,p-1&{\text{if }}(n+1)p\in \{1,\dots ,n\},\\n&{\text{if }}(n+1)p=n+1.\end{cases}}}$

In general, there is no single formula for the median of a binomial distribution, and it may even be non-unique. However, several special results have been established:

• If np is an integer, then the mean, median, and mode coincide and equal np.[3][4]
• Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.[5]
• A median m cannot lie too far away from the mean: |m − np| ≤ min{ln 2, max{p, 1 − p}}.[6]
• The median is unique and equal to m = round(np) in cases when either p ≤ 1 − ln 2 or p ≥ ln 2 or |m − np| ≤ min{p, 1 − p} (except for the case when p = ½ and n is odd).[5][6]
• When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
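These facts can be spot-checked numerically; `pmf` is a local helper, and the example parameters are chosen so that (n + 1)p is not an integer:

```python
from math import comb, floor

def pmf(k, n, p):
    # Pr(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 11, 0.3                      # (n + 1)p = 3.6, np = 3.3
probs = [pmf(k, n, p) for k in range(n + 1)]

# Mode: floor((n + 1)p) in the usual (non-tie) case.
assert max(range(n + 1), key=probs.__getitem__) == floor((n + 1) * p) == 3

# Median: smallest k with F(k) >= 1/2; here it equals round(np).
cdf = 0.0
for k in range(n + 1):
    cdf += probs[k]
    if cdf >= 0.5:
        break
assert k == round(n * p) == 3
```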

## Covariance between two binomials

If two binomially distributed random variables X and Y are observed together, estimating their covariance can be useful. Using the definition of covariance, in the case n = 1 (thus being Bernoulli trials) we have

${\displaystyle \operatorname {Cov} (X,Y)=\operatorname {E} (XY)-\mu _{X}\mu _{Y}.}$

The first term is non-zero only when both X and Y are one, and μX and μY are equal to the respective success probabilities. Defining pB as the probability of both happening at the same time, this gives

${\displaystyle \operatorname {Cov} (X,Y)=p_{B}-p_{X}p_{Y},}$

and for n independent pairwise trials

${\displaystyle \operatorname {Cov} (X,Y)_{n}=n(p_{B}-p_{X}p_{Y}).}$

If X and Y are the same variable, this reduces to the variance formula given above.

## Related distributions

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables with the same probability p, then X + Y is again a binomial variable; its distribution is

${\displaystyle X+Y\sim B(n+m,p).}$

### Conditional binomials

If X ~ B(n, p) and, conditional on X, Y ~ B(X, q), then Y is a simple binomial variable with distribution

${\displaystyle Y\sim B(n,pq).}$

For example, imagine throwing n balls into a basket UX, then taking the balls that hit and throwing them at another basket UY. If p is the probability that a ball hits UX, then X ~ B(n, p) is the number of balls that hit UX. If q is the probability that one of these balls then hits UY, the number of balls that hit UY is Y ~ B(X, q), and therefore Y ~ B(n, pq).
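The identity Y ~ B(n, pq) can be verified by marginalizing over X; this is a numerical sketch with arbitrarily chosen parameters:

```python
from math import comb

def pmf(k, n, p):
    # Pr(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, q = 9, 0.4, 0.6
for y in range(n + 1):
    # Pr(Y = y) = sum over x of Pr(X = x) * Pr(Y = y | X = x)
    marginal = sum(pmf(x, n, p) * pmf(y, x, q) for x in range(y, n + 1))
    assert abs(marginal - pmf(y, n, p * q)) < 1e-12
```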

### Normal approximation

*Figure: binomial probability mass function and normal probability density approximation for n = 6 and p = 0.5.*

If n is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to B(np) is given by the normal distribution

${\displaystyle {\mathcal {N}}(np,\,np(1-p)),}$

and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as n increases (at least 20) and is better when p is not close to 0 or 1.[7] Various rules of thumb may be used to decide whether n is large enough, and p far enough from the extremes of zero or one:

• One rule is that both np and n(1 − p) must be greater than 5. However, the specific number varies from source to source and depends on how good an approximation one wants; some sources give 10, which yields virtually the same results as the following rule until n is very large (e.g., x = 11, n = 7752).
• A second rule[7] is that for n > 5 the normal approximation is adequate if
${\displaystyle \left|\left({\frac {1}{\sqrt {n}}}\right)\left({\sqrt {\frac {1-p}{p}}}-{\sqrt {\frac {p}{1-p}}}\right)\right|<0.3}$
• Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of the mean is within the range of possible values, that is, if
${\displaystyle \mu \pm 3\sigma =np\pm 3{\sqrt {np(1-p)}}\in [0,n].}$

The following is an example of applying a continuity correction. Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem, since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p.

This fact is the basis of a hypothesis test, a "proportion z-test", for the value of p using x/n, the sample proportion and estimator of p, in a common test statistic.[8] For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation ${\displaystyle \sigma ={\sqrt {p(1-p)/n}}}$.
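The continuity-correction example above can be checked numerically. This sketch uses only the standard library (`math.erf` for the normal CDF); n = 20, p = 0.5 are illustrative choices:

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    # Exact Pr(X <= k) for X ~ B(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def norm_cdf(x, mu, sigma):
    # CDF of N(mu, sigma^2) via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 20, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = binom_cdf(8, n, p)              # ≈ 0.2517
corrected = norm_cdf(8.5, mu, sigma)    # Pr(Y <= 8.5), with continuity correction
uncorrected = norm_cdf(8.0, mu, sigma)  # Pr(Y <= 8), no correction

# The corrected approximation is far closer to the exact value.
assert abs(corrected - exact) < abs(uncorrected - exact)
```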
Large sample sizes n are good because the standard deviation, as a proportion of the expected value, gets smaller, allowing a more precise estimate of the unknown parameter p.

### Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an approximation to B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.[9]

### Limiting distributions

As n approaches ∞ with p fixed, the standardized variable

${\displaystyle {\frac {X-np}{\sqrt {np(1-p)}}}}$

approaches the normal distribution with expected value 0 and variance 1. This result is sometimes loosely stated by saying that the distribution of X is asymptotically normal with expected value np and variance np(1 − p); it is a specific case of the central limit theorem.

### Beta distribution

Beta distributions provide a family of conjugate prior probability distributions for binomial distributions in Bayesian inference. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p:[10]

${\displaystyle P(p;\alpha ,\beta )={\frac {p^{\alpha -1}(1-p)^{\beta -1}}{\mathrm {B} (\alpha ,\beta )}}}$.
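Conjugacy means the posterior is again a beta distribution with a simple parameter update: a Beta(α, β) prior combined with k successes in n trials yields a Beta(α + k, β + n − k) posterior. A minimal sketch (the function name is illustrative):

```python
def beta_update(alpha, beta, k, n):
    """Posterior Beta parameters after observing k successes in n trials."""
    return alpha + k, beta + (n - k)

# Uniform prior Beta(1, 1), then 7 successes in 10 trials.
a, b = beta_update(1, 1, k=7, n=10)
posterior_mean = a / (a + b)      # (1 + 7) / (1 + 1 + 10) = 8/12 ≈ 0.667
```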

## Confidence intervals


Even for quite large values of n, the actual distribution of the mean is significantly nonnormal.[11] Because of this problem several methods to estimate confidence intervals have been proposed.

Let n1 be the number of successes out of n, the total number of trials, and let

${\displaystyle {\hat {p}}={\frac {n_{1}}{n}}}$

be the proportion of successes. Let zα/2 be the 100(1 − α/2)th percentile of the standard normal distribution.

• Wald method
${\displaystyle {\hat {p}}\pm z_{\frac {\alpha }{2}}{\sqrt {\frac {{\hat {p}}(1-{\hat {p}})}{n}}}.}$
A continuity correction of 0.5/n may be added.
• Agresti-Coull method[12]
${\displaystyle {\tilde {p}}\pm z_{\frac {\alpha }{2}}{\sqrt {\frac {{\tilde {p}}(1-{\tilde {p}})}{n+z_{\frac {\alpha }{2}}^{2}}}}.}$
Here the estimate of p is modified to
${\displaystyle {\tilde {p}}={\frac {n_{1}+{\frac {1}{2}}z_{\frac {\alpha }{2}}^{2}}{n+z_{\frac {\alpha }{2}}^{2}}}}$
• Arcsine method[13]
${\displaystyle \sin ^{2}\left(\arcsin \left({\sqrt {\hat {p}}}\right)\pm {\frac {z}{2{\sqrt {n}}}}\right)}$
• Wilson (score) method[14]
${\displaystyle {\frac {{\hat {p}}+{\frac {1}{2n}}z_{1-{\frac {\alpha }{2}}}^{2}\pm {\frac {1}{2n}}z_{1-{\frac {\alpha }{2}}}{\sqrt {4n{\hat {p}}(1-{\hat {p}})+z_{1-{\frac {\alpha }{2}}}^{2}}}}{1+{\frac {1}{n}}z_{1-{\frac {\alpha }{2}}}^{2}}}.}$

The exact (Clopper–Pearson) method is the most conservative.[11] The Wald method, although commonly recommended in textbooks, is the most biased.
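The Wald and Wilson intervals above can be sketched directly; this also exposes a well-known Wald pathology: when n1 = 0 (or n1 = n), the Wald interval has zero width, while the Wilson interval remains informative. The function names are local helpers, and z below is the usual value for a two-sided 95% interval:

```python
from math import sqrt

Z = 1.959964                 # z_{alpha/2} for a two-sided 95% interval

def wald(n1, n, z=Z):
    ph = n1 / n
    half = z * sqrt(ph * (1 - ph) / n)
    return ph - half, ph + half

def wilson(n1, n, z=Z):
    ph = n1 / n
    denom = 1 + z * z / n
    center = (ph + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(ph * (1 - ph) / n + z * z / (4 * n * n))
    return center - half, center + half

assert wald(0, 20) == (0.0, 0.0)       # degenerate zero-width interval
assert wilson(0, 20)[1] > 0            # Wilson still gives a positive upper bound
```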

## Generating binomial random variates

Methods for random number generation where the marginal distribution is a binomial distribution are well-established.[15][16]

One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one calculates the probability P(X = k) for all values k from 0 through n. (These probabilities should sum to a value close to one, in order to encompass the entire sample space.) Then, using a pseudo-random number generator (such as a linear congruential generator) to produce samples uniform on [0, 1], one transforms each uniform sample into a discrete number by means of the cumulative probabilities calculated in the first step.
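A sketch of that inversion algorithm in pure Python (using the standard `random` module rather than a hand-rolled linear congruential generator):

```python
import random
from math import comb

def binomial_inverse(n, p, u):
    """Map a uniform sample u in [0, 1) to k by inverting the binomial CDF."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p)**(n - k)
        if u < cdf:
            return k
    return n  # guard against floating-point shortfall in the CDF sum

rng = random.Random(0)                  # seeded for reproducibility
samples = [binomial_inverse(10, 0.3, rng.random()) for _ in range(10_000)]
# The sample mean should be close to np = 3.
```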

## Tail bounds

For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's inequality yields the bound

${\displaystyle F(k;n,p)\leq \exp \left(-2{\frac {(np-k)^{2}}{n}}\right),\!}$

and Chernoff's inequality can be used to derive the bound

${\displaystyle F(k;n,p)\leq \exp \left(-{\frac {1}{2\,p}}{\frac {(np-k)^{2}}{n}}\right).\!}$

Moreover, these bounds are reasonably tight when p = 1/2, since the following expression holds for all k ≥ 3n/8:[17]

${\displaystyle F(k;n,{\tfrac {1}{2}})\geq {\frac {1}{15}}\exp \left(-{\frac {16({\frac {n}{2}}-k)^{2}}{n}}\right).\!}$

However, the bounds do not work well for extreme values of p. In particular, as p → 1, the value F(k; n, p) goes to zero (for fixed k, n with k < n) while the upper bound above goes to a positive constant. In this case a better bound is given by[18]

${\displaystyle F(k;n,p)\leq \exp \left(-nD\left({\frac {k}{n}}\left|\right|p\right)\right)\quad \quad {\mbox{if }}0<{\frac {k}{n}}<p.\!}$

where D(a|| p) is the relative entropy between an a-coin and a p-coin (i.e. between the Bernoulli(a) and Bernoulli(p) distribution):

${\displaystyle D(a||p)=a\log {\frac {a}{p}}+(1-a)\log {\frac {1-a}{1-p}}.\!}$

Asymptotically, this bound is reasonably tight; see [18] for details. An equivalent formulation of the bound is

${\displaystyle \Pr(X\geq k)=F(n-k;n,1-p)\leq \exp \left(-nD\left({\frac {k}{n}}\left|\right|p\right)\right)\quad \quad {\mbox{if }}p<{\frac {k}{n}}<1.\!}$

Both these bounds are derived directly from the Chernoff bound. It can also be shown that

${\displaystyle \Pr(X\geq k)=F(n-k;n,1-p)\geq {\frac {1}{(n+1)^{2}}}\exp \left(-nD\left({\frac {k}{n}}\left|\right|p\right)\right)\quad \quad {\mbox{if }}p<{\frac {k}{n}}<1.\!}$

This is proved using the method of types (see for example chapter 12 of Elements of Information Theory by Cover and Thomas [19]).
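The relative-entropy bound (and its validity condition k/n < p) can be spot-checked numerically; the parameter choices below are arbitrary:

```python
from math import comb, exp, log

def binom_cdf(k, n, p):
    # Exact Pr(X <= k) for X ~ B(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def kl(a, p):
    """Relative entropy D(a || p) between Bernoulli(a) and Bernoulli(p)."""
    return a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))

n, p, k = 50, 0.6, 20        # k/n = 0.4 < p = 0.6, so the bound applies
bound = exp(-n * kl(k / n, p))
assert binom_cdf(k, n, p) <= bound
```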


## References

1. (citation not preserved)
2. Hamilton Institute, "The Binomial Distribution", October 20, 2010.
3. (citation not preserved)
4. Lord, Nick (July 2010). "Binomial averages when the mean is an integer", The Mathematical Gazette 94, 331–332.
5. (citation not preserved)
6. (citation not preserved)
7. (citation not preserved)
8. NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?", e-Handbook of Statistical Methods.
9. NIST/SEMATECH, "6.3.3.1. Counts Control Charts", e-Handbook of Statistical Methods.
10. (citation not preserved)
11. (citation not preserved)
12. (citation not preserved)
13. (citation not preserved)
14. Devroye, Luc (1986). Non-Uniform Random Variate Generation, New York: Springer-Verlag. (See especially Chapter X, Discrete Univariate Distributions.)
15. (citation not preserved)
16. Matoušek, J. and Vondrák, J. The Probabilistic Method (lecture notes).
17. Arratia, R. and Gordon, L. "Tutorial on large deviations for the binomial distribution", Bulletin of Mathematical Biology 51(1) (1989), 125–131.
18. Cover, T. and Thomas, J. Elements of Information Theory, 2nd edition, Wiley, 2006.