# Fisher's exact test

Fisher's exact test is a statistical significance test used in the analysis of contingency tables. Although in practice it is employed when sample sizes are small, it is valid for all sample sizes. It is named after its inventor, Sir R. A. Fisher, and is one of a class of exact tests, so called because the significance of the deviation from a null hypothesis (e.g., P-value) can be calculated exactly, rather than relying on an approximation that becomes exact in the limit as the sample size grows to infinity, as with many statistical tests. Fisher is said to have devised the test following a comment from Dr Muriel Bristol, who claimed to be able to detect whether the tea or the milk was added first to her cup; see lady tasting tea.

## Purpose and scope

The test is useful for categorical data that result from classifying objects in two different ways; it is used to examine the significance of the association (contingency) between the two kinds of classification. So in Fisher's original example, one criterion of classification could be whether milk or tea was put in the cup first; the other could be whether Dr Bristol thinks that the milk or tea was put in first. We want to know whether these two classifications are associated – that is, whether Dr Bristol really can tell whether milk or tea was poured in first. Most uses of the Fisher test involve, like this example, a 2 × 2 contingency table. The p-value from the test is computed as if the margins of the table are fixed, i.e. as if, in the tea-tasting example, Dr Bristol knows the number of cups with each treatment (milk or tea first) and will therefore provide guesses with the correct number in each category. As pointed out by Fisher, this leads under a null hypothesis of independence to a hypergeometric distribution of the numbers in the cells of the table.

With large samples, a chi-squared test can be used in this situation. However, the significance value it provides is only an approximation, because the sampling distribution of the test statistic that is calculated is only approximately equal to the theoretical chi-squared distribution. The approximation is inadequate when sample sizes are small, or the data are very unequally distributed among the cells of the table, resulting in the cell counts predicted on the null hypothesis (the "expected values") being low. The usual rule of thumb for deciding whether the chi-squared approximation is good enough is that the chi-squared test is not suitable when the expected values in any of the cells of a contingency table are below 5, or below 10 when there is only one degree of freedom (this rule is now known to be overly conservative). In fact, for small, sparse, or unbalanced data, the exact and asymptotic p-values can be quite different and may lead to opposite conclusions concerning the hypothesis of interest. In contrast the Fisher test is, as its name states, exact as long as the experimental procedure keeps the row and column totals fixed, and it can therefore be used regardless of the sample characteristics. It becomes difficult to calculate with large samples or well-balanced tables, but fortunately these are exactly the conditions where the chi-squared test is appropriate.
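The rule of thumb above can be checked mechanically. The following is a minimal sketch, not a definitive procedure: the function names and both tables of counts are illustrative assumptions, and the thresholds (5 in general, 10 with one degree of freedom) are simply the ones quoted in the paragraph above.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_rule_ok(table):
    """Apply the rule of thumb: every expected count must be at least 5,
    or at least 10 when the table has only one degree of freedom."""
    rows, cols = len(table), len(table[0])
    degrees_of_freedom = (rows - 1) * (cols - 1)
    threshold = 10 if degrees_of_freedom == 1 else 5
    return all(e >= threshold for row in expected_counts(table) for e in row)


# Illustrative counts only (assumed for this sketch).
small = [[1, 9], [11, 3]]
large = [[40, 60], [55, 45]]

print(expected_counts(small))      # [[5.0, 5.0], [7.0, 7.0]]
print(chi_squared_rule_ok(small))  # False: expected counts fall below 10
print(chi_squared_rule_ok(large))  # True
```

Since the rule is described as overly conservative, a `False` result here suggests preferring the exact test rather than proving the chi-squared approximation wrong.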

For hand calculations, the test is only feasible in the case of a 2 × 2 contingency table. However, the principle of the test can be extended to the general case of an m × n table, and some statistical packages provide a calculation (sometimes using a Monte Carlo method to obtain an approximation) for the more general case.
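To give a feel for what a Monte Carlo approximation for an m × n table might look like, here is a rough sketch. It is not how any particular statistical package implements the calculation. It assumes one common convention, namely that a simulated table counts as "at least as extreme" when its probability under fixed margins is no larger than that of the observed table. The function names, the sample size, the seed and the 3 × 3 counts are all hypothetical.

```python
import random
from math import factorial


def factorial_product(table):
    """Product of the factorials of all cells.

    With both sets of margins fixed, a table's probability is a constant
    divided by this product, so a larger product means a less probable table.
    """
    product = 1
    for row in table:
        for cell in row:
            product *= factorial(cell)
    return product


def monte_carlo_p_value(table, n_samples=20000, seed=0):
    """Estimate the p-value by shuffling column labels (margins stay fixed)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # One (row label, column label) pair per observation, in matching order.
    row_labels = [i for i, row in enumerate(table) for cell in row for _ in range(cell)]
    col_labels = [j for row in table for j, cell in enumerate(row) for _ in range(cell)]
    observed = factorial_product(table)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(col_labels)  # random re-pairing under the null hypothesis
        simulated = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            simulated[i][j] += 1
        if factorial_product(simulated) >= observed:
            hits += 1
    return hits / n_samples


# Hypothetical 3 x 3 table of counts, used only to exercise the sketch.
print(monte_carlo_p_value([[3, 1, 0], [1, 3, 1], [0, 1, 4]]))
```

Comparing integer factorial products rather than floating-point probabilities keeps ties exact. The estimate itself still carries sampling error that shrinks as `n_samples` grows.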

## Example

For example, a sample of teenagers might be divided into male and female on the one hand, and those who are and are not currently dieting on the other. We hypothesize that the proportion of dieting individuals is higher among the women than among the men, and we want to test whether any difference in proportions that we observe is significant. The data might look like this:

|              | Men | Women | Row total |
|--------------|-----|-------|-----------|
| Dieting      | 1   | 9     | 10        |
| Non-dieting  | 11  | 3     | 14        |
| Column total | 12  | 12    | 24        |

The question we ask about these data is: knowing that 10 of these 24 teenagers are dieters, and that 12 of the 24 are female, and assuming the null hypothesis that men and women are equally likely to diet, what is the probability that these 10 dieters would be so unevenly distributed between the women and the men? If we were to choose 10 of the teenagers at random, what is the probability that 9 or more of them would be among the 12 women, and only 1 or fewer from among the 12 men?

Before we proceed with the Fisher test, we first introduce some notation. We represent the cells by the letters a, b, c and d, call the totals across rows and columns marginal totals, and represent the grand total by n. So the table now looks like this:

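|              | Men   | Women | Row total         |
|--------------|-------|-------|-------------------|
| Dieting      | a     | b     | a + b             |
| Non-dieting  | c     | d     | c + d             |
| Column total | a + c | b + d | a + b + c + d (=n) |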

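Fisher showed that the probability of obtaining any such set of values was given by the hypergeometric distribution:

$$p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$

where ${\tbinom {n}{k}}$ is the binomial coefficient and the symbol ! indicates the factorial operator. With the data above, this gives:

$$p = \frac{\binom{10}{1}\binom{14}{11}}{\binom{24}{12}} = \frac{10!\,14!\,12!\,12!}{1!\,9!\,11!\,3!\,24!} \approx 0.001346076$$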

The formula above gives the exact hypergeometric probability of observing this particular arrangement of the data, assuming the given marginal totals, on the null hypothesis that men and women are equally likely to be dieters. To put it another way, if we assume that the probability that a man is a dieter is p, that the probability that a woman is a dieter is also p, and that both men and women enter our sample independently of whether or not they are dieters, then this hypergeometric formula gives the conditional probability of observing the values a, b, c, d in the four cells, conditional on the observed marginals (i.e., assuming the row and column totals shown in the margins of the table are given). This remains true even if men enter our sample with different probabilities than women. The requirement is merely that the two classification characteristics, gender and dieter (or not), are not associated.
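The hypergeometric probability of a single table, and the claim that it does not depend on how likely men or dieters are to enter the sample, can both be checked numerically. The sketch below assumes the cell layout used in this example (a = male dieters, b = female dieters, c = male non-dieters, d = female non-dieters). The function names and the parameters `p_male` and `p_diet` are hypothetical and chosen only for illustration.

```python
from math import comb, factorial


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table given its margins:
    C(a+b, a) * C(c+d, c) / C(n, a+c)."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def multinomial_probability(a, b, c, d, p_male, p_diet):
    """Unconditional probability of the table when sex and dieting status
    are independent with the given (assumed) probabilities."""
    n = a + b + c + d
    coefficient = factorial(n) // (
        factorial(a) * factorial(b) * factorial(c) * factorial(d)
    )
    q_male, q_diet = 1 - p_male, 1 - p_diet
    return (coefficient
            * (p_male * p_diet) ** a * (q_male * p_diet) ** b
            * (p_male * q_diet) ** c * (q_male * q_diet) ** d)


def conditional_probability(a, b, c, d, p_male, p_diet):
    """Probability of the table conditional on its row and column totals."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = 0.0
    # Sum over every table sharing the observed margins.
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        total += multinomial_probability(
            x, row1 - x, col1 - x, n - row1 - col1 + x, p_male, p_diet
        )
    return multinomial_probability(a, b, c, d, p_male, p_diet) / total


print(round(table_probability(1, 9, 11, 3), 9))  # 0.001346076
# Arbitrary sampling probabilities give the same conditional value
# (up to floating-point rounding).
print(conditional_probability(1, 9, 11, 3, p_male=0.3, p_diet=0.6))
print(conditional_probability(1, 9, 11, 3, p_male=0.5, p_diet=0.2))
```

The agreement arises because, for fixed margins, the factor built from `p_male` and `p_diet` is the same for every table and cancels in the ratio.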

For example, suppose we knew probabilities P, Q, p, q with P + Q = p + q = 1 such that (male dieter, male non-dieter, female dieter, female non-dieter) had respective probabilities (Pp, Pq, Qp, Qq) for each individual encountered under our sampling procedure. Then, were we to calculate the distribution of the cell entries conditional on the marginals, we would still obtain the above formula, in which neither p nor P occurs. Thus, we can calculate the exact probability of any arrangement of the 24 teenagers into the four cells of the table, but Fisher showed that to generate a significance level, we need consider only the cases where the marginal totals are the same as in the observed table, and among those, only the cases where the arrangement is as extreme as the observed arrangement, or more so. (Barnard's test relaxes this constraint on one set of the marginal totals.) In the example, there are 11 tables with the same marginal totals as the observed one. Of these only one is more extreme in the same direction as our data; it looks like this:

|              | Men | Women | Row total |
|--------------|-----|-------|-----------|
| Dieting      | 0   | 10    | 10        |
| Non-dieting  | 12  | 2     | 14        |
| Column total | 12  | 12    | 24        |

For this table (with extremely unequal dieting proportions) the probability is $p = {\tbinom {10}{0}}{\tbinom {14}{12}} / {\tbinom {24}{12}} \approx 0.000033652$.
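The enumeration described above can be sketched in a few lines. The code lists every 2 × 2 table that shares the observed margins, confirms there are 11 of them, and adds up the probabilities of the observed table and the one more extreme table in the same direction. Treating that sum as a one-sided significance level is an assumption of this sketch, and the function names are illustrative.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def tables_with_same_margins(a, b, c, d):
    """All (a, b, c, d) tables sharing the row and column totals of the input."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tables = []
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        tables.append((x, row1 - x, col1 - x, n - row1 - col1 + x))
    return tables


tables = tables_with_same_margins(1, 9, 11, 3)
probabilities = {t: table_probability(*t) for t in tables}

print(len(tables))                             # 11
print(round(sum(probabilities.values()), 12))  # 1.0

# Observed table (a = 1) plus the more extreme table in the same direction (a = 0).
one_sided = sum(p for t, p in probabilities.items() if t[0] <= 1)
print(round(one_sided, 9))                     # 0.001379728
```

The first cell alone identifies each table, because the fixed margins determine the other three cells.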