| In [[statistics]], the '''odds ratio'''<ref>{{cite journal | last=Cornfield | first=J | journal=Journal of the National Cancer Institute | title=A Method for Estimating Comparative Rates from Clinical Data. Applications to Cancer of the Lung, Breast, and Cervix | volume=11 | pages=1269–1275 |year=1951|pmid=14861651 }}</ref><ref>{{cite journal | last=Mosteller | first=Frederick | year=1968 | volume=63 | title=Association and Estimation in Contingency Tables | pages=1–28 | journal=Journal of the American Statistical Association | doi=10.2307/2283825 | issue=321 | publisher=American Statistical Association | jstor=2283825}}</ref><ref>{{cite journal | last=Edwards | first=A.W.F. | year=1963 | journal=Journal of the Royal Statistical Society, Series A | volume=126 | pages=109–114 | title=The measure of association in a 2x2 table | doi=10.2307/2982448 | issue=1 | publisher=Blackwell Publishing | jstor=2982448}}</ref> (usually abbreviated ″OR″) is one of three main ways to quantify how strongly the presence or absence of property A is [[Association (statistics)|associated]] with the presence or absence of property B in a given [[Statistical population|population]]. If each individual in a [[Statistical population|population]] either does or does not have a property ″A″ (e.g. ″high blood pressure″), and also either does or does not have a property ″B″ (e.g. ″moderate alcohol consumption″), where both properties are appropriately defined, then a ratio can be formed which quantitatively describes the association between the presence/absence of ″A″ (high blood pressure) and the presence/absence of ″B″ (moderate alcohol consumption) for individuals in the population. This ratio is the odds ratio (OR) and can be computed following these steps:
| |
| | |
| * 1) For a given individual that has "B" compute the [[odds]] that the same individual has "A"
| |
| * 2) For a given individual that does not have "B" compute the odds that the same individual has "A"
| |
| * 3) Divide the odds from step 1 by the odds from step 2 to obtain the odds ratio (OR).
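The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name, argument names and the counts are hypothetical and do not come from any cited study.

```python
def odds_ratio_from_counts(a_and_b, b_only, a_only, neither):
    """Odds ratio for A given B versus A given not-B.

    a_and_b: individuals with both A and B
    b_only:  individuals with B but not A
    a_only:  individuals with A but not B
    neither: individuals with neither property
    (All argument names are illustrative assumptions.)
    """
    odds_a_given_b = a_and_b / b_only        # step 1: odds of A among those with B
    odds_a_given_not_b = a_only / neither    # step 2: odds of A among those without B
    return odds_a_given_b / odds_a_given_not_b  # step 3: ratio of the two odds


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example_or = odds_ratio_from_counts(a_and_b=30, b_only=70, a_only=10, neither=90)
print(round(example_or, 3))
```

With these made-up counts the odds are 30/70 and 10/90, so the ratio is 27/7, which is greater than 1.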
| |
| | |
| The term "individual" in this usage does not have to refer to a human being, as a statistical [[Statistical population|population]] can consist of any set of entities, whether living or inanimate.
| |
| | |
| If the OR is greater than 1, then having ″A″ is considered to be ″associated″ with having ″B″ in the sense that having ″B″ raises (relative to not having ″B″) the odds of having ″A″. Note that this is not enough to establish that ″B″ is a contributing cause of ″A″: it could be that the association is due to a third property, ″C″, which is a contributing cause of both ″A″ and ″B″.
| |
| | |
| The two other major ways of quantifying association are the [[risk ratio]] (″RR″) and the [[absolute risk reduction]] (″ARR″). In clinical studies and many other settings, the parameter of greatest interest is often actually the RR, which is determined in a way that is similar to the one just described for the OR, except using probabilities instead of odds. Frequently, however, the available data [[Odds ratio#A motivating example, in the context of the rare disease assumption|only allows the computation of the OR]]; notably, this is so in the case of [[Case-control study|case-control studies]], as explained below. On the other hand, if one of the properties (say, A) is sufficiently rare (the ″[[rare disease assumption]]″), then the OR of having A given that the individual has B is a good approximation to the corresponding RR (the specification ″A given B″ is needed because, while the OR treats the two properties symmetrically, the RR and other measures do not).
| |
| | |
| In a more technical language, the OR is a measure of [[effect size]], describing the strength of [[association (statistics)|association]] or non-[[independence (probability theory)|independence]] between two binary [[data]] values. It is used as a [[descriptive statistics|descriptive statistic]], and plays an important role in [[logistic regression]].
| |
| | |
| | |
| | |
| == Definition and basic properties ==
| |
| | |
| ===A motivating example, in the context of the rare disease assumption===
| |
| | |
| Imagine there is a rare disease, afflicting, say, only one in many thousands of adults in a country. Imagine we suspect that being exposed to something (say, having had a particular sort of injury in childhood) makes one more likely to develop that disease in adulthood. The most informative thing to compute would be the risk ratio, RR. To do this in the ideal case, for all the adults in the population we would need to know whether they (a) had the exposure to the injury as children and (b) developed the disease as adults. From this we would extract the following information: the total number of people exposed to the childhood injury, <math>N_{E},</math> out of which <math>D_{E}</math> developed the disease and <math>H_{E}</math> stayed healthy; and the total number of people not exposed, <math>N_{NE},</math> out of which <math>D_{NE}</math> developed the disease and <math>H_{NE}</math> stayed healthy. Since <math>N_{E}=D_{E}+H_{E}</math> and similarly for the ″NE″ numbers, we only have four independent numbers, which we can organize in a [[contingency table|table]]:
| |
| | |
| <center>
| |
| {| class="wikitable" style="text-align: center; background: #FFFFFF;"
| |
| |-----
| |
| |
| |
| || Diseased || Healthy
| |
| |-----
| |
| | Exposed || <math>D_{E}</math> || <math>H_{E}</math>
| |
| |-----
| |
| | Not exposed || <math>D_{NE}</math> || <math>H_{NE}</math>
| |
| |}</center>
| |
| | |
| To avoid possible confusion, we emphasize that all these numbers refer to the entire population, and not to some sample of it.
| |
| | |
| Now the ''risk'' of developing the disease given exposure is <math>D_{E}/N_{E}</math> (where <math>N_{E}=D_{E}+H_{E}</math>), and of developing the disease given non-exposure is <math>D_{NE}/N_{NE}.</math> The ''risk ratio'', RR, is just the ratio of the two,
| |
| | |
| :<math>RR=\frac{D_{E}/N_{E}}{D_{NE}/N_{NE}}\,,</math>
| |
| | |
| which can be rewritten as <math>RR=\frac{D_{E}N_{NE}}{D_{NE}N_{E}}=\frac{D_{E}/D_{NE}}{N_{E}/N_{NE}}.</math>
| |
| | |
| In contrast, the ''odds'' of developing the disease given exposure is <math>D_{E}/H_{E}\,,</math> and of developing the disease given non-exposure is <math>D_{NE}/H_{NE}\,.</math> The ''odds ratio'', OR, is the ratio of the two,
| |
| | |
| :<math>OR=\frac{D_{E}/H_{E}}{D_{NE}/H_{NE}}\,,</math>
| |
| | |
| which can be rewritten as <math>OR=\frac{D_{E}H_{NE}}{D_{NE}H_{E}}=\frac{D_{E}/D_{NE}}{H_{E}/H_{NE}}.</math>
| |
| | |
| We may already note that if the disease is rare, then OR≈RR. Indeed, for a rare disease, we will have <math>D_{E}\ll H_{E},</math> and so <math>D_{E}+H_{E}\approx H_{E};</math> but then <math>D_{E}/(D_{E}+H_{E})\approx D_{E}/H_{E},</math> in other words, for the exposed population, the risk of developing the disease is approximately equal to the odds. Analogous reasoning shows that the risk is approximately equal to the odds for the non-exposed population as well; but then the ''ratio'' of the risks, which is RR, is approximately equal to the ratio of the odds, which is OR. Or, we could just notice that the rare disease assumption says that <math>N_{E}\approx H_{E}</math> and <math>N_{NE}\approx H_{NE},</math> from which it follows that <math>N_{E}/N_{NE}\approx H_{E}/H_{NE},</math> in other words that the denominators in the final expressions for the RR and the OR are approximately the same. The numerators are exactly the same, and so, again, we conclude that OR≈RR.
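A small numerical check may make the approximation OR≈RR more concrete. The sketch below uses the four counts of the table above; the function names and both sets of counts are hypothetical, chosen only for illustration.

```python
def risk_ratio(d_e, h_e, d_ne, h_ne):
    """RR = (D_E / N_E) / (D_NE / N_NE), with N = D + H in each row."""
    return (d_e / (d_e + h_e)) / (d_ne / (d_ne + h_ne))


def odds_ratio(d_e, h_e, d_ne, h_ne):
    """OR = (D_E / H_E) / (D_NE / H_NE)."""
    return (d_e / h_e) / (d_ne / h_ne)


# Hypothetical populations: (D_E, H_E, D_NE, H_NE).
rare = (20, 99_980, 10, 99_990)   # disease affects a tiny fraction of each group
common = (400, 600, 200, 800)     # disease is common in both groups

for label, counts in (("rare", rare), ("common", common)):
    print(label, round(risk_ratio(*counts), 4), round(odds_ratio(*counts), 4))
```

In the rare case the two measures agree closely, while in the common case the OR is noticeably larger than the RR, although the RR is 2 in both.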
| |
| | |
| Returning to our hypothetical study, the problem we often face is that we may not have the data to estimate these four numbers. For example, we may not have the population-wide data on who did or did not have the childhood injury.
| |
| | |
| Often we may overcome this problem by employing [[random sampling]] of the population: namely, if neither the disease nor the exposure to the injury is too rare in our population, then we can pick (say) a hundred people at random, and find out these four numbers in that sample; assuming the sample is representative enough of the population, the RR computed for this sample will be a good estimate for the RR for the whole population.
| |
| | |
| However, some diseases may be so rare that, in all likelihood, even a large random sample may not contain even a single diseased individual (or it may contain some, but too few to be statistically significant). This would make it impossible to compute the RR. But, we ''may'' nevertheless be able to estimate the OR, ''provided that'', unlike the disease, the exposure to the childhood injury is not too rare. Of course, because the disease is rare, this is then also our estimate for the RR.
| |
| | |
| Looking at the final expression for the OR: the fraction in the numerator, <math>D_{E}/D_{NE},</math> we can estimate by collecting all the known cases of the disease (presumably there must be some, or else we likely wouldn't be doing the study in the first place), and seeing how many of the diseased people had the exposure, and how many did not. And the fraction in the denominator, <math>H_{E}/H_{NE},</math> is the odds that a healthy individual in the population was exposed to the childhood injury. Now note that this latter odds can indeed be estimated by random sampling of the population—provided, as we said, that the [[prevalence]] of the exposure to the childhood injury is not too small, so that a random sample of a manageable size would be likely to contain a fair number of individuals who have had the exposure. So here the disease is very rare, but the factor thought to contribute to it is not quite so rare; such situations are quite common in practice.
| |
| | |
| Thus we can estimate the OR, and then, invoking the rare disease assumption again, we say that this is also a good approximation of the RR. Incidentally, the story just told is a paradigmatic example of a [[case-control study]].<ref name=BUCaseControl>{{Citation | last = LaMorte | first =Wayne W. | title = Case-Control Studies | publisher = [[Boston University School of Public Health]] | date = May 13, 2013 | url = http://sph.bu.edu/otlt/MPH-Modules/EP/EP713_AnalyticOverview/EP713_AnalyticOverview5.html#| accessdate = 2013-09-02}}</ref>
| |
| | |
| The same story ''could'' be told without ever mentioning the OR, like so: as soon as we have that <math>N_{E}\approx H_{E}</math> and <math>N_{NE}\approx H_{NE},</math> then we have that <math>N_{E}/N_{NE}\approx H_{E}/H_{NE}.</math> Thus if, by random sampling, we manage to estimate <math>H_{E}/H_{NE},</math> then, by the rare disease assumption, that will be a good estimate of <math>N_{E}/N_{NE},</math> which is all we need (besides <math>D_{E}/D_{NE},</math> which we presumably already know by studying the few cases of the disease) to compute the RR. However, it is standard in the literature to explicitly report the OR and then claim that the RR is approximately equal to it.
| |
| | |
| ===Definition in terms of group-wise odds===
| |
| | |
| The odds ratio is the ratio of the [[odds]] of an event occurring in one group to the odds of it occurring in another group. The term is also used to refer to sample-based estimates of this ratio. These groups might be men and women, an experimental group and a [[control group]], or any other [[dichotomy|dichotomous]] classification. If the probabilities of the event in each of the groups are ''p''<sub>1</sub> (first group) and ''p''<sub>2</sub> (second group), then the odds ratio is:
| |
| | |
| :<math>{ p_1/(1-p_1) \over p_2/(1-p_2)}={ p_1/q_1 \over p_2/q_2}=\frac{\;p_1q_2\;}{\;p_2q_1\;},</math>
| |
| | |
| where ''q''<sub>''x''</sub> = 1 − ''p''<sub>''x''</sub>. An odds ratio of 1 indicates that the condition or event under study is equally likely to occur in both groups. An odds ratio greater than 1 indicates that the condition or event is more likely to occur in the first group. And an odds ratio less than 1 indicates that the condition or event is less likely to occur in the first group. The odds ratio must be nonnegative if it is defined. It is undefined if ''p''<sub>2</sub>''q''<sub>1</sub> equals zero, i.e., if ''p''<sub>2</sub> equals zero or ''p''<sub>1</sub> equals one.
| |
| | |
| ===Definition in terms of joint and conditional probabilities===
| |
| | |
| The odds ratio can also be defined in terms of the joint [[probability distribution]] of two binary [[random variable]]s. The joint distribution of binary random variables ''X'' and ''Y'' can be written
| |
| | |
| <center>
| |
| {| class="wikitable" style="text-align: center; background: #FFFFFF;"
| |
| |-----
| |
| |
| |
| || ''Y'' = 1 || ''Y'' = 0
| |
| |-----
| |
| | ''X'' = 1 || <math>p_{11}</math> || <math>p_{10}</math>
| |
| |-----
| |
| | ''X'' = 0 || <math>p_{01}</math> || <math>p_{00}</math>
| |
| |}</center>
| |
| | |
| where ''p''<sub>11</sub>, ''p''<sub>10</sub>, ''p''<sub>01</sub> and ''p''<sub>00</sub> are non-negative "cell probabilities" that sum to one. The odds for ''Y'' within the two subpopulations defined by ''X'' = 1 and ''X'' = 0 are defined in terms of the [[conditional probabilities]] given ''X'', ''i.e.'', ''P''(''Y''|''X''):
| |
| | |
| <center>
| |
| {| class="wikitable" style="text-align: center; background: #FFFFFF;"
| |
| |-----
| |
| |
| |
| || ''Y'' = 1 || ''Y'' = 0
| |
| |-----
| |
| | ''X'' = 1 || <math>p_{11}/(p_{11}+p_{10})</math> || <math>p_{10}/(p_{11}+p_{10})</math>
| |
| |-----
| |
| | ''X'' = 0 || <math>p_{01}/(p_{01}+p_{00})</math> || <math>p_{00}/(p_{01}+p_{00})</math>
| |
| |}</center>
| |
| | |
| Thus the odds ratio is
| |
| | |
| :<math>{ \dfrac{p_{11}/(p_{11}+p_{10})}{p_{10}/(p_{11}+p_{10})} \bigg / \dfrac{p_{01}/(p_{01}+p_{00})}{p_{00}/(p_{01}+p_{00})}} = \dfrac{p_{11}p_{00}}{p_{10}p_{01}}.</math>
| |
| | |
| The simple expression on the right, above, is easy to remember as the product of the probabilities of the "concordant cells" (''X'' = ''Y'') divided by the product of the probabilities of the "discordant cells" (''X'' ≠ ''Y''). However note that in some applications the labeling of categories as zero and one is arbitrary, so there is nothing special about concordant versus discordant values in these applications.
| |
| | |
| ===Symmetry===
| |
| | |
| If we had calculated the odds ratio based on the conditional probabilities given ''Y'',
| |
| | |
| <center>
| |
| {| class="wikitable" style="text-align: center; background: #FFFFFF;"
| |
| |-----
| |
| |
| |
| || ''Y'' = 1 || ''Y'' = 0
| |
| |-----
| |
| | ''X'' = 1 || <math>p_{11}/(p_{11}+p_{01})</math> || <math>p_{10}/(p_{10}+p_{00})</math>
| |
| |-----
| |
| | ''X'' = 0 || <math>p_{01}/(p_{11}+p_{01})</math> || <math>p_{00}/(p_{10}+p_{00})</math>
| |
| |}</center>
| |
| | |
| we would have gotten the same result
| |
| | |
| :<math>{ \dfrac{p_{11}/(p_{11}+p_{01})}{p_{01}/(p_{11}+p_{01})} \bigg / \dfrac{p_{10}/(p_{10}+p_{00})}{p_{00}/(p_{10}+p_{00})}} = \dfrac{p_{11}p_{00}}{p_{10}p_{01}}.</math>
| |
| | |
| Other measures of effect size for binary data such as the [[relative risk]] do not have this symmetry property.
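The symmetry can be checked numerically. The following sketch conditions first on ''X'' and then on ''Y'', using exact rational arithmetic; the function names and the cell probabilities are hypothetical.

```python
from fractions import Fraction as F


def or_conditioning_on_x(p11, p10, p01, p00):
    """Odds ratio built from P(Y | X), as in the first conditional table."""
    odds_y_given_x1 = (p11 / (p11 + p10)) / (p10 / (p11 + p10))
    odds_y_given_x0 = (p01 / (p01 + p00)) / (p00 / (p01 + p00))
    return odds_y_given_x1 / odds_y_given_x0


def or_conditioning_on_y(p11, p10, p01, p00):
    """Odds ratio built from P(X | Y), as in the second conditional table."""
    odds_x_given_y1 = (p11 / (p11 + p01)) / (p01 / (p11 + p01))
    odds_x_given_y0 = (p10 / (p10 + p00)) / (p00 / (p10 + p00))
    return odds_x_given_y1 / odds_x_given_y0


# Hypothetical cell probabilities (p11, p10, p01, p00) that sum to one.
cells = (F(4, 10), F(1, 10), F(2, 10), F(3, 10))
print(or_conditioning_on_x(*cells), or_conditioning_on_y(*cells))
```

Both calls return the same value, which is also ''p''<sub>11</sub>''p''<sub>00</sub> / ''p''<sub>10</sub>''p''<sub>01</sub> for these cells.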
| |
| | |
| ===Relation to statistical independence===
| |
| | |
| If ''X'' and ''Y'' are independent, their joint probabilities can be expressed in terms of their marginal probabilities ''p''<sub>''x''</sub> = ''P''(''X'' = 1) and ''p''<sub>''y''</sub> = ''P''(''Y'' = 1), as follows
| |
| | |
| <center>
| |
| {| class="wikitable" style="text-align: center; background: #FFFFFF;"
| |
| |-----
| |
| |
| |
| || ''Y'' = 1 || ''Y'' = 0
| |
| |-----
| |
| | ''X'' = 1 || <math>p_xp_y</math> || <math>p_x(1-p_y)</math>
| |
| |-----
| |
| | ''X'' = 0 || <math>(1-p_x)p_y</math> || <math>(1-p_x)(1-p_y)</math>
| |
| |}</center>
| |
| | |
| In this case, the odds ratio equals one, and conversely the odds ratio can only equal one if the joint probabilities can be factored in this way. Thus the odds ratio equals one if and only if ''X'' and ''Y'' are [[statistical independence|independent]].
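The first half of this statement follows by substituting the factored cell probabilities into the odds ratio. As a brief worked step, assuming that 0 < ''p''<sub>''x''</sub> < 1 and 0 < ''p''<sub>''y''</sub> < 1 so that no cell is zero:

:<math>\frac{p_{11}p_{00}}{p_{10}p_{01}} = \frac{p_xp_y\,(1-p_x)(1-p_y)}{p_x(1-p_y)\,(1-p_x)p_y} = 1.</math>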
| |
| | |
| ===Recovering the cell probabilities from the odds ratio and marginal probabilities===
| |
| | |
| The odds ratio is a function of the cell probabilities, and conversely, the cell probabilities can be recovered given knowledge of the odds ratio and the marginal probabilities ''P''(''X'' = 1) = ''p''<sub>11</sub> + ''p''<sub>10</sub> and ''P''(''Y'' = 1) = ''p''<sub>11</sub> + ''p''<sub>01</sub>. If the odds ratio ''R'' differs from 1, then
| |
| | |
| :<math>
| |
| p_{11} = \frac{1 + (p_{1\cdot}+p_{\cdot 1})(R-1) - S}{2(R-1)}
| |
| </math>
| |
| | |
| where ''p''<sub>1•</sub> = ''p''<sub>11</sub> + ''p''<sub>10</sub>, ''p''<sub>•1</sub> = ''p''<sub>11</sub> + ''p''<sub>01</sub>, and
| |
| | |
| :<math>
| |
| S = \sqrt{(1+(p_{1\cdot}+p_{\cdot 1})(R-1))^2 + 4R(1-R)p_{1\cdot}p_{\cdot 1}}.
| |
| </math>
| |
| | |
| In the case where ''R'' = 1, we have independence, so ''p''<sub>11</sub> = ''p''<sub>1•</sub>''p''<sub>•1</sub>.
| |
| | |
| Once we have ''p''<sub>11</sub>, the other three cell probabilities can easily be recovered from the marginal probabilities.
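The recovery formula can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the function and argument names are assumptions, and the test values are hypothetical.

```python
import math


def recover_cells(odds_ratio, p_row1, p_col1):
    """Return (p11, p10, p01, p00) from the odds ratio R and the marginals.

    p_row1 is P(X = 1) and p_col1 is P(Y = 1); both names are illustrative.
    """
    r = odds_ratio
    if r == 1:
        # Independence: the joint probability factorizes.
        p11 = p_row1 * p_col1
    else:
        t = 1 + (p_row1 + p_col1) * (r - 1)
        s = math.sqrt(t * t + 4 * r * (1 - r) * p_row1 * p_col1)
        p11 = (t - s) / (2 * (r - 1))
    # The remaining cells follow from the marginals.
    p10 = p_row1 - p11
    p01 = p_col1 - p11
    p00 = 1 - p11 - p10 - p01
    return p11, p10, p01, p00


# Hypothetical table with cells (0.4, 0.1, 0.2, 0.3): R = 6, P(X=1) = 0.5, P(Y=1) = 0.6.
print(recover_cells(6, 0.5, 0.6))
```

For this example the function returns values equal to the original cells up to floating-point rounding.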
| |
| | |
| ==Example==
| |
| [[Image:odds ratio map.svg|300px|right|thumb|A graph showing how the log odds ratio relates to the underlying probabilities of the outcome ''X'' occurring in two groups, denoted ''A'' and ''B''. The log odds ratio shown here is based on the odds for the event occurring in group ''B'' relative to the odds for the event occurring in group ''A''. Thus, when the probability of ''X'' occurring in group ''B'' is greater than the probability of ''X'' occurring in group ''A'', the odds ratio is greater than 1, and the log odds ratio is greater than 0.]]
| |
| | |
| Suppose that in a sample of 100 men, 90 drank wine in the previous week, while in a sample of 100 women only 20 drank wine in the same period. The odds of a man drinking wine are 90 to 10, or 9:1, while the odds of a woman drinking wine are only 20 to 80, or 1:4 = 0.25:1. The odds ratio is thus 9/0.25, or 36, showing that men are much more likely to drink wine than women. The detailed calculation is:
| |
| | |
| :<math>{ 0.9/0.1 \over 0.2/0.8}=\frac{\;0.9\times 0.8\;}{\;0.1\times 0.2\;} ={0.72 \over 0.02} = 36.</math>
| |
| | |
| This example also shows how odds ratios can magnify relative differences: in this sample men are 90/20 = 4.5 times as likely to have drunk wine as women, but have 36 times the odds. The logarithm of the odds ratio, the difference of the [[logit]]s of the [[probability|probabilities]], tempers this effect, and also makes the measure [[symmetry|symmetric]] with respect to the ordering of groups. For example, using [[natural logarithms]], an odds ratio of 36/1 maps to 3.584, and an odds ratio of 1/36 maps to −3.584.
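The arithmetic of the wine example can be reproduced with a short script. The variable names are illustrative; the counts are the ones given in the example.

```python
import math

# Counts from the example: 90 of 100 men and 20 of 100 women drank wine.
men_yes, men_no = 90, 10
women_yes, women_no = 20, 80

odds_men = men_yes / men_no          # 9.0
odds_women = women_yes / women_no    # 0.25
odds_ratio = odds_men / odds_women   # 36.0

# Ratio of the two probabilities, for comparison with the odds ratio.
probability_ratio = (men_yes / 100) / (women_yes / 100)

log_or = math.log(odds_ratio)             # natural logarithm
log_or_reversed = math.log(1 / odds_ratio)  # groups swapped

print(odds_ratio, round(probability_ratio, 2), round(log_or, 3), round(log_or_reversed, 3))
```

Swapping the two groups inverts the odds ratio but only changes the sign of its logarithm.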
| |
| | |
| ==Statistical inference==
| |
| [[Image:odds ratio minsig.svg|300px|right|thumb|A graph showing the minimum value of the sample log odds ratio statistic that must be observed to be deemed significant at the 0.05 level, for a given sample size. The three lines correspond to different settings of the marginal probabilities in the 2x2 contingency table (the row and column marginal probabilities are equal in this graph).]]Several approaches to statistical inference for odds ratios have been developed.
| |
| | |
| One approach to inference uses large sample approximations to the sampling distribution of the log odds ratio (the [[natural logarithm]] of the odds ratio). If we use the joint probability notation defined above, the population log odds ratio is
| |
| | |
| :<math>{\log\left(\frac{p_{11}p_{00}}{p_{01}p_{10}}\right) = \log(p_{11}) + \log(p_{00}) - \log(p_{10}) - \log(p_{01})}.\,</math>
| |
| | |
| If we observe data in the form of a [[contingency table]]
| |
| | |
| <center>
| |
| {| class="wikitable" style="text-align: center; background: #FFFFFF;"
| |
| |-----
| |
| |
| |
| || ''Y'' = 1 || ''Y'' = 0
| |
| |-----
| |
| | ''X'' = 1 || <math>n_{11}</math> || <math>n_{10}</math>
| |
| |-----
| |
| | ''X'' = 0 || <math>n_{01}</math> || <math>n_{00}</math>
| |
| |}</center>
| |
| | |
| then the probabilities in the joint distribution can be estimated as
| |
| | |
| <center>
| |
| {| class="wikitable" style="text-align: center; background: #FFFFFF;"
| |
| |-----
| |
| |
| |
| || ''Y'' = 1 || ''Y'' = 0
| |
| |-----
| |
| | ''X'' = 1 || <math>\hat{p}_{11}</math> || <math>\hat{p}_{10}</math>
| |
| |-----
| |
| | ''X'' = 0 || <math>\hat{p}_{01}</math> || <math>\hat{p}_{00}</math>
| |
| |}</center>
| |
| | |
| where ''p''̂<sub>''ij''</sub> = ''n''<sub>''ij''</sub> / ''n'', with ''n'' = ''n''<sub>11</sub> + ''n''<sub>10</sub> + ''n''<sub>01</sub> + ''n''<sub>00</sub> being the sum of all four cell counts. The sample log odds ratio is
| |
| | |
| :<math>{L=\log\left(\dfrac{\hat{p}_{11}\hat{p}_{00}}{\hat{p}_{10}\hat{p}_{01}}\right) = \log\left(\dfrac{n_{11}n_{00}}{n_{10}n_{01}}\right)}</math>.
| |
| | |
| The distribution of the log odds ratio is approximately [[Normal distribution|normal]] with:
| |
| | |
| :<math>L\ \sim\ \mathcal{N}(\log (OR),\,\sigma^2). \,</math>
| |
| | |
| The [[standard error (statistics)|standard error]] for the log odds ratio is approximately
| |
| | |
| :<math>{{\rm SE} = \sqrt{\dfrac{1}{n_{11}} + \dfrac{1}{n_{10}} + \dfrac{1}{n_{01}} + \dfrac{1}{n_{00}}}}</math>.
| |
| | |
| This is an asymptotic approximation, and will not give a meaningful result if any of the cell counts are very small. If ''L'' is the sample log odds ratio, an approximate 95% [[confidence interval]] for the population log odds ratio is ''L'' ± 1.96''SE''.<ref>{{cite journal|doi=10.1136/bmj.296.6632.1313|author=Morris and Gardner|year=1988|last2=Gardner|first2=MJ|title=Calculating confidence intervals for relative risks (odds ratios) and standardised ratios and rates|journal=British Medical Journal|volume=296|issue=6632|pages=1313–1316|url=http://www.bmj.com/cgi/reprint/296/6632/1313|pmid=3133061|pmc=2545775}}</ref> This can be mapped to exp(''L'' − 1.96SE), exp(''L'' + 1.96SE) to obtain a 95% confidence interval for the odds ratio. If we wish to test the hypothesis that the population odds ratio equals one, the two-sided [[p-value]] is 2''P''(''Z''< −|''L''|/SE), where ''P'' denotes a probability, and ''Z'' denotes a [[standard normal random variable]].
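The large-sample interval and test described above can be sketched with the Python standard library. The function name and its return layout are assumptions made for illustration; the counts in the usage line are those of the wine example, with men as ''X'' = 1 and wine drinking as ''Y'' = 1.

```python
import math
from statistics import NormalDist


def log_odds_ratio_inference(n11, n10, n01, n00, z=1.96):
    """Approximate inference for the odds ratio of a 2x2 table.

    Returns (L, SE, (ci_low, ci_high), p_value), where the interval is
    for the odds ratio itself. All cell counts must be positive; the
    approximation is unreliable when any count is very small.
    """
    log_or = math.log((n11 * n00) / (n10 * n01))
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    # Two-sided p-value for the hypothesis that the odds ratio equals one.
    p_value = 2 * NormalDist().cdf(-abs(log_or) / se)
    return log_or, se, ci, p_value


log_or, se, ci, p_value = log_odds_ratio_inference(90, 10, 20, 80)
print(round(log_or, 3), round(se, 4), tuple(round(x, 1) for x in ci), p_value)
```

The default multiplier 1.96 corresponds to the 95% level used in the text; a different level would need a different normal quantile.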
| |
| | |
| An alternative approach to inference for odds ratios looks at the distribution of the data conditionally on the marginal frequencies of ''X'' and ''Y''. An advantage of this approach is that the sampling distribution of the odds ratio can be expressed exactly.
| |
| | |
| ==Role in logistic regression==
| |
| | |
| [[Logistic regression]] is one way to generalize the odds ratio beyond two binary variables. Suppose we have a binary response variable ''Y'' and a binary predictor variable ''X'', and in addition we have other predictor variables ''Z''<sub>1</sub>, ..., ''Z<sub>p</sub>'' that may or may not be binary. If we use multiple logistic regression to regress ''Y'' on ''X'', ''Z<sub>1</sub>'', ..., ''Z<sub>p</sub>'', then the estimated coefficient <math>\hat{\beta}_x</math> for ''X'' is related to a conditional odds ratio. Specifically, at the population level
| |
| | |
| :<math>
| |
| \exp(\beta_x) = \frac{P(Y=1|X=1, Z_1, \ldots, Z_p)/P(Y=0|X=1, Z_1, \ldots, Z_p)}{P(Y=1|X=0, Z_1, \ldots, Z_p)/P(Y=0|X=0, Z_1, \ldots, Z_p)},
| |
| </math>
| |
| | |
| so <math>\exp(\hat{\beta}_x)</math> is an estimate of this conditional odds ratio. The interpretation of <math>\exp(\hat{\beta}_x)</math> is as an estimate of the odds ratio between ''Y'' and ''X'' when the values of ''Z''<sub>1</sub>, ..., ''Z<sub>p</sub>'' are held fixed.
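To see where this identity comes from, it may help to write out the model. Assuming the usual logistic regression form with no interaction terms, and using an intercept <math>\beta_0</math> and coefficients <math>\beta_1, \ldots, \beta_p</math> for the ''Z'' variables (symbols introduced here only for illustration), the log odds of ''Y'' = 1 is

:<math>\log\frac{P(Y=1\mid X, Z_1, \ldots, Z_p)}{P(Y=0\mid X, Z_1, \ldots, Z_p)} = \beta_0 + \beta_x X + \beta_1 Z_1 + \cdots + \beta_p Z_p.</math>

Subtracting the value of the right-hand side at ''X'' = 0 from its value at ''X'' = 1, with ''Z''<sub>1</sub>, ..., ''Z<sub>p</sub>'' held fixed, leaves only <math>\beta_x</math>, so the log of the conditional odds ratio is <math>\beta_x</math> and the conditional odds ratio is <math>\exp(\beta_x)</math>.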
| |
| | |
| ==Insensitivity to the type of sampling==
| |
| | |
| If the data form a "population sample", then the cell probabilities ''p''̂<sub>''ij''</sub> are interpreted as the frequencies of each of the four groups in the population as defined by their ''X'' and ''Y'' values. In many settings it is impractical to obtain a population sample, so a selected sample is used. For example, we may choose to sample [[unit (statistics)|units]] with ''X'' = 1 with a given probability ''f'', regardless of their frequency in the population (which would necessitate sampling units with ''X'' = 0 with probability 1 − ''f''). In this situation, our data would follow the following joint probabilities:
| |
| | |
| <center>
| |
| {| class="wikitable" style="text-align: center; background: #FFFFFF;"
| |
| |-----
| |
| |
| |
| || ''Y'' = 1 || ''Y'' = 0
| |
| |-----
| |
| ''X'' = 1 || <math>fp_{11}/(p_{11}+p_{10})</math> || <math>fp_{10}/(p_{11}+p_{10})</math>
| |
| |-----
| |
| | ''X'' = 0 || <math>(1-f)p_{01}/(p_{01}+p_{00})</math> || <math>(1-f)p_{00}/(p_{01}+p_{00})</math>
| |
| |}</center>
| |
| | |
| The ''odds ratio'' ''p''<sub>11</sub>''p''<sub>00</sub> / ''p''<sub>01</sub>''p''<sub>10</sub> for this distribution does not depend on the value of ''f''. This shows that the odds ratio (and consequently the log odds ratio) is invariant to non-random sampling based on one of the variables being studied. Note however that the standard error of the log odds ratio does depend on the value of ''f''. This fact is exploited in two important situations:
| |
| | |
| * Suppose it is inconvenient or impractical to obtain a population sample, but it is practical to obtain a [[accidental sampling|convenience sample]] of units with different ''X'' values, such that within the ''X'' = 0 and ''X'' = 1 subsamples the ''Y'' values are representative of the population (i.e. they follow the correct conditional probabilities).
| |
| | |
| * Suppose the marginal distribution of one variable, say ''X'', is very skewed. For example, if we are studying the relationship between high alcohol consumption and pancreatic cancer in the general population, the incidence of pancreatic cancer would be very low, so it would require a very large population sample to get a modest number of pancreatic cancer cases. However we could use data from hospitals to contact most or all of their pancreatic cancer patients, and then randomly sample an equal number of subjects without pancreatic cancer (this is called a "case-control study").
| |
| | |
| In both these settings, the odds ratio can be calculated from the selected sample, without biasing the results relative to what would have been obtained for a population sample.
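The invariance can be illustrated numerically. The sketch below builds the joint probabilities of a selected sample, in which units with ''X'' = 1 are drawn with probability ''f'', and checks that the odds ratio does not change with ''f''. The function names and the population cell probabilities are hypothetical.

```python
def selected_sample_cells(p11, p10, p01, p00, f):
    """Joint probabilities when X = 1 units are sampled with probability f."""
    row1 = p11 + p10  # P(X = 1) in the population
    row0 = p01 + p00  # P(X = 0) in the population
    return (
        f * p11 / row1,
        f * p10 / row1,
        (1 - f) * p01 / row0,
        (1 - f) * p00 / row0,
    )


def odds_ratio(c11, c10, c01, c00):
    """Cross-product ratio of a 2x2 table of probabilities or counts."""
    return (c11 * c00) / (c10 * c01)


population = (0.4, 0.1, 0.2, 0.3)  # hypothetical cell probabilities
for f in (0.1, 0.5, 0.9):
    print(f, round(odds_ratio(*selected_sample_cells(*population, f)), 6))
```

Each value of ''f'' gives the same odds ratio as the population table, because ''f'' and 1 − ''f'' cancel in the cross-product.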
| |
| | |
| ==Use in quantitative research==
| |
| | |
| Due to the widespread use of [[logistic regression]], the odds ratio is widely used in many fields of medical and social science research. The odds ratio is commonly used in [[survey research]], in [[epidemiology]], and to express the results of some [[clinical trial]]s and of [[case-control studies]]. It is often abbreviated "OR" in reports. When data from multiple surveys are combined, the result is often expressed as a "pooled OR".
| |
| | |
| ==Relation to relative risk==
| |
| In clinical studies, as well as in some other settings, the parameter of greatest interest is often the [[relative risk]] rather than the odds ratio. The relative risk is best estimated using a population sample, but if the [[rare disease assumption]] holds, the odds ratio is a good approximation to the relative risk — the [[odds]] is ''p'' / (1 − ''p''), so when ''p'' moves towards zero, 1 − ''p'' moves towards 1, meaning that the odds approaches the risk, and the odds ratio approaches the relative risk.<ref name="pmid18580722">{{cite journal |author=Viera AJ |title=Odds ratios and risk ratios: what's the difference and why does it matter? |journal=South. Med. J. |volume=101 |issue=7 |pages=730–4 |date=July 2008 |pmid=18580722 |doi=10.1097/SMJ.0b013e31817a7ee4 |url=http://meta.wkhealth.com/pt/pt-core/template-journal/lwwgateway/media/landingpage.htm?issn=0038-4348&volume=101&issue=7&spage=730 }}</ref> When the rare disease assumption does not hold, the odds ratio can overestimate the relative risk.<ref name="pmid9832001">{{cite journal |author=Zhang J, Yu KF |title=What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes |journal=JAMA |volume=280 |issue=19 |pages=1690–1 |date=November 1998 |pmid=9832001 |doi= 10.1001/jama.280.19.1690|url=http://jama.ama-assn.org/cgi/pmidlookup?view=long&pmid=9832001}}</ref><ref name="pmid12377421">{{cite journal |author=Robbins AS, Chao SY, Fonseca VP |title=What's the relative risk? A method to directly estimate risk ratios in cohort studies of common outcomes |journal=Ann Epidemiol |volume=12 |issue=7 |pages=452–4 |date=October 2002 |pmid=12377421 |doi= 10.1016/S1047-2797(01)00278-2|url=http://linkinghub.elsevier.com/retrieve/pii/S1047279701002782}}</ref><ref>{{cite journal | last=Nurminen | first=Markku | journal=European Journal of Epidemiology | title= To Use or Not to Use the Odds Ratio in Epidemiologic Analyses? 
| volume=11 | issue=4 | pages=365–371 | doi=10.1007/BF01721219 | year=1995 | jstor=3582428}}</ref>
| |
| | |
| If the absolute risk in the control group is available, conversion between the two is calculated by:<ref name="pmid9832001"/>
| |
| | |
| : <math> RR \approx \frac{OR}{1 - R_C + (R_C \times OR)}</math>
| |
| | |
| where:
| |
| * ''RR'' = relative risk
| |
| * ''OR'' = odds ratio
| |
| * ''R''<sub>''C''</sub> = absolute risk in the unexposed group, given as a fraction (for example: fill in 10% risk as 0.1)
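As a quick check of this conversion, the formula can be applied to two examples from this article. The function name is an illustrative assumption.

```python
def relative_risk_from_odds_ratio(odds_ratio, control_risk):
    """Convert an odds ratio to a relative risk.

    control_risk is the absolute risk in the unexposed group, as a
    fraction between 0 and 1 (for example 0.1 for a 10% risk).
    """
    return odds_ratio / (1 - control_risk + control_risk * odds_ratio)


# Wine example: OR = 36, and 20% of women (the reference group) drank wine.
print(round(relative_risk_from_odds_ratio(36, 0.2), 4))

# Rare outcome: risks of 4/100 versus 2/100 give OR = (4 * 98) / (2 * 96).
print(round(relative_risk_from_odds_ratio((4 * 98) / (2 * 96), 0.02), 4))
```

The first call recovers the ratio 90/20 = 4.5 of the two probabilities, and the second recovers a relative risk of 2.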
| |
| | |
| ===Confusion and exaggeration===
| |
| Odds ratios have often been confused with relative risk in medical literature. For non-statisticians, the odds ratio is a difficult concept to comprehend, and it gives a more impressive figure for the effect.<ref name="bmj.com">"On the use, misuse and interpretation of odds ratios". Dirk Taeger, Yi Sun, Kurt Straif. 10 August 1998. {{doi| 10.1136/bmj.316.7136.989}} http://www.bmj.com/content/316/7136/989?tab=responses</ref> However, most authors consider that the relative risk is readily understood.<ref name=ACourt>"Against all odds? Improving the understanding of risk reporting". A'Court, Christine; Stevens, Richard; Heneghan, Carl. ''British Journal of General Practice'', Volume 62, Number 596, March 2012, pp. e220-e223(4). {{doi|10.3399/bjgp12X630223}}</ref> In one study, members of a national disease foundation were actually 3.5 times more likely than nonmembers to have heard of a common treatment for that disease – but the odds ratio was 24 and the paper stated that members were ‘more than 20-fold more likely to have heard of’ the treatment.<ref>Nijsten T, Rolstad T, Feldman SR, Stern RS. Members of the national psoriasis foundation: more extensive disease and better informed about treatment options. ''Archives of Dermatology'' 2005;141(1): 19–26, p24 table 3 and text. http://archderm.ama-assn.org/cgi/reprint/141/1/19.pdf</ref> A study of papers published in two journals reported that 26% of the articles that used an odds ratio interpreted it as a risk ratio.<ref>Holcomb WL, Chaiworapongsa T, Luke DA, Burgdorf KD. (2001) [http://journals.lww.com/greenjournal/fulltext/2001/10000/an_odd_measure_of_risk__use_and_misuse_of_the_odds.28.aspx "An Odd Measure of Risk: Use and Misuse of the Odds Ratio"]. ''Obstetrics and Gynecology'', 98(4): 685–688.</ref>
| |
| | |
| This may reflect the simple process of uncomprehending authors choosing the most impressive-looking and publishable figure.<ref name=ACourt/> But its use may in some cases be deliberately deceptive.<ref>"The trouble with odds ratios". Thabani Sibanda. 1 May 2003 {{doi| 10.1136/bmj.316.7136.989}} http://www.bmj.com/content/316/7136/989?tab=responses</ref> It has been suggested that the odds ratio should only be presented as a measure of [[effect size]] when the [[risk ratio]] can not be estimated directly.<ref name="bmj.com"/>
| |
| | |
| ==Invertibility and invariance==
| |
| | |
| The odds ratio can also be inverted directly when the outcome is switched between disease onset and disease-free survival: the OR for survival is the reciprocal of the OR for risk (that is, 1/OR).{{citation needed|date=June 2012}} This is known as the 'invariance of the odds ratio'. In contrast, the relative risk does not have this invertibility property when comparing survival with onset incidence.{{citation needed|date=June 2012}} The difference is best illustrated with an example:
| |
| | |
| Suppose that in a clinical trial the risk of an adverse event is 4/100 in the drug group and 2/100 in the placebo group, yielding RR = 2 and OR = 2.04166 for drug-vs-placebo adverse risk. If the analysis is inverted and the outcome is instead event-free survival, the drug group has a rate of 96/100 and the placebo group a rate of 98/100, yielding a drug-vs-placebo RR = 0.9796 for survival but an OR = 0.48979. An RR of 0.9796 is clearly not the reciprocal of an RR of 2, whereas an OR of 0.48979 is the reciprocal of an OR of 2.04166.
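| The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch; the helper names <code>risk_ratio</code> and <code>odds_ratio</code> are chosen here for illustration and do not refer to any particular library.

```python
def risk_ratio(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Ratio of the outcome proportions in group A versus group B."""
    return (events_a / n_a) / (events_b / n_b)


def odds_ratio(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Ratio of the outcome odds in group A versus group B."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b


if __name__ == "__main__":
    # Adverse events: 4/100 on drug, 2/100 on placebo.
    rr_event = risk_ratio(4, 100, 2, 100)
    or_event = odds_ratio(4, 100, 2, 100)
    # Event-free survival: 96/100 on drug, 98/100 on placebo.
    rr_survival = risk_ratio(96, 100, 98, 100)
    or_survival = odds_ratio(96, 100, 98, 100)

    print(f"RR (event) = {rr_event:.4f}, RR (survival) = {rr_survival:.4f}")
    print(f"OR (event) = {or_event:.5f}, OR (survival) = {or_survival:.5f}")
    # The two ORs multiply to 1; the two RRs do not.
    print(f"OR product = {or_event * or_survival:.4f}")
    print(f"RR product = {rr_event * rr_survival:.4f}")
```

| The product of the two odds ratios is 1, while the product of the two relative risks is not, which is the asymmetry described in the example.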
| |
| | |
| This is again what is called the 'invariance of the odds ratio': an RR for survival is not the reciprocal of the RR for risk, whereas the OR is symmetric under the switch between survival and adverse risk. The danger for clinical interpretation of the OR arises when the adverse event is not rare: the rare-disease assumption then fails, and the OR exaggerates the difference relative to the RR. On the other hand, when the disease is rare, an RR for survival (e.g. the RR = 0.9796 from the example above) can conceal a clinically important doubling of adverse risk associated with a drug or exposure.{{citation needed|date=June 2012}}
| |
| | |
| ==Alternative estimators of the odds ratio==
| |
| | |
| The sample odds ratio ''n''<sub>11</sub>''n''<sub>00</sub> / (''n''<sub>10</sub>''n''<sub>01</sub>) is easy to calculate, and for moderate and large samples it performs well as an estimator of the population odds ratio. When one or more cells of the contingency table may contain a small count, the sample odds ratio can be [[bias (statistics)|biased]] and exhibit high [[variance]]. A number of alternative estimators of the odds ratio have been proposed to address this issue. One alternative estimator is the conditional maximum likelihood estimator, which conditions on the row and column margins when forming the likelihood to maximize (as in [[Fisher's exact test]]).<ref>{{cite book | last=Rothman | first=Kenneth J. | coauthors=Greenland, Sander; Lash, Timothy L. | title=Modern Epidemiology | year=2008 | publisher=Lippincott Williams & Wilkins | isbn=0-7817-5564-6}}</ref> Another alternative estimator is the Mantel-Haenszel estimator.
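| As a rough illustration, the sample odds ratio and the Mantel-Haenszel estimator can be sketched in Python as follows. The Mantel-Haenszel estimator pools several 2×2 tables (strata) as the sum of ''n''<sub>11</sub>''n''<sub>00</sub>/''N'' over strata divided by the sum of ''n''<sub>10</sub>''n''<sub>01</sub>/''N'', where ''N'' is the stratum total. The function names and the stratum counts below are hypothetical and serve only to show the arithmetic.

```python
from typing import Iterable, Tuple

# One 2x2 table, given as (n11, n10, n01, n00).
Table = Tuple[float, float, float, float]


def sample_odds_ratio(n11: float, n10: float, n01: float, n00: float) -> float:
    """Cross-product ratio n11*n00 / (n10*n01) of a single 2x2 table."""
    if n10 * n01 == 0:
        raise ValueError("sample odds ratio is undefined when n10 or n01 is zero")
    return (n11 * n00) / (n10 * n01)


def mantel_haenszel_odds_ratio(strata: Iterable[Table]) -> float:
    """Pooled odds ratio across strata, weighting each table by 1/N."""
    numerator = 0.0
    denominator = 0.0
    for n11, n10, n01, n00 in strata:
        total = n11 + n10 + n01 + n00
        numerator += n11 * n00 / total
        denominator += n10 * n01 / total
    if denominator == 0:
        raise ValueError("pooled estimate is undefined: every stratum has n10*n01 = 0")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for two strata (for illustration only).
    strata = [(10, 20, 5, 40), (8, 12, 6, 30)]
    for table in strata:
        print("stratum OR:", round(sample_odds_ratio(*table), 3))
    print("Mantel-Haenszel OR:", round(mantel_haenszel_odds_ratio(strata), 3))
```

| With a single stratum the pooled estimate reduces to the ordinary sample odds ratio; with several strata it lies between the smallest and largest stratum-specific odds ratios.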
| |
| | |
| ==Numerical examples==
| |
| | |
| The following four contingency tables contain observed cell counts, along with the corresponding sample odds ratio (''OR'') and sample log odds ratio (''LOR''):
| |
| | |
| {| cellpadding="5" cellspacing="0" align="center"
| |
| |-
| |
| ! rowspan=2 |
| |
| ! style="background:#efefef;border-left:1px solid black;border-top:1px solid black;" colspan=2 | ''OR'' = 1, ''LOR'' = 0
| |
| ! style="background:#efefef;border-left:1px solid black;border-top:1px solid black;" colspan=2 | ''OR'' = 1, ''LOR'' = 0
| |
| ! style="background:#efefef;border-left:1px solid black;border-top:1px solid black;" colspan=2 | ''OR'' = 4, ''LOR'' = 1.39
| |
| ! style="background:#efefef;border-left:1px solid black;border-top:1px solid black;border-right:1px solid black;" colspan=2 | ''OR'' = 0.25, ''LOR'' = −1.39
| |
| |-
| |
| ! style="background:#ffdead;border-left:1px solid black;" | ''Y'' = 1
| |
| ! style="background:#ffdead;" | ''Y'' = 0
| |
| ! style="background:#ffdead;border-left:1px solid black;" | ''Y'' = 1
| |
| ! style="background:#ffdead;" | ''Y'' = 0
| |
| ! style="background:#ffdead;border-left:1px solid black;" | ''Y'' = 1
| |
| ! style="background:#ffdead;" | ''Y'' = 0
| |
| ! style="background:#ffdead;border-left:1px solid black;" | ''Y'' = 1
| |
| ! style="background:#ffdead;border-right:1px solid black;" | ''Y'' = 0
| |
| |-
| |
| ! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''X'' = 1
| |
| ! style="border-left:1px solid black;" | 10
| |
| ! 10
| |
| ! style="border-left:1px solid black;" | 100
| |
| ! 100
| |
| ! style="border-left:1px solid black;" | 20
| |
| ! 10
| |
| ! style="border-left:1px solid black;" | 10
| |
| ! style="border-right:1px solid black;" |20
| |
| |-
| |
| ! style="background:#ffdead;border-bottom:1px solid black;border-left:1px solid black;" | ''X'' = 0
| |
| ! style="border-bottom:1px solid black;border-left:1px solid black;" | 5
| |
| ! style="border-bottom:1px solid black;" | 5
| |
| ! style="border-left:1px solid black;border-bottom:1px solid black;" | 50
| |
| ! style="border-bottom:1px solid black;" | 50
| |
| ! style="border-left:1px solid black;border-bottom:1px solid black;" | 10
| |
| ! style="border-bottom:1px solid black;" | 20
| |
| ! style="border-left:1px solid black;border-bottom:1px solid black;" | 20
| |
| ! style="border-right:1px solid black;border-bottom:1px solid black;" | 10
| |
| |}
| |
| | |
| The following [[joint probability distribution]]s contain the population cell probabilities, along with the corresponding population odds ratio (''OR'') and population log odds ratio (''LOR''):
| |
| | |
| {| cellpadding="5" cellspacing="0" align="center"
| |
| |-
| |
| ! rowspan=2 |
| |
| ! style="background:#efefef;border-left:1px solid black;border-top:1px solid black;" colspan=2 | ''OR'' = 1, ''LOR'' = 0
| |
| ! style="background:#efefef;border-left:1px solid black;border-top:1px solid black;" colspan=2 | ''OR'' = 1, ''LOR'' = 0
| |
| ! style="background:#efefef;border-left:1px solid black;border-top:1px solid black;" colspan=2 | ''OR'' = 16, ''LOR'' = 2.77
| |
| ! style="background:#efefef;border-left:1px solid black;border-top:1px solid black;border-right:1px solid black;" colspan=2 | ''OR'' = 0.67, ''LOR'' = −0.41
| |
| |-
| |
| ! style="background:#ffdead;border-left:1px solid black;" | ''Y'' = 1
| |
| ! style="background:#ffdead;" | ''Y'' = 0
| |
| ! style="background:#ffdead;border-left:1px solid black;" | ''Y'' = 1
| |
| ! style="background:#ffdead;" | ''Y'' = 0
| |
| ! style="background:#ffdead;border-left:1px solid black;" | ''Y'' = 1
| |
| ! style="background:#ffdead;" | ''Y'' = 0
| |
| ! style="background:#ffdead;border-left:1px solid black;" | ''Y'' = 1
| |
| ! style="background:#ffdead;border-right:1px solid black;" | ''Y'' = 0
| |
| |-
| |
| ! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''X'' = 1
| |
| ! style="border-left:1px solid black;" | 0.2
| |
| ! 0.2
| |
| ! style="border-left:1px solid black;" | 0.4
| |
| ! 0.4
| |
| ! style="border-left:1px solid black;" | 0.4
| |
| ! 0.1
| |
| ! style="border-left:1px solid black;" | 0.1
| |
| ! style="border-right:1px solid black;" | 0.3
| |
| |-
| |
| ! style="background:#ffdead;border-bottom:1px solid black;border-left:1px solid black;" | ''X'' = 0
| |
| ! style="border-bottom:1px solid black;border-left:1px solid black;" | 0.3
| |
| ! style="border-bottom:1px solid black;" | 0.3
| |
| ! style="border-left:1px solid black;border-bottom:1px solid black;" | 0.1
| |
| ! style="border-bottom:1px solid black;" | 0.1
| |
| ! style="border-left:1px solid black;border-bottom:1px solid black;" | 0.1
| |
| ! style="border-bottom:1px solid black;" | 0.4
| |
| ! style="border-left:1px solid black;border-bottom:1px solid black;" | 0.2
| |
| ! style="border-right:1px solid black;border-bottom:1px solid black;" | 0.4
| |
| |}
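| The values in both sets of tables can be checked with the same cross-product formula, since it applies equally to observed counts and to cell probabilities. The sketch below is illustrative only; the function name and the tuple layout (''X''=1,''Y''=1; ''X''=1,''Y''=0; ''X''=0,''Y''=1; ''X''=0,''Y''=0) are assumptions made here to mirror the tables.

```python
import math


def odds_ratio(c11: float, c10: float, c01: float, c00: float) -> float:
    """Cross-product ratio for cells given as counts or as probabilities."""
    return (c11 * c00) / (c10 * c01)


# Cells are listed as (X=1,Y=1), (X=1,Y=0), (X=0,Y=1), (X=0,Y=0).
count_tables = [
    (10, 10, 5, 5),
    (100, 100, 50, 50),
    (20, 10, 10, 20),
    (10, 20, 20, 10),
]
probability_tables = [
    (0.2, 0.2, 0.3, 0.3),
    (0.4, 0.4, 0.1, 0.1),
    (0.4, 0.1, 0.1, 0.4),
    (0.1, 0.3, 0.2, 0.4),
]

if __name__ == "__main__":
    for label, tables in (("counts", count_tables),
                          ("probabilities", probability_tables)):
        for cells in tables:
            value = odds_ratio(*cells)
            # Report OR and its natural logarithm, rounded as in the tables.
            print(f"{label} {cells}: OR = {value:.2f}, LOR = {math.log(value):.2f}")
```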
| |
| | |
| ==Worked example==
| |
| {{ARR RRR worksheet}}
| |
| | |
| ==See also==
| |
| * [[Diagnostic odds ratio]]
| |
| * [[Forest plot]]
| |
| * [[Hazard ratio]]
| |
| | |
| ==References==
| |
| {{reflist|2}}
| |
| | |
| == External links ==
| |
| * [http://www.hutchon.net/ConfidOR.htm Odds Ratio Calculator – website]
| |
| * [http://statpages.org/ctab2x2.html Odds Ratio Calculator with various tests – website]
| |
| * [http://www.OpenEpi.com OpenEpi, a web-based program that calculates the odds ratio, both unmatched and pair-matched]
| |
| | |
| {{Medical research studies}}
| |
| {{Statistics}}
| |
| | |
| {{DEFAULTSORT:Odds Ratio}}
| |
| [[Category:Epidemiology]]
| |
| [[Category:Medical statistics]]
| |
| [[Category:Statistical terminology]]
| |
| [[Category:Bayesian statistics]]
| |