Alternating group: Difference between revisions
en>Razimantv |
en>Rjwilmsi m →References: Journal cites, added 1 DOI using AWB (9887) |
{{about|the form of [[Bayes' theorem]]|the decision rule|Bayes estimator|the use of Bayes factor in model selection|Bayes factor}}
{{Bayesian statistics}}
In [[probability theory]] and applications, '''Bayes' rule''' relates the [[odds]] of event <math>A_1</math> to event <math>A_2</math>, before (prior to) and after (posterior to) [[Conditional probability|conditioning]] on another event <math>B</math>. The odds of <math>A_1</math> to <math>A_2</math> is simply the ratio of the probabilities of the two events. The prior odds is the ratio of the unconditional (prior) probabilities; the posterior odds is the ratio of the conditional (posterior) probabilities given the event <math>B</math>. The relationship is expressed in terms of the '''likelihood ratio''' or '''Bayes factor''', <math>\Lambda</math>. By definition, this is the ratio of the conditional probabilities of the event <math>B</math> given that <math>A_1</math> is the case and given that <math>A_2</math> is the case, respectively. The rule simply states: '''posterior odds equals prior odds times Bayes factor''' (Gelman et al., 2003, Chapter 1).
When arbitrarily many events <math>A</math> are of interest, not just two, the rule can be rephrased as '''posterior is proportional to prior times likelihood''', <math>P(A|B)\propto P(A) P(B|A)</math>, where the proportionality symbol means that the left-hand side equals a constant times the right-hand side as <math>A</math> varies, for fixed or given <math>B</math> (Lee, 2012; Bertsch McGrayne, 2012). In this form it goes back to Laplace (1774) and to Cournot (1843); see Fienberg (2006).
Bayes' rule is an equivalent way to formulate [[Bayes' theorem]]: the odds for and against <math>A</math> determine the probability of <math>A</math>, since odds of <math>o</math> correspond to a probability of <math>o/(1+o)</math>. In practice it may be preferred to Bayes' theorem for several reasons, discussed below.
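The equivalence between odds and probabilities can be illustrated with a minimal Python sketch. The function names <code>odds_to_probability</code> and <code>probability_to_odds</code> are illustrative choices, not part of any library, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def odds_to_probability(odds):
    """Convert odds for A, i.e. P(A) / P(not A), into P(A)."""
    return odds / (1 + odds)

def probability_to_odds(p):
    """Convert P(A) into odds for A; undefined when P(A) = 1."""
    return p / (1 - p)

# Illustrative value: odds of 1 to 3 in favour of A.
odds = Fraction(1, 3)
p = odds_to_probability(odds)       # one chance in four
round_trip = probability_to_odds(p) # recovers the original odds
```

The two functions are inverses of each other, which is why the odds form and the probability form carry the same information.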
Bayes' rule is widely used in [[statistics]], [[science]] and [[engineering]], for instance in [[Bayesian model selection|model selection]], probabilistic [[expert systems]] based on [[Bayes networks]], [[statistical proof]] in legal proceedings, email spam filters, and so on (Rosenthal, 2005; Bertsch McGrayne, 2012). As an elementary fact from the calculus of probability, Bayes' rule tells us how unconditional and conditional probabilities are related whether we work with a [[frequentist interpretation of probability]] or a [[Bayesian probability|Bayesian interpretation of probability]]. Under the Bayesian interpretation it is frequently applied in the situation where <math>A_1</math> and <math>A_2</math> are competing hypotheses, and <math>B</math> is some observed evidence. The rule shows how one's judgement on whether <math>A_1</math> or <math>A_2</math> is true should be updated on observing the evidence <math>B</math> (Gelman et al., 2003).
==The rule==
===Single event===
Given events <math>A_1</math>, <math>A_2</math> and <math>B</math>, Bayes' rule states that the conditional odds of <math>A_1:A_2</math> given <math>B</math> are equal to the marginal odds of <math>A_1:A_2</math> multiplied by the [[Bayes factor]] or [[likelihood ratio]] <math>\Lambda</math>:
:<math>O(A_1:A_2|B) = \Lambda(A_1:A_2|B) \cdot O(A_1:A_2),</math>
where
:<math>\Lambda(A_1:A_2|B) = \frac{P(B|A_1)}{P(B|A_2)}.</math>
Here, the odds and conditional odds, also known as prior odds and posterior odds, are defined by
:<math>O(A_1:A_2) = \frac{P(A_1)}{P(A_2)},</math>
:<math>O(A_1:A_2|B) = \frac{P(A_1|B)}{P(A_2|B)}.</math>
In the special case that <math>A_1 = A</math> and <math>A_2 = \neg A</math>, one writes <math>O(A)=O(A:\neg A)</math>, and uses a similar abbreviation for the Bayes factor and for the conditional odds. The odds on <math>A</math> is by definition the odds for and against <math>A</math>. Bayes' rule can then be written in the abbreviated form
:<math>O(A|B) = O(A) \cdot \Lambda(A|B),</math>
or in words: the posterior odds on <math>A</math> equals the prior odds on <math>A</math> times the likelihood ratio for <math>A</math> given information <math>B</math>. In short, '''posterior odds equals prior odds times likelihood ratio'''.
The rule is frequently applied when <math>A_1 = A</math> and <math>A_2 = \neg A</math> are two competing hypotheses concerning the cause of some event <math>B</math>. The prior odds on <math>A</math>, in other words the odds between <math>A</math> and <math>\neg A</math>, express our initial beliefs concerning whether or not <math>A</math> is true. The event <math>B</math> represents some evidence, information, data, or observations. The likelihood ratio is the ratio of the chances of observing <math>B</math> under the two hypotheses <math>A</math> and <math>\neg A</math>. The rule tells us how our prior beliefs concerning whether or not <math>A</math> is true need to be updated on receiving the information <math>B</math>.
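As a minimal sketch of the odds form, the following Python fragment multiplies prior odds by a likelihood ratio. The function names and all numerical values are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def bayes_factor(p_b_given_a1, p_b_given_a2):
    """Likelihood ratio Lambda(A1:A2|B) = P(B|A1) / P(B|A2)."""
    return p_b_given_a1 / p_b_given_a2

def posterior_odds(prior_odds, factor):
    """Posterior odds O(A1:A2|B) = Lambda(A1:A2|B) * O(A1:A2)."""
    return factor * prior_odds

# Hypothetical inputs: prior odds of 1 to 4 for A1 against A2,
# with B three times as probable under A1 as under A2.
prior = Fraction(1, 4)
lam = bayes_factor(Fraction(3, 5), Fraction(1, 5))
post = posterior_odds(prior, lam)   # odds of 3 to 4
```

Note that <math>P(B)</math> never appears: only the two conditional probabilities of <math>B</math> and the prior odds are needed.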
===Many events===
If we think of <math>A</math> as arbitrary and <math>B</math> as fixed, then we can rewrite Bayes' theorem <math>P(A|B)=P(A)P(B|A)/P(B)</math> in the form <math>P(A|B) \propto P(A)P(B|A)</math>, where the proportionality symbol means that, as <math>A</math> varies with <math>B</math> held fixed, the left-hand side is equal to a constant times the right-hand side.
In words: '''posterior is proportional to prior times likelihood'''. This version of Bayes' theorem was first called "Bayes' rule" by Cournot (1843). Cournot popularized the earlier work of Laplace (1774), who had independently discovered Bayes' rule. The work of Bayes was published posthumously (1763) but remained more or less unknown until Cournot drew attention to it; see Fienberg (2006).
Bayes' rule may be preferred to the usual statement of Bayes' theorem for a number of reasons. One is that it is intuitively simpler to understand. Another is that normalizing probabilities is sometimes unnecessary: one sometimes only needs to know ratios of probabilities. Finally, the normalization is often easier to carry out after simplifying the product of prior and likelihood by deleting any factors which do not depend on <math>A</math>, so we do not need to actually compute the denominator <math>P(B)</math> in the usual statement of Bayes' theorem <math>P(A|B)=P(A)P(B|A)/P(B)</math>.
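The proportional form can be sketched in Python for a finite set of mutually exclusive and exhaustive hypotheses. The hypothesis labels and numbers below are hypothetical, and the function name <code>posterior</code> is an illustrative choice.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Return P(A|B) for each A, from P(A) and P(B|A) up to a constant factor."""
    unnormalized = {a: prior[a] * likelihood[a] for a in prior}
    total = sum(unnormalized.values())  # the normalizing constant
    return {a: w / total for a, w in unnormalized.items()}

# Hypothetical prior P(A) and likelihood P(B|A) for three hypotheses.
prior = {"A1": Fraction(1, 2), "A2": Fraction(1, 3), "A3": Fraction(1, 6)}
likelihood = {"A1": Fraction(1, 10), "A2": Fraction(3, 10), "A3": Fraction(6, 10)}
post = posterior(prior, likelihood)

# A factor that does not depend on A cancels in the normalization,
# so the likelihood only needs to be known up to such a factor.
scaled = {a: 10 * l for a, l in likelihood.items()}
post_scaled = posterior(prior, scaled)  # identical to post
```

When the likelihood values are true probabilities, <code>total</code> equals <math>P(B)</math>, but the sketch never needs that value as an input.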
In [[Bayesian statistics]], Bayes' rule is often applied with a so-called [[improper prior]], for instance, a uniform probability distribution over all real numbers. In that case, the prior distribution does not exist as a probability measure within conventional probability theory, and Bayes' theorem itself is not available.
===Series of events===
Bayes' rule may be applied a number of times. Each time we observe a new event, we update the odds between the events of interest, say <math>A_1</math> and <math>A_2</math>, by taking account of the new information. For two events (information, evidence) <math>B</math> and <math>C</math>,
:<math>O(A_1:A_2|B \cap C) = \Lambda(A_1:A_2|B \cap C) \cdot \Lambda(A_1:A_2|B) \cdot O(A_1:A_2),</math>
where
:<math>\Lambda(A_1:A_2|B) = \frac{P(B|A_1)}{P(B|A_2)},</math>
:<math>\Lambda(A_1:A_2|B \cap C) = \frac{P(C|A_1 \cap B)}{P(C|A_2 \cap B)}.</math>
In the special case of two complementary events <math>A</math> and <math>\neg A</math>, the equivalent notation is
:<math>O(A|B \cap C) = \Lambda(A|B \cap C) \cdot \Lambda(A|B) \cdot O(A).</math>
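Repeated updating amounts to multiplying the prior odds by one Bayes factor per observation, where each factor is conditional on the evidence already seen. A minimal Python sketch, with a hypothetical function name and hypothetical values:

```python
from fractions import Fraction

def update_odds(prior_odds, factors):
    """Multiply the prior odds by each Bayes factor in turn."""
    odds = prior_odds
    for factor in factors:
        odds *= factor
    return odds

# Hypothetical values.
prior = Fraction(1, 9)           # O(A1:A2)
lam_b = Fraction(4, 1)           # P(B|A1) / P(B|A2)
lam_c_given_b = Fraction(3, 2)   # P(C|A1 and B) / P(C|A2 and B)
post = update_odds(prior, [lam_b, lam_c_given_b])  # O(A1:A2|B and C)
```

The second factor must be computed conditionally on <math>B</math>; it reduces to <math>P(C|A_1)/P(C|A_2)</math> only if <math>B</math> and <math>C</math> are conditionally independent given each hypothesis.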
==Derivation==
Consider two instances of [[Bayes' theorem]]:
:<math>P(A_1|B) = \frac{1}{P(B)} \cdot P(B|A_1) \cdot P(A_1),</math>
:<math>P(A_2|B) = \frac{1}{P(B)} \cdot P(B|A_2) \cdot P(A_2).</math>
Dividing the first by the second cancels the common factor <math>1/P(B)</math> and gives
:<math>\frac{P(A_1|B)}{P(A_2|B)} = \frac{P(B|A_1)}{P(B|A_2)} \cdot \frac{P(A_1)}{P(A_2)}.</math>
Now defining
:<math>O(A_1:A_2|B) \triangleq \frac{P(A_1|B)}{P(A_2|B)},</math>
:<math>O(A_1:A_2) \triangleq \frac{P(A_1)}{P(A_2)},</math>
:<math>\Lambda(A_1:A_2|B) \triangleq \frac{P(B|A_1)}{P(B|A_2)},</math>
this implies
:<math>O(A_1:A_2|B) = \Lambda(A_1:A_2|B) \cdot O(A_1:A_2).</math>
A similar derivation applies for conditioning on multiple events, using the appropriate [[Bayes' theorem#Further extensions|extension of Bayes' theorem]].
==Examples==
===Frequentist example=== <!-- Currently references Bayes' theorem for convenience - should later be replaced with complete example -->
Consider the [[Bayes' theorem#Drug testing|drug testing example]] in the article on Bayes' theorem.
The same results may be obtained using Bayes' rule. The prior odds on an individual being a drug user are 199 to 1 against, as <math>\textstyle 0.5\%=\frac{1}{200}</math> and <math>\textstyle 99.5\%=\frac{199}{200}</math>. The [[Bayes factor]] when an individual tests positive is <math>\textstyle \frac{0.99}{0.01} = 99:1</math> in favour of being a drug user: this is the ratio of the probability of a drug user testing positive to the probability of a non-user testing positive. The posterior odds on being a drug user are therefore <math>\textstyle 1 \times 99 : 199 \times 1 = 99:199</math>, which is very close to <math>\textstyle 100:200 = 1:2</math>. In round numbers, only one in three of those testing positive is actually a drug user.
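The arithmetic of this example can be checked with a short Python sketch. The inputs (0.5% prevalence, a 99% chance of a positive test for a user, a 1% chance for a non-user) are the figures quoted above; the variable names are illustrative.

```python
from fractions import Fraction

# Prior odds of being a drug user: 1 to 199.
prior_odds = Fraction(1, 199)

# Bayes factor for a positive test: P(positive | user) / P(positive | non-user).
bayes_factor = Fraction(99, 100) / Fraction(1, 100)

# Posterior odds equals prior odds times Bayes factor.
posterior_odds = prior_odds * bayes_factor

# Convert the posterior odds back to a probability.
posterior_prob = posterior_odds / (1 + posterior_odds)

print(posterior_odds, float(posterior_prob))
```

The posterior odds come out as 99 to 199 and the posterior probability as 99/298, or roughly one in three.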
===Model selection===
{{main|Bayesian model selection}}
== External links ==
* Sharon Bertsch McGrayne (2012), "The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy", Yale University Press.
* Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin (2003), "Bayesian Data Analysis", Second Edition, CRC Press.
* Stephen E. Fienberg (2006), "When did Bayesian inference become 'Bayesian'?", ''Bayesian Analysis'' vol. 1, no. 1, pp. 1–40.
* Peter M. Lee (2012), "Bayesian Statistics: An Introduction", Wiley.
* [http://www.inference.phy.cam.ac.uk/mackay/itila/ The on-line textbook: Information Theory, Inference, and Learning Algorithms], by [[David J.C. MacKay]], discusses Bayesian model comparison in Chapters 3 and 28.
* <span class="citation" id=refRosenthal2005b>Rosenthal, Jeffrey S. (2005): ''Struck by Lightning: the Curious World of Probabilities''. HarperCollins 2005, ISBN 978-0-00-200791-7.</span>
* Stone, JV (2013). Chapter 1 of book [http://jim-stone.staff.shef.ac.uk/BookBayes2012/BayesRuleBookMain.html "Bayes’ Rule: A Tutorial Introduction"], University of Sheffield, Psychology.
* Pierre Bessière, Emmanuel Mazer, Juan-Manuel Ahuactzin and Kamel Mekhnacha (2013), "[http://www.crcpress.com/product/isbn/9781439880326 Bayesian Programming]", CRC Press.
[[Category:Bayesian inference|Rule]]
[[Category:Model selection]]
[[Category:Statistical ratios]]
[[ar:عامل بايز]]
[[ja:ベイズ因子]]
Revision as of 15:32, 25 January 2014