'''Cohen's kappa coefficient''' is a [[statistical]] measure of [[inter-rater agreement]] or ''inter-annotator agreement''<ref>Carletta, Jean. (1996) [http://acl.ldc.upenn.edu/J/J96/J96-2004.pdf Assessing agreement on classification tasks: The kappa statistic.] Computational Linguistics, 22(2), pp.&nbsp;249–254.</ref> for qualitative (categorical) items.  It is generally thought to be a more robust measure than simple percent agreement, since κ takes into account the possibility of the agreement occurring by chance. Some researchers<ref name=SMPJ>{{Cite journal | doi = 10.1016/j.compedu.2005.04.002 | last1 = Strijbos | first1 = J. | last2 = Martens | first2 = R. | last3 = Prins | first3 = F. | last4 = Jochems | first4 = W. | year = 2006 | title = Content analysis: What are they talking about? | url = | journal = Computers & Education | volume = 46 | issue = | pages = 29–48 }}</ref>{{Citation needed|date=April 2012}} have expressed concern over κ's tendency to take the observed categories' frequencies as givens, which can have the effect of underestimating agreement for a category that is also commonly used; for this reason, κ is considered an overly conservative measure of agreement.
 
Others<ref>{{cite journal | doi = 10.1037/0033-2909.101.1.140 | last1 = Uebersax | first1 = JS. | author-separator =, | author-name-separator= | year = 1987 | title = Diversity of decision-making models and the measurement of interrater agreement | url = http://www.na-mic.org/Wiki/images/d/df/Kapp_and_decision_making_models.pdf | format = PDF | journal = Psychological Bulletin | volume = 101 | issue = | pages = 140–146 }}</ref>{{Citation needed|date=April 2012}} contest the assertion that kappa "takes into account" chance agreement.  To do this effectively would require an explicit model of how chance affects rater decisions.  The so-called chance adjustment of kappa statistics supposes that, when not completely certain, raters simply guess—a very unrealistic scenario.
 
==Calculation==
 
Cohen's kappa measures the agreement between two raters who each classify ''N'' items into ''C'' mutually exclusive categories. The first mention of a kappa-like statistic is attributed to Galton (1892),<ref>Galton, F. (1892). ''Finger Prints'' Macmillan, London.</ref> see Smeeton (1985).<ref>{{Cite journal | last1 = Smeeton | first1 = N.C. | year = 1985 | title = Early History of the Kappa Statistic | url = | journal = Biometrics | volume = 41 | issue = | page = 795 }}</ref>
 
The equation for κ is:
 
:<math>\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}, \!</math>
 
where Pr(''a'') is the relative observed agreement among raters, and Pr(''e'') is the hypothetical probability of chance agreement, using the observed data to calculate the probabilities of each observer randomly saying each category.  If the raters are in complete agreement then κ = 1.  If there is no agreement among the raters other than what would be expected by chance (as defined by Pr(''e'')), κ = 0.
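
As an illustration, the calculation can be sketched in a few lines of Python. This is a minimal, illustrative sketch only; the function name <code>cohens_kappa</code> and the representation of the ratings as a square contingency table of counts are conventions adopted here, not part of any standard library.

<syntaxhighlight lang="python">
def cohens_kappa(table):
    """Cohen's kappa from a square contingency table of counts.

    Rows index the first rater's categories, columns the second rater's.
    """
    k = len(table)
    n = sum(sum(row) for row in table)                      # total number of items
    p_a = sum(table[i][i] for i in range(k)) / n            # observed agreement Pr(a)
    row_marg = [sum(row) / n for row in table]              # rater 1's category proportions
    col_marg = [sum(col) / n for col in zip(*table)]        # rater 2's category proportions
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))    # chance agreement Pr(e)
    return (p_a - p_e) / (1 - p_e)
</syntaxhighlight>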
 
The seminal paper introducing kappa as a new technique was published by [[Jacob Cohen (statistician)|Jacob Cohen]] in the journal ''Educational and Psychological Measurement'' in 1960.<ref>
Cohen, Jacob (1960). "A coefficient of agreement for nominal scales". Educational and Psychological Measurement 20 (1): 37–46. doi:10.1177/001316446002000104</ref>
 
A similar statistic, called [[Scott's Pi|pi]], was proposed by Scott (1955).  Cohen's kappa and [[Scott's Pi|Scott's pi]] differ in terms of how Pr(''e'') is calculated.
 
Note that Cohen's kappa measures agreement between '''two''' raters only.  For a similar measure of agreement ([[Fleiss' kappa]]) used when there are more than two raters, see [[Joseph L. Fleiss|Fleiss]] (1971).  The Fleiss kappa, however, is a multi-rater generalization of [[Scott's Pi|Scott's pi]] statistic, not Cohen's kappa.
 
==Example==
Suppose that you were analyzing data related to a group of 50 people applying for a grant.  Each grant proposal was read by two readers and each reader either said "Yes" or "No" to the proposal.  Suppose the data were as follows, where rows are reader A and columns are reader B:
{| class="wikitable" border="1"
|-
! colspan="2" rowspan="2" |
! colspan="2" | B
|-
! Yes
! No
|-
! rowspan="2" | A
! Yes
| 20
| 5
|-
! No
| 10
| 15
|}
 
Note that there were 20 proposals that were approved by both reader A and reader B, and 15 proposals that were rejected by both readers. Thus, the observed proportionate agreement is {{nowrap|Pr(''a'') {{=}} (20 + 15) / 50 {{=}} 0.70.}}
 
To calculate Pr(''e'') (the probability of random agreement) we note that:
* Reader A said "Yes" to 25 applicants and "No" to 25 applicants.  Thus reader A said "Yes" 50% of the time.
* Reader B said "Yes" to 30 applicants and "No" to 20 applicants.  Thus reader B said "Yes" 60% of the time.
 
Therefore the probability that both of them would say "Yes" randomly is {{nowrap|0.50 · 0.60 {{=}} 0.30}} and the probability that both of them would say "No" is {{nowrap|0.50 · 0.40 {{=}} 0.20.}}  Thus the overall probability of random agreement is {{nowrap|Pr(''e'') {{=}} 0.3 + 0.2 {{=}} 0.5.}}
 
Applying our formula for Cohen's kappa, we get:
:<math>\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} = \frac{0.70-0.50}{1-0.50} =0.40 \!</math>
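
The same arithmetic, written as a short illustrative Python sketch (the variable names are arbitrary):

<syntaxhighlight lang="python">
n = 50
p_a = (20 + 15) / n               # observed agreement Pr(a) = 0.70

p_both_yes = (25 / n) * (30 / n)  # both say "Yes" by chance: 0.50 * 0.60 = 0.30
p_both_no  = (25 / n) * (20 / n)  # both say "No" by chance:  0.50 * 0.40 = 0.20
p_e = p_both_yes + p_both_no      # chance agreement Pr(e) = 0.50

kappa = (p_a - p_e) / (1 - p_e)
print(round(kappa, 2))            # 0.4
</syntaxhighlight>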
 
==Same percentages but different numbers==
 
A case sometimes considered to be a problem with Cohen's kappa occurs when comparing the kappa calculated for two pairs of raters, where the two raters in each pair have the same percentage agreement, but the raters in one pair distribute their ratings similarly while the raters in the other pair distribute them very differently.<ref name="Gwet2002">{{Cite journal
| author = Kilem Gwet
| title = Inter-Rater Reliability: Dependency on Trait Prevalence and Marginal Homogeneity
| journal = Statistical Methods for Inter-Rater Reliability Assessment
| volume = 2
| pages = 1–10
|date=May 2002
| url=http://agreestat.com/research_papers/inter_rater_reliability_dependency.pdf}}</ref>  For instance, in the following two cases there is equal agreement between A and B (60 out of 100 in both cases), so we would expect the relative values of Cohen's kappa to reflect this.  However, calculating Cohen's kappa for each case:
 
{| class="wikitable" border="1"
|-
! colspan="2" rowspan="2" |
! colspan="2" | B
|-
! Yes
! No
|-
! rowspan="2" | A
! Yes
| 45
| 15
|-
! No
| 25
| 15
|}
 
: <math>\kappa = \frac{0.60-0.54}{1-0.54} = 0.1304</math>
 
{| class="wikitable" border="1"
|-
! colspan="2" rowspan="2" |
! colspan="2" | B
|-
! Yes
! No
|-
! rowspan="2" | A
! Yes
| 25
| 35
|-
! No
| 5
| 35
|}
 
: <math>\kappa = \frac{0.60-0.46}{1-0.46} = 0.2593</math>
 
we find that kappa indicates greater similarity between A and B in the second case than in the first.
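
For reference, the two calculations can be reproduced with a short illustrative sketch; the helper <code>kappa</code> is defined here only for this example, and the marginal proportions are read off the two tables above.

<syntaxhighlight lang="python">
def kappa(p_a, p_e):
    # Cohen's kappa from observed agreement Pr(a) and chance agreement Pr(e).
    return (p_a - p_e) / (1 - p_e)

# First table: marginals are 0.60/0.40 for A and 0.70/0.30 for B.
p_e1 = 0.60 * 0.70 + 0.40 * 0.30        # 0.54
print(round(kappa(0.60, p_e1), 4))      # 0.1304

# Second table: marginals are 0.60/0.40 for A and 0.30/0.70 for B.
p_e2 = 0.60 * 0.30 + 0.40 * 0.70        # 0.46
print(round(kappa(0.60, p_e2), 4))      # 0.2593
</syntaxhighlight>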
 
==Significance and magnitude==
 
''[[Statistical significance]]'' makes no claim about how important the magnitude is in a given application, or about what counts as high or low agreement.
 
Statistical significance for kappa is rarely reported, probably because even relatively low values of kappa can nonetheless be significantly different from zero but not of sufficient magnitude to satisfy investigators.<ref name=BakemanGottman1997/>{{rp|66}}
Still, its standard error has been described<ref name=FleissCohenEv1969>{{cite journal
|doi=10.1037/h0028106
|last=Fleiss|first=J.L. |year=1969
|coauthors=Cohen, J., & Everitt, B.S.
|title=Large sample standard errors of kappa and weighted kappa
|journal=Psychological Bulletin |volume=72 |pages=323–327}}</ref>
and is computed by various computer programs.<ref name=BakemanRobinson1998>{{cite journal
|doi=10.3758/BF03209495
|last=Robinson|first=B.F |coauthors=& Bakeman, R. |year=1998
|title=ComKappa: A Windows 95 program for calculating kappa and related statistics
|journal=Behavior Research Methods, Instruments, and Computers
|volume=30 |pages=731–732}}</ref>
 
If statistical significance is not a useful guide, what magnitude of kappa reflects adequate agreement?  Guidelines would be helpful, but factors other than agreement can influence its magnitude, which makes interpretation of a given magnitude problematic.  As Sim and Wright noted, two important factors are prevalence (are the codes equiprobable, or do their probabilities vary?) and bias (are the marginal probabilities for the two observers similar or different?).  Other things being equal, kappas are higher when codes are equiprobable.  On the other hand, kappas are higher when codes are distributed asymmetrically by the two observers.  In contrast to probability variations, the effect of bias is greater when kappa is small than when it is large.<ref name=SimWright2005>{{cite journal
|last=Sim|first=J|coauthors= & Wright, C. C|year=2005
|title=The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements
|journal= Physical Therapy|volume=85|pages=257–268|pmid= 15733050 }}</ref>{{rp|261–262}}
 
Another factor is the number of codes.  As the number of codes increases, kappas become higher.  Based on a simulation study, Bakeman and colleagues concluded that for fallible observers, values for kappa were lower when codes were fewer.  And, in agreement with Sim & Wright's statement concerning prevalence, kappas were higher when codes were roughly equiprobable.  Thus Bakeman et al. concluded that "no one value of kappa can be regarded as universally acceptable."<ref name=BakemanEtAl1997>{{cite journal
|doi=10.1037/1082-989X.2.4.357
|last=Bakeman|first=R. |coauthors=Quera, V., McArthur, D., & Robinson, B. F.
|year=1997 |title=Detecting sequential patterns and determining their reliability with fallible observers
|journal=Psychological Methods|volume=2|pages=357–370}}</ref>{{rp|357}} They also provide a computer program that lets users compute values for kappa by specifying the number of codes, their probabilities, and observer accuracy.  For example, given equiprobable codes and observers who are 85% accurate, the values of kappa are 0.49, 0.60, 0.66, and 0.69 when the number of codes is 2, 3, 5, and 10, respectively.
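
These particular values can be reproduced analytically under a simple assumed error model (an assumption made here for illustration, not necessarily the model used in the simulation study): each observer codes the true, equiprobable category with probability equal to their accuracy and otherwise picks one of the remaining codes uniformly at random.

<syntaxhighlight lang="python">
def expected_kappa(num_codes, accuracy):
    # Expected kappa for two equally accurate observers, assuming equiprobable
    # codes and errors spread uniformly over the remaining codes.
    p_correct = accuracy ** 2                               # both pick the true code
    p_same_error = (1 - accuracy) ** 2 / (num_codes - 1)    # both pick the same wrong code
    p_o = p_correct + p_same_error                          # observed agreement
    p_e = 1.0 / num_codes                                   # chance agreement (uniform marginals)
    return (p_o - p_e) / (1 - p_e)

for k in (2, 3, 5, 10):
    print(k, round(expected_kappa(k, 0.85), 2))             # 0.49, 0.6, 0.66, 0.69
</syntaxhighlight>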
 
Nonetheless, magnitude guidelines have appeared in the literature. Perhaps the first was Landis and Koch,<ref name=LandisKoch1977>{{cite journal
|doi=10.2307/2529310
|last=Landis|first=J.R.|coauthors=& Koch, G.G.|year=1977
|title=The measurement of observer agreement for categorical data
|jstor=2529310
|journal=Biometrics|volume=33
|issue=1|pages=159–174|pmid=843571}}</ref>
who characterized values&nbsp;<&nbsp;0 as indicating no agreement and 0–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1 as almost perfect agreement.  This set of guidelines is, however, by no means universally accepted; Landis and Koch supplied no evidence to support it, basing it instead on personal opinion.  It has been noted that these guidelines may be more harmful than helpful.<ref>Gwet, K. (2010). "[http://www.agreestat.com/ Handbook of Inter-Rater Reliability (Second Edition)]" ISBN 978-0-9708062-2-2 {{Page needed|date=April 2012}}</ref> Fleiss's<ref name=Fleiss1981>{{cite book
|last=Fleiss |first=J.L. |year=1981
|title=Statistical methods for rates and proportions
|edition=2nd
|location=New York |publisher=John Wiley
|isbn=0-471-26370-2}}</ref>{{rp|218}}
equally arbitrary guidelines characterize kappas over 0.75 as excellent, 0.40 to 0.75 as fair to good, and below 0.40 as poor.
 
==Weighted kappa==
 
Weighted kappa allows disagreements to be weighted differently<ref name=Cohen1968>{{cite journal
|doi=10.1037/h0026256
|last=Cohen |first=J. |year=1968
|title=Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit
|journal=Psychological Bulletin |volume=70
|issue=4 |pages=213–220 |pmid=19673146}}</ref>
and is especially useful when codes are ordered<ref name=BakemanGottman1997>{{cite book
|last=Bakeman |first=R. | coauthors=& Gottman, J.M. |year=1997
|title=Observing interaction: An introduction to sequential analysis
|edition=2nd
|location=Cambridge, UK |publisher=Cambridge University Press
|isbn=0-521-27593-8}}</ref>{{rp|66}}.
Three matrices are involved: the matrix of observed scores, the matrix of expected scores based on chance agreement, and the weight matrix.  Weight-matrix cells located on the diagonal (upper-left to bottom-right) represent agreement and thus contain zeros.  Off-diagonal cells contain weights indicating the seriousness of that disagreement.  Often, cells one step off the diagonal are weighted 1, those two steps off 2, and so on.
 
The equation for weighted κ is:
: <math>\kappa = 1- \frac{\sum_{i=1}^{k} \sum_{j=1}^{k}w_{ij}x_{ij}} {\sum_{i=1}^{k} \sum_{j=1}^{k}w_{ij}m_{ij}} </math>
 
where ''k''=number of codes and <math>w_{ij}</math>, <math>x_{ij}</math>, and <math>m_{ij}</math> are elements in the weight, observed, and expected matrices, respectively. When diagonal cells contain weights of 0 and all off-diagonal cells weights of 1, this formula produces the same value of kappa as the calculation given above.
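
A minimal illustrative sketch of the weighted computation, assuming the observed matrix holds raw counts and the expected matrix is built from the row and column totals, as in the unweighted case:

<syntaxhighlight lang="python">
def weighted_kappa(observed, weights):
    # observed: square matrix of counts; weights: disagreement weights
    # (zeros on the diagonal, larger values for more serious disagreements).
    k = len(observed)
    n = sum(sum(row) for row in observed)
    row_tot = [sum(observed[i][j] for j in range(k)) for i in range(k)]
    col_tot = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    expected = [[row_tot[i] * col_tot[j] / n for j in range(k)] for i in range(k)]
    num = sum(weights[i][j] * observed[i][j] for i in range(k) for j in range(k))
    den = sum(weights[i][j] * expected[i][j] for i in range(k) for j in range(k))
    return 1 - num / den

# With 0/1 weights this reduces to unweighted kappa, e.g. for the grant example:
print(weighted_kappa([[20, 5], [10, 15]], [[0, 1], [1, 0]]))   # 0.4
</syntaxhighlight>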
 
==Kappa maximum==
 
Kappa assumes its theoretical maximum value of 1 only when both observers distribute codes identically, that is, when corresponding row and column sums are equal.  Anything less means that perfect agreement is unattainable.  Still, the maximum value kappa could achieve given unequal distributions helps interpret the value of kappa actually obtained.  The equation for κ maximum is:<ref name=Umesh989>{{cite journal
|doi=10.1177/001316448904900407
|last=Umesh |first=U. N. |coauthors=Peterson, R.A., & Sauber. M. H.
|title=Interjudge agreement and the maximum value of kappa.
|year=1989
|journal=Educational and Psychological Measurement
|volume=49 |pages=835–850}}</ref>
 
: <math>\kappa_{\max} =\frac{P_{\max} - P_{\exp}}{1-P_{\exp}}</math>
 
where <math>P_{\exp} = \sum_{i=1}^k P_{i+}P_{+i}</math> is, as usual, the expected proportion of chance agreement,
<math>P_{\max} = \sum_{i=1}^k \min(P_{i+},P_{+i})</math>,
 
''k''&nbsp;=&nbsp;number of codes, <math>P_{i+}</math> are the row probabilities, and <math>P_{+i}</math> are the column probabilities.
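
A short illustrative sketch of this computation, taking the row and column probabilities from an observed table of counts (the grant example above is reused as input):

<syntaxhighlight lang="python">
def kappa_max(observed):
    # Maximum attainable kappa given each observer's marginal distribution.
    k = len(observed)
    n = sum(sum(row) for row in observed)
    p_row = [sum(observed[i][j] for j in range(k)) / n for i in range(k)]   # P_{i+}
    p_col = [sum(observed[i][j] for i in range(k)) / n for j in range(k)]   # P_{+i}
    p_exp = sum(p_row[i] * p_col[i] for i in range(k))
    p_max = sum(min(p_row[i], p_col[i]) for i in range(k))
    return (p_max - p_exp) / (1 - p_exp)

# Marginals 0.5/0.5 (A) and 0.6/0.4 (B): P_max = 0.9, P_exp = 0.5, kappa_max = 0.8
print(kappa_max([[20, 5], [10, 15]]))
</syntaxhighlight>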
 
==See also==
* [[Fleiss' kappa]]
* [[Intraclass correlation]]
 
==References==
{{Reflist|30em}}
 
==Further reading==
* {{Cite journal | last1 = Banerjee | first1 = M. | author2 = Capozzoli, Michelle | year = 1999 | author3 = McSweeney, Laura | author4 = Sinha, Debajyoti | title = Beyond Kappa: A Review of Interrater Agreement Measures | jstor = 3315487 | journal = The Canadian Journal of Statistics / La Revue Canadienne de Statistique | volume = 27 | issue = 1| pages = 3–23 }}
* {{Cite journal | doi = 10.1177/001316448104100307 | last1 = Brennan | first1 = R. L. | last2 = Prediger | first2 = D. J. | year = 1981 | title = Coefficient λ: Some Uses, Misuses, and Alternatives | url = | journal = Educational and Psychological Measurement | volume = 41 | issue = | pages = 687–699 }}
* {{Cite journal | doi = 10.1177/001316446002000104 | last1 = Cohen | first1 = Jacob | year = 1960 | title = A coefficient of agreement for nominal scales | url = | journal = Educational and Psychological Measurement | volume = 20 | issue = 1| pages = 37–46 }}
* {{Cite journal | doi = 10.1037/h0026256 | last1 = Cohen | first1 = J. | year = 1968 | title = Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit | url = | journal = Psychological Bulletin | volume = 70 | issue = 4| pages = 213–220 | pmid = 19673146 }}
* {{Cite journal | doi = 10.1037/h0031619 | last1 = Fleiss | first1 = J.L. | year = 1971 | title = Measuring nominal scale agreement among many raters | url = | journal = Psychological Bulletin | volume = 76 | issue = 5| pages = 378–382 }}
* Fleiss, J. L. (1981) ''Statistical methods for rates and proportions''. 2nd ed. (New York: John Wiley) pp.&nbsp;38–46
* {{Cite journal | doi = 10.1177/001316447303300309 | last1 = Fleiss | first1 = J.L. | last2 = Cohen | first2 = J. | year = 1973 | title = The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability | url = | journal = Educational and Psychological Measurement | volume = 33 | issue = | pages = 613–619 }}
* {{Cite journal | doi = 10.1348/000711006X126600 | last1 = Gwet | first1 = K. | year = 2008 | title = Computing inter-rater reliability and its variance in the presence of high agreement | url = http://www.agreestat.com/research_papers/bjmsp2008_interrater.pdf | journal = British Journal of Mathematical and Statistical Psychology | volume = 61 | issue = Pt 1| pages = 29–48 | pmid = 18482474 }}
* {{Cite journal | doi = 10.1007/s11336-007-9054-8 | last1 = Gwet | first1 = K. | year = 2008 | title = Variance Estimation of Nominal-Scale Inter-Rater Reliability with Random Selection of Raters | url = http://www.agreestat.com/research_papers/psychometrika2008_irr_random_raters.pdf | journal = Psychometrika | volume = 73 | issue = 3| pages = 407–430 }}
* Gwet, K. (2008). "[http://www.agreestat.com/research_papers/wiley_encyclopedia2008_eoct631.pdf Intrarater Reliability]." ''Wiley Encyclopedia of Clinical Trials, Copyright 2008 John Wiley & Sons, Inc.''
* {{Cite journal | last1 = Scott | first1 = W. | year = 1955 | title = Reliability of content analysis: The case of nominal scale coding | url = | journal = Public Opinion Quarterly | volume = 17 | issue = | pages = 321–325 }}
* {{Cite journal | last1 = Sim | first1 = J. | last2 = Wright | first2 = C. C. | year = 2005 | title = The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements | url = | journal = Physical Therapy | volume = 85 | issue = 3| pages = 257–268 | pmid = 15733050 }}
 
==External links==
*[http://dl.dropbox.com/u/27743223/201209-eacl2012-Kappa.pdf The Problem with Kappa]
*[http://www.agreestat.com/research_papers.html Kappa, its meaning, problems, and several alternatives]
*[http://www.john-uebersax.com/stat/kappa.htm#procon Kappa Statistics:  Pros and Cons]
*[http://www.gsu.edu/~psyrab/ComKappa2.zip Windows program for kappa, weighted kappa, and kappa maximum]
*[http://akcora.wordpress.com/2011/05/30/weighted-kappa-example-in-php/ Java and PHP implementation of weighted Kappa]
 
===Online calculators===
* [http://www.glue.umd.edu/~dchoy/thesis/Kappa/ Cohen's Kappa for Maps]
* [http://justus.randolph.name/kappa Online (Multirater) Kappa Calculator]
* [https://mlnl.net/jg/software/ira/ Online Kappa Calculator (multiple raters and variables)]
* [http://faculty.vassar.edu/lowry/kappa.html Vassar College's Kappa Calculator]
* [http://www.niwa.co.nz/online-services/statistical-calculators/kappa NIWA's Cohen's Kappa Calculator]
 
{{Statistics|analysis}}
 
{{DEFAULTSORT:Cohen's Kappa}}
[[Category:Categorical data]]
[[Category:Non-parametric statistics]]
[[Category:Inter-rater reliability]]
