Quotient category: Difference between revisions

From formulasearchengine
In [[statistics|statistical]] analysis of [[binary classification]], the '''F<sub>1</sub> score''' (also '''F-score''' or '''F-measure''') is a measure of a test's accuracy. It considers both the [[Precision (information retrieval)|precision]] ''p'' and the [[Recall (information retrieval)|recall]] ''r'' of the test: ''p'' is the number of correct results divided by the number of all returned results, and ''r'' is the number of correct results divided by the number of results that should have been returned. The F<sub>1</sub> score can be interpreted as a weighted average of the [[precision and recall]]; it reaches its best value at 1 and its worst at 0.
 
The traditional F-measure or balanced F-score ('''F<sub>1</sub> score''') is the [[Harmonic mean#Harmonic mean of two numbers|harmonic mean]] of precision and recall:
 
:<math>F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}</math>.
 
The general formula for positive real β is:
:<math>F_\beta = (1 + \beta^2) \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{(\beta^2 \cdot \mathrm{precision}) + \mathrm{recall}}</math>.
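The general formula can be sketched in a few lines of Python. This is only an illustration: the function name <code>f_beta</code>, the choice to return 0 when the denominator is zero, and the sample values of precision and recall are assumptions, not part of the cited sources.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return F_beta computed from precision and recall (beta > 0)."""
    if beta <= 0:
        raise ValueError("beta must be a positive real number")
    denominator = beta**2 * precision + recall
    if denominator == 0:
        # Assumed convention: report 0 when precision and recall are both 0.
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


# Hypothetical test result with precision 0.5 and recall 0.8.
p, r = 0.5, 0.8
print(f_beta(p, r))            # beta = 1: balanced F1 score
print(f_beta(p, r, beta=2))    # beta = 2: recall weighted more heavily
print(f_beta(p, r, beta=0.5))  # beta = 0.5: precision weighted more heavily
```

Because recall is the larger of the two sample values here, the score rises as β grows and falls as β shrinks.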
 
The formula in terms of [[Type I and type II errors]]:
 
:<math>F_\beta = \frac {(1 + \beta^2) \cdot \mathrm{true\ positive} }{(1 + \beta^2) \cdot \mathrm{true\ positive} + \beta^2 \cdot \mathrm{false\ negative} + \mathrm{false\ positive}}\,</math>.
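The count-based form can be checked numerically against the precision and recall form. The following is a minimal sketch, assuming hypothetical counts and a hypothetical helper name <code>f_beta_from_counts</code>; a false positive is a type I error and a false negative is a type II error.

```python
def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return F_beta from true positive, false positive and false negative counts."""
    b2 = beta**2
    denominator = (1 + b2) * tp + b2 * fn + fp
    if denominator == 0:
        # Assumed convention: report 0 when there are no positives of any kind.
        return 0.0
    return (1 + b2) * tp / denominator


# Hypothetical counts: 40 true positives, 40 false positives, 10 false negatives.
tp, fp, fn = 40, 40, 10
precision = tp / (tp + fp)
recall = tp / (tp + fn)

beta = 2
from_counts = f_beta_from_counts(tp, fp, fn, beta)
from_rates = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
print(from_counts, from_rates)  # the two forms should agree up to rounding
```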
 
Two other commonly used F measures are the <math>F_{2}</math> measure, which weights recall higher than precision, and the <math>F_{0.5}</math> measure, which puts more emphasis on precision than recall.
 
The F-measure was derived so that <math>F_\beta</math> "measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as precision".<ref>{{cite book | last = Van Rijsbergen | first = C. J. | url=http://www.dcs.gla.ac.uk/Keith/Preface.html|year = 1979 | title = Information Retrieval | edition= 2nd | publisher=Butterworth }}</ref> It is based on [[C. J. van Rijsbergen|Van Rijsbergen]]'s effectiveness measure
 
:<math>E = 1 - \left(\frac{\alpha}{P} + \frac{1-\alpha}{R}\right)^{-1}</math>.
 
Their relationship is <math>F_\beta = 1 - E</math> where <math>\alpha=\frac{1}{1 + \beta^2}</math>.
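
This relationship can be verified by substituting <math>\alpha=\frac{1}{1 + \beta^2}</math>, and hence <math>1-\alpha=\frac{\beta^2}{1 + \beta^2}</math>, into the effectiveness measure, writing ''P'' for precision and ''R'' for recall:

:<math>1 - E = \left(\frac{\alpha}{P} + \frac{1-\alpha}{R}\right)^{-1} = \left(\frac{1}{(1 + \beta^2) P} + \frac{\beta^2}{(1 + \beta^2) R}\right)^{-1} = \left(\frac{R + \beta^2 P}{(1 + \beta^2) P R}\right)^{-1} = (1 + \beta^2) \cdot \frac{P \cdot R}{(\beta^2 \cdot P) + R} = F_\beta</math>.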
 
== Diagnostic Testing ==
 
The F-score is related to the field of [[binary classification]], where recall is often called sensitivity. There are several reasons why the F<sub>1</sub> score can be criticized in particular circumstances.<ref>{{cite journal|last=Powers|first=D.M.W.|title=Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation|journal=Journal of Machine Learning Technologies|date=February 27, 2011|volume=2|issue=1|pages=37–63|url=http://www.bioinfo.in/contents.php?id=51}}</ref>
 
{{DiagnosticTesting_Diagram}}
 
== Applications ==
 
The F-score is often used in the field of [[information retrieval]] for measuring [[web search|search]], [[document classification]], and [[query classification]] performance.<ref>{{cite thesis | first=Steven M. |last=Beitzel |id = {{citeseerx|10.1.1.127.634}} | title=On Understanding and Classifying Web Queries | degree=Ph.D. | publisher=IIT | year= 2006}}</ref> Earlier works focused primarily on the F<sub>1</sub> score, but with the proliferation of large-scale search engines, performance goals changed to place more emphasis on either precision or recall,<ref>{{cite conference | author = X. Li, Y.-Y. Wang, and A. Acero | url=http://research.microsoft.com/apps/pubs/default.aspx?id=75219| title=Learning query intent from regularized click graphs | booktitle= Proceedings of the 31st SIGIR Conference |date=July 2008}}</ref> so <math>F_\beta</math> is now widely applied.
 
The F-score is also used in machine learning.<ref>See, e.g., the evaluation of the [http://www.cnts.ua.ac.be/conll2002/ner/ CoNLL 2002 shared task].</ref> Note, however, that the F-measures do not take true negatives into account, and that measures such as the [[Phi coefficient]], [[Matthews correlation coefficient]], [[Informedness]] or [[Cohen's kappa]] may be preferable for assessing the performance of a binary classifier.<ref name="Powers2007">{{cite journal |first=David M W |last=Powers |date=2007/2011 |title=Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation |journal=Journal of Machine Learning Technologies |volume=2 |issue=1 |pages=37–63 |url=http://www.bioinfo.in/uploadfiles/13031311552_1_1_JMLT.pdf}}</ref>
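
The effect of ignoring true negatives can be illustrated with a small numerical sketch. The counts below are hypothetical and the helper names <code>f1</code> and <code>mcc</code> are chosen only for this example; the second function uses the standard definition of the Matthews correlation coefficient.

```python
import math


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from counts; true negatives do not appear in the formula."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient, which uses all four counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical classifier output; only the number of true negatives varies.
tp, fp, fn = 40, 40, 10
for tn in (10, 1000):
    print(tn, f1(tp, fp, fn), mcc(tp, fp, fn, tn))
```

The F<sub>1</sub> score is identical in both cases, while the Matthews correlation coefficient changes with the number of true negatives.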
 
The F-score has been widely used in the natural language processing literature, for example in the evaluation of named entity recognition<ref>{{cite conference | author = Aaron L.-F. Han, Derek F. Wong, and Lidia S. Chao | url=http://link.springer.com/chapter/10.1007/978-3-642-38634-3_8| title=Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics | booktitle= Proceedings of the 20th IIS Conference. LNCS Vol. 7912, pp. 57–68. Springer-Verlag Berlin Heidelberg |date=June 2013}}</ref> and Chinese word segmentation.<ref>{{cite conference | author = Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Ling Zhu, and Shuo Li | url=http://www.academia.edu/4375396/A_Study_of_Chinese_Word_Segmentation_Based_on_the_Characteristics_of_Chinese| title=A Study of Chinese Word Segmentation Based on the Characteristics of Chinese | booktitle= Proceedings of the 25th GSCL Conference. LNCS Vol. 8105, pp. 111–118. Springer-Verlag Berlin Heidelberg |date=September 2013}}</ref> In such evaluations the F-score is usually reported separately as an IV F-score and an OOV F-score, where IV means in-vocabulary and OOV means out-of-vocabulary. The two are distinguished by whether the test words occur in the training data.
 
 
==G-measure==
While the F-measure is the harmonic mean of recall and precision, the G-measure is their [[Geometric Mean|geometric mean]]. Information content corresponds to the arithmetic mean of the information represented by recall and precision.{{Citation needed|date=January 2014}}
 
:<math>G =  \sqrt{\mathrm{precision} \cdot \mathrm{recall}}</math>.
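
The difference between the means can be seen in a short sketch. The sample precision and recall values and the helper names are assumptions made for illustration only.

```python
import math


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def g_measure(precision: float, recall: float) -> float:
    """Geometric mean of precision and recall (the G-measure)."""
    return math.sqrt(precision * recall)


# Hypothetical test result with precision 0.5 and recall 0.8.
p, r = 0.5, 0.8
harmonic = f1_score(p, r)
geometric = g_measure(p, r)
arithmetic = (p + r) / 2
print(harmonic, geometric, arithmetic)
```

For positive precision and recall the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean, so the F<sub>1</sub> score penalizes an imbalance between the two quantities more strongly than the G-measure does.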
 
==See also==
* [[Precision and recall]]
* [[BLEU]]
* [[NIST (metric)]]
* [[METEOR]]
* [[ROUGE (metric)]]
* [[Word error rate|Word Error Rate (WER)]]
* [[Receiver operating characteristic]]
* [[Matthews correlation coefficient]]
 
== References ==
{{reflist}}
 
{{DEFAULTSORT:F1 Score}}
[[Category:Statistical natural language processing]]
[[Category:Evaluation of machine translation]]
[[Category:Statistical ratios]]
[[Category:Summary statistics for contingency tables]]
[[Category:Clustering criteria]]
 
[[de:Beurteilung eines Klassifikators#Kombinierte Maße]]

Revision as of 19:38, 8 February 2014
