Block-matching algorithm: Difference between revisions

en>Forderud
Reverting, since Mean Absolute Error (MAE) should already be covered by Mean absolute difference (MAD)
en>CommonsDelinker
m Removing "NTSS.png", it has been deleted from Commons by Fastily because: Copyright violation: If you are the copyright holder/author and/or have authorization to publish the file, please email our...
 
A '''diversity index''' is a quantitative measure that reflects how many different types (such as [[species]]) there are in a dataset, and simultaneously takes into account how evenly the basic entities (such as individuals) are distributed among those types. The value of a diversity index increases both when the number of types increases and when [[Species_evenness|evenness]] increases. For a given number of types, the value of a diversity index is maximized when all types are equally abundant.
 
When diversity indices are used in [[ecology]], the types of interest are usually species, but they can also be other categories, such as [[Genus|genera]], [[Family (biology)|families]], [[Plant functional type|functional type]]s or [[haplotype]]s. The entities of interest are usually individual plants or animals, and the measure of abundance can be, for example, number of individuals, biomass or coverage. In [[demography]], the entities of interest can be people, and the types of interest various demographic groups. In [[information science]], the entities can be characters and the types the different letters of the alphabet. The most commonly used diversity indices are simple transformations of the effective number of types (also known as 'true diversity'), but each diversity index can also be interpreted in its own right as a measure corresponding to some real phenomenon (but a different one for each diversity index).<ref name=Hill1973>Hill, M. O. (1973) Diversity and evenness: a unifying notation and its consequences. ''Ecology'', 54, 427–432. [http://dx.doi.org/10.2307/1934352]</ref><ref name=Jost2006>Jost, L. (2006) Entropy and diversity. ''Oikos'', 113, 363–375. {{DOI|10.1111/j.2006.0030-1299.14714.x}}</ref><ref name=Tuomisto2010a>Tuomisto, H. (2010) A diversity of beta diversities: straightening up a concept gone awry. Part 1. Defining beta diversity as a function of alpha and gamma diversity. ''Ecography'', 33, 2–22. {{doi|10.1111/j.1600-0587.2009.05880.x}}</ref><ref name=Tuomisto2010c>Tuomisto, H. 2010. "A consistent terminology for quantifying species diversity? Yes, it does exist". ''Oecologia'' 4: 853–860. {{doi|10.1007/s00442-010-1812-0}}</ref>
 
==True diversity (The effective number of types)==
True diversity, or the effective number of types, refers to the number of equally abundant types needed for the average proportional abundance of the types to equal that observed in the dataset of interest (where not all types need be equally abundant). The true diversity in a dataset is calculated by first taking the weighted [[generalized mean]] of the proportional abundances of the types in the dataset, and then taking the [[Multiplicative inverse|inverse]] of this. The equation is:<ref name=Tuomisto2010a /><ref name=Tuomisto2010c />
 
:<math>{}^q\!D={1 \over \sqrt[q-1]{{\sum_{i=1}^R p_i p_i^{q-1}}}}</math>
 
The [[Fraction (mathematics)|denominator]] equals the average proportional abundance of the types in the dataset as calculated with the weighted [[generalized mean]] with exponent ''q'' − 1. In the equation, ''R'' is richness (the total number of types in the dataset), and the proportional abundance of the ''i''th type is <math>p_i</math>. The proportional abundances themselves are used as the nominal weights. When ''q'' = 1, the above equation is undefined, so its limit as ''q'' approaches 1 is used instead:
:<math>{}^1\!D={1 \over {\prod_{i=1}^R p_i^{p_i}}}</math>
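
The two equations above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name <code>true_diversity</code> and the example abundances are assumptions made here, and types with zero abundance are simply dropped.

```python
import math

def true_diversity(p, q):
    """Effective number of types of order q for proportional abundances p.

    Hypothetical helper for illustration; p is assumed to sum to 1.
    """
    p = [x for x in p if x > 0]  # absent types contribute nothing
    if abs(q - 1.0) < 1e-12:
        # q = 1: inverse of the weighted geometric mean, prod(p_i ** p_i)
        return 1.0 / math.prod(x ** x for x in p)
    # Weighted generalized mean of the p_i with exponent q - 1,
    # using the p_i themselves as weights
    mean = sum(x * x ** (q - 1) for x in p) ** (1.0 / (q - 1))
    return 1.0 / mean

abundances = [0.5, 0.25, 0.25]  # made-up example data
for order in (0, 1, 2):
    print(order, true_diversity(abundances, order))
```

For these abundances the value of order 0 equals the number of types, and the values of order 1 and 2 are progressively smaller.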
 
The value of ''q'' is often referred to as the order of the diversity. It defines the sensitivity of the diversity value to rare vs. abundant species by modifying how the mean of the species proportional abundances is calculated. With some values of the parameter ''q'', the generalized mean with exponent ''q'' − 1 gives familiar kinds of mean as special cases. In particular, ''q'' = 0 corresponds to the [[harmonic mean]], ''q'' = 1 to the [[geometric mean]] and ''q'' = 2 to the [[arithmetic mean]]. As ''q'' approaches [[infinity]], the generalized mean with exponent ''q'' − 1 approaches the maximum <math>p_i</math> value, which is the proportional abundance of the most abundant species in the dataset. In practice, increasing the value of ''q'' hence increases the effective weight given to the most abundant species. This leads to obtaining a larger mean <math>p_i</math> value and a smaller true diversity (''<sup>q</sup>D'') value.
 
When ''q'' = 1, the geometric mean of the <math>p_i</math> values is used, and each species is exactly weighted by its proportional abundance (in the geometric mean, weights are the exponents). When ''q'' > 1, the weight given to abundant species is exaggerated, and when ''q'' < 1, the weight given to rare species is exaggerated instead. At ''q'' = 0, the species weights exactly cancel out the species proportional abundances, such that mean <math>p_i</math> equals 1&nbsp;/&nbsp;''R'' even when all species are not equally abundant. At ''q'' = 0, the effective number of species, <math>{}^0\!D</math>, hence equals the actual number of species (''R''). In the context of diversity, ''q'' is generally limited to non-negative values. This is because negative values of ''q'' would give rare species so much more weight than abundant ones that <math>{}^q\!D</math> would exceed ''R''.<ref name=Tuomisto2010a /><ref name=Tuomisto2010c />
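
The effect of the order ''q'' can be checked numerically. The sketch below assumes a made-up three-species dataset and a hypothetical helper <code>diversity</code> that is valid only for ''q'' ≠ 1. It illustrates that the value equals ''R'' at ''q'' = 0, falls as ''q'' grows, and exceeds ''R'' for a negative ''q''.

```python
def diversity(p, q):
    """Inverse of the weighted generalized mean with exponent q - 1 (q != 1)."""
    return sum(x * x ** (q - 1) for x in p) ** (-1.0 / (q - 1))

p = [0.7, 0.2, 0.1]   # made-up proportional abundances
R = len(p)            # richness

orders = (-1, 0, 0.5, 2, 10)
values = [diversity(p, q) for q in orders]
for q, d in zip(orders, values):
    print(f"q = {q:>4}: D = {d:.3f}")

# As q grows, the result approaches 1 / max(p), here 1 / 0.7
print("1 / max(p) =", 1 / max(p))
```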
 
The general equation of diversity is often written in the form:<ref name=Hill1973 /><ref name=Jost2006 />
 
:<math>{}^q\!D=\left ( {\sum_{i=1}^R p_i^q} \right )^{1/(1-q)}</math>
 
The term inside the parentheses is called the basic sum. Some popular diversity indices correspond to the basic sum as calculated with different values of ''q''.<ref name=Jost2006 />
 
For diversity of order one, an alternative equation is:<ref name=Hill1973 /><ref name=Jost2006 />
 
:<math>{}^1\!D= \exp\left(-\sum_{i=1}^R p_i \ln p_i\right) =\exp(H')</math>
 
where ''H''' is the Shannon index as calculated with natural logarithms (see below).
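
As a rough numerical check of the two forms above, the basic-sum equation evaluated at a value of ''q'' very close to 1 should approach exp(''H'''). The function names and the example abundances below are illustrative assumptions.

```python
import math

def hill_number(p, q):
    """Basic-sum form of true diversity; undefined at q = 1."""
    basic_sum = sum(x ** q for x in p)
    return basic_sum ** (1.0 / (1.0 - q))

def exp_shannon(p):
    """Diversity of order one, computed as exp(H') with natural logarithms."""
    return math.exp(-sum(x * math.log(x) for x in p if x > 0))

p = [0.5, 0.3, 0.2]  # made-up proportional abundances
print(hill_number(p, 1.000001))  # q just above 1
print(exp_shannon(p))            # limiting value at q = 1
```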
 
==Richness==
[[Species richness|Richness]] ''R'' simply quantifies how many different types the dataset of interest contains. For example, species richness (usually notated ''S'') of a dataset is the number of different species in the corresponding species list. Richness is a simple measure, so it has been a popular diversity index in ecology, where abundance data are often not available for the datasets of interest. Because richness does not take the abundances of the types into account, it is not the same thing as diversity, which does take abundances into account. However, if true diversity is calculated with ''q'' = 0, the effective number of types (<sup>0</sup>''D'') equals the actual number of types (''R'').<ref name=Jost2006 /><ref name=Tuomisto2010c />
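
A minimal sketch of how richness and the proportional abundances <math>p_i</math> might be derived from raw counts is given below. The dictionary of counts and both function names are hypothetical; types recorded with a count of zero are treated as absent.

```python
def richness(counts):
    """Number of types with at least one recorded entity."""
    return sum(1 for n in counts.values() if n > 0)

def proportional_abundances(counts):
    """Map each present type to its share of all recorded entities."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items() if n > 0}

survey = {"oak": 12, "birch": 7, "pine": 1, "elm": 0}  # made-up counts
print(richness(survey))
print(proportional_abundances(survey))
```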
 
==Shannon index==
The Shannon index has been a popular diversity index in the ecological literature, where it is also known as Shannon's diversity index, the Shannon–Wiener index, the Shannon–Weaver index and the Shannon entropy. The measure was originally proposed by [[Claude Shannon]] to quantify the [[Entropy (information theory)|entropy]] (uncertainty or information content) in strings of text.<ref name=Shannon1948>Shannon, C. E. (1948) A mathematical theory of communication.  The Bell System Technical Journal, 27, 379–423 and 623–656.</ref> The idea is that the more different letters there are, and the more equal their proportional abundances in the string of interest, the more difficult it is to correctly predict which letter will be the next one in the string. The Shannon entropy quantifies the uncertainty (entropy or degree of surprise) associated with this prediction. It is most often calculated as follows:
 
:<math> H' = -\sum_{i=1}^R p_i \ln p_i </math>
 
where <math>p_i</math> is the proportion of characters belonging to the ''i''th type of letter in the string of interest. In ecology, <math>p_i</math> is often the proportion of individuals belonging to the ''i''th species in the dataset of interest. Then the Shannon entropy quantifies the uncertainty in predicting the species identity of an individual that is taken at random from the dataset.
 
Although the equation is written here with natural logarithms, the base of the logarithm used when calculating the Shannon entropy can be chosen freely. Shannon himself discussed logarithm bases 2, 10 and ''e'', and these have since become the most popular bases in applications that use the Shannon entropy. Each log base corresponds to a different measurement unit; the units have been called binary digits (bits), decimal digits (decits) and natural digits (nats) for the bases 2, 10 and ''e'', respectively. Comparing Shannon entropy values that were originally calculated with different log bases requires converting them to the same log base: change from the base ''a'' to base ''b'' is obtained with multiplication by log<sub>''b''</sub>''a''.<ref name=Shannon1948 />
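
The base conversion can be illustrated with a short Python sketch. The function name <code>shannon</code>, its <code>base</code> parameter and the example proportions are assumptions made for this illustration.

```python
import math

def shannon(p, base=math.e):
    """Shannon index of proportional abundances p in the given log base."""
    return -sum(x * math.log(x, base) for x in p if x > 0)

p = [0.5, 0.25, 0.25]      # made-up proportional abundances
h_bits = shannon(p, 2)     # entropy in bits
h_nats = shannon(p)        # entropy in nats

# Converting from base a = 2 to base b = e: multiply by log_b(a) = ln 2
converted = h_bits * math.log(2)
print(h_bits, h_nats, converted)
```

The converted value agrees with the value computed directly in nats, up to floating-point rounding.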
 
It has been shown that the Shannon index is based on the weighted geometric mean of the proportional abundances of the types, and that it equals the logarithm of true diversity as calculated with ''q'' = 1:<ref name=Tuomisto2010a />
 
:<math> H' = -\sum_{i=1}^R p_i \ln p_i = -\sum_{i=1}^R \ln p_i^{p_i}</math>
 
This can also be written
 
:<math> H' = -(\ln p_1^{p_1} +\ln p_2^{p_2} +\ln p_3^{p_3} + \cdots + \ln p_R^{p_R}) </math>
 
which equals
 
:<math> H' = -\ln \left ( p_1^{p_1}p_2^{p_2}p_3^{p_3} \cdots p_R^{p_R} \right ) = \ln \left ( {1 \over p_1^{p_1}p_2^{p_2}p_3^{p_3} \cdots p_R^{p_R}} \right ) = \ln \left ( {1 \over {\prod_{i=1}^R p_i^{p_i}}} \right )</math>
 
Since the sum of the <math>p_i</math> values equals unity by definition, the denominator equals the weighted geometric mean of the <math>p_i</math> values, with the <math>p_i</math> values themselves being used as the weights (exponents in the equation). The term within the parentheses hence equals true diversity <sup>1</sup>''D'', and ''H''' equals ln(<sup>1</sup>''D'').<ref name=Hill1973 /><ref name=Tuomisto2010a /><ref name=Tuomisto2010c />
 
When all types in the dataset of interest are equally common, all <math>p_i</math> values equal 1/''R'', and the Shannon index hence takes the value ln(''R''). The more unequal the abundances of the types, the larger the weighted geometric mean of the <math>p_i</math> values, and the smaller the corresponding Shannon entropy. If practically all abundance is concentrated in one type, and the other types are very rare (even if there are many of them), Shannon entropy approaches zero. When there is only one type in the dataset, Shannon entropy exactly equals zero (there is no uncertainty in predicting the type of the next randomly chosen entity).
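
These limiting cases can be checked with a small sketch. The three datasets below are made up for illustration, and <code>shannon</code> is a hypothetical helper using natural logarithms.

```python
import math

def shannon(p):
    """Shannon index H' with natural logarithms; zero abundances are skipped."""
    return -sum(x * math.log(x) for x in p if x > 0)

R = 4
even = [1 / R] * R                      # all types equally common
skewed = [0.997, 0.001, 0.001, 0.001]   # one type dominates
single = [1.0]                          # only one type present

h_even = shannon(even)
h_skewed = shannon(skewed)
h_single = shannon(single)
print(h_even, math.log(R))   # equal abundances give ln(R)
print(h_skewed)              # close to zero
print(abs(h_single))         # zero
```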
 
==Simpson index==
The Simpson index was introduced in 1949 by [[Edward H. Simpson]] to measure the degree of concentration when individuals are classified into types.<ref name=Simpson1949>Simpson, E. H. (1949) Measurement of diversity. Nature, 163, 688.</ref> The same index was rediscovered by Orris C. Herfindahl in 1950.<ref>Herfindahl, O. C. (1950) Concentration in the U.S. Steel Industry. Unpublished doctoral dissertation, Columbia University.</ref> The square root of the index had already been introduced in 1945 by the economist [[Albert O. Hirschman]].<ref>Hirschman, A. O. (1945) National power and the structure of foreign trade. Berkeley.</ref> As a result, the same measure is usually known as the Simpson index in ecology, and as the [[Herfindahl index]] or the Herfindahl–Hirschman index (HHI) in economics.
 
The measure equals the probability that two entities taken at random from the dataset of interest represent the same type.<ref name=Simpson1949 /> It equals:
 
:<math> \lambda = \sum_{i=1}^R p_i^2</math>
 
This also equals the weighted arithmetic mean of the proportional abundances <math>p_i</math> of the types of interest, with the proportional abundances themselves being used as the weights.<ref name=Hill1973 /> Proportional abundances are by definition constrained to values between zero and unity, but their weighted arithmetic mean, and hence λ, can never be smaller than 1/''R'', which is reached when all types are equally abundant.
 
By comparing the equation used to calculate λ with the equations used to calculate true diversity, it can be seen that 1/λ equals <sup>2</sup>''D'', i.e. true diversity as calculated with ''q'' = 2. The original Simpson's index hence equals the corresponding basic sum.<ref name=Jost2006 />
 
The interpretation of λ as the probability that two entities taken at random from the dataset of interest represent the same type assumes that the first entity is returned to the dataset before the second entity is taken. If the dataset is very large, sampling without replacement gives approximately the same result, but in small datasets the difference can be substantial. If the dataset is small, and sampling without replacement is assumed, the probability of obtaining the same type with both random draws is:
 
:<math> l = \frac{\sum_{i=1}^R n_i (n_i -1)}{N (N-1)} </math>
 
where <math>n_i</math> is the number of entities belonging to the ''i''th type and ''N'' is the total number of entities in the dataset.<ref name=Simpson1949 /> This form of the Simpson index is also known as the Hunter–Gaston index in microbiology.<ref name=Hunter1988>Hunter PR, Gaston MA (1988) Numerical index of the discriminatory ability of typing systems: an application of Simpson's index of diversity. J Clin Microbiol 26(11): 2465–2466</ref>
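
The difference between the two sampling assumptions can be sketched as follows. The function names and the count vectors are illustrative assumptions; the large dataset is simply the small one scaled by 1000.

```python
def simpson_with_replacement(counts):
    """lambda: sum of squared proportional abundances."""
    n_total = sum(counts)
    return sum((n / n_total) ** 2 for n in counts)

def simpson_without_replacement(counts):
    """l: probability that two draws without replacement share a type."""
    n_total = sum(counts)
    return sum(n * (n - 1) for n in counts) / (n_total * (n_total - 1))

small = [3, 2, 1]            # made-up counts, N = 6
large = [3000, 2000, 1000]   # same proportions, N = 6000

for counts in (small, large):
    print(simpson_with_replacement(counts),
          simpson_without_replacement(counts))
```

With only six entities the two values differ noticeably, whereas with six thousand they nearly coincide.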
 
Since mean proportional abundance of the types increases with decreasing number of types and increasing abundance of the most abundant type, λ obtains small values in datasets of high diversity and large values in datasets of low diversity. This is counterintuitive behavior for a diversity index, so transformations of λ that increase with increasing diversity are often used instead. The most popular of these have been the inverse Simpson index (1/λ) and the Gini–Simpson index (1&nbsp;−&nbsp;λ).<ref name=Hill1973 /><ref name=Jost2006 /> Both of these have also been called the Simpson index in the ecological literature, so care is needed to avoid accidentally comparing the different indices as if they were the same.
 
==Inverse Simpson index==
The inverse Simpson index equals:
 
:<math> 1/ \lambda = {1 \over\sum_{i=1}^R p_i^2} = {}^2D</math>
 
This simply equals true diversity of order 2, i.e. the effective number of types that is obtained when the weighted arithmetic mean is used to quantify average proportional abundance of types in the dataset of interest.
 
==Gini–Simpson index==
The original Simpson index λ equals the probability that two entities taken at random from the dataset of interest (with replacement) represent the same type. Its transformation 1&nbsp;−&nbsp;λ therefore equals the probability that the two entities represent different types. This measure is also known in ecology as the probability of interspecific encounter (''PIE'')<ref>Hurlbert, S.H. (1971) The nonconcept of species diversity: A critique and alternative parameters. ''Ecology'' 52, 577–586. [http://dx.doi.org/10.2307/1934145]</ref> and the Gini–Simpson index.<ref name=Jost2006 /> It can be expressed as a transformation of true diversity of order 2:
 
:<math> 1 - \lambda = 1 - \sum_{i=1}^R p_i^2 = 1 - 1/{}^2D</math>
 
The Gibbs–Martin index of sociology, psychology and management studies,<ref>Gibbs, Jack P., and William T. Martin, 1962. Urbanization, technology and the division of labor. American Sociological Review 27: 667–677.</ref> which is also known as the Blau index, is the same measure as the Gini–Simpson index.
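
The relationship between λ, the inverse Simpson index and the Gini–Simpson index can be illustrated with a short sketch. The helper name <code>simpson</code> and the example proportions are assumptions made here.

```python
def simpson(p):
    """Simpson index lambda: sum of squared proportional abundances."""
    return sum(x * x for x in p)

p = [0.5, 0.25, 0.25]            # made-up proportional abundances
lam = simpson(p)
inverse_simpson = 1 / lam        # true diversity of order 2
gini_simpson = 1 - lam           # probability that two draws differ

print(lam, inverse_simpson, gini_simpson)
# The Gini-Simpson index written as a transformation of order-2 diversity
print(1 - 1 / inverse_simpson)
```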
 
==Berger–Parker index==
The Berger–Parker index equals the maximum <math>p_i</math> value in the dataset, i.e. the proportional abundance of the most abundant type. This corresponds to the weighted generalized mean of the <math>p_i</math> values when ''q'' approaches infinity, and hence equals the inverse of true diversity of order infinity (<math>1/{}^\infty\!D</math>).
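
As a rough numerical illustration, the inverse of true diversity evaluated at a large but finite order comes close to the Berger–Parker index. The order 200 and the example proportions are arbitrary choices for this sketch, and both function names are hypothetical.

```python
def berger_parker(p):
    """Proportional abundance of the most abundant type."""
    return max(p)

def hill_number(p, q):
    """Basic-sum form of true diversity; undefined at q = 1."""
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

p = [0.6, 0.3, 0.1]                       # made-up proportional abundances
dominance = berger_parker(p)
approximation = 1 / hill_number(p, 200)   # large finite order

print(dominance, approximation)
```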
 
==Rényi entropy==
The [[Rényi entropy]] is a generalization of the Shannon entropy to values of ''q'' other than unity. It can be expressed as:
 
:<math>{}^qH = \frac{1}{1-q} \; \ln\left ( \sum_{i=1}^R p_i^q \right ) </math>
 
which equals
 
:<math>{}^qH = \ln\left ( {1 \over \sqrt[q-1]{{\sum_{i=1}^R p_i p_i^{q-1}}}} \right ) = \ln({}^q\!D)</math>
 
This means that taking the logarithm of true diversity based on any value of ''q'' gives the Rényi entropy corresponding to the same value of ''q''.
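
The identity can be verified numerically for a few orders. The sketch below uses hypothetical function names and made-up proportions, and it excludes ''q'' = 1, where both expressions are undefined in this form.

```python
import math

def renyi_entropy(p, q):
    """Renyi entropy of order q (q != 1), with natural logarithms."""
    return math.log(sum(x ** q for x in p)) / (1.0 - q)

def true_diversity(p, q):
    """Inverse of the weighted generalized mean with exponent q - 1 (q != 1)."""
    return 1.0 / sum(x * x ** (q - 1) for x in p) ** (1.0 / (q - 1))

p = [0.5, 0.3, 0.2]  # made-up proportional abundances
for q in (0, 0.5, 2, 3):
    print(q, renyi_entropy(p, q), math.log(true_diversity(p, q)))
```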
 
==See also==
{{div col|colwidth=18em}}
*[[Species diversity]]
*[[Species richness]]
*[[Alpha diversity]]
*[[Beta diversity]]
*[[Gamma diversity]]
*[[Qualitative variation]]
*[[Isolation index]]
{{div col end}}
 
==References==
{{Reflist|32em}}
 
==Further reading==
*{{cite book | author=[[Paul Colinvaux|Colinvaux, Paul A.]] | title=Introduction to Ecology | publisher=Wiley | year=1973 | isbn=0-471-16498-4}}
*{{cite book | author=Cover, Thomas M. | coauthors=and Thomas, Joy A. | title = Elements of Information Theory| publisher=Wiley | year=1991| isbn=0-471-06259-6}}  See ''chapter 5'' for an elaboration of coding procedures described informally above.
*Chao, A.; Shen, T-J. (2003) [http://chao.stat.nthu.edu.tw/paper/2003_EEST_10_P429.pdf "Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample"], ''Environmental and Ecological Statistics'', 10 (4), 429–443 {{doi|10.1023/A:1026096204727}}
 
==External links==
*[http://www.countrysideinfo.co.uk/simpsons.htm Simpson's Diversity index]
*[http://www.tiem.utk.edu/~gross/bioed/bealsmodules/simpsonDI.html Diversity indices] gives some examples of estimates of Simpson's index for real ecosystems.
 
[[Category:Measurement of biodiversity]]
[[Category:Index numbers]]
[[Category:Summary statistics for categorical data]]

Latest revision as of 05:59, 16 December 2014
