| The '''Theil index''' is a statistic used to measure [[economic inequality]].<ref>[http://utip.gov.utexas.edu/papers/utip_14.pdf Introduction to the Theil index from the University of Texas]</ref> It has also been used to measure the lack of racial diversity.<ref>http://geodacenter.asu.edu/node/236</ref> The basic Theil index T<sub>T</sub> is the same as [[Redundancy (information theory)|redundancy in information theory]], which is the maximum possible entropy of the data minus the observed entropy. It is a special case of the [[generalized entropy index]]. It can be viewed as a measure of redundancy, lack of diversity, isolation, segregation, inequality, non-randomness, and compressibility. It was proposed by [[econometrics|econometrician]] [[Henri Theil]], a successor of [[Jan Tinbergen]] at [[Erasmus University Rotterdam]].
| | |
| ==Formulas==
| |
| The basic Theil index is<ref name="Formulas" />
| |
| | |
| : <math>
| |
| T_T=T_{\alpha=1}=\frac{1}{N}\sum_{i=1}^N \left( \frac{x_i}{\overline{x}} \cdot \ln{\frac{x_i}{\overline{x}}} \right)
| |
| </math>
| |
| where <math>x</math> is income/person. When <math>x</math> is inverted to be people/income, or if changes in lower incomes are more important, a different formula is used that is derivable from <math>T_T</math> by
| |
| :<math>T_L(x)=T_T\left(\frac{1}{x} \right)</math>
| |
| : <math>
| |
| T_L=T_{\alpha=0}=MLD=\frac{1}{N}\sum_{i=1}^N \left( \ln{\frac{\overline{x}}{x_i}} \right)
| |
| </math>
| |
| | |
| <math>T_L</math> is also known as the MLD (mean log deviation) because it gives the mean deviation of <math>\ln(x)</math>. Sometimes the average of <math>T_T</math> and <math>T_L</math> is used, which has the advantage of being "symmetric" like the Gini, Hoover, and Coulter indices. "Symmetric" means it gives the same result for <math>x</math> as it does for <math>1/x</math>:
| |
| | |
| : <math>
| |
| T_S=(T_T+T_L)/2=\frac{1}{2 N}\sum_{i=1}^N \left[\left(\frac{x_i}{\overline{x}} - 1\right)\ln(x_i) \right]
| |
| </math>
| |
| | |
| For these equations, <math>x_i</math> is the income of the <math>i</math>th person or subgroup, <math>\overline{x}</math> is the mean income of the persons or subgroups, and <math>N</math> is the population or number of subgroups.
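| |
| The three formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample incomes are hypothetical, and treating a zero income as contributing 0 to <math>T_T</math> is an assumed convention.

```python
import math

def theil_t(x):
    """Theil-T: mean of (x_i / mean) * ln(x_i / mean)."""
    mean = sum(x) / len(x)
    # Assumed convention: a zero income contributes 0, since r*ln(r) -> 0 as r -> 0.
    return sum((xi / mean) * math.log(xi / mean) for xi in x if xi > 0) / len(x)

def theil_l(x):
    """Theil-L (mean log deviation): mean of ln(mean / x_i); needs every x_i > 0."""
    mean = sum(x) / len(x)
    return sum(math.log(mean / xi) for xi in x) / len(x)

def theil_s(x):
    """Symmetric index: the average of Theil-T and Theil-L."""
    return (theil_t(x) + theil_l(x)) / 2

incomes = [10.0, 20.0, 30.0, 40.0]  # hypothetical incomes
print(theil_t(incomes), theil_l(incomes), theil_s(incomes))
```

| Note that <code>theil_l</code> is undefined when any income is zero, because the logarithm of <math>\overline{x}/x_i</math> diverges.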
| |
|
| |
| If everyone has the same income, the indices give 0, which, counter-intuitively, corresponds to maximum disorder in the population's income. If one person has all the income, then T<sub>T</sub> gives the result <math>\ln(N)</math>, which corresponds to maximum order. Dividing T<sub>T</sub> by <math>\ln(N)</math> normalizes the index to range from 0 to 1.
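| |
| The two extreme cases can be checked numerically. The sketch below assumes the convention that a zero income contributes 0 to the sum; the function names and the sample populations are hypothetical.

```python
import math

def theil_t(x):
    """Theil-T, with zero incomes contributing 0 (assumed convention)."""
    mean = sum(x) / len(x)
    return sum((xi / mean) * math.log(xi / mean) for xi in x if xi > 0) / len(x)

def theil_t_scaled(x):
    """Theil-T divided by ln(N), so that the result lies between 0 and 1."""
    n = len(x)
    return theil_t(x) / math.log(n) if n > 1 else 0.0

equal = [25.0, 25.0, 25.0, 25.0]     # everyone has the same income
one_has_all = [0.0, 0.0, 0.0, 100.0]  # one person has all the income

print(theil_t(equal), theil_t_scaled(equal))
print(theil_t(one_has_all), math.log(4), theil_t_scaled(one_has_all))
```

| For the second population the only non-zero term is <math>(4 \ln 4)/4 = \ln 4</math>, which is <math>\ln(N)</math> for <math>N = 4</math>.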
| |
|
| |
| The indices measure an entropic "distance" the population is away from the "ideal" egalitarian state of everyone having the same income. The numerical result is in terms of negative entropy so that a higher number indicates more order that is further away from the "ideal" of maximum disorder. Formulating the index to represent negative entropy instead of entropy allows it to be a measure of inequality rather than equality.
| |
|
| |
| If <math>T_T</math> applies to the distribution of income in people, then <math>T_L</math> can be used to get the same numerical result for the distribution of people in income.
| |
| | |
| The two Theil indices <math>T_T</math> and <math>T_L</math> are special cases of the [[generalized entropy index]] with <math>{\alpha} = 1</math> and <math>{\alpha} = 0</math>. The [[Atkinson index]] with <math>{\epsilon} = 1</math> is a transformation of <math>T_{\alpha=0}</math> by <math>A = 1 - e^{-T_L}</math>.
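| |
| The relation between the Atkinson index and <math>T_L</math> can be checked numerically. The sketch below assumes the usual form of the Atkinson index for <math>\epsilon = 1</math> (one minus the ratio of the geometric mean to the arithmetic mean); the function names and incomes are hypothetical.

```python
import math

def theil_l(x):
    """Theil-L (mean log deviation); needs every x_i > 0."""
    mean = sum(x) / len(x)
    return sum(math.log(mean / xi) for xi in x) / len(x)

def atkinson_eps1(x):
    """Atkinson index for epsilon = 1: 1 - geometric mean / arithmetic mean."""
    mean = sum(x) / len(x)
    geometric_mean = math.exp(sum(math.log(xi) for xi in x) / len(x))
    return 1 - geometric_mean / mean

incomes = [12.0, 18.0, 35.0, 90.0]  # hypothetical incomes
# The two values agree up to floating-point rounding.
print(atkinson_eps1(incomes), 1 - math.exp(-theil_l(incomes)))
```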
| |
| | |
| == Derivation from Entropy ==
| |
| | |
| The Theil index is derived from [[Claude Shannon|Shannon]]'s measure of [[information entropy]] (S), where entropy is a measure of randomness in a given set of information. In information theory, physics, and the Theil index, the general form of entropy is
| |
| | |
| :<math>S = k \sum_{i=1}^N \left( p_i \log{\frac{1}{p_i}} \right) = - k \sum_{i=1}^N \left( p_i \log{p_i} \right)</math>
| |
| | |
| where p<sub>i</sub> is the probability of finding member i from a random sample of the population. In physics, k is Boltzmann's constant. In information theory, k = 1 when entropy is measured in bits and the logarithm is taken to base 2. Physics and the Theil index both use the natural logarithm. When p<sub>i</sub> is taken to be the income of person i (x<sub>i</sub>), it must be normalized by dividing by the total income of the population, <math>N\overline{x}</math>. The observed entropy of a Theil population is then:
| |
| | |
| : <math>S_{Theil} = \sum_{i=1}^N \left( \frac{x_i}{N \overline{x}} \ln{\frac{N \overline{x}}{x_i}} \right)</math>
| |
|
| |
| The Theil index is T<sub>T</sub> = S<sub>max</sub> - S<sub>Theil</sub> where the theoretical maximum entropy S<sub>max</sub> is when all incomes are equal, i.e. each x<sub>i</sub> = average x<sub>i</sub> = a constant. This is substituted into S<sub>Theil</sub> to give S<sub>max</sub> = ln(N) for T<sub>T</sub>, a constant determined solely by the population. So the Theil index gives a value in terms of an entropy that measures how far S<sub>Theil</sub> is away from the "ideal" S<sub>max</sub>. The index is a "negative entropy" in the sense that it gets smaller as the disorder gets larger, so it is a measure of order rather than disorder.
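| |
| The identity T<sub>T</sub> = S<sub>max</sub> - S<sub>Theil</sub> can be verified numerically. This is a minimal sketch with hypothetical function names and data; it computes the observed entropy from income shares and compares ln(N) minus that entropy with the direct formula for T<sub>T</sub>.

```python
import math

def observed_entropy(x):
    """S_Theil: entropy of the income shares p_i = x_i / (N * mean)."""
    total = sum(x)
    return sum((xi / total) * math.log(total / xi) for xi in x if xi > 0)

def theil_t_from_entropy(x):
    """T_T computed as S_max - S_Theil, with S_max = ln(N)."""
    return math.log(len(x)) - observed_entropy(x)

def theil_t(x):
    """T_T computed directly from its definition."""
    mean = sum(x) / len(x)
    return sum((xi / mean) * math.log(xi / mean) for xi in x if xi > 0) / len(x)

incomes = [5.0, 15.0, 30.0, 50.0]  # hypothetical incomes
print(theil_t_from_entropy(incomes), theil_t(incomes))
```

| With equal incomes every share is 1/N, so the observed entropy equals ln(N) and the index is 0.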
| |
|
| |
| When x is in units of population/species, <math>S_{Theil}</math> is a measure of biodiversity and is called the [[Shannon index]]. If the Theil index is used with x=population/species, it is a measure of inequality of population among a set of species, or "bio-isolation" as opposed to "wealth isolation".
| |
| | |
| The Theil index measures what is called [[Redundancy (information theory)|redundancy]] in information theory.<ref name="Formulas">http://www.poorcity.richcity.org (Redundancy, Entropy and Inequality Measures)</ref> It is the left over "information space" that was not utilized to convey information, which reduces the effectiveness of the [[price signal]]. The Theil index is a measure of the redundancy of income (or other measure of wealth) in some individuals. Redundancy in some individuals implies scarcity in others. A high Theil index indicates the total income is not distributed evenly among individuals in the same way an uncompressed text file does not have a similar number of byte locations assigned to the available unique byte characters.
| |
|
| |
| {| class="wikitable"
| |
| |-
| |
| ! Notation !! Information Theory !! Theil Index T<sub>T</sub>
| |
| |-
| |
| | N || number of unique characters || number of individuals
| |
| |-
| |
| | i || a particular character || a particular individual
| |
| |-
| |
| | x<sub>i</sub> || character<sub>i</sub> count || income of individual<sub>i</sub>
| |
| |-
| |
| | N*avg(x) || total characters in document || total income in population
| |
| |-
| |
| | T<sub>T</sub> || unused information space || unused potential in price mechanism
| |
| |-
| |
| | || data compression || progressive tax
| |
| |}
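| |
| The correspondence in the table can be illustrated by computing the redundancy of a short text from its character counts. This is a sketch only: the function names and sample strings are hypothetical, and N is taken to be the number of distinct characters that actually occur in the text.

```python
import math
from collections import Counter

def redundancy(text):
    """ln(N) minus the observed entropy of the character frequencies."""
    if not text:
        raise ValueError("text must not be empty")
    counts = Counter(text)
    n = len(counts)               # N: number of unique characters
    total = sum(counts.values())  # N * mean(x): total characters
    observed = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.log(n) - observed

def theil_t(x):
    """Theil-T of a list of positive values, for comparison."""
    mean = sum(x) / len(x)
    return sum((xi / mean) * math.log(xi / mean) for xi in x) / len(x)

print(redundancy("abcd"))                    # every character used equally often
print(redundancy("aaab"), theil_t([3, 1]))  # same value from the character counts
```

| Treating each character count as an "income" gives the same number as T<sub>T</sub> applied to the list of counts.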
| |
| | |
| ==Application of the Theil index==
| |
| Theil's measure can be converted<ref name="Formulas" /> by the operation <math>1-e^{- T}</math> into one of the indexes of [[Anthony Barnes Atkinson]], where <math>\epsilon</math> may or may not be used to introduce an ''inequality aversion factor'' into the formula, with <math>\epsilon=1</math> being the default. The result of the conversion has also been called the ''normalized Theil index''.<ref name="normalized">Juana Domínguez-Domínguez, José Javier Núñez-Velázquez: ''[http://www.uib.es/congres/ecopub/ecineq/papers/100Dominguez-Nunez.pdf The Evolution of Economic Inequality in the EU Countries During the Nineties]'', 2005</ref>
| |
| | |
| James E. Foster<ref name="Sen">James E. Foster and [[Amartya Sen]], 1996, ''On Economic Inequality'', expanded edition with annexe, ISBN 0-19-828193-5</ref> used such a measure to replace the Gini coefficient in [[Amartya Sen]]'s ''welfare function'' W = f(income, inequality). Here the income is, for example, the average income of the individuals in a group of income earners. Foster's welfare function can therefore be computed directly from the Theil index <math>T</math> if the conversion is included in the computation of the average per capita [[Social welfare function|welfare function]]:
| |
| | |
| : <math>W = \overline{\text{income}} \cdot {e^{-T}}.\,</math>
| |
| | |
| Using the "Theil-L" index <math>{T_L}</math> (defined in the Formulas section above) for <math>T</math> in that formula yields results similar to using the [[Atkinson index]] for computing the welfare function.
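| |
| The welfare function can be sketched as follows, here with <math>T_L</math> substituted for <math>T</math>. The function names and incomes are hypothetical. With this choice the result is algebraically the geometric mean of the incomes, which the example prints for comparison.

```python
import math

def theil_l(x):
    """Theil-L (mean log deviation); needs every x_i > 0."""
    mean = sum(x) / len(x)
    return sum(math.log(mean / xi) for xi in x) / len(x)

def welfare(x, index=theil_l):
    """W = mean income * exp(-T), with T supplied by the given index function."""
    mean = sum(x) / len(x)
    return mean * math.exp(-index(x))

def geometric_mean(x):
    return math.exp(sum(math.log(xi) for xi in x) / len(x))

incomes = [12.0, 18.0, 35.0, 90.0]  # hypothetical incomes
print(sum(incomes) / len(incomes))  # mean income, the upper bound on W
print(welfare(incomes), geometric_mean(incomes))
```

| Any inequality lowers W below the mean income; with equal incomes the index is 0 and W equals the mean.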
| |
| | |
| ==Meaning of "U"==
| |
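| Here ''U'' denotes Theil's ''U'' statistic for forecast accuracy, which is a separate measure from the Theil inequality index described above. It compares a forecasting method against the "naive" (NF1) method:
| |
| * If ''U'' = 1, the naive (NF1) method is as good as the forecasting method being evaluated.
| * If ''U'' < 1, the forecasting method is better than the NF1 method.
| * If ''U'' > 1, the NF1 method is better than the forecasting method, so there is no need to spend time applying further forecasting methods.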
| |
| | |
| ==Decomposability==
| |
| One of the advantages of the Theil index is that it is a weighted average of inequality within subgroups, plus inequality among those subgroups. For example, inequality within the United States is the average inequality within each state, weighted by state income, plus the inequality among states.
| |
| | |
| If, for the Theil-T index, the population is divided into <math>m</math> subgroups, <math>s_i</math> is the income share of group <math>i</math>, <math>T_{T_i}</math> is the Theil-T index for that subgroup, and <math>\overline{x}_i</math> is the average income in group <math>i</math>, then the Theil index is
| |
| | |
| : <math>
| |
| T_T = \sum_{i=1}^m s_i T_{T_i} + \sum_{i=1}^m s_i \ln{\frac{\overline{x}_i}{\overline{x}}}
| |
| </math>
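| |
| The Theil-T decomposition can be checked numerically. The sketch below uses hypothetical function names and data; it splits a population into subgroups, computes the within-group and between-group terms weighted by income share, and compares their sum with the index of the pooled population.

```python
import math

def theil_t(x):
    """Theil-T of a list of positive incomes."""
    mean = sum(x) / len(x)
    return sum((xi / mean) * math.log(xi / mean) for xi in x) / len(x)

def decompose_theil_t(groups):
    """Return (within, between) terms of Theil-T, weighted by income share s_i."""
    pooled = [xi for g in groups for xi in g]
    total = sum(pooled)
    mean = total / len(pooled)
    within = 0.0
    between = 0.0
    for g in groups:
        share = sum(g) / total        # s_i: income share of the group
        group_mean = sum(g) / len(g)  # average income in the group
        within += share * theil_t(g)
        between += share * math.log(group_mean / mean)
    return within, between

groups = [[10.0, 20.0, 30.0], [40.0, 50.0]]  # hypothetical subgroups of unequal size
within, between = decompose_theil_t(groups)
pooled = [xi for g in groups for xi in g]
print(within + between, theil_t(pooled))
```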
| |
| | |
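| For the Theil-L index the two terms are weighted by population share rather than income share. If <math>N_i</math> is the number of people in group <math>i</math> and <math>N</math> is the total population, then
| |
| : <math>
| |
| T_L = \sum_{i=1}^m \frac{N_i}{N} T_{L_i} + \sum_{i=1}^m \frac{N_i}{N} \ln{\frac{\overline{x}}{\overline{x}_i}}
| |
| </math>
| |
| When all <math>m</math> subgroups are the same size, each weight <math>N_i/N</math> equals <math>1/m</math>.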
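| |
| A similar numerical check works for the Theil-L index. This sketch assumes the terms are weighted by each subgroup's population share, which coincides with a weight of 1/m when all subgroups have the same size; the function names and data are hypothetical.

```python
import math

def theil_l(x):
    """Theil-L (mean log deviation); needs every x_i > 0."""
    mean = sum(x) / len(x)
    return sum(math.log(mean / xi) for xi in x) / len(x)

def decompose_theil_l(groups):
    """Return (within, between) terms of Theil-L, weighted by population share."""
    pooled = [xi for g in groups for xi in g]
    n = len(pooled)
    mean = sum(pooled) / n
    within = 0.0
    between = 0.0
    for g in groups:
        weight = len(g) / n           # population share of the group
        group_mean = sum(g) / len(g)  # average income in the group
        within += weight * theil_l(g)
        between += weight * math.log(mean / group_mean)
    return within, between

groups = [[10.0, 20.0, 30.0], [40.0, 50.0]]  # hypothetical subgroups of unequal size
within, between = decompose_theil_l(groups)
pooled = [xi for g in groups for xi in g]
print(within + between, theil_l(pooled))
```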
| |
| | |
| [[Image:Theil USCounties.png|center|500px|Map of economic inequality in the United States using the Theil index. A high positive Theil index indicates more income than population, while a negative value shows more population than income. A value of zero shows equality between population and income.]]
| |
| | |
| :'''Note''': This image does not show the Theil index of each area of the United States; it shows the contribution of each area to the US Theil index. The Theil index itself is never negative, whereas individual contributions to it may be negative or positive.
| |
| | |
| The decomposition of the overall Theil index which identifies the share attributable to the between-region component becomes a helpful tool for the positive analysis of regional inequality as it suggests the relative importance of spatial dimension of inequality.<ref name="Spatial decomposition">Novotny, J., 2007, On the measurement of regional inequality: Does spatial dimension of income inequality matter? Annals of Regional Science, 41, 3, 563-580. http://web.natur.cuni.cz/~pepino/NOVOTNY2007AnnalsofRegionalScience.pdf</ref>
| |
| | |
| Decomposability is a property of the Theil index that the more popular [[Gini coefficient]] does not offer. The Gini coefficient is more intuitive to many people because it is based on the [[Lorenz curve]]. However, it is not as easily decomposable as the Theil index.
| |
| | |
| ==Applications==
| |
| In addition to a multitude of economic applications, the Theil index has been applied to assess the performance of [[irrigation]] systems<ref>Rajan K. Sampath. Equity Measures for Irrigation Performance Evaluation. Water International, 13(1), 1988.</ref> and the distribution of [[software metrics]].<ref>A. Serebrenik, M. van den Brand. Theil index for aggregation of software metrics values. 26th IEEE International Conference on Software Maintenance. IEEE Computer Society.</ref>
| |
| | |
| ==See also==
| |
| * [[Generalized entropy index]]
| |
| * [[Atkinson index]]
| |
| * [[Gini coefficient]]
| |
| * [[Hoover index]]
| |
| * [[Income inequality metrics]]
| |
| * [[Suits index]]
| |
| * [[Wealth condensation]]
| |
| * [[Diversity index]]
| |
| | |
| ==References==
| |
| {{Reflist|2}}
| |
| | |
| == External links ==
| |
| * Software:
| |
| ** [http://www.wessa.net/co.wasp Free Online Calculator] computes the Gini Coefficient, plots the Lorenz curve, and computes many other measures of concentration for any dataset
| |
| ** Free Calculator: [http://www.poorcity.richcity.org/calculator.htm Online] and [http://luaforge.net/project/showfiles.php?group_id=49 downloadable scripts] ([[Python (programming language)|Python]] and [[Lua programming language|Lua]]) for Atkinson, Gini, and Hoover inequalities
| |
| ** Users of the [http://www.r-project.org/ R] data analysis software can install the "ineq" package which allows for computation of a variety of inequality indices including Gini, Atkinson, Theil.
| |
| ** A [http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=19968 MATLAB Inequality Package], including code for computing Gini, Atkinson, Theil indexes and for plotting the Lorenz Curve. Many examples are available.
| |
| | |
| [[Category:Econometrics]]
| |
| [[Category:Information theory]]
| |
| [[Category:Economic indicators]]
| |
| [[Category:Index numbers]]
| |
| [[Category:Income distribution]]
| |
| [[Category:Welfare economics]]
| |
| [[Category:Summary statistics]]
| |
| [[Category:Economic inequality]]
| |