Template:User latex expert: Difference between revisions
'''Correspondence analysis (CA)''' is a multivariate [[statistics|statistical technique]] proposed<ref>Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP ISBN 0-19-850994-4</ref> by Hirschfeld<ref>Hirschfeld, H.O. (1935) "A connection between correlation and contingency", ''Proc. Cambridge Philosophical Society'', 31, 520–524</ref> and later developed by [[Jean-Paul Benzécri]].<ref>{{cite book | author = Benzécri, J.-P. | publisher=Dunod |location= Paris, France | year = 1973 | title = L'Analyse des Données. Volume II. L'Analyse des Correspondances}}</ref> It is conceptually similar to [[principal component analysis]], but applies to categorical rather than continuous data. Like principal component analysis, it provides a means of displaying or summarising a set of data in two-dimensional graphical form.

All data should be nonnegative and on the same scale for CA to be applicable, and the method treats rows and columns equivalently. It is traditionally applied to [[contingency tables]]: CA decomposes the [[chi-squared statistic]] associated with the table into orthogonal factors. Because CA is a descriptive technique, it can be applied to tables whether or not the <math alt="χ²">\chi^2</math> statistic is appropriate.<ref>{{cite book | author = Greenacre, Michael | publisher=Academic Press |location= London | year = 1983 | title = Theory and Applications of Correspondence Analysis | isbn = 0-12-299050-1 }}</ref><ref>{{cite book | author = Greenacre, Michael | publisher=Chapman & Hall/CRC |location= London | year = 2007 | title = Correspondence Analysis in Practice, Second Edition }}</ref>
== Details ==
Like [[principal components analysis]], correspondence analysis creates orthogonal components and, for each item in a table, a set of scores (sometimes called factor scores, see [[Factor analysis]]). Correspondence analysis is performed on a [[contingency table]], ''C'', of size ''m×n'' where ''m'' is the number of rows and ''n'' is the number of columns.
===Preprocessing===
From table ''C'', compute a set of weights (sometimes called masses) for the rows and for the columns.<ref>{{cite book | author = Greenacre, Michael | publisher=Academic Press |location= London | year = 1983 | title = Theory and Applications of Correspondence Analysis | isbn = 0-12-299050-1 }}</ref><ref>{{cite book | author = Greenacre, Michael | publisher=Chapman & Hall/CRC |location= London | year = 2007 | title = Correspondence Analysis in Practice, Second Edition }}</ref> Writing <math>\mathbf{1}</math> for a column vector of ones of the appropriate length, the row weights are
:<math>w_m = (\mathbf{1}^T C \mathbf{1})^{-1} C \mathbf{1}</math>
and the column weights are
:<math>w_n = (\mathbf{1}^T C \mathbf{1})^{-1} C^T \mathbf{1}</math>.
Next, compute a table ''S'' (called the stochastic matrix) by dividing ''C'' by the sum of its entries:
:<math>S = (\mathbf{1}^T C \mathbf{1})^{-1} C</math>.
Finally, compute a table ''M'' from ''S'' and the weights:
:<math>M = S-w_{m}w_{n}^{*}</math>
where <math>w_{n}^{*}</math> denotes the [[conjugate transpose]] of <math>w_{n}</math> (for real-valued data, simply the transpose).
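The preprocessing steps can be illustrated with a minimal sketch in plain Python. The function name <code>ca_preprocess</code> and the 2×3 example table are hypothetical and chosen only for illustration; they do not come from any cited implementation.

```python
def ca_preprocess(C):
    """Return (w_m, w_n, S, M) for a contingency table given as a list of rows.

    Hypothetical helper for illustration only.
    """
    total = sum(sum(row) for row in C)               # sum of all entries of C
    w_m = [sum(row) / total for row in C]            # row weights (masses)
    w_n = [sum(col) / total for col in zip(*C)]      # column weights (masses)
    S = [[x / total for x in row] for row in C]      # C divided by its sum
    # M = S - w_m w_n^T: deviation from the independence model
    M = [[S[i][j] - w_m[i] * w_n[j] for j in range(len(w_n))]
         for i in range(len(w_m))]
    return w_m, w_n, S, M


# Made-up 2x3 table of counts
C = [[10, 20, 10],
     [30, 20, 10]]
w_m, w_n, S, M = ca_preprocess(C)
print(w_m)  # [0.4, 0.6]
print(w_n)  # [0.4, 0.4, 0.2]
```

Each row and each column of ''M'' sums to zero (up to rounding), because the product of the weights reproduces the margins of ''S'' exactly.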
===Orthogonal Components===
The table ''M'' is then decomposed with the [[generalized singular value decomposition]], in which the left and right singular vectors are constrained by weights. The weights are the diagonal tables
:<math>W_{m} = \operatorname{diag}\{1/w_{m}\}</math>
and
:<math>W_{n} = \operatorname{diag}\{1/w_{n}\}</math>
whose diagonal elements are the reciprocals of the entries of <math>w_{m}</math> and <math>w_{n}</math>, respectively, and whose off-diagonal elements are all 0. With these inverse weights, the squared singular values sum to the chi-squared statistic divided by the sum of ''C''. The decomposition is
:<math>M = U\Sigma V^* \,</math>
where
:<math>U^* W_m U = V^* W_n V = I</math>.
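One way to obtain such a decomposition is to rescale ''M'' and apply an ordinary singular value decomposition. The sketch below uses NumPy and assumes that the weight matrices hold the reciprocals of the masses, <math>W_m = \operatorname{diag}\{1/w_m\}</math> and <math>W_n = \operatorname{diag}\{1/w_n\}</math>, and that all masses are strictly positive. The function name <code>ca_gsvd</code> and the example table are hypothetical.

```python
import numpy as np


def ca_gsvd(C):
    """Sketch of the generalized SVD of M via an ordinary SVD.

    Assumes W_m = diag(1/w_m), W_n = diag(1/w_n) and strictly positive masses.
    """
    C = np.asarray(C, dtype=float)
    S = C / C.sum()
    w_m = S.sum(axis=1)                      # row masses
    w_n = S.sum(axis=0)                      # column masses
    M = S - np.outer(w_m, w_n)
    # Rescale: Z = W_m^{1/2} M W_n^{1/2}, i.e. divide entry (i, j) by sqrt(w_m[i] * w_n[j])
    Z = M / np.sqrt(np.outer(w_m, w_n))
    A, sigma, Bt = np.linalg.svd(Z, full_matrices=False)
    # Undo the rescaling so that U^T W_m U = V^T W_n V = I
    U = np.sqrt(w_m)[:, None] * A
    V = np.sqrt(w_n)[:, None] * Bt.T
    return M, w_m, w_n, U, sigma, V


# Made-up 3x4 table of counts
C = np.array([[20, 10, 5, 5],
              [10, 20, 10, 5],
              [5, 10, 30, 10]])
M, w_m, w_n, U, sigma, V = ca_gsvd(C)

# Chi-squared statistic computed directly from observed and expected counts
N = C.sum()
expected = N * np.outer(w_m, w_n)
chi2 = ((C - expected) ** 2 / expected).sum()

print(np.allclose(U @ np.diag(sigma) @ V.T, M))      # M is reconstructed
print(np.isclose(N * (sigma ** 2).sum(), chi2))      # squared singular values give chi2 / N
```

Because ''M'' is centred, one of the returned singular values is numerically zero; it corresponds to a trivial dimension and carries no information.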
===Factor scores===
Factor scores for the row items of table ''C'' are
:<math>F_{m} = W_{m} U \Sigma</math>
and for the column items
:<math>F_{n} = W_{n} V \Sigma</math>.
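The factor scores can be sketched in the same way. The example below is self-contained and again assumes <math>W_m = \operatorname{diag}\{1/w_m\}</math> and <math>W_n = \operatorname{diag}\{1/w_n\}</math> with strictly positive masses. The function name <code>ca_factor_scores</code> and the data are hypothetical.

```python
import numpy as np


def ca_factor_scores(C):
    """Sketch: row and column factor scores F_m = W_m U Sigma, F_n = W_n V Sigma.

    Assumes W_m = diag(1/w_m), W_n = diag(1/w_n) and strictly positive masses.
    """
    C = np.asarray(C, dtype=float)
    S = C / C.sum()
    w_m, w_n = S.sum(axis=1), S.sum(axis=0)
    M = S - np.outer(w_m, w_n)
    # Generalized SVD of M obtained from an ordinary SVD of the rescaled table
    A, sigma, Bt = np.linalg.svd(M / np.sqrt(np.outer(w_m, w_n)),
                                 full_matrices=False)
    U = np.sqrt(w_m)[:, None] * A
    V = np.sqrt(w_n)[:, None] * Bt.T
    F_m = (U * sigma) / w_m[:, None]         # W_m U Sigma
    F_n = (V * sigma) / w_n[:, None]         # W_n V Sigma
    return F_m, F_n, sigma, w_m, w_n


# Made-up 3x4 table of counts
C = np.array([[20, 10, 5, 5],
              [10, 20, 10, 5],
              [5, 10, 30, 10]])
F_m, F_n, sigma, w_m, w_n = ca_factor_scores(C)

# The first two columns are the coordinates typically used for a two-dimensional display
print(F_m[:, :2])
print(F_n[:, :2])
```

Under these assumptions the mass-weighted mean of the scores on every dimension is zero, and their mass-weighted variance on each dimension equals the corresponding squared singular value.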
==Extensions and Applications==
Several variants of CA are available, including [[detrended correspondence analysis]] (DCA) and [[canonical correspondence analysis]] (CCA). The extension of correspondence analysis to many categorical variables is called [[multiple correspondence analysis]]. An adaptation of correspondence analysis to the problem of discrimination based upon qualitative variables (i.e., the equivalent of [[discriminant analysis]] for qualitative data) is called discriminant correspondence analysis or barycentric discriminant analysis.

In the social sciences, correspondence analysis, and particularly its extension [[multiple correspondence analysis]], was made known outside France through French sociologist [[Pierre Bourdieu]]'s application of it.<ref>{{cite book| last= Bourdieu| first= Pierre| title= Distinction | year= 1984 | publisher= [[Routledge]]|ISBN= 0674212770| pages= 41}}</ref>
==Implementations==
* The data visualization system [[Orange (software)|Orange]] includes the module [http://www.ailab.si/orange/doc/modules/orngCA.htm orngCA].
* The statistical system [[R (programming language)|R]] includes the packages <code>ade4</code>, <code>ca</code>,<ref>Nenadic, O. and Greenacre, M. (2007) [http://www.jstatsoft.org/v20/i03/ "Correspondence analysis in R, with two- and three-dimensional graphics: the ca package"], ''Journal of Statistical Software'', 20(3)</ref> <code>vegan</code>, <code>ExPosition</code>, and [http://factominer.free.fr/ <code>FactoMineR</code>], which perform correspondence analysis and multiple correspondence analysis.
* A [[MATLAB]] program (with a tutorial) for correspondence analysis: [http://www.utdallas.edu/~herve/abdi-CorrespondenceAnalysisMatlabProgram.zip].
* A [[JavaScript]] library, released under the MIT License on [[GitHub]], which works both client-side and server-side (with [[Node.js]]): [http://github.com/piercus/CorrespondenceAnalysis CorrespondenceAnalysis].
== See also ==
*[[Detrended correspondence analysis]]
*[[Principal component analysis]]
==References==
{{reflist}}
==External links==
* Greenacre, Michael (2008), ''La Práctica del Análisis de Correspondencias'', BBVA Foundation, Madrid, Spanish translation of ''Correspondence Analysis in Practice'', available for free download from [http://www.fbbva.es/TLFU/tlfu/esp/publicaciones/libros/fichalibro/index.jsp?codigo=300 BBVA Foundation publications]
* Greenacre, Michael (2010), ''Biplots in Practice'', BBVA Foundation, Madrid, available for free download at [http://www.multivariatestatistics.org multivariatestatistics.org]
[[Category:Multivariate statistics]]
[[Category:Data analysis]]
Latest revision as of 19:57, 11 December 2012