'''Weighted correlation network analysis''' (WGCNA), also known as weighted gene co-expression [[Biological network|network]] analysis, is a widely used [[data mining]] method, especially for studying [[biological network]]s, based on pairwise [[Correlation coefficient|correlations]] between variables. While it can be applied to most [[Clustering high-dimensional data|high-dimensional]] data sets, it has been most widely used in [[Genomics|genomic]] applications. It allows one to define modules (clusters), intramodular hubs, and network nodes with regard to module membership, to study the relationships between co-expression modules, and to compare the network topology of different networks (differential network analysis). WGCNA can be used as a [[Data reduction|data reduction technique]] (related to oblique [[factor analysis]]), as a [[Cluster analysis|clustering]] method (fuzzy clustering), as a [[Variable selection|feature]] selection method (e.g. as a gene screening method), as a framework for integrating complementary (genomic) data (based on weighted correlations between quantitative variables), and as a [[Exploratory data analysis|data exploratory]] technique.<ref name="Horvath2011"/> Although WGCNA incorporates traditional data exploratory techniques, its network language and analysis framework go beyond those of any single standard analysis technique. Since it uses network methodology and is well suited for integrating complementary genomic data sets, it can be interpreted as a [[Systems biology|systems biologic]] or systems genetic data analysis method. By selecting intramodular hubs in consensus modules, WGCNA also gives rise to network-based [[meta analysis]] techniques.<ref name="Langfelder2013"/>
 
== History ==
The WGCNA method was developed by Steve Horvath, a professor of [[human genetics]] at the David Geffen School of Medicine at [[UCLA]] and of [[biostatistics]] at the UCLA Fielding School of Public Health, together with his colleagues at UCLA and (former) lab members (in particular Peter Langfelder, Bin Zhang, and Jun Dong). Much of the work arose from collaborations with applied researchers. In particular, weighted correlation networks were developed in joint discussions with cancer researchers Paul Mischel and Stanley F. Nelson and neuroscientists Daniel H. Geschwind and Michael C. Oldham (according to the acknowledgement section in <ref name="Horvath2011"/>). There is a vast literature on dependency networks, scale-free networks and co-expression networks.
 
== Comparison between weighted and unweighted correlation networks ==
A weighted correlation network can be interpreted as a special case of a [[weighted network]], [[dependency network]] or correlation network. Weighted correlation network analysis can be attractive for the following reasons:
* The network construction (based on soft thresholding the [[correlation coefficient]]) preserves the continuous nature of the underlying correlation information. For example, weighted correlation networks that are constructed on the basis of correlations between numeric variables do not require the choice of a hard threshold. Dichotomizing information and (hard) thresholding may lead to information loss.<ref name="Zhang2005"/>
* The network construction yields results that are highly robust with respect to different choices of the soft threshold.<ref name="Zhang2005"/> In contrast, results based on unweighted networks, constructed by thresholding a pairwise association measure, often strongly depend on the threshold.
* Weighted correlation networks facilitate a geometric interpretation based on the angular interpretation of the correlation (chapter 6 in <ref name="Horvath2008"/>).
* The resulting network statistics can be used to enhance standard data-mining methods such as cluster analysis, since (dis)similarity measures can often be transformed into weighted networks<ref name="Oldham2012"/> (chapter 6 in <ref name="Horvath2008"/>).
* WGCNA provides module preservation statistics that quantify whether a module found in one condition can also be found in another condition. Module preservation statistics also allow one to study differences between the modular structure of networks.<ref name="Langfelder2011"/>
* Weighted networks and correlation networks can often be approximated by "factorizable" networks.<ref name="Horvath2008"/><ref name="Dong2007"/> Such approximations are often difficult to achieve for sparse, unweighted networks. Therefore, weighted (correlation) networks allow for a parsimonious parametrization in terms of modules and module membership (chapters 2 and 6 in <ref name="Horvath2011"/>).<ref name="Ranola2013"/>
 
== Method ==
First, one defines a gene co-expression similarity measure which is used to define the network. We denote the gene co-expression similarity measure of a pair of genes i and j by <math>s_{ij}</math>. Many co-expression studies use the absolute value of the correlation as an unsigned co-expression similarity measure,
 
<math> s^{unsigned}_{ij}=|cor(x_i,x_j)|</math>
 
where the gene expression profiles <math>x_{i}</math> and <math>x_{j}</math> consist of the expression of genes i and j across multiple samples. However, using the absolute value of the correlation may obfuscate biologically relevant information, since no distinction is made between gene repression and activation. In contrast, in signed networks the similarity between genes reflects the sign of the correlation of their expression profiles. To define a signed co-expression measure between gene expression profiles <math>x_{i}</math> and <math>x_{j}</math>, one can use a simple transformation of the correlation:
 
<math> s^{signed}_{ij}=0.5+0.5 cor(x_i,x_j)</math>
 
Like the unsigned measure <math> s^{unsigned}_{ij}</math>, the signed similarity <math> s^{signed}_{ij}</math> takes on a value between 0 and 1. Note that the unsigned similarity between two oppositely expressed genes (<math>cor(x_i,x_j) = -1</math>) equals 1, while the signed similarity equals 0. Similarly, while the unsigned co-expression measure of two genes with zero correlation remains zero, the signed similarity equals 0.5.
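
The two similarity measures can be compared on toy data. The following is a minimal sketch in plain Python; the helper names (<code>pearson</code>, <code>unsigned_similarity</code>, <code>signed_similarity</code>) and the expression values are hypothetical and are not taken from the WGCNA R package.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def unsigned_similarity(x, y):
    """s_ij = |cor(x_i, x_j)|"""
    return abs(pearson(x, y))

def signed_similarity(x, y):
    """s_ij = 0.5 + 0.5 * cor(x_i, x_j)"""
    return 0.5 + 0.5 * pearson(x, y)

# Hypothetical expression profiles across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]    # perfectly anti-correlated with gene_a
gene_c = [1.0, -1.0, -1.0, 1.0]  # zero correlation with gene_a

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    print(name,
          "unsigned:", unsigned_similarity(gene_a, other),
          "signed:", signed_similarity(gene_a, other))
```

For the anti-correlated pair the unsigned measure is at its maximum while the signed measure is at its minimum, which illustrates why the choice between the two matters when repression and activation should be distinguished.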
 
Next, an adjacency matrix (network), <math>A=[a_{ij}] </math>, is used to quantify how strongly genes are connected to one another. <math>A </math> is defined by thresholding the co-expression similarity matrix <math> S = [s_{ij}] </math>. 'Hard' thresholding (dichotomizing) the similarity measure <math> S </math> results in an unweighted gene co-expression network. Specifically, an unweighted network adjacency is defined to be 1 if <math>s_{ij}>\tau </math> and 0 otherwise.
Because hard thresholding encodes gene connections in a binary fashion, it can be sensitive to the choice of the threshold and result in the loss of co-expression information.<ref name="Zhang2005"/> The continuous nature of the co-expression information can be preserved by employing soft thresholding, which results in a weighted network. Specifically, WGCNA uses the following power function to assess the connection strength between genes:
 
<math>  a_{ij} = (s_{ij})^\beta </math>
 
where the power <math> \beta </math> is the soft thresholding parameter. The default values <math> \beta=6 </math> and <math> \beta=12 </math> are used for unsigned and signed networks, respectively. Alternatively, <math> \beta </math> can be chosen using the [[Scale-free network|scale-free topology]] criterion, which amounts to choosing the smallest value of <math> \beta </math> such that approximate scale-free topology is reached.<ref name="Zhang2005"/>
 
Since <math> \log (a_{ij}) = \beta \log (s_{ij}) </math>, the weighted network adjacency is linearly related to the co-expression similarity on a logarithmic scale. Because similarities lie between 0 and 1, a high power <math> \beta </math> pushes low similarities towards 0 while keeping high similarities comparatively high. Since this soft-thresholding procedure applied to a pairwise correlation matrix leads to a weighted adjacency matrix, the ensuing analysis is referred to as weighted gene co-expression network analysis.
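
The difference between hard and soft thresholding can be illustrated with a few hypothetical similarity values. In this sketch the cutoff <code>tau = 0.8</code> is an arbitrary choice made for illustration, and <code>beta = 6</code> is the default for unsigned networks mentioned above; the function names are likewise hypothetical.

```python
from math import log, isclose

def hard_threshold(s, tau):
    """Unweighted adjacency: 1 if s > tau, else 0."""
    return 1 if s > tau else 0

def soft_threshold(s, beta):
    """Weighted adjacency: a = s ** beta."""
    return s ** beta

similarities = [0.2, 0.5, 0.79, 0.81, 0.95]  # hypothetical s_ij values
tau = 0.8   # illustrative hard threshold (an assumption, not a recommended value)
beta = 6    # default soft-thresholding power for unsigned networks

unweighted = [hard_threshold(s, tau) for s in similarities]
weighted = [soft_threshold(s, beta) for s in similarities]

for s, u, w in zip(similarities, unweighted, weighted):
    # log(a) = beta * log(s) holds for every soft-thresholded value
    assert isclose(log(w), beta * log(s))
    print(f"s={s:.2f}  hard={u}  soft={w:.4f}")
```

The similarities 0.79 and 0.81 end up on opposite sides of the hard threshold even though they are nearly identical, whereas their soft-thresholded adjacencies remain close to each other.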
 
A major step in the module-centric analysis is to cluster genes into network modules using a network proximity measure. Roughly speaking, a pair of genes has a high proximity if it is closely interconnected. By convention, the maximal proximity between two genes is 1 and the minimum proximity is 0. Typically, WGCNA uses the topological overlap measure (TOM) as proximity,<ref name="Ravasz2002"/><ref name="Yip2007"/> which can also be defined for weighted networks.<ref name="Zhang2005"/> The TOM combines the adjacency of two genes and the connection strengths these two genes share with other "third party" genes. The TOM is a highly robust measure of network interconnectedness (proximity). This proximity is used as input for average linkage hierarchical clustering. Modules are defined as branches of the resulting cluster tree using the dynamic branch cutting approach.<ref name="Langfelder2007"/>
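
The article does not state the TOM formula. The sketch below assumes the commonly cited weighted form, <math>TOM_{ij} = (\sum_{u \neq i,j} a_{iu}a_{uj} + a_{ij}) / (\min(k_i,k_j) + 1 - a_{ij})</math> with connectivity <math>k_i = \sum_{u \neq i} a_{iu}</math>; the function name and the adjacency values are hypothetical.

```python
def topological_overlap(adj):
    """Weighted topological overlap matrix for a symmetric adjacency matrix
    with entries in [0, 1] (assumed formula, see the lead-in)."""
    n = len(adj)
    # connectivity k_i: sum of adjacencies to all other genes
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # a gene fully overlaps with itself
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # connection strength shared through "third party" genes u
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Hypothetical adjacencies: genes 0-2 are tightly connected, gene 3 is peripheral.
adj = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.9, 0.1],
    [0.7, 0.9, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
tom = topological_overlap(adj)
# 1 - TOM can serve as the dissimilarity passed to hierarchical clustering
dissimilarity = [[1.0 - t for t in row] for row in tom]
```

In this example the overlap between genes 0 and 1 is much larger than between genes 0 and 3, so a clustering based on <code>dissimilarity</code> would be expected to group genes 0 to 2 together.
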
Next, the genes inside a given module are summarized with the module eigengene, which can be considered the best summary of the standardized module expression data.<ref name="Horvath2008"/> The module eigengene of a given module is defined as the first principal component of the standardized expression profiles. To find modules that relate to a clinical trait of interest, module eigengenes are correlated with the trait, which gives rise to an eigengene significance measure. One can also construct co-expression networks between module eigengenes (eigengene networks), i.e. networks whose nodes are modules.<ref name="Langfelder2007Eigengene"/>
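
As a rough illustration, the first principal component can be approximated by power iteration on the standardized profiles. This is a minimal sketch and not the implementation used by the WGCNA R package; the function names, the fixed iteration count, the sign convention (orienting the eigengene along the average standardized profile) and all data values are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def standardize(profile):
    """Center to mean 0 and scale to unit sample variance (profile must vary)."""
    n = len(profile)
    m = sum(profile) / n
    sd = sqrt(sum((v - m) ** 2 for v in profile) / (n - 1))
    return [(v - m) / sd for v in profile]

def module_eigengene(expr, n_iter=100):
    """Approximate first principal component (one value per sample) of the
    standardized gene profiles in expr (a list of per-gene lists)."""
    x = [standardize(g) for g in expr]
    n_samples = len(x[0])
    v = list(x[0])  # start from the first standardized profile
    for _ in range(n_iter):
        scores = [sum(g[s] * v[s] for s in range(n_samples)) for g in x]  # X v
        w = [sum(scores[i] * x[i][s] for i in range(len(x)))
             for s in range(n_samples)]                                   # X^T (X v)
        norm = sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # assumed sign convention: align with the average standardized profile
    avg = [sum(g[s] for g in x) / len(x) for s in range(n_samples)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-c for c in v]
    return v

# Hypothetical module of three co-expressed genes measured in five samples.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.8],
    [0.9, 2.2, 2.8, 4.1, 5.2],
]
trait = [0.0, 0.0, 1.0, 1.0, 1.0]  # hypothetical clinical trait

me = module_eigengene(expr)
significance = pearson(me, trait)  # eigengene significance for this trait
print("eigengene:", [round(c, 3) for c in me])
print("eigengene significance:", round(significance, 3))
```

Because the three profiles are nearly collinear, the eigengene closely tracks each of them, and its correlation with the trait summarizes the association of the whole module.
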
To identify intramodular hub genes inside a given module, one can use two types of connectivity measures. The first, referred to as <math>kME_i=cor(x_i,ME) </math>, is defined by correlating each gene with the respective module eigengene. The second, referred to as kIN, is defined as the sum of adjacencies with respect to the module genes. In practice, these two measures are equivalent.<ref name="Horvath2008"/>
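
Both connectivity measures are simple to compute once the adjacency matrix and the module eigengene are available. The sketch below uses hypothetical function names, a hypothetical adjacency matrix and a hypothetical, precomputed eigengene vector; it only illustrates the two definitions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def kme(profile, eigengene):
    """Eigengene-based connectivity: kME_i = cor(x_i, ME)."""
    return pearson(profile, eigengene)

def kin(adj, module, i):
    """Intramodular connectivity: sum of adjacencies between gene i
    and the other genes of its module."""
    return sum(adj[i][j] for j in module if j != i)

# Hypothetical weighted adjacency matrix; genes 0-2 form the module.
adj = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.9, 0.1],
    [0.7, 0.9, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
module = [0, 1, 2]
kin_values = {i: kin(adj, module, i) for i in module}
hub = max(module, key=lambda i: kin_values[i])  # gene with the highest kIN

# Hypothetical precomputed module eigengene and two expression profiles.
eigengene = [-2.0, -1.0, 0.0, 1.0, 2.0]
profile_x = [1.0, 2.0, 3.0, 4.0, 5.0]
profile_y = [1.0, 3.0, 2.0, 5.0, 4.0]
print("kIN:", kin_values, "hub gene:", hub)
print("kME:", kme(profile_x, eigengene), kme(profile_y, eigengene))
```
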
To test whether a module is preserved in another data set, one can use various network statistics, e.g. <math>Z_{summary}</math>.<ref name="Langfelder2011"/>
 
== Applications ==
WGCNA has been widely used for analyzing gene expression data (i.e. transcriptional data), for example to find intramodular hub genes.<ref name="Langfelder2013"/><ref name="Horvath2006"/>
 
It is often used as a data reduction step in systems genetic applications where modules are represented by "module eigengenes".<ref name="Chen2008"/><ref name="Plaisier2009"/> Module eigengenes can be used to correlate modules with clinical traits. Eigengene networks are co-expression networks between module eigengenes (i.e. networks whose nodes are modules).
WGCNA is widely used in neuroscientific applications<ref name="Voineagu2011"/><ref name="Hawrylycz2012"/> and for analyzing genomic data including [[microarray]] data, single-cell [[RNA seq]] data,<ref name="Xue2013"/> [[DNA methylation]] data,<ref name="Horvath2012aging"/> miRNA data, peptide counts<ref name="Shirasaki2012"/> and [[Human Microbiome Project|microbiota]] data (16S rRNA gene sequencing).<ref name="Tong2013"/> Other applications include brain imaging data, e.g. [[FMRI|functional MRI]] data.<ref name="Mumford2010"/>
 
== R software package ==
The WGCNA [[R (programming language)|R software]] package<ref name="Langfelder2008"/> provides functions for carrying out all aspects of weighted network analysis (module construction, hub gene selection, module preservation statistics, differential network analysis, network statistics). The WGCNA package is available from the Comprehensive [[R (programming language)|R]] Archive Network (CRAN), the standard repository for R add-on packages.
 
== References ==
{{reflist|2|refs=
 
<ref name="Zhang2005">Zhang B, Horvath S (2005) "A General Framework for Weighted Gene Co-Expression Network Analysis", Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 17 PMID 16646834 [http://dibernardo.tigem.it/files/papers/2008/zhangbin-statappsgeneticsmolbio.pdf]</ref>
 
<ref name="Horvath2006">Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM, Laurance MF, Zhao W, Shu, Q, Lee Y, Scheck AC, Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum HI, Cloughesy TF, Nelson SF, Mischel PS (2006) "Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target", PNAS November 14, 2006 vol. 103  no. 46 17402-17407</ref>
 
<ref name="Dong2007">Dong J, Horvath S (2007) Understanding Network Concepts in Modules, BMC Systems Biology 2007, 1:24 PMID 17547772 PMCID: PMC3238286
[http://www.biomedcentral.com/1752-0509/1/24  BMC Systems Biology]</ref><ref name="Horvath2008">Horvath S, Dong J (2008) Geometric Interpretation of Gene Coexpression Network Analysis. PLoS Comput Biol 4(8): e1000117 PMID 18704157 PMCID: PMC2446438 [http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000117  Plos Comp Biol]</ref>
 
<ref name="Horvath2011">Horvath S (2011). Weighted Network Analysis: Applications in Genomics and Systems Biology. Springer Book.  1st Edition., 2011, XXII, 414 p Hardcover ISBN 978-1-4419-8818-8 [http://www.springer.com/new+&+forthcoming+titles+(default)/book/978-1-4419-8818-8?changeHeader|Springer website]</ref><ref name="Langfelder2007">Langfelder P, Zhang B, Horvath S (2007) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut library for R. Bioinformatics. November/btm563 PMID 18024473 [http://bioinformatics.oxfordjournals.org/content/24/5/719.abstract  Bioinformatics]</ref>
 
<ref name="Langfelder2007Eigengene">Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 2007, 1:54 PMID 18031580 [http://www.biomedcentral.com/1752-0509/1/54/abstract  BMC Systems Biology]</ref>
 
<ref name="Yip2007">Yip A, Horvath S (2007) Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics 2007, 8:22 PMID 17250769 PMCID: PMC1797055 [http://www.biomedcentral.com/content/pdf/1471-2105-8-22.pdf  BMC Bioinformatics]</ref>
 
<ref name="Ravasz2002">Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierarchical
organization of modularity in metabolic networks. Science 2002, 297(5586):1551-1555.</ref>
 
<ref name="Langfelder2008">Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9:559 PMID 19114008 PMCID: PMC2631488 [http://www.biomedcentral.com/1471-2105/9/559  BMC Bioinformatics]</ref>
 
<ref name="Langfelder2011">Langfelder P, Luo R, Oldham MC, Horvath S (2011) Is my network module preserved and reproducible? PloS Comp Biol. 7(1): e1001057 PMID 21283776 PMCID:PMC3024255 [http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1001057  PloS Comp Biol]</ref>
 
<!--ref name="Song2013">Song L, Langfelder P, Horvath S (2013) Random generalized linear model: a highly accurate and interpretable ensemble predictor. BMC Bioinformatics 14:5 PMID 23323760 DOI: 10.1186/1471-2105-14-5.[ http://www.biomedcentral.com/1471-2105/14/5 |BMC Bioinformatics]</ref-->
 
<ref name="Ranola2013">Ranola JM, Langfelder P, Lange K, Horvath S Cluster and propensity based approximation of a network. BMC Syst Biol. 2013 Mar 14;7(1):21 PMID 23497424 [http://www.biomedcentral.com/1752-0509/7/21/ BMC Systems Biology]</ref>
 
<ref name="Langfelder2013">Langfelder P, Mischel PS, Horvath S (2013) When Is Hub Gene Selection Better than Standard Meta-Analysis? PLoS ONE 8(4): e61505. doi:10.1371/journal.pone.0061505 PMCID: PMC3629234 [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0061505  PloS One]</ref>
 
<ref name="Oldham2012">Oldham MC, Langfelder P, Horvath S (2012) Network methods for describing sample relationships in genomic datasets: application to Huntington's disease. BMC Syst Biol. 2012 Jun 12;6(1):63. PMID 22691535 46(11) 1-17</ref>
 
<!--ref name="Oldham2008">Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH (2008) Functional organization of the transcriptome in human brain. Nature Neuroscience. Nature Neuroscience 11, 1271 - 1282 (2008) doi:10.1038 nn.2207 [http://www.nature.com/neuro/journal/v11/n11/abs/nn.2207.html  Nature Neuroscience]</ref-->
 
<ref name="Chen2008">Chen Y, Zhu J, Lum PY, Yang X, Pinto S, MacNeil DJ, Zhang C, Lamb J, Edwards S, Sieberts SK, Leonardson A, Castellini LW, Wang S, Champy MF, Zhang B, Emilsson V, Doss S, Ghazalpour A, Horvath S, Drake TA, Lusis AJ, Schadt EE. Variations in DNA elucidate molecular networks that cause disease. Nature. 2008 Mar 27;452(7186):429-35.</ref>
 
<ref name="Plaisier2009">Plaisier CL, Horvath S, Huertas-Vazquez A, Cruz-Bautista I, Herrera MF, Tusie-Luna T, Aguilar-Salinas C, Pajukanta P (2009) A systems genetics approach implicates USF1, FADS3 and other causal candidate genes for familial combined hyperlipidemia. PloS Genetics;5(9):e1000642</ref>
 
<ref name="Voineagu2011">Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S, Mill J, Cantor R, Blencowe BJ, Geschwind DH (2011) Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature. PMID 21614001</ref>
 
<ref name="Hawrylycz2012">Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, Miller JA, van de Lagemaat LN, Smith KA, Ebbert A, Riley ZL, Abajian C, Beckmann CF, Bernard A, Bertagnolli D, Boe AF, Cartagena PM, Chakravarty MM, Chapin M, Chong J, Dalley RA, Daly BD, Dang C, Datta S, et al, Koch C, Grant SG, Jones AR (2012) An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 2012 Sep 20;489(7416):391-9. doi: 10.1038/nature11405. PMID 22996553 [http://www.nature.com/nature/journal/v489/n7416/full/nature11405.html  Nature]</ref>
 
<ref name="Mumford2010">Mumford JA, Horvath S, Oldham MC, Langfelder P, Geschwind DH, Poldrack RA (2010) Detecting network modules in fMRI time series: A weighted network analysis approach. Neuroimage. 2010 Oct 1;52(4):1465-1476. Epub 2010 May 27. PMID 20553896. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3632300/  PMC]</ref><ref name="Xue2013">Xue Z, Huang K, Cai C, Cai L, Jiang CY, Feng Y, Liu Z, Zeng Q, Cheng L, Sun YE, Liu JY, Horvath S, Fan G. (2013) Genetic programs in human and mouse early embryos revealed by single-cell RNA-sequencing. Nature. 2013 Jul 28. doi: 10.1038/nature12364 PMID 23892778 [http://www.nature.com/nature/journal/v500/n7464/full/nature12364.html  Nature]</ref>
 
<ref name="Shirasaki2012">Shirasaki DI, Greiner ER, Al-Ramahi I, Gray M, Boontheung P, Geschwind DH, Botas J, Coppola G, Horvath S, Loo JA, Yang XW. (2012) Network organization of the huntingtin proteomic interactome in Mammalian brain. Neuron. 2012 Jul 12;75(1):41-57. PMID 22794259 [http://www.cell.com/neuron/retrieve/pii/S0896627312005132  Neuron]</ref>
 
<ref name="Horvath2012aging">Horvath S, Zhang Y, Langfelder P, Kahn RS, Boks MP, van Eijk K, van den Berg LH, Ophoff RA. Aging effects on DNA methylation modules in human brain and blood tissue. Genome Biol. 2012 Oct 3;13(10):R97. PMID 23034122 [http://genomebiology.com/2012/13/10/R97/abstract  Genome Biology]</ref>
 
<ref name="Tong2013">Tong M, Li X, Wegener Parfrey L, Roth B, Ippoliti A, Wei B, Borneman J, McGovern DP, Frank DN, Li E, Horvath S, Knight R, Braun J (2013) A modular organization of the human intestinal mucosal microbiota and its association with inflammatory bowel disease. PLoS One. 2013 Nov 19;8(11):e80702. doi: 10.1371/journal.pone.0080702. PMID 24260458 [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3834335/  PMC]</ref>
}}
 
[[Category:Bioinformatics]]
[[Category:Data mining]]

Latest revision as of 16:52, 2 July 2014
