|
|
Line 1: |
Line 1: |
| This page describes '''[[Data mining|mining]] for [[molecule]]s'''. Since molecules may be represented by [[molecular graph]]s this is strongly related to [[graph mining]] and [[structured data mining]]. The main problem is how to represent molecules while discriminating the data instances. One way to do this is chemical similarity [[Metric (mathematics)|metrics]], which has a long tradition in the field of [[cheminformatics]].
| | Let me initial begin by introducing myself. My name is Boyd Butts although it is not over the counter std test ([http://www.escuelavirtual.Registraduria.gov.co/user/view.php?id=140944&course=1 Learn More Here]) name on my beginning certificate. One of the very best issues in the globe for me is to do aerobics and now I'm trying to make cash with it. California is our beginning location. I am a meter reader. |
| | |
| Typical approaches to calculate chemical similarities use chemical fingerprints, but this loses the underlying information about the [[topology (chemistry)|molecule topology]]. Mining the molecular graphs directly
| |
| avoids this problem. So does the [[inverse QSAR problem]] which is preferable for vectorial mappings.
| |
| | |
| ==Coding(Molecule<sub>i</sub>,Molecule<sub>j<math>\neq</math>i</sub>)==
| |
| | |
| ===Kernel methods===
| |
| * Marginalized [[graph kernel]]<ref name="kti03">H. Kashima, K. Tsuda, A. Inokuchi, Marginalized Kernels Between Labeled Graphs, The 20th International Conference on Machine Learning (ICML2003), 2003. PDF</ref>
| |
| * Optimal assignment kernel<ref name="fwz05b"> H. Fröhlich, J. K. Wegner, A. Zell, ''Optimal Assignment Kernels For Attributed Molecular Graphs'', The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, '''2005''', 225-232. PDF</ref><ref name="fwz06"> H. Fröhlich, J. K. Wegner, A. Zell, ''Kernel Functions for Attributed Molecular Graphs - A New Similarity Based Approach To ADME Prediction in Classification and Regression'', QSAR Comb. Sci., '''2006''', 25, 317-326. {{doi|10.1002/qsar.200510135}}</ref><ref name="fwz05a"> H. Fröhlich, J. K. Wegner, A. Zell, ''Assignment Kernels For Chemical Compounds'', International Joint Conference on Neural Networks 2005 (IJCNN'05), '''2005''', 913-918. CiteSeer</ref>
| |
| * Pharmacophore kernel<ref name="mrsv06">P. Mahe, L. Ralaivola, V. Stoven, J. Vert, ''The pharmacophore kernel for virtual screening with support vector machines'', J Chem Inf Model, '''2006''', 46, 2003-2014. {{doi|10.1021/ci060138m}}</ref>
| |
| * [http://www.bioinf.jku.at/software/Rchemcpp/ C++ (and R) implementation] combining
| |
| ** the marginalized graph kernel between labeled graphs<ref name="kti03"/>
| |
| ** extensions of the marginalized kernel<ref name="Mahe2004">{{cite journal |
| |
| author = P. Mahé, N. Ueda, T. Akutsu, J.-L. Perret and P. Vert,
| |
| J.-P. | title = Extensions of marginalized graph kernels | journal = Proceedings of the 21st ICML | year = 2004 | pages = 552–559 }}</ref>
| |
| ** Tanimoto kernels<ref name="Ralaivola2005">{{cite journal | author =L. Ralaivola, S. J. Swamidass, S. Hiroto and P. Baldi| title =Graph kernels for chemical informatics | journal = Neural Networks | year = 2005 | volume = 18 | pages = 1093–1110 | url = http://www.sciencedirect.com/science/article/pii/S0893608005001693 | doi=10.1016/j.neunet.2005.07.009}}</ref>
| |
| ** graph kernels based on tree patterns<ref name="Mahe2009">{{cite journal | author = P. Mahé and J.-P. Vert| title = Graph kernels based on tree patterns for molecules | journal = Machine Learning | volume = 75 | number = 1 | year = 2009 | issn = 0885-6125 |pages = 3–35 | url = http://dx.doi.org/10.1007/s10994-008-5086-2 | doi = 10.1007/s10994-008-5086-2 }}</ref>
| |
| ** kernels based on pharmacophores for 3D structure of molecules<ref name="mrsv06"/>
| |
| | |
| ===Maximum Common Graph methods===
| |
| * [[Maximum common subgraph isomorphism problem|MCS]]-HSCS<ref name="wfmz06"> J. K. Wegner, H. Fröhlich, H. Mielenz, A. Zell, Data and Graph Mining in Chemical Space for ADME and Activity Data Sets, QSAR Comb. Sci., 2006, 25, 205-220. {{doi|10.1002/qsar.200510009}}</ref> (Highest Scoring Common Substructure (HSCS) ranking strategy for single MCS)
| |
| * [[Small Molecule]] Subgraph Detector [[Small Molecule Subgraph Detector|(SMSD)]]<ref name="SMSD09">S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader and J. M. Thornton, Small Molecule Subgraph Detector (SMSD) toolkit, Journal of Cheminformatics 2009, 1:12. {{doi|10.1186/1758-2946-1-12}}</ref>- is a Java based software library for calculating Maximum Common Subgraph (MCS) between small molecules. This will help us to find similarity/distance between two molecules. MCS is also used for screening drug like compounds by hitting molecules, which share common subgraph (substructure).<ref name="SMSD">http://www.ebi.ac.uk/thornton-srv/software/SMSD/</ref>
| |
| | |
| ==Coding(Molecule<sub>i</sub>)==
| |
| | |
| ===Molecular query methods===
| |
| * Warmr<ref name="ksd01">R. D. King, A. Srinivasan, L. Dehaspe, ''Wamr: a data mining tool for chemical data'', J. Comput.-Aid. Mol. Des., '''2001''', 15, 173-181. {{doi|10.1023/A:1008171016861}}</ref><ref name="dtk98">L. Dehaspe, H. Toivonen, King, ''Finding frequent substructures in chemical compounds'', 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press., 1998, 30-36.</ref>
| |
| * AGM<ref name="iwom01">A. Inokuchi, T. Washio, T. Okada, H. Motoda, ''Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis'', Journal of Computer Aided Chemistry, '''2001''', 2, 87-92.</ref><ref name="iwnm02">A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, ''A Fast Algorithm for Mining Frequent Connected Subgraphs'', IBM Research, Tokyo Research Laboratory, '''2002'''.</ref>
| |
| * PolyFARM<ref name="ck03">A. Clare, R. D. King, ''Data mining the yeast genome in a lazy functional language'', Practical Aspects of Declarative Languages (PADL2003), '''2003'''.</ref>
| |
| * FSG<ref name="fsg04">M. Kuramochi, G. Karypis, ''An Efficient Algorithm for Discovering Frequent Subgraphs'', IEEE Transactions on Knowledge and Data Engineering, '''2004''', 16(9), 1038-1051.</ref><ref name="fsg05">M. Deshpande, M. Kuramochi, N. Wale, G. Karypis, ''Frequent Substructure-Based Approaches for Classifying Chemical Compounds'', IEEE Transactions on Knowledge and Data Engineering, '''2005''', 17(8), 1036-1050.</ref>
| |
| * MolFea<ref name="hckr04"> C. Helma, T. Cramer, S. Kramer, L. de Raedt, ''Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds'', J. Chem. Inf. Comput. Sci., '''2004''', 44, 1402-1411. {{doi|10.1021/ci034254q}}</ref>
| |
| * MoFa/MoSS<ref name="mbb04">T. Meinl, C. Borgelt, M. R. Berthold, ''Discriminative Closed Fragment Mining and Perfect Extensions in MoFa'', Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), '''2004'''.</ref><ref name="mbbp04">T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, ''Mining Fragments with Fuzzy Chains in Molecular Databases'', Second International Workshop on Mining Graphs, Trees and Sequences (MGTS2004), '''2004'''.</ref><ref name="mb04">T. Meinl, M. R. Berthold, ''Hybrid Fragment Mining with MoFa and FSG'', Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), '''2004'''.</ref>
| |
| * Gaston<ref name="nk04">S. Nijssen, J. N. Kok. ''Frequent Graph Mining and its Application to Molecular Databases'', Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), '''2004'''.</ref>
| |
| * LAZAR<ref name="hel05">C. Helma, Predictive Toxicology, CRC Press, '''2005'''.</ref>
| |
| * ParMol<ref name="woe06">M. Wörlein, ''Extension and parallelization of a graph-mining-algorithm'', Friedrich-Alexander-Universität, '''2006'''. PDF</ref> (contains MoFa, FFSM, gSpan, and Gaston)
| |
| * optimized gSpan<ref name="jk05">K. Jahn, S. Kramer, ''Optimizing gSpan for Molecular Datasets'', Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), '''2005'''.</ref><ref name="yh02a">X. Yan, J. Han, ''gSpan: Graph-Based Substructure Pattern Mining'', Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, '''2002''', 721-724.</ref>
| |
| * SMIREP<ref name="kr06">A. Karwath, L. D. Raedt, ''SMIREP: predicting chemical activity from SMILES'', J Chem Inf Model, '''2006''', 46, 2432-2444. {{doi|10.1021/ci060159g}}</ref>
| |
| * DMax<ref name="ahlcvm06">H. Ando, L. Dehaspe, W. Luyten, E. Craenenbroeck, H. Vandecasteele, L. Meervelt, ''Discovering H-Bonding Rules in Crystals with Inductive Logic Programming'', Mol Pharm, '''2006''', 3, 665-674 . {{doi|10.1021/mp060034z}}</ref>
| |
| * SAm/AIm/RHC<ref name="mtsg06">P. Mazzatorta, L. Tran, B. Schilter, M. Grigorov, ''Integration of Structure-Activity Relationship and Artificial Intelligence Systems To Improve in Silico Prediction of Ames Test Mutagenicity'', J. Chem. Inf. Model., '''2006''', ''ASAP alert''. {{doi|10.1021/ci600411v}}</ref>
| |
| * AFGen<ref name="afgen06">N. Wale, G. Karypis. ''Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification'', ICDM, ''''2006'', 678-689.</ref>
| |
| * gRed<ref name="gago08">A. Gago Alonso, J.E. Medina Pagola, J.A. Carrasco-Ochoa and J.F. Martínez-Trinidad ''Mining Connected Subgraph Mining Reducing the Number of Candidates'', In Proc. of ECML--PKDD, pp. 365–376, ''2008''.</ref>
| |
| * G-Hash<ref name="wang10">Xiaohong Wang, Jun Huan , Aaron Smalter, Gerald Lushington, ''Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases '', in BMC Bioinformatics Vol. 11 (Suppl 3):S8 ''2010''.</ref>
| |
| | |
| ===Methods based on special architectures of neural networks===
| |
| * BPZ<ref>{{cite journal | last = Baskin | first = I. I. | coauthors = V. A. Palyulin and N. S. Zefirov | title = [A methodology for searching direct correlations between structures and properties of organic compounds by using computational neural networks] | journal = [[Doklady Akademii Nauk SSSR]] | year = 1993 | volume = 333 | issue = 2 | pages = 176–179}}</ref><ref>{{cite journal | author = I. I. Baskin, V. A. Palyulin, N. S. Zefirov | title = A Neural Device for Searching Direct Correlations between Structures and Properties of Organic Compounds | journal = J. Chem. Inf. Comput. Sci. | year = 1997 | volume = 37 | issue = 4 | pages = 715–721 | doi = 10.1021/ci940128y}}</ref>
| |
| * ChemNet<ref>{{cite journal | author = D. B. Kireev | title = ChemNet: A Novel Neural Network Based Method for Graph/Property Mapping | journal = J. Chem. Inf. Comput. Sci. | year = 1995 | volume = 35 | issue = 2 | pages = 175–180 | doi = 10.1021/ci00024a001}}</ref>
| |
| * CCS<ref>{{cite journal | doi = 10.1023/A:1008368105614 | author = A. M. Bianucci | last2 = Micheli | first2 = Alessio | last3 = Sperduti | first3 = Alessandro | last4 = Starita | first4 = Antonina | title = Application of Cascade Correlation Networks for Structures to Chemistry | journal = Applied Intelligence | year = 2000 | volume = 12 | issue = 1-2 | pages = 117–146}}</ref><ref>{{cite journal | author = A. Micheli, A. Sperduti, A. Starita, A. M. Bianucci | title = Analysis of the Internal Representations Developed by Neural Networks for Structures Applied to Quantitative Structure-Activity Relationship Studies of Benzodiazepines | journal = J. Chem. Inf. Comput. Sci. | year = 2001 | volume = 41 | issue = 1 | pages = 202–218 | doi = 10.1021/ci9903399 | pmid = 11206375}}</ref>
| |
| * MolNet<ref>{{cite journal | author = O. Ivanciuc | title = Molecular Structure Encoding into Artificial Neural Networks Topology | journal = Roumanian Chemical Quarterly Reviews | year = 2001 | volume = 8 | pages = 197–220}}</ref>
| |
| * Graph machines<ref>{{cite journal | author = A. Goulon, T. Picot, A. Duprat, G. Dreyfus | title = Predicting activities without computing descriptors: Graph machines for QSAR | journal = SAR and QSAR in Environmental Research | year = 2007 | volume = 18 | issue = 1-2 | pages = 141–153 | doi = 10.1080/10629360601054313 | pmid = 17365965}}</ref>
| |
| | |
| == See also ==
| |
| * [[Molecular Query Language]]
| |
| * [[Chemical graph theory]]
| |
| | |
| ==References==
| |
| {{reflist|2}}
| |
| | |
| ===Further reading===
| |
| * Schölkopf, B., K. Tsuda and J. P. Vert: ''Kernel Methods in Computational Biology'', MIT Press, Cambridge, MA, '''2004'''.
| |
| * R.O. Duda, P.E. Hart, D.G. Stork, ''Pattern Classification'', John Wiley & Sons, '''2001'''. ISBN 0-471-05669-3
| |
| * Gusfield, D., ''Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology'', Cambridge University Press, '''1997'''. ISBN 0-521-58519-8
| |
| * R. Todeschini, V. Consonni, ''Handbook of Molecular Descriptors'', Wiley-VCH, '''2000'''. ISBN 3-527-29913-0
| |
| | |
| ==See also==
| |
| *[[QSAR]]
| |
| *[[ADME]]
| |
| *[[partition coefficient]]
| |
| | |
| ==External links==
| |
| * [http://www.ebi.ac.uk/thornton-srv/software/SMSD/ Small Molecule Subgraph Detector (SMSD)] - is a Java based software library for calculating Maximum Common Subgraph (MCS) between small molecules.
| |
| * [http://mlg07.dsi.unifi.it 5th International Workshop on Mining and Learning with Graphs, 2007]
| |
| * [http://miningdrugs.blogspot.com/2007/01/molecule-mining-review-2006.html Overview for 2006]
| |
| * [http://hms.liacs.nl/molecules.html Molecule mining (basic chemical expert systems)]
| |
| * [http://www2.informatik.uni-erlangen.de/Forschung/Projekte/ParMol/?language=en ParMol] and [http://www2.informatik.uni-erlangen.de/Lehre/SA-DA/download/DA-simawoer.pdf?language=en master thesis documentation] - Java - Open source - Distributed mining - Benchmark algorithm library
| |
| * [http://wwwkramer.in.tum.de/ TU München - Kramer group]
| |
| * [http://joelib.sourceforge.net/wiki/index.php/Structured_Data_Mining Molecule mining (advanced chemical expert systems)]
| |
| * [http://www.pharmadm.com/DMaxChemistryAssistant.asp DMax Chemistry Assistant] - commercial software
| |
| * [http://glaros.dtc.umn.edu/gkhome/afgen/overview AFGen] - Software for generating fragment-based descriptors
| |
| | |
| [[Category:Cheminformatics]]
| |
| [[Category:Computational chemistry]]
| |
| [[Category:Data mining]]
| |