Molecule mining
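Molecule mining is data mining applied to molecules. Because a molecule can be represented by a molecular graph, molecule mining is closely related to graph mining and structured data mining. The central problem is how to represent molecules so that individual data instances can be discriminated. One approach is chemical similarity measures, which have a long tradition in cheminformatics.
Typical methods for calculating chemical similarity use chemical fingerprints, but fingerprints discard information about the underlying topology of the molecule. Mining the molecular graphs directly avoids this problem. The inverse QSAR problem likewise applies to vectorial mappings.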
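The loss of topology can be made concrete with a toy fingerprint. The sketch below assumes hydrogen-suppressed molecules stored as a hypothetical `(atoms, bonds)` tuple and uses a deliberately crude presence-only fingerprint (atom labels plus labelled bond pairs) scored with the Tanimoto coefficient. Real fingerprints are far more detailed; this only illustrates the principle. Ethanol (C-C-O) and ethyl methyl ether (C-C-O-C) are different graphs, yet they receive identical fingerprints.

```python
# Toy illustration (assumption): molecules are (atom_labels, bond_index_pairs),
# hydrogens suppressed, bond orders ignored.

def bond_fingerprint(atoms, bonds):
    """Presence-only fingerprint: atom labels plus unordered labelled bonds."""
    features = set(atoms)
    for i, j in bonds:
        a, b = sorted((atoms[i], atoms[j]))  # "O-C" and "C-O" are the same feature
        features.add(f"{a}-{b}")
    return features


def tanimoto(fp1, fp2):
    """Tanimoto (Jaccard) similarity of two feature sets."""
    union = fp1 | fp2
    return len(fp1 & fp2) / len(union) if union else 1.0


ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
ether = (["C", "C", "O", "C"], [(0, 1), (1, 2), (2, 3)])

fp_ethanol = bond_fingerprint(*ethanol)
fp_ether = bond_fingerprint(*ether)
print(tanimoto(fp_ethanol, fp_ether))  # 1.0, although the graphs differ
```

Both sets are {C, O, C-C, C-O}, so the similarity is 1.0 even though the molecules differ in size and connectivity.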
Coding(Molecule_i, Molecule_j), j ≠ i
Kernel methods
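A graph kernel compares two molecular graphs directly, without first reducing each one to a fixed fingerprint. As a compact stand-in, the sketch below uses the Weisfeiler-Lehman subtree kernel. This is not one of the kernels cited in this section. The `(atoms, bonds)` representation and the function names are hypothetical. Each iteration relabels every atom with its own label plus the sorted labels of its neighbours, and the kernel is the dot product of the label-count vectors.

```python
from collections import Counter

# Sketch (assumption): hydrogen-suppressed molecules as (atom_labels, bond_pairs);
# relabelled atoms are kept as readable strings instead of compressed integer ids.


def wl_features(atoms, bonds, iterations=2):
    """Count the Weisfeiler-Lehman labels produced over all iterations."""
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    labels = list(atoms)
    features = Counter(labels)
    for _ in range(iterations):
        # New label = old label + sorted multiset of neighbour labels.
        labels = [
            labels[v] + "(" + ",".join(sorted(labels[u] for u in neighbours[v])) + ")"
            for v in range(len(atoms))
        ]
        features.update(labels)
    return features


def wl_kernel(g1, g2, iterations=2):
    """Dot product of the two WL feature-count vectors (positive semidefinite)."""
    f1 = wl_features(*g1, iterations=iterations)
    f2 = wl_features(*g2, iterations=iterations)
    return sum(f1[key] * f2[key] for key in f1.keys() & f2.keys())


ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
ether = (["C", "C", "O", "C"], [(0, 1), (1, 2), (2, 3)])
print(wl_kernel(ethanol, ether, iterations=1))  # 9
```

Unlike the presence-only fingerprint, the relabelled oxygen ("O(C)" versus "O(C,C)") separates ethanol from ethyl methyl ether. The normalised value 9 / sqrt(8 · 14) is below 1.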
Maximum common graph methods
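Maximum common graph methods score two molecules by the largest substructure they share. The brute-force sketch below is exponential and suitable only for tiny toy graphs. It searches over label-preserving injective atom mappings and counts the bonds preserved, which gives the size of a maximum common edge subgraph. It then turns this size into a Tanimoto-style score. The assumptions are stated in the comments: connectivity of the common subgraph is not enforced, and bond orders are ignored. Practical tools such as SMSD use much more efficient algorithms.

```python
# Brute-force sketch (assumption): molecules as (atom_labels, bond_pairs);
# the common edge subgraph may be disconnected and bond orders are ignored.


def mcs_edge_count(g1, g2):
    """Largest number of g1 bonds mapped onto g2 bonds by a label-preserving injection."""
    atoms1, bonds1 = g1
    atoms2, bonds2 = g2
    edges2 = {frozenset(b) for b in bonds2}
    best = 0

    def extend(i, mapping, used):
        nonlocal best
        if i == len(atoms1):
            preserved = sum(
                1
                for u, v in bonds1
                if mapping[u] is not None
                and mapping[v] is not None
                and frozenset((mapping[u], mapping[v])) in edges2
            )
            best = max(best, preserved)
            return
        mapping.append(None)  # option 1: leave atom i unmatched
        extend(i + 1, mapping, used)
        mapping.pop()
        for j, label in enumerate(atoms2):  # option 2: map atom i to an unused equal-label atom
            if j not in used and label == atoms1[i]:
                mapping.append(j)
                used.add(j)
                extend(i + 1, mapping, used)
                mapping.pop()
                used.remove(j)

    extend(0, [], set())
    return best


def mcs_similarity(g1, g2):
    """Tanimoto-style score: common bonds / (bonds1 + bonds2 - common bonds)."""
    common = mcs_edge_count(g1, g2)
    union = len(g1[1]) + len(g2[1]) - common
    return common / union if union else 1.0


ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
ether = (["C", "C", "O", "C"], [(0, 1), (1, 2), (2, 3)])
propane = (["C", "C", "C"], [(0, 1), (1, 2)])
print(mcs_similarity(ethanol, ether))  # 2 common bonds out of 3 -> 0.666...
```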
Coding(Molecule_i)
Molecular query methods
- Warmr[12][13]
- AGM[14][15]
- PolyFARM[16]
- FSG[17][18]
- MolFea[19]
- MoFa/MoSS[20][21][22]
- Gaston[23]
- LAZAR[24]
- ParMol[25] (includes MoFa, FFSM, gSpan and Gaston)
- optimized gSpan[26][27]
- SMIREP[28]
- DMax[29]
- SAm/AIm/RHC[30]
- AFGen[31]
- gRed[32]
- G-Hash[33]
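The query methods listed above encode each molecule on its own, by the substructures (fragments) it contains. Several of them, including FSG, gSpan and Gaston, mine frequent subgraphs. The sketch below is deliberately simpler and uses linear fragments only. It enumerates every labelled simple path of up to `max_len` bonds, keeps the fragments whose support (the number of molecules containing them) reaches `min_support`, and encodes each molecule as a binary vector over the surviving fragments. Unlike real miners, it does not exploit the anti-monotonicity of support to prune the search. The function names and the `(atoms, bonds)` format are hypothetical.

```python
from collections import Counter

# Sketch (assumption): exhaustive path enumeration on toy hydrogen-suppressed graphs.


def path_fragments(atoms, bonds, max_len=2):
    """Canonical label strings of all simple paths with at most max_len bonds."""
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    fragments = set()

    def walk(path):
        labels = [atoms[v] for v in path]
        forward, backward = "-".join(labels), "-".join(reversed(labels))
        fragments.add(min(forward, backward))  # same key for both directions
        if len(path) - 1 == max_len:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:  # keep paths simple
                walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return fragments


def frequent_fragments(molecules, min_support, max_len=2):
    """Fragments contained in at least min_support molecules, sorted."""
    support = Counter()
    for atoms, bonds in molecules:
        support.update(path_fragments(atoms, bonds, max_len))
    return sorted(f for f, s in support.items() if s >= min_support)


def encode(molecule, vocabulary, max_len=2):
    """Binary presence vector of a molecule over the fragment vocabulary."""
    present = path_fragments(*molecule, max_len)
    return [1 if f in present else 0 for f in vocabulary]


ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
ether = (["C", "C", "O", "C"], [(0, 1), (1, 2), (2, 3)])
propane = (["C", "C", "C"], [(0, 1), (1, 2)])

vocab = frequent_fragments([ethanol, ether, propane], min_support=2)
print(vocab)                    # ['C', 'C-C', 'C-C-O', 'C-O', 'O']
print(encode(propane, vocab))   # [1, 1, 0, 0, 0]
```

The resulting vectors can be passed to any standard learner. That is the practical advantage of encoding each molecule individually rather than in pairs.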
Methods based on special architectures of neural networks
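As the titles of the cited works suggest (for example, predicting activities "without computing descriptors"), these approaches let the molecular structure itself shape the network. The sketch below is a generic, untrained message-passing network written in plain Python. It is not a reproduction of any cited architecture. The atom vocabulary, the random weights and the function names are all hypothetical. Each step updates an atom's hidden vector from its own state and the sum of its neighbours' states, and a sum over atoms yields one scalar output. Because of this design, the output does not depend on how the atoms are numbered.

```python
import math
import random

ATOM_TYPES = ["C", "N", "O"]  # hypothetical toy vocabulary; also the hidden size


def one_hot(label):
    return [1.0 if label == t else 0.0 for t in ATOM_TYPES]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_weights(dim, seed=0):
    """Random (untrained) self, neighbour and readout weights."""
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    return rand_matrix(), rand_matrix(), [rng.uniform(-1, 1) for _ in range(dim)]


def predict(atoms, bonds, weights, steps=2):
    """Message passing over the molecular graph, then sum pooling to a scalar."""
    w_self, w_neigh, w_out = weights
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    h = [one_hot(a) for a in atoms]
    for _ in range(steps):
        new_h = []
        for v in range(len(atoms)):
            agg = [sum(h[u][k] for u in neighbours[v]) for k in range(len(ATOM_TYPES))]
            pre = [a + b for a, b in zip(matvec(w_self, h[v]), matvec(w_neigh, agg))]
            new_h.append([math.tanh(x) for x in pre])
        h = new_h
    pooled = [sum(hv[k] for hv in h) for k in range(len(ATOM_TYPES))]
    return sum(w * x for w, x in zip(w_out, pooled))


weights = make_weights(len(ATOM_TYPES))
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_renumbered = (["O", "C", "C"], [(2, 1), (1, 0)])  # same molecule, atoms renumbered
print(predict(*ethanol, weights), predict(*ethanol_renumbered, weights))
```

In practice the weights would be fitted to measured activities. Here they are random, so only structural properties such as invariance to atom numbering are meaningful.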
See also
References
- [1] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized Kernels Between Labeled Graphs, The 20th International Conference on Machine Learning (ICML 2003), 2003.
- [2] H. Fröhlich, J. K. Wegner, A. Zell, Optimal Assignment Kernels For Attributed Molecular Graphs, The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, 2005, 225-232.
- [3] H. Fröhlich, J. K. Wegner, A. Zell, Kernel Functions for Attributed Molecular Graphs - A New Similarity Based Approach To ADME Prediction in Classification and Regression, QSAR Comb. Sci., 2006, 25, 317-326. doi:10.1002/qsar.200510135
- [4] H. Fröhlich, J. K. Wegner, A. Zell, Assignment Kernels For Chemical Compounds, International Joint Conference on Neural Networks 2005 (IJCNN'05), 2005, 913-918.
- [5] P. Mahé, L. Ralaivola, V. Stoven, J.-P. Vert, The pharmacophore kernel for virtual screening with support vector machines, J Chem Inf Model, 2006, 46, 2003-2014. doi:10.1021/ci060138m
- [6] P. Mahé, N. Ueda, T. Akutsu, J.-L. Perret and J.-P. Vert. Extensions of marginalized graph kernels. Proceedings of the 21st ICML. 2004: 552–559.
- [7] L. Ralaivola, S. J. Swamidass, H. Saigo and P. Baldi. Graph kernels for chemical informatics. Neural Networks. 2005, 18: 1093–1110. doi:10.1016/j.neunet.2005.07.009. Accessed 2017-07-02; archived from the original on 2015-09-24.
- [8] P. Mahé and J.-P. Vert. Graph kernels based on tree patterns for molecules. Machine Learning. 2009, 75 (1): 3–35. ISSN 0885-6125. doi:10.1007/s10994-008-5086-2.
- [9] J. K. Wegner, H. Fröhlich, H. Mielenz, A. Zell, Data and Graph Mining in Chemical Space for ADME and Activity Data Sets, QSAR Comb. Sci., 2006, 25, 205-220. doi:10.1002/qsar.200510009
- [10] S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader and J. M. Thornton, Small Molecule Subgraph Detector (SMSD) toolkit, Journal of Cheminformatics 2009, 1:12. doi:10.1186/1758-2946-1-12
- [11] Archived copy. Accessed 2017-07-02; archived from the original on 2020-01-28.
- [12] R. D. King, A. Srinivasan, L. Dehaspe, Warmr: a data mining tool for chemical data, J. Comput.-Aid. Mol. Des., 2001, 15, 173-181. doi:10.1023/A:1008171016861
- [13] L. Dehaspe, H. Toivonen, R. D. King, Finding frequent substructures in chemical compounds, 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, 30-36.
- [14] A. Inokuchi, T. Washio, T. Okada, H. Motoda, Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis, Journal of Computer Aided Chemistry, 2001, 2, 87-92.
- [15] A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, A Fast Algorithm for Mining Frequent Connected Subgraphs, IBM Research, Tokyo Research Laboratory, 2002.
- [16] A. Clare, R. D. King, Data mining the yeast genome in a lazy functional language, Practical Aspects of Declarative Languages (PADL2003), 2003.
- [17] M. Kuramochi, G. Karypis, An Efficient Algorithm for Discovering Frequent Subgraphs, IEEE Transactions on Knowledge and Data Engineering, 2004, 16(9), 1038-1051.
- [18] M. Deshpande, M. Kuramochi, N. Wale, G. Karypis, Frequent Substructure-Based Approaches for Classifying Chemical Compounds, IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8), 1036-1050.
- [19] C. Helma, T. Cramer, S. Kramer, L. de Raedt, Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds, J. Chem. Inf. Comput. Sci., 2004, 44, 1402-1411. doi:10.1021/ci034254q
- [20] T. Meinl, C. Borgelt, M. R. Berthold, Discriminative Closed Fragment Mining and Perfect Extensions in MoFa, Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), 2004.
- [21] T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, Mining Fragments with Fuzzy Chains in Molecular Databases, Second International Workshop on Mining Graphs, Trees and Sequences (MGTS2004), 2004.
- [22] T. Meinl, M. R. Berthold, Hybrid Fragment Mining with MoFa and FSG, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
- [23] S. Nijssen, J. N. Kok. Frequent Graph Mining and its Application to Molecular Databases, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
- [24] C. Helma, Predictive Toxicology, CRC Press, 2005.
- [25] M. Wörlein, Extension and parallelization of a graph-mining-algorithm, Friedrich-Alexander-Universität, 2006.
- [26] K. Jahn, S. Kramer, Optimizing gSpan for Molecular Datasets, Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), 2005.
- [27] X. Yan, J. Han, gSpan: Graph-Based Substructure Pattern Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, 2002, 721-724.
- [28] A. Karwath, L. D. Raedt, SMIREP: predicting chemical activity from SMILES, J Chem Inf Model, 2006, 46, 2432-2444. doi:10.1021/ci060159g
- [29] H. Ando, L. Dehaspe, W. Luyten, E. Craenenbroeck, H. Vandecasteele, L. Meervelt, Discovering H-Bonding Rules in Crystals with Inductive Logic Programming, Mol Pharm, 2006, 3, 665-674. doi:10.1021/mp060034z
- [30] P. Mazzatorta, L. Tran, B. Schilter, M. Grigorov, Integration of Structure-Activity Relationship and Artificial Intelligence Systems To Improve in Silico Prediction of Ames Test Mutagenicity, J. Chem. Inf. Model., 2006, ASAP alert. doi:10.1021/ci600411v
- [31] N. Wale, G. Karypis. Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification, ICDM, 2006, 678-689.
- [32] A. Gago Alonso, J.E. Medina Pagola, J.A. Carrasco-Ochoa and J.F. Martínez-Trinidad, Mining Connected Subgraph Mining Reducing the Number of Candidates, In Proc. of ECML-PKDD, pp. 365–376, 2008.
- [33] Xiaohong Wang, Jun Huan, Aaron Smalter, Gerald Lushington, Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases, in BMC Bioinformatics Vol. 11 (Suppl 3):S8, 2010.
- [34] I. I. Baskin, V. A. Palyulin, N. S. Zefirov. [A methodology for searching direct correlations between structures and properties of organic compounds by using computational neural networks]. Doklady Akademii Nauk SSSR. 1993, 333 (2): 176–179.
- [35] I. I. Baskin, V. A. Palyulin, N. S. Zefirov. A Neural Device for Searching Direct Correlations between Structures and Properties of Organic Compounds. J. Chem. Inf. Comput. Sci. 1997, 37 (4): 715–721. doi:10.1021/ci940128y.
- [36] D. B. Kireev. ChemNet: A Novel Neural Network Based Method for Graph/Property Mapping. J. Chem. Inf. Comput. Sci. 1995, 35 (2): 175–180. doi:10.1021/ci00024a001.
- [37] A. M. Bianucci, A. Micheli, A. Sperduti, A. Starita. Application of Cascade Correlation Networks for Structures to Chemistry. Applied Intelligence. 2000, 12 (1-2): 117–146. doi:10.1023/A:1008368105614.
- [38] A. Micheli, A. Sperduti, A. Starita, A. M. Bianucci. Analysis of the Internal Representations Developed by Neural Networks for Structures Applied to Quantitative Structure-Activity Relationship Studies of Benzodiazepines. J. Chem. Inf. Comput. Sci. 2001, 41 (1): 202–218. PMID 11206375. doi:10.1021/ci9903399.
- [39] O. Ivanciuc. Molecular Structure Encoding into Artificial Neural Networks Topology. Roumanian Chemical Quarterly Reviews. 2001, 8: 197–220.
- [40] A. Goulon, T. Picot, A. Duprat, G. Dreyfus. Predicting activities without computing descriptors: Graph machines for QSAR. SAR and QSAR in Environmental Research. 2007, 18 (1-2): 141–153. PMID 17365965. doi:10.1080/10629360601054313.
Further reading
- Schölkopf, B., K. Tsuda and J. P. Vert: Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, 2004.
- R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, 2001. ISBN 0-471-05669-3
- Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0-521-58519-8
- R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2000. ISBN 3-527-29913-0
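External links
- Small Molecule Subgraph Detector (SMSD) - a Java-based software library for calculating the maximum common subgraph (MCS) between small molecules.
- 5th International Workshop on Mining and Learning with Graphs, 2007
- Overview for 2006
- Molecule mining (basic chemical expert systems)
- ParMol and master's thesis documentation - Java - open source - distributed mining - benchmark algorithm library
- TU Munich - Kramer group
- Molecule mining (advanced chemical expert systems)
- DMax Chemistry Assistant - commercial software
- AFGen - software for generating fragment-based descriptors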