Molecule mining

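Molecule mining is the application of data mining to molecules. Because molecules can be represented as molecular graphs, it is closely related to graph mining and structured data mining. The main problem is how to represent molecules so that data instances can be discriminated from one another. One approach is to use chemical similarity measures, which have a long tradition in cheminformatics.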
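As a rough illustration of what a molecular graph looks like as data, the sketch below stores atoms as labelled vertices and bonds as labelled edges. The dictionary layout, the atom indices, and the helper names `neighbors` and `formula` are assumptions made for this example, not a standard representation.

```python
from collections import Counter

# Ethanol (CH3-CH2-OH), heavy atoms only; indices are arbitrary (illustrative).
atoms = {0: "C", 1: "C", 2: "O"}   # vertex labels: element symbols
bonds = {(0, 1): 1, (1, 2): 1}     # edge labels: bond orders

def neighbors(atom, bonds):
    """Return the atoms bonded to `atom` in an undirected molecular graph."""
    return sorted(b if a == atom else a for (a, b) in bonds if atom in (a, b))

def formula(atoms):
    """Count vertex labels, i.e. the heavy-atom composition."""
    return Counter(atoms.values())
```

Here `neighbors(1, bonds)` lists the atoms attached to the central carbon, and `formula(atoms)` recovers the composition from the vertex labels alone.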

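Typical approaches to computing chemical similarity use chemical fingerprints, but this loses the underlying information about the molecule topology. Mining the molecular graphs directly avoids this problem. The inverse QSAR problem also applies to vectorial mappings.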
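The loss of topology can be seen in a small sketch. It assumes a deliberately simple toy fingerprint (the set of bonded element pairs) rather than any real chemical fingerprint, and compares fingerprints with the Tanimoto coefficient; the function names and the atom/bond encoding are hypothetical.

```python
def bond_type_fingerprint(atoms, bonds):
    """Toy fingerprint: the set of bonded element pairs, e.g. {'C-C', 'C-O'}."""
    return {"-".join(sorted((atoms[a], atoms[b]))) for a, b in bonds}

def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Heavy-atom graphs: (element labels, bonds as index pairs).
propan_1_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])  # C-C-C-O
propan_2_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])  # C-C(-O)-C
ethane = (["C", "C"], [(0, 1)])

fp_1 = bond_type_fingerprint(*propan_1_ol)
fp_2 = bond_type_fingerprint(*propan_2_ol)
fp_e = bond_type_fingerprint(*ethane)

same_fingerprint = tanimoto(fp_1, fp_2)  # isomers with different graphs
partial_overlap = tanimoto(fp_1, fp_e)
```

The two propanol isomers have different molecular graphs, yet this toy fingerprint cannot tell them apart. Real fingerprints encode far more features, but the same kind of collision is possible whenever a graph is reduced to a fixed feature vector.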

Coding(molecule i, molecule j ≠ i)

Kernel methods

Maximum common graph methods

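  • MCS-HSCS[9] (highest scoring common substructure (HSCS) ranking strategy for a single MCS)
  • Small Molecule Subgraph Detector (SMSD)[10]: a Java-based software library for calculating the maximum common subgraph (MCS) between small molecules, which can be used to measure the similarity or distance between two molecules. MCS is also used to screen drug-like compounds by finding hit molecules that share a common subgraph (substructure).[11]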
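The idea of scoring similarity through a maximum common subgraph can be sketched as follows. This is a brute-force illustration for very small heavy-atom graphs, not the algorithm used by SMSD or MCS-HSCS. The function names, the atom/bond encoding, and the Tanimoto-style normalisation are assumptions made for the example. The search finds a maximum common induced subgraph, matches atoms by element only, and ignores bond orders.

```python
def mcs_size(atoms_a, bonds_a, atoms_b, bonds_b):
    """Atom count of a maximum common induced subgraph (exhaustive search)."""
    edges_a = {frozenset(b) for b in bonds_a}
    edges_b = {frozenset(b) for b in bonds_b}
    best = 0

    def extend(i, mapping, used_b):
        nonlocal best
        # Prune: even mapping every remaining atom cannot beat the best so far.
        if len(mapping) + (len(atoms_a) - i) <= best:
            return
        if i == len(atoms_a):
            best = len(mapping)
            return
        # Option 1: map atom i of A onto a compatible, unused atom j of B.
        for j in range(len(atoms_b)):
            if j in used_b or atoms_a[i] != atoms_b[j]:
                continue
            # Bonds to already-mapped atoms must agree in both molecules.
            if all((frozenset((i, k)) in edges_a) == (frozenset((j, l)) in edges_b)
                   for k, l in mapping.items()):
                mapping[i] = j
                used_b.add(j)
                extend(i + 1, mapping, used_b)
                del mapping[i]
                used_b.remove(j)
        # Option 2: leave atom i unmapped.
        extend(i + 1, mapping, used_b)

    extend(0, {}, set())
    return best

def mcs_similarity(mol_a, mol_b):
    """Tanimoto-style score: |MCS| / (|A| + |B| - |MCS|), counted in atoms."""
    common = mcs_size(*mol_a, *mol_b)
    total = len(mol_a[0]) + len(mol_b[0]) - common
    return common / total if total else 1.0

ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])         # C-C-O
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])  # C-O-C
```

For these two isomers the largest shared fragment is a single C-O bond, so `mcs_similarity(ethanol, dimethyl_ether)` is 2 / (3 + 3 - 2). The exhaustive search grows exponentially with molecule size, which is why practical tools rely on more elaborate algorithms.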

Coding(molecule i)

Molecular query methods

Methods based on special architectures of neural networks

See also

References

  1. ^ H. Kashima, K. Tsuda, A. Inokuchi, Marginalized Kernels Between Labeled Graphs, The 20th International Conference on Machine Learning (ICML2003), 2003.
  2. ^ H. Fröhlich, J. K. Wegner, A. Zell, Optimal Assignment Kernels For Attributed Molecular Graphs, The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, 2005, 225-232.
  3. ^ H. Fröhlich, J. K. Wegner, A. Zell, Kernel Functions for Attributed Molecular Graphs - A New Similarity Based Approach To ADME Prediction in Classification and Regression, QSAR Comb. Sci., 2006, 25, 317-326. doi:10.1002/qsar.200510135
  4. ^ H. Fröhlich, J. K. Wegner, A. Zell, Assignment Kernels For Chemical Compounds, International Joint Conference on Neural Networks 2005 (IJCNN'05), 2005, 913-918.
  5. ^ P. Mahé, L. Ralaivola, V. Stoven, J. Vert, The pharmacophore kernel for virtual screening with support vector machines, J Chem Inf Model, 2006, 46, 2003-2014. doi:10.1021/ci060138m
  6. ^ P. Mahé, N. Ueda, T. Akutsu, J.-L. Perret and J.-P. Vert. Extensions of marginalized graph kernels. Proceedings of the 21st ICML. 2004: 552–559.
  7. ^ L. Ralaivola, S. J. Swamidass, H. Saigo and P. Baldi. Graph kernels for chemical informatics. Neural Networks. 2005, 18: 1093–1110 [retrieved 2017-07-02]. doi:10.1016/j.neunet.2005.07.009. (Archived from the original on 2015-09-24).
  8. ^ P. Mahé and J.-P. Vert. Graph kernels based on tree patterns for molecules. Machine Learning. 2009, 75 (1): 3–35. ISSN 0885-6125. doi:10.1007/s10994-008-5086-2. 
  9. ^ J. K. Wegner, H. Fröhlich, H. Mielenz, A. Zell, Data and Graph Mining in Chemical Space for ADME and Activity Data Sets, QSAR Comb. Sci., 2006, 25, 205-220. doi:10.1002/qsar.200510009
  10. ^ S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader and J. M. Thornton, Small Molecule Subgraph Detector (SMSD) toolkit, Journal of Cheminformatics 2009, 1:12. doi:10.1186/1758-2946-1-12
  11. ^ Archived copy. [retrieved 2017-07-02]. (Archived from the original on 2020-01-28).
  12. ^ R. D. King, A. Srinivasan, L. Dehaspe, Warmr: a data mining tool for chemical data, J. Comput.-Aid. Mol. Des., 2001, 15, 173-181. doi:10.1023/A:1008171016861
  13. ^ L. Dehaspe, H. Toivonen, R. D. King, Finding frequent substructures in chemical compounds, 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, 30-36.
  14. ^ A. Inokuchi, T. Washio, T. Okada, H. Motoda, Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis, Journal of Computer Aided Chemistry, 2001, 2, 87-92.
  15. ^ A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, A Fast Algorithm for Mining Frequent Connected Subgraphs, IBM Research, Tokyo Research Laboratory, 2002.
  16. ^ A. Clare, R. D. King, Data mining the yeast genome in a lazy functional language, Practical Aspects of Declarative Languages (PADL2003), 2003.
  17. ^ M. Kuramochi, G. Karypis, An Efficient Algorithm for Discovering Frequent Subgraphs, IEEE Transactions on Knowledge and Data Engineering, 2004, 16(9), 1038-1051.
  18. ^ M. Deshpande, M. Kuramochi, N. Wale, G. Karypis, Frequent Substructure-Based Approaches for Classifying Chemical Compounds, IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8), 1036-1050.
  19. ^ C. Helma, T. Cramer, S. Kramer, L. de Raedt, Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds, J. Chem. Inf. Comput. Sci., 2004, 44, 1402-1411. doi:10.1021/ci034254q
  20. ^ T. Meinl, C. Borgelt, M. R. Berthold, Discriminative Closed Fragment Mining and Perfect Extensions in MoFa, Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), 2004.
  21. ^ T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, Mining Fragments with Fuzzy Chains in Molecular Databases, Second International Workshop on Mining Graphs, Trees and Sequences (MGTS2004), 2004.
  22. ^ T. Meinl, M. R. Berthold, Hybrid Fragment Mining with MoFa and FSG, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
  23. ^ S. Nijssen, J. N. Kok. Frequent Graph Mining and its Application to Molecular Databases, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
  24. ^ C. Helma, Predictive Toxicology, CRC Press, 2005.
  25. ^ M. Wörlein, Extension and parallelization of a graph-mining-algorithm, Friedrich-Alexander-Universität, 2006.
  26. ^ K. Jahn, S. Kramer, Optimizing gSpan for Molecular Datasets, Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), 2005.
  27. ^ X. Yan, J. Han, gSpan: Graph-Based Substructure Pattern Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, 2002, 721-724.
  28. ^ A. Karwath, L. D. Raedt, SMIREP: predicting chemical activity from SMILES, J Chem Inf Model, 2006, 46, 2432-2444. doi:10.1021/ci060159g
  29. ^ H. Ando, L. Dehaspe, W. Luyten, E. Craenenbroeck, H. Vandecasteele, L. Meervelt, Discovering H-Bonding Rules in Crystals with Inductive Logic Programming, Mol Pharm, 2006, 3, 665-674. doi:10.1021/mp060034z
  30. ^ P. Mazzatorta, L. Tran, B. Schilter, M. Grigorov, Integration of Structure-Activity Relationship and Artificial Intelligence Systems To Improve in Silico Prediction of Ames Test Mutagenicity, J. Chem. Inf. Model., 2006, ASAP alert. doi:10.1021/ci600411v
  31. ^ N. Wale, G. Karypis. Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification, ICDM, 2006, 678-689.
  32. ^ A. Gago Alonso, J. E. Medina Pagola, J. A. Carrasco-Ochoa and J. F. Martínez-Trinidad, Mining Frequent Connected Subgraphs Reducing the Number of Candidates, In Proc. of ECML PKDD, pp. 365–376, 2008.
  33. ^ Xiaohong Wang, Jun Huan, Aaron Smalter, Gerald Lushington, Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases, BMC Bioinformatics, Vol. 11 (Suppl 3): S8, 2010.
  34. ^ Baskin, I. I.; V. A. Palyulin; N. S. Zefirov. [A methodology for searching direct correlations between structures and properties of organic compounds by using computational neural networks]. Doklady Akademii Nauk SSSR. 1993, 333 (2): 176–179. 
  35. ^ I. I. Baskin, V. A. Palyulin, N. S. Zefirov. A Neural Device for Searching Direct Correlations between Structures and Properties of Organic Compounds. J. Chem. Inf. Comput. Sci. 1997, 37 (4): 715–721. doi:10.1021/ci940128y. 
  36. ^ D. B. Kireev. ChemNet: A Novel Neural Network Based Method for Graph/Property Mapping. J. Chem. Inf. Comput. Sci. 1995, 35 (2): 175–180. doi:10.1021/ci00024a001. 
  37. ^ A. M. Bianucci; Micheli, Alessio; Sperduti, Alessandro; Starita, Antonina. Application of Cascade Correlation Networks for Structures to Chemistry. Applied Intelligence. 2000, 12 (1-2): 117–146. doi:10.1023/A:1008368105614. 
  38. ^ A. Micheli, A. Sperduti, A. Starita, A. M. Bianucci. Analysis of the Internal Representations Developed by Neural Networks for Structures Applied to Quantitative Structure-Activity Relationship Studies of Benzodiazepines. J. Chem. Inf. Comput. Sci. 2001, 41 (1): 202–218. PMID 11206375. doi:10.1021/ci9903399. 
  39. ^ O. Ivanciuc. Molecular Structure Encoding into Artificial Neural Networks Topology. Roumanian Chemical Quarterly Reviews. 2001, 8: 197–220. 
  40. ^ A. Goulon, T. Picot, A. Duprat, G. Dreyfus. Predicting activities without computing descriptors: Graph machines for QSAR. SAR and QSAR in Environmental Research. 2007, 18 (1-2): 141–153. PMID 17365965. doi:10.1080/10629360601054313. 

Further reading

  • Schölkopf, B., K. Tsuda and J. P. Vert: Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, 2004.
  • R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, 2001. ISBN 0-471-05669-3
  • Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0-521-58519-8
  • R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2000. ISBN 3-527-29913-0

External links