International Journal of Information Technology & Computer Science (IJITCS)
This paper describes an algorithm for extracting patterns from chemical fingerprints. The algorithm takes as input a fingerprint representation of a molecule dataset and generates a set of consistent, disjoint patterns, also represented as binary arrays. Each pattern is satisfied by a subset of the molecules in the dataset, and these subsets are not necessarily disjoint. The algorithm is implemented entirely in Java, which allows its integration into free computational chemistry applications. In tests, using the patterns instead of the original fingerprints increased the efficiency of dataset classification. The results also show that the original fingerprints can be reconstructed from the final set of patterns, which characterize all the elements of the dataset.
Keywords: clustering algorithms, chemical fingerprints, molecular classification
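The properties stated in the abstract (disjoint binary patterns, overlapping sets of molecules that satisfy them, and reconstruction of the original fingerprints) can be illustrated with a minimal Java sketch. This is not the extraction algorithm itself, which the abstract does not specify. The class name `FingerprintPatterns`, its helper methods, the 6-bit toy fingerprints, and the hand-picked patterns are all hypothetical. The sketch also assumes that a molecule "satisfies" a pattern when every bit set in the pattern is set in its fingerprint, and that reconstruction is the bitwise OR of the satisfied patterns.

```java
import java.util.BitSet;
import java.util.List;

// Illustrative only: checks the properties described in the abstract on toy data.
// It does NOT implement the paper's pattern-extraction algorithm.
final class FingerprintPatterns {

    /** Parses a string of '0'/'1' characters into a BitSet (index 0 = leftmost character). */
    static BitSet bits(String s) {
        BitSet b = new BitSet(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '1') {
                b.set(i);
            }
        }
        return b;
    }

    /** Assumed meaning of "satisfies": every bit set in the pattern is set in the fingerprint. */
    static boolean satisfies(BitSet fingerprint, BitSet pattern) {
        BitSet missing = (BitSet) pattern.clone();
        missing.andNot(fingerprint); // bits required by the pattern but absent from the fingerprint
        return missing.isEmpty();
    }

    /** Returns true when no two patterns share a set bit. */
    static boolean pairwiseDisjoint(List<BitSet> patterns) {
        BitSet seen = new BitSet();
        for (BitSet p : patterns) {
            if (seen.intersects(p)) {
                return false;
            }
            seen.or(p);
        }
        return true;
    }

    /** Assumed reconstruction: bitwise OR of all patterns the fingerprint satisfies. */
    static BitSet reconstruct(BitSet fingerprint, List<BitSet> patterns) {
        BitSet rebuilt = new BitSet();
        for (BitSet p : patterns) {
            if (satisfies(fingerprint, p)) {
                rebuilt.or(p);
            }
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        // Hypothetical 6-bit fingerprints for three molecules.
        List<BitSet> molecules = List.of(bits("110100"), bits("110011"), bits("000111"));
        // Hand-picked disjoint patterns; each is satisfied by two of the three molecules.
        List<BitSet> patterns = List.of(bits("110000"), bits("000100"), bits("000011"));

        System.out.println("patterns disjoint: " + pairwiseDisjoint(patterns));
        for (BitSet m : molecules) {
            System.out.println(m + " reconstructed exactly: " + reconstruct(m, patterns).equals(m));
        }
    }
}
```

In this toy dataset the three patterns share no bits, yet the sets of molecules that satisfy them overlap, which mirrors the distinction the abstract draws between disjoint patterns and not necessarily disjoint molecule subsets.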