Volume 5, Issue 1, pp. 08-14, 2024 | Full Length Article
Mahmoud M. Ismail 1*, Mahmoud M. Ibrahim 2*, Shereen Zaki 3
DOI: https://doi.org/10.54216/IJAACI.050101
Enzymes are the cornerstone of crucial biochemical processes, so understanding them, and being able to intervene in them, requires precise detection methods. This paper presents a framework for the multi-label detection of enzyme-substrate interactions based on multi-label fusion. To overcome the limitations of traditional single-label detection approaches, our methodology integrates several data modalities and combines Gradient Boosting, AdaBoost, and CatBoost classifiers as an ensemble. Through molecular descriptor analysis, clustering, and visualization of model performance metrics, we map the intricate landscape of enzyme-substrate interactions. The visualizations highlight the molecular characteristics that most influence enzyme class, while cluster analysis reveals inherent groupings within the dataset. Confusion matrices illustrate how well the model classifies each class, supporting the effectiveness of the framework. The method advances multi-label information fusion and provides a foundation for untangling biochemical complexity, with potential applications across various scientific fields.
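The abstract names three boosted classifiers combined as an ensemble, but it does not say how their outputs are fused across labels. The sketch below assumes one common option, weighted soft voting: each model is assumed to have already produced a per-label probability for every sample, the probabilities are averaged, and each label is thresholded independently. It then builds one 2x2 confusion matrix per label (a binary-relevance view of multi-label evaluation). The function names `fuse_probabilities`, `threshold`, and `per_label_confusion`, the equal default weights, the 0.5 cutoff, and the toy numbers are all hypothetical illustrations, not the authors' method or results.

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]  # rows = samples, columns = labels


def fuse_probabilities(per_model: Dict[str, Matrix],
                       weights: Optional[Dict[str, float]] = None) -> Matrix:
    """Weighted soft voting: average each (sample, label) probability across models."""
    names = list(per_model)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumption: equal weights
    total = sum(weights[name] for name in names)
    n_samples = len(per_model[names[0]])
    n_labels = len(per_model[names[0]][0])
    fused = [[0.0] * n_labels for _ in range(n_samples)]
    for name in names:
        w = weights[name] / total  # normalise so fused values stay in [0, 1]
        for i, row in enumerate(per_model[name]):
            for j, p in enumerate(row):
                fused[i][j] += w * p
    return fused


def threshold(probs: Matrix, cutoff: float = 0.5) -> List[List[int]]:
    """Turn fused probabilities into a 0/1 label-indicator matrix (cutoff is an assumption)."""
    return [[1 if p >= cutoff else 0 for p in row] for row in probs]


def per_label_confusion(y_true: List[List[int]],
                        y_pred: List[List[int]]) -> List[List[List[int]]]:
    """One matrix [[TN, FP], [FN, TP]] per label, treating each label as a binary task."""
    n_labels = len(y_true[0])
    matrices = []
    for j in range(n_labels):
        tn = fp = fn = tp = 0
        for t_row, p_row in zip(y_true, y_pred):
            t, p = t_row[j], p_row[j]
            if t and p:
                tp += 1
            elif t:
                fn += 1  # label present but not predicted
            elif p:
                fp += 1  # label predicted but not present
            else:
                tn += 1
        matrices.append([[tn, fp], [fn, tp]])
    return matrices


if __name__ == "__main__":
    # Toy per-label probabilities for 2 samples x 3 enzyme labels (illustrative only).
    probs = {
        "gradient_boosting": [[0.9, 0.2, 0.6], [0.1, 0.7, 0.4]],
        "adaboost": [[0.8, 0.4, 0.5], [0.3, 0.6, 0.2]],
        "catboost": [[0.7, 0.3, 0.7], [0.2, 0.8, 0.3]],
    }
    y_true = [[1, 0, 0], [0, 1, 0]]
    y_pred = threshold(fuse_probabilities(probs))
    print(y_pred)
    print(per_label_confusion(y_true, y_pred))
```

Run as a script, the toy example prints the indicator matrix `[[1, 0, 1], [0, 1, 0]]` followed by one confusion matrix per label; the third label shows a single false positive. In a real pipeline the per-label probabilities would come from each trained classifier (for example, via scikit-learn's `predict_proba`), and the weights and cutoff would be tuned on validation data rather than fixed, since a stronger base model can reasonably be given more say in the fused vote.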
Keywords: enzyme-substrate detection, information fusion, machine learning, biochemical analysis, computational biology, signal processing fusion, pattern recognition.