Volume 19, Issue 1, pp. 57-74, 2025 | Full Length Article
Hemalatha Dendukuri 1, Kachapuram Basava Raju 2, S. Phani Praveen 3*, Janjhyam V. Naga Ramesh 4, Vahiduddin Shariff 5, N. S. Koti Mani Kumar Tirumanadham 6*
DOI: https://doi.org/10.54216/FPA.190106
This study proposes a novel machine learning framework that aims to improve both the prediction accuracy of diabetes detection and the interpretability of the diagnostic model. First, missing values are filled in using Multiple Imputation by Chained Equations (MICE) so that the data are complete before analysis. Class imbalance is addressed with the Synthetic Minority Over-sampling Technique (SMOTE), and outliers are detected and removed with the Interquartile Range (IQR) method to make the model more robust. A hybrid RFE-WWO selection process combines Recursive Feature Elimination (RFE) with Water Wave Optimization (WWO) to select important features that balance model complexity against prediction accuracy. The core of the framework is the Hybrid Fusion Model (HFM), which merges the strengths of AdaBoost and CatBoost. Hyperparameters are tuned with the Tree-Structured Parzen Estimator (TPE), and the tuned model reaches a prediction accuracy of 97.84%. The approach also improves the precision, recall, and F1 score of the predictive model. By combining these techniques, the framework shows significant potential for early diagnosis: ensemble methods are essential for healthcare data analysis, and accurate, interpretable models are vital for building dependable diagnostic tools.
Healthcare, AdaBoost, CatBoost, hyperparameter optimization, Water Wave optimization (WWO), Synthetic Minority Over-sampling Technique (SMOTE), Machine learning (ML)
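Two of the preprocessing steps named in the abstract, IQR-based outlier removal and SMOTE, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`iqr_bounds`, `drop_outlier_rows`, `smote_like`), the fence multiplier of 1.5, the neighbour count, and the toy data are all assumptions chosen for illustration, and the study may configure these steps differently.

```python
import random
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(rows, k=1.5):
    """Keep only rows whose every feature lies inside that feature's fences."""
    bounds = [iqr_bounds(col, k) for col in zip(*rows)]
    return [
        row for row in rows
        if all(lo <= x <= hi for x, (lo, hi) in zip(row, bounds))
    ]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows, each interpolated between a random
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [r for r in minority if r is not base]
        # Sort the remaining rows by squared Euclidean distance to the base row.
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy data (hypothetical values, not taken from the study's dataset).
rows = [(1.0, 10.0), (2.0, 11.0), (3.0, 12.0), (4.0, 13.0), (100.0, 12.0)]
clean = drop_outlier_rows(rows)        # the row containing 100.0 is dropped
new_rows = smote_like(clean, n_new=3)  # three interpolated minority rows
```

In practice, library implementations such as the SMOTE class in imbalanced-learn would normally be preferred over a hand-written version; the sketch only shows the interpolation idea behind the technique.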