American Scientific Publishing Group

Fusion: Practice and Applications

ISSN: Online 2692-4048, Print 2770-0070

Frequency: Continuous publication

Publication model: Open access · Articles freely available online · APC applies after acceptance

Full Length Article

Volume 19, Issue 2, pp. 194-210 • 2025

Over-Under Sampling Approach with Adaptive Synthetic and Tomek Links Methods to Handle Data Imbalance in Sentence Classification on Halal Assurance Certificate Documents

Received: December 23, 2024 Revised: February 15, 2025 Accepted: March 05, 2025


Dadang Heksaputra 1,2,3,*, Rahmat Gernowo 1, R. Rizal Isnanto 1

1 Doctoral Program of Information System, School of Postgraduate Studies, Diponegoro University, Semarang, Indonesia

2 Faculty of Computer and Engineering, Department of Information System, Alma Ata University, Yogyakarta, Indonesia

3 Alma Ata Center for Medical Informatics, Alma Ata University, Yogyakarta, Indonesia

* Corresponding author.

Emails: dadang@almaata.ac.id; rahmatgernowo@lecturer.undip.ac.id; rizal_isnanto@yahoo.com

 

Abstract

Data imbalance is a common problem in machine learning, particularly in classification, where examples of a dominant class outnumber examples of a minority class many times over. The imbalance prevents a model from learning meaningful patterns for the minority class and therefore reduces model performance, particularly in terms of Recall and F1-Score. This work addresses data imbalance in a sentence classification model for halal assurance certificate documents by combining an over-sampling and an under-sampling technique, namely Adaptive Synthetic (ADASYN) and Tomek Links. Text classification is used to identify sentences related to halal assurance in these documents; because misclassifying such sentences is undesirable, it is important that no information about a halal product is missed. The over-sampling techniques considered are SMOTE, Borderline SMOTE, ADASYN, and SMOTENC, and the under-sampling techniques are Random Under-Sampler, Near Miss, and Tomek Links. In the comparison, ADASYN yields the best performance, with Accuracy 0.759, F1-Score 0.748, Recall 0.759, and Precision 0.768. In our use case, ADASYN + Tomek Links is effective: recall matters when classifying halal assurance certificate documents because relevant sentences must not be missed. The proposed approach improves the accuracy of halal-related sentence identification and can be adopted in halal product checking systems in industries with halal requirements.

Keywords: Data Imbalance; Halal Assurance Documents; Adaptive Synthetic (ADASYN); Tomek Links; Text Classification; Halal Information Systems
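
The over-under sampling idea named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names (`k_nearest`, `adasyn`, `remove_tomek_links`), the binary labels, the toy 2-D points standing in for sentence feature vectors, and the choice to drop only the majority member of each Tomek link are all assumptions made for illustration. In practice a library such as imbalanced-learn would normally be used instead.

```python
import math
import random


def k_nearest(i, X, k, candidates=None):
    """Indices of the k points closest to X[i], excluding i itself."""
    pool = candidates if candidates is not None else range(len(X))
    others = [j for j in pool if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]


def adasyn(X, y, minority, k=3, seed=0):
    """Simplified binary ADASYN: create more synthetic minority points
    around minority samples whose neighbourhood is dominated by the majority."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)  # majority minus minority
    if n_needed <= 0 or len(min_idx) < 2:
        return list(X), list(y)
    # r_i: share of majority points among the k nearest neighbours of sample i
    ratios = [sum(y[j] != minority for j in k_nearest(i, X, k)) / k
              for i in min_idx]
    total = sum(ratios)
    if total == 0:
        return list(X), list(y)  # no minority sample is hard to learn
    X_new, y_new = list(X), list(y)
    for i, r in zip(min_idx, ratios):
        n_i = round(r / total * n_needed)  # synthetic samples owed to sample i
        neighbours = k_nearest(i, X, k, candidates=min_idx)
        for _ in range(n_i):
            j = rng.choice(neighbours)
            lam = rng.random()  # interpolation factor in [0, 1)
            X_new.append(tuple(a + lam * (b - a) for a, b in zip(X[i], X[j])))
            y_new.append(minority)
    return X_new, y_new


def remove_tomek_links(X, y, majority):
    """Drop the majority member of every Tomek link, i.e. every pair of
    mutual nearest neighbours that carry different labels (binary case)."""
    nearest = [k_nearest(i, X, 1)[0] for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nearest):
        if nearest[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: 8 majority points (label 0) and 4 minority points (label 1),
# one of which, (2.5, 1.5), sits on the border of the majority region.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
     (2.0, 0.0), (2.0, 1.0), (0.0, 2.0), (1.0, 2.0),
     (4.0, 4.0), (4.0, 5.0), (5.0, 4.0), (2.5, 1.5)]
y = [0] * 8 + [1] * 4

X_bal, y_bal = adasyn(X, y, minority=1)                      # over-sample
X_clean, y_clean = remove_tomek_links(X_bal, y_bal, majority=0)  # then clean

print("before:      ", y.count(0), "majority /", y.count(1), "minority")
print("after ADASYN:", y_bal.count(0), "majority /", y_bal.count(1), "minority")
print("after Tomek: ", y_clean.count(0), "majority /", y_clean.count(1), "minority")
```

In this toy set every synthetic point is generated around the borderline sample, because it is the only minority point whose nearest neighbours are majority points; that is the adaptive part of ADASYN. The Tomek Links pass then removes majority points that are mutual nearest neighbours of a minority point, which cleans the class boundary after over-sampling.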

Cite This Article

MLA: Heksaputra, Dadang, Gernowo, Rahmat, Isnanto, R. Rizal. "Over-Under Sampling Approach with Adaptive Synthetic and Tomek Links Methods to Handle Data Imbalance in Sentence Classification on Halal Assurance Certificate Documents." Fusion: Practice and Applications, vol. 19, no. 2, 2025, pp. 194-210. DOI: https://doi.org/10.54216/FPA.190215
APA: Heksaputra, D., Gernowo, R., Isnanto, R. (2025). Over-Under Sampling Approach with Adaptive Synthetic and Tomek Links Methods to Handle Data Imbalance in Sentence Classification on Halal Assurance Certificate Documents. Fusion: Practice and Applications, 19(2), 194-210. DOI: https://doi.org/10.54216/FPA.190215
Chicago: Heksaputra, Dadang, Gernowo, Rahmat, Isnanto, R. Rizal. "Over-Under Sampling Approach with Adaptive Synthetic and Tomek Links Methods to Handle Data Imbalance in Sentence Classification on Halal Assurance Certificate Documents." Fusion: Practice and Applications 19, no. 2 (2025): 194-210. DOI: https://doi.org/10.54216/FPA.190215
Harvard: Heksaputra, D., Gernowo, R., Isnanto, R. (2025) 'Over-Under Sampling Approach with Adaptive Synthetic and Tomek Links Methods to Handle Data Imbalance in Sentence Classification on Halal Assurance Certificate Documents', Fusion: Practice and Applications, 19(2), pp. 194-210. DOI: https://doi.org/10.54216/FPA.190215
Vancouver: Heksaputra D, Gernowo R, Isnanto R. Over-Under Sampling Approach with Adaptive Synthetic and Tomek Links Methods to Handle Data Imbalance in Sentence Classification on Halal Assurance Certificate Documents. Fusion: Practice and Applications. 2025;19(2):194-210. DOI: https://doi.org/10.54216/FPA.190215
IEEE: D. Heksaputra, R. Gernowo, R. Isnanto, "Over-Under Sampling Approach with Adaptive Synthetic and Tomek Links Methods to Handle Data Imbalance in Sentence Classification on Halal Assurance Certificate Documents," Fusion: Practice and Applications, vol. 19, no. 2, pp. 194-210, 2025. DOI: https://doi.org/10.54216/FPA.190215