Full Length Article
DOI: https://doi.org/10.54216/IJAIET.040105
Machine Learning for At-Risk Student Identification in Virtual Learning Environments: A Multi-Classifier Analysis Using the Open University Learning Analytics Dataset
Identifying students who are likely to struggle academically or withdraw early in a course gives universities a brief window in which to intervene effectively. This paper systematically evaluates multiple machine learning classifiers on the Open University Learning Analytics Dataset (OULAD), one of the most widely used public educational datasets, which covers 32,593 students across 22 distance-learning courses. Four classifiers (logistic regression, decision tree, random forest, and gradient boosting) are trained on a feature set that combines student demographic information with clickstream-based engagement data from the virtual learning environment (VLE). The primary finding is that VLE behavioral features dominate Random Forest feature importance: total click volume, active VLE days, and typical daily click volume all rank among its top four features, which together account for 92.8% of total importance, while demographic information contributes less. Random Forest achieves the strongest held-out test performance (AUC = 0.998, F1 = 0.978, accuracy = 98.2%), while the more interpretable Decision Tree scores lower (AUC = 0.959), illustrating the performance cost of interpretability. At-risk students record 75.8% fewer total VLE clicks than students who are not at risk, averaging 49.0 clicks compared with 203.0 (t = 104.0, p < 0.001). The paper documents its complete end-to-end prediction pipeline, including the model evaluation framework and the dataset, so that future researchers can reproduce the study. The results have direct implications for the design of early-alert systems and the ethical deployment of predictive analytics in higher education.
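The click-volume comparison reported in the abstract (a relative decrease in mean clicks plus a t statistic) can be illustrated with a minimal sketch. The abstract does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here. The function names `relative_decrease` and `welch_t` are hypothetical, and the click totals are toy values, not OULAD data.

```python
import math
import statistics


def relative_decrease(reference_mean: float, group_mean: float) -> float:
    """Fractional drop of group_mean relative to reference_mean."""
    return (reference_mean - group_mean) / reference_mean


def welch_t(sample_a: list[float], sample_b: list[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical toy click totals per student, NOT taken from OULAD.
not_at_risk_clicks = [100.0, 120.0, 110.0, 90.0, 130.0]
at_risk_clicks = [30.0, 20.0, 25.0, 35.0, 40.0]

drop = relative_decrease(
    statistics.fmean(not_at_risk_clicks),
    statistics.fmean(at_risk_clicks),
)
t_stat = welch_t(not_at_risk_clicks, at_risk_clicks)
print(f"relative decrease: {drop:.1%}, Welch t: {t_stat:.2f}")
```

With the full dataset, the same two quantities would be computed over per-student click totals for each group; the very large t value reported in the abstract reflects the large sample size as much as the size of the gap.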
Emad Bashkail,
Nesrin Merhi