Predicting Student Outcomes: Evaluating Regression Techniques in
Educational Data
Manish Kumar Singla1,∗, Faris H. Rizk2, Mahmoud Elshabrawy Mohamed2, Ahmed Mohamed Zaki2
1Department of Interdisciplinary Courses in Engineering, Chitkara University Institute of Engineering &
Technology, Chitkara University, Punjab, India.
2Computer Science and Intelligent Systems Research Center, Blacksburg 24060, Virginia, USA
Emails: manish.singla@chitkara.edu.in, faris.rizk@jcsis.org, mshabrawy@jcsis.org, Azaki@jcsis.org
Abstract
Predicting student performance helps institutions identify weak performers early and initiate corrective
measures. This study evaluates several regression models on a dataset from Kaggle. The pipeline comprises
data cleaning (handling missing values and scaling the data), feature extraction, model fitting, and
validation. The models evaluated are Linear Regression, SVR, MLPRegressor, Gradient Boosting, CatBoost,
XGBoost, Random Forest, Extra Trees, Decision Tree, and K-Neighbors. The analysis
shows that Linear Regression performed best, with the lowest MSE (0.000521) and strong results on the
other metrics, including RMSE, MAE, and R². The results indicate that regression models can predict
student performance and can be useful to stakeholders across the education system. These findings can
inform the development of decision-support models aimed at improving student performance.
Keywords: Student performance prediction, regression models, educational data, data preprocessing, predictive
analytics
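
As a reading aid, the four metrics named in the abstract (MSE, RMSE, MAE, and R²) can be computed as in the minimal sketch below. The function name regression_metrics and the sample values are hypothetical and chosen only for illustration; they are not the study's code or data. scikit-learn's sklearn.metrics module provides equivalent functions (mean_squared_error, mean_absolute_error, r2_score) if a library implementation is preferred.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R² for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Residuals: actual value minus predicted value.
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n                             # mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R² is undefined when the targets have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Hypothetical normalized scores, for illustration only (not data from the study).
y_true = [0.50, 0.75, 0.60, 0.90]
y_pred = [0.55, 0.70, 0.65, 0.85]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Lower MSE, RMSE, and MAE indicate smaller prediction errors, while R² closer to 1 indicates that the model explains more of the variance in the target. Because MSE depends on the scale of the target, a value such as 0.000521 is only meaningful relative to the range of the scaled outcome variable.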