Interpretable Rainfall Forecasting Using SHAP-Enhanced Machine Learning: A Case Study on U.S. Urban Climate Data (2024–2025)

Khaled Sh. Gaber1,∗, Mahmoud Elshabrawy Mohamed1,∗

1Computer Science and Intelligent Systems Research Center, Blacksburg 24060, Virginia, USA

Emails: khsherif@jcsis.org; mshabrawy@jcsis.org

Abstract

Accurate rainfall prediction is fundamental to climate resilience, sustainable agriculture, planned water
distribution, and disaster risk reduction. Rainfall is difficult to forecast because it depends nonlinearly on
many meteorological variables and varies across timescales and spatial regions. These characteristics limit
conventional forecasting techniques, which often predict short-duration precipitation patterns with
insufficient accuracy. Rising climate variability calls for data-driven predictive frameworks that combine
strong performance with human-understandable outputs. In this paper, we develop a machine learning
framework that predicts next-day rainfall using a curated dataset of detailed weather measurements from
20 United States cities spanning 2024 to 2025. The dataset contains six atmospheric variables (temperature,
humidity, wind speed, cloud cover, pressure, and precipitation) and a binary label indicating whether rain
occurs on the following day. Six machine learning models were trained and evaluated: Random Forest,
Support Vector Machine with an RBF kernel (RBF SVM), K-Nearest Neighbors (KNN), Logistic Regression,
Naive Bayes, and Linear SVM. SHAP (SHapley Additive exPlanations) was integrated to improve the
interpretability and trustworthiness of the predictions. SHAP values quantify and visualize the contribution
of each input variable to individual predictions. The SHAP analysis identified precipitation and humidity as
the most influential features, which is consistent with meteorological theory and indicates that the model's
decisions are physically plausible.

Random Forest achieved the highest performance of all models, with Precision, Recall, and F1-score of 100%.
The RBF SVM and KNN models also performed strongly, with F1-scores of 0.97 and 0.94, respectively.
Logistic Regression, Linear SVM, and Naive Bayes achieved satisfactory results, with F1-scores between 0.76
and 0.77. The SHAP-based diagnostics showed that Random Forest not only classified exceptionally well but
also weighted features consistently across diverse locations. Combining the Random Forest model with SHAP
interpretation therefore offers an effective solution for rainfall forecasting: on this dataset it pairs
perfect classification scores with precise explanations, which supports trust in practical deployment. The
results suggest that weather-sensitive sectors such as agriculture, urban planning, and disaster response can
integrate such transparent machine learning systems into their decision-support pipelines. The approach has
the potential to serve as a template for future predictive analyses in meteorology and environmental science.

Keywords: Rainfall prediction; SHAP; Machine Learning; Random Forest
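
To make the attribution idea behind SHAP concrete, the following minimal Python sketch computes exact Shapley values by enumerating feature coalitions. It is an illustration under stated assumptions, not the pipeline used in this study: the scoring function toy_rain_score, its coefficients, the sample instance, and the baseline are all hypothetical, and features absent from a coalition are simply replaced by baseline values. Practical SHAP implementations generally estimate these quantities more efficiently and against background data.

```python
from itertools import combinations
from math import factorial

# The six atmospheric variables named in the abstract.
FEATURES = ["temperature", "humidity", "wind_speed",
            "cloud_cover", "pressure", "precipitation"]


def toy_rain_score(x):
    """Hypothetical rain score (NOT the paper's trained model).

    Coefficients are invented for illustration. Temperature and pressure
    are deliberately unused so their attributions come out as zero.
    """
    score = 0.04 * x["humidity"] + 0.3 * x["precipitation"]
    score += 0.01 * x["cloud_cover"] - 0.01 * x["wind_speed"]
    score += 0.002 * x["humidity"] * x["precipitation"]  # interaction term
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating all coalitions of the other features.

    Assumption: a feature outside the coalition takes its baseline value.
    """
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [g for g in FEATURES if g != feature]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                without = {g: instance[g] if g in coalition else baseline[g]
                           for g in FEATURES}
                with_feature = dict(without)
                with_feature[feature] = instance[feature]
                # Marginal contribution of `feature` to this coalition.
                total += weight * (model(with_feature) - model(without))
        phi[feature] = total
    return phi


# Hypothetical observation and reference point (illustrative values only).
instance = {"temperature": 18.0, "humidity": 85.0, "wind_speed": 10.0,
            "cloud_cover": 90.0, "pressure": 1005.0, "precipitation": 4.0}
baseline = {"temperature": 20.0, "humidity": 60.0, "wind_speed": 12.0,
            "cloud_cover": 50.0, "pressure": 1013.0, "precipitation": 0.5}

phi = shapley_values(toy_rain_score, instance, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>13}: {value:+.4f}")
```

The attributions are additive: they sum to the difference between the score at the instance and the score at the baseline, which is the property that lets SHAP decompose an individual prediction into per-feature contributions. Exhaustive enumeration is feasible here only because there are six features.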