Interpretable Rainfall Forecasting Using SHAP-Enhanced Machine
Learning: A Case Study on U.S. Urban Climate Data (2024–2025)
Khaled Sh. Gaber1,∗, Mahmoud Elshabrawy Mohamed1,∗
1Computer Science and Intelligent Systems Research Center, Blacksburg 24060, Virginia, USA
Emails: khsherif@jcsis.org; mshabrawy@jcsis.org
Abstract
Accurate rainfall prediction is fundamental to climate resilience, sustainable agriculture, planned water
distribution, and disaster risk reduction. Rainfall is influenced by many meteorological factors and exhibits
nonlinear behavior and dependencies across different timescales and spatial regions. These characteristics
challenge conventional forecasting techniques, which often yield insufficiently accurate predictions of
short-duration precipitation patterns. With rising climate variability, there is a need for data-driven
predictive frameworks that are both high-performing and human-understandable. In this paper, we develop a
machine learning framework that predicts daily rainfall one day in advance
using a curated dataset of detailed weather measurements from 20 United States metropolitan areas, covering
2024 to 2025. The dataset contains six atmospheric variables (temperature, humidity, wind speed, cloud cover,
pressure, and precipitation) and a binary label indicating whether it rains the following day. Six machine
learning models were trained and evaluated: Random Forest, Support Vector Machine with an RBF kernel,
K-Nearest Neighbors (KNN), Logistic Regression, Naive Bayes, and Linear SVM. SHAP (SHapley Additive
exPlanations) was integrated to improve the interpretability and trustworthiness of the predictions. SHAP
values provided both a quantitative measure and a graphical visualization of each input variable's
contribution to individual predictions. The SHAP analysis identified precipitation and humidity as the most
influential features, which is consistent with meteorological theory and indicates that the model's
decisions are physically plausible.
Random Forest achieved the best performance of all models, with perfect scores (Precision = 1.00,
Recall = 1.00, F1-score = 1.00). The RBF SVM and KNN models also performed strongly, with F1-scores of 0.97
and 0.94, respectively. Logistic Regression, Linear SVM, and Naive Bayes achieved satisfactory results, with
F1-scores between 0.76 and 0.77. The SHAP-based diagnostics showed that Random Forest combined its
exceptional classification results with consistent feature weighting across diverse locations. The
integration of the Random Forest model with SHAP interpretation thus provides an effective rainfall
forecasting solution that pairs high predictive performance with interpretability. On this dataset, the model
achieves perfect classification scores together with clear explanations, which supports trust in its use in
real deployment scenarios. These results suggest that weather-sensitive sectors such as agriculture, urban
planning, and disaster response can integrate such transparent machine learning systems into their
decision-support pipelines. The approach has the potential to serve as a template for future predictive
analyses in meteorology and environmental science.
Keywords: Rainfall prediction; SHAP; Machine Learning; Random Forest
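
As a minimal illustration of the Shapley attribution idea that SHAP builds on, the sketch below computes exact Shapley values for a single prediction over the six input variables named in the abstract. Everything specific in it is an assumption made for illustration: the scoring function toy_rain_score, its coefficients, and the baseline and instance values are hypothetical and are not the Random Forest model or the data used in this study. The sketch replaces absent features with a single baseline value, which is a simplification of how SHAP handles missing features.

```python
from itertools import combinations
from math import factorial

# The six input variables listed in the abstract.
FEATURES = ["temperature", "humidity", "wind_speed",
            "cloud_cover", "pressure", "precipitation"]


def toy_rain_score(x):
    """Hypothetical rain score with one interaction term (illustration only)."""
    return (0.5 * x["precipitation"]
            + 0.03 * x["humidity"]
            + 0.0002 * x["humidity"] * x["cloud_cover"]
            - 0.01 * x["wind_speed"])


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to a single `baseline` point."""
    n = len(FEATURES)

    def value(subset):
        # Features in `subset` take the instance's values; the rest stay at baseline.
        mixed = {f: (instance[f] if f in subset else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical reference day and hypothetical day to be explained.
baseline = {"temperature": 15.0, "humidity": 50.0, "wind_speed": 10.0,
            "cloud_cover": 40.0, "pressure": 1013.0, "precipitation": 0.0}
instance = {"temperature": 18.0, "humidity": 90.0, "wind_speed": 5.0,
            "cloud_cover": 80.0, "pressure": 1005.0, "precipitation": 2.0}

phi = shapley_values(toy_rain_score, instance, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>13}: {contribution:+.3f}")
```

The attributions sum to the difference between the score for the explained day and the score for the baseline day, which is the additivity property that makes per-feature contributions directly comparable. Variables the toy function ignores (temperature and pressure here) receive a contribution of zero.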