Volume 18, Issue 1, pp. 56–69, 2026 | Full Length Article
Reem Atassi 1 *
Doi: https://doi.org/10.54216/JCIM.180104
Phishing URLs remain a security threat to organizations because they enable credential theft, account takeover, payment fraud, and unauthorized access to digital services. Phishing detection has been studied extensively, yet most published work still emphasizes predictive performance over operational capability, testing, and governance implementation. This study presents OPH-Guard, an operational security framework that uses compact tree ensembles to identify phishing URLs as part of secure web access management. Its integrated workflow lets institutions and small enterprises combine public data ingestion, feature validation, tabular model learning, post-hoc explanation, and security-action mapping. The empirical evaluation uses a public GitHub-hosted phishing URL dataset containing 11,481 labeled records and 87 predictive features. Three tree-based learners (Decision Tree, Random Forest, and Extra Trees) are compared under a stratified 80/20 hold-out protocol. On the held-out test set, Extra Trees achieves the highest accuracy, 0.9856, with a precision of 0.9921, a recall of 0.9791, an F1-score of 0.9855, and a ROC-AUC of 0.9984. The study also examines the security relevance of the top predictors (google index, page rank, domain age, and phish hints), which supports the claim that the resulting model can help organizations manage browsing risk through URL triage and secure information management controls. The contributions are a reproducible framework, a complete screening algorithm, a summary of ten prior studies, and a mapping from model results to security operations.
Phishing URL detection, Tree ensembles, Extra Trees, Secure web access management, Operational interpretability, Cybersecurity analytics
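The accuracy, precision, recall, and F1-score reported in the abstract all derive from binary confusion counts, and the security-action mapping turns a model score into an access decision. The following is a minimal sketch of both ideas, not the OPH-Guard implementation: the `Confusion` class, the `triage` function, the thresholds `BLOCK_AT` and `WARN_AT`, and the example counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts, with phishing as the positive class."""
    tp: int  # phishing URLs flagged as phishing
    fp: int  # legitimate URLs flagged as phishing
    tn: int  # legitimate URLs passed as legitimate
    fn: int  # phishing URLs passed as legitimate

    # The methods below assume non-zero denominators.
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)  # harmonic mean of precision and recall


# Hypothetical thresholds; the paper's actual mapping is not given here.
BLOCK_AT = 0.90
WARN_AT = 0.50


def triage(phishing_probability: float) -> str:
    """Map a model score in [0, 1] to an illustrative access-control action."""
    if not 0.0 <= phishing_probability <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if phishing_probability >= BLOCK_AT:
        return "block"
    if phishing_probability >= WARN_AT:
        return "warn"
    return "allow"


if __name__ == "__main__":
    # Made-up counts for illustration; these are not the paper's results.
    cm = Confusion(tp=90, fp=5, tn=95, fn=10)
    print(cm.accuracy(), cm.precision(), cm.recall(), cm.f1())
    print([triage(score) for score in (0.05, 0.62, 0.97)])
```

In a deployment, the two thresholds would presumably be tuned on validation data to balance missed phishing URLs (false negatives) against blocked legitimate pages (false positives).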
[1] Rami M. Mohammad, Fadi Thabtah, and Lee McCluskey. “An Assessment of Features Related to Phishing Websites Using an Automated Technique”. In: 2012 International Conference for Internet Technology and Secured Transactions (2012), pp. 492–497. DOI: 10.1109/ICITST.2012.6473497.
[2] Rami M. Mohammad and Lee McCluskey. Phishing Websites. UCI Machine Learning Repository. 2015. DOI: 10.24432/C51W2X. URL: https://archive.ics.uci.edu/dataset/327/phishing+websites.
[3] Rakesh Verma and Avisha Das. “What’s in a URL: Fast Feature Extraction and Malicious URL Detection”. In: Proceedings of the 3rd ACM International Workshop on Security and Privacy Analytics. 2017, pp. 55–63. DOI: 10.1145/3041008.3041016.
[4] Routhu Srinivasa Rao and Alwyn Roshan Pais. “Detection of Phishing Websites Using an Efficient Feature Based Machine Learning Framework”. In: Neural Computing and Applications 31 (2019), pp. 3851–3873. DOI: 10.1007/s00521-017-3305-0.
[5] O. Koray Sahingoz et al. “Machine Learning Based Phishing Detection from URLs”. In: Expert Systems with Applications 117 (2019), pp. 345–357. DOI: 10.1016/j.eswa.2018.09.029.
[6] Ali Aljofey et al. “An Effective Phishing Detection Model Based on Character Level Convolutional Neural Network from URL”. In: Electronics 9.9 (2020), p. 1514. DOI: 10.3390/electronics9091514.
[7] Nguyen Quoc Do et al. “Phishing Webpage Classification via Deep Learning Algorithms”. In: Applied Sciences 11.19 (2021), p. 9210. DOI: 10.3390/app11199210.
[8] Musarat Hussain et al. “CNN-Fusion: An Effective and Lightweight Phishing Detection Method Based on Multi-Variant ConvNet”. In: Information Sciences 631 (2023), pp. 328–345. DOI: 10.1016/j.ins.2023.02.039.
[9] Hayk Ghalechyan et al. “Phishing URL Detection with Neural Networks: An Empirical Study”. In: Scientific Reports 14 (2024), p. 25134. DOI: 10.1038/s41598-024-74725-6.
[10] Khandaker Mohammad Mohi Uddin et al. “Explainable Machine Learning for Phishing Site Detection: A High-Efficiency Approach Using Boosting Models and SHAP”. In: The Journal of Engineering 2025.1 (2025), e70110. DOI: 10.1049/tje2.70110.
[11] Grega Vrbančič, Iztok Fister, and Vili Podgorelec. “Datasets for Phishing Websites Detection”. In: Data in Brief 33 (2020), p. 106438. DOI: 10.1016/j.dib.2020.106438.
[12] Asadullah Safi and Satwinder Singh. “A Systematic Literature Review on Phishing Website Detection Techniques”. In: Journal of King Saud University – Computer and Information Sciences 35.2 (2023), pp. 590–611. DOI: 10.1016/j.jksuci.2023.01.004.
[13] Ammar Almomani et al. “Phishing Website Detection with Semantic Features Based on Machine Learning Classifiers: Underfitting and Overfitting Analysis”. In: International Journal of Secure Software Engineering 13.1 (2022), pp. 1–25. DOI: 10.4018/IJSSE.297032.
[14] S. K. Hasane Ahammad et al. “Phishing URL Detection Using Machine Learning Methods”. In: Advances in Engineering Software 173 (2022), p. 103288. DOI: 10.1016/j.advengsoft.2022.103288.
[15] DPhi. Phishing URL Training Dataset. GitHub-hosted dataset repository. URL: https://raw.githubusercontent.com/dphi-official/Datasets/master/phishing_data/Training_set_label.csv (visited on 04/12/2026).
[16] Leo Breiman. “Random Forests”. In: Machine Learning 45.1 (2001), pp. 5–32. DOI: 10.1023/A:1010933404324.
[17] Pierre Geurts, Damien Ernst, and Louis Wehenkel. “Extremely Randomized Trees”. In: Machine Learning 63.1 (2006), pp. 3–42. DOI: 10.1007/s10994-006-6226-1.
[18] Maria Carla Calzarossa, Paolo Giudici, and Rasha Zieni. “Explainable Machine Learning for Phishing Feature Detection”. In: Quality and Reliability Engineering International 40.1 (2024), pp. 362–373. DOI: 10.1002/qre.3411.