Securing DNS over HTTPS: A Machine Learning Study on Traffic Classification Using DoHBrw-2020

Al-Seyday.T. Qenawy1,∗, Hussein Alkattan2, Amany Khaled3

1Intelligent Systems and Machine Learning Lab, Shenzhen 518000, China

2Department of System Programming, South Ural State University, 454080 Chelyabinsk, Russia

3Department of Clinical Pharmacy and Pharmacy Practice, Faculty of Pharmacy, Mansoura University, Mansoura, Egypt

Emails: S.Qenawy@asia.com, alkattan.hussein92@gmail.com, amany24khaled@gmail.com

Abstract

This paper provides a detailed review of related work on classifying secure DNS traffic, with emphasis on identifying threats related to DNS over HTTPS (DoH) using machine learning algorithms. Using the DoHBrw-2020 dataset, which consists of network traffic data captured from the DoH protocol during its testing phase, we compare the performance of several machine learning algorithms: Decision Tree, SVM, KNN, Naïve Bayes, Neural Network (MLP), Gradient Boosting, and SVM with RBF kernel. For each model, we report Accuracy, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, and F-score. The results show that the Decision Tree model achieves the highest accuracy, 99.65%, and performs well across all evaluation criteria. The study indicates that machine learning methods have high potential for improving DNS traffic security, and it offers insight into which models are best suited to real-time detection of DoH threats. These outcomes provide several perspectives for the further development and deployment of safer DNS solutions within contemporary information security paradigms.

Keywords: DNS over HTTPS, Machine Learning, Traffic Classification, DoHBrw-2020, Cybersecurity
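
The six evaluation criteria named in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' implementation: it assumes that the "F Score" is the F1 score, that one traffic class is treated as the positive class, and that a ratio with a zero denominator is reported as 0.0. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def _ratio(numerator, denominator):
    # Assumed convention: return 0.0 when the denominator is zero.
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute the six metrics from binary confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives, respectively.
    """
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    ppv = _ratio(tp, tp + fp)           # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)           # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    # F1 score: harmonic mean of PPV and sensitivity (assumed meaning of "F Score").
    f_score = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting all six values together, rather than accuracy alone, matters when the traffic classes are imbalanced, because a model can reach high accuracy while still missing most of the minority class.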