Full Length Article
Volume 2, Issue 2, pp. 44-57, 2021


A Novel Approach for Spam Email Filtering Using Machine Learning

Authors Names :  Subhalaxmi Sahoo 1*, Sudan Jha 2, Deepak Prashar 3

1  Affiliation :  Research Scholar, Electrical Engineering, India

    Email :  subhalaxmisahoo166@gmail.com

2  Affiliation :  School of Computer Science & Engineering, Lovely Professional University, India

    Email :  jhasudan@hotmail.com

3  Affiliation :  School of Computer Science & Engineering, Lovely Professional University, India

    Email :  deepak.prashar@lpu.co.in

DOI :  10.5281/zenodo.3715417

Abstract :

Spam emails, also known as unsolicited emails, are messages (commercial or otherwise) sent without the recipient's request or consent. Email spam is the practice of sending unwanted emails, mostly containing commercial messages, to randomly selected recipients. Spam is widespread on the internet because sending email costs far less than any other means of communication. Filtering spam is important because most malicious activities on the internet are carried out through email spamming. Although many spam filters are available, users still receive a huge volume of spam. This is not because the filters are inaccurate or ineffective; rather, spammers quickly develop effective countermeasures to the algorithms the filters use. In our project we used three supervised learning algorithms, namely Linear SVC, Multinomial NB, and k-NN, to implement the filter. We trained these algorithms to recognize spam using a feature called the word count vector, which is generated by processing a dataset of existing emails containing both spam and legitimate messages. The full process of the project and the results obtained with the three models are discussed.
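The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy emails, the `max_words` limit, the helper names, and the class `TinyMultinomialNB` are all hypothetical, and only the Multinomial NB model is shown (Linear SVC and k-NN are omitted). The sketch builds a vocabulary from training emails, turns each email into a word count vector, and trains a multinomial naive Bayes classifier with Laplace smoothing.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and keep purely alphabetic tokens (a simplifying assumption).
    return [w for w in text.lower().split() if w.isalpha()]

def build_vocabulary(emails, max_words=3000):
    # Keep the most frequent words; max_words is a hypothetical limit.
    counts = Counter(w for email in emails for w in tokenize(email))
    return [w for w, _ in counts.most_common(max_words)]

def word_count_vector(email, vocabulary):
    # One entry per vocabulary word: how often it occurs in this email.
    counts = Counter(tokenize(email))
    return [counts[w] for w in vocabulary]

class TinyMultinomialNB:
    """Illustrative multinomial naive Bayes with Laplace smoothing."""

    def fit(self, X, y, alpha=1.0):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed per-word totals for this class.
            totals = [sum(col) + alpha for col in zip(*rows)]
            denom = sum(totals)
            self.log_likelihood[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            scores = {
                c: self.log_prior[c]
                + sum(n * ll for n, ll in zip(x, self.log_likelihood[c]))
                for c in self.classes
            }
            predictions.append(max(scores, key=scores.get))
        return predictions

# Toy training data (made up for illustration): 1 = spam, 0 = legitimate.
train_emails = [
    "win free money now",
    "free prize claim now",
    "meeting schedule for monday",
    "project report attached for review",
]
train_labels = [1, 1, 0, 0]

vocabulary = build_vocabulary(train_emails)
X_train = [word_count_vector(e, vocabulary) for e in train_emails]
model = TinyMultinomialNB().fit(X_train, train_labels)

new_emails = ["claim free money", "monday project meeting"]
X_new = [word_count_vector(e, vocabulary) for e in new_emails]
predictions = model.predict(X_new)
print(predictions)  # [1, 0]
```

In practice a library implementation would likely replace the hand-written classifier, and the same word count vectors could be fed to a linear support vector classifier or a k-nearest-neighbours model for comparison.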

Keywords :

Word Count Vector, Linear SVC, Multinomial NB, KNN
