Detection of Spam Email

Authors

  • Manish Panwar, Department of MCA, Vishwakarma Institute of Technology, Pune, India
  • Jayesh Rajesh Jogi, Department of MCA, Vishwakarma Institute of Technology, Pune, India
  • Mahesh Vijay Mankar, Department of MCA, Vishwakarma Institute of Technology, Pune, India
  • Mohamed Alhassan, Department of MCA, Vishwakarma Institute of Technology, Pune, India
  • Shreyas Kulkarni, Department of MCA, Vishwakarma Institute of Technology, Pune, India

DOI:

https://doi.org/10.54536/ajise.v1i1.996

Keywords:

Spam Email, Classification, Dataset, Performance Metrics

Abstract

Spam, or unsolicited email, has become a major concern for every email user. Filtering spam is now quite challenging because such messages are crafted in ways that anti-spam filters fail to recognize. To predict or categorize emails as spam, this paper compares and reviews the performance metrics of several supervised machine learning techniques, including SVM (Support Vector Machine), Random Forest, Decision Tree, CNN (Convolutional Neural Network), KNN (K-Nearest Neighbor), MLP (Multi-Layer Perceptron), AdaBoost (Adaptive Boosting), and the Naïve Bayes algorithm. The goal of this study is to analyze the specifics or content of the emails, identify a limited dataset, and create a classification model that can predict or categorize whether an email is spam. BERT (Bidirectional Encoder Representations from Transformers) has been fine-tuned to perform the task of separating spam emails from legitimate emails (ham). BERT uses attention layers to take the context of the text into account. Results are compared with a baseline DNN (deep neural network) model that consists of a BiLSTM (bidirectional Long Short-Term Memory) layer and two stacked Dense layers. Results are also compared with a group of traditional classifiers, including k-NN (k-nearest neighbours) and NB (Naïve Bayes). The model is tested for robustness and persistence using two open-source data sets, one of which is used to train the model.
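
As a rough illustration of the kind of comparison the abstract describes, the sketch below trains one of the named traditional classifiers (Naive Bayes) and scores it with standard performance metrics. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the class and function names, and the eight toy messages are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data, not taken from the paper's data sets.
train_texts = [
    "win free prize now",
    "free money offer click now",
    "meeting agenda for monday",
    "lunch with the project team",
]
train_labels = ["spam", "spam", "ham", "ham"]

test_texts = [
    "free prize offer",
    "project meeting monday",
    "free lunch for the team",
    "lunch agenda",
]
test_labels = ["spam", "ham", "spam", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
scores = metrics(test_labels, predictions)
print(predictions)
print(scores)
```

The third test message is labelled spam but shares most of its words with the ham training messages, so the toy model misses it. That single miss lowers recall while leaving precision untouched, which is why a comparison of classifiers usually reports several metrics rather than accuracy alone.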

Published

2022-12-30

How to Cite

Panwar, M., Jogi, J. R., Mankar, M. V., Alhassan, M., & Kulkarni, S. (2022). Detection of Spam Email. American Journal of Innovation in Science and Engineering, 1(1), 18–21. https://doi.org/10.54536/ajise.v1i1.996