SPAMGUARD: AN INTEGRATED KALMAN FILTER AND CNN APPROACH FOR EMAIL SPAM CLASSIFICATION
DOI:
https://doi.org/10.29121/ijetmr.v10.i6.2023.1600
Keywords:
Kalman, CNN, Email, Classification, Spam
Abstract
Email remains a primary mode of communication for both professional and personal use due to its low cost, accessibility, and widespread adoption. However, the open nature of email systems exposes users to spam — unsolicited, irrelevant, or malicious messages — posing risks such as phishing, fraud, and information overload. Existing spam detection mechanisms face challenges in keeping pace with the evolving strategies used by spammers and must balance aggressive filtering with the risk of legitimate message loss. To address these limitations, this study proposes a novel spam detection framework combining Kalman Filters and Convolutional Neural Networks (CNNs). Kalman Filters are utilized to pre-process and denoise input text data, effectively mitigating irregularities and improving feature consistency. CNNs are then employed to automatically learn hierarchical text representations, enabling robust classification of emails into spam or legitimate categories. The integration of Kalman-based preprocessing with deep learning enhances both detection accuracy and system reliability. Additionally, the system provides a quick summary view of classified emails to assist users in rapidly assessing message content. Experimental results demonstrate the potential of the proposed method to outperform traditional spam detection techniques, offering a scalable and adaptive solution to modern email security challenges.
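The abstract does not say how the Kalman filter is applied to text, so the following is only a minimal sketch under stated assumptions: each email has already been converted to a sequence of numeric feature values (the `scores` list is hypothetical), and a scalar Kalman filter with a random-walk state model smooths that sequence before it is passed to a classifier. The function name `kalman_smooth` and the variance settings are illustrative choices, not values from the paper.

```python
def kalman_smooth(values, process_var=1e-3, measurement_var=0.25):
    """Scalar Kalman filter with a random-walk state model.

    Returns one filtered estimate per input value.
    process_var and measurement_var are assumed, illustrative settings.
    """
    if not values:
        return []
    estimate = values[0]   # initial state: the first observation
    error_var = 1.0        # initial estimate uncertainty (assumed)
    filtered = [estimate]
    for z in values[1:]:
        # Predict: the state is unchanged under a random walk,
        # but its uncertainty grows by the process variance.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


# Hypothetical per-token feature scores containing one outlier (0.9).
scores = [0.2, 0.25, 0.22, 0.9, 0.21, 0.24]
smoothed = kalman_smooth(scores)
```

With these settings the outlier is pulled toward the running estimate rather than passed through unchanged. A larger `measurement_var` smooths more aggressively, while a larger `process_var` lets the estimate track the raw values more closely.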
Copyright (c) 2023 Umesh, Yuvraj Pawar, Abhay Sharma, Akshat Chauhan, Suman

This work is licensed under a Creative Commons Attribution 4.0 International License.
License and Copyright Agreement
In submitting the manuscript to the journal, the authors certify that:
- They are authorized by their co-authors to enter into these arrangements.
- The work described has not been formally published before, except in the form of an abstract or as part of a published lecture, review, thesis, or overlay journal.
- The work is not under consideration for publication elsewhere.
- Its release has been approved by all authors and, tacitly or explicitly, by the responsible authorities of the institutes where the work was carried out.
- They secure the right to reproduce any material that has already been published or copyrighted elsewhere.
- They agree to the following license and copyright agreement.
Copyright
Authors who publish with International Journal of Engineering Technologies and Management Research agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors can enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or edit it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) before and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.