Unsupervised Approach for Email Spam Filtering using Data Mining

Computer networks are overwhelmed with unwanted emails, known as spam. These emails cause financial damage to companies and harm users' reputations. The increasing volume of these emails has created an intense need to design and impl...

Bibliographic Details
Main authors:	Mehdi Manaa, Ahmed Obaid, Mohammed Dosh
Format:	article
Language:	EN
Published:	European Alliance for Innovation (EAI) 2021
Subjects:	spam emails; vector space model; data security; machine learning; m-dbscan; Science Q; Mathematics QA1-939; Electronic computers. Computer science QA75.5-76.95
Online access:	https://doaj.org/article/c66df5d27fd4473a9b38f666d67b4960

id	oai:doaj.org-article:c66df5d27fd4473a9b38f666d67b4960
record_format	dspace
spelling	oai:doaj.org-article:c66df5d27fd4473a9b38f666d67b4960 | 2021-11-30T11:07:24Z | Unsupervised Approach for Email Spam Filtering using Data Mining | ISSN 2032-944X | DOI 10.4108/eai.9-3-2021.168962 | https://doaj.org/article/c66df5d27fd4473a9b38f666d67b4960 | 2021-11-01T00:00:00Z | https://eudl.eu/pdf/10.4108/eai.9-3-2021.168962 | https://doaj.org/toc/2032-944X | Mehdi Manaa; Ahmed Obaid; Mohammed Dosh | European Alliance for Innovation (EAI) | article | EN | EAI Endorsed Transactions on Energy Web, Vol 8, Iss 36 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	spam emails; vector space model; data security; machine learning; m-dbscan; Science Q; Mathematics QA1-939; Electronic computers. Computer science QA75.5-76.95
description	Computer networks are overwhelmed with unwanted emails, known as spam. These emails cause financial damage to companies and harm users' reputations. The increasing volume of these emails has created an intense need to design and implement robust anti-spam filtering; this paper does so using the vector space model and Machine Learning (ML). ML algorithms have been used successfully to detect and filter spam emails that jeopardize network resources and consume bandwidth. The main objective is to apply the unsupervised learning method M-DBSCAN to classify spam and ham emails. A robust method using Modified Density-Based Spatial Clustering of Applications with Noise (M-DBSCAN) is implemented. N representative points extracted from each cluster are applied in the online test. These points represent the cluster objects so that both spherical and non-spherical clusters can be detected. The N representative points are formed in the training step and are used to detect spam email with distance measures. The data set, taken from the Kaggle website, includes many ham and spam email objects. The results show good performance: M-DBSCAN reaches 97.848% accuracy compared with 95.918% for standard DBSCAN, with efficient values for false-negative rate, false-positive rate, F-score, and online detection time.
format	article
author	Mehdi Manaa, Ahmed Obaid, Mohammed Dosh
title	Unsupervised Approach for Email Spam Filtering using Data Mining
publisher	European Alliance for Innovation (EAI)
publishDate	2021
url	https://doaj.org/article/c66df5d27fd4473a9b38f666d67b4960
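
The description field above outlines a pipeline: represent emails in a vector space, cluster them with a density-based method, keep N representative points per cluster, and assign new emails by distance to those points. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It uses standard DBSCAN rather than M-DBSCAN (whose modifications are not described in this record), plain term-frequency vectors with cosine distance, and a hypothetical rule that keeps the first N members of each cluster as representatives. The function names, the toy messages, and the `eps` and `min_pts` values are all illustrative.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector (a simple vector space model)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 when either vector is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def dbscan(vectors, eps, min_pts):
    """Standard DBSCAN (not M-DBSCAN); returns one label per vector, -1 = noise."""
    labels = [None] * len(vectors)

    def neighbors(i):
        return [j for j in range(len(vectors))
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(len(vectors)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: expand through it
                queue.extend(nbrs)
    return labels

def representatives(vectors, labels, n):
    """Keep up to n points per cluster (assumed rule: the first n members)."""
    reps = {}
    for vec, lab in zip(vectors, labels):
        if lab != -1 and len(reps.setdefault(lab, [])) < n:
            reps[lab].append(vec)
    return reps

def assign(text, reps, eps):
    """Cluster of the nearest representative, or -1 if none lies within eps."""
    vec = vectorize(text)
    dist, lab = min(((cosine_distance(vec, r), lab)
                     for lab, rs in reps.items() for r in rs),
                    default=(math.inf, -1))
    return lab if dist <= eps else -1

# Toy training messages (made up for illustration, not the Kaggle data set).
train = [
    "win free prize now",
    "free prize claim now",
    "win free cash prize",
    "meeting agenda for monday",
    "monday meeting agenda attached",
    "agenda for team meeting",
]
vectors = [vectorize(t) for t in train]
labels = dbscan(vectors, eps=0.6, min_pts=2)
reps = representatives(vectors, labels, n=2)

print(labels)
print(assign("claim your free prize", reps, eps=0.6))
print(assign("lunch tomorrow?", reps, eps=0.6))
```

Because the clustering is unsupervised, the cluster ids carry no spam or ham meaning by themselves; mapping a cluster to a class would need a separate step that this record does not describe. Comparing a new message against only N points per cluster, rather than every training point, is what keeps the online test cheap.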
