Software Engineering-Based Design for a Bayesian Spam Filter

The rapid spread and the easy availability of a free e-mail service have made it the medium of choice for the sending of unsolicited advertising and bulk e-mail in general. These messages, known as junk e-mail or spam mail, are an increasing problem to both Internet users and Internet service provid...

Bibliographic Details
Main author: Mumtaz Mohammed Ali AL-Mukhtar
Format: article
Language: EN
Published: Al-Khwarizmi College of Engineering – University of Baghdad 2010
Subjects:
Online access: https://doaj.org/article/d71bad9471a74b6fb7f7e07543aef1e8
id oai:doaj.org-article:d71bad9471a74b6fb7f7e07543aef1e8
record_format dspace
spelling oai:doaj.org-article:d71bad9471a74b6fb7f7e07543aef1e8
2021-12-02T08:14:20Z
Software Engineering-Based Design for a Bayesian Spam Filter
1818-1171
https://doaj.org/article/d71bad9471a74b6fb7f7e07543aef1e8
2010-01-01T00:00:00Z
http://www.iasj.net/iasj?func=fulltext&aId=2331
https://doaj.org/toc/1818-1171
Mumtaz Mohammed Ali AL-Mukhtar
Al-Khwarizmi College of Engineering – University of Baghdad
article
Spam
client e-mail
bayesian filter
SQL server
waterfall model.
Chemical engineering
TP155-156
Engineering (General). Civil engineering (General)
TA1-2040
EN
Al-Khawarizmi Engineering Journal, Vol 6, Iss 2, Pp 83-92 (2010)
institution DOAJ
collection DOAJ
language EN
topic Spam
client e-mail
bayesian filter
SQL server
waterfall model
Chemical engineering
TP155-156
Engineering (General). Civil engineering (General)
TA1-2040
description The rapid spread and easy availability of free e-mail services have made e-mail the medium of choice for sending unsolicited advertising and bulk e-mail in general. These messages, known as junk e-mail or spam, are a growing problem for both Internet users and Internet service providers (ISPs).
The research addresses one aspect of the spam problem by developing a filter for the e-mail client. The proposed filter combines three filters: a whitelist, a blacklist, and a Bayesian filter. The whitelist filter accepts e-mails only from known addresses. The blacklist filter blocks e-mails from addresses known to send spam. The Bayesian content-based filter estimates the probability that a message is spam from its text and filters messages against a pre-selected threshold.
The Bayesian filter serves as the main filter. It is trained manually on a set of gathered e-mails, each labeled as spam or legitimate according to its content. The classification phase is then applied to newly received e-mails. All required databases are built as tables stored in a Structured Query Language (SQL) server, which the client-side filter accesses transparently to carry out the filtering. The proposed system (e-mail client interface and filters) handles messages written in both Arabic and English, which is crucial for users in our region.
Software engineering principles are applied throughout the design process to make the system less vulnerable to faults and easier to maintain. The design steps follow the Waterfall model using the ASCENT software. A user-friendly interface gives access to the features of the spam filter at the client side. The system was developed in Visual Basic version 6, and the SQL server was used to build and process the database.
A number of performance measurements were carried out with a set of gathered e-mails. The results are used to evaluate the performance of the filter and to demonstrate its efficiency.
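The three-stage filtering described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the original system was written in Visual Basic 6 against an SQL server, whereas the sketch below uses Python with in-memory counters. The names `BayesianFilter`, `classify`, and `tokenize`, the order in which the lists are checked, the Laplace smoothing, and the default threshold of 0.9 are all assumptions made for illustration; the abstract states only that a pre-selected threshold is used.

```python
import math
import re
from collections import Counter

# \w on Python 3 strings is Unicode-aware, so it matches Arabic and Latin letters.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split a message body into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


class BayesianFilter:
    """Hypothetical naive Bayes spam scorer with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Manual training step: label is 'spam' or 'ham' (legitimate)."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated P(spam | text) as a float in [0, 1]."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        tokens = tokenize(text)
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed class prior, so an untrained filter returns 0.5.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + len(vocab) + 1  # +1 covers unseen words
            score = math.log(prior)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denominator)
            log_score[label] = score
        # Convert the two log scores into a posterior; clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


def classify(sender, body, whitelist, blacklist, bayes, threshold=0.9):
    """Assumed ordering: whitelist first, then blacklist, then Bayesian score."""
    sender = sender.lower()
    if sender in whitelist:
        return "legitimate"
    if sender in blacklist:
        return "spam"
    if bayes.spam_probability(body) >= threshold:
        return "spam"
    return "legitimate"
```

In this sketch the address lists take precedence over the content score, so a whitelisted sender is accepted regardless of the message text. A persistent version would keep `word_counts` and the two lists in database tables, as the abstract describes, rather than in memory.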
format article
author Mumtaz Mohammed Ali AL-Mukhtar
title Software Engineering-Based Design for a Bayesian Spam Filter
publisher Al-Khwarizmi College of Engineering – University of Baghdad
publishDate 2010
url https://doaj.org/article/d71bad9471a74b6fb7f7e07543aef1e8