Improving outliers detection in data streams using LiCS and voting



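Bibliographic details
Main authors: Fatima-Zahra Benjelloun, Ahmed Oussous, Amine Bennani, Samir Belfkih, Ayoub Ait Lahcen
Format: article
Language: English
Published: Elsevier, 2021
Subjects: Data streams; Outlier detection; High-dimensional data; Big data mining; Intrusion detection; Electronic computers. Computer science (QA75.5-76.95)
Online access: https://doaj.org/article/27690865aa2042bcb85cf54db30f0f6b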
Published in: Journal of King Saud University: Computer and Information Sciences, Vol 33, Iss 10, Pp 1177-1185 (2021)
ISSN: 1319-1578
DOI: 10.1016/j.jksuci.2019.08.003
Publisher's version: http://www.sciencedirect.com/science/article/pii/S1319157819301454

Abstract

Detecting outliers in real time is increasingly important for many real-world applications, such as detecting abnormal heart activity, system intrusions, spam, or abnormal credit card transactions. However, detecting outliers in data streams raises many challenges, such as high dimensionality, dynamic data distributions, and unpredictable relationships. Our simulations demonstrate that some advanced solutions still have drawbacks. In this paper, we first improve the outlier detection capability of both micro-cluster-based algorithms (MCOD) and distance-based algorithms (Abstract-C and Exact-Storm), which are known for their performance. We do this by adding a layer called LiCS that classifies the K-nearest neighbors (Knn) of each node online, based on their evolutionary status. This layer aggregates the results and uses a count threshold to better classify nodes. Experiments on SpamBase datasets confirmed that our technique enhances the accuracy and precision of these algorithms and helps reduce the number of unclassified nodes. Second, we propose a hybrid solution based on iterative majority voting and LiCS. Experiments on real data show that it outperforms the discussed algorithms in terms of accuracy, precision and sensitivity in detecting outliers. It also minimizes the issue of unclassified instances and consolidates the outputs of the different algorithms.
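The abstract does not specify how LiCS or the iterative voting works. As a rough illustration only, the sketch below assumes the radius/neighbour-count formulation commonly used by distance-based stream detectors such as Exact-Storm (a point is an outlier if fewer than k other points in the current window lie within radius R) and combines several such detectors with a plain, non-iterative majority vote, where ties or all-abstain cases leave a point unclassified. The function names `distance_outlier`, `majority_vote` and `vote_window`, and the (R, k) settings, are hypothetical; this is not the authors' LiCS layer or their voting scheme.

```python
from collections import Counter
from math import dist
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
INLIER, OUTLIER = "inlier", "outlier"
Label = Optional[str]  # None means "unclassified": no decision was reached


def distance_outlier(window: Sequence[Point], i: int, radius: float, k: int) -> str:
    """Distance-based rule with parameters R (radius) and k (assumed formulation):
    window[i] is an outlier if fewer than k other points lie within `radius`."""
    neighbours = sum(
        1 for j, other in enumerate(window)
        if j != i and dist(window[i], other) <= radius
    )
    return INLIER if neighbours >= k else OUTLIER


def majority_vote(labels: Sequence[Label]) -> Label:
    """Plain majority vote over detector outputs. Abstentions (None) are
    ignored; a tie or an all-abstain vote leaves the point unclassified."""
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between labels: keep the point unclassified
    return ranked[0][0]


def vote_window(window: Sequence[Point], settings: Sequence[Tuple[float, int]]) -> List[Label]:
    """Run one distance-based detector per hypothetical (R, k) setting and
    combine their labels for each point in the window by majority vote."""
    return [
        majority_vote([distance_outlier(window, i, r, k) for r, k in settings])
        for i in range(len(window))
    ]


if __name__ == "__main__":
    # A tight cluster of four points plus one distant point.
    window = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    settings = [(0.2, 2), (0.5, 3), (1.0, 1)]  # hypothetical (R, k) pairs
    print(vote_window(window, settings))
    # → ['inlier', 'inlier', 'inlier', 'inlier', 'outlier']
```

Raising k or shrinking R makes an individual detector stricter; the vote only changes a point's final label when the detectors disagree, and it reports disagreement it cannot resolve as unclassified rather than forcing a decision.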