Improving outliers detection in data streams using LiCS and voting
Detecting outliers in real time is increasingly important for many real-world applications, such as detecting abnormal heart activity, system intrusions, spam or abnormal credit card transactions. However, detecting outliers in data streams raises many challenges, such as high dimensionality, dyna...
Main authors: | Fatima-Zahra Benjelloun, Ahmed Oussous, Amine Bennani, Samir Belfkih, Ayoub Ait Lahcen |
---|---|
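Format: | article |
Language: | EN |
Published: | Elsevier, 2021 |
Subjects: | Data streams; Outlier detection; High-dimensional data; Big data mining; Intrusion detection; Electronic computers. Computer science; QA75.5-76.95 |
Online access: | https://doaj.org/article/27690865aa2042bcb85cf54db30f0f6b |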
id |
oai:doaj.org-article:27690865aa2042bcb85cf54db30f0f6b |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:27690865aa2042bcb85cf54db30f0f6b; 2021-11-22T04:19:40Z; Improving outliers detection in data streams using LiCS and voting; 1319-1578; 10.1016/j.jksuci.2019.08.003; https://doaj.org/article/27690865aa2042bcb85cf54db30f0f6b; 2021-12-01T00:00:00Z; http://www.sciencedirect.com/science/article/pii/S1319157819301454; https://doaj.org/toc/1319-1578; Detecting outliers in real time is increasingly important for many real-world applications, such as detecting abnormal heart activity, system intrusions, spam or abnormal credit card transactions. However, detecting outliers in data streams raises many challenges, such as high dimensionality, dynamic data distribution and unpredictable relationships. Our simulations demonstrate that some advanced solutions still show drawbacks. In this paper, first, we improve the outlier-detection capacity of both micro-cluster-based algorithms (MCOD) and distance-based algorithms (Abstract-C and Exact-Storm), which are known for their performance. We do so by adding a layer called LiCS that classifies online the k-nearest neighbors (kNN) of each node based on their evolutionary status. This layer aggregates the results and uses a count threshold to better classify nodes. Experiments on SpamBase datasets confirmed that our technique enhances the accuracy and precision of such algorithms and helps to reduce the number of unclassified nodes. Second, we propose a hybrid solution based on iterative majority voting and our LiCS. Experiments on real data show that it outperforms the discussed algorithms in terms of accuracy, precision and sensitivity in detecting outliers. It also minimizes the issue of unclassified instances and consolidates the different outputs of the algorithms.; Fatima-Zahra Benjelloun; Ahmed Oussous; Amine Bennani; Samir Belfkih; Ayoub Ait Lahcen; Elsevier; article; Data streams; Outlier detection; High-dimensional data; Big data mining; Intrusion detection; Electronic computers. Computer science; QA75.5-76.95; EN; Journal of King Saud University: Computer and Information Sciences, Vol 33, Iss 10, Pp 1177-1185 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Data streams; Outlier detection; High-dimensional data; Big data mining; Intrusion detection; Electronic computers. Computer science; QA75.5-76.95 |
spellingShingle |
Data streams; Outlier detection; High-dimensional data; Big data mining; Intrusion detection; Electronic computers. Computer science; QA75.5-76.95; Fatima-Zahra Benjelloun; Ahmed Oussous; Amine Bennani; Samir Belfkih; Ayoub Ait Lahcen; Improving outliers detection in data streams using LiCS and voting |
description |
Detecting outliers in real time is increasingly important for many real-world applications, such as detecting abnormal heart activity, system intrusions, spam or abnormal credit card transactions. However, detecting outliers in data streams raises many challenges, such as high dimensionality, dynamic data distribution and unpredictable relationships. Our simulations demonstrate that some advanced solutions still show drawbacks. In this paper, first, we improve the outlier-detection capacity of both micro-cluster-based algorithms (MCOD) and distance-based algorithms (Abstract-C and Exact-Storm), which are known for their performance. We do so by adding a layer called LiCS that classifies online the k-nearest neighbors (kNN) of each node based on their evolutionary status. This layer aggregates the results and uses a count threshold to better classify nodes. Experiments on SpamBase datasets confirmed that our technique enhances the accuracy and precision of such algorithms and helps to reduce the number of unclassified nodes. Second, we propose a hybrid solution based on iterative majority voting and our LiCS. Experiments on real data show that it outperforms the discussed algorithms in terms of accuracy, precision and sensitivity in detecting outliers. It also minimizes the issue of unclassified instances and consolidates the different outputs of the algorithms. |
format |
article |
author |
Fatima-Zahra Benjelloun; Ahmed Oussous; Amine Bennani; Samir Belfkih; Ayoub Ait Lahcen |
author_facet |
Fatima-Zahra Benjelloun; Ahmed Oussous; Amine Bennani; Samir Belfkih; Ayoub Ait Lahcen |
author_sort |
Fatima-Zahra Benjelloun |
title |
Improving outliers detection in data streams using LiCS and voting |
title_short |
Improving outliers detection in data streams using LiCS and voting |
title_full |
Improving outliers detection in data streams using LiCS and voting |
title_fullStr |
Improving outliers detection in data streams using LiCS and voting |
title_full_unstemmed |
Improving outliers detection in data streams using LiCS and voting |
title_sort |
improving outliers detection in data streams using lics and voting |
publisher |
Elsevier |
publishDate |
2021 |
url |
https://doaj.org/article/27690865aa2042bcb85cf54db30f0f6b |
work_keys_str_mv |
AT fatimazahrabenjelloun improvingoutliersdetectionindatastreamsusinglicsandvoting AT ahmedoussous improvingoutliersdetectionindatastreamsusinglicsandvoting AT aminebennani improvingoutliersdetectionindatastreamsusinglicsandvoting AT samirbelfkih improvingoutliersdetectionindatastreamsusinglicsandvoting AT ayoubaitlahcen improvingoutliersdetectionindatastreamsusinglicsandvoting |
_version_ |
1718418212681416704 |
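
The description field above mentions two ideas: a count threshold applied to the labels of a node's k nearest neighbors, and majority voting across several detectors. The following is a minimal Python sketch of those two ideas only, not the authors' LiCS layer or their iterative voting scheme, whose details this record does not give. All names (`count_threshold_label`, `majority_vote`, `hybrid_label`, `min_count`) and the fallback order in `hybrid_label` are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical label names for this sketch; None stands for "unclassified".
INLIER, OUTLIER = "inlier", "outlier"
Label = Optional[str]


def count_threshold_label(neighbor_labels: Sequence[Label], min_count: int) -> Label:
    """Label a node from the labels of its k nearest neighbors.

    A label is assigned only when at least `min_count` neighbors share it
    and no other label is equally frequent; otherwise return None.
    """
    counts = Counter(lab for lab in neighbor_labels if lab is not None)
    if not counts:
        return None
    (top, n), *rest = counts.most_common(2)
    if n < min_count or (rest and rest[0][1] == n):
        return None
    return top


def majority_vote(votes: Sequence[Label]) -> Label:
    """Return the label backed by a strict majority of the detectors
    that produced a label; return None on a tie or when none voted."""
    cast = [v for v in votes if v is not None]
    if not cast:
        return None
    top, n = Counter(cast).most_common(1)[0]
    return top if 2 * n > len(cast) else None


def hybrid_label(detector_votes: Sequence[Label],
                 neighbor_labels: Sequence[Label],
                 min_count: int) -> Label:
    """Single-round combination (an assumption of this sketch): try the
    vote first, then fall back to the neighbor count threshold."""
    voted = majority_vote(detector_votes)
    if voted is not None:
        return voted
    return count_threshold_label(neighbor_labels, min_count)


if __name__ == "__main__":
    # Three hypothetical detectors disagree and one abstains, so the
    # decision falls back to the neighbors' labels.
    print(hybrid_label([OUTLIER, INLIER, None], [INLIER, INLIER, OUTLIER], 2))
```

In this sketch a node stays unclassified only when both the detectors and the neighbors fail to reach a decision, which is one simple way to reduce unclassified instances; the paper's own procedure may differ.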