Dataset Generation for Development of Multi-Node Cyber Threat Detection Systems

This paper presents a new approach to generate datasets for cyber threat research in a multi-node system. For this purpose, the proof-of-concept of such a system is implemented. The system will be used to collect unique datasets with examples of information hiding techniques. These techniques are no...

Bibliographic Details
Main Authors: Jędrzej Bieniasz, Krzysztof Szczypiorski
Format: article
Language: EN
Published: MDPI AG 2021
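Subjects: cybersecurity; data science; machine learning; datasets; cyber threats modeling; multi-agent systems; Electronics; TK7800-8360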
Online Access: https://doaj.org/article/c2e10f9808da4a71a167022b5299b0bb
id oai:doaj.org-article:c2e10f9808da4a71a167022b5299b0bb
record_format dspace
spelling oai:doaj.org-article:c2e10f9808da4a71a167022b5299b0bb 2021-11-11T15:42:17Z Dataset Generation for Development of Multi-Node Cyber Threat Detection Systems 10.3390/electronics10212711 2079-9292 https://doaj.org/article/c2e10f9808da4a71a167022b5299b0bb 2021-11-01T00:00:00Z https://www.mdpi.com/2079-9292/10/21/2711 https://doaj.org/toc/2079-9292 This paper presents a new approach to generate datasets for cyber threat research in a multi-node system. For this purpose, the proof-of-concept of such a system is implemented. The system will be used to collect unique datasets with examples of information hiding techniques. These techniques are not present in publicly available cyber threat detection datasets, while the cyber threats that use them represent an emerging cyber defense challenge worldwide. The network data were collected thanks to the development of a dedicated application that automatically generates random network configurations and runs scenarios of information hiding techniques. The generated datasets were used in the data-driven research workflow for cyber threat detection, including the generation of data representations (network flows), feature selection based on correlations, data augmentation of training datasets, and preparation of machine learning classifiers based on Random Forest and Multilayer Perceptron architectures. The presented results show the usefulness and correctness of the design process to detect information hiding techniques. The challenges and research directions to detect cyber deception methods are discussed in general in the paper. Jędrzej Bieniasz Krzysztof Szczypiorski MDPI AG article cybersecurity data science machine learning datasets cyber threats modeling multi-agent systems Electronics TK7800-8360 EN Electronics, Vol 10, Iss 2711, p 2711 (2021)
institution DOAJ
collection DOAJ
language EN
topic cybersecurity
data science
machine learning
datasets
cyber threats modeling
multi-agent systems
Electronics
TK7800-8360
spellingShingle cybersecurity
data science
machine learning
datasets
cyber threats modeling
multi-agent systems
Electronics
TK7800-8360
Jędrzej Bieniasz
Krzysztof Szczypiorski
Dataset Generation for Development of Multi-Node Cyber Threat Detection Systems
description This paper presents a new approach to generate datasets for cyber threat research in a multi-node system. For this purpose, the proof-of-concept of such a system is implemented. The system will be used to collect unique datasets with examples of information hiding techniques. These techniques are not present in publicly available cyber threat detection datasets, while the cyber threats that use them represent an emerging cyber defense challenge worldwide. The network data were collected thanks to the development of a dedicated application that automatically generates random network configurations and runs scenarios of information hiding techniques. The generated datasets were used in the data-driven research workflow for cyber threat detection, including the generation of data representations (network flows), feature selection based on correlations, data augmentation of training datasets, and preparation of machine learning classifiers based on Random Forest and Multilayer Perceptron architectures. The presented results show the usefulness and correctness of the design process to detect information hiding techniques. The challenges and research directions to detect cyber deception methods are discussed in general in the paper.
format article
author Jędrzej Bieniasz
Krzysztof Szczypiorski
author_facet Jędrzej Bieniasz
Krzysztof Szczypiorski
author_sort Jędrzej Bieniasz
title Dataset Generation for Development of Multi-Node Cyber Threat Detection Systems
title_short Dataset Generation for Development of Multi-Node Cyber Threat Detection Systems
title_full Dataset Generation for Development of Multi-Node Cyber Threat Detection Systems
title_fullStr Dataset Generation for Development of Multi-Node Cyber Threat Detection Systems
title_full_unstemmed Dataset Generation for Development of Multi-Node Cyber Threat Detection Systems
title_sort dataset generation for development of multi-node cyber threat detection systems
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/c2e10f9808da4a71a167022b5299b0bb
work_keys_str_mv AT jedrzejbieniasz datasetgenerationfordevelopmentofmultinodecyberthreatdetectionsystems
AT krzysztofszczypiorski datasetgenerationfordevelopmentofmultinodecyberthreatdetectionsystems
_version_ 1718434114726526976