Malware detection based on semi-supervised learning with malware visualization

Traditional signature-based detection requires detailed manual analysis to extract the signatures of malicious samples and extensive manual labeling to maintain the signature library, which incurs substantial time and resource costs and makes it difficult to adapt to the rap...

Bibliographic Details
Main authors: Tan Gao, Lan Zhao, Xudong Li, Wen Chen
Format: article
Language: EN
Published: AIMS Press 2021
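Subjects: malicious sample detection; collaborative learning; feature fusion; noise robustness; Biotechnology; Mathematics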
Online access: https://doaj.org/article/0c87bf4d9e3e4872be29094823d4ce70
id oai:doaj.org-article:0c87bf4d9e3e4872be29094823d4ce70
record_format dspace
spelling oai:doaj.org-article:0c87bf4d9e3e4872be29094823d4ce70
2021-11-09T06:00:31Z
Malware detection based on semi-supervised learning with malware visualization
10.3934/mbe.2021300
1551-0018
https://doaj.org/article/0c87bf4d9e3e4872be29094823d4ce70
2021-07-01T00:00:00Z
https://www.aimspress.com/article/doi/10.3934/mbe.2021300?viewType=HTML
https://doaj.org/toc/1551-0018
Tan Gao
Lan Zhao
Xudong Li
Wen Chen
AIMS Press
article
malicious sample detection
collaborative learning
feature fusion
noise robustness
Biotechnology
TP248.13-248.65
Mathematics
QA1-939
EN
Mathematical Biosciences and Engineering, Vol 18, Iss 5, Pp 5995-6011 (2021)
institution DOAJ
collection DOAJ
language EN
topic malicious sample detection
collaborative learning
feature fusion
noise robustness
Biotechnology
TP248.13-248.65
Mathematics
QA1-939
description Traditional signature-based detection requires detailed manual analysis to extract the signatures of malicious samples and extensive manual labeling to maintain the signature library. This incurs substantial time and resource costs and makes it difficult to keep pace with the rapid generation and mutation of malware. Methods based on traditional machine learning often require considerable time and resources for sample labeling, so a large stock of unlabeled samples exists but cannot be used directly. To address these issues, this paper proposes a malware classification framework based on malware visualization and semi-supervised learning. The framework has three main parts: malware visualization, feature extraction, and the classification algorithm. First, binary files are converted directly into images, without disassembly, decompression, or decryption. Next, global and local features of the grayscale image are extracted, and the extracted features are fused by a dedicated feature fusion method to eliminate the exclusion between different feature variables. Finally, an improved collaborative learning algorithm continuously trains and optimizes the classifier by incorporating features from inexpensive unlabeled samples. The framework was evaluated on two widely studied benchmark datasets, Malimg and Microsoft. The results show that, compared with traditional machine learning algorithms, the improved collaborative learning algorithm not only reduces the cost of sample labeling but also continues to improve model performance as unlabeled samples are added, thereby achieving higher classification accuracy.
format article
author Tan Gao
Lan Zhao
Xudong Li
Wen Chen
author_facet Tan Gao
Lan Zhao
Xudong Li
Wen Chen
author_sort Tan Gao
title Malware detection based on semi-supervised learning with malware visualization
title_short Malware detection based on semi-supervised learning with malware visualization
title_full Malware detection based on semi-supervised learning with malware visualization
title_fullStr Malware detection based on semi-supervised learning with malware visualization
title_full_unstemmed Malware detection based on semi-supervised learning with malware visualization
title_sort malware detection based on semi-supervised learning with malware visualization
publisher AIMS Press
publishDate 2021
url https://doaj.org/article/0c87bf4d9e3e4872be29094823d4ce70
work_keys_str_mv AT tangao malwaredetectionbasedonsemisupervisedlearningwithmalwarevisualization
AT lanzhao malwaredetectionbasedonsemisupervisedlearningwithmalwarevisualization
AT xudongli malwaredetectionbasedonsemisupervisedlearningwithmalwarevisualization
AT wenchen malwaredetectionbasedonsemisupervisedlearningwithmalwarevisualization
_version_ 1718441266875727872
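
The visualization step named in the description (processing a binary file directly as a gray image) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state how the image width is chosen or how a partial last row is handled, so the fixed `width=256` and the zero padding below are assumptions, and the function names `bytes_to_gray_rows` and `to_pgm` are hypothetical.

```python
import math
from typing import List


def bytes_to_gray_rows(data: bytes, width: int = 256) -> List[List[int]]:
    """Treat each byte as one grayscale pixel (0-255) and wrap into rows.

    The fixed width and the zero padding of the last row are assumptions
    made for illustration; the record does not specify either choice.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Pad with zero bytes so the final row is complete.
    padded = data.ljust(height * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_pgm(rows: List[List[int]]) -> bytes:
    """Serialize the pixel rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in rows)
```

Reading a sample with `open(path, "rb").read()` and passing the bytes to `bytes_to_gray_rows` yields a pixel grid that `to_pgm` can write out for inspection. No disassembly or unpacking is involved, which matches the property the description highlights.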