A Context-Aware Neural Embedding for Function-Level Vulnerability Detection

Exploitable vulnerabilities in software systems are major security concerns. To date, machine learning (ML) based solutions have been proposed to automate and accelerate the detection of vulnerabilities. Most ML techniques aim to isolate a unit of source code, be it a line or a function, as being vu...

Bibliographic Details
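Main authors: Hongwei Wei, Guanjun Lin, Lin Li, Heming Jia
Format: article
Language: EN
Published: MDPI AG 2021
Subjects: code neural embedding; contextual learning; vulnerability detection; Industrial engineering. Management engineering; Electronic computers. Computer science
Online access: https://doaj.org/article/27cd83465d714f0ab5bfa399d1c9a74a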
id oai:doaj.org-article:27cd83465d714f0ab5bfa399d1c9a74a
record_format dspace
spelling oai:doaj.org-article:27cd83465d714f0ab5bfa399d1c9a74a
2021-11-25T16:13:20Z
A Context-Aware Neural Embedding for Function-Level Vulnerability Detection
10.3390/a14110335
1999-4893
https://doaj.org/article/27cd83465d714f0ab5bfa399d1c9a74a
2021-11-01T00:00:00Z
https://www.mdpi.com/1999-4893/14/11/335
https://doaj.org/toc/1999-4893
MDPI AG
article
EN
Algorithms, Vol 14, Iss 335, p 335 (2021)
institution DOAJ
collection DOAJ
language EN
topic code neural embedding
contextual learning
vulnerability detection
Industrial engineering. Management engineering
T55.4-60.8
Electronic computers. Computer science
QA75.5-76.95
description Exploitable vulnerabilities in software systems are major security concerns. To date, machine learning (ML)-based solutions have been proposed to automate and accelerate the detection of vulnerabilities. Most ML techniques aim to isolate a unit of source code, be it a line or a function, as being vulnerable. We argue that a code segment is vulnerable if it exists in certain semantic contexts, such as the control flow and data flow; therefore, it is important for the detection to be context aware. In this paper, we evaluate the performance of mainstream word embedding techniques in the scenario of software vulnerability detection. Based on the evaluation, we propose a supervised framework leveraging pre-trained context-aware embeddings from language models (ELMo) to capture deep contextual representations, which are further summarized by a bidirectional long short-term memory (Bi-LSTM) layer for learning long-range code dependencies. The framework directly takes a source code function as input and produces corresponding function embeddings, which can be treated as feature sets for conventional ML classifiers. Experimental results showed that the proposed framework yielded the best performance in its downstream detection tasks. Using the feature representations generated by our framework, random forest and support vector machine classifiers outperformed four baseline systems on our data sets, demonstrating that the framework incorporating ELMo can effectively capture vulnerable data flow patterns and facilitate the vulnerability detection task.
format article
author Hongwei Wei
Guanjun Lin
Lin Li
Heming Jia
author_sort Hongwei Wei
title A Context-Aware Neural Embedding for Function-Level Vulnerability Detection
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/27cd83465d714f0ab5bfa399d1c9a74a