Improving Text-to-Code Generation with Features of Code Graph on GPT-2

Code generation, an active application area of deep learning models for text, consists of two different fields: code-to-code and text-to-code. A recent approach, GraphCodeBERT, uses a code graph, called the data flow, and showed a good performance improvement. Its base model architecture is...

Full description

Saved in:
Bibliographic Details
Main Authors: Incheon Paik, Jun-Wei Wang
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
AST
Online Access: https://doaj.org/article/0dfd2a59b1ae40249d53cb816a316f1d
id oai:doaj.org-article:0dfd2a59b1ae40249d53cb816a316f1d
record_format dspace
datestamp 2021-11-11T15:42:01Z
doi 10.3390/electronics10212706
issn 2079-9292
date 2021-11-01
fulltext_url https://www.mdpi.com/2079-9292/10/21/2706
journal_toc https://doaj.org/toc/2079-9292
source Electronics, Vol 10, Iss 21, p 2706 (2021)
institution DOAJ
collection DOAJ
language EN
topic code generation
data flow
BERT
AST
GPT-2
Electronics
TK7800-8360
description Code generation, an active application area of deep learning models for text, consists of two different fields: code-to-code and text-to-code. A recent approach, GraphCodeBERT, uses a code graph, called the data flow, and showed a good performance improvement. Its base model architecture is bidirectional encoder representations from transformers (BERT), which uses the encoder part of a transformer. On the other hand, the generative pre-trained transformer (GPT), another multi-layer transformer architecture, uses the decoder part and shows strong performance on generative tasks. In this study, we investigate several variants of code-graph features on GPT-2, referring to the abstract syntax tree (AST) to collect features of the variables in the code. We mainly focus on GPT-2 augmented with code-graph features that allow the model to learn the effect of the data flow. The experiments are divided into two parts: fine-tuning the existing GPT-2 model, and pre-training a new model from scratch on code data. When we pre-train a new model from scratch, the model with the code graph outperforms, given enough data.
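As a rough illustration of the data-flow code graph described above (a minimal sketch, not the authors' pipeline; the function name extract_data_flow and the restriction to simple assignment statements are assumptions made here for illustration), the following Python snippet reads variable-to-variable edges off the abstract syntax tree (AST), where an edge (x, y) means the value of x is computed from y:

```python
import ast

def extract_data_flow(source: str):
    """Collect (target, source) data-flow edges from simple assignments."""
    tree = ast.parse(source)
    edges = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            # Variables read on the right-hand side of the assignment.
            reads = [n.id for n in ast.walk(node.value)
                     if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)]
            # Variables written on the left-hand side.
            writes = [t.id for t in node.targets if isinstance(t, ast.Name)]
            edges.extend((w, r) for w in writes for r in reads)
    return edges

print(extract_data_flow("a = b + c\nd = a * 2\n"))
# [('a', 'b'), ('a', 'c'), ('d', 'a')]
```

In a GraphCodeBERT-style setup, edges like these would be serialized alongside the text and code tokens (for example, as extra variable tokens with a mask over the edges) so the model can learn the effect of the data flow; the exact encoding used in the paper is not reproduced here.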
format article
author Incheon Paik
Jun-Wei Wang
title Improving Text-to-Code Generation with Features of Code Graph on GPT-2
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/0dfd2a59b1ae40249d53cb816a316f1d