Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering

Question Answering (QA) is a natural language processing task that enables a machine to understand a given context and answer a question about it. English QA has been studied extensively and benefits from abundant labeled resources, whereas Thai has low availability of labeled QA corpora. According to previous studies, English QA models can achieve F1 scores above 90%, while our Thai QA baseline reaches only about 70%. In this study, we aim to improve the performance of Thai QA models by generating additional question-answer pairs with the Multilingual Text-to-Text Transfer Transformer (mT5), combined with data preprocessing methods for Thai. This method synthesizes more than 100 thousand question-answer pairs from Thai Wikipedia articles. Using the synthesized data, we investigated several fine-tuning strategies to maximize model performance. Furthermore, we show that syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) and word-level F1 for Thai QA corpora. Experiments were conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model outperforms other modern transformer models, RoBERTa and mT5, on both datasets.

Full description

Saved in:
Bibliographic Details
Main Authors: Puri Phakmongkol, Peerapon Vateekul
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
T
Online Access: https://doaj.org/article/ecf1cd17b35a4ae1b396740540879c07
id oai:doaj.org-article:ecf1cd17b35a4ae1b396740540879c07
record_format dspace
spelling oai:doaj.org-article:ecf1cd17b35a4ae1b396740540879c07
last_indexed 2021-11-11T15:18:11Z
title Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering
doi 10.3390/app112110267
issn 2076-3417
publishDate 2021-11-01
url https://www.mdpi.com/2076-3417/11/21/10267
url https://doaj.org/article/ecf1cd17b35a4ae1b396740540879c07
journal Applied Sciences, Vol 11, Iss 10267, p 10267 (2021)
institution DOAJ
collection DOAJ
language EN
topic natural language processing
question answering
machine reading comprehension
Technology
T
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999
description Question Answering (QA) is a natural language processing task that enables a machine to understand a given context and answer a question about it. English QA has been studied extensively and benefits from abundant labeled resources, whereas Thai has low availability of labeled QA corpora. According to previous studies, English QA models can achieve F1 scores above 90%, while our Thai QA baseline reaches only about 70%. In this study, we aim to improve the performance of Thai QA models by generating additional question-answer pairs with the Multilingual Text-to-Text Transfer Transformer (mT5), combined with data preprocessing methods for Thai. This method synthesizes more than 100 thousand question-answer pairs from Thai Wikipedia articles. Using the synthesized data, we investigated several fine-tuning strategies to maximize model performance. Furthermore, we show that syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) and word-level F1 for Thai QA corpora. Experiments were conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model outperforms other modern transformer models, RoBERTa and mT5, on both datasets.
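The abstract's distinction between Exact Match, word-level F1, and syllable-level F1 comes down to what tokenizer the standard extractive-QA metrics are computed over. The sketch below shows the common SQuAD-style EM and token-level F1 computation; the `tokenize` function is a placeholder assumption (whitespace splitting), not the paper's method — for Thai, a word-level or syllable-level segmenter would be substituted, which is exactly the choice the paper evaluates.

```python
# SQuAD-style Exact Match and token-level F1 for extractive QA.
# The tokenizer below is a simplifying assumption: Thai text has no
# spaces between words, so a real evaluation would plug in a Thai
# word segmenter (word-level F1) or syllable segmenter (syllable-level F1).
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Placeholder: lowercase whitespace tokenization.
    return text.lower().split()

def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the normalized token sequences are identical, else 0.0.
    return float(tokenize(prediction) == tokenize(reference))

def token_f1(prediction: str, reference: str) -> float:
    # Harmonic mean of token-overlap precision and recall.
    pred, ref = tokenize(prediction), tokenize(reference)
    common = Counter(pred) & Counter(ref)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because the same F1 formula is applied to whichever token stream the segmenter produces, a finer-grained (syllable) segmentation gives partial credit where a coarse word segmenter would score an answer as a complete miss due to boundary disagreements.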
format article
author Puri Phakmongkol
Peerapon Vateekul
title Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/ecf1cd17b35a4ae1b396740540879c07