Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering
Question Answering (QA) is a natural language processing task that enables a machine to understand a given context and answer a given question. English QA research benefits from abundant labeled resources, whereas Thai is a language with low availability of labeled QA corpora. According to previous studies, while English QA models can achieve F1 scores above 90%, our Thai QA baseline reaches only 70%. In this study, we aim to improve the performance of Thai QA models by generating additional question-answer pairs with the Multilingual Text-to-Text Transfer Transformer (mT5), combined with data preprocessing methods for Thai. With this method, more than 100 thousand question-answer pairs can be synthesized from the provided Thai Wikipedia articles. Using the synthesized data, several fine-tuning strategies were investigated to achieve the highest model performance. Furthermore, we show that syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) and word-level F1 for Thai QA corpora. Experiments were conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model outperforms other modern transformer models (RoBERTa and mT5) on both datasets.
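The abstract's point that syllable-level F1 suits Thai better than word-level F1 or Exact Match can be illustrated with the standard SQuAD-style token-overlap F1, parameterized by the tokenizer. This is a minimal sketch under assumptions, not the authors' code: the whitespace tokenizer in the example stands in for a real Thai syllable tokenizer (e.g. from a Thai NLP toolkit), which would be passed in its place.

```python
from collections import Counter

def f1_score(prediction, reference, tokenize):
    """SQuAD-style token-overlap F1 between two answer strings.

    The choice of `tokenize` decides the granularity: whitespace words,
    dictionary words, or syllables. For Thai, where word boundaries are
    ambiguous, syllable tokens give fairer partial credit.
    """
    pred_toks = tokenize(prediction)
    ref_toks = tokenize(reference)
    common = Counter(pred_toks) & Counter(ref_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(ref_toks)
    return 2 * precision * recall / (precision + recall)

# Exact Match is all-or-nothing: any extra or missing token scores 0.
def em(prediction, reference):
    return float(prediction.strip() == reference.strip())

# A partially correct span gets graded credit under F1 but 0 under EM.
print(f1_score("Bangkok Thailand", "Bangkok", str.split))  # ≈ 0.667
print(em("Bangkok Thailand", "Bangkok"))                   # 0.0
```

Swapping `str.split` for a syllable tokenizer applies the same formula at syllable granularity, which is the evaluation measure the abstract advocates.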
Saved in: DOAJ
Main authors: Puri Phakmongkol, Peerapon Vateekul
Format: article
Language: EN
Published: MDPI AG, 2021
Subjects: natural language processing; question answering; machine reading comprehension
Online access: https://doaj.org/article/ecf1cd17b35a4ae1b396740540879c07
id |
oai:doaj.org-article:ecf1cd17b35a4ae1b396740540879c07 |
record_format |
dspace |
spelling |
oai:doaj.org-article:ecf1cd17b35a4ae1b396740540879c07 (2021-11-11T15:18:11Z)
Title: Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering
DOI: 10.3390/app112110267
ISSN: 2076-3417
Published: 2021-11-01
URLs: https://doaj.org/article/ecf1cd17b35a4ae1b396740540879c07 ; https://www.mdpi.com/2076-3417/11/21/10267 ; https://doaj.org/toc/2076-3417
Abstract: Question Answering (QA) is a natural language processing task that enables a machine to understand a given context and answer a given question. English QA research benefits from abundant labeled resources, whereas Thai is a language with low availability of labeled QA corpora. According to previous studies, while English QA models can achieve F1 scores above 90%, our Thai QA baseline reaches only 70%. In this study, we aim to improve the performance of Thai QA models by generating additional question-answer pairs with the Multilingual Text-to-Text Transfer Transformer (mT5), combined with data preprocessing methods for Thai. With this method, more than 100 thousand question-answer pairs can be synthesized from the provided Thai Wikipedia articles. Using the synthesized data, several fine-tuning strategies were investigated to achieve the highest model performance. Furthermore, we show that syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) and word-level F1 for Thai QA corpora. Experiments were conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model outperforms other modern transformer models (RoBERTa and mT5) on both datasets.
Authors: Puri Phakmongkol; Peerapon Vateekul
Publisher: MDPI AG
Type: article
Subjects: natural language processing; question answering; machine reading comprehension; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999)
Language: EN
Citation: Applied Sciences, Vol 11, Iss 21, p 10267 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
natural language processing; question answering; machine reading comprehension; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999) |
spellingShingle |
natural language processing; question answering; machine reading comprehension; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999); Puri Phakmongkol; Peerapon Vateekul; Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering |
description |
Question Answering (QA) is a natural language processing task that enables a machine to understand a given context and answer a given question. English QA research benefits from abundant labeled resources, whereas Thai is a language with low availability of labeled QA corpora. According to previous studies, while English QA models can achieve F1 scores above 90%, our Thai QA baseline reaches only 70%. In this study, we aim to improve the performance of Thai QA models by generating additional question-answer pairs with the Multilingual Text-to-Text Transfer Transformer (mT5), combined with data preprocessing methods for Thai. With this method, more than 100 thousand question-answer pairs can be synthesized from the provided Thai Wikipedia articles. Using the synthesized data, several fine-tuning strategies were investigated to achieve the highest model performance. Furthermore, we show that syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) and word-level F1 for Thai QA corpora. Experiments were conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model outperforms other modern transformer models (RoBERTa and mT5) on both datasets. |
format |
article |
author |
Puri Phakmongkol; Peerapon Vateekul |
author_facet |
Puri Phakmongkol; Peerapon Vateekul |
author_sort |
Puri Phakmongkol |
title |
Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering |
title_short |
Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering |
title_full |
Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering |
title_fullStr |
Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering |
title_full_unstemmed |
Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering |
title_sort |
enhance text-to-text transfer transformer with generated questions for thai question answering |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/ecf1cd17b35a4ae1b396740540879c07 |
work_keys_str_mv |
AT puriphakmongkol enhancetexttotexttransfertransformerwithgeneratedquestionsforthaiquestionanswering AT peeraponvateekul enhancetexttotexttransfertransformerwithgeneratedquestionsforthaiquestionanswering |
_version_ |
1718435596610830337 |