Machine Translation in Low-Resource Languages by an Adversarial Neural Network

Existing Sequence-to-Sequence (Seq2Seq) Neural Machine Translation (NMT) shows strong capability with High-Resource Languages (HRLs). However, this approach poses serious challenges when processing Low-Resource Languages (LRLs), because model expressiveness is limited by the scale of parallel sentence pairs available for training. This study uses adversarial and transfer learning techniques to mitigate the shortage of sentence pairs in LRL corpora. We propose a new Low-resource, Adversarial, Cross-lingual (LAC) model for NMT. On the adversarial side, the LAC model consists of a generator and a discriminator: the generator is a Seq2Seq model that produces translations from the source to the target language, while the discriminator measures the gap between machine and human translations. In addition, we apply transfer learning to the LAC model to help it capture features from scarce resources, since some languages share the same subject-verb-object grammatical structure. Rather than reusing the entire pretrained LAC model, we transfer the pretrained generator and discriminator separately; the pretrained discriminator exhibited better performance in all experiments. Experimental results demonstrate that the LAC model achieves higher Bilingual Evaluation Understudy (BLEU) scores and shows good potential for augmenting LRL translation.
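The generator/discriminator setup described in the abstract can be illustrated with a toy sketch. This is not the authors' LAC implementation: a bag-of-tokens logistic-regression discriminator stands in for the paper's discriminator, and an untrained random sampler stands in for the Seq2Seq generator. Only the adversarial labeling scheme (human pairs as positives, machine output as negatives) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 8
# Toy "parallel corpus": (source token ids, human-translated target ids).
human_pairs = [(np.array([1, 2, 3]), np.array([3, 2, 1])),
               (np.array([4, 5]),    np.array([5, 4])),
               (np.array([2, 6, 7]), np.array([7, 6, 2]))]

def bag(ids):
    """Bag-of-tokens vector for a token-id sequence."""
    v = np.zeros(VOCAB)
    for i in ids:
        v[i] += 1.0
    return v

def features(src, tgt):
    """Discriminator input: concatenated source and target bags."""
    return np.concatenate([bag(src), bag(tgt)])

def generator(src):
    """Stand-in for the Seq2Seq generator: emits random target tokens."""
    return rng.integers(0, VOCAB, size=len(src))

w = np.zeros(2 * VOCAB)   # discriminator weights
b = 0.0                   # discriminator bias
lr = 0.5

def score(src, tgt):
    """Discriminator output: estimated P(pair is a human translation)."""
    z = np.clip(w @ features(src, tgt) + b, -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(-z))

# Adversarial-style discriminator training: human pairs labeled 1,
# generated translations labeled 0.
for step in range(200):
    for src, tgt in human_pairs:
        fake = generator(src)
        for x, label in [(features(src, tgt), 1.0),
                         (features(src, fake), 0.0)]:
            z = np.clip(w @ x + b, -30.0, 30.0)
            p = 1.0 / (1.0 + np.exp(-z))
            g = p - label          # log-loss gradient w.r.t. the logit
            w -= lr * g * x
            b -= lr * g

# 1 - score(src, fake) is the "gap" signal a full adversarial setup
# would train the generator to shrink.
avg_human = np.mean([score(s, t) for s, t in human_pairs])
avg_fake = np.mean([score(s, generator(s))
                    for s, _ in human_pairs for _ in range(20)])
print(avg_human, avg_fake)
```

After training, the discriminator assigns higher scores to the human pairs than to the generator's output, which is the signal the paper's transfer-learning experiments reuse when the pretrained discriminator is carried over to a new language pair.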

Bibliographic Details
Main Authors: Mengtao Sun, Hao Wang, Mark Pasquine, Ibrahim A. Hameed
Format: article
Language: EN
Published: MDPI AG, 2021
Subjects: machine learning, adversarial machine learning, imbalanced datasets, transfer learning, Technology (T)
Online Access: https://doaj.org/article/c042e7e12b1748c3830acce4f976d5e0
id oai:doaj.org-article:c042e7e12b1748c3830acce4f976d5e0
record_format dspace
spelling oai:doaj.org-article:c042e7e12b1748c3830acce4f976d5e0 (indexed 2021-11-25T16:39:30Z)
DOI: 10.3390/app112210860
ISSN: 2076-3417
Published online: 2021-11-01
Article URL: https://www.mdpi.com/2076-3417/11/22/10860
Journal TOC: https://doaj.org/toc/2076-3417
Source: Applied Sciences, Vol 11, Iss 10860, p 10860 (2021)
institution DOAJ
collection DOAJ
language EN
topic machine learning
adversarial machine learning
imbalanced datasets
transfer learning
Technology
T
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999
description Existing Sequence-to-Sequence (Seq2Seq) Neural Machine Translation (NMT) shows strong capability with High-Resource Languages (HRLs). However, this approach poses serious challenges when processing Low-Resource Languages (LRLs), because the model expression is limited by the training scale of parallel sentence pairs. This study utilizes adversary and transfer learning techniques to mitigate the lack of sentence pairs in LRL corpora. We propose a new Low resource, Adversarial, Cross-lingual (LAC) model for NMT. In terms of the adversary technique, LAC model consists of a generator and discriminator. The generator is a Seq2Seq model that produces the translations from source to target languages, while the discriminator measures the gap between machine and human translations. In addition, we introduce transfer learning on LAC model to help capture the features in rare resources because some languages share the same subject-verb-object grammatical structure. Rather than using the entire pretrained LAC model, we separately utilize the pretrained generator and discriminator. The pretrained discriminator exhibited better performance in all experiments. Experimental results demonstrate that the LAC model achieves higher Bilingual Evaluation Understudy (BLEU) scores and has good potential to augment LRL translations.
format article
author Mengtao Sun
Hao Wang
Mark Pasquine
Ibrahim A. Hameed
author_sort Mengtao Sun
title Machine Translation in Low-Resource Languages by an Adversarial Neural Network
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/c042e7e12b1748c3830acce4f976d5e0
_version_ 1718413100582961152