High-Performance English–Chinese Machine Translation Based on GPU-Enabled Deep Neural Networks with Domain Corpus

The ability to automate machine translation has various applications in international commerce, medicine, travel, education, and text digitization. Due to the different grammar and lack of clear word boundaries in Chinese, it is challenging to conduct translation from word-based languages (e.g., Eng...

Bibliographic Details
Main authors: Lanxin Zhao, Wanrong Gao, Jianbin Fang
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
T
Online access: https://doaj.org/article/7d69fc6909834514a76851fe72bfceb4
id oai:doaj.org-article:7d69fc6909834514a76851fe72bfceb4
record_format dspace
spelling oai:doaj.org-article:7d69fc6909834514a76851fe72bfceb4
2021-11-25T16:40:43Z
10.3390/app112210915
2076-3417
https://doaj.org/article/7d69fc6909834514a76851fe72bfceb4
2021-11-01T00:00:00Z
https://www.mdpi.com/2076-3417/11/22/10915
https://doaj.org/toc/2076-3417
Applied Sciences, Vol 11, Iss 10915, p 10915 (2021)
institution DOAJ
collection DOAJ
language EN
topic neural machine translation
transformer
GPUs
multi-domain corpus
Technology
T
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999
description The ability to automate machine translation has various applications in international commerce, medicine, travel, education, and text digitization. Because Chinese grammar differs from that of word-based languages (e.g., English) and Chinese text lacks clear word boundaries, translating from such languages into Chinese is challenging. This article implements a GPU-enabled deep learning machine translation system based on a domain-specific corpus. The system takes English text as input and uses an encoder-decoder model with an attention mechanism, based on Google’s Transformer, to translate it into Chinese. The model was trained using a simple self-designed entropy loss function and an Adam optimizer on English–Chinese bilingual sentences from the News domain of the UM-Corpus. Training can run in parallel on common laptops, desktops, and servers with one or more GPUs. During training, we track not only the loss over epochs but also the quality of the model’s translations, measured with the BLEU score. We also provide an easy-to-use web interface for managing corpora, training projects, and trained models. The experimental results show a maximum BLEU score of 29.2, which can be improved further by tuning other hyperparameters. GPU-enabled training runs over 15x faster than on a multi-core CPU, which shortens turnaround time. As a case study, we compare the performance of our model with that of Baidu’s translation system; the comparison shows that our model can compete with an industry-level translation system. We argue that our deep-learning-based translation system is particularly suitable for teaching purposes and small/medium-sized enterprises.
format article
author Lanxin Zhao
Wanrong Gao
Jianbin Fang
title High-Performance English–Chinese Machine Translation Based on GPU-Enabled Deep Neural Networks with Domain Corpus
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/7d69fc6909834514a76851fe72bfceb4
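
The description reports translation quality as a BLEU score. As a rough illustration of what that metric measures, the following is a minimal sketch of single-reference, sentence-level BLEU without smoothing. It is not the authors' evaluation code: the function names `ngrams` and `bleu` are hypothetical, and splitting Chinese text into individual characters is an assumption made here to sidestep word segmentation. Published scores such as 29.2 are normally corpus-level values on a 0-100 scale, so the result below would be multiplied by 100 for comparison.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference sentence-level BLEU in [0, 1], without smoothing.

    candidate, reference: lists of tokens (characters here, by assumption).
    Returns 0.0 as soon as any n-gram order has no match.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the modified n-gram precisions.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = list("我喜欢机器翻译")
    candidate = list("我喜欢机器翻")
    print(round(100 * bleu(candidate, reference), 1))
```

An identical candidate and reference score 1.0, while a candidate that is a strict prefix of the reference keeps perfect n-gram precision and is reduced only by the brevity penalty.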