Efficient Estimate of Low-Frequency Words’ Embeddings Based on the Dictionary: A Case Study on Chinese
Obtaining high-quality embeddings of out-of-vocabulary words (OOVs) and low-frequency words is a challenge in natural language processing (NLP). To estimate these embeddings efficiently, we propose a new method that uses the dictionary to estimate the embeddings of OOVs and...
Main authors: | Xianwen Liao, Yongzhong Huang, Changfu Wei, Chenhao Zhang, Yongqing Deng, Ke Yi |
---|---|
Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | natural language processing; word embedding; BERT; dictionary |
Online access: | https://doaj.org/article/e07fd774836c4f9d9dc1720ba49603f4 |
id |
oai:doaj.org-article:e07fd774836c4f9d9dc1720ba49603f4 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:e07fd774836c4f9d9dc1720ba49603f4; 2021-11-25T16:42:58Z; Efficient Estimate of Low-Frequency Words’ Embeddings Based on the Dictionary: A Case Study on Chinese; 10.3390/app112211018; 2076-3417; https://doaj.org/article/e07fd774836c4f9d9dc1720ba49603f4; 2021-11-01T00:00:00Z; https://www.mdpi.com/2076-3417/11/22/11018; https://doaj.org/toc/2076-3417; Xianwen Liao; Yongzhong Huang; Changfu Wei; Chenhao Zhang; Yongqing Deng; Ke Yi; MDPI AG; article; natural language processing; word embedding; BERT; dictionary; Technology; T; Engineering (General). Civil engineering (General); TA1-2040; Biology (General); QH301-705.5; Physics; QC1-999; Chemistry; QD1-999; EN; Applied Sciences, Vol 11, Iss 11018, p 11018 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
natural language processing; word embedding; BERT; dictionary; Technology; T; Engineering (General). Civil engineering (General); TA1-2040; Biology (General); QH301-705.5; Physics; QC1-999; Chemistry; QD1-999 |
description |
Obtaining high-quality embeddings of out-of-vocabulary words (OOVs) and low-frequency words is a challenge in natural language processing (NLP). To estimate these embeddings efficiently, we propose a new method that uses the dictionary. More specifically, the explanatory note of a dictionary entry accurately describes the semantics of the corresponding word. We therefore adopt a sentence representation model to extract the semantics of the explanatory note and regard the result as the embedding of the corresponding word. We design a new sentence representation model that encodes the explanatory notes of entries more efficiently. Based on the assumption that higher-quality word embeddings lead to better performance, we design an extrinsic experiment to evaluate the quality of low-frequency words’ embeddings. The experimental results show that the embeddings of low-frequency words estimated by our proposed method have higher quality. In addition, both intrinsic and extrinsic experiments show that our proposed sentence representation model represents the semantics of sentences well. |
format |
article |
author |
Xianwen Liao, Yongzhong Huang, Changfu Wei, Chenhao Zhang, Yongqing Deng, Ke Yi |
author_sort |
Xianwen Liao |
title |
Efficient Estimate of Low-Frequency Words’ Embeddings Based on the Dictionary: A Case Study on Chinese |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/e07fd774836c4f9d9dc1720ba49603f4 |
_version_ |
1718413049046499328 |
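The description field above outlines the core idea: encode the explanatory note of a dictionary entry with a sentence representation model and use the result as the embedding of the entry's word. The sketch below illustrates only that pipeline, under loud assumptions. The encoder here is a simple average of known word vectors, a stand-in chosen for brevity and not the sentence representation model proposed in the article. The vectors, the English dictionary entry, and all function names are hypothetical.

```python
from math import sqrt

# Toy embeddings for frequent words (hypothetical values, illustration only).
KNOWN = {
    "small": [1.0, 0.0, 0.0],
    "domestic": [0.0, 1.0, 0.0],
    "feline": [0.0, 0.0, 1.0],
}

# Toy dictionary: headword -> explanatory note (hypothetical entry).
DICTIONARY = {"moggy": "a small domestic feline"}


def encode_sentence(tokens, table):
    """Stand-in sentence encoder: mean of the vectors of known tokens.

    Assumption: the article uses a learned sentence representation model;
    averaging is used here only to keep the sketch self-contained.
    """
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        raise ValueError("no known tokens in the explanatory note")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def estimate_embedding(word, dictionary, table):
    """Estimate a word's embedding from its dictionary explanatory note."""
    note = dictionary[word]
    return encode_sentence(note.lower().split(), table)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# "moggy" has no vector of its own; estimate one from its explanatory note.
moggy = estimate_embedding("moggy", DICTIONARY, KNOWN)
similarity = cosine(moggy, KNOWN["feline"])
```

In this toy setting the estimated vector sits between the vectors of the words in the explanatory note, so it is comparable to existing embeddings with the usual cosine similarity. A real system would presumably replace `encode_sentence` with a trained model and whitespace splitting with a proper Chinese word segmenter.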