A Survey of Part of Speech Tagging of Latin and non-Latin Script Languages: A more vivid view on Persian

This research gives a general overview of part-of-speech (POS) tagging for Latin-script languages, with a specific focus on non-Latin-script languages, especially Persian. The study reviews progress in POS tagging among the 23 languages with the most native speakers in the world. Some of these languages...

Bibliographic Details
Main authors: Meisam Moghadam, Niloufar Jafarpour
Format: article
Language: AR, EN, FA, RU
Published: Language Art 2021
Subjects: P
Online access: https://doaj.org/article/fc666b183be242808dec24dc558b3932
id oai:doaj.org-article:fc666b183be242808dec24dc558b3932
record_format dspace
spelling oai:doaj.org-article:fc666b183be242808dec24dc558b3932
2021-11-26T21:24:37Z
A Survey of Part of Speech Tagging of Latin and non-Latin Script Languages: A more vivid view on Persian
ISSN 2476-6526
ISSN 2538-2713
DOI 10.22046/LA.2021.05
https://doaj.org/article/fc666b183be242808dec24dc558b3932
2021-02-01T00:00:00Z
https://www.languageart.ir/index.php/LA/article/view/180
https://doaj.org/toc/2476-6526
https://doaj.org/toc/2538-2713
Meisam Moghadam
Niloufar Jafarpour
Language Art
article
part of speech tagging, latin script language, non-latin script language, rtl system, persian language
Language and Literature
P
Language. Linguistic theory. Comparative grammar
P101-410
AR
EN
FA
RU
Hunar-i zabān, Vol 6, Iss 1, Pp 75-90 (2021)
institution DOAJ
collection DOAJ
language AR
EN
FA
RU
topic part of speech tagging, latin script language, non-latin script language, rtl system, persian language
Language and Literature
P
Language. Linguistic theory. Comparative grammar
P101-410
description This research gives a general overview of part-of-speech (POS) tagging for Latin-script languages, with a specific focus on non-Latin-script languages, especially Persian. The study reviews progress in POS tagging among the 23 languages with the most native speakers in the world. Some of these languages, such as Arabic, Urdu, and Persian, use a right-to-left (RTL) writing system and have their own specific issues in POS tagging. The paper also covers the issues and challenges that occur during tokenization and part-of-speech tagging of these languages. The challenges may be shared across languages or specific to one. Persian is chosen as the main interest of this paper, and an attempt is made to critically review recent studies on Persian part-of-speech tagging and to enumerate the specific challenges arising in these studies. After reviewing the bulk of the literature and examining the features, challenges, issues, and POS tagging tools for Persian, it was concluded that the significant challenges in research on Persian were generally at the tokenization level and mostly a result of using the Arabic script and its characteristics.
format article
author Meisam Moghadam
Niloufar Jafarpour
author_facet Meisam Moghadam
Niloufar Jafarpour
author_sort Meisam Moghadam
title A Survey of Part of Speech Tagging of Latin and non-Latin Script Languages: A more vivid view on Persian
title_sort survey of part of speech tagging of latin and non-latin script languages: a more vivid view on persian
publisher Language Art
publishDate 2021
url https://doaj.org/article/fc666b183be242808dec24dc558b3932