Grammatical categories determination for Turkish and Kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser

This research is aimed at identifying the parts of speech for the Kazakh and Turkish languages in an information retrieval system. The proposed algorithms are based on machine learning techniques. In this paper, we consider the binary classification of words according to parts of speech. We decided...

Bibliographic Details
Main authors: Aigerim Yerimbetova, Madina Tussupova, Madina Sambetbayeva, Mussa Turdalyuly, Bakzhan Sakenov
Format: article
Language: EN
RU
UK
Published: PC Technology Center 2021
Subjects:
Online access: https://doaj.org/article/56c34b6bc69a421498854528eacad6ff
id oai:doaj.org-article:56c34b6bc69a421498854528eacad6ff
record_format dspace
spelling oai:doaj.org-article:56c34b6bc69a421498854528eacad6ff
2021-11-04T14:06:13Z
Grammatical categories determination for Turkish and Kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser
1729-3774
1729-4061
10.15587/1729-4061.2021.238743
https://doaj.org/article/56c34b6bc69a421498854528eacad6ff
2021-10-01T00:00:00Z
http://journals.uran.ua/eejet/article/view/238743
https://doaj.org/toc/1729-3774
https://doaj.org/toc/1729-4061
This research is aimed at identifying the parts of speech for the Kazakh and Turkish languages in an information retrieval system. The proposed algorithms are based on machine learning techniques. In this paper, we consider the binary classification of words according to parts of speech. We decided to take the most popular machine learning algorithms. In this paper, the following approaches and well-known machine learning algorithms are studied and considered. We defined 7 dictionaries and tagged 135 million words in Kazakh and 9 dictionaries and 50 million words in the Turkish language. The main problem considered in the paper is to create algorithms for the execution of dictionaries of the so-called Link Grammar Parser (LGP) system, in particular for the Kazakh and Turkish languages, using machine learning techniques. The focus of the research is on the review and comparison of machine learning algorithms and methods that have accomplished results on various natural language processing tasks such as grammatical categories determination. For the operation of the LGP system, a dictionary is created in which a connector for each word is indicated – the type of connection that can be created using this word. The authors considered methods of filling in LGP dictionaries using machine learning.
The complexities of natural language processing, however, do not exclude the possibility of identifying narrower tasks that can already be solved algorithmically: for example, determining parts of speech or splitting texts into logical groups. However, some features of natural languages significantly reduce the effectiveness of these solutions. Thus, taking into account all word forms for each word in the Kazakh and Turkish languages increases the complexity of text processing by an order of magnitude
Aigerim Yerimbetova
Madina Tussupova
Madina Sambetbayeva
Mussa Turdalyuly
Bakzhan Sakenov
PC Technology Center
article
natural language processing
part-of-speech
machine learning algorithms
agglutinative language
word2vec
Technology (General)
T1-995
Industry
HD2321-4730.9
EN
RU
UK
Eastern-European Journal of Enterprise Technologies, Vol 5, Iss 2 (113), Pp 55-65 (2021)
institution DOAJ
collection DOAJ
language EN
RU
UK
topic natural language processing
part-of-speech
machine learning algorithms
agglutinative language
word2vec
Technology (General)
T1-995
Industry
HD2321-4730.9
spellingShingle natural language processing
part-of-speech
machine learning algorithms
agglutinative language
word2vec
Technology (General)
T1-995
Industry
HD2321-4730.9
Aigerim Yerimbetova
Madina Tussupova
Madina Sambetbayeva
Mussa Turdalyuly
Bakzhan Sakenov
Grammatical categories determination for Turkish and Kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser
description This research is aimed at identifying the parts of speech for the Kazakh and Turkish languages in an information retrieval system. The proposed algorithms are based on machine learning techniques. In this paper, we consider the binary classification of words according to parts of speech. We decided to take the most popular machine learning algorithms. In this paper, the following approaches and well-known machine learning algorithms are studied and considered. We defined 7 dictionaries and tagged 135 million words in Kazakh and 9 dictionaries and 50 million words in the Turkish language. The main problem considered in the paper is to create algorithms for the execution of dictionaries of the so-called Link Grammar Parser (LGP) system, in particular for the Kazakh and Turkish languages, using machine learning techniques. The focus of the research is on the review and comparison of machine learning algorithms and methods that have accomplished results on various natural language processing tasks such as grammatical categories determination. For the operation of the LGP system, a dictionary is created in which a connector for each word is indicated – the type of connection that can be created using this word. The authors considered methods of filling in LGP dictionaries using machine learning. The complexities of natural language processing, however, do not exclude the possibility of identifying narrower tasks that can already be solved algorithmically: for example, determining parts of speech or splitting texts into logical groups. However, some features of natural languages significantly reduce the effectiveness of these solutions. Thus, taking into account all word forms for each word in the Kazakh and Turkish languages increases the complexity of text processing by an order of magnitude.
format article
author Aigerim Yerimbetova
Madina Tussupova
Madina Sambetbayeva
Mussa Turdalyuly
Bakzhan Sakenov
author_facet Aigerim Yerimbetova
Madina Tussupova
Madina Sambetbayeva
Mussa Turdalyuly
Bakzhan Sakenov
author_sort Aigerim Yerimbetova
title Grammatical categories determination for Turkish and Kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser
title_short Grammatical categories determination for Turkish and Kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser
title_full Grammatical categories determination for Turkish and Kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser
title_fullStr Grammatical categories determination for Turkish and Kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser
title_full_unstemmed Grammatical categories determination for Turkish and Kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser
title_sort grammatical categories determination for turkish and kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser
publisher PC Technology Center
publishDate 2021
url https://doaj.org/article/56c34b6bc69a421498854528eacad6ff
work_keys_str_mv AT aigerimyerimbetova grammaticalcategoriesdeterminationforturkishandkazakhlanguagesbasedonmachinelearningalgorithmsandfulfillingdictionariesoflinkgrammarparser
AT madinatussupova grammaticalcategoriesdeterminationforturkishandkazakhlanguagesbasedonmachinelearningalgorithmsandfulfillingdictionariesoflinkgrammarparser
AT madinasambetbayeva grammaticalcategoriesdeterminationforturkishandkazakhlanguagesbasedonmachinelearningalgorithmsandfulfillingdictionariesoflinkgrammarparser
AT mussaturdalyuly grammaticalcategoriesdeterminationforturkishandkazakhlanguagesbasedonmachinelearningalgorithmsandfulfillingdictionariesoflinkgrammarparser
AT bakzhansakenov grammaticalcategoriesdeterminationforturkishandkazakhlanguagesbasedonmachinelearningalgorithmsandfulfillingdictionariesoflinkgrammarparser
_version_ 1718444839149764608