Application Of Machine Learning Methods To Compare Disciplines Content Using Text Data

The paper investigates an approach based on machine learning methods for finding and identifying similar disciplines. The study used two of the most popular machine learning methods for processing text data, BERT and Doc2Vec. Machine learning was conducted using the datasets of variou...

Bibliographic Details
Main authors: Roman Kupriyanov, Dmitry Zvonarev, Ruslan Suleymanov
Format: article
Language: EN
Published: FRUCT 2021
Subjects: text mining, educational data mining, education, text's similarity, Telecommunication, TK5101-6720
Online access: https://doaj.org/article/36e925b6df394c2e8f9ab6483f73ab39
id oai:doaj.org-article:36e925b6df394c2e8f9ab6483f73ab39
record_format dspace
spelling oai:doaj.org-article:36e925b6df394c2e8f9ab6483f73ab39
2021-11-20T15:59:33Z
Application Of Machine Learning Methods To Compare Disciplines Content Using Text Data
2305-7254
2343-0737
10.23919/FRUCT53335.2021.9599988
https://doaj.org/article/36e925b6df394c2e8f9ab6483f73ab39
2021-10-01T00:00:00Z
https://www.fruct.org/publications/fruct30/files/Kup.pdf
https://doaj.org/toc/2305-7254
https://doaj.org/toc/2343-0737
The paper investigates an approach based on machine learning methods for finding and identifying similar disciplines. The study used two of the most popular machine learning methods for processing text data, BERT and Doc2Vec. The models were trained on datasets of various disciplines totaling 2.5 million entries. To assess the quality of the developed models, 30 experts from different scientific fields evaluated the level of similarity between the disciplines identified by the trained models. According to the results, both methods generated similar outputs when trained on identical datasets. Another Doc2Vec model, trained on a relatively small sample of 15,000 entries from the target discipline database that included discipline descriptions and curricula, showed better results, which justifies the need to develop specific solutions for particular tasks. Further development of machine learning methods and model design for specific tasks in the educational field will promote the digitalization of education in the area of university operations management.
Roman Kupriyanov
Dmitry Zvonarev
Ruslan Suleymanov
FRUCT
article
text mining
educational data mining
education
text's similarity
Telecommunication
TK5101-6720
EN
Proceedings of the XXth Conference of Open Innovations Association FRUCT, Vol 30, Iss 1, Pp 115-120 (2021)
institution DOAJ
collection DOAJ
language EN
topic text mining
educational data mining
education
text's similarity
Telecommunication
TK5101-6720
description The paper investigates an approach based on machine learning methods for finding and identifying similar disciplines. The study used two of the most popular machine learning methods for processing text data, BERT and Doc2Vec. The models were trained on datasets of various disciplines totaling 2.5 million entries. To assess the quality of the developed models, 30 experts from different scientific fields evaluated the level of similarity between the disciplines identified by the trained models. According to the results, both methods generated similar outputs when trained on identical datasets. Another Doc2Vec model, trained on a relatively small sample of 15,000 entries from the target discipline database that included discipline descriptions and curricula, showed better results, which justifies the need to develop specific solutions for particular tasks. Further development of machine learning methods and model design for specific tasks in the educational field will promote the digitalization of education in the area of university operations management.
format article
author Roman Kupriyanov
Dmitry Zvonarev
Ruslan Suleymanov
title Application Of Machine Learning Methods To Compare Disciplines Content Using Text Data
publisher FRUCT
publishDate 2021
url https://doaj.org/article/36e925b6df394c2e8f9ab6483f73ab39