A comparative study of keyword extraction algorithms for English texts

This study analyzed keyword extraction from English text. First, two commonly used algorithms, the term frequency–inverse document frequency (TF–IDF) algorithm and the keyphrase extraction algorithm (KEA), were introduced. Then, an improved TF–IDF algorithm was designed, which improved the...
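
For orientation, the baseline TF–IDF ranking named in the abstract can be sketched as follows. This is a minimal textbook formulation (relative term frequency multiplied by log(N / df)), not the improved variant from the article, whose word-frequency and position-weight details are not given in this record. The function names `tokenize` and `tfidf_keywords` and the three toy documents are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, doc_index, top_k=5):
    """Rank the words of docs[doc_index] by TF-IDF and return the top_k."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    tokens = tokenized[doc_index]
    if not tokens:
        return []
    counts = Counter(tokens)
    scores = {
        # tf = relative frequency in the document; idf = log(N / df).
        word: (count / len(tokens)) * math.log(n_docs / df[word])
        for word, count in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


# Hypothetical toy corpus, not data from the study.
docs = [
    "keyword extraction selects keyword candidates from the text",
    "the cat sat on the mat",
    "the dog chased the cat",
]
print(tfidf_keywords(docs, 0, top_k=2))
```

A word that occurs in every document, such as "the" in this toy corpus, gets an IDF of zero and therefore drops to the bottom of the ranking.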

Bibliographic Details
Main author: Li Jinye
Format: article
Language: EN
Published: De Gruyter 2021
Subjects:
kea
Q
Online access: https://doaj.org/article/b4bc37e00cf5454ebd2972800844d3fc
id oai:doaj.org-article:b4bc37e00cf5454ebd2972800844d3fc
record_format dspace
spelling oai:doaj.org-article:b4bc37e00cf5454ebd2972800844d3fc
2021-12-05T14:10:51Z
A comparative study of keyword extraction algorithms for English texts
2191-026X
10.1515/jisys-2021-0040
https://doaj.org/article/b4bc37e00cf5454ebd2972800844d3fc
2021-07-01T00:00:00Z
https://doi.org/10.1515/jisys-2021-0040
https://doaj.org/toc/2191-026X
Li Jinye
De Gruyter
article
english text
keyword extraction
tf–idf algorithm
kea
Science
Q
Electronic computers. Computer science
QA75.5-76.95
EN
Journal of Intelligent Systems, Vol 30, Iss 1, Pp 808-815 (2021)
institution DOAJ
collection DOAJ
language EN
topic english text
keyword extraction
tf–idf algorithm
kea
Science
Q
Electronic computers. Computer science
QA75.5-76.95
description This study analyzed keyword extraction from English text. First, two commonly used algorithms, the term frequency–inverse document frequency (TF–IDF) algorithm and the keyphrase extraction algorithm (KEA), were introduced. Then, an improved TF–IDF algorithm was designed, which refined the calculation of word frequency and combined it with a position weight to improve keyword extraction performance. Finally, 100 English texts were selected from the British Academic Written English Corpus for the experiment. The results showed that the improved TF–IDF algorithm had the shortest running time, taking only 4.93 s to process the 100 texts, and that the precision of the algorithms decreased as the number of extracted keywords increased. The comparison with the other two algorithms demonstrated that the improved TF–IDF algorithm performed best, with a precision of 71.2%, a recall of 52.98%, and an F1 score of 60.75% when five keywords were extracted from each article. The experimental results show that the improved TF–IDF algorithm is effective at extracting keywords from English text and can be further promoted and applied in practice.
format article
author Li Jinye
title A comparative study of keyword extraction algorithms for English texts
publisher De Gruyter
publishDate 2021
url https://doaj.org/article/b4bc37e00cf5454ebd2972800844d3fc
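
The precision, recall, and F1 figures quoted in the description field above are related by the usual harmonic-mean formula, and the small sketch below checks that the reported values are mutually consistent. It assumes the standard set-based definitions of precision and recall for keyword extraction; the record does not state how the article matched extracted keywords to reference keywords, and the function names and the example keyword lists are hypothetical.

```python
def precision_recall(extracted, reference):
    """Set-based precision and recall of extracted keywords against a reference list."""
    extracted_set, reference_set = set(extracted), set(reference)
    hits = len(extracted_set & reference_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for five extracted keywords per article.
reported_f1 = f1_score(0.712, 0.5298)
print(round(reported_f1 * 100, 2))  # 60.75

# Hypothetical keyword lists, for illustration only.
p, r = precision_recall(
    ["tf-idf", "kea", "corpus", "text", "weight"],
    ["tf-idf", "kea", "keyword extraction"],
)
print(p, r, f1_score(p, r))
```

With a precision of 71.2% and a recall of 52.98%, the formula gives 60.75%, which matches the F1 score stated in the abstract.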