FOCT: Fast Overlapping Clustering for Textual Data

Text clustering is used to extract specific information from textual data and even to categorize text by topic and sentiment. Because textual documents inherently overlap, overlapping clustering algorithms have become a suitable approach for text analysis. However, state-of-the-art algorith...

Bibliographic Details
Main authors:	Atefeh Khazaei, Hamidreza Khaleghzadeh, Mohammad Ghasemzadeh
Format:	article
Language:	EN
Published:	IEEE 2021
Subjects:	FOCT; overlapping clustering; self-organizing feature maps; text mining; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Online access:	https://doaj.org/article/eab1e46741194d2dbf29a5068e5ee23e

id	oai:doaj.org-article:eab1e46741194d2dbf29a5068e5ee23e
record_format	dspace
spelling	oai:doaj.org-article:eab1e46741194d2dbf29a5068e5ee23e; 2021-12-03T00:00:30Z; FOCT: Fast Overlapping Clustering for Textual Data; ISSN 2169-3536; DOI 10.1109/ACCESS.2021.3130094; https://doaj.org/article/eab1e46741194d2dbf29a5068e5ee23e; 2021-01-01T00:00:00Z; https://ieeexplore.ieee.org/document/9624964/; https://doaj.org/toc/2169-3536; IEEE Access, Vol 9, Pp 157670-157680 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	FOCT; overlapping clustering; self-organizing feature maps; text mining; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
spellingShingle	FOCT; overlapping clustering; self-organizing feature maps; text mining; Electrical engineering. Electronics. Nuclear engineering; TK1-9971; Atefeh Khazaei; Hamidreza Khaleghzadeh; Mohammad Ghasemzadeh; FOCT: Fast Overlapping Clustering for Textual Data
description	Text clustering is used to extract specific information from textual data and even to categorize text by topic and sentiment. Because textual documents inherently overlap, overlapping clustering algorithms have become a suitable approach for text analysis. However, state-of-the-art algorithms are not fast enough to analyse a large volume of textual data within tolerable time limits. In this research, we propose a text clustering algorithm, FOCT, which is a fast overlapping extension of SOM, one of the best algorithms for clustering textual data. We apply heuristics to extract special characteristics present in textual data and establish a very fast overlapping clustering algorithm. We use fast methods to represent documents as vectors, compute the similarity between documents and neurons, and update the neuron weights. In our algorithm, each document can belong to one or more neurons, which is consistent with the inherently overlapping nature of many documents. We compare the efficiency of the proposed algorithm with the k-means, OKM, SOM and OSOM clustering approaches and experimentally demonstrate that it runs 12 to 690 times faster, and that the overlap size of FOCT clusters is closer to the overlap size of the original data. Cluster quality is also measured by four different internal and external evaluation criteria, on which FOCT clusters show up to 64% better quality.
format	article
author	Atefeh Khazaei; Hamidreza Khaleghzadeh; Mohammad Ghasemzadeh
author_facet	Atefeh Khazaei; Hamidreza Khaleghzadeh; Mohammad Ghasemzadeh
author_sort	Atefeh Khazaei
title	FOCT: Fast Overlapping Clustering for Textual Data
title_short	FOCT: Fast Overlapping Clustering for Textual Data
title_full	FOCT: Fast Overlapping Clustering for Textual Data
title_fullStr	FOCT: Fast Overlapping Clustering for Textual Data
title_full_unstemmed	FOCT: Fast Overlapping Clustering for Textual Data
title_sort	foct: fast overlapping clustering for textual data
publisher	IEEE
publishDate	2021
url	https://doaj.org/article/eab1e46741194d2dbf29a5068e5ee23e
work_keys_str_mv	AT atefehkhazaei foctfastoverlappingclusteringfortextualdata AT hamidrezakhaleghzadeh foctfastoverlappingclusteringfortextualdata AT mohammadghasemzadeh foctfastoverlappingclusteringfortextualdata
_version_	1718374021811142656
