Short Text Classification Using Contextual Analysis

Micro blogging tools provide a real time service for the public to express opinions, to broadcast news and information and offer an opportunity to comment and respond to such output. Word usage in social media is continually evolving. Micro bloggers may use different sets of words to describe a spec...

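Bibliographic Details
Main Authors: Sami Al Sulaimani, Andrew Starkey
Format: article
Language: EN
Published: IEEE 2021
Subjects: Text analysis; event classification; contextual analysis; supervised machine learning; Electrical engineering. Electronics. Nuclear engineering
Online Access: https://doaj.org/article/027ee14735574f97b30170336adaa229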
id oai:doaj.org-article:027ee14735574f97b30170336adaa229
record_format dspace
spelling oai:doaj.org-article:027ee14735574f97b30170336adaa229
2021-11-18T00:02:29Z
Short Text Classification Using Contextual Analysis
2169-3536
10.1109/ACCESS.2021.3125768
https://doaj.org/article/027ee14735574f97b30170336adaa229
2021-01-01T00:00:00Z
https://ieeexplore.ieee.org/document/9605590/
https://doaj.org/toc/2169-3536
Sami Al Sulaimani
Andrew Starkey
IEEE
article
Text analysis
event classification
contextual analysis
supervised machine learning
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
EN
IEEE Access, Vol 9, Pp 149619-149629 (2021)
institution DOAJ
collection DOAJ
language EN
topic Text analysis
event classification
contextual analysis
supervised machine learning
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
description Micro blogging tools provide a real time service for the public to express opinions, to broadcast news and information and offer an opportunity to comment and respond to such output. Word usage in social media is continually evolving. Micro bloggers may use different sets of words to describe a specific event, and they may use new words (i.e., words that exist neither in the training dataset nor in informal or formal dictionaries) or use existing words in new contexts. Dynamically capturing new words and their potential meaning from their context can help to reflect the relationships between words in social media, which can then be useful for solving various problems, such as the event classification task. Different approaches have been proposed in this regard; one of them is Contextual Analysis. This paper examines the potential of this approach for grouping short texts (tweets) that discuss the same event into the same category. A new transparent method for multi-class text categorization is presented. It uses the Contextual Analysis approach to capture the most important words in the context of an event and to detect the usage of similar words in different contexts. To test its efficacy in these areas, this study evaluates the performance of the proposed method and other well-known methods, such as Naïve Bayes, Support Vector Machines, K-Nearest Neighbors and Convolutional Neural Networks. On average, the experimental results show that the proposed multi-class classification method can effectively categorize tweets into various event groups, with a high f1-measure score of f1>97.09% and f1>95.27% in the imbalanced-classes and high-number-of-classes experiments, respectively. However, similar to the baseline methods, its performance is negatively influenced by the imbalanced dataset. The Convolutional Neural Networks method produces the best performance among the other algorithms, with f1>97.74% in all experiments, which is 1.73% and 2.72% higher than the lowest performance of Naïve Bayes and K-Nearest Neighbors, respectively, but it does not meet the requirement of transparency of results.
format article
author Sami Al Sulaimani
Andrew Starkey
title Short Text Classification Using Contextual Analysis
publisher IEEE
publishDate 2021
url https://doaj.org/article/027ee14735574f97b30170336adaa229