Event classification from the Urdu language text on social media

The real-time availability of the Internet has engaged millions of users around the world. Regional languages are increasingly preferred for effective and easy communication, which produces multilingual data on social networks and news channels. People share ideas, opinions, and events that...

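Bibliographic Details
Main authors: Malik Daler Ali Awan, Nadeem Iqbal Kajla, Amnah Firdous, Mujtaba Husnain, Malik Muhammad Saad Missen
Format: article
Language: EN
Published: PeerJ Inc. 2021
Subjects: Social media; Event classification; Resource poor; Machine learning; Text; Natural language processing; Electronic computers. Computer science; QA75.5-76.95
Online access: https://doaj.org/article/74cdda872e97477aab1220104afa6ea8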
id oai:doaj.org-article:74cdda872e97477aab1220104afa6ea8
record_format dspace
spelling oai:doaj.org-article:74cdda872e97477aab1220104afa6ea8
2021-11-20T15:05:12Z
Event classification from the Urdu language text on social media
10.7717/peerj-cs.775
2376-5992
https://doaj.org/article/74cdda872e97477aab1220104afa6ea8
2021-11-01T00:00:00Z
https://peerj.com/articles/cs-775.pdf
https://peerj.com/articles/cs-775/
https://doaj.org/toc/2376-5992
Malik Daler Ali Awan
Nadeem Iqbal Kajla
Amnah Firdous
Mujtaba Husnain
Malik Muhammad Saad Missen
PeerJ Inc.
article
Social media
Event classification
Resource poor
Machine learning
Text
Natural language processing
Electronic computers. Computer science
QA75.5-76.95
EN
PeerJ Computer Science, Vol 7, p e775 (2021)
institution DOAJ
collection DOAJ
language EN
topic Social media
Event classification
Resource poor
Machine learning
Text
Natural language processing
Electronic computers. Computer science
QA75.5-76.95
spellingShingle Social media
Event classification
Resource poor
Machine learning
Text
Natural language processing
Electronic computers. Computer science
QA75.5-76.95
Malik Daler Ali Awan
Nadeem Iqbal Kajla
Amnah Firdous
Mujtaba Husnain
Malik Muhammad Saad Missen
Event classification from the Urdu language text on social media
description The real-time availability of the Internet has engaged millions of users around the world. Regional languages are increasingly preferred for effective and easy communication, which produces multilingual data on social networks and news channels. People share ideas, opinions, and events happening around the world (e.g., sports, inflation, protests, explosions, and sexual assault) in regional (local) languages on social media. Extracting and classifying events from multilingual data has become a bottleneck because of a lack of resources. This research paper presents an event classification task for Urdu-language text from social media and news channels using machine learning classifiers. The dataset contains more than 100,000 (102,962) labeled instances of twelve (12) different types of events. The title, its length, and the last four words of a sentence are used as features to classify the events. Term Frequency-Inverse Document Frequency (tf-idf) gave the best results as a feature vector when evaluating six popular machine learning classifiers. Random Forest (RF) and K-Nearest Neighbor (KNN) outperformed the other classifiers, achieving 98.00% and 99.00% accuracy, respectively. To the best of the authors' knowledge, the novelty is that these features have not previously been applied to event extraction from text written in the Urdu language.
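The pipeline summarized in the description (title, title length, and last four words as features; tf-idf as the feature vector; KNN as one of the classifiers) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy English headlines stand in for Urdu text, the `last4:` prefix used to mix the last-four-words feature into the tf-idf terms is an assumption, and the names `TRAIN`, `extract_features`, `to_terms`, `fit_idf`, `tfidf`, `cosine`, and `classify` are hypothetical. The title length is computed but, for brevity, not used in the distance.

```python
import math
from collections import Counter

# Hypothetical toy data: English stand-ins for Urdu headlines, labeled with
# three of the event types named in the abstract.
TRAIN = [
    ("team wins the cricket final match", "sports"),
    ("captain scores century in cricket match", "sports"),
    ("prices of flour and sugar rise again", "inflation"),
    ("fuel prices rise as inflation climbs", "inflation"),
    ("workers hold protest outside the assembly", "protest"),
    ("students stage protest against fee hike", "protest"),
]

def extract_features(title):
    """Return the three features named in the abstract for one title."""
    tokens = title.split()
    return {"title": tokens, "length": len(tokens), "last_four": tokens[-4:]}

def to_terms(features):
    """Flatten features into terms; the 'last4:' prefix is an assumption."""
    return features["title"] + ["last4:" + t for t in features["last_four"]]

def fit_idf(term_lists):
    """Smoothed inverse document frequency for every term in the training set."""
    n = len(term_lists)
    df = Counter(t for terms in term_lists for t in set(terms))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(terms, idf):
    """L2-normalized sparse tf-idf vector; terms unseen in training are dropped."""
    counts = Counter(t for t in terms if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def classify(title, train, k=3):
    """Majority vote among the k training titles most similar to `title`."""
    term_lists = [to_terms(extract_features(text)) for text, _ in train]
    idf = fit_idf(term_lists)
    train_vecs = [tfidf(terms, idf) for terms in term_lists]
    query = tfidf(to_terms(extract_features(title)), idf)
    labels = [label for _, label in train]
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    print(classify("cricket team loses the final match", TRAIN))
```

With a real corpus, a library vectorizer and classifier would normally replace these hand-written helpers, and a Random Forest could be swapped in for the KNN vote.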
format article
author Malik Daler Ali Awan
Nadeem Iqbal Kajla
Amnah Firdous
Mujtaba Husnain
Malik Muhammad Saad Missen
author_facet Malik Daler Ali Awan
Nadeem Iqbal Kajla
Amnah Firdous
Mujtaba Husnain
Malik Muhammad Saad Missen
author_sort Malik Daler Ali Awan
title Event classification from the Urdu language text on social media
title_short Event classification from the Urdu language text on social media
title_full Event classification from the Urdu language text on social media
title_fullStr Event classification from the Urdu language text on social media
title_full_unstemmed Event classification from the Urdu language text on social media
title_sort event classification from the urdu language text on social media
publisher PeerJ Inc.
publishDate 2021
url https://doaj.org/article/74cdda872e97477aab1220104afa6ea8
work_keys_str_mv AT malikdaleraliawan eventclassificationfromtheurdulanguagetextonsocialmedia
AT nadeemiqbalkajla eventclassificationfromtheurdulanguagetextonsocialmedia
AT amnahfirdous eventclassificationfromtheurdulanguagetextonsocialmedia
AT mujtabahusnain eventclassificationfromtheurdulanguagetextonsocialmedia
AT malikmuhammadsaadmissen eventclassificationfromtheurdulanguagetextonsocialmedia
_version_ 1718419431658356736