Multi-Ideology ISIS/Jihadist White Supremacist (MIWS) Dataset for Multi-Class Extremism Text Classification

Social media platforms are a popular choice for extremist organizations to disseminate their perceptions, beliefs, and ideologies. This information is generally based on selective reporting and is subjective in content. However, the radical presentation of this disinformation and its outreach on soc...

Bibliographic Details
Main Authors:	Mayur Gaikwad, Swati Ahirrao, Shraddha Phansalkar, Ketan Kotecha
Format:	article
Language:	EN
Published:	MDPI AG 2021
Subjects:	artificial intelligence; extremism; disinformation; ideology; propaganda; radicalization; Bibliography. Library science. Information resources; Z
Online Access:	https://doaj.org/article/6ecae841f3cd4e089960dc72af4ae85f

id	oai:doaj.org-article:6ecae841f3cd4e089960dc72af4ae85f
record_format	dspace
spelling	oai:doaj.org-article:6ecae841f3cd4e089960dc72af4ae85f; 2021-11-25T17:19:52Z; Multi-Ideology ISIS/Jihadist White Supremacist (MIWS) Dataset for Multi-Class Extremism Text Classification; 10.3390/data6110117; 2306-5729; https://doaj.org/article/6ecae841f3cd4e089960dc72af4ae85f; 2021-11-01T00:00:00Z; https://www.mdpi.com/2306-5729/6/11/117; https://doaj.org/toc/2306-5729; Mayur Gaikwad; Swati Ahirrao; Shraddha Phansalkar; Ketan Kotecha; MDPI AG; article; artificial intelligence; extremism; disinformation; ideology; propaganda; radicalization; Bibliography. Library science. Information resources; Z; EN; Data, Vol 6, Iss 117, p 117 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	artificial intelligence; extremism; disinformation; ideology; propaganda; radicalization; Bibliography. Library science. Information resources; Z
description	Social media platforms are a popular channel for extremist organizations to disseminate their perceptions, beliefs, and ideologies. This information is generally based on selective reporting and is subjective in content. However, the radical presentation of this disinformation and its reach on social media increase the number of susceptible audiences. Hence, detection of extremist text on social media platforms is a significant area of research. Online extremism research faces three challenges: extremism text datasets are unavailable, little emphasis has been placed on classifying extremism text into propaganda, radicalization, and recruitment classes, and the lack of data validation methods undermines the accuracy of extremism detection. This research addresses these challenges by presenting a multi-ideology, multi-class seed dataset of extremism text: the multi-ideology ISIS/Jihadist White Supremacist (MIWS) dataset, constructed from recent tweets collected from Twitter. The dataset can be used to classify extremist text into common types such as propaganda, radicalization, and recruitment. Additionally, the seed dataset is statistically validated using the coherence score of Latent Dirichlet Allocation (LDA) and word mover’s distance computed with pretrained Google News word vectors. The dataset shows good coherence scores within a topic and appropriate distance measures between topics. This dataset is the first publicly accessible multi-ideology, multi-class extremism text dataset, intended to reinforce research on extremism text detection on social media platforms.
format	article
author	Mayur Gaikwad, Swati Ahirrao, Shraddha Phansalkar, Ketan Kotecha
title	Multi-Ideology ISIS/Jihadist White Supremacist (MIWS) Dataset for Multi-Class Extremism Text Classification
publisher	MDPI AG
publishDate	2021
url	https://doaj.org/article/6ecae841f3cd4e089960dc72af4ae85f
