Detection of Hidden Communities in Twitter Discussions of Varying Volumes

The community-based structure of communication on social networking sites has long been a focus of scholarly attention. However, discovering and describing hidden communities, including defining the proper level of user aggregation, remains an unresolved problem. Stu...

Bibliographic Details
Main Authors: Ivan Blekanov, Svetlana S. Bodrunova, Askar Akhmetov
Format: article
Language: EN
Published: MDPI AG 2021
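Subjects: social networks; user discussions; user web-graph; clustering; hidden community detection; Infomap; Information technology; T58.5-58.64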
Online Access: https://doaj.org/article/026f432622f8412f8e6357aaf6db0b61
id oai:doaj.org-article:026f432622f8412f8e6357aaf6db0b61
record_format dspace
spelling oai:doaj.org-article:026f432622f8412f8e6357aaf6db0b61
2021-11-25T17:40:04Z
Detection of Hidden Communities in Twitter Discussions of Varying Volumes
10.3390/fi13110295
1999-5903
https://doaj.org/article/026f432622f8412f8e6357aaf6db0b61
2021-11-01T00:00:00Z
https://www.mdpi.com/1999-5903/13/11/295
https://doaj.org/toc/1999-5903
Ivan Blekanov
Svetlana S. Bodrunova
Askar Akhmetov
MDPI AG
article
social networks
user discussions
user web-graph
clustering
hidden community detection
Infomap
Information technology
T58.5-58.64
EN
Future Internet, Vol 13, Iss 295, p 295 (2021)
institution DOAJ
collection DOAJ
language EN
topic social networks
user discussions
user web-graph
clustering
hidden community detection
Infomap
Information technology
T58.5-58.64
description The community-based structure of communication on social networking sites has long been a focus of scholarly attention. However, discovering and describing hidden communities, including defining the proper level of user aggregation, remains an unresolved problem. Studies of online communities have clear social implications, as they allow for the assessment of preference-based user grouping and the detection of socially hazardous groups. The aim of this study is to comparatively assess the algorithms that effectively analyze large user networks and extract hidden user communities from them. The results we have obtained show the most suitable algorithms for Twitter datasets of different volumes (tens of thousands, hundreds of thousands, and millions of tweets). We show that the Infomap and Leiden algorithms provide the best results overall, and we advise testing a combination of these algorithms for detecting discursive communities based on user traits or views. We also show that the generalized K-means algorithm does not apply to big datasets, while a range of other algorithms tend to prioritize the detection of just one big community instead of the many communities that would better mirror reality. For isolating overlapping communities, the GANXiS algorithm should be used, while OSLOM is not advised.
format article
author Ivan Blekanov
Svetlana S. Bodrunova
Askar Akhmetov
title Detection of Hidden Communities in Twitter Discussions of Varying Volumes
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/026f432622f8412f8e6357aaf6db0b61
work_keys_str_mv AT ivanblekanov detectionofhiddencommunitiesintwitterdiscussionsofvaryingvolumes
AT svetlanasbodrunova detectionofhiddencommunitiesintwitterdiscussionsofvaryingvolumes
AT askarakhmetov detectionofhiddencommunitiesintwitterdiscussionsofvaryingvolumes
_version_ 1718412084793835520