Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model
This study aims to provide insights into the COVID-19-related communication on Twitter in the Republic of Croatia. For that purpose, we developed an NL-based framework that enables automatic analysis of a large dataset of tweets in the Croatian language. We collected and analysed 206,196 tweets rela...
Main Authors: | Karlo Babić, Milan Petrović, Slobodan Beliga, Sanda Martinčić-Ipšić, Mihaela Matešić, Ana Meštrović |
---|---|
Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | sentiment analysis; clustering; BERT model; natural language processing; COVID-19; Twitter data |
Online Access: | https://doaj.org/article/bee88f1fe5e1455ca45cbc279d714be0 |
id |
oai:doaj.org-article:bee88f1fe5e1455ca45cbc279d714be0 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:bee88f1fe5e1455ca45cbc279d714be0; 2021-11-11T15:24:16Z; Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model; 10.3390/app112110442; 2076-3417; https://doaj.org/article/bee88f1fe5e1455ca45cbc279d714be0; 2021-11-01T00:00:00Z; https://www.mdpi.com/2076-3417/11/21/10442; https://doaj.org/toc/2076-3417; This study aims to provide insights into the COVID-19-related communication on Twitter in the Republic of Croatia. For that purpose, we developed an NL-based framework that enables automatic analysis of a large dataset of tweets in the Croatian language. We collected and analysed 206,196 tweets related to COVID-19 and constructed a dataset of 10,000 tweets which we manually annotated with a sentiment label. We trained the Cro-CoV-cseBERT language model for the representation and clustering of tweets. Additionally, we compared the performance of four machine learning algorithms on the task of sentiment classification. After identifying the best performing setup of NLP methods, we applied the proposed framework in the task of characterisation of COVID-19 tweets in Croatia. More precisely, we performed sentiment analysis and tracked the sentiment over time. Furthermore, we detected how tweets are grouped into clusters with similar themes across three pandemic waves. Additionally, we characterised the tweets by analysing the distribution of sentiment polarity (in each thematic cluster and over time) and the number of retweets (in each thematic cluster and sentiment class). These results could be useful for additional research and interpretation in the domains of sociology, psychology or other sciences, as well as for the authorities, who could use them to address crisis communication problems.; Karlo Babić; Milan Petrović; Slobodan Beliga; Sanda Martinčić-Ipšić; Mihaela Matešić; Ana Meštrović; MDPI AG; article; sentiment analysis; clustering; BERT model; natural language processing; COVID-19; Twitter data; Technology; T; Engineering (General). Civil engineering (General); TA1-2040; Biology (General); QH301-705.5; Physics; QC1-999; Chemistry; QD1-999; EN; Applied Sciences, Vol 11, Iss 10442, p 10442 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
sentiment analysis; clustering; BERT model; natural language processing; COVID-19; Twitter data; Technology; T; Engineering (General). Civil engineering (General); TA1-2040; Biology (General); QH301-705.5; Physics; QC1-999; Chemistry; QD1-999 |
spellingShingle |
sentiment analysis; clustering; BERT model; natural language processing; COVID-19; Twitter data; Technology; T; Engineering (General). Civil engineering (General); TA1-2040; Biology (General); QH301-705.5; Physics; QC1-999; Chemistry; QD1-999; Karlo Babić; Milan Petrović; Slobodan Beliga; Sanda Martinčić-Ipšić; Mihaela Matešić; Ana Meštrović; Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model |
description |
This study aims to provide insights into the COVID-19-related communication on Twitter in the Republic of Croatia. For that purpose, we developed an NL-based framework that enables automatic analysis of a large dataset of tweets in the Croatian language. We collected and analysed 206,196 tweets related to COVID-19 and constructed a dataset of 10,000 tweets which we manually annotated with a sentiment label. We trained the Cro-CoV-cseBERT language model for the representation and clustering of tweets. Additionally, we compared the performance of four machine learning algorithms on the task of sentiment classification. After identifying the best performing setup of NLP methods, we applied the proposed framework in the task of characterisation of COVID-19 tweets in Croatia. More precisely, we performed sentiment analysis and tracked the sentiment over time. Furthermore, we detected how tweets are grouped into clusters with similar themes across three pandemic waves. Additionally, we characterised the tweets by analysing the distribution of sentiment polarity (in each thematic cluster and over time) and the number of retweets (in each thematic cluster and sentiment class). These results could be useful for additional research and interpretation in the domains of sociology, psychology or other sciences, as well as for the authorities, who could use them to address crisis communication problems. |
format |
article |
author |
Karlo Babić; Milan Petrović; Slobodan Beliga; Sanda Martinčić-Ipšić; Mihaela Matešić; Ana Meštrović |
author_facet |
Karlo Babić; Milan Petrović; Slobodan Beliga; Sanda Martinčić-Ipšić; Mihaela Matešić; Ana Meštrović |
author_sort |
Karlo Babić |
title |
Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model |
title_short |
Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model |
title_full |
Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model |
title_fullStr |
Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model |
title_full_unstemmed |
Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model |
title_sort |
characterisation of covid-19-related tweets in the croatian language: framework based on the cro-cov-csebert model |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/bee88f1fe5e1455ca45cbc279d714be0 |
work_keys_str_mv |
AT karlobabic characterisationofcovid19relatedtweetsinthecroatianlanguageframeworkbasedonthecrocovcsebertmodel AT milanpetrovic characterisationofcovid19relatedtweetsinthecroatianlanguageframeworkbasedonthecrocovcsebertmodel AT slobodanbeliga characterisationofcovid19relatedtweetsinthecroatianlanguageframeworkbasedonthecrocovcsebertmodel AT sandamartincicipsic characterisationofcovid19relatedtweetsinthecroatianlanguageframeworkbasedonthecrocovcsebertmodel AT mihaelamatesic characterisationofcovid19relatedtweetsinthecroatianlanguageframeworkbasedonthecrocovcsebertmodel AT anamestrovic characterisationofcovid19relatedtweetsinthecroatianlanguageframeworkbasedonthecrocovcsebertmodel |
_version_ |
1718435389385998336 |