Can Fake News Detection Models Maintain the Performance through Time? A Longitudinal Evaluation of Twitter Publications

The negative impact of false information on social networks is rapidly growing. Current research on the topic focused on the detection of fake news in a particular context or event (such as elections) or using data from a short period of time. Therefore, an evaluation of the current proposals in a l...

Bibliographic Details
Main authors: Nuno Guimarães, Álvaro Figueira, Luís Torgo
Format: article
Language: EN
Published: MDPI AG 2021
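Subjects: fake news detection; social networks; false information; machine learning; data mining; Mathematics; QA1-939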
Online access: https://doaj.org/article/01a7984f65744068a6f891df26d94ed9
id oai:doaj.org-article:01a7984f65744068a6f891df26d94ed9
record_format dspace
spelling oai:doaj.org-article:01a7984f65744068a6f891df26d94ed9
2021-11-25T18:17:50Z
Can Fake News Detection Models Maintain the Performance through Time? A Longitudinal Evaluation of Twitter Publications
10.3390/math9222988
2227-7390
https://doaj.org/article/01a7984f65744068a6f891df26d94ed9
2021-11-01T00:00:00Z
https://www.mdpi.com/2227-7390/9/22/2988
https://doaj.org/toc/2227-7390
The negative impact of false information on social networks is rapidly growing. Current research on the topic focused on the detection of fake news in a particular context or event (such as elections) or using data from a short period of time. Therefore, an evaluation of the current proposals in a long-term scenario where the topics discussed may change is lacking. In this work, we deviate from current approaches to the problem and instead focus on a longitudinal evaluation using social network publications spanning an 18-month period. We evaluate different combinations of features and supervised models in a long-term scenario where the training and testing data are ordered chronologically, and thus the robustness and stability of the models can be evaluated through time. We experimented with 3 different scenarios where the models are trained with 15-, 30-, and 60-day data periods. The results show that detection models trained with word-embedding features are the ones that perform better and are less likely to be affected by the change of topics (for example, the rise of COVID-19 conspiracy theories). Furthermore, the additional days of training data also increase the performance of the best feature/model combinations, although not very significantly (around 2%). The results presented in this paper build the foundations towards a more pragmatic approach to the evaluation of fake news detection models in social networks.
Nuno Guimarães
Álvaro Figueira
Luís Torgo
MDPI AG
article
fake news detection
social networks
false information
machine learning
data mining
Mathematics
QA1-939
EN
Mathematics, Vol 9, Iss 2988, p 2988 (2021)
institution DOAJ
collection DOAJ
language EN
topic fake news detection
social networks
false information
machine learning
data mining
Mathematics
QA1-939
description The negative impact of false information on social networks is rapidly growing. Current research on the topic focused on the detection of fake news in a particular context or event (such as elections) or using data from a short period of time. Therefore, an evaluation of the current proposals in a long-term scenario where the topics discussed may change is lacking. In this work, we deviate from current approaches to the problem and instead focus on a longitudinal evaluation using social network publications spanning an 18-month period. We evaluate different combinations of features and supervised models in a long-term scenario where the training and testing data are ordered chronologically, and thus the robustness and stability of the models can be evaluated through time. We experimented with 3 different scenarios where the models are trained with 15-, 30-, and 60-day data periods. The results show that detection models trained with word-embedding features are the ones that perform better and are less likely to be affected by the change of topics (for example, the rise of COVID-19 conspiracy theories). Furthermore, the additional days of training data also increase the performance of the best feature/model combinations, although not very significantly (around 2%). The results presented in this paper build the foundations towards a more pragmatic approach to the evaluation of fake news detection models in social networks.
format article
author Nuno Guimarães
Álvaro Figueira
Luís Torgo
title Can Fake News Detection Models Maintain the Performance through Time? A Longitudinal Evaluation of Twitter Publications
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/01a7984f65744068a6f891df26d94ed9
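
The abstract describes an evaluation in which training and testing data are ordered chronologically and models are trained on 15-, 30-, or 60-day periods. The record does not give the exact protocol, so the following is only a minimal sketch of one way such a split could be set up. The function name `chronological_windows`, the `test_days` parameter, the sliding step, and the sample data are all assumptions made for illustration, not details taken from the paper.

```python
from datetime import date, timedelta


def chronological_windows(dates, train_days, test_days):
    """Yield (train_idx, test_idx) index lists over dated records.

    Hypothetical helper: each training window spans `train_days`
    consecutive days and is followed directly by a `test_days` test
    window, so every test record is later than every training record.
    The window then slides forward by `test_days`.
    """
    if not dates:
        return
    start, last = min(dates), max(dates)
    while start + timedelta(days=train_days) <= last:
        train_end = start + timedelta(days=train_days)
        test_end = train_end + timedelta(days=test_days)
        train_idx = [i for i, d in enumerate(dates) if start <= d < train_end]
        test_idx = [i for i, d in enumerate(dates) if train_end <= d < test_end]
        if train_idx and test_idx:
            yield train_idx, test_idx
        start += timedelta(days=test_days)


# Made-up data: one publication per day for 90 days.
posts = [date(2020, 1, 1) + timedelta(days=k) for k in range(90)]

# 30-day training period; the 15-day test window is an assumption.
splits = list(chronological_windows(posts, train_days=30, test_days=15))
```

Calling the function again with `train_days=15` or `train_days=60` would give the other two training-period scenarios mentioned in the abstract. A classifier would then be fitted on each `train_idx` subset and scored on the matching `test_idx` subset, which lets performance be tracked window by window.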