Discovering health topics in social media using topic models.

By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goa...

Bibliographic Details
Main authors: Michael J Paul, Mark Dredze
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2014
Subjects:
R
Q
Online access: https://doaj.org/article/1fdcc426da0045f2bf259b7cdc8ef67e
id oai:doaj.org-article:1fdcc426da0045f2bf259b7cdc8ef67e
record_format dspace
spelling oai:doaj.org-article:1fdcc426da0045f2bf259b7cdc8ef67e
2021-11-25T06:06:16Z
Discovering health topics in social media using topic models.
1932-6203
10.1371/journal.pone.0103408
https://doaj.org/article/1fdcc426da0045f2bf259b7cdc8ef67e
2014-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/25084530/?tool=EBI
https://doaj.org/toc/1932-6203
Michael J Paul
Mark Dredze
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 9, Iss 8, p e103408 (2014)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Michael J Paul
Mark Dredze
Discovering health topics in social media using topic models.
description By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goal of understanding what health topics are commonly discussed in social media. This paper describes in detail a statistical topic model created for this purpose, the Ailment Topic Aspect Model (ATAM), as well as our system for filtering general Twitter data based on health keywords and supervised classification. We show how ATAM and other topic models can automatically infer health topics in 144 million Twitter messages from 2011 to 2013. ATAM discovered 13 coherent clusters of Twitter messages, some of which correlate with seasonal influenza (r = 0.689) and allergies (r = 0.810) temporal surveillance data, as well as exercise (r = 0.534) and obesity (r = -0.631) related geographic survey data in the United States. These results demonstrate that it is possible to automatically discover topics that attain statistically significant correlations with ground truth data, despite using minimal human supervision and no historical data to train the model, in contrast to prior work. Additionally, these results demonstrate that a single general-purpose model can identify many different health topics in social media.
format article
author Michael J Paul
Mark Dredze
author_facet Michael J Paul
Mark Dredze
author_sort Michael J Paul
title Discovering health topics in social media using topic models.
title_sort discovering health topics in social media using topic models.
publisher Public Library of Science (PLoS)
publishDate 2014
url https://doaj.org/article/1fdcc426da0045f2bf259b7cdc8ef67e
work_keys_str_mv AT michaeljpaul discoveringhealthtopicsinsocialmediausingtopicmodels
AT markdredze discoveringhealthtopicsinsocialmediausingtopicmodels
_version_ 1718414150603898880
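
The abstract in this record reports Pearson correlations (r) between topic prevalence and external surveillance or survey data. As a rough illustration of what such a coefficient measures, the sketch below computes r for two short series. The function name `pearson_r` and all numbers are hypothetical and invented for illustration; they are not taken from the article, and this is not the authors' code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly values, invented for illustration only:
# share of messages assigned to one topic vs. an external surveillance rate.
topic_share = [0.010, 0.012, 0.020, 0.031, 0.027, 0.015]
surveillance_rate = [1.1, 1.3, 2.4, 3.9, 3.0, 1.6]

r = pearson_r(topic_share, surveillance_rate)
print(f"r = {r:.3f}")
```

Values of r near 1 or -1 indicate a strong linear relationship between the two series, and values near 0 indicate little linear relationship.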