Upscaling human activity data: A statistical ecology approach.

Big data require new techniques to handle the information they come with. Here we consider four datasets (email communication, Twitter posts, Wikipedia articles and Gutenberg books) and propose a novel statistical framework to predict global statistics from random samples. More precisely, we infer t...

Bibliographic Details
Main authors: Anna Tovo, Samuele Stivanello, Amos Maritan, Samir Suweis, Stefano Favaro, Marco Formentin
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects:
R
Q
Online access: https://doaj.org/article/f99983cc4a644c4987227f7370badbc8
id oai:doaj.org-article:f99983cc4a644c4987227f7370badbc8
record_format dspace
spelling oai:doaj.org-article:f99983cc4a644c4987227f7370badbc8
2021-12-02T20:09:49Z
1932-6203
10.1371/journal.pone.0253461
https://doaj.org/article/f99983cc4a644c4987227f7370badbc8
2021-01-01T00:00:00Z
https://doi.org/10.1371/journal.pone.0253461
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 16, Iss 7, p e0253461 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Big data require new techniques to handle the information they come with. Here we consider four datasets (email communication, Twitter posts, Wikipedia articles and Gutenberg books) and propose a novel statistical framework to predict global statistics from random samples. More precisely, we infer the number of senders, hashtags and words of the whole dataset and how their abundances (i.e. the popularity of a hashtag) change through scales from a small sample of sent emails per sender, posts per hashtag and word occurrences. Our approach is grounded on statistical ecology as we map inference of human activities into the unseen species problem in biodiversity. Our findings may have applications to resource management in emails, collective attention monitoring in Twitter and language learning process in word databases.
format article
author Anna Tovo
Samuele Stivanello
Amos Maritan
Samir Suweis
Stefano Favaro
Marco Formentin
author_sort Anna Tovo
title Upscaling human activity data: A statistical ecology approach.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/f99983cc4a644c4987227f7370badbc8
_version_ 1718375099686453248