Upscaling human activity data: A statistical ecology approach.
Big data require new techniques to handle the information they come with. Here we consider four datasets (email communication, Twitter posts, Wikipedia articles and Gutenberg books) and propose a novel statistical framework to predict global statistics from random samples. More precisely, we infer t...
Main authors: | Anna Tovo, Samuele Stivanello, Amos Maritan, Samir Suweis, Stefano Favaro, Marco Formentin |
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2021 |
Subjects: | Medicine (R); Science (Q) |
Online access: | https://doaj.org/article/f99983cc4a644c4987227f7370badbc8 |
id |
oai:doaj.org-article:f99983cc4a644c4987227f7370badbc8 |
---|---|
record_format |
dspace |
source |
PLoS ONE, Vol 16, Iss 7, p e0253461 (2021) |
issn |
1932-6203 |
doi |
https://doi.org/10.1371/journal.pone.0253461 |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Medicine R Science Q |
description |
Big data require new techniques to handle the information they come with. Here we consider four datasets (email communication, Twitter posts, Wikipedia articles and Gutenberg books) and propose a novel statistical framework to predict global statistics from random samples. More precisely, we infer the number of senders, hashtags and words of the whole dataset and how their abundances (i.e. the popularity of a hashtag) change through scales from a small sample of sent emails per sender, posts per hashtag and word occurrences. Our approach is grounded on statistical ecology as we map inference of human activities into the unseen species problem in biodiversity. Our findings may have applications to resource management in emails, collective attention monitoring in Twitter and language learning process in word databases. |
format |
article |
author |
Anna Tovo, Samuele Stivanello, Amos Maritan, Samir Suweis, Stefano Favaro, Marco Formentin |
title |
Upscaling human activity data: A statistical ecology approach. |
publisher |
Public Library of Science (PLoS) |
publishDate |
2021 |
url |
https://doaj.org/article/f99983cc4a644c4987227f7370badbc8 |
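The abstract frames the task as the unseen species problem: from a sample of emails per sender, posts per hashtag or word occurrences, estimate how many distinct senders, hashtags or words exist in the whole dataset and how that count grows with scale. This record does not describe the paper's own estimator, so the sketch below is not the authors' method. It illustrates the problem with two classical estimators from statistical ecology, swapped in for illustration: the bias-corrected Chao1 estimate of total richness and the Good-Toulmin prediction of how many new items a larger sample would reveal. The function names and the input format (a flat list of item labels, e.g. hashtags, one entry per occurrence) are hypothetical.

```python
from collections import Counter


def frequency_counts(sample):
    """Return {k: f_k}: how many distinct items occur exactly k times."""
    abundances = Counter(sample)         # item -> abundance in the sample
    return Counter(abundances.values())  # abundance k -> number of items f_k


def chao1(sample):
    """Bias-corrected Chao1 estimate of total richness (number of distinct items).

    Classical ecology estimator used here for illustration only.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    f = frequency_counts(sample)
    s_obs = sum(f.values())              # distinct items actually observed
    f1, f2 = f.get(1, 0), f.get(2, 0)    # singletons and doubletons
    # Unseen items are inferred from how many items were seen only once or twice.
    return s_obs + ((n - 1) / n) * f1 * (f1 - 1) / (2 * (f2 + 1))


def good_toulmin(sample, t):
    """Expected number of new distinct items if the sample grew from n to (1 + t) * n.

    Uses U(t) = -sum_k (-t)^k f_k, which is only reliable for 0 <= t <= 1.
    """
    if not 0 <= t <= 1:
        raise ValueError("plain Good-Toulmin is only reliable for 0 <= t <= 1")
    f = frequency_counts(sample)
    return -sum((-t) ** k * fk for k, fk in f.items())
```

For the toy hashtag sample `["#a", "#a", "#b", "#c", "#d", "#d", "#d", "#e"]` (5 distinct tags, 3 singletons, 1 doubleton), `chao1` returns 6.3125 and `good_toulmin(sample, 1.0)` predicts 3.0 new tags if the sample were doubled. Chao1 is best read as a lower bound on richness, and the plain Good-Toulmin series becomes unstable beyond doubling the sample. Upscaling to much larger scales, as the abstract describes, therefore calls for a model of how abundances scale rather than these nonparametric formulas alone.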