Research note: Examining potential bias in large-scale censored data
We examine potential bias in Facebook’s 10-trillion cell URLs dataset, consisting of URLs shared on its platform and their engagement metrics. Despite the unprecedented size of the dataset, it was altered to protect user privacy in two ways: 1) by adding differentially private noise to engagement co...
Main Authors: | Jennifer Allen, Markus Mobius, David M. Rothschild, Duncan J. Watts |
---|---|
Format: | article |
Language: | EN |
Published: | Harvard Kennedy School, 2021 |
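Subjects: | big data; facebook; fake news; Information technology (T58.5-58.64); Communication. Mass media (P87-96) |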
Online Access: | https://doaj.org/article/80016e4f3a1e416fb698034437bd6baa |
id |
oai:doaj.org-article:80016e4f3a1e416fb698034437bd6baa |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:80016e4f3a1e416fb698034437bd6baa 2021-11-20T05:25:47Z Research note: Examining potential bias in large-scale censored data 10.37016/mr-2020-74 2766-1652 https://doaj.org/article/80016e4f3a1e416fb698034437bd6baa 2021-07-01T00:00:00Z https://misinforeview.hks.harvard.edu/article/research-note-examining-potential-bias-in-large-scale-censored-data/ https://doaj.org/toc/2766-1652 We examine potential bias in Facebook’s 10-trillion cell URLs dataset, consisting of URLs shared on its platform and their engagement metrics. Despite the unprecedented size of the dataset, it was altered to protect user privacy in two ways: 1) by adding differentially private noise to engagement counts, and 2) by censoring the data with a 100-public-share threshold for a URL’s inclusion. To understand how these alterations affect conclusions drawn from the data, we estimate the prevalence of fake news in the massive, censored URLs dataset and compare it to an estimate from a smaller, representative dataset. We show that censoring can substantially alter conclusions that are drawn from the Facebook dataset. Because of this 100-public-share threshold, descriptive statistics from the Facebook URLs dataset overestimate the share of fake news and news overall by as much as 4X. We conclude with more general implications for censoring data. Jennifer Allen Markus Mobius David M. Rothschild Duncan J. Watts Harvard Kennedy School article big data facebook fake news Information technology T58.5-58.64 Communication. Mass media P87-96 EN Harvard Kennedy School Misinformation Review, Vol 2, Iss 4 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
big data fake news Information technology T58.5-58.64 Communication. Mass media P87-96 |
spellingShingle |
big data fake news Information technology T58.5-58.64 Communication. Mass media P87-96 Jennifer Allen Markus Mobius David M. Rothschild Duncan J. Watts Research note: Examining potential bias in large-scale censored data |
description |
We examine potential bias in Facebook’s 10-trillion cell URLs dataset, consisting of URLs shared on its platform and their engagement metrics. Despite the unprecedented size of the dataset, it was altered to protect user privacy in two ways: 1) by adding differentially private noise to engagement counts, and 2) by censoring the data with a 100-public-share threshold for a URL’s inclusion. To understand how these alterations affect conclusions drawn from the data, we estimate the prevalence of fake news in the massive, censored URLs dataset and compare it to an estimate from a smaller, representative dataset. We show that censoring can substantially alter conclusions that are drawn from the Facebook dataset. Because of this 100-public-share threshold, descriptive statistics from the Facebook URLs dataset overestimate the share of fake news and news overall by as much as 4X. We conclude with more general implications for censoring data. |
format |
article |
author |
Jennifer Allen Markus Mobius David M. Rothschild Duncan J. Watts |
author_facet |
Jennifer Allen Markus Mobius David M. Rothschild Duncan J. Watts |
author_sort |
Jennifer Allen |
title |
Research note: Examining potential bias in large-scale censored data |
title_short |
Research note: Examining potential bias in large-scale censored data |
title_full |
Research note: Examining potential bias in large-scale censored data |
title_fullStr |
Research note: Examining potential bias in large-scale censored data |
title_full_unstemmed |
Research note: Examining potential bias in large-scale censored data |
title_sort |
research note: examining potential bias in large-scale censored data |
publisher |
Harvard Kennedy School |
publishDate |
2021 |
url |
https://doaj.org/article/80016e4f3a1e416fb698034437bd6baa |
work_keys_str_mv |
AT jenniferallen researchnoteexaminingpotentialbiasinlargescalecensoreddata AT markusmobius researchnoteexaminingpotentialbiasinlargescalecensoreddata AT davidmrothschild researchnoteexaminingpotentialbiasinlargescalecensoreddata AT duncanjwatts researchnoteexaminingpotentialbiasinlargescalecensoreddata |
_version_ |
1718419491384197120 |
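
Illustrative note: the description above states that a 100-public-share inclusion threshold can inflate descriptive statistics such as the share of news. The sketch below is one possible way to see that mechanism on toy data. It is not the authors' method or data: the `Url` record, the `view_share` helper, the category labels, every number in `urls`, and the reading of the threshold as "at least 100 public shares" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Url:
    """Hypothetical record; field names are illustrative, not the dataset's schema."""
    category: str        # e.g. "news" or "other"
    public_shares: int   # number of public shares of the URL
    views: int           # engagement attributed to the URL


def view_share(urls, category, min_public_shares=0):
    """Fraction of views that belong to `category` among URLs that survive
    the inclusion threshold (assumed here to mean shares >= threshold)."""
    kept = [u for u in urls if u.public_shares >= min_public_shares]
    total = sum(u.views for u in kept)
    if total == 0:
        raise ValueError("no views remain after censoring")
    return sum(u.views for u in kept if u.category == category) / total


# Synthetic toy population (made-up numbers): "other" content is mostly
# spread over many rarely shared URLs, which the threshold removes.
urls = (
    [Url("news", 500, 1000)] * 2
    + [Url("news", 10, 100)] * 10
    + [Url("other", 500, 1000)] * 2
    + [Url("other", 10, 100)] * 80
)

full = view_share(urls, "news")                             # uncensored population
censored = view_share(urls, "news", min_public_shares=100)  # thresholded subset

print(f"news share of views, full data:     {full:.3f}")
print(f"news share of views, censored data: {censored:.3f}")
print(f"inflation factor:                   {censored / full:.2f}x")
```

In this toy setup the censored estimate exceeds the full-population estimate only because the threshold drops the rarely shared URLs, which are disproportionately non-news. The size of the gap depends entirely on the made-up numbers and says nothing about the magnitude reported in the article.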