Research note: Examining potential bias in large-scale censored data

We examine potential bias in Facebook’s 10-trillion cell URLs dataset, consisting of URLs shared on its platform and their engagement metrics. Despite the unprecedented size of the dataset, it was altered to protect user privacy in two ways: 1) by adding differentially private noise to engagement co...

Bibliographic Details
Main authors: Jennifer Allen, Markus Mobius, David M. Rothschild, Duncan J. Watts
Format: article
Language: EN
Published: Harvard Kennedy School 2021
Subjects:
Online access: https://doaj.org/article/80016e4f3a1e416fb698034437bd6baa
id oai:doaj.org-article:80016e4f3a1e416fb698034437bd6baa
record_format dspace
spelling oai:doaj.org-article:80016e4f3a1e416fb698034437bd6baa
2021-11-20T05:25:47Z
Research note: Examining potential bias in large-scale censored data
10.37016/mr-2020-74
2766-1652
https://doaj.org/article/80016e4f3a1e416fb698034437bd6baa
2021-07-01T00:00:00Z
https://misinforeview.hks.harvard.edu/article/research-note-examining-potential-bias-in-large-scale-censored-data/
https://doaj.org/toc/2766-1652
We examine potential bias in Facebook’s 10-trillion cell URLs dataset, consisting of URLs shared on its platform and their engagement metrics. Despite the unprecedented size of the dataset, it was altered to protect user privacy in two ways: 1) by adding differentially private noise to engagement counts, and 2) by censoring the data with a 100-public-share threshold for a URL’s inclusion. To understand how these alterations affect conclusions drawn from the data, we estimate the prevalence of fake news in the massive, censored URLs dataset and compare it to an estimate from a smaller, representative dataset. We show that censoring can substantially alter conclusions that are drawn from the Facebook dataset. Because of this 100-public-share threshold, descriptive statistics from the Facebook URLs dataset overestimate the share of fake news and news overall by as much as 4X. We conclude with more general implications for censoring data.
Jennifer Allen
Markus Mobius
David M. Rothschild
Duncan J. Watts
Harvard Kennedy School
article
big data
facebook
fake news
Information technology
T58.5-58.64
Communication. Mass media
P87-96
EN
Harvard Kennedy School Misinformation Review, Vol 2, Iss 4 (2021)
institution DOAJ
collection DOAJ
language EN
topic big data
facebook
fake news
Information technology
T58.5-58.64
Communication. Mass media
P87-96
spellingShingle big data
facebook
fake news
Information technology
T58.5-58.64
Communication. Mass media
P87-96
Jennifer Allen
Markus Mobius
David M. Rothschild
Duncan J. Watts
Research note: Examining potential bias in large-scale censored data
description We examine potential bias in Facebook’s 10-trillion cell URLs dataset, consisting of URLs shared on its platform and their engagement metrics. Despite the unprecedented size of the dataset, it was altered to protect user privacy in two ways: 1) by adding differentially private noise to engagement counts, and 2) by censoring the data with a 100-public-share threshold for a URL’s inclusion. To understand how these alterations affect conclusions drawn from the data, we estimate the prevalence of fake news in the massive, censored URLs dataset and compare it to an estimate from a smaller, representative dataset. We show that censoring can substantially alter conclusions that are drawn from the Facebook dataset. Because of this 100-public-share threshold, descriptive statistics from the Facebook URLs dataset overestimate the share of fake news and news overall by as much as 4X. We conclude with more general implications for censoring data.
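Illustrative note (not part of the source record): the censoring effect described in the abstract can be sketched with a toy simulation. Everything below other than the 100-share threshold is an assumption made for illustration: the synthetic Pareto share counts, the 5% news fraction, the 10x scale factor for news URLs, and the helper names `simulate` and `news_share` are hypothetical and are not the authors' data or method. The differentially private noise mentioned in the abstract is not modeled. The sketch only shows that when one category of URL is more likely to clear the threshold, its share among the surviving URLs exceeds its share among all URLs.

```python
import random

THRESHOLD = 100  # public-share cutoff named in the abstract


def simulate(n_urls=100_000, news_fraction=0.05, seed=0):
    """Build synthetic (is_news, shares) pairs. All parameters are assumptions."""
    rng = random.Random(seed)
    urls = []
    for _ in range(n_urls):
        is_news = rng.random() < news_fraction
        # Assumption: news URLs draw from the same heavy-tailed shape,
        # scaled up so they clear the threshold more often.
        scale = 10 if is_news else 1
        shares = int(scale * rng.paretovariate(1.2))
        urls.append((is_news, shares))
    return urls


def news_share(urls):
    """Fraction of URLs in the list that are flagged as news."""
    return sum(1 for is_news, _ in urls if is_news) / len(urls)


urls = simulate()
# Censoring step: keep only URLs at or above the share threshold.
censored = [u for u in urls if u[1] >= THRESHOLD]

true_share = news_share(urls)
censored_share = news_share(censored)
print(f"news share, all URLs:      {true_share:.3f}")
print(f"news share, censored URLs: {censored_share:.3f}")
```

Comparing the two printed shares shows the direction of the bias under these assumptions; the size of the gap depends entirely on the made-up distribution parameters and says nothing about the 4X figure reported in the abstract.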
format article
author Jennifer Allen
Markus Mobius
David M. Rothschild
Duncan J. Watts
author_facet Jennifer Allen
Markus Mobius
David M. Rothschild
Duncan J. Watts
author_sort Jennifer Allen
title Research note: Examining potential bias in large-scale censored data
title_short Research note: Examining potential bias in large-scale censored data
title_full Research note: Examining potential bias in large-scale censored data
title_fullStr Research note: Examining potential bias in large-scale censored data
title_full_unstemmed Research note: Examining potential bias in large-scale censored data
title_sort research note: examining potential bias in large-scale censored data
publisher Harvard Kennedy School
publishDate 2021
url https://doaj.org/article/80016e4f3a1e416fb698034437bd6baa
work_keys_str_mv AT jenniferallen researchnoteexaminingpotentialbiasinlargescalecensoreddata
AT markusmobius researchnoteexaminingpotentialbiasinlargescalecensoreddata
AT davidmrothschild researchnoteexaminingpotentialbiasinlargescalecensoreddata
AT duncanjwatts researchnoteexaminingpotentialbiasinlargescalecensoreddata
_version_ 1718419491384197120