A systematic review of re-identification attacks on health data.

Background: Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods...

Bibliographic Details
Main authors: Khaled El Emam, Elizabeth Jonker, Luk Arbuckle, Bradley Malin
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2011
Subjects:
R
Q
Online access: https://doaj.org/article/a97ec34e4155490d874171b9111d1f50
id oai:doaj.org-article:a97ec34e4155490d874171b9111d1f50
record_format dspace
spelling oai:doaj.org-article:a97ec34e4155490d874171b9111d1f50
2021-11-18T07:33:12Z
A systematic review of re-identification attacks on health data.
1932-6203
10.1371/journal.pone.0028071
https://doaj.org/article/a97ec34e4155490d874171b9111d1f50
2011-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22164229/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
Khaled El Emam
Elizabeth Jonker
Luk Arbuckle
Bradley Malin
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 6, Iss 12, p e28071 (2011)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Background: Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. Should this be the case, it would have important implications for how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast them with re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these attacks demonstrate weaknesses in current de-identification methods.
Methods and findings: Searches were conducted in IEEE Xplore, ACM Digital Library, and PubMed. After screening, fourteen eligible articles representing distinct attacks were identified. On average, approximately a quarter of the records were re-identified across all studies (0.26, 95% CI 0.046-0.478), and the mean proportion was 0.34 for attacks on health data (95% CI 0-0.744). There was considerable uncertainty around the proportions, as evidenced by the wide confidence intervals, and the mean proportion of records re-identified was sensitive to unpublished studies. Two of the fourteen attacks were performed with data that was de-identified using existing standards. Only one of these attacks was on health data, and it had a success rate of 0.00013.
Conclusions: The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods.
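As an illustration of the kind of summary reported in the abstract (a mean proportion of records re-identified with a 95% confidence interval), the following is a minimal sketch, not the review's actual procedure. It assumes an unweighted mean over per-study proportions, a normal approximation for the interval, and truncation of the bounds to the range 0 to 1. The function name `mean_proportion_ci` and the input values are hypothetical and are not taken from the review.

```python
from statistics import NormalDist, mean, stdev


def mean_proportion_ci(proportions, level=0.95):
    """Return (mean, lower, upper) for a list of per-study proportions.

    Assumptions (not from the source): unweighted mean, normal-approximation
    interval, and bounds truncated to [0, 1].
    """
    if len(proportions) < 2:
        raise ValueError("need at least two studies to estimate a standard error")
    m = mean(proportions)
    # Standard error of the mean from the sample standard deviation.
    se = stdev(proportions) / len(proportions) ** 0.5
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = max(0.0, m - z * se)
    upper = min(1.0, m + z * se)
    return m, lower, upper


# Hypothetical per-study proportions for illustration only.
example = [0.0, 0.1, 0.2, 0.3, 0.9]
m, lower, upper = mean_proportion_ci(example)
print(f"mean={m:.3f}, 95% CI {lower:.3f}-{upper:.3f}")
```

With only a handful of studies and widely varying proportions, the interval is wide and the lower bound can be truncated at 0, which is consistent with the pattern the abstract describes for attacks on health data. A t-distribution critical value would give a wider interval for small samples.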
format article
author Khaled El Emam
Elizabeth Jonker
Luk Arbuckle
Bradley Malin
author_sort Khaled El Emam
title A systematic review of re-identification attacks on health data.
title_sort systematic review of re-identification attacks on health data.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/a97ec34e4155490d874171b9111d1f50