Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes

Bibliographic Details
Main authors: Beau Norgeot, Kathleen Muenzen, Thomas A. Peterson, Xuancheng Fan, Benjamin S. Glicksberg, Gundolf Schenk, Eugenia Rutenberg, Boris Oskotsky, Marina Sirota, Jinoos Yazdany, Gabriela Schmajuk, Dana Ludwig, Theodore Goldstein, Atul J. Butte
Format: Article
Language: English
Published: Nature Portfolio, 2020
Subjects: Computer applications to medicine. Medical informatics; R858-859.7
Online access: https://doaj.org/article/65f57d47a8274a3a8bf988b25258315d
Journal: npj Digital Medicine, Vol 3, Iss 1, Pp 1-8 (2020)
Publication date: 2020-04-01
DOI: https://doi.org/10.1038/s41746-020-0258-y
ISSN: 2398-6352

Abstract

There is a great and growing need to ascertain the actual state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the sparse data available in structured medical records. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, the detection of adverse outcomes, the evaluation of off-label drug efficacy, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see substantial real-world use, primarily because they were trained on medical text corpora that were too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.