Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes
Abstract There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach...
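Main Authors: | Beau Norgeot, Kathleen Muenzen, Thomas A. Peterson, Xuancheng Fan, Benjamin S. Glicksberg, Gundolf Schenk, Eugenia Rutenberg, Boris Oskotsky, Marina Sirota, Jinoos Yazdany, Gabriela Schmajuk, Dana Ludwig, Theodore Goldstein, Atul J. Butte |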
---|---|
Format: | article |
Language: | EN |
Published: |
Nature Portfolio
2020
|
Subjects: | Computer applications to medicine. Medical informatics |
Online Access: | https://doaj.org/article/65f57d47a8274a3a8bf988b25258315d |
id |
oai:doaj.org-article:65f57d47a8274a3a8bf988b25258315d |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:65f57d47a8274a3a8bf988b25258315d
2021-12-02T15:51:17Z
Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes
10.1038/s41746-020-0258-y
2398-6352
https://doaj.org/article/65f57d47a8274a3a8bf988b25258315d
2020-04-01T00:00:00Z
https://doi.org/10.1038/s41746-020-0258-y
https://doaj.org/toc/2398-6352
Abstract There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained with too small medical text corpora. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and develop a customizable open-source de-identification software called Philter (“Protected Health Information filter”). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.
Beau Norgeot
Kathleen Muenzen
Thomas A. Peterson
Xuancheng Fan
Benjamin S. Glicksberg
Gundolf Schenk
Eugenia Rutenberg
Boris Oskotsky
Marina Sirota
Jinoos Yazdany
Gabriela Schmajuk
Dana Ludwig
Theodore Goldstein
Atul J. Butte
Nature Portfolio
article
Computer applications to medicine. Medical informatics
R858-859.7
EN
npj Digital Medicine, Vol 3, Iss 1, Pp 1-8 (2020) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Computer applications to medicine. Medical informatics R858-859.7 |
description |
Abstract There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained with too small medical text corpora. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and develop a customizable open-source de-identification software called Philter (“Protected Health Information filter”). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods. |
format |
article |
author |
Beau Norgeot, Kathleen Muenzen, Thomas A. Peterson, Xuancheng Fan, Benjamin S. Glicksberg, Gundolf Schenk, Eugenia Rutenberg, Boris Oskotsky, Marina Sirota, Jinoos Yazdany, Gabriela Schmajuk, Dana Ludwig, Theodore Goldstein, Atul J. Butte |
author_sort |
Beau Norgeot |
title |
Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes |
title_sort |
protected health information filter (philter): accurately and securely de-identifying free-text clinical notes |
publisher |
Nature Portfolio |
publishDate |
2020 |
url |
https://doaj.org/article/65f57d47a8274a3a8bf988b25258315d |