Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas

Background: A significant proportion of the clinical record is in free text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to ident...

Bibliographic Details
Main authors: Báez, Pablo; Villena, Fabián; Zúñiga, Karen; Jones, Natalia; Fernández, Gustavo; Durán, Manuel; Dunstan, Jocelyn
Language: Spanish / Castilian
Published: Sociedad Médica de Santiago, 2021
Subjects: Data Curation, Data Mining, Medical Informatics, Natural Language Processing, Supervised Machine Learning
Online access: http://www.scielo.cl/scielo.php?script=sci_arttext&pid=S0034-98872021000701014
id oai:scielo:S0034-98872021000701014
record_format dspace
spelling oai:scielo:S0034-98872021000701014
2021-11-04
Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas
Báez,Pablo
Villena,Fabián
Zúñiga,Karen
Jones,Natalia
Fernández,Gustavo
Durán,Manuel
Dunstan,Jocelyn
Data Curation
Data Mining
Medical Informatics
Natural Language Processing
Supervised Machine Learning
Background: A significant proportion of the clinical record is in free text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest. Aim: To build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals. Material and Methods: A clinical corpus was constructed using 150 referrals in public hospitals. Three annotators identified six medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach to train the annotators was applied. The F1-Score metric was used to assess the progress of the annotators' agreement during their training. Results: An average F1-Score of 0.73 was observed at the beginning of the project. After the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement. Conclusions: A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, working with annotators related to medicine. The iterative annotation approach allowed us to improve performance metrics. The corpus and annotation protocols will be released to the research community.
info:eu-repo/semantics/openAccess
Sociedad Médica de Santiago
Revista médica de Chile v.149 n.7 2021
2021-07-01
text/html
http://www.scielo.cl/scielo.php?script=sci_arttext&pid=S0034-98872021000701014
es
10.4067/s0034-98872021000701014
institution Scielo Chile
collection Scielo Chile
language Spanish / Castilian
topic Data Curation
Data Mining
Medical Informatics
Natural Language Processing
Supervised Machine Learning
spellingShingle Data Curation
Data Mining
Medical Informatics
Natural Language Processing
Supervised Machine Learning
Báez,Pablo
Villena,Fabián
Zúñiga,Karen
Jones,Natalia
Fernández,Gustavo
Durán,Manuel
Dunstan,Jocelyn
Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas
description Background: A significant proportion of the clinical record is in free text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest. Aim: To build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals. Material and Methods: A clinical corpus was constructed using 150 referrals in public hospitals. Three annotators identified six medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach to train the annotators was applied. The F1-Score metric was used to assess the progress of the annotators' agreement during their training. Results: An average F1-Score of 0.73 was observed at the beginning of the project. After the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement. Conclusions: A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, working with annotators related to medicine. The iterative annotation approach allowed us to improve performance metrics. The corpus and annotation protocols will be released to the research community.
author Báez,Pablo
Villena,Fabián
Zúñiga,Karen
Jones,Natalia
Fernández,Gustavo
Durán,Manuel
Dunstan,Jocelyn
author_facet Báez,Pablo
Villena,Fabián
Zúñiga,Karen
Jones,Natalia
Fernández,Gustavo
Durán,Manuel
Dunstan,Jocelyn
author_sort Báez,Pablo
title Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas
title_short Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas
title_full Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas
title_fullStr Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas
title_full_unstemmed Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas
title_sort construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas
publisher Sociedad Médica de Santiago
publishDate 2021
url http://www.scielo.cl/scielo.php?script=sci_arttext&pid=S0034-98872021000701014
work_keys_str_mv AT baezpablo construccionderecursosdetextoparalaidentificacionautomaticadeinformacionclinicaennarrativasnoestructuradas
AT villenafabian construccionderecursosdetextoparalaidentificacionautomaticadeinformacionclinicaennarrativasnoestructuradas
AT zunigakaren construccionderecursosdetextoparalaidentificacionautomaticadeinformacionclinicaennarrativasnoestructuradas
AT jonesnatalia construccionderecursosdetextoparalaidentificacionautomaticadeinformacionclinicaennarrativasnoestructuradas
AT fernandezgustavo construccionderecursosdetextoparalaidentificacionautomaticadeinformacionclinicaennarrativasnoestructuradas
AT duranmanuel construccionderecursosdetextoparalaidentificacionautomaticadeinformacionclinicaennarrativasnoestructuradas
AT dunstanjocelyn construccionderecursosdetextoparalaidentificacionautomaticadeinformacionclinicaennarrativasnoestructuradas
_version_ 1718437214346543104
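
The description above reports agreement among three annotators as an average F1-Score. The record does not say how entity matches were counted, so the following is only a minimal sketch under stated assumptions: each annotation is a hypothetical (start, end, label) tuple, two annotations match only when all three values are identical, and the overall score is the mean over annotator pairs. The function names, labels, and sample spans are illustrative and do not come from the article.

```python
from itertools import combinations


def pairwise_f1(reference, candidate):
    """F1 between two annotators' entity sets, using exact span and label match.

    Assumption: an annotation is a (start, end, label) tuple; the article's
    actual matching criterion is not given in this record.
    """
    reference, candidate = set(reference), set(candidate)
    if not reference and not candidate:
        return 1.0  # both annotators marked nothing: full agreement
    true_positives = len(reference & candidate)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(candidate)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f1(annotations_by_annotator):
    """Average F1 over every unordered pair of annotators."""
    pairs = list(combinations(annotations_by_annotator.values(), 2))
    return sum(pairwise_f1(a, b) for a, b in pairs) / len(pairs)


# Hypothetical annotations of one referral by three annotators.
annotations = {
    "annotator_1": {(0, 5, "Diagnosis"), (10, 14, "Medication"), (20, 26, "Body_Part")},
    "annotator_2": {(0, 5, "Diagnosis"), (10, 14, "Medication"), (20, 28, "Body_Part")},
    "annotator_3": {(0, 5, "Diagnosis"), (10, 14, "Medication"), (20, 26, "Body_Part")},
}

score = mean_pairwise_f1(annotations)
print(f"Mean pairwise F1: {score:.2f}")
```

With exact matching, F1 is symmetric in the two annotators, so the choice of which one acts as the reference does not change the result. A relaxed criterion (for example, overlapping spans) would give higher scores for entities whose boundaries are hard to agree on.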