An annotated corpus with nanomedicine and pharmacokinetic parameters

Nastassja A Lewinski (1), Ivan Jimenez (1), Bridget T McInnes (2). (1) Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA; (2) Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA. Abstract: A vast amount of data on nanomedicines is be...

Bibliographic Details
Main authors: Lewinski NA, Jimenez I, McInnes BT
Format: article
Language: EN
Published: Dove Medical Press 2017
Subjects: nanotechnology; informatics; natural language processing; text mining; corpora; Medicine (General); R5-920
Online access: https://doaj.org/article/2447f3a9c96c488ea56374487aaec32b
id oai:doaj.org-article:2447f3a9c96c488ea56374487aaec32b
record_format dspace
spelling oai:doaj.org-article:2447f3a9c96c488ea56374487aaec32b
2021-12-02T03:11:39Z
1178-2013
https://doaj.org/article/2447f3a9c96c488ea56374487aaec32b
2017-10-01T00:00:00Z
https://www.dovepress.com/an-annotated-corpus-with-nanomedicine-and-pharmacokinetic-parameters-peer-reviewed-article-IJN
https://doaj.org/toc/1178-2013
International Journal of Nanomedicine, Vol 12, Pp 7519-7527 (2017)
institution DOAJ
collection DOAJ
language EN
topic nanotechnology
informatics
natural language processing
text mining
corpora
Medicine (General)
R5-920
description Nastassja A Lewinski (1), Ivan Jimenez (1), Bridget T McInnes (2). (1) Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA; (2) Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
Abstract: A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided.
Keywords: nanotechnology, informatics, natural language processing, text mining, corpora
format article
author Lewinski NA
Jimenez I
McInnes BT
title An annotated corpus with nanomedicine and pharmacokinetic parameters
publisher Dove Medical Press
publishDate 2017
url https://doaj.org/article/2447f3a9c96c488ea56374487aaec32b