Learning to recognize phenotype candidates in the auto-immune literature using SVM re-ranking.
The identification of phenotype descriptions in the scientific literature, case reports and patient records is a rewarding task for bio-medical text mining. Any progress will support knowledge discovery and linkage to other resources. However, because of their wide variation, a number of challenges st...
Main authors: | Nigel Collier, Mai-vu Tran, Hoang-quynh Le, Quang-Thuy Ha, Anika Oellrich, Dietrich Rebholz-Schuhmann |
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2013 |
Subjects: | Medicine (R), Science (Q) |
Online access: | https://doaj.org/article/e2f76355001a45e794b7c989a9e7d346 |
id |
oai:doaj.org-article:e2f76355001a45e794b7c989a9e7d346 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:e2f76355001a45e794b7c989a9e7d346; 2021-11-18T08:51:18Z; ISSN 1932-6203; DOI 10.1371/journal.pone.0072965; https://doaj.org/article/e2f76355001a45e794b7c989a9e7d346; 2013-01-01T00:00:00Z; https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24155869/?tool=EBI; https://doaj.org/toc/1932-6203; PLoS ONE, Vol 8, Iss 10, p e72965 (2013) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Medicine (R); Science (Q) |
description |
The identification of phenotype descriptions in the scientific literature, case reports and patient records is a rewarding task for bio-medical text mining. Any progress will support knowledge discovery and linkage to other resources. However, because of their wide variation, a number of challenges remain in their identification and semantic normalisation before they can be fully exploited for research purposes. This paper presents novel techniques for identifying potential complex phenotype mentions by exploiting a hybrid model based on machine learning, rules and dictionary matching. A systematic study is made of how to combine sequence labels from these modules, as well as of the merits of various ontological resources. We evaluated our approach on a subset of Medline abstracts related to auto-immune diseases and cited by the Online Mendelian Inheritance in Man database. Using partial matching, the best micro-averaged F-score for phenotypes and five other entity classes was 79.9%. A best performance of 75.3% was achieved for phenotype candidates using all semantic resources. We observed the advantage of using SVM-based learn-to-rank for sequence label combination over maximum entropy and a priority list approach. The results indicate that the identification of simple entity types such as chemicals and genes is robustly supported by single semantic resources, whereas phenotypes require combinations. Altogether, we conclude that our approach coped well with the compositional structure of phenotypes in the auto-immune domain. |
format |
article |
author |
Nigel Collier; Mai-vu Tran; Hoang-quynh Le; Quang-Thuy Ha; Anika Oellrich; Dietrich Rebholz-Schuhmann |
author_sort |
Nigel Collier |
title |
Learning to recognize phenotype candidates in the auto-immune literature using SVM re-ranking. |
publisher |
Public Library of Science (PLoS) |
publishDate |
2013 |
url |
https://doaj.org/article/e2f76355001a45e794b7c989a9e7d346 |