A multistage gene normalization system integrating multiple effective methods.
Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the...
Main Authors: | Lishuang Li, Shanshan Liu, Lihua Li, Wenting Fan, Degen Huang, Huiwei Zhou |
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2013 |
Subjects: | Medicine (R), Science (Q) |
Online Access: | https://doaj.org/article/7f91ad66d832460faae95589572a508b |
id | oai:doaj.org-article:7f91ad66d832460faae95589572a508b |
---|---|
record_format | dspace |
spelling | oai:doaj.org-article:7f91ad66d832460faae95589572a508b 2021-11-18T08:42:16Z A multistage gene normalization system integrating multiple effective methods. 1932-6203 10.1371/journal.pone.0081956 https://doaj.org/article/7f91ad66d832460faae95589572a508b 2013-01-01T00:00:00Z https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24349160/pdf/?tool=EBI https://doaj.org/toc/1932-6203 Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM testing set. In the stage of dictionary matching, the exact matching and approximate matching between gene names and the EntrezGene lexicon have been combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. At the last step, a filter based on Wikipedia has been built to remove the false positives. Experimental results show that the presented system can achieve an F-score of 90.1%, outperforming most of the state-of-the-art systems. Lishuang Li Shanshan Liu Lihua Li Wenting Fan Degen Huang Huiwei Zhou Public Library of Science (PLoS) article Medicine R Science Q EN PLoS ONE, Vol 8, Iss 12, p e81956 (2013) |
institution | DOAJ |
collection | DOAJ |
language | EN |
topic | Medicine R Science Q |
spellingShingle | Medicine R Science Q Lishuang Li Shanshan Liu Lihua Li Wenting Fan Degen Huang Huiwei Zhou A multistage gene normalization system integrating multiple effective methods. |
description | Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM testing set. In the stage of dictionary matching, the exact matching and approximate matching between gene names and the EntrezGene lexicon have been combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. At the last step, a filter based on Wikipedia has been built to remove the false positives. Experimental results show that the presented system can achieve an F-score of 90.1%, outperforming most of the state-of-the-art systems. |
format | article |
author | Lishuang Li Shanshan Liu Lihua Li Wenting Fan Degen Huang Huiwei Zhou |
author_facet | Lishuang Li Shanshan Liu Lihua Li Wenting Fan Degen Huang Huiwei Zhou |
author_sort | Lishuang Li |
title | A multistage gene normalization system integrating multiple effective methods. |
title_short | A multistage gene normalization system integrating multiple effective methods. |
title_full | A multistage gene normalization system integrating multiple effective methods. |
title_fullStr | A multistage gene normalization system integrating multiple effective methods. |
title_full_unstemmed | A multistage gene normalization system integrating multiple effective methods. |
title_sort | multistage gene normalization system integrating multiple effective methods. |
publisher | Public Library of Science (PLoS) |
publishDate | 2013 |
url | https://doaj.org/article/7f91ad66d832460faae95589572a508b |
work_keys_str_mv | AT lishuangli amultistagegenenormalizationsystemintegratingmultipleeffectivemethods AT shanshanliu amultistagegenenormalizationsystemintegratingmultipleeffectivemethods AT lihuali amultistagegenenormalizationsystemintegratingmultipleeffectivemethods AT wentingfan amultistagegenenormalizationsystemintegratingmultipleeffectivemethods AT degenhuang amultistagegenenormalizationsystemintegratingmultipleeffectivemethods AT huiweizhou amultistagegenenormalizationsystemintegratingmultipleeffectivemethods AT lishuangli multistagegenenormalizationsystemintegratingmultipleeffectivemethods AT shanshanliu multistagegenenormalizationsystemintegratingmultipleeffectivemethods AT lihuali multistagegenenormalizationsystemintegratingmultipleeffectivemethods AT wentingfan multistagegenenormalizationsystemintegratingmultipleeffectivemethods AT degenhuang multistagegenenormalizationsystemintegratingmultipleeffectivemethods AT huiweizhou multistagegenenormalizationsystemintegratingmultipleeffectivemethods |
_version_ | 1718421454614167552 |
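
The description in this record says the ambiguity resolution stage is based on Munkres' Assignment Algorithm, but it does not say how the similarity scores are built or how the algorithm is applied. The sketch below only illustrates the underlying assignment problem: given a matrix of similarity scores, pick one distinct candidate per mention so that the total similarity is as large as possible. Everything in it is an assumption made for illustration: the function name `best_assignment`, the score values, and the reading of rows as gene mentions and columns as candidate identifiers. It also uses exhaustive search in place of Munkres' algorithm, which solves the same problem efficiently; exhaustive search is only practical for very small matrices.

```python
from itertools import permutations


def best_assignment(scores):
    """Return (total, pairs) for the highest-scoring one-to-one assignment.

    scores[i][j] is a hypothetical similarity between mention i and
    candidate j. Exhaustive search stands in for Munkres' algorithm here;
    both solve the same assignment problem.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many candidates as mentions")

    best_total, best_cols = float("-inf"), None
    # Try every way of giving each row its own distinct column.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(scores[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    return best_total, list(enumerate(best_cols))


# Invented scores: 2 mentions (rows) x 3 candidate identifiers (columns).
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
total, pairs = best_assignment(scores)
print(pairs, round(total, 2))
```

With these invented scores, picking the best candidate for each mention in turn would give the first mention candidate 0 and leave the second mention with a weak match. The joint assignment instead gives the first mention candidate 1 and the second mention candidate 0, which has a higher total. That difference between greedy and joint selection is the reason to treat disambiguation as an assignment problem.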