A multistage gene normalization system integrating multiple effective methods.

Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the...

Bibliographic Details
Main authors: Lishuang Li, Shanshan Liu, Lihua Li, Wenting Fan, Degen Huang, Huiwei Zhou
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2013
Subjects:
R
Q
Online access: https://doaj.org/article/7f91ad66d832460faae95589572a508b
id oai:doaj.org-article:7f91ad66d832460faae95589572a508b
record_format dspace
spelling oai:doaj.org-article:7f91ad66d832460faae95589572a508b
2021-11-18T08:42:16Z
A multistage gene normalization system integrating multiple effective methods.
1932-6203
10.1371/journal.pone.0081956
https://doaj.org/article/7f91ad66d832460faae95589572a508b
2013-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24349160/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM testing set. In the stage of dictionary matching, the exact matching and approximate matching between gene names and the EntrezGene lexicon have been combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. At the last step, a filter based on Wikipedia has been built to remove the false positives. Experimental results show that the presented system can achieve an F-score of 90.1%, outperforming most of the state-of-the-art systems.
Lishuang Li
Shanshan Liu
Lihua Li
Wenting Fan
Degen Huang
Huiwei Zhou
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 8, Iss 12, p e81956 (2013)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM testing set. In the stage of dictionary matching, the exact matching and approximate matching between gene names and the EntrezGene lexicon have been combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. At the last step, a filter based on Wikipedia has been built to remove the false positives. Experimental results show that the presented system can achieve an F-score of 90.1%, outperforming most of the state-of-the-art systems.
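The ambiguity resolution step named in the description can be pictured as an assignment problem: each gene mention is paired with a distinct candidate identifier so that the summed similarity is as large as possible. The sketch below is only an illustration under stated assumptions. The similarity matrix is made up, the function name `best_assignment` is hypothetical, and a brute-force search over permutations stands in for Munkres' Assignment Algorithm, which solves the same problem in polynomial time. The record does not say how the authors compute their similarity scores.

```python
from itertools import permutations


def best_assignment(similarity):
    """Return (total, columns) maximizing the summed similarity.

    similarity[i][j] is a hypothetical score between gene mention i and
    candidate identifier j. Each mention gets a distinct candidate, so the
    number of candidates must be at least the number of mentions.
    Brute force is used for clarity; it is not Munkres' algorithm itself.
    """
    n_mentions = len(similarity)
    n_candidates = len(similarity[0])
    best_total, best_cols = float("-inf"), None
    for cols in permutations(range(n_candidates), n_mentions):
        total = sum(similarity[i][c] for i, c in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    return best_total, list(best_cols)


# Invented scores: 2 mentions (rows) x 3 candidate identifiers (columns).
similarity = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
total, assignment = best_assignment(similarity)
print(assignment, round(total, 2))
```

With these invented scores, picking the best candidate for the first mention in isolation (column 0) would leave the second mention with a poor match; the joint optimum instead gives mention 0 to column 1 and mention 1 to column 0, which is the kind of global trade-off an assignment algorithm is meant to capture.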
format article
author Lishuang Li
Shanshan Liu
Lihua Li
Wenting Fan
Degen Huang
Huiwei Zhou
title A multistage gene normalization system integrating multiple effective methods.
publisher Public Library of Science (PLoS)
publishDate 2013
url https://doaj.org/article/7f91ad66d832460faae95589572a508b