Drug knowledge discovery via multi-task learning and pre-trained models

Abstract

Background: Drug repurposing, the search for new indications of approved drugs, is essential for investigating new uses for approved or investigational drugs. The Active Gene Annotation Corpus (AGAC) was annotated by human experts to support knowledge discovery for drug repurposing. The AGAC track of the BioNLP Open Shared Tasks, organized at EMNLP-BioNLP 2019, uses this corpus; its "selective annotation" attribute makes the track more challenging than traditional sequence labeling tasks. In this work, we present our methods for trigger word detection (Task 1) and thematic role identification (Task 2) in the AGAC track. As a step toward drug repurposing research, our work can also be applied to large-scale automatic extraction of knowledge from medical text.

Methods: To meet the challenges of the two tasks, we treat Task 1 as medical named entity recognition (NER), capturing molecular phenomena related to gene mutation, and Task 2 as relation extraction, capturing the thematic roles between entities. We exploit pre-trained biomedical language representation models (e.g., BioBERT) in an information extraction pipeline for collecting mutation-disease knowledge from PubMed, and we design a fine-tuning framework that uses multi-task learning and extra features. We further investigate different approaches to consolidating and transferring knowledge from varying sources and report the performance of our model on the AGAC corpus. Our approach fine-tunes BERT, BioBERT, NCBI BERT, and ClinicalBERT with multi-task learning; further experiments show the effectiveness of knowledge transfer and of ensembling the models of the two tasks. We compare the performance of various algorithms and run an ablation study on the Task 1 development set to examine the contribution of each component.

Results: Compared with competing methods, our model obtained the highest precision (0.63), recall (0.56), and F-score (0.60) in Task 1, ranking first and outperforming the organizers' baseline by 0.10 in F-score. The model shares its encoding layers between the named entity recognition and relation extraction parts. With this simple but effective framework, we also obtained the second-highest F-score (0.25) in Task 2.

Conclusions: Experimental results on the AGAC benchmark (annotation of genes with active mutation-centric function changes) show that integrating pre-trained biomedical language representation models (i.e., BERT, NCBI BERT, ClinicalBERT, BioBERT) into an information extraction pipeline with multi-task learning improves the ability to collect mutation-disease knowledge from PubMed.
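The core architectural idea in the Methods, shared encoding layers with separate heads for NER (Task 1) and relation extraction (Task 2), can be illustrated with a minimal sketch. This is not the authors' released code: the class name, dimensions, and label counts are illustrative, and a small BiLSTM stands in for the pre-trained biomedical encoder (e.g., BioBERT) that the paper actually fine-tunes.

```python
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    """Hypothetical multi-task model: one shared encoder, two task heads."""

    def __init__(self, vocab_size=1000, dim=64, ner_labels=9, rel_labels=5):
        super().__init__()
        # Shared layers; in the paper's setting this role is played by a
        # pre-trained transformer such as BioBERT rather than a BiLSTM.
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        # Task 1 head: per-token label scores for trigger-word NER.
        self.ner_head = nn.Linear(2 * dim, ner_labels)
        # Task 2 head: relation label for a pair of entity positions.
        self.rel_head = nn.Linear(4 * dim, rel_labels)

    def forward(self, token_ids, head_idx, tail_idx):
        h, _ = self.encoder(self.embed(token_ids))          # (B, T, 2*dim)
        ner_logits = self.ner_head(h)                       # (B, T, ner_labels)
        batch = torch.arange(token_ids.size(0))
        # Concatenate the two entity representations for relation scoring.
        pair = torch.cat([h[batch, head_idx], h[batch, tail_idx]], dim=-1)
        rel_logits = self.rel_head(pair)                    # (B, rel_labels)
        return ner_logits, rel_logits

model = SharedEncoderMTL()
tokens = torch.randint(0, 1000, (2, 12))
ner_logits, rel_logits = model(tokens, torch.tensor([1, 3]), torch.tensor([5, 7]))
# Multi-task training: summing the two losses lets gradients from both
# tasks flow into the shared encoder.
loss = (nn.CrossEntropyLoss()(ner_logits.reshape(-1, 9),
                              torch.randint(0, 9, (2 * 12,)))
        + nn.CrossEntropyLoss()(rel_logits, torch.randint(0, 5, (2,))))
loss.backward()
```

The design choice this sketches is exactly the one the Results highlight: because both heads read the same encoder outputs, supervision from relation extraction regularizes the representations used for NER, and vice versa.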


Bibliographic Details
Main Authors: Dongfang Li, Ying Xiong, Baotian Hu, Buzhou Tang, Weihua Peng, Qingcai Chen
Format: article
Language: EN
Published: BMC 2021
Subjects: Gene mutation; Drug repurposing; Biomedical language models
Online Access: https://doaj.org/article/fab7c446aec74a4f9d9fe59723779ee3
id oai:doaj.org-article:fab7c446aec74a4f9d9fe59723779ee3
record_format dspace
spelling oai:doaj.org-article:fab7c446aec74a4f9d9fe59723779ee3 2021-11-21T12:28:54Z
doi 10.1186/s12911-021-01614-7
issn 1472-6947
url https://doi.org/10.1186/s12911-021-01614-7
url https://doaj.org/toc/1472-6947
publishDate 2021-11-01
citation BMC Medical Informatics and Decision Making, Vol 21, Iss S9, Pp 1-9 (2021)
institution DOAJ
collection DOAJ
language EN
topic Gene mutation
Drug repurposing
Biomedical language models
Computer applications to medicine. Medical informatics
R858-859.7