TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery
Abstract Target identification and prioritisation are prominent first steps in modern drug discovery. Traditionally, individual scientists have used their expertise to manually interpret scientific literature and prioritise opportunities. However, increasing publication rates and the wider routine c...
Main authors: | Guillermo Serrano Nájera, David Narganes Carlón, Daniel J. Crowther |
---|---|
Format: | article |
Language: | EN |
Published: | Nature Portfolio, 2021 |
Subjects: | Medicine; R; Science; Q |
Online access: | https://doaj.org/article/5772d4e7752d418392e61c47ac3c588d |
id | oai:doaj.org-article:5772d4e7752d418392e61c47ac3c588d |
---|---|
record_format | dspace |
spelling | oai:doaj.org-article:5772d4e7752d418392e61c47ac3c588d; 2021-12-02T16:35:27Z; TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery; 10.1038/s41598-021-94897-9; 2045-2322; https://doaj.org/article/5772d4e7752d418392e61c47ac3c588d; 2021-08-01T00:00:00Z; https://doi.org/10.1038/s41598-021-94897-9; https://doaj.org/toc/2045-2322; Abstract Target identification and prioritisation are prominent first steps in modern drug discovery. Traditionally, individual scientists have used their expertise to manually interpret scientific literature and prioritise opportunities. However, increasing publication rates and the wider routine coverage of human genes by omic-scale research make it difficult to maintain meaningful overviews from which to identify promising new trends. Here we propose an automated yet flexible pipeline that identifies trends in the scientific corpus which align with the specific interests of a researcher and facilitate an initial prioritisation of opportunities. Using a procedure based on co-citation networks and machine learning, genes and diseases are first parsed from PubMed articles using a novel named entity recognition system together with publication date and supporting information. Then recurrent neural networks are trained to predict the publication dynamics of all human genes. For a user-defined therapeutic focus, genes generating more publications or citations are identified as high-interest targets. We also used topic detection routines to help understand why a gene is trendy and implement a system to propose the most prominent review articles for a potential target. This TrendyGenes pipeline detects emerging targets and pathways and provides a new way to explore the literature for individual researchers, pharmaceutical companies and funding agencies.; Guillermo Serrano Nájera; David Narganes Carlón; Daniel J. Crowther; Nature Portfolio; article; Medicine; R; Science; Q; EN; Scientific Reports, Vol 11, Iss 1, Pp 1-18 (2021) |
institution | DOAJ |
collection | DOAJ |
language | EN |
topic | Medicine; R; Science; Q |
spellingShingle | Medicine; R; Science; Q; Guillermo Serrano Nájera; David Narganes Carlón; Daniel J. Crowther; TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery |
description | Abstract Target identification and prioritisation are prominent first steps in modern drug discovery. Traditionally, individual scientists have used their expertise to manually interpret scientific literature and prioritise opportunities. However, increasing publication rates and the wider routine coverage of human genes by omic-scale research make it difficult to maintain meaningful overviews from which to identify promising new trends. Here we propose an automated yet flexible pipeline that identifies trends in the scientific corpus which align with the specific interests of a researcher and facilitate an initial prioritisation of opportunities. Using a procedure based on co-citation networks and machine learning, genes and diseases are first parsed from PubMed articles using a novel named entity recognition system together with publication date and supporting information. Then recurrent neural networks are trained to predict the publication dynamics of all human genes. For a user-defined therapeutic focus, genes generating more publications or citations are identified as high-interest targets. We also used topic detection routines to help understand why a gene is trendy and implement a system to propose the most prominent review articles for a potential target. This TrendyGenes pipeline detects emerging targets and pathways and provides a new way to explore the literature for individual researchers, pharmaceutical companies and funding agencies. |
format | article |
author | Guillermo Serrano Nájera; David Narganes Carlón; Daniel J. Crowther |
author_facet | Guillermo Serrano Nájera; David Narganes Carlón; Daniel J. Crowther |
author_sort | Guillermo Serrano Nájera |
title | TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery |
title_short | TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery |
title_full | TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery |
title_fullStr | TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery |
title_full_unstemmed | TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery |
title_sort | trendygenes, a computational pipeline for the detection of literature trends in academia and drug discovery |
publisher | Nature Portfolio |
publishDate | 2021 |
url | https://doaj.org/article/5772d4e7752d418392e61c47ac3c588d |
work_keys_str_mv | AT guillermoserranonajera trendygenesacomputationalpipelineforthedetectionofliteraturetrendsinacademiaanddrugdiscovery AT davidnarganescarlon trendygenesacomputationalpipelineforthedetectionofliteraturetrendsinacademiaanddrugdiscovery AT danieljcrowther trendygenesacomputationalpipelineforthedetectionofliteraturetrendsinacademiaanddrugdiscovery |
_version_ | 1718383689400844288 |
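
The abstract describes flagging genes that generate more publications over time as high-interest targets. The following is a minimal illustrative sketch of that general idea only, not the authors' method: the abstract states that TrendyGenes uses recurrent neural networks, whereas this sketch swaps in a plain least-squares slope over yearly publication counts. The function names (`trend_slope`, `rank_genes`), the gene labels, and the counts are all hypothetical.

```python
from typing import Dict, List, Tuple


def trend_slope(counts: List[float]) -> float:
    """Least-squares slope of counts against their position (0, 1, 2, ...).

    A positive slope means the yearly count is rising on average.
    This is a simple stand-in, NOT the recurrent-network model from the paper.
    """
    n = len(counts)
    if n < 2:
        return 0.0  # a single data point carries no trend
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    num = sum((i - mean_x) * (c - mean_y) for i, c in enumerate(counts))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def rank_genes(yearly_counts: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (gene, slope) pairs sorted from steepest rise to steepest fall."""
    scored = [(gene, trend_slope(counts)) for gene, counts in yearly_counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical yearly publication counts for three made-up gene labels.
example = {
    "GENE_A": [2, 4, 6, 8],   # steadily rising
    "GENE_B": [5, 5, 5, 5],   # flat
    "GENE_C": [9, 7, 5, 3],   # steadily falling
}
ranking = rank_genes(example)
```

In this toy input, the rising series ranks first and the falling one last. A real pipeline would also need the entity recognition and therapeutic-focus filtering steps the abstract mentions, which this sketch does not attempt.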