ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation.

Bibliographic Details
Main authors: Ramzan Umarov, Yu Li, Takahiro Arakawa, Satoshi Takizawa, Xin Gao, Erik Arner
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General)
Online access: https://doaj.org/article/d8d98f21607c4485a344943049cf5b7b
Record ID: oai:doaj.org-article:d8d98f21607c4485a344943049cf5b7b
Source: DOAJ
Journal: PLoS Computational Biology, Vol 17, Iss 9, p e1009376 (2021)
Publication date: 2021-09-01
ISSN: 1553-734X, 1553-7358
DOI: https://doi.org/10.1371/journal.pcbi.1009376
Topic: Biology (General), QH301-705.5
Description: Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. While there are many attempts to develop computational promoter and enhancer identification methods, reliable tools to analyze long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than that in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model for scanning the genome and the other one for testing candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other vertebrate species, without re-training. Moreover, the unannotated predicted regulatory regions made on the human genome are enriched for disease-associated variants, suggesting that they are potentially true regulatory elements rather than false positives. We validated high scoring "false positive" predictions using a reporter assay and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.
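The description only outlines the approach. As a rough, non-authoritative illustration of the general idea (generic hard-negative mining, i.e. repeatedly adding high-scoring false positives from a genome scan to the negative set, plus a scan-then-test cascade of two models), here is a minimal Python sketch under stated assumptions: positions carry toy 2-D feature vectors, the "model" is a 1-nearest-neighbour scorer, and `train_1nn`, `mine_negatives`, `two_stage_predict`, the thresholds, and the synthetic data are all hypothetical. None of this is taken from ReFeaFi itself, whose actual models and training procedure are described in the paper.

```python
import math
import random
from typing import Callable, Dict, List, Set, Tuple

Vector = Tuple[float, ...]
Scorer = Callable[[Vector], float]


def train_1nn(pos: List[Vector], neg: List[Vector]) -> Scorer:
    """Hypothetical toy stand-in for a trained classifier (NOT ReFeaFi's model).

    score(v) = distance to nearest negative - distance to nearest positive,
    so positions resembling the positive examples score above 0.
    """
    def score(v: Vector) -> float:
        d_pos = min(math.dist(v, p) for p in pos)
        d_neg = min(math.dist(v, n) for n in neg)
        return d_neg - d_pos
    return score


def mine_negatives(features: Dict[int, Vector], positives: Set[int],
                   init_negatives: Set[int], rounds: int, per_round: int,
                   threshold: float = 0.0) -> Tuple[Scorer, Set[int], List[int]]:
    """Dynamic negative-set updating, sketched as generic hard-negative mining.

    Each round: train on the current sets, scan every position of the toy
    "genome", and move the top-scoring false positives into the negatives.
    Returns the final model, the enlarged negative set, and the number of
    false positives found at each scan.
    """
    negatives = set(init_negatives)
    fp_history: List[int] = []
    for _ in range(rounds):
        model = train_1nn([features[i] for i in sorted(positives)],
                          [features[i] for i in sorted(negatives)])
        # Score every unannotated position and keep those called positive.
        scored = ((model(features[i]), i) for i in features if i not in positives)
        false_pos = sorted(((s, i) for s, i in scored if s > threshold), reverse=True)
        fp_history.append(len(false_pos))
        if not false_pos:
            break
        # Add the hardest (highest-scoring) false positives to the negative set.
        negatives.update(i for _, i in false_pos[:per_round])
    # Retrain once more so the returned model has seen every mined negative.
    final = train_1nn([features[i] for i in sorted(positives)],
                      [features[i] for i in sorted(negatives)])
    return final, negatives, fp_history


def two_stage_predict(scanner: Scorer, tester: Scorer, features: Dict[int, Vector],
                      scan_threshold: float, test_threshold: float) -> List[int]:
    """Hypothetical two-model cascade: the scanner proposes candidate
    positions and the tester re-scores only those candidates."""
    candidates = [i for i in sorted(features) if scanner(features[i]) > scan_threshold]
    return [i for i in candidates if tester(features[i]) > test_threshold]


# Synthetic toy "genome": 300 positions, every 10th one annotated as positive.
rng = random.Random(0)
features: Dict[int, Vector] = {}
positives: Set[int] = set()
for i in range(300):
    if i % 10 == 0:
        positives.add(i)
        features[i] = (rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5))
    else:
        features[i] = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
init_neg = {i for i in features if i not in positives and i % 7 == 0}

model, negs, hist = mine_negatives(features, positives, init_neg, rounds=5, per_round=5)
scanner = train_1nn([features[i] for i in sorted(positives)],
                    [features[i] for i in sorted(init_neg)])
preds = two_stage_predict(scanner, model, features, scan_threshold=-0.5, test_threshold=0.0)
print("false positives per round:", hist)
print("positions passing both models:", len(preds))
```

With this 1-nearest-neighbour scorer, a mined position scores below zero as soon as it joins the negative set, and adding negatives can only lower every other score, so the per-round false-positive count shrinks. A real neural model gives no such guarantee, which is one reason a scheme like this re-scans after every retraining rather than mining once.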