TSEBRA: transcript selector for BRAKER

Bibliographic details
Main authors: Lars Gabriel, Katharina J. Hoff, Tomáš Brůna, Mark Borodovsky, Mario Stanke
Format: article
Language: EN
Published: BMC 2021
Online access: https://doaj.org/article/c79f7efdc41e45889aec4d525a2a106e
Journal: BMC Bioinformatics, Vol 22, Iss 1, Pp 1-12 (2021)
Date: 2021-11-01
DOI: https://doi.org/10.1186/s12859-021-04482-0
ISSN: 1471-2105
Record ID: oai:doaj.org-article:c79f7efdc41e45889aec4d525a2a106e
Subjects: Genome annotation; Gene prediction; Protein-coding genes; Evidence integration; RNA-seq; Protein homology; Computer applications to medicine. Medical informatics (R858-859.7); Biology (General) (QH301-705.5)

Abstract
Background: BRAKER is a suite of automatic pipelines, BRAKER1 and BRAKER2, for the accurate annotation of protein-coding genes in eukaryotic genomes. Each pipeline trains statistical models of protein-coding genes based on the provided evidence and then predicts protein-coding genes in genomic sequences using both the extrinsic evidence and the statistical models. For training and prediction, BRAKER1 and BRAKER2 incorporate complementary extrinsic evidence: BRAKER1 uses only RNA-seq data, while BRAKER2 uses only a database of cross-species proteins. So far, the BRAKER suite has not been able to reliably exceed the accuracy of BRAKER1 and BRAKER2 when incorporating both types of evidence simultaneously. Currently, for a novel genome project where both RNA-seq and protein data are available, the best option is to run both pipelines independently and pick the output that is likely to be better. As a result, one of the two types of extrinsic evidence remains unexploited.
Results: We present TSEBRA, a software tool that selects gene predictions (transcripts) from the sets generated by BRAKER1 and BRAKER2. TSEBRA uses a set of rules to compare scores of overlapping transcripts based on their support by RNA-seq and homologous protein evidence. In computational experiments on the genomes of 11 species, we show that TSEBRA achieves higher accuracy than either BRAKER1 or BRAKER2 run alone and that it compares favorably with the combiner tool EVidenceModeler.
Conclusion: TSEBRA is an easy-to-use and fast software tool. It can be used in concert with the BRAKER pipeline to generate a gene prediction set supported by both RNA-seq and homologous protein evidence.
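
The general idea of choosing among overlapping transcripts by their evidence support can be illustrated with a much simplified sketch. It is not TSEBRA's rule set or code: the Transcript class, the support_score function (a weighted fraction of a transcript's introns matched by RNA-seq hints and by protein hints), the toy hint sets, and the one-transcript-per-cluster policy are all hypothetical assumptions made only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Minimal transcript record (hypothetical; not TSEBRA's internal model)."""
    tid: str
    source: str                 # e.g. "BRAKER1" or "BRAKER2"
    seqid: str
    strand: str
    start: int                  # 1-based, inclusive
    end: int                    # 1-based, inclusive
    introns: list[tuple[int, int]] = field(default_factory=list)


def support_score(tx, rnaseq_introns, protein_introns, w_rna=1.0, w_prot=1.0):
    """Hypothetical score: weighted fraction of the transcript's introns
    that are confirmed by RNA-seq evidence and by protein evidence.
    Evidence sets hold (seqid, strand, intron_start, intron_end) tuples."""
    if not tx.introns:
        return 0.0
    keys = [(tx.seqid, tx.strand, s, e) for s, e in tx.introns]
    rna = sum(k in rnaseq_introns for k in keys) / len(keys)
    prot = sum(k in protein_introns for k in keys) / len(keys)
    return w_rna * rna + w_prot * prot


def cluster_overlapping(txs):
    """Group transcripts on the same sequence and strand into clusters of
    (transitively) overlapping transcripts."""
    clusters = []
    cluster_end = None
    for tx in sorted(txs, key=lambda t: (t.seqid, t.strand, t.start)):
        if (clusters and clusters[-1][0].seqid == tx.seqid
                and clusters[-1][0].strand == tx.strand
                and tx.start <= cluster_end):
            clusters[-1].append(tx)
            cluster_end = max(cluster_end, tx.end)
        else:
            clusters.append([tx])
            cluster_end = tx.end
    return clusters


def select_transcripts(txs, rnaseq_introns, protein_introns):
    """Keep the best-supported transcript of each overlapping cluster."""
    return [
        max(cluster, key=lambda t: support_score(t, rnaseq_introns, protein_introns))
        for cluster in cluster_overlapping(txs)
    ]


# Toy input (hypothetical): two overlapping predictions of one gene
# from BRAKER1 and BRAKER2, plus a separate gene predicted only by BRAKER2.
b1 = Transcript("g1.t1", "BRAKER1", "chr1", "+", 100, 900, [(200, 300), (400, 500)])
b2 = Transcript("g1.t1", "BRAKER2", "chr1", "+", 120, 880, [(200, 300), (450, 500)])
g2 = Transcript("g2.t1", "BRAKER2", "chr1", "+", 2000, 3000, [(2100, 2200)])
rna_hints = {("chr1", "+", 200, 300), ("chr1", "+", 400, 500)}
protein_hints = {("chr1", "+", 200, 300)}

selected = select_transcripts([b1, b2, g2], rna_hints, protein_hints)
print([(t.tid, t.source) for t in selected])
# prints [('g1.t1', 'BRAKER1'), ('g2.t1', 'BRAKER2')]
```

Here the BRAKER1 transcript wins its cluster because both of its introns are supported by RNA-seq hints (score 1.5 versus 1.0). Keeping a single transcript per cluster discards alternative isoforms, so a practical selector would likely need pairwise comparisons and a richer set of rules and features than this sketch.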