Reduce manual curation by combining gene predictions from multiple annotation engines, a case study of start codon prediction.

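Nowadays, prokaryotic genomes are sequenced faster than their gene annotations can be manually curated. Automated genome annotation engines (AGEs) offer users a straightforward and complete solution for predicting open reading frame (ORF) coordinates and function. For many labs, AGEs are therefore essential to reduce the time needed to annotate a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions, so combining multiple AGEs might yield more accurate predictions. Here we analyzed the ab initio ORF-calling performance of different AGEs against curated genome annotations of eight strains from different bacterial species, with GC content ranging from 35% to 52%. We present a case study demonstrating a novel approach to comparative genome annotation, in which combinations of AGEs are applied in a predefined order (or path) to predict ORF start codons. AGE combinations are ordered from high to low specificity, where specificity is determined from the eight genome annotations. For each AGE combination we derive a so-called projected confidence value: the average specificity of ORF start codon prediction across the eight genomes. The projected confidence estimates how likely a particular AGE combination is to predict a particular ORF start codon correctly, and it pinpoints ORFs whose start codons are notoriously difficult to predict. We correctly predict start codons for 90.5±4.8% of the genes in a genome (across the eight genomes) with an accuracy of 81.1±7.6%. Our consensus-path methodology yields a marked improvement over majority voting (9.7±4.4%). With an optimal path, ORF start prediction gains sensitivity while maintaining high specificity.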
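The consensus-path idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes three things. First, an AGE combination makes a call only when all of its member engines report the same start coordinate. Second, "specificity" is the fraction of such calls that match the curated start. Third, the projected confidence is the plain mean of those per-genome values. The engine names A, B and C, the data layout, all function names, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical layout: predictions[genome][orf][age] -> predicted start coordinate,
# curated[genome][orf] -> curated (reference) start coordinate.


def agreed_start(calls, combo):
    """Start shared by every AGE in combo, or None if they disagree or one is missing."""
    starts = {calls.get(age) for age in combo}
    if len(starts) == 1 and None not in starts:
        return next(iter(starts))
    return None


def specificity(preds, truth, combo):
    """Assumed definition: correct agreed calls / all agreed calls in one genome."""
    agreed = correct = 0
    for orf, calls in preds.items():
        start = agreed_start(calls, combo)
        if start is None:
            continue
        agreed += 1
        if start == truth.get(orf):
            correct += 1
    return correct / agreed if agreed else None  # None: combo never agreed here


def projected_confidence(predictions, curated, combo):
    """Mean specificity of combo over the reference genomes where it made calls."""
    scores = [specificity(predictions[g], curated[g], combo) for g in predictions]
    scores = [s for s in scores if s is not None]
    return mean(scores) if scores else 0.0


def build_path(predictions, curated, ages):
    """All non-empty AGE combinations, ordered from high to low projected confidence."""
    combos = [c for k in range(len(ages), 0, -1) for c in combinations(ages, k)]
    scored = [(projected_confidence(predictions, curated, c), c) for c in combos]
    # sorted() is stable even with reverse=True, so ties keep larger combinations first.
    return sorted(scored, key=lambda item: item[0], reverse=True)


def predict_start(calls, path):
    """Walk the path; the first combination whose AGEs agree decides the start."""
    for confidence, combo in path:
        start = agreed_start(calls, combo)
        if start is not None:
            return start, confidence
    return None, 0.0


def majority_vote(calls):
    """Baseline: the start predicted by the most AGEs (ties go to the first seen)."""
    votes = Counter(s for s in calls.values() if s is not None)
    return votes.most_common(1)[0][0] if votes else None


# Toy data (invented): three AGEs, one reference genome, three ORFs.
ages = ("A", "B", "C")
predictions = {"g1": {
    "orf1": {"A": 100, "B": 100, "C": 100},
    "orf2": {"A": 200, "B": 210, "C": 200},
    "orf3": {"A": 300, "B": 300, "C": 330},
}}
curated = {"g1": {"orf1": 100, "orf2": 210, "orf3": 300}}

path = build_path(predictions, curated, ages)
print(predict_start(predictions["g1"]["orf2"], path))  # (210, 1.0)
print(majority_vote(predictions["g1"]["orf2"]))        # 200
```

For the toy ORF orf2, two of the three engines agree on a wrong start, so majority voting returns 200. The path, by contrast, falls through to the combination with the highest projected confidence that actually agrees on this ORF and recovers the curated start. This toy example builds the path from the same genome it predicts. In practice, one would presumably build it from curated reference genomes and apply it to a new genome. Enumerating all 2^n - 1 combinations is only practical for a handful of AGEs.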

Bibliographic details
Main authors: Thomas H A Ederveen, Lex Overmars, Sacha A F T van Hijum
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2013
Subjects: R (Medicine), Q (Science)
Online access: https://doaj.org/article/9d9371661e5e45b9844cf7f4d1a261c5
Source: PLoS ONE, Vol 8, Iss 5, p e63523 (2013)
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0063523
Full text (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23675487/?tool=EBI
Journal table of contents: https://doaj.org/toc/1932-6203