Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations

ABSTRACT The promoter region is a key element required for the production of RNA in bacteria. While new high-throughput technology allows massively parallel mapping of promoter elements, we still mainly rely on bioinformatics tools to predict such elements in bacterial genomes. Additionally, despite...

Bibliographic Details
Main Authors:	Murilo Henrique Anzolini Cassiano, Rafael Silva-Rocha
Format:	article
Language:	EN
Published:	American Society for Microbiology 2020
Subjects:	promoter prediction; bacterial promoters; cis-regulatory elements; bioinformatics; Microbiology; QR1-502
Online Access:	https://doaj.org/article/93812e5b99c34489b76ff7b8aafe4c3f

id	oai:doaj.org-article:93812e5b99c34489b76ff7b8aafe4c3f
record_format	dspace
spelling	oai:doaj.org-article:93812e5b99c34489b76ff7b8aafe4c3f 2021-12-02T18:23:16Z Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations 10.1128/mSystems.00439-20 2379-5077 https://doaj.org/article/93812e5b99c34489b76ff7b8aafe4c3f 2020-08-01T00:00:00Z https://journals.asm.org/doi/10.1128/mSystems.00439-20 https://doaj.org/toc/2379-5077 ABSTRACT The promoter region is a key element required for the production of RNA in bacteria. While new high-throughput technology allows massively parallel mapping of promoter elements, we still mainly rely on bioinformatics tools to predict such elements in bacterial genomes. Additionally, despite many different prediction tools having become popular to identify bacterial promoters, no systematic comparison of such tools has been performed. Here, we performed a systematic comparison between several widely used promoter prediction tools (BPROM, bTSSfinder, BacPP, CNNProm, IBBP, Virtual Footprint, iPro70-FMWin, 70ProPred, iPromoter-2L, and MULTiPly) using well-defined sequence data sets and standardized metrics to determine how well those tools performed relative to each other. For this, we used data sets of experimentally validated promoters from Escherichia coli and a control data set composed of randomly generated sequences with similar nucleotide distributions. We compared the performance of the tools using metrics such as specificity, sensitivity, accuracy, and Matthews correlation coefficient (MCC). We show that the widely used BPROM presented the worst performance among the compared tools, while four tools (CNNProm, iPro70-FMWin, 70ProPred, and iPromoter-2L) offered high predictive power. Of these tools, iPro70-FMWin exhibited the best results for most of the metrics used. We present here some potentials and limitations of available tools, and we hope that future work can build upon our effort to systematically characterize this useful class of bioinformatics tools. IMPORTANCE The correct mapping of promoter elements is a crucial step in microbial genomics. Also, when combining new DNA elements into synthetic sequences, predicting the potential generation of new promoter sequences is critical. Over the last few years, many bioinformatics tools have been created to allow users to predict promoter elements in a sequence or genome of interest. Here, we assess the predictive power of some of the main prediction tools available using well-defined promoter data sets. Using Escherichia coli as a model organism, we demonstrated that while some tools are biased toward AT-rich sequences, others are very efficient in identifying real promoters with low false-negative rates. We hope the potentials and limitations presented here will help the microbiology community to choose promoter prediction tools among many available alternatives. Murilo Henrique Anzolini Cassiano Rafael Silva-Rocha American Society for Microbiology article promoter prediction bacterial promoters cis-regulatory elements bioinformatics promoter prediction Microbiology QR1-502 EN mSystems, Vol 5, Iss 4 (2020)
institution	DOAJ
collection	DOAJ
language	EN
topic	promoter prediction bacterial promoters cis-regulatory elements bioinformatics promoter prediction Microbiology QR1-502
spellingShingle	promoter prediction bacterial promoters cis-regulatory elements bioinformatics promoter prediction Microbiology QR1-502 Murilo Henrique Anzolini Cassiano Rafael Silva-Rocha Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations
description	ABSTRACT The promoter region is a key element required for the production of RNA in bacteria. While new high-throughput technology allows massively parallel mapping of promoter elements, we still mainly rely on bioinformatics tools to predict such elements in bacterial genomes. Additionally, despite many different prediction tools having become popular to identify bacterial promoters, no systematic comparison of such tools has been performed. Here, we performed a systematic comparison between several widely used promoter prediction tools (BPROM, bTSSfinder, BacPP, CNNProm, IBBP, Virtual Footprint, iPro70-FMWin, 70ProPred, iPromoter-2L, and MULTiPly) using well-defined sequence data sets and standardized metrics to determine how well those tools performed relative to each other. For this, we used data sets of experimentally validated promoters from Escherichia coli and a control data set composed of randomly generated sequences with similar nucleotide distributions. We compared the performance of the tools using metrics such as specificity, sensitivity, accuracy, and Matthews correlation coefficient (MCC). We show that the widely used BPROM presented the worst performance among the compared tools, while four tools (CNNProm, iPro70-FMWin, 70ProPred, and iPromoter-2L) offered high predictive power. Of these tools, iPro70-FMWin exhibited the best results for most of the metrics used. We present here some potentials and limitations of available tools, and we hope that future work can build upon our effort to systematically characterize this useful class of bioinformatics tools. IMPORTANCE The correct mapping of promoter elements is a crucial step in microbial genomics. Also, when combining new DNA elements into synthetic sequences, predicting the potential generation of new promoter sequences is critical. Over the last few years, many bioinformatics tools have been created to allow users to predict promoter elements in a sequence or genome of interest. Here, we assess the predictive power of some of the main prediction tools available using well-defined promoter data sets. Using Escherichia coli as a model organism, we demonstrated that while some tools are biased toward AT-rich sequences, others are very efficient in identifying real promoters with low false-negative rates. We hope the potentials and limitations presented here will help the microbiology community to choose promoter prediction tools among many available alternatives.
format	article
author	Murilo Henrique Anzolini Cassiano Rafael Silva-Rocha
author_facet	Murilo Henrique Anzolini Cassiano Rafael Silva-Rocha
author_sort	Murilo Henrique Anzolini Cassiano
title	Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations
title_short	Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations
title_full	Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations
title_fullStr	Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations
title_full_unstemmed	Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations
title_sort	benchmarking bacterial promoter prediction tools: potentialities and limitations
publisher	American Society for Microbiology
publishDate	2020
url	https://doaj.org/article/93812e5b99c34489b76ff7b8aafe4c3f
work_keys_str_mv	AT murilohenriqueanzolinicassiano benchmarkingbacterialpromoterpredictiontoolspotentialitiesandlimitations AT rafaelsilvarocha benchmarkingbacterialpromoterpredictiontoolspotentialitiesandlimitations
_version_	1718378109795827712

Similar Items