T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors

ABSTRACT Many Gram-negative bacteria infect hosts and cause diseases by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. However, despite a dramatic increase in the number of available whole-genome sequences, it remains challenging for accurate prediction...

Bibliographic Details
Main authors: Xinjie Hui, Zewei Chen, Mingxiong Lin, Junya Zhang, Yueming Hu, Yingying Zeng, Xi Cheng, Le Ou-Yang, Ming-an Sun, Aaron P. White, Yejun Wang
Format: article
Language: EN
Published: American Society for Microbiology 2020
Subjects: effector; machine learning; prediction; T3SEpp; T3SS; type III secretion system; Microbiology; QR1-502
Online access: https://doaj.org/article/5e882c630d7a45568a4d3a9ddf4dfb3d
id oai:doaj.org-article:5e882c630d7a45568a4d3a9ddf4dfb3d
record_format dspace
spelling oai:doaj.org-article:5e882c630d7a45568a4d3a9ddf4dfb3d
2021-12-02T19:46:20Z
T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors
10.1128/mSystems.00288-20
2379-5077
https://doaj.org/article/5e882c630d7a45568a4d3a9ddf4dfb3d
2020-08-01T00:00:00Z
https://journals.asm.org/doi/10.1128/mSystems.00288-20
https://doaj.org/toc/2379-5077
mSystems, Vol 5, Iss 4 (2020)
institution DOAJ
collection DOAJ
language EN
topic effector
machine learning
prediction
T3SEpp
T3SS
type III secretion system
Microbiology
QR1-502
description ABSTRACT Many Gram-negative bacteria infect hosts and cause disease by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. However, despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging. Traditional prediction models have focused on atypical sequence features buried in the N-terminal peptides of T3SEs, but these models have had high false-positive rates. In this research, we integrated promoter information with characteristic protein features of signal regions, chaperone-binding domains, and effector domains for T3SE prediction. Machine learning algorithms, including deep learning, were adopted to predict the atypical features mainly buried in the signal sequences of T3SEs, followed by development of a voting-based ensemble model integrating the individual prediction results. We assembled these components into a unified T3SE prediction pipeline, T3SEpp, which integrates the results of the individual modules and achieves high accuracy (∼0.94) and a >1-fold reduction in the false-positive rate compared with state-of-the-art software tools. The T3SEpp pipeline and the sequence features observed here will facilitate the accurate identification of new T3SEs, with numerous benefits for future studies on host-pathogen interactions.
IMPORTANCE Type III secreted effector (T3SE) prediction remains a major computational challenge. In practical applications, current software tools often suffer from high false-positive rates. One contributing factor could be the relatively narrow range of biological features used for the design and training of the models. In this research, we comprehensively surveyed the sequence-based features of T3SEs, including signal sequences, chaperone-binding domains, effector domains, and transcription factor binding promoter sites, and assembled a unified prediction pipeline integrating multi-aspect biological features within homology-based and multiple machine learning models. To our knowledge, this research presents the most comprehensive biological sequence feature analysis for T3SEs. The T3SEpp pipeline, which integrates this variety of features and assembles different models, showed high accuracy, which should facilitate more accurate identification of T3SEs in new and existing bacterial whole-genome sequences.
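The abstract mentions a voting-based ensemble over individual module predictions and reports accuracy and false-positive rate. The sketch below only illustrates those general ideas; it is not the T3SEpp implementation. The module names, the simple "at least `min_votes`" rule, and the toy labels are all assumptions made for illustration, since this record does not describe the actual voting scheme or weights.

```python
from typing import Dict, Sequence


def majority_vote(module_calls: Dict[str, bool], min_votes: int = 2) -> bool:
    """Call a protein a T3SE when at least `min_votes` modules agree.

    `module_calls` maps a (hypothetical) module name to its yes/no call.
    The threshold rule is an assumption, not the published scheme.
    """
    return sum(module_calls.values()) >= min_votes


def accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of proteins whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def false_positive_rate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """FP / (FP + TN): share of true non-effectors predicted as effectors."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical module outputs for one protein (names are illustrative only).
example_calls = {
    "signal": True,
    "chaperone_binding": True,
    "effector_domain": False,
    "promoter": False,
}
is_effector = majority_vote(example_calls)
```

Raising `min_votes` trades sensitivity for a lower false-positive rate, which is the general motivation for combining several independent lines of evidence rather than relying on a single feature type.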
format article
author Xinjie Hui
Zewei Chen
Mingxiong Lin
Junya Zhang
Yueming Hu
Yingying Zeng
Xi Cheng
Le Ou-Yang
Ming-an Sun
Aaron P. White
Yejun Wang
title T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors
publisher American Society for Microbiology
publishDate 2020
url https://doaj.org/article/5e882c630d7a45568a4d3a9ddf4dfb3d