The Effects of Signal Erosion and Core Genome Reduction on the Identification of Diagnostic Markers

Bibliographic Details
Main Authors: Jason W. Sahl, Adam J. Vazquez, Carina M. Hall, Joseph D. Busch, Apichai Tuanyok, Mark Mayo, James M. Schupp, Madeline Lummis, Talima Pearson, Kenzie Shippy, Rebecca E. Colman, Christopher J. Allender, Vanessa Theobald, Derek S. Sarovich, Erin P. Price, Alex Hutcheson, Jonas Korlach, John J. LiPuma, Jason Ladner, Sean Lovett, Galina Koroleva, Gustavo Palacios, Direk Limmathurotsakul, Vanaporn Wuthiekanun, Gumphol Wongsuwan, Bart J. Currie, Paul Keim, David M. Wagner
Format: article
Language: EN
Published: American Society for Microbiology 2016
Online Access: https://doaj.org/article/cee03ed17fd64a758eee20d30ac89ba0
Description
Summary: ABSTRACT Whole-genome sequence (WGS) data are commonly used to design diagnostic targets for the identification of bacterial pathogens. To do this effectively, genomics databases must be comprehensive to identify the strict core genome that is specific to the target pathogen. As additional genomes are analyzed, the core genome size is reduced and there is erosion of the target-specific regions due to commonality with related species, potentially resulting in the identification of false positives and/or false negatives. IMPORTANCE A comparative analysis of 1,130 Burkholderia genomes identified unique markers for many named species, including the human pathogens B. pseudomallei and B. mallei. Due to core genome reduction and signature erosion, only 38 targets specific to B. pseudomallei/mallei were identified. By using only public genomes, a larger number of markers were identified, due to undersampling, and this larger number represents the potential for false positives. This analysis has implications for the design of diagnostics for other species where the genomic space of the target and/or closely related species is not well defined.
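
The two effects named in the summary can be illustrated with a toy set model. The sketch below is not the authors' pipeline: it assumes each genome is reduced to a set of region identifiers, and the function names (`strict_core`, `signatures`) and region labels (`r1`, `r2`, ...) are hypothetical. The strict core is taken as the intersection across target genomes, and a signature is a core region absent from every sampled non-target genome.

```python
from typing import List, Set


def strict_core(genomes: List[Set[str]]) -> Set[str]:
    """Return the regions present in every genome (empty if no genomes)."""
    if not genomes:
        return set()
    return set.intersection(*genomes)


def signatures(targets: List[Set[str]], non_targets: List[Set[str]]) -> Set[str]:
    """Return core regions of the targets that no sampled non-target carries."""
    background = set().union(*non_targets)  # everything seen in related species
    return strict_core(targets) - background


# Hypothetical toy data: region identifiers per genome.
targets = [
    {"r1", "r2", "r3", "r4"},
    {"r1", "r2", "r3"},
    {"r1", "r2", "r4"},
]
non_targets = [
    {"r2", "r9"},
    {"r3", "r8"},
]

# Sample progressively more genomes and count the surviving candidate markers.
counts = [
    len(signatures(targets[:1], non_targets[:0])),  # 1 target, 0 non-targets
    len(signatures(targets[:2], non_targets[:1])),  # 2 targets, 1 non-target
    len(signatures(targets[:3], non_targets[:2])),  # 3 targets, 2 non-targets
]
print(counts)  # [4, 2, 1]
```

In this toy data, adding target genomes shrinks the core (core genome reduction), and adding non-target genomes removes regions that are shared with related species (signature erosion). The larger counts at low sampling correspond to the undersampling effect the summary describes, where apparent markers may later prove to be false positives.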