FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies

Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a d...
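
The assignment step described above can be pictured with a minimal sketch. This is not FastMLST's implementation: the loci names, the profile table, and the helper names `assign_st` and `type_genomes` are hypothetical, the allele numbers are assumed to have been called already (the BLASTn search against PubMLST is omitted), and a thread pool stands in for per-genome parallelism only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy scheme: loci and profiles are invented for illustration
# and do not come from PubMLST.
LOCI = ("adk", "atpA", "dxr")
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 5): 3,
}


def assign_st(alleles):
    """Map one genome's allele numbers (locus -> int) to a sequence type.

    Returns None when a locus is missing or the combination of alleles
    is not present in the profile table.
    """
    try:
        profile = tuple(alleles[locus] for locus in LOCI)
    except KeyError:
        return None
    return PROFILES.get(profile)


def type_genomes(genomes, workers=4):
    """Type every genome independently, one task per genome."""
    names = list(genomes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sts = list(pool.map(assign_st, (genomes[name] for name in names)))
    return dict(zip(names, sts))


if __name__ == "__main__":
    calls = {
        "genome_a": {"adk": 1, "atpA": 2, "dxr": 1},
        "genome_b": {"adk": 9, "atpA": 9, "dxr": 9},
    }
    print(type_genomes(calls))
```

Because each genome is typed without reference to any other, the work splits cleanly into one task per assembly, which is the property a multi-core tool can exploit.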

Bibliographic Details
Main Authors: Enzo Guerrero-Araya, Marina Muñoz, César Rodríguez, Daniel Paredes-Sabja
Format: article
Language: EN
Published: SAGE Publishing 2021
Subjects: Biology (General)
Online Access: https://doaj.org/article/44190e03916f474aad18fbbbf3b7c4f8
id oai:doaj.org-article:44190e03916f474aad18fbbbf3b7c4f8
record_format dspace
spelling oai:doaj.org-article:44190e03916f474aad18fbbbf3b7c4f8
2021-12-01T00:06:08Z
FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies
1177-9322
10.1177/11779322211059238
https://doaj.org/article/44190e03916f474aad18fbbbf3b7c4f8
2021-11-01T00:00:00Z
https://doi.org/10.1177/11779322211059238
https://doaj.org/toc/1177-9322
Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool that is designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available), detected for a query, as well as a multi-FASTA file or a series of FASTA files with the concatenated or single allele sequences detected, respectively. FastMLST was validated with 91 different species, with a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28 000 genomes in less than 10 minutes, reducing processing times by at least 3-fold with 100% concordance to PubMLST, if contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST
Enzo Guerrero-Araya
Marina Muñoz
César Rodríguez
Daniel Paredes-Sabja
SAGE Publishing
article
Biology (General)
QH301-705.5
EN
Bioinformatics and Biology Insights, Vol 15 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
spellingShingle Biology (General)
QH301-705.5
Enzo Guerrero-Araya
Marina Muñoz
César Rodríguez
Daniel Paredes-Sabja
FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies
description Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool that is designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available), detected for a query, as well as a multi-FASTA file or a series of FASTA files with the concatenated or single allele sequences detected, respectively. FastMLST was validated with 91 different species, with a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28 000 genomes in less than 10 minutes, reducing processing times by at least 3-fold with 100% concordance to PubMLST, if contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST
format article
author Enzo Guerrero-Araya
Marina Muñoz
César Rodríguez
Daniel Paredes-Sabja
author_facet Enzo Guerrero-Araya
Marina Muñoz
César Rodríguez
Daniel Paredes-Sabja
author_sort Enzo Guerrero-Araya
title FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies
title_short FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies
title_full FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies
title_fullStr FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies
title_full_unstemmed FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies
title_sort fastmlst: a multi-core tool for multilocus sequence typing of draft genome assemblies
publisher SAGE Publishing
publishDate 2021
url https://doaj.org/article/44190e03916f474aad18fbbbf3b7c4f8
work_keys_str_mv AT enzoguerreroaraya fastmlstamulticoretoolformultilocussequencetypingofdraftgenomeassemblies
AT marinamunoz fastmlstamulticoretoolformultilocussequencetypingofdraftgenomeassemblies
AT cesarrodriguez fastmlstamulticoretoolformultilocussequencetypingofdraftgenomeassemblies
AT danielparedessabja fastmlstamulticoretoolformultilocussequencetypingofdraftgenomeassemblies
_version_ 1718406135330897920