OptiClust, an Improved Method for Assigning Amplicon-Based Sequence Data to Operational Taxonomic Units

Bibliographic Details
Main authors: Sarah L. Westcott, Patrick D. Schloss
Format: article
Language: English
Published: American Society for Microbiology, 2017
Online access: https://doaj.org/article/4e82c537decc40d890eac4e2fbeafad9
OAI identifier: oai:doaj.org-article:4e82c537decc40d890eac4e2fbeafad9
Journal: mSphere, Vol 2, Iss 2 (2017)
Publication date: 2017-04-01
DOI: 10.1128/mSphereDirect.00073-17
ISSN: 2379-5042
Publisher page: https://journals.asm.org/doi/10.1128/mSphereDirect.00073-17
Journal table of contents: https://doaj.org/toc/2379-5042
Collection: DOAJ
Subjects: 16S rRNA gene; bioinformatics; microbial ecology; microbiome; Microbiology; QR1-502
Abstract: Assignment of 16S rRNA gene sequences to operational taxonomic units (OTUs) is a computational bottleneck in the process of analyzing microbial communities. Although this has been an active area of research, it has been difficult to overcome the time and memory demands while improving the quality of the OTU assignments. Here, we developed a new OTU assignment algorithm that iteratively reassigns sequences to new OTUs to optimize the Matthews correlation coefficient (MCC), a measure of the quality of OTU assignments. To assess the new algorithm, OptiClust, we compared it to 10 other algorithms using 16S rRNA gene sequences from two simulated and four natural communities. With OptiClust, the MCC values averaged 15.2% and 16.5% higher than those of the OTUs generated with the average neighbor algorithm and with distance-based greedy clustering in VSEARCH, respectively. Furthermore, on average, OptiClust was 94.6 times faster than the average neighbor algorithm and just as fast as distance-based greedy clustering with VSEARCH. An empirical analysis of the efficiency of the algorithms showed that the time and memory required to run the algorithm scaled quadratically with the number of unique sequences in the data set. This improvement in the quality of OTU assignments over previously existing methods will significantly enhance downstream analysis by limiting both the splitting of similar sequences into separate OTUs and the merging of dissimilar sequences into the same OTU. The development of the OptiClust algorithm represents a significant advance that is likely to have numerous other applications.
Importance: The analysis of microbial communities from diverse environments using 16S rRNA gene sequencing has expanded our knowledge of the biogeography of microorganisms. An important step in this analysis is the assignment of sequences to taxonomic groups, based either on their similarity to sequences in a database or on their similarity to each other, irrespective of a database. In this study, we present a new algorithm for the latter approach. The algorithm, OptiClust, seeks to optimize a metric of assignment quality by shuffling sequences between taxonomic groups. We found that OptiClust produces more robust assignments and does so in a rapid and memory-efficient manner. This advance will allow for a more robust analysis of microbial communities and the factors that shape them.
Podcast: A podcast concerning this article is available.
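The abstract describes OptiClust only at a high level: sequences are shuffled between OTUs so as to maximize the MCC. The Python sketch below is a simplified illustration of that idea under stated assumptions. It is not the authors' implementation. It assumes that pairwise distances have already been thresholded into a set of "close" sequence pairs. It also assumes that the MCC is computed over all sequence pairs, with the four outcomes counted as follows:

- true positives: close pairs in the same OTU
- false positives: distant pairs in the same OTU
- false negatives: close pairs split across OTUs
- true negatives: distant pairs in different OTUs

The function names `confusion` and `opticlust_sketch` and the stopping parameters `max_iter` and `delta` are hypothetical.

```python
import math
from itertools import combinations


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def confusion(n, close_pairs, labels):
    """Classify every pair (i, j), i < j, of n sequences.

    ASSUMPTION: close_pairs holds (i, j) tuples with i < j whose distance
    is within the OTU threshold; labels[i] is the OTU of sequence i.
    """
    tp = tn = fp = fn = 0
    for i, j in combinations(range(n), 2):
        same_otu = labels[i] == labels[j]
        close = (i, j) in close_pairs
        if same_otu and close:
            tp += 1  # similar sequences kept together
        elif same_otu:
            fp += 1  # dissimilar sequences merged
        elif close:
            fn += 1  # similar sequences split apart
        else:
            tn += 1  # dissimilar sequences kept apart
    return tp, tn, fp, fn


def opticlust_sketch(n, close_pairs, max_iter=100, delta=1e-4):
    """Hypothetical simplified OptiClust-style loop, not the published code."""
    # Normalize pairs so that i < j, matching confusion().
    close_pairs = {(min(a, b), max(a, b)) for a, b in close_pairs}
    neighbors = {i: set() for i in range(n)}
    for i, j in close_pairs:
        neighbors[i].add(j)
        neighbors[j].add(i)

    labels = list(range(n))  # start with every sequence in its own OTU
    current = mcc(*confusion(n, close_pairs, labels))
    for _ in range(max_iter):
        previous = current
        for seq in range(n):
            # Candidate moves: stay, open a new OTU, or join a close neighbor's OTU.
            candidates = {labels[seq], max(labels) + 1}
            candidates |= {labels[k] for k in neighbors[seq]}
            best_label, best_score = labels[seq], current
            for cand in candidates:
                labels[seq] = cand
                score = mcc(*confusion(n, close_pairs, labels))
                if score > best_score:  # keep only strict improvements
                    best_label, best_score = cand, score
            labels[seq] = best_label
            current = best_score
        if abs(current - previous) < delta:  # converged
            break
    return labels, current


if __name__ == "__main__":
    # Toy data: sequences 0, 1 and 2 are mutually close; 3 is isolated.
    labels, score = opticlust_sketch(4, {(0, 1), (0, 2), (1, 2)})
    print(labels, score)
```

On the toy input, sequences 0, 1 and 2 should end up in one OTU and sequence 3 in its own OTU, giving an MCC of 1.0. For clarity, the sketch recomputes the full confusion matrix for every candidate move, which costs O(n²) pair comparisons each time. A practical implementation would instead update the four counts incrementally as a single sequence moves.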