Open-Source Sequence Clustering Methods Improve the State Of the Art

Bibliographic Details
Main authors:	Evguenia Kopylova, Jose A. Navas-Molina, Céline Mercier, Zhenjiang Zech Xu, Frédéric Mahé, Yan He, Hong-Wei Zhou, Torbjørn Rognes, J. Gregory Caporaso, Rob Knight
Format:	article
Language:	EN
Published:	American Society for Microbiology, 2016
Subjects:	sequence clustering; operational taxonomic units; microbial community analysis; amplicon sequencing; Microbiology; QR1-502
Online access:	https://doaj.org/article/75b0acec23d8460984a172ced1fd9c07

id	oai:doaj.org-article:75b0acec23d8460984a172ced1fd9c07
doi	10.1128/mSystems.00003-15
journal	mSystems, Vol 1, Iss 1 (2016)
issn	2379-5077
publication_date	2016-02-01
publisher_url	https://journals.asm.org/doi/10.1128/mSystems.00003-15
journal_toc_url	https://doaj.org/toc/2379-5077
record_last_modified	2021-12-02T19:45:29Z
institution	DOAJ
collection	DOAJ
description	ABSTRACT Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH’s most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
