Clustering huge protein sequence sets in linear time

Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server.

Bibliographic Details
Main authors: Martin Steinegger, Johannes Söding
Format: article
Language: EN
Published: Nature Portfolio 2018
Subjects:
Q
Online access: https://doaj.org/article/01cb78641dc94c18a3dea062537719c0
id oai:doaj.org-article:01cb78641dc94c18a3dea062537719c0
record_format dspace
spelling oai:doaj.org-article:01cb78641dc94c18a3dea062537719c0
2021-12-02T14:39:22Z
Clustering huge protein sequence sets in linear time
10.1038/s41467-018-04964-5
2041-1723
https://doaj.org/article/01cb78641dc94c18a3dea062537719c0
2018-06-01T00:00:00Z
https://doi.org/10.1038/s41467-018-04964-5
https://doaj.org/toc/2041-1723
Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server.
Martin Steinegger
Johannes Söding
Nature Portfolio
article
Science
Q
EN
Nature Communications, Vol 9, Iss 1, Pp 1-8 (2018)
institution DOAJ
collection DOAJ
language EN
topic Science
Q
description Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server.
format article
author Martin Steinegger
Johannes Söding
title Clustering huge protein sequence sets in linear time
publisher Nature Portfolio
publishDate 2018
url https://doaj.org/article/01cb78641dc94c18a3dea062537719c0