Clustering huge protein sequence sets in linear time
Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single serve...
Main Authors: | Martin Steinegger, Johannes Söding |
---|---|
Format: | article |
Language: | EN |
Published: | Nature Portfolio, 2018 |
Subjects: | |
Online Access: | https://doaj.org/article/01cb78641dc94c18a3dea062537719c0 |
id |
oai:doaj.org-article:01cb78641dc94c18a3dea062537719c0 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:01cb78641dc94c18a3dea062537719c0 2021-12-02T14:39:22Z Clustering huge protein sequence sets in linear time 10.1038/s41467-018-04964-5 2041-1723 https://doaj.org/article/01cb78641dc94c18a3dea062537719c0 2018-06-01T00:00:00Z https://doi.org/10.1038/s41467-018-04964-5 https://doaj.org/toc/2041-1723 Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server. Martin Steinegger Johannes Söding Nature Portfolio article Science Q EN Nature Communications, Vol 9, Iss 1, Pp 1-8 (2018) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Science Q |
description |
Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server. |
format |
article |
author |
Martin Steinegger Johannes Söding |
title |
Clustering huge protein sequence sets in linear time |
publisher |
Nature Portfolio |
publishDate |
2018 |
url |
https://doaj.org/article/01cb78641dc94c18a3dea062537719c0 |