Clustering huge protein sequence sets in linear time


Bibliographic Details
Main authors: Martin Steinegger, Johannes Söding
Format: article
Language: EN
Published: Nature Portfolio 2018
Subjects:
Q
Online access: https://doaj.org/article/01cb78641dc94c18a3dea062537719c0
Description
Summary: Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server.