Clustering huge protein sequence sets in linear time




Bibliographic Details
Main authors:	Martin Steinegger, Johannes Söding
Format:	Article
Language:	EN
Published:	Nature Portfolio, 2018
Subjects:	Science; Q
Online access:	https://doaj.org/article/01cb78641dc94c18a3dea062537719c0

Description
Summary:	Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server.
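To illustrate why a clustering method can scale linearly rather than quadratically with the number of sequences, here is a toy Python sketch. It is not Linclust's algorithm or code. It only shows the general idea of avoiding all-versus-all comparisons: each sequence is bucketed by a small, fixed number of k-mers, and sequences are linked only within their buckets. The function names (`stable_hash`, `select_kmers`, `toy_linear_cluster`), the k-mer length `k`, the per-sequence k-mer count `m`, the "longest member is the center" rule, and the absence of any alignment check are all hypothetical choices made for this illustration.

```python
from collections import defaultdict
import hashlib


def stable_hash(kmer: str) -> int:
    # Python's built-in hash() is randomized per process for str;
    # a fixed digest keeps k-mer selection reproducible across runs.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def select_kmers(seq: str, k: int, m: int) -> list[str]:
    # Keep only the m distinct k-mers with the smallest hash values (hypothetical scheme).
    distinct = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sorted(distinct, key=stable_hash)[:m]


def toy_linear_cluster(seqs: dict[str, str], k: int = 3, m: int = 2) -> dict[str, str]:
    """Map each sequence ID to a cluster label (the ID of a cluster root).

    Hypothetical simplification: sequences that share a selected k-mer are
    linked to the longest sequence in that k-mer's bucket, and linked
    sequences end up in the same cluster. No alignment is performed.
    """
    # Step 1: bucket IDs by selected k-mer. Each sequence adds at most m
    # entries, so total bucket membership is at most m * len(seqs).
    groups: dict[str, list[str]] = defaultdict(list)
    for sid, seq in seqs.items():
        for km in select_kmers(seq, k, m):
            groups[km].append(sid)

    # Step 2: union-find, so each link costs near-constant amortized time.
    parent = {sid: sid for sid in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in groups.values():
        # Center = longest member, ties broken by ID (hypothetical rule).
        center = max(members, key=lambda s: (len(seqs[s]), s))
        for sid in members:
            parent[find(sid)] = find(center)

    return {sid: find(sid) for sid in seqs}


# Usage: "a" and "b" are identical, so they select the same k-mers and share
# a bucket; "c" shares no k-mer with them and stays in its own cluster.
seqs = {"a": "MKVLAAGIV", "b": "MKVLAAGIV", "c": "WWWWWWWW"}
labels = toy_linear_cluster(seqs)
print(labels)
```

Because each sequence contributes at most `m` bucket entries, the linking step touches at most `m` times N entries for N sequences, instead of the roughly N² pairs an all-versus-all comparison would examine.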

