An Empirical Comparison of EM and K-means Algorithms for Binning Metagenomics Datasets

ABSTRACT: Metagenomics is an area of microbiology that deals with the taxonomic classification of genomic samples taken directly from the environment. These samples are sequences of variable length and they may correspond to different species, some of which may be unknown or not previously stored in...

Bibliographic Details
Main authors:	Tapia Reyes, Patricio; Meneses Villegas, Claudio
Language:	English
Published:	Universidad de Tarapacá, 2018
Subjects:	Binning; metagenomics analysis; unsupervised learning; clustering
Online access:	http://www.scielo.cl/scielo.php?script=sci_arttext&pid=S0718-33052018000500020

id	oai:scielo:S0718-33052018000500020
record_format	dspace
rights	info:eu-repo/semantics/openAccess
journal	Ingeniare. Revista chilena de ingeniería v.26 suppl.1 2018
dates	2018-11-01; 2018-12-10
format	text/html
doi	10.4067/S0718-33052018000500020
institution	Scielo Chile
collection	Scielo Chile
language	English
topic	Binning; metagenomics analysis; unsupervised learning; clustering
description	ABSTRACT: Metagenomics is an area of microbiology that deals with the taxonomic classification of genomic samples taken directly from the environment. These samples are sequences of variable length that may correspond to different species, some of which may be unknown or not previously stored in a genomic database. One of the main steps in metagenomic classification is binning: grouping the sequence fragments so that each group may correspond to a single species. Many approaches are used for binning, mainly machine learning algorithms for classification or clustering. This paper presents the results of an empirical evaluation of two well-known unsupervised algorithms for the metagenomic binning task: EM versus K-means. Both algorithms are tested on short and long reads from synthetic datasets with different species proportions and numbers of species. The results show that K-means generally outperforms EM, although EM is competitive on several of the short-read datasets used for evaluation.
author	Tapia Reyes, Patricio; Meneses Villegas, Claudio
title	An Empirical Comparison of EM and K-means Algorithms for Binning Metagenomics Datasets
publisher	Universidad de Tarapacá.
publishDate	2018
url	http://www.scielo.cl/scielo.php?script=sci_arttext&pid=S0718-33052018000500020
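The abstract contrasts K-means (hard assignment of each fragment to its nearest centroid) with EM (soft, probabilistic assignment under a mixture model). The record does not say which sequence features, mixture model, or initialization the authors used, so the following is only a minimal sketch under stated assumptions: each read is represented by its dinucleotide (k = 2) frequency profile, EM fits a Gaussian mixture with one spherical variance per component, both methods share a deterministic farthest-first seeding, and the two "species" are hypothetical random genomes that differ only in GC content. All names (kmer_profile, kmeans, em_gmm, synthetic_reads, purity) are illustrative, not taken from the paper.

```python
# Hypothetical sketch: features, model, seeding, and data are assumptions, not the paper's setup.
import math
import random
from itertools import product


def kmer_profile(seq, k=2):
    """Relative frequencies of the 4**k k-mers over ACGT (k-mers with other symbols are skipped)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, n_bins):
    """Deterministic seeding: start at points[0], then add the point farthest from the chosen centers."""
    centers = [list(points[0])]
    while len(centers) < n_bins:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, n_bins, iters=50):
    """Lloyd's K-means: hard assignment to the nearest centroid, then centroid update."""
    centers = farthest_first(points, n_bins)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(n_bins), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(n_bins):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a bin empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def em_gmm(points, n_bins, iters=50):
    """EM for a Gaussian mixture with one spherical variance per component (soft assignment)."""
    d, n = len(points[0]), len(points)
    means = farthest_first(points, n_bins)
    variances = [0.01] * n_bins  # assumed starting variance
    weights = [1.0 / n_bins] * n_bins
    resp = []
    for _ in range(iters):
        # E-step: responsibilities from log-densities, normalized via log-sum-exp.
        resp = []
        for p in points:
            logs = [math.log(weights[c]) - 0.5 * d * math.log(2 * math.pi * variances[c])
                    - sq_dist(p, means[c]) / (2 * variances[c]) for c in range(n_bins)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            s = sum(w)
            resp.append([x / s for x in w])
        # M-step: re-estimate mixing weights, means, and variances from the soft counts.
        for c in range(n_bins):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * p[j] for r, p in zip(resp, points)) / nc for j in range(d)]
            var = sum(r[c] * sq_dist(p, means[c]) for r, p in zip(resp, points)) / (nc * d)
            variances[c] = max(var, 1e-6)  # floor keeps a component from collapsing
    # Hard bins: the most responsible component for each read.
    return [max(range(n_bins), key=lambda c: r[c]) for r in resp]


def synthetic_reads(gc, n_reads, read_len=200, genome_len=5000, seed=0):
    """Hypothetical species: a random i.i.d. genome with GC fraction `gc`, cut into fixed-length reads."""
    rng = random.Random(seed)
    genome = "".join(rng.choices("ACGT", weights=[1 - gc, gc, gc, 1 - gc], k=genome_len))
    starts = [rng.randrange(genome_len - read_len) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def purity(labels, truth):
    """Fraction of reads whose bin's majority species matches their own species."""
    hits = 0
    for b in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == b]
        hits += max(members.count(t) for t in set(members))
    return hits / len(truth)


if __name__ == "__main__":
    # Two hypothetical species: GC-rich (0.7) and AT-rich (0.3), 100 reads each.
    reads = synthetic_reads(0.7, 100, seed=1) + synthetic_reads(0.3, 100, seed=2)
    truth = [0] * 100 + [1] * 100
    profiles = [kmer_profile(r) for r in reads]
    print("K-means purity:", purity(kmeans(profiles, 2), truth))
    print("EM purity:", purity(em_gmm(profiles, 2), truth))
```

With compositions this far apart, both methods should recover the two bins; the differences the abstract reports across read lengths, species proportions, and numbers of species would only appear on harder data, which this sketch does not attempt to reproduce. Raising k to 4 (tetranucleotide profiles, 256 dimensions) is a common choice for binning, at a higher computational cost.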
