EvalDNA: a machine learning-based tool for the comprehensive evaluation of mammalian genome assembly quality
Abstract. Background: To select the most complete, continuous, and accurate assembly for an organism of interest, comprehensive quality assessment of assemblies is necessary. We present a novel tool, called Evaluation of De Novo Assemblies (EvalDNA), which uses supervised machine learning for the qual...
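Main authors: | Madolyn L. MacDonald, Kelvin H. Lee |
---|---|
Format: | article |
Language: | EN |
Published: | BMC, 2021 |
Subjects: | Genome assembly quality; Genome assembly; Machine learning; Chinese hamster; CHO cells |
Online access: | https://doaj.org/article/9149ddb6466743b6a1005ecd98f92813 |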
id |
oai:doaj.org-article:9149ddb6466743b6a1005ecd98f92813 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:9149ddb6466743b6a1005ecd98f92813; 2021-11-28T12:11:15Z; EvalDNA: a machine learning-based tool for the comprehensive evaluation of mammalian genome assembly quality; DOI: 10.1186/s12859-021-04480-2; ISSN: 1471-2105; https://doaj.org/article/9149ddb6466743b6a1005ecd98f92813; 2021-11-01T00:00:00Z; https://doi.org/10.1186/s12859-021-04480-2; https://doaj.org/toc/1471-2105; Madolyn L. MacDonald; Kelvin H. Lee; BMC; article; Genome assembly quality; Genome assembly; Machine learning; Chinese hamster; CHO cells; Computer applications to medicine. Medical informatics (R858-859.7); Biology (General) (QH301-705.5); EN; BMC Bioinformatics, Vol 22, Iss 1, Pp 1-26 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Genome assembly quality; Genome assembly; Machine learning; Chinese hamster; CHO cells; Computer applications to medicine. Medical informatics (R858-859.7); Biology (General) (QH301-705.5) |
description |
Abstract. Background: To select the most complete, continuous, and accurate assembly for an organism of interest, comprehensive quality assessment of assemblies is necessary. We present a novel tool, called Evaluation of De Novo Assemblies (EvalDNA), which uses supervised machine learning for the quality scoring of genome assemblies and does not require an existing reference genome for accuracy assessment. Results: EvalDNA calculates a list of quality metrics from an assembled sequence and applies a model created from supervised machine learning methods to integrate various metrics into a comprehensive quality score. A well-tested, accurate model for scoring mammalian genome sequences is provided as part of EvalDNA. This random forest regression model evaluates an assembled sequence based on continuity, completeness, and accuracy, and was able to explain 86% of the variation in reference-based quality scores within the testing data. EvalDNA was applied to human chromosome 14 assemblies from the GAGE study to rank genome assemblers and to compare EvalDNA to two other quality evaluation tools. In addition, EvalDNA was used to evaluate several genome assemblies of the Chinese hamster genome to help establish a better reference genome for the biopharmaceutical manufacturing community. EvalDNA was also used to assess more recent human assemblies from the QUAST-LG study completed in 2018, and its ability to score bacterial genomes was examined through application on bacterial assemblies from the GAGE-B study. Conclusions: EvalDNA enables scientists to easily identify the best available genome assembly for their organism of interest without requiring a reference assembly. EvalDNA sets itself apart from other quality assessment tools by producing a quality score that enables direct comparison among assemblies from different species. |
format |
article |
author |
Madolyn L. MacDonald; Kelvin H. Lee |
title |
EvalDNA: a machine learning-based tool for the comprehensive evaluation of mammalian genome assembly quality |
publisher |
BMC |
publishDate |
2021 |
url |
https://doaj.org/article/9149ddb6466743b6a1005ecd98f92813 |
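The description in this record states that the model "was able to explain 86% of the variation in reference-based quality scores within the testing data". That phrasing is conventionally read as a coefficient of determination (R²) of about 0.86, although the record does not say exactly how the authors computed it. The sketch below shows the standard R² calculation in plain Python. The function name `r_squared` and the sample numbers are hypothetical and chosen only for illustration; they are not EvalDNA code or data from the paper.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = fmean(observed)
    # Total variation of the observed (reference-based) scores around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    # Variation left unexplained by the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only; not data from the paper.
reference_scores = [10.0, 20.0, 30.0, 40.0, 50.0]
model_scores = [12.0, 18.0, 33.0, 37.0, 50.0]
print(round(r_squared(reference_scores, model_scores), 3))  # prints 0.974
```

A value of 1.0 means the predictions reproduce the reference-based scores exactly, while 0.0 means they do no better than always predicting the mean score.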