Benchmarking germline CNV calling tools from exome sequencing data

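Abstract: Whole-exome sequencing is an attractive alternative to microarray analysis because of its low cost and its potential to detect copy number variations (CNVs) of various sizes (from 1–2 exons to several Mb). Previous comparisons of the most popular CNV calling tools showed a high proportion of false-positive calls. Moreover, because no gold-standard CNV set exists, those results are limited and not comparable with one another. Here, we aimed to perform a comprehensive analysis of the currently available tools capable of germline CNV calling, using a single CNV standard and a single reference sample set. By compiling variants from previous studies with a Bayesian estimation approach, we constructed an internal standard for the NA12878 sample (a pilot National Institute of Standards and Technology Reference Material) comprising 110,050 exons classified as CNV or non-CNV. The standard was used to evaluate the performance of 16 germline CNV calling tools on the NA12878 sample, with 10 correlated exomes as the reference set, with respect to length distribution, concordance, and efficiency. Each algorithm had a characteristic range of detected lengths and showed low concordance with the other tools. Most tools focused on detecting a limited number of CNVs one to seven exons long, with a false-positive rate below 50%. EXCAVATOR2, exomeCopy, and FishingCNV focused on detecting a wide range of variations but showed low precision. Under a unified comparison, the tools were not equivalent. The analysis allows one to choose the algorithms, or ensembles of algorithms, best suited to a specific goal, e.g. population studies or medical genetics.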
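The abstract evaluates tools by exon-level false positives, precision, inter-tool concordance, and possible ensembles. As a rough illustration of how such quantities can be computed against a CNV/non-CNV exon standard, here is a minimal sketch. It is not the authors' code. The exon keys, the function names (`exon_level_metrics`, `jaccard_concordance`, `ensemble_calls`), the use of the Jaccard index for concordance, simple voting for ensembles, and the reading of "false-positive rate" as the share of calls that the standard labels non-CNV (i.e. 1 − precision) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical exon key: (chromosome, start, end), e.g. taken from a
# capture-kit target BED file. Not the study's actual data model.
Exon = Tuple[str, int, int]


def exon_level_metrics(called: Set[Exon], standard: Dict[Exon, bool]) -> Dict[str, float]:
    """Score one tool's exon-level CNV calls against a CNV/non-CNV standard.

    `standard` maps each evaluable exon to True (CNV) or False (non-CNV).
    Called exons absent from the standard are skipped: they can be
    neither confirmed nor refuted.
    """
    evaluable = called & set(standard)
    tp = sum(1 for exon in evaluable if standard[exon])
    fp = len(evaluable) - tp
    positives = sum(1 for is_cnv in standard.values() if is_cnv)
    nan = float("nan")
    return {
        "precision": tp / (tp + fp) if evaluable else nan,
        # Assumed reading of "false-positive rate": share of evaluable calls
        # that the standard labels non-CNV (equals 1 - precision).
        "false_discovery": fp / (tp + fp) if evaluable else nan,
        "recall": tp / positives if positives else nan,
    }


def jaccard_concordance(calls_a: Set[Exon], calls_b: Set[Exon]) -> float:
    """Pairwise concordance as |A & B| / |A | B| over called exons."""
    union = calls_a | calls_b
    # Two empty call sets are treated as trivially concordant.
    return len(calls_a & calls_b) / len(union) if union else 1.0


def ensemble_calls(call_sets: Iterable[Set[Exon]], min_support: int) -> Set[Exon]:
    """Hypothetical ensemble: keep exons called by at least `min_support` tools."""
    support = Counter(exon for calls in call_sets for exon in calls)
    return {exon for exon, votes in support.items() if votes >= min_support}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    standard = {
        ("chr1", 100, 200): True,
        ("chr1", 300, 400): True,
        ("chr1", 500, 600): False,
        ("chr2", 100, 200): False,
    }
    tool_a = {("chr1", 100, 200), ("chr1", 500, 600), ("chr3", 1, 50)}
    tool_b = {("chr1", 100, 200), ("chr1", 300, 400)}

    print(exon_level_metrics(tool_a, standard))
    # {'precision': 0.5, 'false_discovery': 0.5, 'recall': 0.5}
    print(jaccard_concordance(tool_a, tool_b))  # 0.25
    print(ensemble_calls([tool_a, tool_b], min_support=2))  # {('chr1', 100, 200)}
```

In this toy example the two-tool vote keeps only the shared exon, which raises precision to 1.0 relative to `tool_a` at the cost of recall (0.5). This illustrates the kind of trade-off that choosing an ensemble for a specific goal involves.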

Bibliographic Details
Main authors: Veronika Gordeeva, Elena Sharova, Konstantin Babalyan, Rinat Sultanov, Vadim M. Govorun, Georgij Arapidi
Format: article
Language: English
Published: Nature Portfolio, 2021
Subjects: Medicine (R); Science (Q)
Online access: https://doaj.org/article/74053e2546e143b9894809dc6ad427ac
Journal: Scientific Reports, Vol 11, Iss 1, Pp 1-11 (2021)
Date: 2021-07-01
DOI: 10.1038/s41598-021-93878-2 (https://doi.org/10.1038/s41598-021-93878-2)
ISSN: 2045-2322 (journal contents: https://doaj.org/toc/2045-2322)