Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data

Abstract. Background: SNP arrays, short- and long-read genome sequencing are genome-wide high-throughput technologies that may be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies comes with its own limitations and biases, many of which are well-known, but not...

Bibliographic Details
Main authors: Ksenia Lavrichenko, Stefan Johansson, Inge Jonassen
Format: article
Language: EN
Published: BMC 2021
Subjects:
CNV
Online access: https://doaj.org/article/4f60f00ed0f7489ab7ca3dac99e3b042
id oai:doaj.org-article:4f60f00ed0f7489ab7ca3dac99e3b042
record_format dspace
institution DOAJ
collection DOAJ
language EN
format article
title Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data
author Ksenia Lavrichenko
Stefan Johansson
Inge Jonassen
publisher BMC
publishDate 2021
source BMC Genomics, Vol 22, Iss 1, Pp 1-15 (2021)
doi 10.1186/s12864-021-08082-3
issn 1471-2164
date 2021-11-01T00:00:00Z
datestamp 2021-11-21T12:26:36Z
url https://doaj.org/article/4f60f00ed0f7489ab7ca3dac99e3b042
https://doi.org/10.1186/s12864-021-08082-3
https://doaj.org/toc/1471-2164
topic CNV
Microarrays
Short reads
Long reads
Genome in a Bottle
Biotechnology
TP248.13-248.65
Genetics
QH426-470
description Abstract
Background: SNP arrays, short- and long-read genome sequencing are genome-wide high-throughput technologies that may be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies comes with its own limitations and biases, many of which are well-known, but not all of them are thoroughly quantified.
Results: We assembled an ensemble of public datasets of published CNV calls and raw data for the well-studied Genome in a Bottle individual NA12878. This assembly represents a variety of methods and pipelines used for CNV calling from array, short- and long-read technologies. We then performed cross-technology comparisons regarding their ability to call CNVs. Different from other studies, we refrained from using the golden standard. Instead, we attempted to validate the CNV calls by the raw data of each technology.
Conclusions: Our study confirms that long-read platforms enable recalling CNVs in genomic regions inaccessible to arrays or short reads. We also found that the reproducibility of a CNV by different pipelines within each technology is strongly linked to other CNV evidence measures. Importantly, the three technologies show distinct public database frequency profiles, which differ depending on what technology the database was built on.