CNV-P: a machine-learning framework for predicting high confident copy number variations

Bibliographic details
Main authors: Taifu Wang, Jinghua Sun, Xiuqing Zhang, Wen-Jing Wang, Qing Zhou
Format: article
Language: English
Published: PeerJ Inc., 2021
Subjects: R
Online access: https://doaj.org/article/ad731120167f4781abc75638c1c55012
Journal: PeerJ, Vol 9, p e12564 (2021)
DOI: 10.7717/peerj.12564
ISSN: 2167-8359
Publication date: 2021-12-01
Full text: https://peerj.com/articles/12564/
Full text (PDF): https://peerj.com/articles/12564.pdf
Journal page on DOAJ: https://doaj.org/toc/2167-8359
Source: DOAJ (Directory of Open Access Journals)
Topics: Copy number variant; Machine learning; Genome sequencing; Medicine; R

Abstract
Background: Copy-number variants (CNVs) are recognized as one of the major causes of genetic disorders, and reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current CNV detection software has high false-positive rates and needs further improvement.
Methods: We propose CNV-P (CNV prediction), a novel post-processing machine-learning framework that efficiently removes false-positive fragments from the output of CNV detection tools. A series of CNV signals around each putative CNV fragment, such as read depth (RD), split reads (SR) and read pairs (RP), were defined as features to train a classifier.
Results: On several real biological datasets, our models classified CNVs with over 90% precision and 85% recall, greatly improving the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to CNV size and sequencing platform.
Conclusions: Our framework for classifying high-confidence CNVs could benefit both basic research and the clinical diagnosis of genetic diseases.