CNV-P: a machine-learning framework for predicting high confident copy number variations
Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data has been a strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs f...
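Main authors: | Taifu Wang, Jinghua Sun, Xiuqing Zhang, Wen-Jing Wang, Qing Zhou |
---|---|
Format: | article |
Language: | EN |
Published: | PeerJ Inc., 2021 |
Subjects: | Copy number variant, Machine learning, Genome sequencing, Medicine, R |
Online access: | https://doaj.org/article/ad731120167f4781abc75638c1c55012 |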
id |
oai:doaj.org-article:ad731120167f4781abc75638c1c55012 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:ad731120167f4781abc75638c1c55012; 2021-12-04T15:05:07Z; CNV-P: a machine-learning framework for predicting high confident copy number variations; 10.7717/peerj.12564; 2167-8359; https://doaj.org/article/ad731120167f4781abc75638c1c55012; 2021-12-01T00:00:00Z; https://peerj.com/articles/12564.pdf; https://peerj.com/articles/12564/; https://doaj.org/toc/2167-8359; Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data has been a strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs further improvement. Methods: Here, we proposed a novel and post-processing approach for CNVs prediction (CNV-P), a machine-learning framework that could efficiently remove false-positive fragments from results of CNVs detecting tools. A series of CNVs signals such as read depth (RD), split reads (SR) and read pair (RP) around the putative CNV fragments were defined as features to train a classifier. Results: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision rate and 85% recall rate, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different sizes of CNVs and the platforms of sequencing. Conclusions: Our framework for classifying high-confident CNVs could improve both basic research and clinical diagnosis of genetic diseases.; Taifu Wang; Jinghua Sun; Xiuqing Zhang; Wen-Jing Wang; Qing Zhou; PeerJ Inc.; article; Copy number variant; Machine learning; Genome sequencing; Medicine; R; EN; PeerJ, Vol 9, p e12564 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Copy number variant; Machine learning; Genome sequencing; Medicine; R |
spellingShingle |
Copy number variant; Machine learning; Genome sequencing; Medicine; R; Taifu Wang; Jinghua Sun; Xiuqing Zhang; Wen-Jing Wang; Qing Zhou; CNV-P: a machine-learning framework for predicting high confident copy number variations |
description |
Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data has been a strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs further improvement. Methods: Here, we proposed a novel and post-processing approach for CNVs prediction (CNV-P), a machine-learning framework that could efficiently remove false-positive fragments from results of CNVs detecting tools. A series of CNVs signals such as read depth (RD), split reads (SR) and read pair (RP) around the putative CNV fragments were defined as features to train a classifier. Results: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision rate and 85% recall rate, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different sizes of CNVs and the platforms of sequencing. Conclusions: Our framework for classifying high-confident CNVs could improve both basic research and clinical diagnosis of genetic diseases. |
format |
article |
author |
Taifu Wang; Jinghua Sun; Xiuqing Zhang; Wen-Jing Wang; Qing Zhou |
author_facet |
Taifu Wang; Jinghua Sun; Xiuqing Zhang; Wen-Jing Wang; Qing Zhou |
author_sort |
Taifu Wang |
title |
CNV-P: a machine-learning framework for predicting high confident copy number variations |
publisher |
PeerJ Inc. |
publishDate |
2021 |
url |
https://doaj.org/article/ad731120167f4781abc75638c1c55012 |
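
The description above says that read depth (RD), split read (SR) and read pair (RP) signals around each putative CNV are used as features to train a classifier, and that results are reported as precision and recall. The following is a minimal sketch of that idea only. The record does not say which classifier CNV-P uses, so a nearest-centroid rule stands in for it here, and the `Candidate` class, the feature scaling and all numbers are hypothetical toy values, not data from the article.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Candidate:
    """One putative CNV call with hypothetical, pre-scaled evidence features."""
    rd: float      # read-depth signal around the call (assumed scaled to 0..1)
    sr: float      # split-read support near the breakpoints (assumed scaled)
    rp: float      # read-pair support near the breakpoints (assumed scaled)
    is_true: bool  # validation label: True for a confirmed CNV

    def features(self):
        return (self.rd, self.sr, self.rp)


def centroid(rows):
    """Column-wise mean of a non-empty list of feature tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def train(cands):
    """Stand-in classifier: one centroid for true calls, one for false calls."""
    pos = centroid([c.features() for c in cands if c.is_true])
    neg = centroid([c.features() for c in cands if not c.is_true])
    return pos, neg


def predict(model, cand):
    """Keep a call when it lies closer to the true-CNV centroid."""
    pos, neg = model
    return dist(cand.features(), pos) < dist(cand.features(), neg)


def precision_recall(model, cands):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for c in cands if predict(model, c) and c.is_true)
    fp = sum(1 for c in cands if predict(model, c) and not c.is_true)
    fn = sum(1 for c in cands if not predict(model, c) and c.is_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy training calls (hypothetical values).
TRAIN = [
    Candidate(0.9, 0.8, 0.7, True),
    Candidate(0.8, 0.6, 0.9, True),
    Candidate(0.2, 0.1, 0.0, False),
    Candidate(0.3, 0.0, 0.2, False),
]

# Toy held-out calls, including one weakly supported true CNV and one
# false call whose evidence looks strong.
HELD_OUT = [
    Candidate(0.85, 0.7, 0.6, True),
    Candidate(0.4, 0.3, 0.3, True),
    Candidate(0.1, 0.2, 0.1, False),
    Candidate(0.7, 0.9, 0.8, False),
]

MODEL = train(TRAIN)

if __name__ == "__main__":
    print(precision_recall(MODEL, HELD_OUT))
```

Calling `precision_recall(MODEL, HELD_OUT)` evaluates the filter on calls it was not trained on, which is the kind of measurement the reported precision and recall figures refer to. The toy values say nothing about CNV-P's actual performance.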