A Bayesian framework to identify methylcytosines from high-throughput bisulfite sequencing data.

Bibliographic details
Main authors: Qing Xie, Qi Liu, Fengbiao Mao, Wanshi Cai, Honghu Wu, Mingcong You, Zhen Wang, Bingyu Chen, Zhong Sheng Sun, Jinyu Wu
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2014
Online access: https://doaj.org/article/17f21706f5e54475b523a39cb8f4cb9c
Journal: PLoS Computational Biology, Vol 10, Iss 9, p e1003853 (2014)
Publication date: 2014-09-01
ISSN: 1553-734X; 1553-7358
DOI: 10.1371/journal.pcbi.1003853 (https://doi.org/10.1371/journal.pcbi.1003853)
Subjects: Biology (General); QH301-705.5
Source: DOAJ

Abstract
High-throughput bisulfite sequencing technologies provide a comprehensive and well-suited way to investigate DNA methylation at single-base resolution. However, precisely distinguishing methylcytosines from unconverted cytosines in bisulfite sequencing data poses substantial bioinformatic challenges. These challenges arise, at least in part, from cell heterozygosis (cell-to-cell heterogeneity that results from sequencing DNA pooled from many cells) and from the still limited number of statistical methods available for calling methylcytosines from bisulfite sequencing data. Here, we present Bycom, a new Bayesian model that performs methylcytosine calling with high accuracy. Bycom accounts for cell heterozygosis, sequencing errors, and bisulfite conversion efficiency to improve calling accuracy. We compared the performance of Bycom with that of Lister, the method most widely used to identify methylcytosines from bisulfite sequencing data. Bycom outperformed Lister on data with high methylation levels, and it also showed higher sensitivity and specificity than Lister for samples with low methylation levels (<1%). A validation experiment based on reduced representation bisulfite sequencing (RRBS) data suggested that Bycom had a false positive rate of about 4% while maintaining an accuracy close to 94%. This study demonstrated that Bycom has a low false calling rate at all methylation levels and calls methylcytosines accurately at high methylation levels. Bycom will contribute significantly to studies aimed at recalibrating the methylation level of genomic regions based on the presence of methylcytosines.
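The abstract names three factors that a Bayesian methylcytosine caller can weigh: sequencing errors, bisulfite conversion efficiency, and cell heterozygosis. The following is a minimal sketch of how those factors can enter a posterior probability for a single cytosine. It is not Bycom's published model. It assumes a simple read model in which `conversion` is the bisulfite conversion efficiency, `error` is a symmetric C/T sequencing error rate, and the fraction `f` of methylated cells stands in for cell heterozygosis. All function names, default parameter values, the uniform grid prior on `f`, and the 50% prior on methylation are hypothetical choices for illustration.

```python
import math


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Natural log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def log_sum_exp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def prob_c_read(f: float, conversion: float, error: float) -> float:
    """Chance that one read shows C at a site where a fraction f of cells carry 5mC.

    Hypothetical read model: a methylated C survives bisulfite treatment;
    an unmethylated C is converted to T with probability `conversion`;
    a symmetric C<->T sequencing error then occurs with probability `error`.
    """
    p_meth = 1.0 - error  # methylated C is read as C unless a C->T error occurs
    # unmethylated C: left unconverted and read correctly, or converted and mis-read as C
    p_unmeth = (1.0 - conversion) * (1.0 - error) + conversion * error
    return f * p_meth + (1.0 - f) * p_unmeth


def posterior_methylated(c_reads: int, t_reads: int, conversion: float = 0.995,
                         error: float = 0.01, prior_methylated: float = 0.5,
                         grid_size: int = 100) -> float:
    """Posterior probability that at least some cells are methylated at this cytosine.

    H0: no cell is methylated (f = 0).
    H1: f is uniform on the grid {1/grid_size, ..., 1} (hypothetical prior).
    """
    if not (0.0 < error < 0.5 and 0.0 < conversion <= 1.0 and 0.0 < prior_methylated < 1.0):
        raise ValueError("error, conversion and prior_methylated are out of range")
    n = c_reads + t_reads
    log_h0 = (math.log(1.0 - prior_methylated)
              + binom_logpmf(c_reads, n, prob_c_read(0.0, conversion, error)))
    grid = [(i + 1) / grid_size for i in range(grid_size)]
    log_liks = [binom_logpmf(c_reads, n, prob_c_read(f, conversion, error)) for f in grid]
    # marginal likelihood under H1 is the average likelihood over the grid of f values
    log_h1 = math.log(prior_methylated) + log_sum_exp(log_liks) - math.log(grid_size)
    return math.exp(log_h1 - log_sum_exp([log_h0, log_h1]))


if __name__ == "__main__":
    for c, t in [(20, 0), (10, 10), (1, 19), (0, 20)]:
        print(f"C={c:2d} T={t:2d} P(methylated)={posterior_methylated(c, t):.3f}")
```

Working in log space keeps the binomial likelihoods from underflowing at high read depth. Under these assumptions a site with zero C reads out of 20 still keeps a small nonzero posterior, because a very low methylated fraction cannot be excluded from that many reads; a real caller would tune the prior on `f` and the decision threshold against validation data.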