Improved prediction of smoking status via isoform-aware RNA-seq deep learning models.

Most predictive models based on gene expression data do not leverage information related to gene splicing, despite the fact that splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects...

Bibliographic Details
Main Authors: Zifeng Wang, Aria Masoomi, Zhonghui Xu, Adel Boueiz, Sool Lee, Tingting Zhao, Russell Bowler, Michael Cho, Edwin K Silverman, Craig Hersh, Jennifer Dy, Peter J Castaldi
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General)
Online Access: https://doaj.org/article/dad7f97d944747acba04531e3bfd1c45
id oai:doaj.org-article:dad7f97d944747acba04531e3bfd1c45
record_format dspace
spelling oai:doaj.org-article:dad7f97d944747acba04531e3bfd1c45
2021-12-02T19:57:42Z
Improved prediction of smoking status via isoform-aware RNA-seq deep learning models.
1553-734X
1553-7358
10.1371/journal.pcbi.1009433
https://doaj.org/article/dad7f97d944747acba04531e3bfd1c45
2021-10-01T00:00:00Z
https://doi.org/10.1371/journal.pcbi.1009433
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Most predictive models based on gene expression data do not leverage information related to gene splicing, despite the fact that splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as a prediction target, we developed deep neural network predictive models using gene, exon, and isoform level quantifications from RNA sequencing data in 2,557 subjects in the COPDGene Study. We observed that models using exon and isoform quantifications clearly outperformed gene-level models when using data from 5 genes from a previously published prediction model. Whereas the test set performance of the previously published model was 0.82 in the original publication, our exon-based models including an exon-to-isoform mapping layer achieved a test set AUC (area under the receiver operating characteristic) of 0.88, which improved to an AUC of 0.94 using exon quantifications from a larger set of genes. Isoform variability is an important source of latent information in RNA-seq data that can be used to improve clinical prediction models.
Zifeng Wang
Aria Masoomi
Zhonghui Xu
Adel Boueiz
Sool Lee
Tingting Zhao
Russell Bowler
Michael Cho
Edwin K Silverman
Craig Hersh
Jennifer Dy
Peter J Castaldi
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 17, Iss 10, p e1009433 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description Most predictive models based on gene expression data do not leverage information related to gene splicing, despite the fact that splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as a prediction target, we developed deep neural network predictive models using gene, exon, and isoform level quantifications from RNA sequencing data in 2,557 subjects in the COPDGene Study. We observed that models using exon and isoform quantifications clearly outperformed gene-level models when using data from 5 genes from a previously published prediction model. Whereas the test set performance of the previously published model was 0.82 in the original publication, our exon-based models including an exon-to-isoform mapping layer achieved a test set AUC (area under the receiver operating characteristic) of 0.88, which improved to an AUC of 0.94 using exon quantifications from a larger set of genes. Isoform variability is an important source of latent information in RNA-seq data that can be used to improve clinical prediction models.
format article
author Zifeng Wang
Aria Masoomi
Zhonghui Xu
Adel Boueiz
Sool Lee
Tingting Zhao
Russell Bowler
Michael Cho
Edwin K Silverman
Craig Hersh
Jennifer Dy
Peter J Castaldi
author_sort Zifeng Wang
title Improved prediction of smoking status via isoform-aware RNA-seq deep learning models.
title_sort improved prediction of smoking status via isoform-aware rna-seq deep learning models.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/dad7f97d944747acba04531e3bfd1c45
_version_ 1718375810238251008