Improved prediction of smoking status via isoform-aware RNA-seq deep learning models.
Most predictive models based on gene expression data do not leverage information related to gene splicing, despite the fact that splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects...
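Main Authors: | Zifeng Wang, Aria Masoomi, Zhonghui Xu, Adel Boueiz, Sool Lee, Tingting Zhao, Russell Bowler, Michael Cho, Edwin K Silverman, Craig Hersh, Jennifer Dy, Peter J Castaldi |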
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2021 |
Subjects: | Biology (General); QH301-705.5 |
Online Access: | https://doaj.org/article/dad7f97d944747acba04531e3bfd1c45 |
id | oai:doaj.org-article:dad7f97d944747acba04531e3bfd1c45 |
---|---|
record_format | dspace |
spelling | record timestamp 2021-12-02T19:57:42Z; ISSN 1553-734X, 1553-7358; DOI 10.1371/journal.pcbi.1009433 (https://doi.org/10.1371/journal.pcbi.1009433); date 2021-10-01T00:00:00Z; journal listings https://doaj.org/toc/1553-734X and https://doaj.org/toc/1553-7358; source: PLoS Computational Biology, Vol 17, Iss 10, p e1009433 (2021) |
institution | DOAJ |
collection | DOAJ |
language | EN |
topic | Biology (General); QH301-705.5 |
description | Most predictive models based on gene expression data do not leverage information related to gene splicing, despite the fact that splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as a prediction target, we developed deep neural network predictive models using gene, exon, and isoform level quantifications from RNA sequencing data in 2,557 subjects in the COPDGene Study. We observed that models using exon and isoform quantifications clearly outperformed gene-level models when using data from 5 genes from a previously published prediction model. Whereas the test set performance of the previously published model was 0.82 in the original publication, our exon-based models including an exon-to-isoform mapping layer achieved a test set AUC (area under the receiver operating characteristic) of 0.88, which improved to an AUC of 0.94 using exon quantifications from a larger set of genes. Isoform variability is an important source of latent information in RNA-seq data that can be used to improve clinical prediction models. |
format | article |
author | Zifeng Wang, Aria Masoomi, Zhonghui Xu, Adel Boueiz, Sool Lee, Tingting Zhao, Russell Bowler, Michael Cho, Edwin K Silverman, Craig Hersh, Jennifer Dy, Peter J Castaldi |
author_sort | Zifeng Wang |
title | Improved prediction of smoking status via isoform-aware RNA-seq deep learning models. |
publisher | Public Library of Science (PLoS) |
publishDate | 2021 |
url | https://doaj.org/article/dad7f97d944747acba04531e3bfd1c45 |
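
The description reports model performance as a test set AUC (0.82, 0.88, and 0.94). As a reading aid, here is a minimal sketch of how an AUC can be computed from binary labels and model scores, using the pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made up for illustration; they are not data or code from the article.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = current smoker, 0 = non-smoker (illustration only).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic loop is chosen for readability; for a cohort of a few thousand subjects a rank-based implementation from an established library would be the more practical choice.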