Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers
Background: Long non-coding RNA plays a vital role in changing the expression profiles of various target genes that lead to cancer development. Thus, identifying prognostic lncRNAs related to different cancers might help in developing cancer therapy. Method: To discover the critical lncRNAs that can...
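Main authors: | Abdullah Al Mamun, Raihanul Bari Tanvir, Masrur Sobhan, Kalai Mathee, Giri Narasimhan, Gregory E. Holt, Ananda Mohan Mondal |
---|---|
Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | autoencoder; concrete autoencoder; deep learning; feature selection; lncRNA; mrCAE; Biology (General) QH301-705.5; Chemistry QD1-999 |
Online access: | https://doaj.org/article/7bf70ba08db04196810e51adc7958800 |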
id |
oai:doaj.org-article:7bf70ba08db04196810e51adc7958800 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:7bf70ba08db04196810e51adc7958800; 2021-11-11T17:20:10Z; Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers; 10.3390/ijms222111919; 1422-0067; 1661-6596; https://doaj.org/article/7bf70ba08db04196810e51adc7958800; 2021-11-01T00:00:00Z; https://www.mdpi.com/1422-0067/22/21/11919; https://doaj.org/toc/1661-6596; https://doaj.org/toc/1422-0067; Abdullah Al Mamun; Raihanul Bari Tanvir; Masrur Sobhan; Kalai Mathee; Giri Narasimhan; Gregory E. Holt; Ananda Mohan Mondal; MDPI AG; article; autoencoder; concrete autoencoder; deep learning; feature selection; lncRNA; mrCAE; Biology (General); QH301-705.5; Chemistry; QD1-999; EN; International Journal of Molecular Sciences, Vol 22, Iss 11919, p 11919 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
autoencoder; concrete autoencoder; deep learning; feature selection; lncRNA; mrCAE; Biology (General) QH301-705.5; Chemistry QD1-999 |
spellingShingle |
autoencoder; concrete autoencoder; deep learning; feature selection; lncRNA; mrCAE; Biology (General) QH301-705.5; Chemistry QD1-999; Abdullah Al Mamun; Raihanul Bari Tanvir; Masrur Sobhan; Kalai Mathee; Giri Narasimhan; Gregory E. Holt; Ananda Mohan Mondal; Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers |
description |
Background: Long non-coding RNA plays a vital role in changing the expression profiles of various target genes that lead to cancer development. Thus, identifying prognostic lncRNAs related to different cancers might help in developing cancer therapy. Method: To discover the critical lncRNAs that can identify the origin of different cancers, we propose the use of the state-of-the-art deep learning algorithm concrete autoencoder (CAE) in an unsupervised setting, which efficiently identifies a subset of the most informative features. However, CAE does not identify reproducible features in different runs due to its stochastic nature. We thus propose a multi-run CAE (mrCAE) to identify a stable set of features to address this issue. The assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. The genome-wide lncRNA expression profiles of 12 different types of cancers, with a total of 4768 samples available in The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified by multiple runs of CAE were added to a final list of key lncRNAs that are capable of identifying 12 different cancers. Results: Our results showed that mrCAE performs better in feature selection than single-run CAE, standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of top-ranking 128 lncRNAs that could identify the origin of 12 different cancers with an accuracy of 95%. Survival analysis showed that 76 of 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers. Conclusion: The proposed mrCAE, which selects actual features, outperformed the AE even though it selects the latent or pseudo-features. By selecting actual features instead of pseudo-features, mrCAE can be valuable for precision medicine. 
The identified prognostic lncRNAs can be further studied to develop therapies for different cancers. |
format |
article |
author |
Abdullah Al Mamun; Raihanul Bari Tanvir; Masrur Sobhan; Kalai Mathee; Giri Narasimhan; Gregory E. Holt; Ananda Mohan Mondal |
author_facet |
Abdullah Al Mamun; Raihanul Bari Tanvir; Masrur Sobhan; Kalai Mathee; Giri Narasimhan; Gregory E. Holt; Ananda Mohan Mondal |
author_sort |
Abdullah Al Mamun |
title |
Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers |
title_short |
Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers |
title_full |
Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers |
title_fullStr |
Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers |
title_full_unstemmed |
Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers |
title_sort |
multi-run concrete autoencoder to identify prognostic lncrnas for 12 cancers |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/7bf70ba08db04196810e51adc7958800 |
work_keys_str_mv |
AT abdullahalmamun multirunconcreteautoencodertoidentifyprognosticlncrnasfor12cancers AT raihanulbaritanvir multirunconcreteautoencodertoidentifyprognosticlncrnasfor12cancers AT masrursobhan multirunconcreteautoencodertoidentifyprognosticlncrnasfor12cancers AT kalaimathee multirunconcreteautoencodertoidentifyprognosticlncrnasfor12cancers AT girinarasimhan multirunconcreteautoencodertoidentifyprognosticlncrnasfor12cancers AT gregoryeholt multirunconcreteautoencodertoidentifyprognosticlncrnasfor12cancers AT anandamohanmondal multirunconcreteautoencodertoidentifyprognosticlncrnasfor12cancers |
_version_ |
1718432126891720704 |
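
Note on the method summarized in the description field: the abstract states that a single concrete autoencoder (CAE) run is stochastic, and that mrCAE keeps features that appear in multiple runs. The sketch below illustrates only that aggregation idea and is not the authors' implementation. The function `stochastic_select` is a hypothetical stand-in for one CAE run (it draws feature indices at random with made-up weights), and the names `multi_run_select`, `n_runs`, and `min_runs`, as well as the frequency threshold itself, are assumptions introduced for illustration; the record does not say how the paper ranks or thresholds features.

```python
import random
from collections import Counter


def stochastic_select(n_features, k, rng):
    """Hypothetical stand-in for one stochastic feature-selection run.

    A real concrete autoencoder would be trained here; this placeholder
    just samples k distinct feature indices, favouring low indices.
    """
    weights = [1.0 / (i + 1) for i in range(n_features)]
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.choices(range(n_features), weights=weights, k=1)[0])
    return chosen


def multi_run_select(n_features, k, n_runs, min_runs, seed=0):
    """Run the selector n_runs times and keep features seen in >= min_runs runs.

    Returns the sorted list of stable feature indices and the per-feature
    selection counts. The min_runs threshold is an assumption of this sketch.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        counts.update(stochastic_select(n_features, k, rng))
    stable = sorted(f for f, c in counts.items() if c >= min_runs)
    return stable, counts


if __name__ == "__main__":
    stable, counts = multi_run_select(
        n_features=50, k=5, n_runs=10, min_runs=3, seed=1
    )
    print("stable features:", stable)
    print("most frequently selected:", counts.most_common(5))
```

Counting how often each feature is selected across runs turns a set of unstable single-run selections into a frequency ranking, which is the intuition the abstract gives for why features recurring across runs are treated as more informative.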