Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer

Abstract Colorectal cancer (CRC) is a common cause of cancer-related deaths worldwide. A CRC mRNA gene expression dataset containing 644 CRC tumor and 51 normal samples from The Cancer Genome Atlas (TCGA) was pre-processed to identify significant differentially expressed genes (DEGs). Feature...

Bibliographic Details
Main authors: Neha Shree Maurya, Sandeep Kushwaha, Aakash Chawade, Ashutosh Mani
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/2259edeee46d4068928014e90a777c6b
id oai:doaj.org-article:2259edeee46d4068928014e90a777c6b
record_format dspace
spelling oai:doaj.org-article:2259edeee46d4068928014e90a777c6b
2021-12-02T16:08:06Z
Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer
10.1038/s41598-021-92692-0
2045-2322
https://doaj.org/article/2259edeee46d4068928014e90a777c6b
2021-07-01T00:00:00Z
https://doi.org/10.1038/s41598-021-92692-0
https://doaj.org/toc/2045-2322
Neha Shree Maurya
Sandeep Kushwaha
Aakash Chawade
Ashutosh Mani
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-11 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Colorectal cancer (CRC) is a common cause of cancer-related deaths worldwide. A CRC mRNA gene expression dataset containing 644 CRC tumor and 51 normal samples from The Cancer Genome Atlas (TCGA) was pre-processed to identify significant differentially expressed genes (DEGs). Two feature selection techniques, least absolute shrinkage and selection operator (LASSO) and Relief, were used along with class balancing to obtain features (genes) of high importance. The CRC dataset was classified with three ML algorithms: random forest (RF), K-nearest neighbour (KNN), and artificial neural networks (ANN). The CRC gene expression dataset had 23,186 features, of which 2933 were significant DEGs (1832 upregulated and 1101 downregulated). LASSO performed better than Relief for classifying tumor and normal samples: with LASSO-selected features, RF, KNN, and ANN each reached an accuracy of 100%, whereas Relief gave 79.5%, 85.05%, and 100%, respectively. LASSO-selected features and DEGs had 38 genes in common, of which only 5 (VSTM2A, NR5A2, TMEM236, GDLN, and ETFDH) were statistically significant in survival analysis. Functional review and analysis of the selected genes narrowed these 5 genes to 2, VSTM2A and TMEM236. TMEM236 was significantly differentially expressed, with markedly reduced expression in the dataset, which supports its assessment as a novel biomarker for CRC diagnosis.
format article
author Neha Shree Maurya
Sandeep Kushwaha
Aakash Chawade
Ashutosh Mani
title Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/2259edeee46d4068928014e90a777c6b
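
The abstract mentions class balancing but does not say which method was used. As an illustration only, the following minimal sketch balances the 644 tumor and 51 normal samples by random oversampling of the minority class. The function name `oversample_minority`, the label strings, and the choice of random oversampling are assumptions for this example, not the authors' procedure.

```python
import random
from collections import Counter


def oversample_minority(labels, seed=0):
    """Return sample indices in which every class is as large as the largest class.

    Smaller classes are topped up by drawing their own indices with replacement
    (random oversampling). This is an assumed balancing method for illustration.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    target = max(len(indices) for indices in by_class.values())

    balanced = []
    for indices in by_class.values():
        balanced.extend(indices)  # keep every original sample once
        # Draw extra samples with replacement until the class reaches the target size.
        balanced.extend(rng.choices(indices, k=target - len(indices)))
    return balanced


# Class sizes taken from the abstract: 644 tumor and 51 normal samples.
labels = ["tumor"] * 644 + ["normal"] * 51
balanced_idx = oversample_minority(labels)
counts = Counter(labels[i] for i in balanced_idx)
print(counts)
```

In practice, oversampling of this kind is usually applied to the training split only, so that duplicated samples do not also appear in the held-out data used to report accuracy.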