Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning

Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform to study the etiology of CRC for this purpose. Therefore, the current study aimed to perform an integrated bioinformatic...

Bibliographic Details
Main authors: Ahmed Hammad, Mohamed Elshaer, Xiuwen Tang
Format: article
Language: EN
Published: AIMS Press 2021
Subjects:
ppi
auc
Online access: https://doaj.org/article/e6863ae2fbc144c09b43904336fdaa66
id oai:doaj.org-article:e6863ae2fbc144c09b43904336fdaa66
record_format dspace
spelling oai:doaj.org-article:e6863ae2fbc144c09b43904336fdaa66
2021-11-29T02:53:02Z
Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning
DOI: 10.3934/mbe.2021443
ISSN: 1551-0018
https://doaj.org/article/e6863ae2fbc144c09b43904336fdaa66
2021-10-01T00:00:00Z
https://www.aimspress.com/article/doi/10.3934/mbe.2021443?viewType=HTML
https://doaj.org/toc/1551-0018
Ahmed Hammad
Mohamed Elshaer
Xiuwen Tang
AIMS Press
article
hub genes
ppi
gene microarray
colorectal cancer
auc
biomarkers
Biotechnology
TP248.13-248.65
Mathematics
QA1-939
EN
Mathematical Biosciences and Engineering, Vol 18, Iss 6, Pp 8997-9015 (2021)
institution DOAJ
collection DOAJ
language EN
topic hub genes
ppi
gene microarray
colorectal cancer
auc
biomarkers
Biotechnology
TP248.13-248.65
Mathematics
QA1-939
description Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform to study the etiology of CRC for this purpose. Therefore, the current study aimed to perform integrated bioinformatics and machine learning analyses to explore novel biomarkers for CRC prognosis. In this study, we acquired gene expression microarray data from the Gene Expression Omnibus (GEO) database. The GSE103512 microarray expression dataset was downloaded and integrated. Subsequently, differentially expressed genes (DEGs) were identified and functionally analyzed via Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Furthermore, protein-protein interaction (PPI) network analysis was conducted using the STRING database and Cytoscape software to identify hub genes; the hub genes were then subjected to Support Vector Machine (SVM), receiver operating characteristic (ROC) curve, and survival analyses to explore their diagnostic value. Meanwhile, TCGA transcriptomics data in the Gene Expression Profiling Interactive Analysis (GEPIA) database and the pathology data presented in the Human Protein Atlas (HPA) database were used to verify our transcriptomic analyses. A total of 105 DEGs were identified in this study. Functional enrichment analysis showed that these genes were significantly enriched in biological processes related to cancer progression. Thereafter, PPI network analysis identified a total of 10 significant hub genes. The ROC curve was used to predict the potential application of biomarkers in CRC diagnosis; the area under the ROC curve (AUC) of these genes exceeded 0.92, suggesting that this risk classifier can discriminate between CRC patients and normal controls. Moreover, the prognostic values of these hub genes were confirmed by survival analyses using different CRC patient cohorts. Our results demonstrated that these 10 differentially expressed hub genes could be used as potential biomarkers for CRC diagnosis.
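The abstract reports an AUC above 0.92 for the hub genes but does not say how it was computed. As a minimal sketch of what that number measures, the AUC can be read as the probability that a randomly chosen CRC sample receives a higher classifier score than a randomly chosen normal sample. The function name `roc_auc` and the toy labels and scores below are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve for binary labels (1 = case, 0 = control).

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    case/control pairs in which the case scores higher, with ties
    counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 tumour samples (1) and 4 normal samples (0),
# with made-up classifier scores. Not data from the paper.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(auc)  # 15 of 16 case/control pairs are ranked correctly -> 0.9375
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a value above 0.92 is described as discriminating between patients and controls.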
format article
author Ahmed Hammad
Mohamed Elshaer
Xiuwen Tang
title Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning
publisher AIMS Press
publishDate 2021
url https://doaj.org/article/e6863ae2fbc144c09b43904336fdaa66