Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning
Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform to study the etiology of CRC for this purpose. Therefore, the current study aimed to perform an integrated bioinformatic...
Main Authors: | Ahmed Hammad, Mohamed Elshaer, Xiuwen Tang |
---|---|
Format: | article |
Language: | EN |
Published: | AIMS Press 2021 |
Subjects: | hub genes, ppi, gene microarray, colorectal cancer, auc, biomarkers |
Online Access: | https://doaj.org/article/e6863ae2fbc144c09b43904336fdaa66 |
id |
oai:doaj.org-article:e6863ae2fbc144c09b43904336fdaa66 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:e6863ae2fbc144c09b43904336fdaa66
2021-11-29T02:53:02Z
Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning
10.3934/mbe.2021443
1551-0018
https://doaj.org/article/e6863ae2fbc144c09b43904336fdaa66
2021-10-01T00:00:00Z
https://www.aimspress.com/article/doi/10.3934/mbe.2021443?viewType=HTML
https://doaj.org/toc/1551-0018
Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform to study the etiology of CRC for this purpose. Therefore, the current study aimed to perform integrated bioinformatics and machine learning analyses to explore novel biomarkers for CRC prognosis. In this study, we acquired gene expression microarray data from the Gene Expression Omnibus (GEO) database. The GSE103512 microarray expression dataset was downloaded and integrated. Subsequently, differentially expressed genes (DEGs) were identified and functionally analyzed via Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Furthermore, protein-protein interaction (PPI) network analysis was conducted using the STRING database and Cytoscape software to identify hub genes, which were then subjected to Support Vector Machine (SVM), receiver operating characteristic (ROC) curve and survival analyses to explore their diagnostic values. Meanwhile, TCGA transcriptomics data in the Gene Expression Profiling Interactive Analysis (GEPIA) database and the pathology data in the Human Protein Atlas (HPA) database were used to verify our transcriptomic analyses. A total of 105 DEGs were identified in this study. Functional enrichment analysis showed that these genes were significantly enriched in biological processes related to cancer progression. Thereafter, PPI network analysis identified a total of 10 significant hub genes. The ROC curve was used to predict the potential application of biomarkers in CRC diagnosis; the area under the ROC curve (AUC) of these genes exceeded 0.92, suggesting that this risk classifier can discriminate between CRC patients and normal controls. Moreover, the prognostic values of these hub genes were confirmed by survival analyses using different CRC patient cohorts. Our results demonstrated that these 10 differentially expressed hub genes could be used as potential biomarkers for CRC diagnosis.
Ahmed Hammad; Mohamed Elshaer; Xiuwen Tang
AIMS Press
article
hub genes; ppi; gene microarray; colorectal cancer; auc; biomarkers
Biotechnology; TP248.13-248.65; Mathematics; QA1-939
EN
Mathematical Biosciences and Engineering, Vol 18, Iss 6, Pp 8997-9015 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
hub genes; ppi; gene microarray; colorectal cancer; auc; biomarkers; Biotechnology; TP248.13-248.65; Mathematics; QA1-939 |
spellingShingle |
hub genes; ppi; gene microarray; colorectal cancer; auc; biomarkers; Biotechnology; TP248.13-248.65; Mathematics; QA1-939; Ahmed Hammad; Mohamed Elshaer; Xiuwen Tang; Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning |
description |
Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform to study the etiology of CRC for this purpose. Therefore, the current study aimed to perform integrated bioinformatics and machine learning analyses to explore novel biomarkers for CRC prognosis. In this study, we acquired gene expression microarray data from the Gene Expression Omnibus (GEO) database. The GSE103512 microarray expression dataset was downloaded and integrated. Subsequently, differentially expressed genes (DEGs) were identified and functionally analyzed via Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Furthermore, protein-protein interaction (PPI) network analysis was conducted using the STRING database and Cytoscape software to identify hub genes, which were then subjected to Support Vector Machine (SVM), receiver operating characteristic (ROC) curve and survival analyses to explore their diagnostic values. Meanwhile, TCGA transcriptomics data in the Gene Expression Profiling Interactive Analysis (GEPIA) database and the pathology data in the Human Protein Atlas (HPA) database were used to verify our transcriptomic analyses. A total of 105 DEGs were identified in this study. Functional enrichment analysis showed that these genes were significantly enriched in biological processes related to cancer progression. Thereafter, PPI network analysis identified a total of 10 significant hub genes. The ROC curve was used to predict the potential application of biomarkers in CRC diagnosis; the area under the ROC curve (AUC) of these genes exceeded 0.92, suggesting that this risk classifier can discriminate between CRC patients and normal controls. Moreover, the prognostic values of these hub genes were confirmed by survival analyses using different CRC patient cohorts. Our results demonstrated that these 10 differentially expressed hub genes could be used as potential biomarkers for CRC diagnosis. |
format |
article |
author |
Ahmed Hammad, Mohamed Elshaer, Xiuwen Tang |
author_facet |
Ahmed Hammad, Mohamed Elshaer, Xiuwen Tang |
author_sort |
Ahmed Hammad |
title |
Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning |
title_short |
Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning |
title_full |
Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning |
title_fullStr |
Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning |
title_full_unstemmed |
Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning |
title_sort |
identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning |
publisher |
AIMS Press |
publishDate |
2021 |
url |
https://doaj.org/article/e6863ae2fbc144c09b43904336fdaa66 |
work_keys_str_mv |
AT ahmedhammad identificationofpotentialbiomarkerswithcolorectalcancerbasedonbioinformaticsanalysisandmachinelearning AT mohamedelshaer identificationofpotentialbiomarkerswithcolorectalcancerbasedonbioinformaticsanalysisandmachinelearning AT xiuwentang identificationofpotentialbiomarkerswithcolorectalcancerbasedonbioinformaticsanalysisandmachinelearning |
_version_ |
1718407680936116224 |
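
The description above reports that the hub genes reached an area under the ROC curve (AUC) above 0.92 when separating CRC patients from normal controls. As a rough illustration of what that number measures, the sketch below computes the AUC for a single gene as the fraction of tumor/normal sample pairs in which the tumor sample has the higher expression value (the Mann-Whitney formulation). The function name `roc_auc` and the expression values are hypothetical and chosen only for illustration; they are not data from the study, and this is not presented as the authors' pipeline, which also involved an SVM classifier.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pair-counting identity.

    labels: 1 = tumor sample, 0 = normal sample.
    A tie between a tumor score and a normal score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one tumor and one normal sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (tumor, normal) pairs ranked correctly.
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one gene (illustrative only, not study data).
expression = [8.1, 7.9, 6.2, 8.4, 5.0, 5.3, 6.5, 4.8]
is_tumor = [1, 1, 1, 1, 0, 0, 0, 0]

auc = roc_auc(expression, is_tumor)
print(f"AUC = {auc:.4f}")  # prints "AUC = 0.9375"
```

In this toy example 15 of the 16 tumor/normal pairs are ordered correctly, giving 0.9375. An AUC of 0.5 would mean the gene carries no discriminating signal, and 1.0 would mean perfect separation.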