Use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data.

Bibliographic Details
Main authors:	Md Nazmul Haque, Sadia Sharmin, Amin Ahsan Ali, Abu Ashfaqur Sajib, Mohammad Shoyaib
Format:	article
Language:	English
Published:	Public Library of Science (PLoS), 2021
Subjects:	Medicine (R); Science (Q)
Online access:	https://doaj.org/article/55ab0055d4ba414ca784028d7481870d

id	oai:doaj.org-article:55ab0055d4ba414ca784028d7481870d
record_format	dspace
source	PLoS ONE, Vol 16, Iss 10, p e0230164 (2021)
issn	1932-6203
doi	https://doi.org/10.1371/journal.pone.0230164
toc	https://doaj.org/toc/1932-6203
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Science Q
description	With the advent of high-throughput technologies, the life sciences are generating huge amounts of varied biomolecular data. Global gene expression profiles provide a snapshot of all the genes that are transcribed in a cell or tissue under a particular condition. The high dimensionality of such data (a very large number of features/genes measured on relatively few samples) makes it difficult to identify de novo the key genes (biomarkers) that truly contribute to a particular phenotype or condition, such as cancer. In the existing literature, mutual information (MI) is one of the most successful criteria for identifying key genes from gene expression data. However, existing methods do not correct MI for finite sample size. Nor do they incorporate dynamic discretization of genes, although it is important for selecting more relevant genes. In addition, current studies usually recommend removing redundant genes, which is particularly inappropriate for biological data, because a group of genes may act together on downstream proteins. Thus, genes that provide additional useful information about the disease should be added even if they are redundant. To address these issues, we propose a Mutual information based Gene Selection method (MGS) for selecting informative genes. To rank the selected genes, we further extend MGS with two ranking methods: MGSf, based on frequency, and MGSrf, based on Random Forest. The proposed method not only achieved better classification rates than recently reported methods on gene expression datasets from different studies, but also detected key genes relevant to pathways with a causal relationship to the disease, which indicates that it will also be able to find the genes responsible for an unknown disease from its data.
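Two ideas in the abstract are easy to make concrete: correcting MI for finite samples, and valuing genes that are informative only in combination with others (complementary information). The sketch below is a minimal illustration under stated assumptions, not the MGS algorithm. It assumes genes are already discretized, and it uses the Miller-Madow entropy correction as one standard finite-sample correction; the record does not say which correction MGS actually uses. All helper names (`entropy`, `joint`, `mutual_information`, `miller_madow_mi`, `conditional_mi`) and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Plug-in (maximum-likelihood) Shannon entropy, in nats."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def joint(*columns: Sequence[Hashable]) -> list:
    """Combine several discrete variables into one variable of tuples."""
    return list(zip(*columns))


def mutual_information(x, y) -> float:
    """Plug-in I(X;Y) = H(X) + H(Y) - H(X,Y); biased upward on small samples."""
    return entropy(x) + entropy(y) - entropy(joint(x, y))


def miller_madow_mi(x, y) -> float:
    """I(X;Y) with the Miller-Madow correction applied to every entropy term.

    Each plug-in entropy underestimates the true entropy by roughly
    (m - 1) / (2n), where m is the number of occupied bins and n the sample
    size. Adding that term back can push the MI estimate below zero.
    """
    n = len(x)

    def corrected(values) -> float:
        return entropy(values) + (len(set(values)) - 1) / (2 * n)

    return corrected(x) + corrected(y) - corrected(joint(x, y))


def conditional_mi(x, y, z) -> float:
    """Plug-in I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(joint(x, z)) + entropy(joint(y, z))
            - entropy(joint(x, y, z)) - entropy(z))


if __name__ == "__main__":
    # Toy, made-up data: the class label is the XOR of two discretized "genes".
    g1 = [0, 0, 1, 1]
    g2 = [0, 1, 0, 1]
    label = [a ^ b for a, b in zip(g1, g2)]

    print(mutual_information(g1, label))   # gene 1 alone: about 0 nats
    print(miller_madow_mi(g1, label))      # corrected: about -0.125 on n = 4
    print(conditional_mi(g2, label, g1))   # given gene 1, gene 2 adds about ln 2
```

On this toy data neither gene carries any information about the class on its own, yet together they determine it exactly, so a filter that scored genes only by individual relevance would discard both. Conditional MI exposes that complementary contribution. The Miller-Madow line also shows why small-sample corrections matter: with only four samples the plug-in estimate and the corrected one differ by 1/8 nat.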
format	article
author	Md Nazmul Haque Sadia Sharmin Amin Ahsan Ali Abu Ashfaqur Sajib Mohammad Shoyaib
title	Use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data.
publisher	Public Library of Science (PLoS)
publishDate	2021
url	https://doaj.org/article/55ab0055d4ba414ca784028d7481870d

Similar items