The impact of multifunctional genes on "guilt by association" analysis.

Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are n...

Bibliographic Details
Main authors: Jesse Gillis, Paul Pavlidis
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2011
Subjects:
R
Q
Online access: https://doaj.org/article/d4c8c1186b524e58911c36dbf78099e1
id oai:doaj.org-article:d4c8c1186b524e58911c36dbf78099e1
record_format dspace
spelling oai:doaj.org-article:d4c8c1186b524e58911c36dbf78099e1
2021-11-18T06:58:33Z
The impact of multifunctional genes on "guilt by association" analysis.
1932-6203
10.1371/journal.pone.0017258
https://doaj.org/article/d4c8c1186b524e58911c36dbf78099e1
2011-02-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21364756/?tool=EBI
https://doaj.org/toc/1932-6203
Jesse Gillis
Paul Pavlidis
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 6, Iss 2, p e17258 (2011)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are necessary in establishing "guilt". In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
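The claim that the degree of multifunctionality alone can act as a predictor of gene function can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy gene and function names, the use of a plain annotation count as the multifunctionality score, and the choice of AUROC (computed from the Mann-Whitney rank sum) as the evaluation metric. The score here also includes the function being evaluated, a simplification that a careful evaluation would need to control for.

```python
def multifunctionality_scores(annotations):
    """Score each gene by how many functions it is annotated with.

    Assumption: a plain count stands in for 'degree of multifunctionality'.
    """
    return {gene: len(funcs) for gene, funcs in annotations.items()}


def auroc(scores, positives):
    """AUROC of a gene ranking for one function (Mann-Whitney U).

    Tied scores receive the average of the ranks they span.
    Returns None when the function has no positives or no negatives.
    """
    genes = sorted(scores, key=scores.get)  # ascending by score
    ranks = {}
    i = 0
    while i < len(genes):
        j = i
        while j + 1 < len(genes) and scores[genes[j + 1]] == scores[genes[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[genes[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for g in genes if g in positives)
    n_neg = len(genes) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    rank_sum = sum(ranks[g] for g in genes if g in positives)
    u = rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def evaluate_all(annotations):
    """AUROC per function, using one fixed multifunctionality ranking."""
    scores = multifunctionality_scores(annotations)
    functions = set().union(*annotations.values())
    return {
        f: auroc(scores, {g for g, fs in annotations.items() if f in fs})
        for f in sorted(functions)
    }


# Hypothetical toy annotations (gene -> set of function labels).
toy_annotations = {
    "geneA": {"f1", "f2", "f3"},
    "geneB": {"f1", "f2"},
    "geneC": {"f1"},
    "geneD": {"f3"},
    "geneE": set(),
}

if __name__ == "__main__":
    for function, value in evaluate_all(toy_annotations).items():
        print(function, value)
```

The same ranking is reused for every function and no interaction data enters the calculation, which is the point of the sketch: a ranking that knows nothing about which gene interacts with which can still score well above chance.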
format article
author Jesse Gillis
Paul Pavlidis
title The impact of multifunctional genes on "guilt by association" analysis.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/d4c8c1186b524e58911c36dbf78099e1