Exploration of uncharted regions of the protein universe.

The genome projects have unearthed an enormous diversity of genes of unknown function that are still awaiting biological and biochemical characterization. These genes, as most others, can be grouped into families based on sequence similarity. The PFAM database currently contains over 2,200 such fami...

Bibliographic Details
Main authors: Lukasz Jaroszewski, Zhanwen Li, S Sri Krishna, Constantina Bakolitsa, John Wooley, Ashley M Deacon, Ian A Wilson, Adam Godzik
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2009
Subjects: Biology (General)
Online access: https://doaj.org/article/a505829aab6b4814b2aa0c900f886bf7
id oai:doaj.org-article:a505829aab6b4814b2aa0c900f886bf7
record_format dspace
spelling oai:doaj.org-article:a505829aab6b4814b2aa0c900f886bf7
2021-11-25T05:34:00Z
Exploration of uncharted regions of the protein universe.
1544-9173
1545-7885
10.1371/journal.pbio.1000205
https://doaj.org/article/a505829aab6b4814b2aa0c900f886bf7
2009-09-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19787035/?tool=EBI
https://doaj.org/toc/1544-9173
https://doaj.org/toc/1545-7885
The genome projects have unearthed an enormous diversity of genes of unknown function that are still awaiting biological and biochemical characterization. These genes, as most others, can be grouped into families based on sequence similarity. The PFAM database currently contains over 2,200 such families, referred to as domains of unknown function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248 reveals that about two thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows hypotheses to be formulated about their biological function. The remainder can be formally categorized as new folds, although about one third of these show significant substructure similarity to previously characterized folds. These results infer that, despite the enormous increase in the number and the diversity of new genes being uncovered, the fold space of the proteins they encode is gradually becoming saturated. The previously unexplored sectors of the protein universe appear to be primarily shaped by extreme diversification of known protein families, which then enables organisms to evolve new functions and adapt to particular niches and habitats. Notwithstanding, these DUF families still constitute the richest source for discovery of the remaining protein folds and topologies.
Lukasz Jaroszewski
Zhanwen Li
S Sri Krishna
Constantina Bakolitsa
John Wooley
Ashley M Deacon
Ian A Wilson
Adam Godzik
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Biology, Vol 7, Iss 9, p e1000205 (2009)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
spellingShingle Biology (General)
QH301-705.5
Lukasz Jaroszewski
Zhanwen Li
S Sri Krishna
Constantina Bakolitsa
John Wooley
Ashley M Deacon
Ian A Wilson
Adam Godzik
Exploration of uncharted regions of the protein universe.
description The genome projects have unearthed an enormous diversity of genes of unknown function that are still awaiting biological and biochemical characterization. These genes, as most others, can be grouped into families based on sequence similarity. The PFAM database currently contains over 2,200 such families, referred to as domains of unknown function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248 reveals that about two thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows hypotheses to be formulated about their biological function. The remainder can be formally categorized as new folds, although about one third of these show significant substructure similarity to previously characterized folds. These results infer that, despite the enormous increase in the number and the diversity of new genes being uncovered, the fold space of the proteins they encode is gradually becoming saturated. The previously unexplored sectors of the protein universe appear to be primarily shaped by extreme diversification of known protein families, which then enables organisms to evolve new functions and adapt to particular niches and habitats. Notwithstanding, these DUF families still constitute the richest source for discovery of the remaining protein folds and topologies.
format article
author Lukasz Jaroszewski
Zhanwen Li
S Sri Krishna
Constantina Bakolitsa
John Wooley
Ashley M Deacon
Ian A Wilson
Adam Godzik
author_facet Lukasz Jaroszewski
Zhanwen Li
S Sri Krishna
Constantina Bakolitsa
John Wooley
Ashley M Deacon
Ian A Wilson
Adam Godzik
author_sort Lukasz Jaroszewski
title Exploration of uncharted regions of the protein universe.
title_short Exploration of uncharted regions of the protein universe.
title_full Exploration of uncharted regions of the protein universe.
title_fullStr Exploration of uncharted regions of the protein universe.
title_full_unstemmed Exploration of uncharted regions of the protein universe.
title_sort exploration of uncharted regions of the protein universe.
publisher Public Library of Science (PLoS)
publishDate 2009
url https://doaj.org/article/a505829aab6b4814b2aa0c900f886bf7
work_keys_str_mv AT lukaszjaroszewski explorationofunchartedregionsoftheproteinuniverse
AT zhanwenli explorationofunchartedregionsoftheproteinuniverse
AT ssrikrishna explorationofunchartedregionsoftheproteinuniverse
AT constantinabakolitsa explorationofunchartedregionsoftheproteinuniverse
AT johnwooley explorationofunchartedregionsoftheproteinuniverse
AT ashleymdeacon explorationofunchartedregionsoftheproteinuniverse
AT ianawilson explorationofunchartedregionsoftheproteinuniverse
AT adamgodzik explorationofunchartedregionsoftheproteinuniverse
_version_ 1718414595716022272