How large is the metabolome? A critical analysis of data exchange practices in chemistry.

Background: Calculating the metabolome size of species by genome-guided reconstruction of metabolic pathways misses all products from orphan genes and from enzymes lacking annotated genes. Hence, metabolomes need to be determined experimentally. Annotations by mass spectrometry wo...

Bibliographic Details
Main authors: Tobias Kind, Martin Scholz, Oliver Fiehn
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2009
Subjects:
R
Q
Online access: https://doaj.org/article/4dda5d06ad304fb1ae82bf32cc4a997f
id oai:doaj.org-article:4dda5d06ad304fb1ae82bf32cc4a997f
record_format dspace
spelling oai:doaj.org-article:4dda5d06ad304fb1ae82bf32cc4a997f
2021-11-25T06:22:51Z
How large is the metabolome? A critical analysis of data exchange practices in chemistry.
1932-6203
10.1371/journal.pone.0005440
https://doaj.org/article/4dda5d06ad304fb1ae82bf32cc4a997f
2009-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19415114/?tool=EBI
https://doaj.org/toc/1932-6203
Tobias Kind
Martin Scholz
Oliver Fiehn
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 4, Iss 5, p e5440 (2009)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Background: Calculating the metabolome size of species by genome-guided reconstruction of metabolic pathways misses all products from orphan genes and from enzymes lacking annotated genes. Hence, metabolomes need to be determined experimentally. Annotations by mass spectrometry would greatly benefit if peer-reviewed public databases could be queried to compile target lists of structures that have already been reported for a given species. We detail current obstacles to compiling such a knowledge base of metabolites. Results: As an example, results are presented for rice. Two rice (Oryza sativa) subspecies have been fully sequenced, Oryza japonica and Oryza indica. Several major small-molecule databases were compared for their listings of known rice metabolites: PubChem, Chemical Abstracts, Beilstein, patent databases, Dictionary of Natural Products, SetupX/BinBase, KNApSAcK DB, and finally those databases that were obtained by computational approaches, i.e. RiceCyc, KEGG, and Reactome. More than 5,000 small molecules were retrieved when searching these databases. Unfortunately, genuine rice metabolites were most often retrieved together with non-metabolite database entries such as pesticides. Overlaps between database compound lists were very difficult to compare, either because structures were not encoded in machine-readable format or because compound identifiers were not cross-referenced between databases. Conclusions: We conclude that present databases are not capable of comprehensively retrieving all known metabolites. Metabolome lists are still mostly restricted to genome-reconstructed pathways. We suggest that providers of (bio)chemical databases enrich their database identifiers with PubChem IDs and InChIKeys to enable cross-database queries. In addition, peer-reviewed journal repositories need to mandate submission of structures and spectra in machine-readable format to allow automated semantic annotation of articles containing chemical structures. Such changes in publication standards and database architectures will enable researchers to compile current knowledge about the metabolome of species, which may extend to derived information such as spectral libraries, organ-specific metabolites, and cross-study comparisons.
format article
author Tobias Kind
Martin Scholz
Oliver Fiehn
title How large is the metabolome? A critical analysis of data exchange practices in chemistry.
publisher Public Library of Science (PLoS)
publishDate 2009
url https://doaj.org/article/4dda5d06ad304fb1ae82bf32cc4a997f