Reference-free comparative genomics of 174 chloroplasts.

Direct analysis of unassembled genomic data could greatly increase the power of short-read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across...
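
The abstract describes looking at how short kmers are distributed across genomes and separating the resulting sequence into "tip" (unique to a single genome) and "group" (shared by a subset of genomes). The following is a minimal Python sketch of that idea applied to kmers directly, not the authors' pipeline. The function names, the use of canonical (strand-independent) kmers, and the extra "core" label for kmers found in every genome are all assumptions made for illustration.

```python
from collections import defaultdict

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield every k-mer of seq in canonical form.

    The canonical form is the lexicographically smaller of the k-mer and
    its reverse complement, so both strands map to the same key
    (an assumption of this sketch, not stated in the abstract).
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases such as N
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, revcomp)


def kmer_distribution(genomes, k):
    """Map each canonical k-mer to the set of genome names that contain it."""
    distribution = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in canonical_kmers(seq, k):
            distribution[kmer].add(name)
    return distribution


def classify(distribution, n_genomes):
    """Split k-mers into 'tip', 'group', and 'core' by how many genomes hold them.

    'tip'   = present in exactly one genome (term from the abstract)
    'group' = shared by a proper subset of genomes (term from the abstract)
    'core'  = present in all genomes (hypothetical label added here)
    """
    classes = {"tip": {}, "group": {}, "core": {}}
    for kmer, owners in distribution.items():
        if len(owners) == 1:
            label = "tip"
        elif len(owners) < n_genomes:
            label = "group"
        else:
            label = "core"
        classes[label][kmer] = frozenset(owners)
    return classes


if __name__ == "__main__":
    # Toy sequences, invented purely for demonstration.
    toy = {"A": "AAACCC", "B": "TAAACC", "C": "GAAACT"}
    result = classify(kmer_distribution(toy, k=4), n_genomes=len(toy))
    for label, kmers in result.items():
        print(label, sorted(kmers))
```

On real data, the genome names could be replaced by taxonomic labels so that each "group" kmer maps to the set of lineages that share it, which is one way to read "taxonomic distribution" in the abstract.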

Bibliographic Details
Main authors: Chai-Shian Kua, Jue Ruan, John Harting, Cheng-Xi Ye, Matthew R Helmus, Jun Yu, Charles H Cannon
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2012
Subjects:
Medicine (R)
Science (Q)
Online access: https://doaj.org/article/a0889bd0f6b3498d9393f89743c533c8
id oai:doaj.org-article:a0889bd0f6b3498d9393f89743c533c8
record_format dspace
spelling oai:doaj.org-article:a0889bd0f6b3498d9393f89743c533c8
2021-11-18T08:08:11Z
Reference-free comparative genomics of 174 chloroplasts.
1932-6203
10.1371/journal.pone.0048995
https://doaj.org/article/a0889bd0f6b3498d9393f89743c533c8
2012-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23185288/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
Direct analysis of unassembled genomic data could greatly increase the power of short-read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
Chai-Shian Kua
Jue Ruan
John Harting
Cheng-Xi Ye
Matthew R Helmus
Jun Yu
Charles H Cannon
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 7, Iss 11, p e48995 (2012)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Direct analysis of unassembled genomic data could greatly increase the power of short-read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
format article
author Chai-Shian Kua
Jue Ruan
John Harting
Cheng-Xi Ye
Matthew R Helmus
Jun Yu
Charles H Cannon
title Reference-free comparative genomics of 174 chloroplasts.
publisher Public Library of Science (PLoS)
publishDate 2012
url https://doaj.org/article/a0889bd0f6b3498d9393f89743c533c8
_version_ 1718422184763850752