Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more.
Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover...
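Main authors: | Ryota Sugimoto, Luca Nishimura, Phuong Thanh Nguyen, Jumpei Ito, Nicholas F Parrish, Hiroshi Mori, Ken Kurokawa, Hirofumi Nakaoka, Ituro Inoue |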
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2021 |
Subjects: | Biology (General); QH301-705.5 |
Online access: | https://doaj.org/article/8cc30e0bbe7e468db1961f98a039c42a |
id |
oai:doaj.org-article:8cc30e0bbe7e468db1961f98a039c42a |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:8cc30e0bbe7e468db1961f98a039c42a; 2021-12-02T19:57:42Z
Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more.
1553-734X; 1553-7358; 10.1371/journal.pcbi.1009428
https://doaj.org/article/8cc30e0bbe7e468db1961f98a039c42a; 2021-10-01T00:00:00Z
https://doi.org/10.1371/journal.pcbi.1009428; https://doaj.org/toc/1553-734X; https://doaj.org/toc/1553-7358
Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover viral sequences without relying on either reference viral sequences from databases or marker genes that characterize specific viral taxa, we developed an analysis pipeline for virus inference based on clustered regularly interspaced short palindromic repeats (CRISPR). CRISPR is a prokaryotic nucleic acid restriction system that stores the memory of previous exposure. Our protocol can infer CRISPR-targeted sequences, including viruses, plasmids, and previously uncharacterized elements, and predict their hosts using unassembled short-read metagenomic sequencing data. By analyzing human gut metagenomic data, we extracted 11,391 terminally redundant CRISPR-targeted sequences, which are likely complete circular genomes. The sequences included 2,154 tailed-phage genomes, together with 257 complete crAssphage genomes, 11 genomes larger than 200 kilobases, 766 genomes of Microviridae species, 56 genomes of Inoviridae species, and 95 previously uncharacterized circular small genomes that have no reliably predicted protein-coding gene. We predicted the host(s) of approximately 70% of the discovered genomes at the taxonomic level of phylum by linking protospacers to taxonomically assigned CRISPR direct repeats. These results demonstrate that our protocol is efficient for de novo inference of CRISPR-targeted sequences and their host prediction.
Ryota Sugimoto; Luca Nishimura; Phuong Thanh Nguyen; Jumpei Ito; Nicholas F Parrish; Hiroshi Mori; Ken Kurokawa; Hirofumi Nakaoka; Ituro Inoue
Public Library of Science (PLoS); article; Biology (General); QH301-705.5; EN
PLoS Computational Biology, Vol 17, Iss 10, p e1009428 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Biology (General) QH301-705.5 |
spellingShingle |
Biology (General); QH301-705.5; Ryota Sugimoto; Luca Nishimura; Phuong Thanh Nguyen; Jumpei Ito; Nicholas F Parrish; Hiroshi Mori; Ken Kurokawa; Hirofumi Nakaoka; Ituro Inoue; Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more. |
description |
Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover viral sequences without relying on either reference viral sequences from databases or marker genes that characterize specific viral taxa, we developed an analysis pipeline for virus inference based on clustered regularly interspaced short palindromic repeats (CRISPR). CRISPR is a prokaryotic nucleic acid restriction system that stores the memory of previous exposure. Our protocol can infer CRISPR-targeted sequences, including viruses, plasmids, and previously uncharacterized elements, and predict their hosts using unassembled short-read metagenomic sequencing data. By analyzing human gut metagenomic data, we extracted 11,391 terminally redundant CRISPR-targeted sequences, which are likely complete circular genomes. The sequences included 2,154 tailed-phage genomes, together with 257 complete crAssphage genomes, 11 genomes larger than 200 kilobases, 766 genomes of Microviridae species, 56 genomes of Inoviridae species, and 95 previously uncharacterized circular small genomes that have no reliably predicted protein-coding gene. We predicted the host(s) of approximately 70% of the discovered genomes at the taxonomic level of phylum by linking protospacers to taxonomically assigned CRISPR direct repeats. These results demonstrate that our protocol is efficient for de novo inference of CRISPR-targeted sequences and their host prediction. |
format |
article |
author |
Ryota Sugimoto; Luca Nishimura; Phuong Thanh Nguyen; Jumpei Ito; Nicholas F Parrish; Hiroshi Mori; Ken Kurokawa; Hirofumi Nakaoka; Ituro Inoue |
author_facet |
Ryota Sugimoto; Luca Nishimura; Phuong Thanh Nguyen; Jumpei Ito; Nicholas F Parrish; Hiroshi Mori; Ken Kurokawa; Hirofumi Nakaoka; Ituro Inoue |
author_sort |
Ryota Sugimoto |
title |
Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more. |
title_short |
Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more. |
title_full |
Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more. |
title_fullStr |
Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more. |
title_full_unstemmed |
Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more. |
title_sort |
comprehensive discovery of crispr-targeted terminally redundant sequences in the human gut metagenome: viruses, plasmids, and more. |
publisher |
Public Library of Science (PLoS) |
publishDate |
2021 |
url |
https://doaj.org/article/8cc30e0bbe7e468db1961f98a039c42a |
work_keys_str_mv |
AT ryotasugimoto comprehensivediscoveryofcrisprtargetedterminallyredundantsequencesinthehumangutmetagenomevirusesplasmidsandmore AT lucanishimura comprehensivediscoveryofcrisprtargetedterminallyredundantsequencesinthehumangutmetagenomevirusesplasmidsandmore AT phuongthanhnguyen comprehensivediscoveryofcrisprtargetedterminallyredundantsequencesinthehumangutmetagenomevirusesplasmidsandmore AT jumpeiito comprehensivediscoveryofcrisprtargetedterminallyredundantsequencesinthehumangutmetagenomevirusesplasmidsandmore AT nicholasfparrish comprehensivediscoveryofcrisprtargetedterminallyredundantsequencesinthehumangutmetagenomevirusesplasmidsandmore AT hiroshimori comprehensivediscoveryofcrisprtargetedterminallyredundantsequencesinthehumangutmetagenomevirusesplasmidsandmore AT kenkurokawa comprehensivediscoveryofcrisprtargetedterminallyredundantsequencesinthehumangutmetagenomevirusesplasmidsandmore AT hirofuminakaoka comprehensivediscoveryofcrisprtargetedterminallyredundantsequencesinthehumangutmetagenomevirusesplasmidsandmore AT ituroinoue comprehensivediscoveryofcrisprtargetedterminallyredundantsequencesinthehumangutmetagenomevirusesplasmidsandmore |
_version_ |
1718375789556137984 |
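
The description above states that terminally redundant sequences "are likely complete circular genomes". A minimal sketch of that idea follows: when a circular genome is assembled as a linear contig, the start of the sequence tends to reappear at its end. This is not the authors' pipeline. The function names and the `min_len` and `max_len` thresholds are hypothetical, and only exact matches are considered, whereas real data would likely need some tolerance for mismatches.

```python
def terminal_redundancy(seq: str, min_len: int = 20, max_len: int = 500) -> int:
    """Length of the longest prefix of seq that also ends seq.

    Returns 0 when no exact terminal repeat of at least min_len bases exists.
    min_len and max_len are illustrative thresholds, not values from the paper.
    """
    seq = seq.upper()
    # A terminal repeat cannot cover more than half of the contig.
    upper = min(max_len, len(seq) // 2)
    # Try the longest candidate first so the first hit is the longest repeat.
    for k in range(upper, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_circular(seq: str, min_len: int = 20, max_len: int = 500):
    """Return seq without its redundant 3' end, or None if seq looks linear."""
    k = terminal_redundancy(seq, min_len, max_len)
    return seq[:-k] if k else None


# Toy example: a 60-base "genome" read around the circle with 25 extra bases.
genome = "A" * 10 + "C" * 10 + "G" * 10 + "T" * 10 + "ACGT" * 5
contig = genome + genome[:25]

overlap = terminal_redundancy(contig)
circular = trim_circular(contig)
linear = trim_circular(genome)
```

In this toy case `overlap` is 25 and `circular` equals the original 60-base genome, while the genome on its own has no terminal repeat and `trim_circular` returns `None`.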