Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more

Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover...

Bibliographic Details
Main authors: Ryota Sugimoto, Luca Nishimura, Phuong Thanh Nguyen, Jumpei Ito, Nicholas F. Parrish, Hiroshi Mori, Ken Kurokawa, Hirofumi Nakaoka, Ituro Inoue
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2021
Subjects: Biology (General)
Online access: https://doaj.org/article/01681be08d17477eaed58d1209f95089
id oai:doaj.org-article:01681be08d17477eaed58d1209f95089
record_format dspace
spelling oai:doaj.org-article:01681be08d17477eaed58d1209f95089
2021-11-04T05:42:44Z
Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more
1553-734X
1553-7358
https://doaj.org/article/01681be08d17477eaed58d1209f95089
2021-10-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8530359/?tool=EBI
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover viral sequences without relying on either reference viral sequences from databases or marker genes that characterize specific viral taxa, we developed an analysis pipeline for virus inference based on clustered regularly interspaced short palindromic repeats (CRISPR). CRISPR is a prokaryotic nucleic acid restriction system that stores the memory of previous exposure. Our protocol can infer CRISPR-targeted sequences, including viruses, plasmids, and previously uncharacterized elements, and predict their hosts using unassembled short-read metagenomic sequencing data. By analyzing human gut metagenomic data, we extracted 11,391 terminally redundant CRISPR-targeted sequences, which are likely complete circular genomes. The sequences included 2,154 tailed-phage genomes, together with 257 complete crAssphage genomes, 11 genomes larger than 200 kilobases, 766 genomes of Microviridae species, 56 genomes of Inoviridae species, and 95 previously uncharacterized circular small genomes that have no reliably predicted protein-coding gene. We predicted the host(s) of approximately 70% of the discovered genomes at the taxonomic level of phylum by linking protospacers to taxonomically assigned CRISPR direct repeats. These results demonstrate that our protocol is efficient for de novo inference of CRISPR-targeted sequences and their host prediction.
Author summary: The evolution and origins of viruses are long-standing questions in the field of biology. Viral genomes provide fundamental information to infer the evolution and origin of viruses. However, viruses are extraordinarily diverse, and there are no single genes shared across entire species. Several methods were developed to collect viral genomes from metagenome. To infer viral genomes from metagenome, previous approaches relied on reference viral genomes. We thought that such reference-based methods may not be sufficient to uncover diverse viral genomes; therefore, we developed a pipeline that utilizes CRISPR, a prokaryotic adaptive immunological memory. Using this pipeline, we discovered more than 10,000 positively complete CRISPR-targeted genomes from human gut metagenome datasets. A substantial portion of the discovered genomes encoded various types of capsid proteins, supporting the contention that these sequences are viral. Although the majority of these capsid-protein-coding sequences were previously characterized, we notably discovered Inoviridae genomes that were previously difficult to infer as being viral. Furthermore, some of the remaining unclassified sequences without a detectable capsid-protein-encoding gene had a notably low protein-coding ratio. Overall, our pipeline successfully discovered viruses and previously uncharacterized presumably mobile genetic elements targeted by CRISPR.
Ryota Sugimoto
Luca Nishimura
Phuong Thanh Nguyen
Jumpei Ito
Nicholas F. Parrish
Hiroshi Mori
Ken Kurokawa
Hirofumi Nakaoka
Ituro Inoue
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 17, Iss 10 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover viral sequences without relying on either reference viral sequences from databases or marker genes that characterize specific viral taxa, we developed an analysis pipeline for virus inference based on clustered regularly interspaced short palindromic repeats (CRISPR). CRISPR is a prokaryotic nucleic acid restriction system that stores the memory of previous exposure. Our protocol can infer CRISPR-targeted sequences, including viruses, plasmids, and previously uncharacterized elements, and predict their hosts using unassembled short-read metagenomic sequencing data. By analyzing human gut metagenomic data, we extracted 11,391 terminally redundant CRISPR-targeted sequences, which are likely complete circular genomes. The sequences included 2,154 tailed-phage genomes, together with 257 complete crAssphage genomes, 11 genomes larger than 200 kilobases, 766 genomes of Microviridae species, 56 genomes of Inoviridae species, and 95 previously uncharacterized circular small genomes that have no reliably predicted protein-coding gene. We predicted the host(s) of approximately 70% of the discovered genomes at the taxonomic level of phylum by linking protospacers to taxonomically assigned CRISPR direct repeats. These results demonstrate that our protocol is efficient for de novo inference of CRISPR-targeted sequences and their host prediction.
Author summary: The evolution and origins of viruses are long-standing questions in the field of biology. Viral genomes provide fundamental information to infer the evolution and origin of viruses. However, viruses are extraordinarily diverse, and there are no single genes shared across entire species. Several methods were developed to collect viral genomes from metagenome. To infer viral genomes from metagenome, previous approaches relied on reference viral genomes. We thought that such reference-based methods may not be sufficient to uncover diverse viral genomes; therefore, we developed a pipeline that utilizes CRISPR, a prokaryotic adaptive immunological memory. Using this pipeline, we discovered more than 10,000 positively complete CRISPR-targeted genomes from human gut metagenome datasets. A substantial portion of the discovered genomes encoded various types of capsid proteins, supporting the contention that these sequences are viral. Although the majority of these capsid-protein-coding sequences were previously characterized, we notably discovered Inoviridae genomes that were previously difficult to infer as being viral. Furthermore, some of the remaining unclassified sequences without a detectable capsid-protein-encoding gene had a notably low protein-coding ratio. Overall, our pipeline successfully discovered viruses and previously uncharacterized presumably mobile genetic elements targeted by CRISPR.
format article
author Ryota Sugimoto
Luca Nishimura
Phuong Thanh Nguyen
Jumpei Ito
Nicholas F. Parrish
Hiroshi Mori
Ken Kurokawa
Hirofumi Nakaoka
Ituro Inoue
title Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/01681be08d17477eaed58d1209f95089