GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs.

Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference ge...

Bibliographic Details
Main authors: Manuel Tognon, Vincenzo Bonnici, Erik Garrison, Rosalba Giugno, Luca Pinello
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General)
Online access: https://doaj.org/article/a139b8b3383047a48810a794b90b50ba
id oai:doaj.org-article:a139b8b3383047a48810a794b90b50ba
record_format dspace
spelling oai:doaj.org-article:a139b8b3383047a48810a794b90b50ba
2021-12-02T19:58:13Z
GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs.
1553-734X
1553-7358
10.1371/journal.pcbi.1009444
https://doaj.org/article/a139b8b3383047a48810a794b90b50ba
2021-09-01T00:00:00Z
https://doi.org/10.1371/journal.pcbi.1009444
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Manuel Tognon
Vincenzo Bonnici
Erik Garrison
Rosalba Giugno
Luca Pinello
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 17, Iss 9, p e1009444 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
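The idea of extending PWM scanning from a linear reference to a variation graph can be illustrated with a minimal sketch. This is not GRAFIMO's implementation or its interface: the three-column motif, the uniform background, the four-node graph and every function name below are hypothetical and chosen only for illustration. The sketch enumerates every k-mer spelled by a walk through a toy sequence-labelled graph and scores each one with a log-odds PWM.

```python
import math

# Hypothetical toy motif of width 3: per-position base probabilities.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency

# Log-odds PWM: log2(probability / background) for each base and position.
PWM = [{b: math.log2(p / BACKGROUND) for b, p in col.items()} for col in MOTIF]

# Hypothetical toy variation graph: a bubble where node 3 ("T") is the
# reference allele and node 2 ("C") is the alternative allele.
NODES = {1: "TA", 2: "C", 3: "T", 4: "GA"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}


def kmers(nodes, edges, k):
    """Yield (start_node, offset, kmer) for every k-mer spelled by a walk."""

    def extend(node, offset, prefix):
        # Take as many bases as still needed from the current node.
        taken = prefix + nodes[node][offset:offset + k - len(prefix)]
        if len(taken) == k:
            yield taken
            return
        # Otherwise continue along every outgoing edge.
        for nxt in edges[node]:
            yield from extend(nxt, 0, taken)

    for node, seq in nodes.items():
        for offset in range(len(seq)):
            for kmer in extend(node, offset, ""):
                yield node, offset, kmer


def score(kmer, pwm):
    """Sum the log-odds of each base at its motif position."""
    return sum(col[base] for col, base in zip(pwm, kmer))


def scan(nodes, edges, pwm):
    """Score every k-mer in the graph and return hits, best first."""
    k = len(pwm)
    hits = [(score(kmer, pwm), node, offset, kmer)
            for node, offset, kmer in kmers(nodes, edges, k)]
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    for s, node, offset, kmer in scan(NODES, EDGES, PWM):
        print(f"{kmer}\tnode={node}\toffset={offset}\tscore={s:.3f}")
```

In this toy graph the reference walk 1, 3, 4 spells TATGA and the alternative walk 1, 2, 4 spells TACGA. The best-scoring 3-mer, ACG, exists only on the alternative walk, which mirrors the kind of site the abstract describes as missed when only the reference genome is scanned. A real tool would additionally handle both strands, statistical significance and haplotype frequencies, none of which is attempted here.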
format article
author Manuel Tognon
Vincenzo Bonnici
Erik Garrison
Rosalba Giugno
Luca Pinello
title GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/a139b8b3383047a48810a794b90b50ba