qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.

Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been p...

Bibliographic Details
Main authors: Hieu Dinh, Sanguthevar Rajasekaran, Jaime Davila
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2012
Subjects:
R
Q
Online access: https://doaj.org/article/34be10bd78cf4c7a8ccf4cb71dea0936
id oai:doaj.org-article:34be10bd78cf4c7a8ccf4cb71dea0936
record_format dspace
spelling oai:doaj.org-article:34be10bd78cf4c7a8ccf4cb71dea0936
2021-11-18T07:11:21Z
qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.
1932-6203
10.1371/journal.pone.0041425
https://doaj.org/article/34be10bd78cf4c7a8ccf4cb71dea0936
2012-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22848493/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
Hieu Dinh
Sanguthevar Rajasekaran
Jaime Davila
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 7, Iss 7, p e41425 (2012)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare event is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such version is the (ℓ, d)-motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worst-case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on average 5 times faster than the state-of-the-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.
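The abstract names the (ℓ, d)-motif and quorum variants without spelling them out. The sketch below illustrates the problem statement only, under the usual formulation (assumed here, not quoted from this record): a string of length ℓ is a motif if at least q percent of the input sequences contain a length-ℓ window within Hamming distance d of it, and q = 100 gives plain PMS. It is a naive exhaustive search, not Algorithm qPMS7; the function names and the `q_percent` parameter are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate: str, sequence: str, d: int) -> bool:
    """True if some window of `sequence` is within Hamming distance d of `candidate`."""
    length = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + length]) <= d
        for i in range(len(sequence) - length + 1)
    )


def brute_force_qpms(sequences, length, d, q_percent=100, alphabet="ACGT"):
    """Naive quorum motif search (illustration only, NOT qPMS7).

    Tries every one of the len(alphabet) ** length candidate strings, so the
    running time is exponential in `length`.
    """
    motifs = []
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        hits = sum(occurs_within(candidate, s, d) for s in sequences)
        # Integer comparison avoids floating-point rounding on the quorum.
        if hits * 100 >= q_percent * len(sequences):
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    print(brute_force_qpms(["AAAA", "AAAT"], length=4, d=1))
```

The exhaustive enumeration makes the exponential cost mentioned in the abstract visible: every additional motif position multiplies the number of candidates by the alphabet size.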
format article
author Hieu Dinh
Sanguthevar Rajasekaran
Jaime Davila
title qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.
publisher Public Library of Science (PLoS)
publishDate 2012
url https://doaj.org/article/34be10bd78cf4c7a8ccf4cb71dea0936