SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins.

Background: Short linear motifs (SLiMs) in proteins are functional microdomains of fundamental importance in many biological systems. SLiMs typically consist of a 3 to 10 amino acid stretch of the primary protein sequence, of which as few as two sites may be important for activity...

Bibliographic Details
Main authors:	Richard J Edwards, Norman E Davey, Denis C Shields
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2007
Subjects:	Medicine (R); Science (Q)
Online access:	https://doaj.org/article/614f80980c974c5d978812d3906646e4

id	oai:doaj.org-article:614f80980c974c5d978812d3906646e4
record_format	dspace
spelling	oai:doaj.org-article:614f80980c974c5d978812d3906646e4; 2021-11-25T06:10:48Z; ISSN 1932-6203; DOI 10.1371/journal.pone.0000967; https://doi.org/10.1371/journal.pone.0000967; https://doaj.org/toc/1932-6203; 2007-10-01T00:00:00Z; PLoS ONE, Vol 2, Iss 10, p e967 (2007)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine (R); Science (Q)
description	Background: Short linear motifs (SLiMs) in proteins are functional microdomains of fundamental importance in many biological systems. SLiMs typically consist of a 3 to 10 amino acid stretch of the primary protein sequence, of which as few as two sites may be important for activity, making identification of novel SLiMs extremely difficult. In particular, it can be very difficult to distinguish a randomly recurring "motif" from a truly over-represented one. Incorporating ambiguous amino acid positions and/or variable-length wildcard spacers between defined residues further complicates the matter. Methodology/principal findings: In this paper we present two algorithms. SLiMBuild identifies convergently evolved, short motifs in a dataset of proteins. Motifs are built by combining dimers into longer patterns, retaining only those motifs occurring in a sufficient number of unrelated proteins. Motifs with fixed amino acid positions are identified and then combined to incorporate amino acid ambiguity and variable-length wildcard spacers. The algorithm is computationally efficient compared to alternatives, particularly when datasets include homologous proteins, and provides great flexibility in the nature of motifs returned. The SLiMChance algorithm estimates the probability of returned motifs arising by chance, correcting for the size and composition of the dataset, and assigns a significance value to each motif. These algorithms are implemented in a software package, SLiMFinder. SLiMFinder default settings identify known SLiMs with 100% specificity, and have a low false discovery rate on random test data. Conclusions/significance: The efficiency of SLiMBuild and low false discovery rate of SLiMChance make SLiMFinder highly suited to high throughput motif discovery and individual high quality analyses alike. Examples of such analyses on real biological data, and how SLiMFinder results can help direct future discoveries, are provided. SLiMFinder is freely available for download under a GNU license from http://bioinformatics.ucd.ie/shields/software/slimfinder/.
format	article
author	Richard J Edwards, Norman E Davey, Denis C Shields
author_sort	Richard J Edwards
title	SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins.
publisher	Public Library of Science (PLoS)
publishDate	2007
url	https://doaj.org/article/614f80980c974c5d978812d3906646e4


Similar items