Trivial and nontrivial error sources account for misidentification of protein partners in mutual information approaches

Abstract The problem of finding the correct set of partners for a given pair of interacting protein families based on multi-sequence alignments (MSAs) has received great attention over the years. Recently, the native contacts of two interacting proteins were shown to store the strongest mutual infor...

Bibliographic Details
Main authors: Camila Pontes, Miguel Andrade, José Fiorote, Werner Treptow
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/b166faf4fe6a45e08cccaab513bd19d5
id oai:doaj.org-article:b166faf4fe6a45e08cccaab513bd19d5
record_format dspace
spelling oai:doaj.org-article:b166faf4fe6a45e08cccaab513bd19d5
2021-12-02T17:04:36Z
Trivial and nontrivial error sources account for misidentification of protein partners in mutual information approaches
10.1038/s41598-021-86455-0
2045-2322
https://doaj.org/article/b166faf4fe6a45e08cccaab513bd19d5
2021-03-01T00:00:00Z
https://doi.org/10.1038/s41598-021-86455-0
https://doaj.org/toc/2045-2322
Abstract The problem of finding the correct set of partners for a given pair of interacting protein families based on multi-sequence alignments (MSAs) has received great attention over the years. Recently, the native contacts of two interacting proteins were shown to store the strongest mutual information (MI) signal to discriminate MSA concatenations with the largest fraction of correct pairings. Although that signal might be of practical relevance in the search for an effective heuristic to solve the problem, the number of MSA concatenations with near-native MI is large, imposing severe limitations. Here, a Genetic Algorithm that explores possible MSA concatenations according to an MI maximization criterion is shown to find degenerate solutions with two error sources, arising from mismatches among (i) similar and (ii) non-similar sequences. If mistakes made among similar sequences are disregarded, type-(i) solutions are found to resolve correct pairings at best true positive (TP) rates of 70%, far above the very same estimates in type-(ii) solutions. A machine learning classification algorithm helps to show further that differences between optimized solutions based on TP rates are not artificial and may have biological meaning associated with the three-dimensional distribution of the MI signal. Type-(i) solutions may therefore correspond to reliable results for predictive purposes, found here to be more likely obtained via MI maximization across protein systems having a minimum critical number of amino acid contacts on their interaction surfaces (N > 200).
Camila Pontes
Miguel Andrade
José Fiorote
Werner Treptow
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-11 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract The problem of finding the correct set of partners for a given pair of interacting protein families based on multi-sequence alignments (MSAs) has received great attention over the years. Recently, the native contacts of two interacting proteins were shown to store the strongest mutual information (MI) signal to discriminate MSA concatenations with the largest fraction of correct pairings. Although that signal might be of practical relevance in the search for an effective heuristic to solve the problem, the number of MSA concatenations with near-native MI is large, imposing severe limitations. Here, a Genetic Algorithm that explores possible MSA concatenations according to an MI maximization criterion is shown to find degenerate solutions with two error sources, arising from mismatches among (i) similar and (ii) non-similar sequences. If mistakes made among similar sequences are disregarded, type-(i) solutions are found to resolve correct pairings at best true positive (TP) rates of 70%, far above the very same estimates in type-(ii) solutions. A machine learning classification algorithm helps to show further that differences between optimized solutions based on TP rates are not artificial and may have biological meaning associated with the three-dimensional distribution of the MI signal. Type-(i) solutions may therefore correspond to reliable results for predictive purposes, found here to be more likely obtained via MI maximization across protein systems having a minimum critical number of amino acid contacts on their interaction surfaces (N > 200).
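The two quantities the abstract relies on, the MI score of an MSA concatenation and the true positive (TP) rate of a pairing, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`column_mi`, `total_mi`, `tp_rate`), the toy alignments, and the choice to sum MI over all inter-protein column pairs (rather than over native contacts only) are assumptions made for illustration. The Genetic Algorithm search itself is not shown.

```python
import math
from collections import Counter


def column_mi(col_a, col_b):
    """Mutual information (in nats) between two equally long alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def total_mi(msa_a, msa_b, pairing):
    """Score a concatenation: row i of msa_a is paired with row pairing[i] of msa_b.

    Assumption: MI is summed over every inter-protein column pair.
    """
    paired_b = [msa_b[j] for j in pairing]
    cols_a = list(zip(*msa_a))
    cols_b = list(zip(*paired_b))
    return sum(column_mi(ca, cb) for ca in cols_a for cb in cols_b)


def tp_rate(pairing, native):
    """Fraction of rows paired with their native partner."""
    return sum(p == q for p, q in zip(pairing, native)) / len(native)


# Hypothetical toy alignments (four sequences, two columns each).
msa_a = ["AK", "AR", "GK", "GR"]
msa_b = ["LE", "LD", "VE", "VD"]
native = [0, 1, 2, 3]
scrambled = [1, 0, 2, 3]

print(total_mi(msa_a, msa_b, native), tp_rate(native, native))
print(total_mi(msa_a, msa_b, scrambled), tp_rate(scrambled, native))
```

In this toy case the native pairing scores higher than the scrambled one. In real alignments, as the abstract notes, many concatenations reach near-native MI, which is why a search heuristic that maximizes such a score can return degenerate solutions.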
format article
author Camila Pontes
Miguel Andrade
José Fiorote
Werner Treptow
title Trivial and nontrivial error sources account for misidentification of protein partners in mutual information approaches
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/b166faf4fe6a45e08cccaab513bd19d5