Biomolecular simulation based machine learning models accurately predict sites of tolerability to the unnatural amino acid acridonylalanine

Abstract The incorporation of unnatural amino acids (Uaas) has provided an avenue for novel chemistries to be explored in biological systems. However, the successful application of Uaas is often hampered by site-specific impacts on protein yield and solubility. Although previous efforts to identify...

Bibliographic Details
Main authors: Sam Giannakoulias, Sumant R. Shringari, John J. Ferrie, E. James Petersson
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/c1284061dfd245adaed36a4cf18dc1a1
id oai:doaj.org-article:c1284061dfd245adaed36a4cf18dc1a1
record_format dspace
spelling oai:doaj.org-article:c1284061dfd245adaed36a4cf18dc1a1
2021-12-02T18:34:01Z
Biomolecular simulation based machine learning models accurately predict sites of tolerability to the unnatural amino acid acridonylalanine
10.1038/s41598-021-97965-2
2045-2322
https://doaj.org/article/c1284061dfd245adaed36a4cf18dc1a1
2021-09-01T00:00:00Z
https://doi.org/10.1038/s41598-021-97965-2
https://doaj.org/toc/2045-2322
Sam Giannakoulias
Sumant R. Shringari
John J. Ferrie
E. James Petersson
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-12 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract The incorporation of unnatural amino acids (Uaas) has provided an avenue for novel chemistries to be explored in biological systems. However, the successful application of Uaas is often hampered by site-specific impacts on protein yield and solubility. Although previous efforts to identify features that accurately capture these site-specific effects have been unsuccessful, we have developed a set of novel Rosetta Custom Score Functions and alternative Empirical Score Functions that accurately predict the effects of acridon-2-yl-alanine (Acd) incorporation on protein yield and solubility. Acd-containing mutants were simulated in PyRosetta, and machine learning (ML) was performed using either the decomposed values of the Rosetta energy function or changes in residue contacts and bioinformatics. Using these feature sets, which represent Rosetta score function-specific and bioinformatics-derived terms, ML models were trained to predict highly abstract experimental parameters such as mutant protein yield and solubility and displayed robust performance on well-balanced holdouts. Model feature importance analyses demonstrated that terms corresponding to hydrophobic interactions, desolvation, and amino acid angle preferences played a pivotal role in predicting tolerance of mutation to Acd. Overall, this work provides evidence that the application of ML to features extracted from simulated structural models allows for the accurate prediction of diverse and abstract biological phenomena, beyond the predictivity of traditional modeling and simulation approaches.
format article
author Sam Giannakoulias
Sumant R. Shringari
John J. Ferrie
E. James Petersson
title Biomolecular simulation based machine learning models accurately predict sites of tolerability to the unnatural amino acid acridonylalanine
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/c1284061dfd245adaed36a4cf18dc1a1