PARROT is a flexible recurrent neural network framework for analysis of large protein datasets

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine lea...

Bibliographic Details
Main Authors: Daniel Griffith, Alex S Holehouse
Format: article
Language: EN
Published: eLife Sciences Publications Ltd 2021
Subjects:
R
Q
Online Access: https://doaj.org/article/eea3c378655a42d68e392633fc89bbb7
id oai:doaj.org-article:eea3c378655a42d68e392633fc89bbb7
record_format dspace
spelling oai:doaj.org-article:eea3c378655a42d68e392633fc89bbb7
2021-12-01T10:58:22Z
PARROT is a flexible recurrent neural network framework for analysis of large protein datasets
10.7554/eLife.70576
2050-084X
e70576
https://doaj.org/article/eea3c378655a42d68e392633fc89bbb7
2021-09-01T00:00:00Z
https://elifesciences.org/articles/70576
https://doaj.org/toc/2050-084X
The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.
Daniel Griffith
Alex S Holehouse
eLife Sciences Publications Ltd
article
machine learning
high-throughput methods
proteomics
bioinformatics
functional annotation
Medicine
R
Science
Q
Biology (General)
QH301-705.5
EN
eLife, Vol 10 (2021)
institution DOAJ
collection DOAJ
language EN
topic machine learning
high-throughput methods
proteomics
bioinformatics
functional annotation
Medicine
R
Science
Q
Biology (General)
QH301-705.5
description The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.
format article
author Daniel Griffith
Alex S Holehouse
title PARROT is a flexible recurrent neural network framework for analysis of large protein datasets
publisher eLife Sciences Publications Ltd
publishDate 2021
url https://doaj.org/article/eea3c378655a42d68e392633fc89bbb7