Protein transfer learning improves identification of heat shock protein families.

Bibliographic Details
Main Authors: Seonwoo Min, HyunGi Kim, Byunghan Lee, Sungroh Yoon
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2021
Subjects: Medicine (R), Science (Q)
Online Access: https://doaj.org/article/036c675004f24100bed89d887be12ea8
Source: PLoS ONE, Vol 16, Iss 5, p e0251865 (2021)
DOI: https://doi.org/10.1371/journal.pone.0251865
ISSN: 1932-6203

Abstract
Heat shock proteins (HSPs) play a pivotal role as molecular chaperones against unfavorable conditions. Although HSPs are of great importance, their computational identification remains a significant challenge. Previous studies have two major limitations. First, they relied heavily on amino acid composition features, which inevitably limited their prediction performance. Second, their prediction performance was overestimated because of the independent two-stage evaluations and train-test data redundancy. To overcome these limitations, we introduce two novel deep learning algorithms: (1) time-efficient DeepHSP and (2) high-performance DeeperHSP. We propose a convolutional neural network (CNN)-based DeepHSP that classifies both non-HSPs and six HSP families simultaneously. It outperforms state-of-the-art algorithms, despite taking 14-15 times less time for both training and inference. We further improve the performance of DeepHSP by taking advantage of protein transfer learning. While DeepHSP is trained on raw protein sequences, DeeperHSP is trained on top of pre-trained protein representations. Therefore, DeeperHSP remarkably outperforms state-of-the-art algorithms, increasing F1 scores in both cross-validation and independent test experiments by 20% and 10%, respectively. We envision that the proposed algorithms can provide a proteome-wide prediction of HSPs and help in various downstream analyses for pathology and clinical research.
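The abstract contrasts earlier predictors built on amino acid composition features with DeepHSP, a CNN trained on raw protein sequences. The minimal Python sketch below (standard library only) illustrates why that distinction can matter. Composition ignores residue order, so two sequences with the same residues in a different order receive identical feature vectors. A convolutional filter slid over a one-hot encoding, by contrast, can respond to a local motif. This is not the authors' DeepHSP code. The hand-set filter, the motif "EEVD", and both sequences are hypothetical illustrations. In an actual CNN the filters are learned from data.

```python
"""Contrast amino acid composition (AAC) features with a sequence-aware
convolutional feature. Illustrative sketch only, not the DeepHSP implementation."""

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> list[float]:
    """Return the fraction of each standard residue (20 values); order is ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq:
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[aa] / total for aa in AMINO_ACIDS]


def one_hot(seq: str) -> list[list[int]]:
    """Encode each position as a 20-dim indicator vector (an L x 20 matrix)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in seq:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        rows.append(row)
    return rows


def motif_filter(motif: str) -> list[list[float]]:
    """Hypothetical hand-set (k x 20) filter that scores 1 per matching position."""
    return [[1.0 if aa == m else 0.0 for aa in AMINO_ACIDS] for m in motif]


def conv1d_max_pool(encoded: list[list[int]], kernel: list[list[float]]) -> float:
    """Slide the filter along the sequence, apply ReLU, then global max pooling."""
    k = len(kernel)
    best = 0.0
    for start in range(len(encoded) - k + 1):
        score = sum(
            kernel[offset][j] * encoded[start + offset][j]
            for offset in range(k)
            for j in range(len(AMINO_ACIDS))
        )
        best = max(best, max(score, 0.0))  # ReLU, then max over positions
    return best


if __name__ == "__main__":
    seq_a = "GGEEVDGG"  # hypothetical sequence containing the motif EEVD
    seq_b = "GDGEGEVG"  # same residues as seq_a, shuffled
    print(amino_acid_composition(seq_a) == amino_acid_composition(seq_b))  # True
    detector = motif_filter("EEVD")
    print(conv1d_max_pool(one_hot(seq_a), detector))  # 4.0
    print(conv1d_max_pool(one_hot(seq_b), detector))  # 2.0
```

The composition vectors cannot tell the two sequences apart, but the pooled filter response can. In a trained network, many such filters would be learned jointly, and their pooled outputs would feed a single classifier over the seven classes the abstract describes (non-HSPs plus six HSP families).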