Critical assessment of coiled-coil predictions based on protein structure data

Abstract Coiled-coil regions were among the first protein motifs described structurally and theoretically. The simplicity of the motif promises that coiled-coil regions can be detected with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled...

Bibliographic Details
Main authors: Dominic Simm, Klas Hatje, Stephan Waack, Martin Kollmar
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/457f24aabd3c4d23a4754297cf759afa
id oai:doaj.org-article:457f24aabd3c4d23a4754297cf759afa
record_format dspace
spelling oai:doaj.org-article:457f24aabd3c4d23a4754297cf759afa
2021-12-02T16:04:36Z
Critical assessment of coiled-coil predictions based on protein structure data
10.1038/s41598-021-91886-w
2045-2322
https://doaj.org/article/457f24aabd3c4d23a4754297cf759afa
2021-06-01T00:00:00Z
https://doi.org/10.1038/s41598-021-91886-w
https://doaj.org/toc/2045-2322
Abstract Coiled-coil regions were among the first protein motifs described structurally and theoretically. The simplicity of the motif promises that coiled-coil regions can be detected with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled-coil prediction tools with respect to the most comprehensive reference data set available, the entire Protein Data Bank, down to each amino acid and its secondary structure. Apart from the 30-fold difference in minimum and maximum number of coiled coils predicted, the tools strongly vary in where they predict coiled-coil regions. Accordingly, there is a high number of false predictions and missed, true coiled-coil regions. The evaluation of the binary classification metrics in comparison with naïve coin-flip models and the calculation of the Matthews correlation coefficient, the most reliable performance metric for imbalanced data sets, suggests that the tested tools’ performance is close to random. This implies that the tools’ predictions have only limited informative value. Coiled-coil predictions are often used to interpret biochemical data and are part of in-silico functional genome annotation. Our results indicate that these predictions should be treated very cautiously and need to be supported and validated by experimental evidence.
Dominic Simm
Klas Hatje
Stephan Waack
Martin Kollmar
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-18 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Dominic Simm
Klas Hatje
Stephan Waack
Martin Kollmar
Critical assessment of coiled-coil predictions based on protein structure data
description Abstract Coiled-coil regions were among the first protein motifs described structurally and theoretically. The simplicity of the motif promises that coiled-coil regions can be detected with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled-coil prediction tools with respect to the most comprehensive reference data set available, the entire Protein Data Bank, down to each amino acid and its secondary structure. Apart from the 30-fold difference in minimum and maximum number of coiled coils predicted, the tools strongly vary in where they predict coiled-coil regions. Accordingly, there is a high number of false predictions and missed, true coiled-coil regions. The evaluation of the binary classification metrics in comparison with naïve coin-flip models and the calculation of the Matthews correlation coefficient, the most reliable performance metric for imbalanced data sets, suggests that the tested tools’ performance is close to random. This implies that the tools’ predictions have only limited informative value. Coiled-coil predictions are often used to interpret biochemical data and are part of in-silico functional genome annotation. Our results indicate that these predictions should be treated very cautiously and need to be supported and validated by experimental evidence.
format article
author Dominic Simm
Klas Hatje
Stephan Waack
Martin Kollmar
author_facet Dominic Simm
Klas Hatje
Stephan Waack
Martin Kollmar
author_sort Dominic Simm
title Critical assessment of coiled-coil predictions based on protein structure data
title_short Critical assessment of coiled-coil predictions based on protein structure data
title_full Critical assessment of coiled-coil predictions based on protein structure data
title_fullStr Critical assessment of coiled-coil predictions based on protein structure data
title_full_unstemmed Critical assessment of coiled-coil predictions based on protein structure data
title_sort critical assessment of coiled-coil predictions based on protein structure data
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/457f24aabd3c4d23a4754297cf759afa
work_keys_str_mv AT dominicsimm criticalassessmentofcoiledcoilpredictionsbasedonproteinstructuredata
AT klashatje criticalassessmentofcoiledcoilpredictionsbasedonproteinstructuredata
AT stephanwaack criticalassessmentofcoiledcoilpredictionsbasedonproteinstructuredata
AT martinkollmar criticalassessmentofcoiledcoilpredictionsbasedonproteinstructuredata
_version_ 1718385202955288576
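
Note on the metric named in the abstract: the Matthews correlation coefficient (MCC) is computed from the four cells of a binary confusion matrix and is 0 for a predictor that is no better than chance, 1 for a perfect one. The sketch below is only an illustration of why the abstract prefers it for imbalanced data; it is not the authors' evaluation code. The toy labels, the 5% share of positive residues, and all function names are assumptions made for this example and do not come from the article.

```python
import math
import random


def confusion_counts(truth, predicted):
    """Count true/false positives and negatives for two boolean label lists."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all labels that were predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical, imbalanced toy data: roughly 5% of residues are "coiled coil".
rng = random.Random(0)
truth = [rng.random() < 0.05 for _ in range(100_000)]

# Naive baseline 1: never predict a coiled coil.
always_negative = [False] * len(truth)
# Naive baseline 2: a biased coin flip that says "coiled coil" 5% of the time.
coin_flip = [rng.random() < 0.05 for _ in truth]

for name, predicted in [("always negative", always_negative),
                        ("coin flip", coin_flip)]:
    counts = confusion_counts(truth, predicted)
    print(f"{name}: accuracy={accuracy(*counts):.3f}  MCC={mcc(*counts):.3f}")
```

Under these assumptions both naive baselines reach a high accuracy simply because negatives dominate, while their MCC stays near zero. This is the sense in which a tool whose MCC is close to zero performs "close to random" even if its accuracy looks respectable.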