Bioinformatics methods for identification of amyloidogenic peptides show robustness to misannotated training data

Abstract Several disorders are related to amyloid aggregation of proteins, for example Alzheimer’s or Parkinson’s diseases. Amyloid proteins form fibrils of aggregated beta structures. This is preceded by formation of oligomers—the most cytotoxic species. Determining amyloidogenicity is tedious and...

Bibliographic Details
Main authors: Natalia Szulc, Michał Burdukiewicz, Marlena Gąsior-Głogowska, Jakub W. Wojciechowski, Jarosław Chilimoniuk, Paweł Mackiewicz, Tomas Šneideris, Vytautas Smirnovas, Malgorzata Kotulska
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
Medicine (R)
Science (Q)
Online access: https://doaj.org/article/1b581be9c50f488f8323d496d600981b
id oai:doaj.org-article:1b581be9c50f488f8323d496d600981b
record_format dspace
spelling oai:doaj.org-article:1b581be9c50f488f8323d496d600981b
2021-12-02T13:41:23Z
10.1038/s41598-021-86530-6
2045-2322
2021-04-01T00:00:00Z
https://doi.org/10.1038/s41598-021-86530-6
https://doaj.org/toc/2045-2322
Scientific Reports, Vol 11, Iss 1, Pp 1-11 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Several disorders are related to amyloid aggregation of proteins, for example Alzheimer’s or Parkinson’s diseases. Amyloid proteins form fibrils of aggregated beta structures. This is preceded by formation of oligomers, the most cytotoxic species. Determining amyloidogenicity is tedious and costly. The most reliable identification of amyloids is obtained with high-resolution microscopies, such as electron microscopy or atomic force microscopy (AFM). More frequently, less expensive and faster methods are used, especially infrared (IR) spectroscopy or Thioflavin T staining. Different experimental methods do not always agree, especially when amyloid peptides form oligomers rather than readily forming fibrils. This may lead to peptide misclassification and mislabeling. Several bioinformatics methods have been proposed for in-silico identification of amyloids, many of them based on machine learning. The effectiveness of these methods depends heavily on accurate annotation of the reference training data obtained from in-vitro experiments. We study how robust bioinformatics methods are to weak supervision, that is, to imperfect training data. AmyloGram and three other amyloid predictors were applied. The results showed that the bioinformatics tools can overcome a certain degree of misannotation in the reference data, even when the misannotated peptides belonged to their training set. The computational results are supported by new experiments with IR and AFM methods.
format article
author Natalia Szulc
Michał Burdukiewicz
Marlena Gąsior-Głogowska
Jakub W. Wojciechowski
Jarosław Chilimoniuk
Paweł Mackiewicz
Tomas Šneideris
Vytautas Smirnovas
Malgorzata Kotulska
title Bioinformatics methods for identification of amyloidogenic peptides show robustness to misannotated training data
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/1b581be9c50f488f8323d496d600981b