SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions

Historically, the database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. Database search algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra. Heuristic simi...

Bibliographic Details
Main Authors:	Muhammad Usman Tariq, Fahad Saeed
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2021
Subjects:	Medicine R Science Q
Online Access:	https://doaj.org/article/7fa2fb9fe3d84643839cd3426fccaa67

id	oai:doaj.org-article:7fa2fb9fe3d84643839cd3426fccaa67
record_format	dspace
spelling	oai:doaj.org-article:7fa2fb9fe3d84643839cd3426fccaa67 2021-11-04T07:42:08Z SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions 1932-6203 https://doaj.org/article/7fa2fb9fe3d84643839cd3426fccaa67 2021-01-01T00:00:00Z https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8555789/?tool=EBI https://doaj.org/toc/1932-6203 Muhammad Usman Tariq Fahad Saeed Public Library of Science (PLoS) article Medicine R Science Q EN PLoS ONE, Vol 16, Iss 10 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Science Q
spellingShingle	Medicine R Science Q Muhammad Usman Tariq Fahad Saeed SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions
description	Historically, the database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. Database search algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra. Heuristic similarity-scoring functions are used to match an experimental spectrum to a theoretical spectrum. However, the heuristic nature of the scoring functions and the simple transformation of the peptides into theoretical spectra, along with noisy mass spectra for the less abundant peptides, can introduce a cascade of inaccuracies. In this paper, we design and implement a Deep Cross-Modal Similarity Network called SpeCollate, which overcomes these inaccuracies by learning the similarity function between experimental spectra and peptides directly from the labeled MS data. SpeCollate transforms spectra and peptides into a shared Euclidean subspace by learning fixed size embeddings for both. Our proposed deep-learning network trains on sextuplets of positive and negative examples coupled with our custom-designed SNAP-loss function. Online hardest negative mining is used to select the appropriate negative examples for optimal training performance. We use 4.8 million sextuplets obtained from the NIST and MassIVE peptide libraries to train the network and demonstrate that for closed search, SpeCollate is able to perform better than Crux and MSFragger in terms of the number of peptide-spectrum matches (PSMs) and unique peptides identified under 1% FDR for real-world data. SpeCollate also identifies a large number of peptides not reported by either Crux or MSFragger. To the best of our knowledge, our proposed SpeCollate is the first deep-learning network that can determine the cross-modal similarity between peptides and mass-spectra for MS-based proteomics. We believe SpeCollate is significant progress towards developing machine-learning solutions for MS-based omics data analysis. SpeCollate is available at https://deepspecs.github.io/.
format	article
author	Muhammad Usman Tariq Fahad Saeed
author_facet	Muhammad Usman Tariq Fahad Saeed
author_sort	Muhammad Usman Tariq
title	SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions
publisher	Public Library of Science (PLoS)
publishDate	2021
url	https://doaj.org/article/7fa2fb9fe3d84643839cd3426fccaa67
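
The abstract in this record describes embedding spectra and peptides in a shared Euclidean space and selecting the hardest negative examples during training. The sketch below illustrates that general idea only. It uses a generic triplet margin loss, not the authors' SNAP-loss (which this record does not define), and operates on precomputed embedding vectors rather than a trained network. The function names, the `margin` value, and the toy vectors are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hardest_negative_triplet_loss(spectra, peptides, margin=0.2):
    """Mean triplet margin loss with in-batch hardest negative mining.

    Assumes spectra[i] and peptides[i] are embeddings of a matching pair
    and that the batch holds at least two pairs. This is a generic
    triplet loss, NOT the SNAP-loss named in the abstract.
    """
    n = len(spectra)
    total = 0.0
    for i in range(n):
        positive = euclidean(spectra[i], peptides[i])
        # Hardest negative: the closest peptide that is not the true match.
        negative = min(
            euclidean(spectra[i], peptides[j]) for j in range(n) if j != i
        )
        total += max(0.0, positive - negative + margin)
    return total / n


def rank_peptides(spectrum, peptides):
    """Indices of candidate peptides sorted by distance to one spectrum."""
    return sorted(
        range(len(peptides)), key=lambda j: euclidean(spectrum, peptides[j])
    )


# Toy 2-D embeddings (hypothetical values, not real data).
spectra = [[0.0, 0.0], [1.0, 0.0]]
peptides = [[0.0, 1.0], [0.5, 0.0]]
loss = hardest_negative_triplet_loss(spectra, peptides)
ranking = rank_peptides(spectra[0], peptides)
```

In this toy batch the first spectrum lies closer to the wrong peptide than to its own match, so it contributes a positive loss, while the second pair is already separated by more than the margin and contributes nothing. At search time, ranking candidates by distance in the shared space would stand in for the heuristic scoring functions used by database search tools.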


Similar Items