MSA: reproducible mutational signature attribution with confidence based on simulations

Bibliographic details
Main author: Sergey Senkin
Format: Article
Language: English
Published: BMC, 2021
Subjects: MSA, Mutational signatures, NNLS, Parametric bootstrap, Nextflow
Online access: https://doaj.org/article/d101f8dfde724132b54e2266d9c6cf30
Journal: BMC Bioinformatics, Vol 22, Iss 1, Pp 1-11 (2021)
ISSN: 1471-2105
DOI: 10.1186/s12859-021-04450-8 (https://doi.org/10.1186/s12859-021-04450-8)
Classification: Computer applications to medicine. Medical informatics (R858-859.7); Biology (General) (QH301-705.5)

Abstract

Background: Mutational signatures have proved to be a useful tool for identifying patterns of mutations in genomes, often providing valuable insights into mutagenic processes or normal DNA damage. De novo extraction of signatures is commonly performed using Non-Negative Matrix Factorisation methods; however, accurate attribution of these signatures to individual samples is a distinct problem that requires uncertainty estimation, particularly in noisy scenarios or when the acting signatures have similar shapes. Whilst many packages for signature attribution exist, few provide accuracy measures, and most are neither easily reproducible nor scalable in high-performance computing environments.

Results: We present Mutational Signature Attribution (MSA), a reproducible pipeline designed to assign signatures of different mutation types on a single-sample basis, using the Non-Negative Least Squares (NNLS) method with optimisation based on configurable simulations. Parametric bootstrap is proposed as a way to measure the statistical uncertainties of signature attribution. Supported mutation types include single and doublet base substitutions, indels and structural variants. Results are validated using simulations with reference COSMIC signatures, as well as with randomly generated signatures.

Conclusions: MSA is a tool for optimised mutational signature attribution based on simulations, providing confidence intervals using parametric bootstrap. It comprises a set of Python scripts unified in a single Nextflow pipeline, with containerisation for cross-platform reproducibility and scalability in high-performance computing environments. The tool is publicly available from https://gitlab.com/s.senkin/MSA.
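The abstract names two core techniques: attributing known signatures to a single sample by NNLS, and using a parametric bootstrap to put confidence intervals on the attributed exposures. The following is a minimal Python sketch of those two ideas, not MSA's own code (which is available from the GitLab repository above). It uses NumPy and `scipy.optimize.nnls`. The function names `attribute` and `parametric_bootstrap`, the Poisson resampling model, the percentile intervals and the toy setup of 96 channels with three randomly generated signatures are all illustrative assumptions. The simulation-based optimisation that MSA applies on top of NNLS is not reproduced here.

```python
import numpy as np
from scipy.optimize import nnls


def attribute(signatures, counts):
    """Non-negative least squares fit: counts ~= signatures @ exposures, exposures >= 0.

    signatures: (n_channels, n_signatures) array whose columns each sum to 1.
    counts: (n_channels,) array of mutation counts for a single sample.
    Returns the estimated number of mutations attributed to each signature.
    """
    exposures, _residual_norm = nnls(signatures, np.asarray(counts, dtype=float))
    return exposures


def parametric_bootstrap(signatures, counts, n_boot=200, alpha=0.05, seed=0):
    """Point estimate plus percentile intervals from a parametric bootstrap.

    Assumption for illustration: each bootstrap replicate redraws every channel
    from a Poisson distribution centred on the fitted spectrum.
    """
    rng = np.random.default_rng(seed)
    exposures = attribute(signatures, counts)
    fitted_spectrum = signatures @ exposures  # expected counts under the fitted model
    replicates = np.empty((n_boot, signatures.shape[1]))
    for b in range(n_boot):
        resampled = rng.poisson(fitted_spectrum)  # simulate a new sample from the model
        replicates[b] = attribute(signatures, resampled)
    lower = np.quantile(replicates, alpha / 2, axis=0)
    upper = np.quantile(replicates, 1 - alpha / 2, axis=0)
    return exposures, lower, upper


if __name__ == "__main__":
    # Hypothetical toy simulation: 96 channels (as for single base substitutions)
    # and 3 randomly generated signatures, with known true exposures.
    rng = np.random.default_rng(1)
    signatures = rng.random((96, 3))
    signatures /= signatures.sum(axis=0)  # normalise each column to a distribution
    true_exposures = np.array([3000.0, 1500.0, 0.0])
    sample = rng.poisson(signatures @ true_exposures)
    estimate, lower, upper = parametric_bootstrap(signatures, sample)
    for k in range(3):
        print(f"signature {k + 1}: true {true_exposures[k]:.0f}, "
              f"estimated {estimate[k]:.0f} (95% CI {lower[k]:.0f}-{upper[k]:.0f})")
```

With real data, the columns of `signatures` would be reference signatures such as those from COSMIC, and `counts` would be one sample's mutation catalogue. Simulating samples from known exposures, as in the toy example, is also a simple way to check how well attribution recovers the truth when signatures have similar shapes.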