MSA: reproducible mutational signature attribution with confidence based on simulations

Abstract Background Mutational signatures proved to be a useful tool for identifying patterns of mutations in genomes, often providing valuable insights about mutagenic processes or normal DNA damage. De novo extraction of signatures is commonly performed using Non-Negative Matrix Factorisation meth...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autor principal: Sergey Senkin
Formato: article
Lenguaje:EN
Publicado: BMC 2021
Materias:
MSA
Acceso en línea:https://doaj.org/article/d101f8dfde724132b54e2266d9c6cf30
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
Descripción
Sumario:Abstract Background Mutational signatures proved to be a useful tool for identifying patterns of mutations in genomes, often providing valuable insights about mutagenic processes or normal DNA damage. De novo extraction of signatures is commonly performed using Non-Negative Matrix Factorisation methods, however, accurate attribution of these signatures to individual samples is a distinct problem requiring uncertainty estimation, particularly in noisy scenarios or when the acting signatures have similar shapes. Whilst many packages for signature attribution exist, a few provide accuracy measures, and most are not easily reproducible nor scalable in high-performance computing environments. Results We present Mutational Signature Attribution (MSA), a reproducible pipeline designed to assign signatures of different mutation types on a single-sample basis, using Non-Negative Least Squares method with optimisation based on configurable simulations. Parametric bootstrap is proposed as a way to measure statistical uncertainties of signature attribution. Supported mutation types include single and doublet base substitutions, indels and structural variants. Results are validated using simulations with reference COSMIC signatures, as well as randomly generated signatures. Conclusions MSA is a tool for optimised mutational signature attribution based on simulations, providing confidence intervals using parametric bootstrap. It comprises a set of Python scripts unified in a single Nextflow pipeline with containerisation for cross-platform reproducibility and scalability in high-performance computing environments. The tool is publicly available from https://gitlab.com/s.senkin/MSA .