Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling.
Identifying relevant signatures for clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments consider...
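Main authors: | Sangkyun Lee, Jörg Rahnenführer, Michel Lang, Katleen De Preter, Pieter Mestdagh, Jan Koster, Rogier Versteeg, Raymond L Stallings, Luigi Varesio, Shahab Asgharzadeh, Johannes H Schulte, Kathrin Fielitz, Melanie Schwermer, Katharina Morik, Alexander Schramm |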
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2014 |
Subjects: | Medicine (R); Science (Q) |
Online access: | https://doaj.org/article/7a99a75d80854110a1d4c93ea3186528 |
id |
oai:doaj.org-article:7a99a75d80854110a1d4c93ea3186528 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:7a99a75d80854110a1d4c93ea3186528; 2021-11-25T05:57:35Z; Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling.; ISSN 1932-6203; DOI 10.1371/journal.pone.0108818; https://doaj.org/article/7a99a75d80854110a1d4c93ea3186528; 2014-01-01T00:00:00Z; https://doi.org/10.1371/journal.pone.0108818; https://doaj.org/toc/1932-6203; Public Library of Science (PLoS); article; Medicine (R); Science (Q); EN; PLoS ONE, Vol 9, Iss 10, p e108818 (2014) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Medicine (R); Science (Q) |
spellingShingle |
Medicine (R); Science (Q); Sangkyun Lee; Jörg Rahnenführer; Michel Lang; Katleen De Preter; Pieter Mestdagh; Jan Koster; Rogier Versteeg; Raymond L Stallings; Luigi Varesio; Shahab Asgharzadeh; Johannes H Schulte; Kathrin Fielitz; Melanie Schwermer; Katharina Morik; Alexander Schramm; Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling. |
description |
Identifying relevant signatures for clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments considering samples with the same type of disease. The lack of a consensus is mostly due to the fact that sample sizes are far smaller than the numbers of candidate features to be considered, and therefore signature selection suffers from large variation. We propose a robust signature selection method that enhances the selection stability of penalized regression algorithms for predicting survival risk. Our method is based on an aggregation of multiple, possibly unstable, signatures obtained with the preconditioned lasso algorithm applied to random (internal) subsamples of a given cohort data, where the aggregated signature is shrunken by a simple thresholding strategy. The resulting method, RS-PL, is conceptually simple and easy to apply, relying on parameters automatically tuned by cross validation. Robust signature selection using RS-PL operates within an (external) subsampling framework to estimate the selection probabilities of features in multiple trials of RS-PL. These probabilities are used for identifying reliable features to be included in a signature. Our method was evaluated on microarray data sets from neuroblastoma, lung adenocarcinoma, and breast cancer patients, extracting robust and relevant signatures for predicting survival risk. Signatures obtained by our method achieved high prediction performance and robustness, consistently over the three data sets. Genes with high selection probability in our robust signatures have been reported as cancer-relevant. 
The ordering of predictor coefficients associated with signatures was well-preserved across multiple trials of RS-PL, demonstrating the capability of our method for identifying a transferable consensus signature. The software is available as an R package rsig at CRAN (http://cran.r-project.org). |
format |
article |
author |
Sangkyun Lee; Jörg Rahnenführer; Michel Lang; Katleen De Preter; Pieter Mestdagh; Jan Koster; Rogier Versteeg; Raymond L Stallings; Luigi Varesio; Shahab Asgharzadeh; Johannes H Schulte; Kathrin Fielitz; Melanie Schwermer; Katharina Morik; Alexander Schramm |
author_sort |
Sangkyun Lee |
title |
Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling. |
title_sort |
robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling. |
publisher |
Public Library of Science (PLoS) |
publishDate |
2014 |
url |
https://doaj.org/article/7a99a75d80854110a1d4c93ea3186528 |
_version_ |
1718414349207339008 |
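
The two-level subsampling idea summarized in the description can be illustrated with a small sketch. This is not the rsig package and not the authors' implementation: the preconditioned lasso is replaced here by a much simpler stand-in selector (top-k features by absolute correlation), the outcome is an uncensored continuous value rather than survival data, and every function name, parameter, and the toy data generator are hypothetical choices made for illustration. The inner level aggregates signatures from random subsamples and shrinks them by thresholding; the outer level repeats that procedure to estimate per-feature selection probabilities.

```python
import random
from collections import Counter


def make_toy_data(n=80, p=10, seed=1):
    """Hypothetical toy data: only features 0 and 1 influence the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] - 3 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y


def top_k_by_correlation(X, y, rows, k):
    """Stand-in base selector (NOT the preconditioned lasso):
    the k features with the largest absolute correlation with y."""
    n = len(rows)
    y_sub = [y[i] for i in rows]
    y_mean = sum(y_sub) / n
    var_y = sum((t - y_mean) ** 2 for t in y_sub)
    scores = []
    for j in range(len(X[0])):
        col = [X[i][j] for i in rows]
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y_sub))
        var_c = sum((c - c_mean) ** 2 for c in col)
        denom = (var_c * var_y) ** 0.5
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return set(ranked[:k])


def aggregated_signature(X, y, rows, k, n_inner, threshold, rng):
    """Inner level: select on random half-subsamples, aggregate the
    selections, and keep features chosen in at least `threshold` of them."""
    counts = Counter()
    for _ in range(n_inner):
        sub = rng.sample(rows, len(rows) // 2)
        counts.update(top_k_by_correlation(X, y, sub, k))
    return {j for j, c in counts.items() if c / n_inner >= threshold}


def selection_probabilities(X, y, k=2, n_outer=20, n_inner=20,
                            threshold=0.5, seed=0):
    """Outer level: rerun the inner procedure on random subsamples of the
    cohort and report how often each feature ends up in the signature."""
    rng = random.Random(seed)
    rows = list(range(len(X)))
    counts = Counter()
    for _ in range(n_outer):
        outer = rng.sample(rows, (2 * len(rows)) // 3)
        counts.update(
            aggregated_signature(X, y, outer, k, n_inner, threshold, rng)
        )
    return [counts[j] / n_outer for j in range(len(X[0]))]


if __name__ == "__main__":
    X, y = make_toy_data()
    for j, prob in enumerate(selection_probabilities(X, y)):
        print(f"feature {j}: selection probability {prob:.2f}")
```

Features whose estimated selection probability stays high across the outer trials would be the candidates for a robust signature. The subsample fractions (one half, two thirds) and the 0.5 threshold are arbitrary illustrative values, not settings taken from the article.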