Effect of size and heterogeneity of samples on biomarker discovery: synthetic and real data assessment.

Bibliographic Details
Main authors: Barbara Di Camillo, Tiziana Sanavia, Matteo Martini, Giuseppe Jurman, Francesco Sambo, Annalisa Barla, Margherita Squillario, Cesare Furlanello, Gianna Toffolo, Claudio Cobelli
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2012
Subjects: Medicine (R), Science (Q)
Online access: https://doaj.org/article/a29daff6dcd743e2be8cde112e3346ea
id oai:doaj.org-article:a29daff6dcd743e2be8cde112e3346ea
record_format dspace
spelling oai:doaj.org-article:a29daff6dcd743e2be8cde112e3346ea
Record timestamp: 2021-11-18T07:26:10Z
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0032200
Publication date: 2012-01-01T00:00:00Z
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22403633/pdf/?tool=EBI
Journal: https://doaj.org/toc/1932-6203
Source: PLoS ONE, Vol 7, Iss 3, p e32200 (2012)
institution DOAJ
collection DOAJ
language EN
topic Medicine (R); Science (Q)
description
Motivation: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are attributable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of the experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, on both simulated data (generated through an in silico regulation network model) and real clinical datasets, the consistency of the candidate biomarkers provided by a number of different methods.
Methods: We extensively simulated the effect of the heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both the intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling the evolution of a pool of subjects; a subset of them then underwent alterations in regulatory mechanisms so as to mimic the disease state.
Results: The simulated data allowed us to outline the advantages and drawbacks of different methods across multiple studies and varying numbers of samples, and to evaluate the precision of feature selection on a benchmark with known biomarkers. Although the different methods reached comparable classification accuracy, the use of external cross-validation loops helps in finding features with a higher degree of precision and stability. Application to real data confirmed these results.
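The idea of running feature selection inside an external cross-validation loop, and then scoring the resulting lists for precision and stability, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data generator, the t-like ranking score, the use of mean pairwise Jaccard overlap as a stability measure, and all function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def make_data(n_samples, n_features, n_true, shift, rng):
    """Toy two-class data (assumption): the first n_true features are shifted in class 1."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(shift if (label and j < n_true) else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def top_k_features(X, y, k):
    """Rank features by a simple t-like score computed on the given samples only."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(a) + pstdev(b) + 1e-9  # small constant avoids division by zero
        scores.append((abs(mean(a) - mean(b)) / spread, j))
    return {j for _, j in sorted(scores, reverse=True)[:k]}


def external_cv_selection(X, y, k, n_folds, rng):
    """Select features on each training split of an outer CV loop (held-out samples unused)."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    selected = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        selected.append(top_k_features([X[i] for i in train],
                                       [y[i] for i in train], k))
    return selected


def stability(lists):
    """Mean pairwise Jaccard overlap between the selected feature sets."""
    return mean(len(a & b) / len(a | b) for a, b in combinations(lists, 2))


def precision(lists, true_features):
    """Mean fraction of selected features that are known (simulated) biomarkers."""
    return mean(len(s & true_features) / len(s) for s in lists)


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(n_samples=40, n_features=50, n_true=5, shift=3.0, rng=rng)
    lists = external_cv_selection(X, y, k=5, n_folds=5, rng=rng)
    print("stability:", stability(lists))
    print("precision:", precision(lists, set(range(5))))
```

Because the simulated biomarkers are known in advance, precision can be computed directly, which mirrors the benchmark setting the abstract describes. Lowering `n_samples` or `shift` in the toy generator is one way to explore how small or heterogeneous samples reduce the overlap between lists.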