Machine learning applications for prediction of relapse in childhood acute lymphoblastic leukemia

Abstract The prediction of relapse in childhood acute lymphoblastic leukemia (ALL) is a critical factor for successful treatment and follow-up planning. Our goal was to construct an ALL relapse prediction model based on machine learning algorithms. Monte Carlo cross-validation nested by 10-fold cros...

Bibliographic Details
Main authors: Liyan Pan, Guangjian Liu, Fangqin Lin, Shuling Zhong, Huimin Xia, Xin Sun, Huiying Liang
Format: article
Language: EN
Published: Nature Portfolio 2017
Subjects:
R
Q
Online access: https://doaj.org/article/dea91707cbc6485f9137aa31ade2002e
id oai:doaj.org-article:dea91707cbc6485f9137aa31ade2002e
record_format dspace
spelling oai:doaj.org-article:dea91707cbc6485f9137aa31ade2002e
2021-12-02T16:08:10Z
Machine learning applications for prediction of relapse in childhood acute lymphoblastic leukemia
10.1038/s41598-017-07408-0
2045-2322
https://doaj.org/article/dea91707cbc6485f9137aa31ade2002e
2017-08-01T00:00:00Z
https://doi.org/10.1038/s41598-017-07408-0
https://doaj.org/toc/2045-2322
Abstract The prediction of relapse in childhood acute lymphoblastic leukemia (ALL) is a critical factor for successful treatment and follow-up planning. Our goal was to construct an ALL relapse prediction model based on machine learning algorithms. Monte Carlo cross-validation nested by 10-fold cross-validation was used to rank clinical variables on the randomly split training sets of 336 newly diagnosed ALL children, and a forward feature selection algorithm was employed to find the shortest list of most discriminatory variables. To enable an unbiased estimation of the prediction model to new patients, besides the split test sets of 150 patients, we introduced another independent data set of 84 patients to evaluate the model. The Random Forest model with 14 features achieved a cross-validation accuracy of 0.827 ± 0.031 on one set and an accuracy of 0.798 on the other, with the area under the curve of 0.902 ± 0.027 and 0.904, respectively. The model performed well across different risk-level groups, with the best accuracy of 0.829 in the standard-risk group. To our knowledge, this is the first study to use machine learning models to predict childhood ALL relapse based on medical data from Electronic Medical Record, which will further facilitate stratification treatments.
Liyan Pan
Guangjian Liu
Fangqin Lin
Shuling Zhong
Huimin Xia
Xin Sun
Huiying Liang
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 7, Iss 1, Pp 1-9 (2017)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract The prediction of relapse in childhood acute lymphoblastic leukemia (ALL) is a critical factor for successful treatment and follow-up planning. Our goal was to construct an ALL relapse prediction model based on machine learning algorithms. Monte Carlo cross-validation nested by 10-fold cross-validation was used to rank clinical variables on the randomly split training sets of 336 newly diagnosed ALL children, and a forward feature selection algorithm was employed to find the shortest list of most discriminatory variables. To enable an unbiased estimation of the prediction model to new patients, besides the split test sets of 150 patients, we introduced another independent data set of 84 patients to evaluate the model. The Random Forest model with 14 features achieved a cross-validation accuracy of 0.827 ± 0.031 on one set and an accuracy of 0.798 on the other, with the area under the curve of 0.902 ± 0.027 and 0.904, respectively. The model performed well across different risk-level groups, with the best accuracy of 0.829 in the standard-risk group. To our knowledge, this is the first study to use machine learning models to predict childhood ALL relapse based on medical data from Electronic Medical Record, which will further facilitate stratification treatments.
format article
author Liyan Pan
Guangjian Liu
Fangqin Lin
Shuling Zhong
Huimin Xia
Xin Sun
Huiying Liang
author_sort Liyan Pan
title Machine learning applications for prediction of relapse in childhood acute lymphoblastic leukemia
publisher Nature Portfolio
publishDate 2017
url https://doaj.org/article/dea91707cbc6485f9137aa31ade2002e
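
Illustrative sketch (not part of the catalog record). The abstract describes ranking clinical variables with Monte Carlo cross-validation nested by 10-fold cross-validation on random 336/150 training/test splits, followed by forward feature selection to find the shortest discriminatory variable list. The Python below is a minimal sketch of that workflow under stated assumptions, not the authors' implementation. The data are synthetic, the mean-gap ranking score and the nearest-centroid classifier are simple stand-ins for the paper's Random Forest, every function name is hypothetical, and the way the two cross-validation schemes are nested is only one plausible reading of the abstract.

```python
import random

N_PATIENTS, N_TEST, N_FOLDS = 486, 150, 10  # 336 training + 150 test, per the abstract


def avg(values):
    values = list(values)
    return sum(values) / len(values)


def make_toy_data(rng, n=N_PATIENTS, n_vars=6):
    """Synthetic stand-in for clinical data: only variables 0 and 1 carry signal."""
    y = [rng.randint(0, 1) for _ in range(n)]
    shift = [2.0, 1.0] + [0.0] * (n_vars - 2)
    X = [[rng.gauss(shift[j] * label, 1.0) for j in range(n_vars)] for label in y]
    return X, y


def monte_carlo_split(n, n_test, rng):
    """One Monte Carlo repetition: a fresh random train/test split of range(n)."""
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[n_test:], idx[:n_test]


def k_folds(indices, k=N_FOLDS):
    """Partition indices into k nearly equal folds."""
    return [indices[i::k] for i in range(k)]


def cv_pairs(indices, k=N_FOLDS):
    """Yield (fit_rows, held_out_rows) for each of the k folds."""
    folds = k_folds(indices, k)
    for h in range(k):
        fit = [i for f, fold in enumerate(folds) if f != h for i in fold]
        yield fit, folds[h]


def rank_variables(X, y, train):
    """Rank variables by a toy score (class-mean gap) summed over the inner folds."""
    n_vars = len(X[0])
    scores = [0.0] * n_vars
    for fit, _ in cv_pairs(train):
        for j in range(n_vars):
            pos = avg(X[i][j] for i in fit if y[i] == 1)
            neg = avg(X[i][j] for i in fit if y[i] == 0)
            scores[j] += abs(pos - neg)
    return sorted(range(n_vars), key=lambda j: -scores[j])


def accuracy(X, y, fit, held_out, feats):
    """Nearest-centroid accuracy on held_out rows, using only the variables in feats."""
    cents = {c: [avg(X[i][j] for i in fit if y[i] == c) for j in feats] for c in (0, 1)}

    def predict(i):
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, cents[c])) for c in cents}
        return min(dist, key=dist.get)

    return avg(predict(i) == y[i] for i in held_out)


def forward_select(X, y, train, ranking):
    """Add variables in rank order; keep the shortest list with the best inner-CV accuracy."""
    best_acc, best = -1.0, []
    for k in range(1, len(ranking) + 1):
        feats = ranking[:k]
        acc = avg(accuracy(X, y, fit, val, feats) for fit, val in cv_pairs(train))
        if acc > best_acc:  # strict ">" keeps the shorter list on ties
            best_acc, best = acc, feats
    return best


def run(n_repeats=5, seed=0):
    """Repeat split -> rank -> select -> test; return (selected variables, test accuracy)."""
    rng = random.Random(seed)
    X, y = make_toy_data(rng)
    results = []
    for _ in range(n_repeats):
        train, test = monte_carlo_split(N_PATIENTS, N_TEST, rng)
        ranking = rank_variables(X, y, train)
        feats = forward_select(X, y, train, ranking)
        results.append((feats, accuracy(X, y, train, test, feats)))
    return results


if __name__ == "__main__":
    for feats, acc in run():
        print(feats, round(acc, 3))
```

In this sketch, variable ranking and forward selection see only the training rows of each split, so the 150 held-out rows are used once, for the final accuracy estimate. Averaging the per-split accuracies over the repetitions would give a mean ± standard deviation figure of the kind the abstract reports.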