Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants

Abstract Disease and trait-associated variants represent a tiny minority of all known genetic variation, and therefore there is necessarily an imbalance between the small set of available disease-associated and the much larger set of non-deleterious genomic variation, especially in non-coding regula...

Bibliographic Details
Main Authors:	Max Schubach, Matteo Re, Peter N. Robinson, Giorgio Valentini
Format:	article
Language:	EN
Published:	Nature Portfolio 2017
Subjects:	Medicine R Science Q
Online Access:	https://doaj.org/article/cb143f66e0ab410fb2077b350ce7d69e

id	oai:doaj.org-article:cb143f66e0ab410fb2077b350ce7d69e
record_format	dspace
spelling	oai:doaj.org-article:cb143f66e0ab410fb2077b350ce7d69e | 2021-12-02T11:53:08Z | Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants | 10.1038/s41598-017-03011-5 | 2045-2322 | https://doaj.org/article/cb143f66e0ab410fb2077b350ce7d69e | 2017-06-01T00:00:00Z | https://doi.org/10.1038/s41598-017-03011-5 | https://doaj.org/toc/2045-2322 | Abstract Disease and trait-associated variants represent a tiny minority of all known genetic variation, and therefore there is necessarily an imbalance between the small set of available disease-associated and the much larger set of non-deleterious genomic variation, especially in non-coding regulatory regions of human genome. Machine Learning (ML) methods for predicting disease-associated non-coding variants are faced with a chicken and egg problem - such variants cannot be easily found without ML, but ML cannot begin to be effective until a sufficient number of instances have been found. Most of state-of-the-art ML-based methods do not adopt specific imbalance-aware learning techniques to deal with imbalanced data that naturally arise in several genome-wide variant scoring problems, thus resulting in a significant reduction of sensitivity and precision. We present a novel method that adopts imbalance-aware learning strategies based on resampling techniques and a hyper-ensemble approach that outperforms state-of-the-art methods in two different contexts: the prediction of non-coding variants associated with Mendelian and with complex diseases. We show that imbalance-aware ML is a key issue for the design of robust and accurate prediction algorithms and we provide a method and an easy-to-use software tool that can be effectively applied to this challenging prediction task. | Max Schubach | Matteo Re | Peter N. Robinson | Giorgio Valentini | Nature Portfolio | article | Medicine | R | Science | Q | EN | Scientific Reports, Vol 7, Iss 1, Pp 1-12 (2017)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Science Q
spellingShingle	Medicine R Science Q Max Schubach Matteo Re Peter N. Robinson Giorgio Valentini Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants
description	Abstract Disease and trait-associated variants represent a tiny minority of all known genetic variation, and therefore there is necessarily an imbalance between the small set of available disease-associated and the much larger set of non-deleterious genomic variation, especially in non-coding regulatory regions of human genome. Machine Learning (ML) methods for predicting disease-associated non-coding variants are faced with a chicken and egg problem - such variants cannot be easily found without ML, but ML cannot begin to be effective until a sufficient number of instances have been found. Most of state-of-the-art ML-based methods do not adopt specific imbalance-aware learning techniques to deal with imbalanced data that naturally arise in several genome-wide variant scoring problems, thus resulting in a significant reduction of sensitivity and precision. We present a novel method that adopts imbalance-aware learning strategies based on resampling techniques and a hyper-ensemble approach that outperforms state-of-the-art methods in two different contexts: the prediction of non-coding variants associated with Mendelian and with complex diseases. We show that imbalance-aware ML is a key issue for the design of robust and accurate prediction algorithms and we provide a method and an easy-to-use software tool that can be effectively applied to this challenging prediction task.
format	article
author	Max Schubach Matteo Re Peter N. Robinson Giorgio Valentini
author_facet	Max Schubach Matteo Re Peter N. Robinson Giorgio Valentini
author_sort	Max Schubach
title	Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants
title_short	Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants
title_full	Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants
title_fullStr	Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants
title_full_unstemmed	Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants
title_sort	imbalance-aware machine learning for predicting rare and common disease-associated non-coding variants
publisher	Nature Portfolio
publishDate	2017
url	https://doaj.org/article/cb143f66e0ab410fb2077b350ce7d69e

Similar Items