PaIRKAT: A pathway integrated regression-based kernel association test with applications to metabolomics and COPD phenotypes
High-throughput data such as metabolomics, genomics, transcriptomics, and proteomics have become familiar data types within the “-omics” family. For this work, we focus on subsets that interact with one another and represent these “pathways” as graphs. Observed pathways often have disjoint component...
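Main authors: | Charlie M. Carpenter, Weiming Zhang, Lucas Gillenwater, Cameron Severn, Tusharkanti Ghosh, Russell Bowler, Katerina Kechris, Debashis Ghosh |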
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2021 |
Subjects: | Biology (General) |
Online access: | https://doaj.org/article/9dd656448a8744ffa1cf7eae39132c65 |
id |
oai:doaj.org-article:9dd656448a8744ffa1cf7eae39132c65 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:9dd656448a8744ffa1cf7eae39132c65; 2021-11-11T05:49:24Z; PaIRKAT: A pathway integrated regression-based kernel association test with applications to metabolomics and COPD phenotypes; 1553-734X; 1553-7358; https://doaj.org/article/9dd656448a8744ffa1cf7eae39132c65; 2021-10-01T00:00:00Z; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565741/?tool=EBI; https://doaj.org/toc/1553-734X; https://doaj.org/toc/1553-7358; Charlie M. Carpenter; Weiming Zhang; Lucas Gillenwater; Cameron Severn; Tusharkanti Ghosh; Russell Bowler; Katerina Kechris; Debashis Ghosh; Public Library of Science (PLoS); article; Biology (General); QH301-705.5; EN; PLoS Computational Biology, Vol 17, Iss 10 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Biology (General); QH301-705.5 |
description |
High-throughput data such as metabolomics, genomics, transcriptomics, and proteomics have become familiar data types within the “-omics” family. For this work, we focus on subsets that interact with one another and represent these “pathways” as graphs. Observed pathways often have disjoint components, i.e., nodes or sets of nodes (metabolites, etc.) not connected to any other within the pathway, which notably lessens testing power. In this paper we propose the Pathway Integrated Regression-based Kernel Association Test (PaIRKAT), a new kernel machine regression method for incorporating known pathway information into the semi-parametric kernel regression framework. This work extends previous kernel machine approaches. This paper also contributes an application of a graph kernel regularization method for overcoming disconnected pathways. By incorporating a regularized or “smoothed” graph into a score test, PaIRKAT can provide more powerful tests for associations between biological pathways and phenotypes of interest and will be helpful in identifying novel pathways for targeted clinical research. We evaluate this method through several simulation studies and an application to real metabolomics data from the COPDGene study. Our simulation studies illustrate the robustness of this method to incorrect and incomplete pathway knowledge, and the real data analysis shows meaningful improvements in testing power for pathways. PaIRKAT was developed for application to metabolomic pathway data, but the techniques are easily generalizable to other data sources with a graph-like structure.
Author summary: PaIRKAT is a tool for improving testing power on high-dimensional data by including graph topography in the kernel machine regression setting. Studies on high-dimensional data can struggle to include the complex relationships between variables. The semi-parametric kernel machine regression model is a powerful tool for capturing these types of relationships. It provides a framework for testing for relationships between outcomes of interest and high-dimensional data such as metabolomic, genomic, or proteomic pathways. Our paper proposes a kernel machine method for including known biological connections between high-dimensional variables by representing them as edges of ‘graphs’ or ‘networks.’ It is common for nodes (e.g., metabolites) to be disconnected from all others within the graph, which leads to meaningful decreases in testing power when graph information is ignored. We include a graph regularization or ‘smoothing’ approach for managing this issue. We demonstrate the benefits of this approach through simulation studies and an application to the metabolomic data from the COPDGene study. |
format |
article |
author |
Charlie M. Carpenter, Weiming Zhang, Lucas Gillenwater, Cameron Severn, Tusharkanti Ghosh, Russell Bowler, Katerina Kechris, Debashis Ghosh |
author_sort |
Charlie M. Carpenter |
title |
PaIRKAT: A pathway integrated regression-based kernel association test with applications to metabolomics and COPD phenotypes |
publisher |
Public Library of Science (PLoS) |
publishDate |
2021 |
url |
https://doaj.org/article/9dd656448a8744ffa1cf7eae39132c65 |
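The abstract in this record describes folding a regularized ("smoothed") pathway graph into a kernel machine score test. The record itself gives no formulas, so the following Python sketch is only an illustration of that general idea and not the authors' implementation. It assumes a combinatorial graph Laplacian, a regularizer of the form (I + τL)⁻¹, a linear kernel, and a permutation p-value in place of an analytic null distribution. All function names and the parameter `tau` are hypothetical.

```python
import numpy as np


def regularized_laplacian(adjacency, tau=1.0):
    """Return (I + tau * L)^-1, where L = D - A is the combinatorial Laplacian.

    Assumed form of graph smoothing. Isolated nodes have an all-zero
    Laplacian row, so they map to themselves (diagonal entry 1).
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(np.eye(A.shape[0]) + tau * L)


def smoothed_linear_kernel(X, adjacency, tau=1.0):
    """Linear kernel on features smoothed over the pathway graph."""
    Z = np.asarray(X, dtype=float) @ regularized_laplacian(adjacency, tau)
    return Z @ Z.T


def null_residuals(y, covariates):
    """Residuals from an ordinary least squares fit of y on covariates."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def score_statistic(residuals, K):
    """Unscaled quadratic-form statistic r^T K r."""
    return float(residuals @ K @ residuals)


def permutation_pvalue(y, covariates, K, n_perm=999, seed=0):
    """Approximate p-value by permuting null-model residuals (a stand-in
    for an analytic null distribution, chosen here for simplicity)."""
    rng = np.random.default_rng(seed)
    r = null_residuals(y, covariates)
    observed = score_statistic(r, K)
    exceed = sum(
        score_statistic(rng.permutation(r), K) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

In this sketch, a metabolite with no edges keeps its own signal unchanged, while connected metabolites share signal with their neighbours; larger values of `tau` smooth more strongly.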