Data Harmonization and Data Pooling from Cohort Studies: A Practical Approach for Data Management

Pooling data from multiple pre-existing datasets can increase a study's sample size and statistical power to answer a research question. However, individual datasets may contain variables that measure the same construct differently, which complicates pooling. Variable harmonizatio...

Bibliographic Details
Main authors: Kamala Adhikari, Scott B Patten, Alka B Patel, Shahirose Premji, Suzanne Tough, Nicole Letourneau, Gerald Giesbrecht, Amy Metcalfe
Format: article
Language: EN
Published: Swansea University 2021
Subjects: Demography. Population. Vital events
Online access: https://doaj.org/article/f0b734d3454b4ffc9f9840f3a51493db
id oai:doaj.org-article:f0b734d3454b4ffc9f9840f3a51493db
record_format dspace
spelling oai:doaj.org-article:f0b734d3454b4ffc9f9840f3a51493db
2021-12-03T15:47:28Z
Data Harmonization and Data Pooling from Cohort Studies: A Practical Approach for Data Management
10.23889/ijpds.v6i1.1680
2399-4908
https://doaj.org/article/f0b734d3454b4ffc9f9840f3a51493db
2021-11-01T00:00:00Z
https://ijpds.org/article/view/1680
https://doaj.org/toc/2399-4908
Kamala Adhikari
Scott B Patten
Alka B Patel
Shahirose Premji
Suzanne Tough
Nicole Letourneau
Gerald Giesbrecht
Amy Metcalfe
Swansea University
article
Demography. Population. Vital events
HB848-3697
EN
International Journal of Population Data Science, Vol 6, Iss 1 (2021)
institution DOAJ
collection DOAJ
language EN
topic Demography. Population. Vital events
HB848-3697
description Pooling data from multiple pre-existing datasets can increase a study's sample size and statistical power to answer a research question. However, individual datasets may contain variables that measure the same construct differently, which complicates pooling. Variable harmonization, an approach that can generate comparable datasets from heterogeneous sources, can address this issue in some circumstances. As an illustrative example, this paper describes the data harmonization strategies that helped generate comparable datasets across two Canadian pregnancy cohort studies: All Our Families and Alberta Pregnancy Outcomes and Nutrition. Variables were harmonized considering multiple features across the datasets: the construct measured; the question asked and response options; the measurement scale used; the frequency of measurement; the timing of measurement; and the data structure. Based on these features, variables were classified as completely matching, partially matching, or completely unmatching across the datasets. Variables that were an exact match were pooled as is. Partially matching variables were synchronized across the datasets considering the frequency of measurement, the timing of measurement, and response options. Variables that were completely unmatching could not be harmonized into a single variable. The variable harmonization strategies used to generate comparable cohort datasets for data pooling are applicable to other data sources. Future studies may employ or evaluate these strategies. Variable harmonization and pooling provide an opportunity to increase study power and the utility of existing data, permitting researchers to answer novel research questions in a statistically efficient, timely, and cost-efficient manner that could not be achieved using a single data source.
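The matching and pooling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `VariableSpec` class, the `classify` and `pool` functions, the smoking variable, and the recoding maps are all hypothetical, and only three of the listed features (construct, response options, timing) are compared to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical codebook entry for one variable in one cohort."""
    construct: str    # what the variable measures
    responses: tuple  # allowed response options
    timing: str       # when it was measured


def classify(a: VariableSpec, b: VariableSpec) -> str:
    """Label a variable pair as 'complete', 'partial', or 'unmatching'."""
    if a.construct != b.construct:
        return "unmatching"
    if a.responses == b.responses and a.timing == b.timing:
        return "complete"
    return "partial"


def pool(rows_a, rows_b, recode_a=None, recode_b=None):
    """Stack (id, value) rows from two cohorts into one list.

    Values are recoded where a map is given, and every row is tagged
    with its source cohort so the origin stays traceable after pooling.
    """
    pooled = []
    for cohort, rows, recode in (("A", rows_a, recode_a), ("B", rows_b, recode_b)):
        for pid, value in rows:
            if recode is not None:
                value = recode[value]
            pooled.append({"cohort": cohort, "id": pid, "value": value})
    return pooled


# Made-up codebook entries, not taken from either cohort study.
smoke_a = VariableSpec("smoking", ("never", "former", "current"), "second trimester")
smoke_b = VariableSpec("smoking", ("no", "yes"), "second trimester")

status = classify(smoke_a, smoke_b)

# Assumed mapping that collapses both codings onto a shared two-level scale.
to_binary_a = {"never": "no", "former": "no", "current": "yes"}
to_binary_b = {"no": "no", "yes": "yes"}
pooled = pool([(1, "former"), (2, "current")], [(7, "yes")], to_binary_a, to_binary_b)
```

Collapsing to the coarser scale loses detail from the richer cohort, which is the usual trade-off when synchronizing partially matching response options. A completely matching pair would be passed to `pool` with no recoding maps, and an unmatching pair would not be pooled at all.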
format article
author Kamala Adhikari
Scott B Patten
Alka B Patel
Shahirose Premji
Suzanne Tough
Nicole Letourneau
Gerald Giesbrecht
Amy Metcalfe
title Data Harmonization and Data Pooling from Cohort Studies: A Practical Approach for Data Management
publisher Swansea University
publishDate 2021
url https://doaj.org/article/f0b734d3454b4ffc9f9840f3a51493db