Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence

The Developmental Origins of Health and Disease (DOHaD) framework aims to understand how early life exposures shape lifecycle health. To date, no comprehensive list of these exposures and their interactions has been developed, which limits our ability to predict trajectories of risk and resiliency i...

Bibliographic Details
Main Authors: Shrankhala Tewari, Pablo Toledo Margalef, Ayesha Kareem, Ayah Abdul-Hussein, Marina White, Ashley Wazana, Sandra T. Davidge, Claudio Delrieux, Kristin L. Connor
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
R
Online Access: https://doaj.org/article/a838101fc2b0461d8266cd34590531e7
id oai:doaj.org-article:a838101fc2b0461d8266cd34590531e7
record_format dspace
spelling oai:doaj.org-article:a838101fc2b0461d8266cd34590531e7
2021-11-25T18:06:50Z
10.3390/jpm11111064
2075-4426
https://doaj.org/article/a838101fc2b0461d8266cd34590531e7
2021-10-01T00:00:00Z
https://www.mdpi.com/2075-4426/11/11/1064
https://doaj.org/toc/2075-4426
Journal of Personalized Medicine, Vol 11, Iss 1064, p 1064 (2021)
institution DOAJ
collection DOAJ
language EN
topic Developmental Origins of Health and Disease
developmental programming
machine learning
natural language processing
text mining
Medicine
R
description The Developmental Origins of Health and Disease (DOHaD) framework aims to understand how early life exposures shape lifecycle health. To date, no comprehensive list of these exposures and their interactions has been developed, which limits our ability to predict trajectories of risk and resiliency in humans. To address this gap, we developed a model that uses text-mining, machine learning, and natural language processing approaches to automate search, data extraction, and content analysis from DOHaD-related research articles available in PubMed. Our first model captured 2469 articles, which were subsequently categorised into topics based on word frequencies within the titles and abstracts. A manual screening validated 848 of these as relevant, which were used to develop a revised model that finally captured 2098 articles that largely fell under the most prominently researched domains related to our specific DOHaD focus. The articles were clustered according to latent topic extraction, and 23 experts in the field independently labelled the perceived topics. Consensus analysis on this labelling yielded mostly from fair to substantial agreement, which demonstrates that automated models can be developed to successfully retrieve and classify research literature, as a first step to gather evidence related to DOHaD risk and resilience factors that influence later life human health.
format article
author Shrankhala Tewari
Pablo Toledo Margalef
Ayesha Kareem
Ayah Abdul-Hussein
Marina White
Ashley Wazana
Sandra T. Davidge
Claudio Delrieux
Kristin L. Connor
author_sort Shrankhala Tewari
title Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/a838101fc2b0461d8266cd34590531e7