Regularized machine learning on molecular graph model explains systematic error in DFT enthalpies

Abstract A major goal of materials research is the discovery of novel and efficient heterogeneous catalysts for various chemical processes. In such studies, the candidate catalyst material is modeled using tens to thousands of chemical species and elementary reactions. Density Functional Theory (DFT...

Bibliographic Details
Main authors: Himaghna Bhattacharjee, Nikolaos Anesiadis, Dionisios G. Vlachos
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/aacb8c675ade43e7b47d6c552499e479
id oai:doaj.org-article:aacb8c675ade43e7b47d6c552499e479
record_format dspace
spelling oai:doaj.org-article:aacb8c675ade43e7b47d6c552499e479
2021-12-02T16:14:09Z
Regularized machine learning on molecular graph model explains systematic error in DFT enthalpies
10.1038/s41598-021-93854-w
2045-2322
https://doaj.org/article/aacb8c675ade43e7b47d6c552499e479
2021-07-01T00:00:00Z
https://doi.org/10.1038/s41598-021-93854-w
https://doaj.org/toc/2045-2322
Himaghna Bhattacharjee
Nikolaos Anesiadis
Dionisios G. Vlachos
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-10 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract A major goal of materials research is the discovery of novel and efficient heterogeneous catalysts for various chemical processes. In such studies, the candidate catalyst material is modeled using tens to thousands of chemical species and elementary reactions. Density Functional Theory (DFT) is widely used to calculate the thermochemistry of these species, which may be surface species or gas-phase molecules. The use of an approximate exchange-correlation functional in the DFT framework introduces an important source of error in such models. This is especially true for gas-phase molecules, whose thermochemistry is calculated using the same plane-wave basis set as the rest of the surface mechanism. Unfortunately, the nature and magnitude of these errors are unknown for most practical molecules. Here, we investigate the error in the enthalpy of formation for 1676 gaseous species using two different DFT levels of theory and the ‘ground truth values’ obtained from the NIST database. We featurize molecules using graph theory. We use a regularized algorithm to discover a sparse model of the error and identify important molecular fragments that drive this error. The model is robust to rigorous statistical tests and is used to correct DFT thermochemistry, achieving more than an order of magnitude improvement.
format article
author Himaghna Bhattacharjee
Nikolaos Anesiadis
Dionisios G. Vlachos
title Regularized machine learning on molecular graph model explains systematic error in DFT enthalpies
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/aacb8c675ade43e7b47d6c552499e479
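
Illustrative sketch (not part of the catalog record): the abstract in the description field mentions featurizing molecules, fitting a regularized sparse model of the DFT error, and using it to correct DFT thermochemistry. The record does not say which regularizer or featurization the authors used, so the code below is only a minimal sketch under stated assumptions: an L1 (lasso) penalty solved by cyclic coordinate descent, and features that are simple fragment counts per molecule. The fragment labels, the count matrix, the weights, and the function names are all hypothetical and synthetic; none of them come from the paper or from the NIST database.

```python
"""Toy lasso fit of an enthalpy error on molecular fragment counts.

Assumption: L1-regularized least squares stands in for the unspecified
"regularized algorithm". All names and numbers here are synthetic.
"""


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within +/- threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=500):
    """Minimize (1/(2n)) * ||y - X w||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - X w while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # fragment never occurs, so its weight stays 0
            # Correlation of column j with the residual that excludes fragment j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)
            ) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def predict_error(X, w):
    """Predicted error per molecule: weighted sum of its fragment counts."""
    return [sum(c * w_j for c, w_j in zip(row, w)) for row in X]


def correct(dft_values, X, w):
    """Subtract the modeled error from the raw DFT values."""
    return [h - e for h, e in zip(dft_values, predict_error(X, w))]


# Hypothetical fragment labels and synthetic counts (rows = molecules).
FRAGMENTS = ["C=O", "C-C", "O-H", "C-H"]
COUNTS = [
    [1, 0, 0, 2],
    [0, 1, 1, 0],
    [2, 1, 0, 0],
    [0, 0, 2, 1],
    [1, 2, 0, 1],
    [0, 1, 1, 2],
]
# Synthetic ground truth: only two fragments drive the error.
TRUE_WEIGHTS = [2.0, 0.0, -1.5, 0.0]
errors = predict_error(COUNTS, TRUE_WEIGHTS)

weights = lasso_coordinate_descent(COUNTS, errors, alpha=0.05)
selected = [name for name, w in zip(FRAGMENTS, weights) if w != 0.0]
print(selected, weights)
```

On this synthetic data the penalty drives the weights of the two irrelevant fragments to exactly zero and slightly shrinks the two relevant ones, which is the sense in which a regularized fit can "identify important molecular fragments". A real analysis would need actual DFT and reference enthalpies, a proper featurization, and a validated choice of alpha.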