Dirty engineering data-driven inverse prediction machine learning model

Bibliographic Details
Main Authors: Jin-Woong Lee, Woon Bae Park, Byung Do Lee, Seonghwan Kim, Nam Hoon Goo, Kee-Sun Sohn
Format: article
Language: EN
Published: Nature Portfolio 2020
Subjects:
R
Q
Online Access: https://doaj.org/article/97f45af18b5640c5a76a5470b7e82b31
Description
Summary: Most data-driven machine learning (ML) approaches in metallurgy research focus on building reliable quantitative models that predict a material property from a given set of material conditions. In general, the input feature dimension (the number of material condition variables) is much higher than the output feature dimension (the number of material properties of concern). Beyond such forward-prediction ML models, inverse-design modeling is needed, in which the required material conditions are deduced from a set of desired material properties. Here we report a novel inverse design strategy that employs two independent approaches: a metaheuristics-assisted inverse reading of conventional forward ML models, and an atypical inverse ML model based on a modified variational autoencoder. Both approaches were successful and led to overlapping results, from which we pinpointed several novel thermo-mechanically controlled processed (TMCP) steel alloy candidates that were validated with a rule-based thermodynamic calculation tool (Thermo-Calc). We also suggest a practical protocol for treating engineering data collected from industry, which are not prepared as independent and identically distributed (IID) random data.
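
The first approach named in the summary, a metaheuristics-assisted inverse reading of a forward model, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the article: the forward model is a hand-written toy function standing in for a trained ML model, the three inputs (carbon, manganese, cooling temperature) and their bounds are hypothetical, and differential evolution is swapped in as a generic metaheuristic because the record does not name the algorithm the authors used.

```python
import random

# Hypothetical stand-in for a trained forward ML model:
# three material-condition inputs -> one property (e.g. a strength in MPa).
def forward_model(x):
    carbon, manganese, cooling_temp = x
    return 300.0 + 800.0 * carbon + 60.0 * manganese - 0.2 * (cooling_temp - 600.0)

# Hypothetical search bounds for each input variable.
BOUNDS = [(0.0, 0.3), (0.5, 2.0), (400.0, 700.0)]

def inverse_read(target, bounds, pop_size=30, generations=200,
                 f_weight=0.7, cr=0.9, seed=0):
    """Search for inputs whose predicted property matches `target`,
    using a basic differential evolution loop (rand/1/bin)."""
    rng = random.Random(seed)

    def loss(x):
        return abs(forward_model(x) - target)

    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [loss(p) for p in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(len(bounds))
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f_weight * (pop[b][j] - pop[c][j])
                else:
                    v = pop[i][j]
                trial.append(max(lo, min(hi, v)))  # clip to bounds
            s = loss(trial)
            if s <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, s

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

candidate, error = inverse_read(550.0, BOUNDS)
print("candidate inputs:", candidate, "abs. error:", error)
```

Because the input dimension exceeds the output dimension, many different input sets reach the same target; rerunning with a different `seed` typically returns a different but equally valid candidate, which is one reason cross-checking against a second, independent inverse approach is useful.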