NIST uses AI for high-throughput data evaluation, which is critical to the chemical manufacturing enterprise.
The chemical manufacturing design process relies heavily on recommended property values that engineers and researchers usually obtain from handbooks and databases. Those sources often offer no easy way to judge the reliability of the information, either in the sense of where the data come from or of how uncertain the underlying measurements are. This project attempts to provide recommendations for property values that are in some way “optimal,” while giving consumers the ability to trace where the numbers come from and to judge how much trust they warrant.
The thermophysical and thermochemical data within the primary scope of this activity have a tremendous impact on the cost of manufacturing plants. For example, a 10% error in the vapor pressure of a near-boiling mixture could require doubling the height, and hence the cost, of the associated distillation column; because such a column is a major driver of plant construction costs, this would significantly increase the capital investment required for the plant as a whole. Such large capital outlays (the investment in a new chemical manufacturing plant is typically of order $1 billion) dictate the use of very conservative engineering factors, resulting in unnecessary costs and environmental impacts.
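To make the column-sizing sensitivity concrete, consider a minimal back-of-the-envelope illustration (the numbers are assumed for the example, not drawn from the project). For a near-ideal binary mixture, the Fenske equation gives the minimum number of equilibrium stages, which sets the column height; here x_D and x_B are the light-component mole fractions in the distillate and bottoms:

```latex
N_{\min} = \frac{\ln\!\left[\frac{x_D}{1-x_D}\cdot\frac{1-x_B}{x_B}\right]}{\ln \alpha},
\qquad \alpha = \frac{p_1^{\mathrm{sat}}}{p_2^{\mathrm{sat}}}
```

Because N_min scales as 1/ln α, a 10% overestimate of the light component's vapor pressure that makes a true relative volatility of α = 1.10 appear to be α ≈ 1.21 yields a design with only ln(1.10)/ln(1.21) ≈ 1/2 of the stages actually required; correcting it roughly doubles the column height.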
This project leverages artificial intelligence technologies broadly, testing and developing methodologies against the world’s largest corpus of its kind. Transforming data from tables, plots, and written descriptions in the scientific literature into a consistent, structured format has historically been very labor-intensive. Expediting these curation efforts with computer vision and information-filtering systems removes a substantial amount of repetitive work from subject matter experts (SMEs), tremendously boosting the efficiency with which they can apply their knowledge. Our goal is to make an SME’s every click matter.
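As a hedged illustration of what an information-filtering step can look like (a minimal sketch with invented placeholder text, not the project’s actual pipeline), a lightweight classifier can triage which passages of a paper likely report property data before an SME reviews them:

```python
# Minimal sketch of a relevance filter for literature curation.
# All text fragments and labels below are invented placeholders, not NIST data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: does a text fragment report a measured property value?
fragments = [
    "The vapor pressure of toluene was measured from 273 K to 373 K.",
    "Densities were determined with a vibrating-tube densimeter.",
    "We thank the XYZ foundation for financial support.",
    "Figure 3 shows the plant layout used in the case study.",
]
labels = [1, 1, 0, 0]  # 1 = likely contains property data, 0 = does not

filter_model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
filter_model.fit(fragments, labels)

# Rank new fragments so SMEs review the most promising ones first.
candidates = ["Heat capacities of the ionic liquid were measured by DSC."]
print(filter_model.predict_proba(candidates)[:, 1])
```

A real system would use far richer models and training sets, but the division of labor is the same: the machine surfaces candidate data, and the SME's clicks go toward judging it rather than finding it.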
Moreover, the data available in the experimental literature are sparse and heterogeneous, so developing transfer-learning methodologies becomes essential to assessing the quality of new data. Using machine learning to predict properties from chemical structure is a core element of comparing data across different compounds, and even physically motivated models require robust statistical analyses to characterize their reliability. As we seek ways to evaluate these recommendations holistically (a task that remains computationally intractable), we are constantly utilizing and evaluating the latest developments in quantum chemical computations, cheminformatics, and machine learning, all with the goal of boosting the competitiveness of our industrial partners and supporting research around the world.
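To ground the structure-to-property idea, here is a minimal QSPR-style sketch (the molecules and property values are placeholders chosen for illustration, and the fingerprint and regressor choices are assumptions, not the project’s published methodology):

```python
# Sketch: predicting a property from chemical structure (QSPR-style regression).
# SMILES strings and property values are illustrative placeholders, not real data.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

def featurize(smiles: str) -> np.ndarray:
    """Encode a molecule as a Morgan (circular) fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
    return np.array(fp)

# Placeholder training data: (SMILES, hypothetical property value).
train = [("CCO", 0.62), ("CCCO", 0.71), ("CCCCO", 0.79), ("c1ccccc1", 0.41)]
X = np.array([featurize(s) for s, _ in train])
y = np.array([v for _, v in train])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Predict the property for a compound absent from the training set.
print(model.predict(featurize("CCCCCO").reshape(1, -1)))
```

Representations such as fingerprints make structurally similar compounds numerically similar, which is what lets models trained on well-measured compounds lend statistical support, in a transfer-learning spirit, to sparse data on related ones.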