Current terminology used to describe materials data is heterogeneous, redundant, and often ambiguous. The lack of common, community-based terminology hinders the discovery and integration of materials data for improved design of advanced materials. Intuitive, flexible, and evolving terminology plays a significant role in capitalizing on recommended knowledge representation models for materials engineering applications. We are developing a rules-based approach with initial examples from a growing corpus of materials terms in the NIST Materials Repository (https://materialsdata.nist.gov). Our method aims to establish a common, consistent, and evolving set of rules for creating or extending terminology as needed to describe materials data. The rules are intended to be simple and generalizable so that users can understand and extend them. They are also meant for other groups to apply to the repositories they are building, and to guide machines during automated processing of the terms and their execution.
Many Indo-European languages use a limited set of highly reused, non-synonymous, short, semantically relevant words called roots, which can be combined to build new compounded terms such as peanut butter and watch dog. This approach, which is more prominent in certain languages such as Sanskrit, Latin, and German, permits the creation of terms on demand as well as the replacement of a root in an existing term by one or more other roots to create a new, related term. As illustrated in the figure below, the proposed MGI terminology makes use of these root and term concepts to generate terms on demand in order to establish a common and evolving vocabulary that is based on use cases and related to developing ontologies.
Selected rules with examples:
Choose frequently used short words as roots, such as crystal.
Keep roots in singular form, such as property instead of properties.
Avoid special characters (such as ', :, _, -, =) in a root, such as Xray instead of X-Ray.
Avoid superfluous words in a term, including stop words such as 'of' and 'with', such as vaporization heat instead of heat of vaporization.
Concatenate roots with a hyphen (-) to form a term, such as Crystal-structure.
Create reasonably discriminating terms. If needed, add roots to a term to increase its discriminating power, such as Spectroscopy-XPS instead of XPS.
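The rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used for the NIST Materials Repository: the function names (normalize_root, build_term), the stop-word list, and the small plural-to-singular map are all assumptions introduced for the example.

```python
import re

# Assumption: a small illustrative stop-word list; the rules above name only 'of' and 'with'.
STOP_WORDS = {"of", "with", "the", "a", "an", "for", "in"}

# Assumption: a tiny hand-written plural map; a real system would need a proper lemmatizer.
SINGULAR = {"properties": "property", "crystals": "crystal", "structures": "structure"}


def normalize_root(word):
    """Strip special characters from a root and map known plurals to the singular.

    Casing is left unchanged, so 'X-Ray' becomes 'XRay'.
    """
    cleaned = re.sub(r"[^A-Za-z0-9]", "", word)
    return SINGULAR.get(cleaned.lower(), cleaned)


def build_term(words):
    """Drop stop words, normalize the remaining roots, and join them with hyphens."""
    roots = [normalize_root(w) for w in words if w.lower() not in STOP_WORDS]
    return "-".join(r for r in roots if r)


print(build_term(["Crystal", "structures"]))    # Crystal-structure
print(build_term(["Spectroscopy", "XPS"]))      # Spectroscopy-XPS
```

The sketch does not reorder roots (for example, turning heat of vaporization into vaporization heat) and cannot judge whether a term is discriminating enough; those decisions would still rest with the person or process creating the term.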
Examples of rule-based terms:
General example: Watch dog
Compounding two roots, watch and dog, creates a new term whose meaning is related to the qualified root, dog, and the qualifying root, watch.
Materials science example: Crystal-structure-FCC-Be-diffusion
Compounding five roots, Crystal-structure with FCC and Be diffusion, creates a new term with meaning related to the qualified roots and the qualifying root, crystal structure.
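Because roots are joined by hyphens, a term can be split back into its roots, and a related term can be created by replacing one root. The following Python sketch illustrates this under stated assumptions: the helper names (split_term, replace_root) are hypothetical, and BCC is used only as an example replacement root that does not appear in the text above.

```python
def split_term(term):
    """Return the list of roots in a hyphen-joined term."""
    return term.split("-")


def replace_root(term, old_root, new_root):
    """Create a related term by swapping every occurrence of one root for another."""
    return "-".join(new_root if root == old_root else root for root in split_term(term))


term = "Crystal-structure-FCC-Be-diffusion"
print(split_term(term))                   # ['Crystal', 'structure', 'FCC', 'Be', 'diffusion']
print(replace_root(term, "FCC", "BCC"))   # Crystal-structure-BCC-Be-diffusion
```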
Building an infrastructure to create terminology is not new to humans. Over thousands of years, humans have developed many languages and the terminology needed for those languages to evolve. What is new is devising a way to adopt and adapt some of these elegant linguistic concepts to build the terminology infrastructure for the Materials Genome Initiative (MGI), a multi-agency initiative designed to accelerate the discovery, development, and deployment of advanced materials. Consistency among the words (terminology) that researchers use to describe and share data on advanced materials is an essential component of this national infrastructure. Current terminology used by the MGI community is ad hoc and heterogeneous. To help accelerate the discovery and integration of materials data for improved design of advanced materials, we have constructed a root- and rule-based approach that will help the community build reusable, extensible, and automation-friendly terminology to describe MGI data in an intuitive way.
After reviewing many possibilities, we chose to adopt some of the rule- and root-based concepts used by a few Indo-European languages, such as Sanskrit, Latin, and German, to build MGI terminology using English words. Unlike spoken or written linguistic terminology, MGI terminology is expected to be not only human-friendly but also machine-friendly. For this reason, we had to give special consideration to the requirements of text-mining techniques such as Natural Language Processing, data graphs, and databases. The proposed MGI terminology effort takes advantage of our past experience in developing ChemBLAST (terminology for chemical structures) and Cell image terminology (cell image data).
In particular, we focused on producing terms that describe material properties. Database developers could then use the terminology to build user-friendly web interfaces for archiving and distributing MGI data. The community may use these databases to get succinct answers to product-related questions, which may lead to a decrease in the time needed to develop new products.
CURRENT AND PLANNED OUTPUTS:
This novel technique has been implemented to enhance and extend MGI/Dspace search capabilities.
New features include a user-friendly interface that produces succinct answers to:
1. Search on data from Dspace (right panel of the Dspace website, https://materialsdata.nist.gov/dspace/xmlui/handle/11115/75)
2. Search/download of NIST publications related to MGI projects, using metadata related to those found in Dspace.
Work is underway to use this novel software-generated, semantic, evolving, and extensible terminology to improve the search experience of publications distributed by:
1. International Union of Crystallography
2. American Physical Society
Within NIST resources, efforts are underway to use this method to improve the search experience of the MML/ODI-managed data.gov and Biosystems and Biomaterials Division (644) web pages, with future possibilities to cover the entire MML webpage.
TN Bhat, L. Bartolo, U. Kattner, C. Campbell and J. Elliott, “Strategy for Extensible, Evolving Terminology for the Material Genome Initiative Efforts”, JOM, online July 2015.