A semantic asset is a digital resource (e.g., a research paper, dataset, software, or model) that has been enriched with machine-readable metadata and semantic annotations, making it easier for users to find resources, understand their meaning and relationships, and combine them in their research.
METIS provides access to semantically enriched research products through the NIST Extensible Resource Data Model (NERDm), a JSON-LD-formatted metadata schema used by the NIST Public Data Repository (PDR) and Science Data Portal to describe data resources available from NIST. NERDm is defined using JSON Schema and follows best practices and standards-oriented representations, enabling efficient discovery and integration of research products in keeping with the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles. This supports the exploration, combination, and analysis of data from diverse sources.
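To make the idea of a machine-readable record concrete, the following minimal sketch builds a JSON-LD-style description of a dataset and checks it for a few required fields. The field names, the context URL, the identifiers, and the `REQUIRED_FIELDS` tuple are illustrative assumptions, not the actual NERDm schema; the NERDm JSON Schema itself is the authority on real property names and constraints.

```python
import json

# Hypothetical required fields; the real NERDm schema defines its own set.
REQUIRED_FIELDS = ("@id", "@type", "title", "description", "keyword")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Illustrative record; the context URL and identifier are placeholders.
record = {
    "@context": "https://example.org/context.jsonld",
    "@id": "example:dataset-001",
    "@type": ["Dataset"],
    "title": "Example wafer metrology measurements",
    "description": "Placeholder description of an example dataset.",
    "keyword": ["metrology", "semiconductor"],
}

print(json.dumps(record, indent=2))
print("Missing fields:", missing_fields(record))
```

In practice, a record would be validated against the published schema with a JSON Schema validator rather than a hand-written field check; the check here only illustrates why a fixed, typed structure makes records easy for software to inspect.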
The METIS project is currently working to enhance its semantic assets in three ways: developing a controlled vocabulary, a taxonomy, and an ontology.
These three items are intended to address the gaps we currently perceive in the metadata available for the METIS use cases.
The efficient organization and search of large resource collections rely heavily on the availability of standardized terms for annotation. However, identifying and defining these terms can be an arduous task that spans multiple years and requires significant input from domain experts. To expedite this process, our approach draws terms from diverse sources and uses large language models (specifically, gpt-turbo-instruct and ChatGPT 3.5) to generate approximately 12,000 proto-definitions.
Our primary objective is to ease vocabulary creation by giving domain experts an extensive collection of plausible candidate terms to select, review, edit, and refine. This lets domain experts focus on higher-level tasks such as validating and fine-tuning the generated terms, rather than starting from scratch. Our goal is to accelerate the community development of a standardized vocabulary, thereby enhancing the discoverability, accessibility, and usability of large resource collections for various CHIPS stakeholders.
Encouraged by our initial results, we have started the process of developing the Controlled Vocabulary Curation System (CVCS). This web-based application will allow domain experts to generate proto-definitions for terms of their choice and to manage the term review and curation process.
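The generate-then-review workflow described above can be sketched as a small data model. This is a hypothetical illustration, not the CVCS design: the `ProtoDefinition` class, the `Status` values, and the `build_prompt` and `review` helpers are all assumed names, and no language model is actually called.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"  # generated, not yet reviewed by an expert
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class ProtoDefinition:
    term: str
    text: str
    source: str  # e.g. the name of the model that produced the text
    status: Status = Status.PROPOSED


def build_prompt(term: str, domain: str) -> str:
    """Compose a prompt asking a language model for a candidate definition."""
    return (
        f"Write a concise, one-sentence definition of the term '{term}' "
        f"as used in {domain}."
    )


def review(entry: ProtoDefinition, accept: bool, edited_text: str = "") -> ProtoDefinition:
    """Record an expert decision, optionally replacing the generated text."""
    if edited_text:
        entry.text = edited_text
    entry.status = Status.ACCEPTED if accept else Status.REJECTED
    return entry


# Placeholder values stand in for real model output and expert edits.
entry = ProtoDefinition(
    term="critical dimension",
    text="<model output would go here>",
    source="hypothetical-model",
)
print(build_prompt(entry.term, "semiconductor metrology"))
review(entry, accept=True, edited_text="<expert-edited definition>")
print(entry.status.value)
```

Keeping the generated text, its source, and the review status in one record means every accepted definition can be traced back to how it was produced and who approved it.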
We have started developing a taxonomy to categorize and organize METIS resources. The METIS taxonomy will benefit from the METIS Controlled Vocabulary to provide definitions for items in the taxonomy.
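As a rough illustration of how a taxonomy can draw its definitions from a controlled vocabulary, the sketch below walks a small category tree and attaches a definition to each node when the vocabulary has one. The `Category` class, the `outline` function, and the category labels are hypothetical and do not reflect the actual METIS taxonomy.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    label: str
    children: list = field(default_factory=list)


def outline(node: Category, vocabulary: dict, depth: int = 0) -> list:
    """Return indented lines for the tree, adding definitions when available."""
    definition = vocabulary.get(node.label, "(no definition yet)")
    lines = [f"{'  ' * depth}{node.label}: {definition}"]
    for child in node.children:
        lines.extend(outline(child, vocabulary, depth + 1))
    return lines


# Placeholder categories and definitions, for illustration only.
root = Category(
    "Metrology",
    [Category("Dimensional metrology"), Category("Materials characterization")],
)
vocabulary = {
    "Metrology": "<definition from the controlled vocabulary>",
    "Dimensional metrology": "<definition from the controlled vocabulary>",
}

for line in outline(root, vocabulary):
    print(line)
```

Nodes without a vocabulary entry are flagged rather than dropped, which makes gaps in the vocabulary visible while the taxonomy is still being built.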
We are also working towards the development of an ontology to serve as a formal logical model that will provide representations of concepts, relationships, and axioms, to facilitate machine-readable understanding and reasoning about resources. The development of the METIS ontology will benefit from our work on the controlled vocabulary and taxonomy.
The METIS project's focus on semantic assets and its current efforts with controlled vocabularies and topic taxonomies are closely related to other areas in the realm of semantics.
These related areas will inform and shape the evolution of METIS's semantic assets, ultimately enhancing the overall discoverability, accessibility, and usability of research products for CHIPS stakeholders.