NIST announced two AI evaluation programs: Assessing Risks and Impacts of AI (ARIA) on July 26, 2024, and the NIST GenAI Challenge on April 29, 2024.
The development and utility of trustworthy AI products and services depend heavily on reliable measurements and evaluations of the underlying technologies and their use. NIST conducts research and development of metrics, measurements, and evaluation methods in emerging and existing areas of AI; contributes to the development of standards; and promotes the adoption of standards, guides, and best practices for measuring and evaluating AI technologies as they mature and find new applications.
On October 30, 2023, President Biden signed an Executive Order (EO) to build US capacity to measure and manage the risks of AI systems to ensure safety, security, and trust, while promoting an innovative, competitive AI ecosystem that supports workers and protects consumers. Learn more about NIST's responsibilities under the EO and the creation of the US Artificial Intelligence Safety Institute, including the Consortium that is being established.
NIST has a long history of AI measurement and evaluation activities, starting in the late 1960s with the measurement and evaluation of automated fingerprint identification systems. Since then, NIST has designed and conducted hundreds of evaluations of thousands of AI systems. While these activities typically have focused on measures of accuracy and robustness, other types of AI-related measurements and evaluations under investigation include bias, interpretability, and transparency. Working collaboratively with others, NIST aims to expand these efforts, driving AI research and enabling progress by:
NIST projects are carried out by researchers from a variety of disciplines across the NIST laboratories, frequently in collaboration with industry, other government agencies, and academia. The NIST AI Innovation Lab (NAIIL) leads or coordinates many of these efforts. In addition, the new US Artificial Intelligence Safety Institute and a Consortium will be key elements of NIST's work on AI measurement and evaluation.
Two major recent evaluation efforts are the Assessing Risks and Impacts of AI (ARIA) and NIST GenAI programs.
These activities are part of NIST's efforts to build a strong and active community around the measurement and evaluation of AI technologies – and complement NIST's establishment of forums dedicated to the advancement of AI metrology research. Together, this work spurs collaboration among those who design, develop, deploy, test, and evaluate AI technologies and helps to meet the needs of a broad and diverse AI community. Events convened by NIST to strengthen the AI measurement and evaluation community include:
For more information about how to engage with NIST on AI, see: Engage
NIST has been engaged in focused efforts to establish common terminologies, definitions, and taxonomies of concepts pertaining to characteristics of AI technologies in order to form the necessary underpinnings for trustworthy AI systems. Those characteristics include accuracy, explainability and interpretability, privacy, reliability, robustness, safety, security (resilience), and mitigation of harmful bias. Each requires its own portfolio of measurements and evaluations, and context is crucial: how a given characteristic is measured and evaluated can change based on the context in which the AI system operates.
For each characteristic, NIST has produced – or aims to document and improve – the definitions, applications, tasks, and strengths and limitations of metrics and measurement methods in use or being proposed. NIST also has developed – or may prepare and curate – meaningful data sets with respect to select attributes of interest, and may apply chosen metrics and measurement methods to various AI systems.