NIST will hold a virtual workshop on Artificial Intelligence Measurement and Evaluation on June 15-17, 2021. The three-day workshop aims to bring together stakeholders and experts to identify the most pressing needs for AI measurement and evaluation and to advance the state of the art and practice.
NIST is assigned responsibility by statute to advance the underlying research for measuring and assessing AI technologies. That includes the development of AI data standards and best practices, as well as AI evaluation and testing methodologies and standards. NIST is working collaboratively with the private and public sectors to prioritize and carry out its AI activities.
Panels and discussions will be organized to gather feedback on topics related to AI measurement and evaluation and to shape the future direction of NIST efforts in this area.
This workshop will be ideal for:
Stay connected with the latest NIST AIME updates: sign up for the mailing list.
A new project called Dioptra has been released on GitHub at https://github.com/usnistgov/dioptra!
Dioptra is a software test bed currently focused on adversarial machine learning and defensive mitigations. It is in pre-release status, but we would like to begin collecting community feedback.
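For readers who want a feel for the problem space before trying Dioptra itself, the following is a minimal, hypothetical sketch in plain PyTorch (not Dioptra's actual API) of the kind of measurement such a test bed automates: classification accuracy under a fast gradient sign method (FGSM) attack at a fixed perturbation budget.

```python
# Illustrative only: plain PyTorch, NOT Dioptra's API. Measures how a
# classifier's accuracy degrades under a fast gradient sign method attack.
import torch
import torch.nn.functional as F

def fgsm_accuracy(model, loader, epsilon=0.03):
    """Accuracy on inputs perturbed by FGSM with budget epsilon."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x.requires_grad_(True)
        model.zero_grad(set_to_none=True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Step each input in the direction that most increases the loss;
        # the clamp assumes inputs are scaled to [0, 1].
        x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total
```

Sweeping epsilon and reporting adversarial accuracy at each budget is one common way to summarize robustness to this class of attack.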
Workshop Read-Ahead: Artificial Intelligence Measurement and Evaluation at the National Institute of Standards and Technology (Draft)
Fact Sheet: NIST AI Program
Day 1 (June 15). All times EDT (UTC-4).
Start Time | End Time | Topic |
---|---|---|
11:00 AM | 11:20 AM | Welcome, Workshop Goals & Logistics, Overview |
11:20 AM | 11:50 AM | Keynote: A National Security Perspective on AI Measurement and Evaluation |
11:50 AM | 12:00 PM | Break |
12:00 PM | 1:30 PM | Panel 1: Measuring with Purpose. Discussion of the needs for and uses of AI evaluation outputs and their role in driving downstream processes, including the requirements and properties an AI evaluation must possess to be fit for its intended uses. Identification of areas where current measurement and evaluation approaches are insufficient or do not exist and where further AI metrology research would be beneficial. |
1:30 PM | 1:45 PM | Break and Discussion Time |
1:45 PM | 2:15 PM | Panel 2: Overview of Past & Current Evaluations. Overview of the evaluation-driven research paradigm used at NIST to evaluate AI systems, with a description of the various styles of evaluations and examples of some of the AI measurement and evaluation activities conducted at NIST. |
2:15 PM | 2:45 PM | Panel 3: Discussion of NIST/Community Future Work (slides). Discussion of the limitations of current AI measurement and evaluation activities that prevent them from addressing all the needs for AI measurement and evaluation, and of NIST's plans to address these limitations together with the research community. |
2:45 PM | 3:00 PM | Break |
3:00 PM | 4:00 PM | Panel 4: Evaluating AI during Operation. Discussion of AI evaluation in production/operational environments, including topics drawn from: MLOps; operational evaluation metrics and business metrics; model quality and data drift with online data (see the drift-check sketch following this table); latency, throughput, and scalability; adversarial attacks and robustness to corruptions and perturbations; and governance and regulatory compliance. |
4:00 PM | 4:15 PM | Closing Remarks (NIST Workshop Organizing Committee) |
4:15 PM | 5:00 PM | After Hours: Slack with NIST staff |
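Panel 4's mention of model quality and data drift with online data lends itself to a concrete illustration. The sketch below is an assumption of ours rather than anything presented at the workshop: it uses a two-sample Kolmogorov-Smirnov test from SciPy to flag when a feature's live distribution has moved away from its training-time reference.

```python
# A minimal drift check: a two-sample Kolmogorov-Smirnov test comparing a
# feature's training-time distribution against a window of production data.
import numpy as np
from scipy.stats import ks_2samp

def drifted(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag drift when the two samples are unlikely to share a distribution."""
    stat, p_value = ks_2samp(reference, live)
    return p_value < alpha

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, size=5_000)   # reference window
live_feature = rng.normal(0.4, 1.0, size=1_000)    # shifted production window
print(drifted(train_feature, live_feature))        # True: mean shift detected
```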
Day 2 (June 16). All times EDT (UTC-4).
Start Time | End Time | Topic |
---|---|---|
11:00 AM | 11:30 AM | Keynote |
11:30 AM | 12:30 PM | Panel 5: Evaluation Design Process. Discussion of the processes and procedures for designing evaluations of AI systems, including: the high-level considerations and decisions that must be made to design and implement effective evaluations; the components of and relationships between the various evaluation design elements; and the role of the applications and overall evaluation goals in evaluation design. |
12:30 PM | 12:45 PM | Break |
12:45 PM | 1:45 PM | Panel 6: Metrics and Measurement Methods. Discussion of: the properties of an AI system that can and should be measured, and which properties have or lack metrics and measurement methods; the different measurement methods used to measure AI and their strengths and limitations; the different types and uses of metrics, and the various properties that a metric can possess (see the uncertainty sketch following this table); the impact that the chosen metrics and measurement methods have on an evaluation; when it is important to have glass-box access to AI systems for evaluation; and when the design or approach taken by an AI system influences the choice of metrics and measurement methods. |
1:45 PM | 2:00 PM | Break |
2:00 PM | 3:00 PM | Panel 7: Data and Data Sets. Data collection methods and dataset design for AI system measurement and evaluation, along with discussion drawing from the following topics: approaches for data annotation and labeling; uncertain, missing, or non-existent ground truth; how much data is necessary; needs for and uses of simulated or generated data; roles of common datasets in research; repurposing of data; ethical and privacy considerations; and more. |
3:00 PM | 3:15 PM | Break |
3:15 PM | 4:15 PM | Panel 8: Limitations, Challenges, and Future Directions of Evaluation. Discussion of the limitations, challenges, shortcomings, and future directions of AI evaluation and measurement, including new or emerging evaluation paradigms and the ability or inability to generalize evaluation results, with its policy implications. Also covers needs and plans for improving existing measurement and evaluation activities and for creating new AI evaluation challenge problems and measurement research. |
4:15 PM | 4:30 PM | Break and Slack Discussion Time |
4:30 PM | 5:00 PM | Closing Remarks (NIST Workshop Organizing Committee) |
5:00 PM | 5:30 PM | After Hours: Slack with NIST staff |
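Panel 6's point about the properties a metric can possess includes measurement uncertainty, which is easy to illustrate. The sketch below is our own assumption rather than NIST-endorsed methodology: it attaches a percentile-bootstrap confidence interval to a plain accuracy score, so the metric reports a range instead of a bare point estimate.

```python
# Percentile-bootstrap confidence interval for classification accuracy,
# so the reported metric carries an uncertainty estimate.
import numpy as np

def accuracy_ci(y_true, y_pred, n_boot=10_000, level=0.95, seed=0):
    """Return (accuracy, (low, high)) for a percentile bootstrap CI."""
    rng = np.random.default_rng(seed)
    hits = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    # Resample the per-example correctness indicators with replacement.
    idx = rng.integers(0, hits.size, size=(n_boot, hits.size))
    scores = hits[idx].mean(axis=1)
    lo, hi = np.quantile(scores, [(1 - level) / 2, 1 - (1 - level) / 2])
    return hits.mean(), (lo, hi)

acc, (lo, hi) = accuracy_ci([1, 0, 1, 1, 0, 1, 1, 1], [1, 0, 1, 0, 0, 1, 1, 1])
print(f"accuracy={acc:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```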
Day 3 (June 17). All times EDT (UTC-4).
Start Time | End Time | Topic |
---|---|---|
11:00 AM | 11:30 AM | Keynote: AI Test and Evaluation from the National AI Initiative Perspective |
11:30 AM | 12:30 PM | Panel 9: Measuring Concepts that are Complex, Contextual, and Abstract. Discussion of the challenges of and approaches to measuring AI system characteristics that are complex, contextual, and/or abstract, or otherwise difficult to quantify (such as explainability, bias, trustworthiness, and safety), including the role that descriptive and/or qualitative measurements should play in these cases (see the bias-metric sketch following this schedule). |
12:30 PM | 12:45 PM | Break |
12:45 PM | 1:45 PM | Panel 10: Measuring with Humans in the Mix. Discussion of the measurement and evaluation of AI systems that work in cooperation with humans, including the roles of and relationships between the AI systems and the humans, and the challenges of and approaches to evaluating such human-AI teams. |
1:45 PM | 2:00 PM | Break |
2:00 PM | 3:00 PM | Panel 11: Software Infrastructure Overview, Existing Tools, and Future Desires. Discussion of the landscape, challenges, and needs of developing tools and infrastructure for the particular purpose of measuring, testing, and evaluating AI systems. |
3:00 PM | 3:15 PM | Break and Discussion Time |
3:15 PM | 4:15 PM | Panel 12: Practical Considerations and Best Practices for Measurement and Evaluation. Discussion of practical considerations and concrete best practices for the measurement and evaluation of AI-based systems, including testing and evaluation strategies that can mitigate privacy loss or intellectual property exposure during AI testing. |
4:15 PM | 4:30 PM | Break |
4:30 PM | 5:00 PM | Workshop Debrief and Next Steps (NIST Workshop Organizing Committee) |
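Panel 9's theme, that characteristics such as bias resist reduction to a single number, can be made concrete with one narrow quantitative proxy. The sketch below computes the demographic parity difference between two groups; it is an illustrative assumption on our part, not a complete or recommended bias measurement, and all names in it are hypothetical.

```python
# One narrow proxy for "bias": the gap in positive-prediction rates between
# two groups. A small gap does NOT establish that a system is unbiased.
import numpy as np

def demographic_parity_difference(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Absolute difference in positive-prediction rates between groups 0 and 1."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # model's binary decisions
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # hypothetical group indicator
print(demographic_parity_difference(y_pred, group))  # 0.5: a large gap
```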