
AI Metrology Colloquia Series

As a follow-on to the National Academies of Sciences, Engineering, and Medicine workshop on Assessing and Improving AI Trustworthiness and the National Institute of Standards and Technology (NIST) workshop on AI Measurement and Evaluation, NIST hosts a biweekly AI metrology colloquia series in which leading researchers share current and recent work in AI measurement and evaluation.

The series provides a dedicated venue for presenting and discussing AI metrology research and aims to spur collaboration among AI metrology researchers in order to advance the state of the art in AI measurement and evaluation. The series is open to the public, and the presentation format is flexible, though talks generally consist of a 50-minute presentation followed by 10 minutes of questions and discussion. All talks start at 12:00 p.m. ET.

Information on viewing the series can be found here.

Please contact aime [at] nist.gov with any questions, or to join the AIME mailing list.

 

2025 Schedule

Date | Presenter | Title
Jan 16 | Lora Aroyo / Google |
Jan 30 | Francesco Locatello / Institute of Science and Technology Austria | Evaluation and Identification Challenges in Causal Modeling
Feb 13 | |
Feb 27 | |

 

2024 Schedule

Date | Presenter | Title
Jan 18 | Elham Tabassi / NIST | NIST Risk Management Framework (RMF) (Postponed)
Feb 01 | Anupam Datta / Truera | Evaluating and Monitoring LLM applications
Feb 15 | Marta Kwiatkowska / University of Oxford | Safety and robustness for deep learning with provable guarantees
Feb 29 | Vera Liao / Microsoft | Bridging the Socio-Technical Gap: Towards Explainable and Responsible AI
March 14 | Bo Li / University of Chicago | Risk Assessment, Safety Alignment, and Guardrails for Generative Models
March 28 | Phil Koopman / Carnegie Mellon University | Safety Performance Indicators and Continuous Improvement Feedback
April 11 | Josh Tobin / Gantry | Evaluating LLM-based applications
April 25 | Peter Kairouz / Google | Navigating Privacy Risks in (Large) Language Models
May 9 | Asma Ben Abacha / Microsoft | Challenges and Opportunities in Clinical Note Generation and Evaluation (Cancelled)
May 23 | Henning Müller / HES-SO | Scientific challenges in medical imaging, benchmarks, how to assure fair comparisons
June 20 | Aldo Badano / FDA | Regulatory Science Tools for accelerating innovation in medical AI
July 18 | Tom Goldstein / University of Maryland | LLM detectors work really well if you just ignore their mistakes
Aug 01 | Laura Weidinger / DeepMind | Safety and Responsibility Evaluations of Advanced AI Models
Aug 29 | Lianmin Zheng / UC Berkeley | LLM Evaluation: Chatbot Arena and LLM-as-a-Judge
Sep 12 | Gemma Galdon-Clavell | AI Auditing for a Fairer Future: Addressing Bias and Accountability
Sep 26 | Lintang Sutawika | Challenges and Considerations in Language Model Evaluation
Oct 10 | Lingjuan Lyu | A Pathway Towards Responsible Generative AI
Oct 24 | |
Nov 9 | Peter Henderson / Princeton University | Challenges for Pre-Deployment Evaluation of AI Safety
Nov 21 | Arvind Narayanan / Princeton University |
Dec 5 | Joe Chalfoun / NIST |
Dec 19 | Mark Díaz / Google | Whose Ground Truth? Bridging AI Evaluation and Real-World Plurality

2023 Schedule

Date | Presenter | Title
Jan 19 | Robert R. Hoffman / Institute for Human & Machine Cognition | Psychometrics for AI Measurement Science
Feb 02 | Ludwig Schmidt / University of Washington | A Data-Centric View on Reliable Generalization
Feb 16 | Rich Caruana / Microsoft Research | High Accuracy Is Not Enough: Not Everything That Is Important Can Be Measured
March 02 | Nazneen Rajani / Hugging Face | The Wild West of NLP Modeling, Documentation, and Evaluation
March 16 | Ben Shneiderman / University of Maryland | Human-Centered AI: Ensuring Human Control while Increasing Automation
March 30 | Isabelle Guyon / Google Brain | Datasets and benchmarks for reproducible ML research: are we there yet?
April 13 | Peter Fontana / NIST | Towards a Structured Evaluation Methodology for Artificial Intelligence Technology
April 27 | Jutta Treviranus | Statistical Discrimination
May 11 | Juho Kim / KAIST | Interaction-Centric AI
May 25 | Sina Fazelpour / Northeastern University | ML Trade-offs and Values in Sociotechnical Systems
June 8 | Rishi Bommasani / Stanford CRFM | Making Foundation Models Transparent
June 22 | Visvanathan Ramesh / Goethe University | Transdisciplinary Systems perspective for AI
July 20 | Pin-Yu Chen / IBM | Foundational Robustness of Foundation Models
Aug 3 | James Zou / Stanford University | Data-centric AI: what is it good for and why do we need it?
Aug 17 | Olivia Wiles / Google DeepMind | Rigorous Evaluation of Machine Learning Models
Aug 31 | Patrick Hall | Machine Learning for High-Risk Applications
Sep 14 | Pradeep Natarajan / Amazon | Recent advances in building Responsible LM technologies at Alexa: Privacy, Inclusivity, and Disambiguation
Sep 28 | Jason Yik / Harvard University, neurobench.ai | NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking
Oct 26 | Chris Welty / Google | p-Value: A statistically rigorous approach to machine learning model comparison
Nov 09 | Joon Sung Park / Stanford University | Generative Agents: Interactive Simulacra of Human Behavior
Dec 07 | Joaquin Vanschoren / Eindhoven University of Technology (TU/e) | Systematic benchmarking for AI safety and Machine Learning research

2022 Schedule

Date | Presenter | Title
December 8 | Sharon Yixuan Li / University of Wisconsin-Madison | How to Handle Data Shifts? Challenges, Research Progress and Path Forward
November 17 | Emiliano De Cristofaro / University College London | Privacy and Machine Learning: The Good, The Bad, and The Ugly
November 3 | Peter Bajcsy / Software and Systems Division, ITL, NIST | Explainable AI Models via Utilization Measurements
October 20 | Soheil Feizi / University of Maryland | Symptoms or Diseases: Understanding Reliability Issues in Deep Learning and Potential Ways to Fix Them
October 6 | Thomas Dietterich / Oregon State University | Methodological Issues in Anomaly Detection Research
September 22 | Douwe Kiela / Hugging Face | Rethinking benchmarking in AI: Evaluation-as-a-Service and Dynamic Adversarial Data Collection
September 8 | Been Kim / Google Brain | Bridging the representation gap between humans and machines: first steps
August 25 | Aylin Caliskan and Robert Wolfe / University of Washington | Quantifying Biases and Societal Defaults in Word Embeddings and Language-Vision AI
August 11 | Chunyuan Li / Microsoft Research | A Vision-Language Approach to Computer Vision in the Wild: Modeling and Benchmarking
July 28 | Nicholas Carlini / Google Brain | Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples
July 14 | Andrew Trask / OpenMined, University of Oxford, Centre for the Governance of AI | Privacy-preserving AI
June 16 | Theodore Jensen / NIST | User Trust Appropriateness in Human-AI Interaction
June 2 | Reva Schwartz, Apostol Vassilev, Kristen Greene, and Lori A. Perine / NIST | Towards a Standard for Identifying and Managing Bias in Artificial Intelligence (NIST Special Publication 1270)
May 19 | Jonathan Fiscus / NIST | The Activities in Extended Video Evaluations: A Case Study in AI Metrology
May 5 | Judy Hoffman / Georgia Tech | Measuring and Mitigating Bias in Vision Systems
April 21 | Yuekai Sun / Statistics Department, University of Michigan | Statistical Perspectives on Federated Learning
April 7 | Rayid Ghani / Carnegie Mellon University | Practical Lessons and Challenges in Building Fair and Equitable AI/ML Systems
March 24 | Haiying Guan / NIST | Open Media Forensic Challenge (OpenMFC) Evaluation Program
March 10 | Dan Weld / Allen Institute for AI (AI2) | Optimizing Human-AI Teams
February 24 | Peter Hase / UNC | Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
February 10 | Timnit Gebru / Distributed AI Research Institute (DAIR) | DAIR & AI and Its Consequences
January 27 | Brian Stanton / NIST | Trust and Artificial Intelligence

2021 Schedule

Date | Presenter | Title
December 16 | Rich Kuhn / NIST | How Can We Provide Assured Autonomy?
December 2 | Andreas Holzinger / Medical University Graz, Graz University of Technology, University of Alberta | Assessing and Improving AI Trustworthiness with the Systems Causability Scale
November 18 | David Kanter / MLCommons | Introduction to MLCommons and MLPerf
November 4 | Michael Sharp / NIST | Risk Management in Industrial Artificial Intelligence
October 21 | Finale Doshi-Velez / Harvard | The Promises, Pitfalls, and Validation of Explainable AI
October 7 | ----- |
September 23 | Jonathon Phillips / NIST | Face Recognition: from Evaluations to Experiment
September 9 | José Hernández-Orallo / Universitat Politècnica de València, Leverhulme Centre for the Future of Intelligence (Cambridge) | Measuring Capabilities and Generality in Artificial Intelligence
August 26 | Rachael Sexton / NIST | Understanding & Evaluating Informed NLP Systems: The Road to Technical Language Processing
August 12 | Michael Majurski / NIST | Trojan Detection Evaluation: Finding Hidden Behavior in AI Models
July 29 | Ellen Voorhees / NIST | Operationalizing Trustworthy AI


NOTE: Portions of the events may be recorded, and audience Q&A or comments may be captured. The recorded event may be edited and rebroadcast or otherwise made publicly available by NIST. By registering for or attending this event, you acknowledge and consent to being recorded.

 

Created March 15, 2022, Updated October 27, 2024