As a follow-on to the National Academies of Sciences, Engineering, and Medicine workshop on Assessing and Improving AI Trustworthiness and the National Institute of Standards and Technology (NIST) workshop on AI Measurement and Evaluation, NIST has begun hosting a bi-weekly AI metrology colloquia series, where leading researchers share current and recent work in AI measurement and evaluation.
This series provides a dedicated venue for the presentation and discussion of AI metrology research and aims to spur collaboration among AI metrology researchers in order to help advance the state of the art in AI measurement and evaluation. The series is open to the public, and the presentation format is flexible, though talks generally run 50 minutes followed by 10 minutes of questions and discussion. All talks start at 12:00 p.m. ET.
Information on viewing the series can be found here.
Please contact aime [at] nist.gov with any questions or to join the AIME mailing list.
Date | Presenter | Title |
---|---|---|
Jan 16 | Lora Aroyo / Google | |
Jan 30 | Francesco Locatello / Institute of Science and Technology Austria | Evaluation and Identification Challenges in Causal Modeling
Feb 13 | | |
Feb 27 | | |
Date | Presenter | Title |
---|---|---|
Jan 18 | Elham Tabassi / NIST | NIST Risk Management Framework (RMF) (Postponed) |
Feb 01 | Anupam Datta / Truera | Evaluating and Monitoring LLM applications |
Feb 15 | Dr. Marta Kwiatkowska / University of Oxford | Safety and robustness for deep learning with provable guarantees
Feb 29 | Vera Liao / Microsoft | Bridging the Socio-Technical Gap: Towards Explainable and Responsible AI |
March 14 | Bo Li / University of Chicago | Risk Assessment, Safety Alignment, and Guardrails for Generative Models |
March 28 | Phil Koopman / Carnegie Mellon University | Safety Performance Indicators and Continuous Improvement Feedback |
April 11 | Josh Tobin / Gantry | Evaluating LLM-based applications |
April 25 | Peter Kairouz / Research Scientist at Google | Navigating Privacy Risks in (Large) Language Models |
May 9 | Asma Ben Abacha / Microsoft | Cancelled -- Challenges and Opportunities in Clinical Note Generation and Evaluation |
May 23 | Henning Müller / HES-SO | Scientific challenges in medical imaging, benchmarks, how to assure fair comparisons
June 20 | Dr. Aldo Badano / FDA | Regulatory Science Tools for accelerating innovation in medical AI |
July 18 | Tom Goldstein / University of Maryland | LLM detectors work really well if you just ignore their mistakes |
Aug 01 | Laura Weidinger / DeepMind | Safety and Responsibility Evaluations of Advanced AI Models
Aug 29 | Lianmin Zheng / UC Berkeley | LLM Evaluation: Chatbot Arena and LLM-as-a-Judge |
Sep 12 | Dr. Gemma Galdon-Clavell | AI Auditing for a Fairer Future: Addressing Bias and Accountability |
Sep 26 | Lintang Sutawika | Challenges and Considerations in Language Model Evaluation |
Oct 10 | Lingjuan Lyu | A Pathway Towards Responsible Generative AI |
Oct 24 | | |
Nov 9 | Peter Henderson / Princeton University | Challenges for Pre-Deployment Evaluation of AI Safety |
Nov 21 | Arvind Narayanan / Princeton University | |
Dec 5 | Joe Chalfoun / NIST | |
Dec 19 | Mark Díaz / Google | Whose Ground Truth? Bridging AI Evaluation and Real-World Plurality
Date | Presenter | Title |
---|---|---|
Jan 19 | Robert R. Hoffman / Institute for Human & Machine Cognition | Psychometrics for AI Measurement Science |
Feb 02 | Ludwig Schmidt / University of Washington | A Data-Centric View on Reliable Generalization |
Feb 16 | Rich Caruana / Microsoft Research | High Accuracy Is Not Enough --- Not Everything That Is Important Can Be Measured |
March 02 | Nazneen Rajani / Hugging Face | The Wild West of NLP Modeling, Documentation, and Evaluation |
March 16 | Ben Shneiderman / University of Maryland | Human-Centered AI: Ensuring Human Control while Increasing Automation
March 30 | Isabelle Guyon / Google Brain | Datasets and benchmarks for reproducible ML research: are we there yet? |
April 13 | Peter Fontana / National Institute of Standards and Technology | Towards a Structured Evaluation Methodology for Artificial Intelligence Technology
April 27 | Jutta Treviranus | Statistical Discrimination |
May 11 | Juho Kim / KAIST | Interaction-Centric AI |
May 25 | Sina Fazelpour / Northeastern | ML Trade-offs and Values in Sociotechnical Systems |
June 8 | Rishi Bommasani / Stanford CRFM | Making Foundation Models Transparent |
June 22 | Visvanathan Ramesh / Goethe University | Transdisciplinary Systems perspective for AI |
July 20 | Pin-Yu Chen / IBM | Foundational Robustness of Foundation Models |
Aug 3 | James Zou / Stanford University | Data-centric AI: what is it good for and why do we need it? |
Aug 17 | Olivia Wiles / Google Deepmind | Rigorous Evaluation of Machine Learning Models |
Aug 31 | Patrick Hall | Machine Learning for High-Risk Applications |
Sep 14 | Pradeep Natarajan / Amazon | Recent advances in building Responsible LM technologies at Alexa: Privacy, Inclusivity, and Disambiguation |
Sep 28 | Jason Yik / Harvard University, neurobench.ai | NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking |
Oct 26 | Chris Welty / Sr. Research Scientist at Google | p-Value: A statistically rigorous approach to machine learning model comparison |
Nov 09 | Joon Sung Park / Stanford University | Generative Agents: Interactive Simulacra of Human Behavior |
Dec 07 | Joaquin Vanschoren / Eindhoven University of Technology (TU/e) | Systematic benchmarking for AI safety and Machine Learning research |
Date | Presenter | Title |
---|---|---|
December 8 | Sharon Yixuan Li / University of Wisconsin Madison | How to Handle Data Shifts? Challenges, Research Progress and Path Forward |
November 17 | Prof. Emiliano De Cristofaro / University College London | Privacy and Machine Learning: The Good, The Bad, and The Ugly |
November 3 | Peter Bajcsy / Software and Systems Division, ITL, NIST | Explainable AI Models via Utilization Measurements
October 20 | Soheil Feizi / University of Maryland | Symptoms or Diseases: Understanding Reliability Issues in Deep Learning and Potential Ways to Fix Them |
October 6 | Thomas Dietterich / Oregon State University | Methodological Issues in Anomaly Detection Research |
September 22 | Douwe Kiela / Head of Research at Hugging Face | Rethinking benchmarking in AI: Evaluation-as-a-Service and Dynamic Adversarial Data Collection |
September 8 | Been Kim / Google Brain | Bridging the representation gap between humans and machines: first steps |
August 25 | Aylin Caliskan and Robert Wolfe / University of Washington | Quantifying Biases and Societal Defaults in Word Embeddings and Language-Vision AI |
August 11 | Chunyuan Li / Microsoft Research | A Vision-Language Approach to Computer Vision in the Wild: Modeling and Benchmarking |
July 28 | Nicholas Carlini / Google Brain | Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples |
July 14 | Andrew Trask, OpenMined / University of Oxford / Centre for the Governance of AI | Privacy-preserving AI |
June 16 | Theodore Jensen / National Institute of Standards and Technology | User Trust Appropriateness in Human-AI Interaction |
June 2 | Reva Schwartz, Apostol Vassilev, Kristen Greene & Lori A. Perine / National Institute of Standards and Technology | Towards a Standard for Identifying and Managing Bias in Artificial Intelligence (NIST Special Publication 1270) |
May 19 | Jonathan Fiscus / NIST | The Activities in Extended Video Evaluations: A Case Study in AI Metrology
May 5 | Judy Hoffman / Georgia Tech | Measuring and Mitigating Bias in Vision Systems
April 21 | Yuekai Sun / Statistics Department at the University of Michigan | Statistical Perspectives on Federated Learning
April 7 | Rayid Ghani / Professor in Machine Learning and Public Policy at Carnegie Mellon University | Practical Lessons and Challenges in Building Fair and Equitable AI/ML Systems
March 24 | Haiying Guan / NIST | Open Media Forensic Challenge (OpenMFC) Evaluation Program
March 10 | Dan Weld / Allen Institute for AI (AI2) | Optimizing Human-AI Teams
February 24 | Peter Hase / UNC | Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
February 10 | Timnit Gebru / Distributed AI Research Institute (DAIR) | DAIR & AI and Its Consequences
January 27 | Brian Stanton / NIST | Trust and Artificial Intelligence
Date | Presenter | Title |
---|---|---|
December 16 | Rich Kuhn, NIST | How Can We Provide Assured Autonomy? |
December 2 | Andreas Holzinger, Medical University Graz / Graz University of Technology / University of Alberta | Assessing and Improving AI Trustworthiness with the Systems Causability Scale |
November 18 | David Kanter, ML Commons | Introduction to MLCommons and MLPerf |
November 4 | Michael Sharp, NIST | Risk Management in Industrial Artificial Intelligence |
October 21 | Finale Doshi-Velez, Harvard | The Promises, Pitfalls, and Validation of Explainable AI |
October 7 | ----- | |
September 23 | Jonathon Phillips, NIST | Face Recognition: from Evaluations to Experiment |
September 9 | José Hernández-Orallo, Universitat Politècnica de València / Leverhulme Centre for the Future of Intelligence (Cambridge) | Measuring Capabilities and Generality in Artificial Intelligence |
August 26 | Rachael Sexton, NIST | Understanding & Evaluating Informed NLP Systems: The Road to Technical Language Processing |
August 12 | Michael Majurski, NIST | Trojan Detection Evaluation: Finding Hidden Behavior in AI Models |
July 29 | Ellen Voorhees, NIST | Operationalizing Trustworthy AI |
NOTE: Portions of the events may be recorded and audience Q&A or comments may be captured. The recorded event may be edited and rebroadcast or otherwise made publicly available by NIST. By registering for -- or attending -- this event, you acknowledge and consent to being recorded.