Expensive chaotic models, such as those used in weather simulation, are hard to validate and calibrate. Each run is costly, the output is highly sensitive to parameters, and the codes are often written in languages that lack support for techniques such as automatic differentiation. This talk gives an overview of approaches that use ultra-short runs and ensembles to address these challenges, with a specific focus on methods for detecting unwanted changes in large model codes.
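As a rough illustration of the ensemble idea, the sketch below uses the Lorenz-63 system as a toy stand-in for an expensive chaotic model. It builds a reference ensemble from ultra-short runs with tiny initial-condition perturbations, then flags a candidate run whose output falls outside the ensemble spread. Everything here is an assumption made for illustration and is not taken from the talk: the toy model, the function names, the 9-step run length, the perturbation size, and the 6-sigma threshold.

```python
import random
import statistics


def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 system by one forward-Euler step."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)


def short_run(initial, n_steps=9, dt=0.01, **params):
    """Ultra-short run: only a handful of steps (9 is an arbitrary choice)."""
    state = initial
    for _ in range(n_steps):
        state = lorenz_step(state, dt, **params)
    return state


def build_ensemble(base, n_members=50, eps=1e-10, seed=0, **params):
    """Run the model from many slightly perturbed copies of `base`."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        # Relative perturbation of order eps on each state variable.
        perturbed = tuple(v * (1.0 + eps * rng.uniform(-1.0, 1.0)) for v in base)
        members.append(short_run(perturbed, **params))
    return members


def is_consistent(candidate, ensemble, n_sigma=6.0):
    """Pass if every variable lies within n_sigma ensemble standard deviations."""
    for i in range(len(candidate)):
        values = [member[i] for member in ensemble]
        mean = statistics.fmean(values)
        spread = statistics.stdev(values)
        if abs(candidate[i] - mean) > n_sigma * spread:
            return False
    return True


base_state = (1.0, 1.0, 1.0)
reference = build_ensemble(base_state)

unchanged = short_run(base_state)             # same code, same parameters
modified = short_run(base_state, rho=28.001)  # stands in for an unwanted change

print("unchanged run consistent:", is_consistent(unchanged, reference))
print("modified run consistent:", is_consistent(modified, reference))
```

Because the runs are so short, the perturbations have had little time to grow and the ensemble spread stays tiny, so even a small parameter change stands out clearly. A test for a real model would need far more output variables and a more careful statistical comparison than this per-variable threshold.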