Mikhail (Misha) Belkin
Computer Science & Engineering, University of California, San Diego
Tuesday, February 21, 2023, 3:00-4:00 PM ET (1:00-2:00 PM MT)
A video of this talk will be made available to NIST staff in the Math channel on NISTube, which is accessible from the NIST internal home page. It will be taken down from NISTube after 12 months, at which point it can be requested by emailing the ACMD Seminar Chair.
Abstract: Recent empirical successes of deep learning have exposed significant gaps in our fundamental understanding of learning and optimization mechanisms. Modern best practices for model selection directly contradict the methodologies suggested by classical analyses. Similarly, the efficiency of SGD-based local methods used in training modern models appears at odds with standard intuitions about optimization.
First, I will present evidence, both empirical and mathematical, that necessitates revisiting classical statistical notions such as over-fitting. I will then discuss the emerging understanding of generalization and, in particular, the "double descent" risk curve, which extends the classical U-shaped generalization curve beyond the point of interpolation.
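The double descent curve can be illustrated with a minimal random-features regression experiment (a sketch for intuition, not taken from the talk; all data, feature, and parameter choices below are assumptions): fit minimum-norm least squares with an increasing number of random Fourier features and track test error as the feature count crosses the interpolation threshold, i.e., the number of training points.

```python
import numpy as np

# Illustrative sketch of double descent: minimum-norm least squares
# on random Fourier features. Data, frequencies, and grid sizes are
# arbitrary demo choices, not from the talk.
rng = np.random.default_rng(0)

n_train, p_max = 40, 400
x_train = rng.uniform(-1.0, 1.0, n_train)
y_train = np.sin(3.0 * x_train) + 0.1 * rng.standard_normal(n_train)
x_test = rng.uniform(-1.0, 1.0, 500)
y_test = np.sin(3.0 * x_test)

# Fixed random feature map: phi_j(x) = cos(w_j * x + b_j).
w = rng.normal(0.0, 5.0, p_max)
b = rng.uniform(0.0, 2.0 * np.pi, p_max)

def feature_map(x, p):
    """First p random Fourier features of scalar inputs x."""
    return np.cos(np.outer(x, w[:p]) + b[:p])

p_grid = [5, 10, 20, 40, 80, 200, 400]  # interpolation threshold at p = 40
test_errors = []
for p in p_grid:
    phi_tr = feature_map(x_train, p)
    # Minimum-norm least-squares solution via the pseudoinverse.
    coef = np.linalg.pinv(phi_tr) @ y_train
    pred = feature_map(x_test, p) @ coef
    test_errors.append(np.mean((pred - y_test) ** 2))

for p, err in zip(p_grid, test_errors):
    print(f"p = {p:4d}  test MSE = {err:.4f}")
```

In runs of this kind, test error typically rises as p approaches the interpolation threshold (p = n) and then falls again as p grows well past it, tracing the second descent.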
Second, I will discuss why the loss landscapes of over-parameterized neural networks are generically never convex, even locally. Instead, they satisfy the Polyak-Lojasiewicz (PL) condition across most of the parameter space. This condition provides a powerful framework for optimization in general over-parameterized models and explains why SGD-type methods converge to a global minimum.
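The PL condition requires ||grad L(w)||^2 >= 2*mu*(L(w) - L_min) for some mu > 0, which guarantees linear convergence of gradient methods without convexity. A minimal sketch (f(x) = x^2 + 3 sin^2(x) is a standard textbook example of a non-convex PL function; it is not taken from the talk):

```python
import math

# f(x) = x^2 + 3 sin^2(x) is non-convex (f''(x) = 2 + 6 cos(2x)
# changes sign) yet satisfies the Polyak-Lojasiewicz condition,
# so plain gradient descent still reaches the global minimum at
# x = 0. Step size and starting point are arbitrary demo choices.

def f(x):
    return x**2 + 3.0 * math.sin(x)**2

def grad_f(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

x = 2.0          # start in a non-convex region: f''(2) = 2 + 6*cos(4) < 0
lr = 0.1         # below 2/L for smoothness constant L = 8
for _ in range(500):
    x -= lr * grad_f(x)

print(f"x = {x:.2e}, f(x) = {f(x):.2e}")
```

Because every stationary point of a PL function is a global minimum, gradient descent cannot get stuck despite the non-convexity; here both x and f(x) are driven to zero.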
While our understanding has significantly grown in the last few years, a key piece of the puzzle remains -- how does optimization align with statistics to form the complete mathematical picture of modern ML?
Bio: Mikhail Belkin received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are in the theory and applications of machine learning and data analysis. Some of his well-known work includes the widely used Laplacian Eigenmaps, Graph Regularization, and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral analysis to data science. His recent work has been concerned with understanding remarkable mathematical and statistical phenomena observed in deep learning. This empirical evidence necessitated revisiting some of the basic concepts in statistics and optimization. One of his key recent findings is the "double descent" risk curve, which extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation. He has served on the editorial boards of the Journal of Machine Learning Research, IEEE Transactions on Pattern Analysis and Machine Intelligence, and the SIAM Journal on Mathematics of Data Science.
Host: Gunay Dogan
Note: This talk will be recorded to provide access to NIST staff and associates who could not be present at the time of the seminar. The recording will be made available in the Math channel on NISTube, which is accessible only on the NIST internal network. This recording could be released to the public through a Freedom of Information Act (FOIA) request. Do not discuss or visually present any sensitive (CUI/PII/BII) material. Ensure that the background of any recording contains no inappropriate material and no minors. (To facilitate this, we request that attendees' cameras be muted except when asking questions.)
Note: Visitors from outside NIST must contact Lochi Orr at (410) 598-6986 at least 24 hours in advance.