@SimonsInstituteTOC
Simons Institute | Towards Understanding Generalization Properties of Score-Based Losses
Andrej Risteski (Carnegie Mellon University)
https://simons.berkeley.edu/talks/andrej-risteski-carnegie-mellon-university-2024-08-27
Modern Paradigms in Generalization Boot Camp

Score-based losses have emerged as a more computationally appealing alternative to maximum likelihood for fitting (probabilistic) generative models with an intractable likelihood (for example, energy-based models and diffusion models). What is gained by forgoing maximum likelihood is a tractable gradient-based training algorithm. What is lost is less clear: in particular, since maximum likelihood is asymptotically optimal in terms of statistical efficiency, how suboptimal are score-based losses? I will survey a recently developing connection relating the statistical efficiency of broad families of generalized score losses to the algorithmic efficiency of a natural inference-time algorithm: namely, the mixing time of a suitable diffusion using the score that can be used to draw samples from the model. This “dictionary” allows us to elucidate the design space for score losses with good statistical behavior, by “translating” techniques for speeding up Markov chain convergence (e.g., preconditioning and lifting). I will also briefly touch upon a parallel story for learning discrete probability distributions, in which the role of score-based losses is played by masked-prediction-like losses. Finally, I will end with an outlook on theory for generative models more broadly, both in the short and long term.
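For context, one canonical instance of the score-based losses the abstract refers to is Hyvärinen's score matching objective, which can be stated as follows (a standard formulation, not taken from the talk itself):

```latex
% Score matching: fit the model score \nabla_x \log p_\theta to the data
% distribution p without ever computing p_\theta's normalizing constant,
% since the gradient in x annihilates it.
J_{\mathrm{SM}}(\theta)
  = \mathbb{E}_{x \sim p}\!\left[
      \tfrac{1}{2}\,\bigl\|\nabla_x \log p_\theta(x)\bigr\|^{2}
      \;+\; \operatorname{tr}\!\bigl(\nabla_x^{2} \log p_\theta(x)\bigr)
    \right]
```

Up to an additive constant independent of $\theta$, this equals $\tfrac{1}{2}\,\mathbb{E}_{x \sim p}\bigl\|\nabla_x \log p_\theta(x) - \nabla_x \log p(x)\bigr\|^{2}$, which is why it serves as a tractable surrogate for maximum likelihood when the partition function is intractable.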