Sparks of AGI: early experiments with GPT-4
Sebastien Bubeck | 2023-04-06
The new wave of AI systems, ChatGPT and its more powerful successors, exhibit extraordinary capabilities across a broad swath of domains. In light of this, we discuss whether artificial intelligence has arrived.
Paper available here: arxiv.org/abs/2303.12712
Video recorded at MIT on March 22nd, 2023.

Textbooks Are All You Need
Sebastien Bubeck | 2023-09-12
I discuss the power of the "Textbooks Are All You Need" methodology for building much more compact LLMs using higher-quality data. I emphasize phi-1 (a coding LLM with 1.3B parameters, arxiv.org/abs/2306.11644) and phi-1.5 (a common-sense reasoning LLM with 1.3B parameters, arxiv.org/abs/2309.05463), as well as the original inspiration, TinyStories by Eldan and Li (a fluent-English LLM with 10M parameters, arxiv.org/abs/2305.07759).

TinyStories by Ronen Eldan
Sebastien Bubeck | 2023-06-30
How small can language models be and still speak coherent English? Ronen Eldan discusses his recent work with Yuanzhi Li answering this question. Based on arxiv.org/abs/2305.07759.

Learning threshold neurons via the Edge of Stability
Sebastien Bubeck | 2023-03-02
Presentation by Kwangjun Ahn and Felipe Suarez of their work arxiv.org/abs/2212.07469 with S. Bubeck, S. Chewi, Y.T. Lee, and Y. Zhang. This work presents an analysis of the non-convex dynamics in a toy model (the sparse coding model) that captures the essence of the emergence of edge detectors in convolutional neural networks. Surprisingly, this emergence is connected to the well-documented phenomenon of instability in training neural networks (a.k.a. the Edge of Stability).

Physics of AI
Sebastien Bubeck | 2023-02-20
We propose an approach to the science of deep learning that roughly follows what physicists do to understand reality: (1) explore phenomena through controlled experiments, and (2) build theories based on toy mathematical models and non-fully-rigorous mathematical reasoning. I illustrate (1) with the LEGO study (LEGO stands for Learning Equality and Group Operations), where we observe how transformers learn to solve simple linear systems of equations. I also briefly illustrate (2) with an analysis of the emergence of threshold units when training a two-layer neural network to solve a simple sparse coding problem. The latter analysis connects to the recently discovered Edge of Stability phenomenon.
I gave this talk at an NSF Town Hall whose goal was to discuss successes of deep learning, especially in light of more traditional fields (the other talks can be found here: nsf.gov/events/event_summ.jsp?cntn_id=304013&org=CISE).

A Universal Law of Robustness
Sebastien Bubeck | 2021-08-24
I give a tentative theoretical justification for why large overparametrization is important in neural networks.
Primarily based on "A Universal Law of Robustness via Isoperimetry" by S.B. and Mark Sellke: arxiv.org/abs/2105.12806.
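For reference, a rough informal statement of the law (my paraphrase; see the paper for the precise assumptions, in particular isoperimetry of the data distribution and smoothly parametrized model classes):

```latex
% Informal statement: any model f with p parameters that fits n
% (noisy) samples in dimension d below the noise level must satisfy
\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}} ,
% so an O(1)-Lipschitz interpolator needs p on the order of nd
% parameters: overparametrization by a factor of the dimension d.
```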
Adversarial examples at initialization
Sebastien Bubeck | 2021-04-09
I discuss adversarial examples on two-layer neural networks at (random) initialization.
Based on "A single gradient step finds adversarial examples on random two-layers neural networks" by S.B., Yeshwanth Cherapanamjeri, Gauthier Gidel, and Remi Tachet des Combes: arxiv.org/abs/2104.03863.
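A minimal numerical sketch of the phenomenon, not the paper's construction: on a random two-layer ReLU network, one gradient step on the input typically flips the sign of the output with a small relative perturbation. All sizes and names below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 500                      # input dimension, hidden width

# Random two-layer ReLU network at initialization: f(x) = a . relu(W x)
W = rng.normal(0, 1 / np.sqrt(d), size=(k, d))
a = rng.normal(0, 1 / np.sqrt(k), size=k)

def f(x):
    return a @ np.maximum(W @ x, 0.0)

def grad_f(x):
    # df/dx = W^T (a * 1{Wx > 0})
    return W.T @ (a * (W @ x > 0))

x = rng.normal(size=d) / np.sqrt(d)   # typical unit-scale input
g = grad_f(x)

# One gradient step, sized so the first-order approximation flips the sign.
eta = 2.0 * abs(f(x)) / (g @ g)
x_adv = x - np.sign(f(x)) * eta * g

print(f(x), f(x_adv))                                 # opposite signs (typically)
print(np.linalg.norm(x_adv - x) / np.linalg.norm(x))  # small relative perturbation
```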
Crash course on tensors with application to neural networks
Sebastien Bubeck | 2020-11-30
A crash course on tensors (what they are, what cross norms are, and basic generalities about the nuclear norm, the operator norm, and rank), followed by an application to the law of robustness conjecture for neural networks (see arxiv.org/abs/2009.14444 and youtu.be/uRarIjJGmhs).
Typo: at minute 7 (and afterwards) it is said that the p-th tensor power of R^d is identified with R^{dp}, but of course it should be R^{d^p}.
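A quick sanity check of that correction, flattening an elementary tensor via iterated Kronecker products:

```python
import numpy as np
from functools import reduce

d, p = 3, 4
vectors = [np.random.default_rng(i).normal(size=d) for i in range(p)]

# The elementary tensor v1 (x) v2 (x) ... (x) vp, flattened: it lives in
# the p-th tensor power of R^d, which has dimension d^p (not d*p).
tensor_power = reduce(np.kron, vectors)

print(tensor_power.shape)                 # (81,) == (d**p,)
assert tensor_power.size == d ** p != d * p
```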
A law of robustness for neural networks
Sebastien Bubeck | 2020-10-06
I describe a mathematical conjecture that would establish overparametrization as a law of robustness for neural networks. In particular, it would imply that neural networks can be smooth if and only if they are greatly overparametrized.
Based on "A law of robustness for two-layers neural networks" by S.B., Yuanzhi Li, and Dheeraj Nagaraj: arxiv.org/abs/2009.14444.

Provable limitations of kernel methods
Sebastien Bubeck | 2020-09-09
A simple demonstration of the limitations of kernel methods, even when the training data points can be chosen (which makes, for example, parity with noise easy to learn). Based on the work of Zeyuan Allen-Zhu and Yuanzhi Li: arxiv.org/abs/2001.04413.

Memorization with small neural networks
Sebastien Bubeck | 2020-07-05
We present simple constructions of memorizing neural networks with an optimal number of neurons. In particular, we describe how this can be achieved in the NTK (neural tangent kernel) regime.
Based on "Network size and weights size for memorization with two-layers neural networks" by S.B., R. Eldan, Y.T. Lee, and D. Mikulincer: arxiv.org/abs/2006.02855.

Coordination without communication
Sebastien Bubeck | 2020-04-30
We present a two-player version of the classical (stochastic) prediction-with-expert-advice setting. We describe a strategy with no communication between the players that achieves optimal regret.
Based on "Coordination without communication: optimal regret in two players multi-armed bandits" by S.B. and T. Budzinski: arxiv.org/abs/2002.07596.

Randomized smoothing for certified robustness
Sebastien Bubeck | 2020-03-30
We give a short proof of the Cohen-Rosenfeld-Kolter theorem on the certified robustness of randomized smoothing.
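For context, a minimal Monte Carlo sketch of randomized smoothing together with the certified l2 radius from the theorem. The toy base classifier is an illustrative assumption, and a real implementation would replace the empirical class probabilities with confidence bounds.

```python
import numpy as np
from scipy.stats import norm

def certified_radius(f, x, sigma=0.25, n=10_000, seed=0):
    """Smooth a base classifier f by Gaussian noise and return the
    predicted class with the l2 radius from the Cohen-Rosenfeld-Kolter
    theorem: R = (sigma / 2) * (Phi^{-1}(p_A) - Phi^{-1}(p_B))."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n, x.size))
    votes = np.bincount([f(x + eps) for eps in noise])
    top = np.argsort(votes)[::-1]
    p_a = votes[top[0]] / n                        # empirical top-class mass
    p_b = votes[top[1]] / n if len(top) > 1 else 0.0
    radius = 0.5 * sigma * (norm.ppf(p_a) - norm.ppf(p_b))
    return top[0], radius

# Toy base classifier (illustrative): sign of the first coordinate.
f = lambda z: int(z[0] > 0)
print(certified_radius(f, np.array([0.3, -0.1])))
```

The 1/2 factor and the two-quantile form come straight from the theorem statement; the common one-sided simplification takes p_B = 1 - p_A, which gives R = sigma * Phi^{-1}(p_A).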
Lecture 9:
- The entropic barrier for linear bandits
- Best-of-both-worlds property for multi-armed bandits
- The mystery of mirror descent's adaptivity
Lecture 8:
- Introduction to the linear bandit problem
- Unbiased estimation for linear bandits and its variance term
- The Abernethy-Hazan-Rakhlin sampling scheme
Lecture 7:
- Refined regret bound for multiplicative weights update from the general theorem for mirror descent
- Multi-armed bandits and importance sampling
- Optimal regret for multi-armed bandits with Tsallis entropy
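As a concrete illustration of the importance-sampling idea, a minimal exponential-weights bandit sketch (illustrative parameters; the lecture's optimal bound replaces the entropic regularizer with Tsallis entropy):

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, eta = 5, 10_000, 0.05

# Importance sampling for bandits: only the played arm's loss is observed,
# but loss * 1{A_t = a} / p[a] is an unbiased loss estimate for every arm a.
cum_hat = np.zeros(K)
for t in range(T):
    p = np.exp(-eta * (cum_hat - cum_hat.min()))   # shift for stability
    p /= p.sum()                                   # exponential weights
    arm = rng.choice(K, p=p)
    loss = rng.random() * (0.2 if arm == 0 else 1.0)   # arm 0 is best
    cum_hat[arm] += loss / p[arm]                  # importance-weighted estimate

print(p.round(3))   # mass should concentrate on arm 0
```

A standard observation: with losses (rather than gains), the exponential-weights distribution needs no explicit extra exploration for this estimator to be well behaved.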
Lecture 6:
- Discrete-time analysis of mirror descent (with a proof derived from the continuous-time analysis)
- Revisiting the multiplicative weights update (from Lecture 2) with discrete-time mirror descent
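For reference, this reduction in symbols (mirror descent on the simplex with the negative-entropy mirror map; the notation is mine, not the lecture's):

```latex
% Mirror descent with mirror map \Phi(x) = \sum_i x_i \log x_i on the
% simplex \Delta_n, applied to loss vectors g_t:
\nabla \Phi(y_{t+1}) = \nabla \Phi(x_t) - \eta\, g_t , \qquad
x_{t+1} = \operatorname*{argmin}_{x \in \Delta_n} D_{\Phi}(x, y_{t+1}) ,
% which in closed form is exactly the multiplicative weights update:
x_{t+1,i} \;=\; \frac{x_{t,i}\, e^{-\eta g_{t,i}}}{\sum_j x_{t,j}\, e^{-\eta g_{t,j}}} .
```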
Lecture 5:
- From trees to general metric spaces: random embeddings
- The log(n) distortion theorem of Bartal '96 / FRT '03
- A "simple" proof by James R. Lee: tcsmath.github.io/online/2018/04/12/metric-approx
Lecture 1:
- Brief reminders of convexity
- Classical analysis of gradient descent
- Robustness property and regret interpretation
Lecture notes can be found here: hdpa2019.sciencesconf.org/data/pages/bubeck_hdpa.pdf
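For reference, the classical regret bound that this analysis gives for projected online gradient descent, over a convex set of diameter D with gradient norms bounded by G (standard statement, my notation):

```latex
% Online gradient descent x_{t+1} = \Pi_{\mathcal{K}}(x_t - \eta \nabla f_t(x_t)):
\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
\;\le\; \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2}
\;=\; D G \sqrt{T} \quad \text{for } \eta = \frac{D}{G\sqrt{T}} .
```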
Online Lipschitz Selection, Lecture 5/5
Sebastien Bubeck | 2019-03-18
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/
Lecture 5:
- Fractional setting for k-server
- Mirror descent approach for fractional k-server
- Application: complete proof of log(k) for weighted paging
- Some insights on generalizing from stars to trees

Online Lipschitz Selection, Lecture 4/5
Sebastien Bubeck | 2019-03-06
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/
Lecture 4:
- Entropic regularization for MTS on weighted stars
- Multiscale entropic regularization for MTS on trees

Online Lipschitz Selection, Lecture 3/5
Sebastien Bubeck | 2019-02-26
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/
Lecture 3:
- Brief reminder of the metrical task system framework
- Definition of mirror descent
- Potential analysis of mirror descent (in continuous time)
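The heart of that potential analysis in one line, for unconstrained continuous-time mirror descent (my notation; the constrained case adds a normal-cone term with the same conclusion):

```latex
% Continuous-time mirror descent: \frac{d}{dt} \nabla\Phi(x_t) = -g_t.
% The Bregman divergence to any comparator x^* is a Lyapunov function:
\frac{d}{dt}\, D_{\Phi}(x^{*}, x_t) = \langle g_t, x^{*} - x_t \rangle ,
% so integrating from 0 to T bounds the continuous-time regret:
\int_0^T \langle g_t, x_t - x^{*} \rangle \, dt \;\le\; D_{\Phi}(x^{*}, x_0) .
```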
Online Lipschitz Selection, Lecture 2/5
Sebastien Bubeck | 2019-02-14
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/

Lecture 2:
- Description of the main conjectures (MTS, k-server, chasing convex bodies)
- The gradient descent insight from online learning, and a simpler continuous-time version of this insight

Online Lipschitz Selection, Lecture 1/5
Sebastien Bubeck | 2019-02-06
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/
Lecture 1:
- Classical selection problems
- A definition of "Online Lipschitz" (a.k.a. competitive analysis)
- The base case: online selection in uniform spaces (an illustration of the power of randomization)

Introduction to Statistical Learning Theory, Lecture 4/4
Sebastien Bubeck | 2018-08-07
Introduction to Statistical Learning Theory by Sebastien Bubeck for the 2018 Summer School "Operations Research and Machine Learning": https://cermics-lab.enpc.fr/summer-school-operations-research-and-machine-learning/
Lecture 4:
- Information-theoretic perspective on generalization
- PAC-Bayes analysis
- Online learning
- Margin theory
- A brief introduction to robust machine learning
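For reference, one standard form of the PAC-Bayes bound discussed in this lecture (McAllester/Maurer-style; constants differ across versions, so treat this as indicative):

```latex
% For a prior \pi fixed before seeing the n samples, with probability
% at least 1-\delta, simultaneously for all posteriors \rho:
\mathbb{E}_{h \sim \rho}\, L(h) \;\le\;
\mathbb{E}_{h \sim \rho}\, \widehat{L}_n(h)
+ \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln(2\sqrt{n}/\delta)}{2n}} .
```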
Introduction to Statistical Learning Theory, Lecture 3/4
Sebastien Bubeck | 2018-07-30
Introduction to Statistical Learning Theory by Sebastien Bubeck for the 2018 Summer School "Operations Research and Machine Learning": https://cermics-lab.enpc.fr/summer-school-operations-research-and-machine-learning/

Lecture 3:
- Reminders of the first two lectures and VC dimension
- Generalization via stability
- Generalization via robustness

Introduction to Statistical Learning Theory, Lecture 2/4
Sebastien Bubeck | 2018-07-23
Introduction to Statistical Learning Theory by Sebastien Bubeck for the 2018 Summer School "Operations Research and Machine Learning": https://cermics-lab.enpc.fr/summer-school-operations-research-and-machine-learning/
Lecture 2:
- Brief reminder of concentration inequalities
- Glivenko-Cantelli classes and Rademacher complexity
- Application: analysis of bounded regression
- Brief discussion of type/cotype and application to the analysis of the LASSO
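For reference, the central definition and the standard uniform deviation bound from this part of the lecture (my notation, for a class F with values in [0,1]):

```latex
% Empirical Rademacher complexity of F on a sample x_1, ..., x_n,
% with i.i.d. signs \sigma_i uniform in \{-1, +1\}:
\widehat{\mathfrak{R}}_n(F) = \mathbb{E}_{\sigma}
\sup_{f \in F} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) ,
% and the uniform deviation bound: with probability at least 1-\delta,
% for all f in F,
\mathbb{E} f \;\le\; \frac{1}{n}\sum_{i=1}^{n} f(x_i)
+ 2\, \mathfrak{R}_n(F) + \sqrt{\frac{\ln(1/\delta)}{2n}} .
```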
Introduction to Statistical Learning Theory, Lecture 1/4
Sebastien Bubeck | 2018-07-16
Introduction to Statistical Learning Theory by Sebastien Bubeck for the 2018 Summer School "Operations Research and Machine Learning": https://cermics-lab.enpc.fr/summer-school-operations-research-and-machine-learning/

Lecture 1:
- What SLT is about
- Sample complexity / PAC learning
- Canonical settings (linear classification, linear regression, logistic regression, SVM, neural networks)
- Empirical risk minimization and the problem of overfitting
- Nearest-neighbor classification

Bandit Convex Optimization, PGMO Lecture 4
Sebastien Bubeck | 2018-05-14
Lectures on Bandit Convex Optimization by Sebastien Bubeck for the Gaspard Monge Program in Optimization: https://www.fondation-hadamard.fr/en/pgmo-seminars-courses/courses
Lecture 4:
- BCO with a one-point estimate of the gradient
- Kernel methods for BCO
- Some topics to go further (contextual bandits, easy-data bounds)
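To make the one-point gradient estimate concrete, a minimal sketch of the classical spherical estimator; the toy quadratic loss and all sizes are illustrative assumptions (for a quadratic, the delta-smoothing adds no bias, which makes the check clean):

```python
import numpy as np

rng = np.random.default_rng(0)
d, delta, n = 5, 0.1, 1_000_000

f = lambda X: 0.5 * np.einsum('...i,...i->...', X, X)   # toy loss, grad = x
x = rng.normal(size=d)

# One-point gradient estimate: from a single (bandit) evaluation of f per
# round, (d/delta) * f(x + delta*u) * u, with u uniform on the sphere, is
# an unbiased gradient estimate of the delta-smoothed loss.
U = rng.normal(size=(n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)            # uniform directions
est = (d / delta) * np.mean(f(x + delta * U)[:, None] * U, axis=0)

print(np.round(est, 2))   # close to...
print(np.round(x, 2))     # ...the true gradient, but with huge variance
```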
Bandit Convex Optimization, PGMO Lecture 3
Sebastien Bubeck | 2018-04-19
Lectures on Bandit Convex Optimization by Sebastien Bubeck for the Gaspard Monge Program in Optimization: https://www.fondation-hadamard.fr/en/pgmo-seminars-courses/courses

Lecture 3:
- Online combinatorial optimization (full information, semi-bandit, bandit)
- Crash course on interior point methods (starts at 1:01:40)
- Linear bandits after Abernethy-Hazan-Rakhlin; optimal bound via the entropic barrier of Bubeck-Eldan

Bandit Convex Optimization, PGMO Lecture 2
Sebastien Bubeck | 2018-04-04
Lectures on Bandit Convex Optimization by Sebastien Bubeck for the Gaspard Monge Program in Optimization: https://www.fondation-hadamard.fr/en/pgmo-seminars-courses/courses
Lecture 2:
- Introduction to the mirror descent algorithm
- Continuous-time and discrete-time analyses
- Multi-armed bandit regularizers (entropy, INF, and log-barrier) and their associated regret bounds (classical bound, optimal bound, and small-loss bound)
- Connection with online algorithms (in particular metrical task systems)

Bandit Convex Optimization, PGMO Lecture 1
Sebastien Bubeck | 2018-03-24
Lectures on Bandit Convex Optimization by Sebastien Bubeck for the Gaspard Monge Program in Optimization: https://www.fondation-hadamard.fr/en/pgmo-seminars-courses/courses
Lecture 1:
- Introduction to regret analysis
- Basic multiplicative weights intuition and analysis
- Game-theoretic / information-theoretic approach to regret analysis (both with "full information" and "bandit feedback")