Sparks of AGI: early experiments with GPT-4
Sebastien Bubeck | 2023-04-06
The new wave of AI systems, ChatGPT and its more powerful successors, exhibit extraordinary capabilities across a broad swath of domains. In light of this, we discuss whether artificial intelligence has arrived.
Paper available here: arxiv.org/abs/2303.12712
Video recorded at MIT on March 22nd, 2023.

Textbooks Are All You Need
Sebastien Bubeck | 2023-09-12
I discuss the power of the "Textbooks Are All You Need" methodology for building much more compact LLMs using higher-quality data. I emphasize phi-1 (a coding LLM with 1.3B parameters, arxiv.org/abs/2306.11644) and phi-1.5 (a common-sense reasoning LLM with 1.3B parameters, arxiv.org/abs/2309.05463), as well as the original inspiration, TinyStories by Eldan and Li (a fluent-English LLM with 10M parameters, arxiv.org/abs/2305.07759).

TinyStories by Ronen Eldan
Sebastien Bubeck | 2023-06-30
How small can language models be and still speak coherent English? Ronen Eldan discusses his recent work with Yuanzhi Li answering this question. Based on arxiv.org/abs/2305.07759.

Learning threshold neurons via the Edge of Stability
Sebastien Bubeck | 2023-03-02
Presentation by Kwangjun Ahn and Felipe Suarez of their work arxiv.org/abs/2212.07469 with S. Bubeck, S. Chewi, Y.T. Lee, and Y. Zhang. This work presents an analysis of the non-convex dynamics in a toy model (the sparse coding model) that captures the essence of the emergence of edge detectors in convolutional neural networks. Surprisingly, this emergence is connected to the well-documented phenomenon of instability in training neural networks (a.k.a. the Edge of Stability).

Physics of AI
Sebastien Bubeck | 2023-02-20
We propose an approach to the science of deep learning that roughly follows what physicists do to understand reality: (1) explore phenomena through controlled experiments, and (2) build theories based on toy mathematical models and non-fully-rigorous mathematical reasoning. I illustrate (1) with the LEGO study (LEGO stands for Learning Equality and Group Operations), where we observe how transformers learn to solve simple linear systems of equations. I also briefly illustrate (2) with an analysis of the emergence of threshold units when training a two-layer neural network to solve a simple sparse coding problem. The latter analysis connects to the recently discovered Edge of Stability phenomenon.
I gave this talk at an NSF Town Hall whose goal was to discuss successes of deep learning, especially in light of more traditional fields (the other talks can be found here: nsf.gov/events/event_summ.jsp?cntn_id=304013&org=CISE).

A Universal Law of Robustness
Sebastien Bubeck | 2021-08-24
I give a tentative theoretical justification for why large overparametrization is important in neural networks.
Primarily based on "A Universal Law of Robustness via Isoperimetry" by S.B. and Mark Sellke: arxiv.org/abs/2105.12806.
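For reference, a rough informal statement of the law (my paraphrase; see the paper for the precise assumptions, in particular isoperimetry of the data distribution and smoothly parametrized model classes):

```latex
% Informal statement: any model f with p parameters that fits n
% (noisy) samples in dimension d below the noise level must satisfy
\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}} ,
% so an O(1)-Lipschitz interpolator needs p on the order of nd
% parameters: overparametrization by a factor of the dimension d.
```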
Adversarial examples at initialization
Sebastien Bubeck | 2021-04-09
I discuss adversarial examples on two-layer neural networks at (random) initialization.
Based on "A single gradient step finds adversarial examples on random two-layers neural networks" by S.B., Yeshwanth Cherapanamjeri, Gauthier Gidel, and Remi Tachet des Combes: arxiv.org/abs/2104.03863.
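A minimal numerical sketch of the phenomenon, not the paper's construction: on a random two-layer ReLU network, one gradient step on the input typically flips the sign of the output with a small relative perturbation. All sizes and names below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 500                      # input dimension, hidden width

# Random two-layer ReLU network at initialization: f(x) = a . relu(W x)
W = rng.normal(0, 1 / np.sqrt(d), size=(k, d))
a = rng.normal(0, 1 / np.sqrt(k), size=k)

def f(x):
    return a @ np.maximum(W @ x, 0.0)

def grad_f(x):
    # df/dx = W^T (a * 1{Wx > 0})
    return W.T @ (a * (W @ x > 0))

x = rng.normal(size=d) / np.sqrt(d)   # typical unit-scale input
g = grad_f(x)

# One gradient step, sized so the first-order approximation flips the sign.
eta = 2.0 * abs(f(x)) / (g @ g)
x_adv = x - np.sign(f(x)) * eta * g

print(f(x), f(x_adv))                                 # opposite signs (typically)
print(np.linalg.norm(x_adv - x) / np.linalg.norm(x))  # small relative perturbation
```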
Crash course on tensors with application to neural networks
Sebastien Bubeck | 2020-11-30
A crash course on tensors (what they are, what cross norms are, and basic generalities about the nuclear norm, the operator norm, and rank), followed by an application to the law of robustness conjecture for neural networks (see arxiv.org/abs/2009.14444 and youtu.be/uRarIjJGmhs).
Typo: at minute 7 (and afterwards) it is said that the p-th tensor power of R^d is identified with R^{dp}, but of course it should be R^{d^p}.
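A quick sanity check of that correction, flattening an elementary tensor via iterated Kronecker products:

```python
import numpy as np
from functools import reduce

d, p = 3, 4
vectors = [np.random.default_rng(i).normal(size=d) for i in range(p)]

# The elementary tensor v1 (x) v2 (x) ... (x) vp, flattened: it lives in
# the p-th tensor power of R^d, which has dimension d^p (not d*p).
tensor_power = reduce(np.kron, vectors)

print(tensor_power.shape)                 # (81,) == (d**p,)
assert tensor_power.size == d ** p != d * p
```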
A law of robustness for neural networks
Sebastien Bubeck | 2020-10-06
I describe a mathematical conjecture that would establish overparametrization as a law of robustness for neural networks. In particular, it would imply that neural networks can be smooth if and only if they are greatly overparametrized.
Based on "A law of robustness for two-layers neural networks" by S.B., Yuanzhi Li, and Dheeraj Nagaraj: arxiv.org/abs/2009.14444.

Provable limitations of kernel methods
Sebastien Bubeck | 2020-09-09
A simple demonstration of the limitations of kernel methods, even when the training data points can be chosen (which makes, for example, parity with noise easy to learn). Based on the work of Zeyuan Allen-Zhu and Yuanzhi Li: arxiv.org/abs/2001.04413.

Memorization with small neural networks
Sebastien Bubeck | 2020-07-05
We present simple constructions of memorizing neural networks with an optimal number of neurons. In particular, we describe how this can be achieved in the NTK (neural tangent kernel) regime.
Based on "Network size and weights size for memorization with two-layers neural networks" by S.B., R. Eldan, Y.T. Lee, and D. Mikulincer: arxiv.org/abs/2006.02855.

Coordination without communication
Sebastien Bubeck | 2020-04-30
We present a two-player version of the classical (stochastic) prediction-with-expert-advice setting. We describe a strategy with no communication between the players that achieves optimal regret.
Based on "Coordination without communication: optimal regret in two players multi-armed bandits" by S.B. and T. Budzinski: arxiv.org/abs/2002.07596.

Randomized smoothing for certified robustness
Sebastien Bubeck | 2020-03-30
We give a short proof of the Cohen-Rosenfeld-Kolter theorem on the certified robustness of randomized smoothing.
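For context, a minimal Monte Carlo sketch of randomized smoothing together with the certified l2 radius from the theorem. The toy base classifier is an illustrative assumption, and a real implementation would replace the empirical class probabilities with confidence bounds.

```python
import numpy as np
from scipy.stats import norm

def certified_radius(f, x, sigma=0.25, n=10_000, seed=0):
    """Smooth a base classifier f by Gaussian noise and return the
    predicted class with the l2 radius from the Cohen-Rosenfeld-Kolter
    theorem: R = (sigma / 2) * (Phi^{-1}(p_A) - Phi^{-1}(p_B))."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n, x.size))
    votes = np.bincount([f(x + eps) for eps in noise])
    top = np.argsort(votes)[::-1]
    p_a = votes[top[0]] / n                        # empirical top-class mass
    p_b = votes[top[1]] / n if len(top) > 1 else 0.0
    radius = 0.5 * sigma * (norm.ppf(p_a) - norm.ppf(p_b))
    return top[0], radius

# Toy base classifier (illustrative): sign of the first coordinate.
f = lambda z: int(z[0] > 0)
print(certified_radius(f, np.array([0.3, -0.1])))
```

The 1/2 factor and the two-quantile form come straight from the theorem statement; the common one-sided simplification takes p_B = 1 - p_A, which gives R = sigma * Phi^{-1}(p_A).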
Lecture 9:
- The entropic barrier for linear bandits
- Best-of-both-worlds property for multi-armed bandits
- The mystery of mirror descent's adaptivity
Lecture 8:
- Introduction to the linear bandit problem
- Unbiased estimation for linear bandits and its variance term
- The Abernethy-Hazan-Rakhlin sampling scheme
Lecture 7:
- Refined regret bound for multiplicative weights update from the general theorem for mirror descent
- Multi-armed bandits and importance sampling
- Optimal regret for multi-armed bandits with Tsallis entropy
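As a concrete illustration of the importance-sampling idea, a minimal exponential-weights bandit sketch (illustrative parameters; the lecture's optimal bound replaces the entropic regularizer with Tsallis entropy):

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, eta = 5, 10_000, 0.05

# Importance sampling for bandits: only the played arm's loss is observed,
# but loss * 1{A_t = a} / p[a] is an unbiased loss estimate for every arm a.
cum_hat = np.zeros(K)
for t in range(T):
    p = np.exp(-eta * (cum_hat - cum_hat.min()))   # shift for stability
    p /= p.sum()                                   # exponential weights
    arm = rng.choice(K, p=p)
    loss = rng.random() * (0.2 if arm == 0 else 1.0)   # arm 0 is best
    cum_hat[arm] += loss / p[arm]                  # importance-weighted estimate

print(p.round(3))   # mass should concentrate on arm 0
```

A standard observation: with losses (rather than gains), the exponential-weights distribution needs no explicit extra exploration for this estimator to be well behaved.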
Lecture 6:
- Discrete-time analysis of mirror descent (with a proof derived from the continuous-time analysis)
- Revisiting the multiplicative weights update (from Lecture 2) with discrete-time mirror descent
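For reference, this reduction in symbols (mirror descent on the simplex with the negative-entropy mirror map; the notation is mine, not the lecture's):

```latex
% Mirror descent with mirror map \Phi(x) = \sum_i x_i \log x_i on the
% simplex \Delta_n, applied to loss vectors g_t:
\nabla \Phi(y_{t+1}) = \nabla \Phi(x_t) - \eta\, g_t , \qquad
x_{t+1} = \operatorname*{argmin}_{x \in \Delta_n} D_{\Phi}(x, y_{t+1}) ,
% which in closed form is exactly the multiplicative weights update:
x_{t+1,i} \;=\; \frac{x_{t,i}\, e^{-\eta g_{t,i}}}{\sum_j x_{t,j}\, e^{-\eta g_{t,j}}} .
```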
Lecture 5:
- From trees to general metric spaces: random embeddings
- The log(n) distortion theorem of Bartal '96 / FRT '03
- A "simple" proof by James R. Lee: tcsmath.github.io/online/2018/04/12/metric-approx
Lecture 1:
- Brief reminders of convexity
- Classical analysis of gradient descent
- Robustness property and regret interpretation
Lecture notes can be found here: hdpa2019.sciencesconf.org/data/pages/bubeck_hdpa.pdf
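For reference, the classical regret bound that this analysis gives for projected online gradient descent, over a convex set of diameter D with gradient norms bounded by G (standard statement, my notation):

```latex
% Online gradient descent x_{t+1} = \Pi_{\mathcal{K}}(x_t - \eta \nabla f_t(x_t)):
\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
\;\le\; \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2}
\;=\; D G \sqrt{T} \quad \text{for } \eta = \frac{D}{G\sqrt{T}} .
```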
Online Lipschitz Selection, Lecture 5/5
Sebastien Bubeck | 2019-03-18
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/
Lecture 5:
- Fractional setting for k-server
- Mirror descent approach for fractional k-server
- Application: complete proof of log(k) for weighted paging
- Some insights on generalizing from stars to trees

Online Lipschitz Selection, Lecture 4/5
Sebastien Bubeck | 2019-03-06
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/
Lecture 4:
- Entropic regularization for MTS on weighted stars
- Multiscale entropic regularization for MTS on trees

Online Lipschitz Selection, Lecture 3/5
Sebastien Bubeck | 2019-02-26
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/
Lecture 3:
- Brief reminder of the metrical task system framework
- Definition of mirror descent
- Potential analysis of mirror descent (in continuous time)
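The heart of that potential analysis in one line, for unconstrained continuous-time mirror descent (my notation; the constrained case adds a normal-cone term with the same conclusion):

```latex
% Continuous-time mirror descent: \frac{d}{dt} \nabla\Phi(x_t) = -g_t.
% The Bregman divergence to any comparator x^* is a Lyapunov function:
\frac{d}{dt}\, D_{\Phi}(x^{*}, x_t) = \langle g_t, x^{*} - x_t \rangle ,
% so integrating from 0 to T bounds the continuous-time regret:
\int_0^T \langle g_t, x_t - x^{*} \rangle \, dt \;\le\; D_{\Phi}(x^{*}, x_0) .
```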
Online Lipschitz Selection, Lecture 2/5
Sebastien Bubeck | 2019-02-14
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/

Lecture 2:
- Description of the main conjectures (MTS, k-server, chasing convex bodies)
- The gradient descent insight from online learning, and a simpler continuous-time version of this insight

Online Lipschitz Selection, Lecture 1/5
Sebastien Bubeck | 2019-02-06
Lectures on Online Lipschitz Selection by Sebastien Bubeck for the XIV Escuela de Verano en Matematicas Discretas: http://eventos.cmm.uchile.cl/discretas2019/
Lecture 1:
- Classical selection problems
- A definition of "Online Lipschitz" (a.k.a. competitive analysis)
- The base case: online selection in uniform spaces (an illustration of the power of randomization)

Introduction to Statistical Learning Theory, Lecture 4/4
Sebastien Bubeck | 2018-08-07
Introduction to Statistical Learning Theory by Sebastien Bubeck for the 2018 Summer School "Operations Research and Machine Learning": https://cermics-lab.enpc.fr/summer-school-operations-research-and-machine-learning/
Lecture 4:
- Information-theoretic perspective on generalization
- PAC-Bayes analysis
- Online learning
- Margin theory
- A brief introduction to robust machine learning
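For reference, one standard form of the PAC-Bayes bound discussed in this lecture (McAllester/Maurer-style; constants differ across versions, so treat this as indicative):

```latex
% For a prior \pi fixed before seeing the n samples, with probability
% at least 1-\delta, simultaneously for all posteriors \rho:
\mathbb{E}_{h \sim \rho}\, L(h) \;\le\;
\mathbb{E}_{h \sim \rho}\, \widehat{L}_n(h)
+ \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln(2\sqrt{n}/\delta)}{2n}} .
```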
Introduction to Statistical Learning Theory, Lecture 3/4
Sebastien Bubeck | 2018-07-30
Introduction to Statistical Learning Theory by Sebastien Bubeck for the 2018 Summer School "Operations Research and Machine Learning": https://cermics-lab.enpc.fr/summer-school-operations-research-and-machine-learning/

Lecture 3:
- Reminders of the first two lectures and VC dimension
- Generalization via stability
- Generalization via robustness

Introduction to Statistical Learning Theory, Lecture 2/4
Sebastien Bubeck | 2018-07-23
Introduction to Statistical Learning Theory by Sebastien Bubeck for the 2018 Summer School "Operations Research and Machine Learning": https://cermics-lab.enpc.fr/summer-school-operations-research-and-machine-learning/
Lecture 2:
- Brief reminder of concentration inequalities
- Glivenko-Cantelli classes and Rademacher complexity
- Application: analysis of bounded regression
- Brief discussion of type/cotype and application to the analysis of the LASSO
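For reference, the central definition and the standard uniform deviation bound from this part of the lecture (my notation, for a class F with values in [0,1]):

```latex
% Empirical Rademacher complexity of F on a sample x_1, ..., x_n,
% with i.i.d. signs \sigma_i uniform in \{-1, +1\}:
\widehat{\mathfrak{R}}_n(F) = \mathbb{E}_{\sigma}
\sup_{f \in F} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) ,
% and the uniform deviation bound: with probability at least 1-\delta,
% for all f in F,
\mathbb{E} f \;\le\; \frac{1}{n}\sum_{i=1}^{n} f(x_i)
+ 2\, \mathfrak{R}_n(F) + \sqrt{\frac{\ln(1/\delta)}{2n}} .
```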
Introduction to Statistical Learning Theory, Lecture 1/4
Sebastien Bubeck | 2018-07-16
Introduction to Statistical Learning Theory by Sebastien Bubeck for the 2018 Summer School "Operations Research and Machine Learning": https://cermics-lab.enpc.fr/summer-school-operations-research-and-machine-learning/

Lecture 1:
- What SLT is about
- Sample complexity / PAC learning
- Canonical settings (linear classification, linear regression, logistic regression, SVM, neural networks)
- Empirical risk minimization and the problem of overfitting
- Nearest-neighbor classification

Bandit Convex Optimization, PGMO Lecture 4
Sebastien Bubeck | 2018-05-14
Lectures on Bandit Convex Optimization by Sebastien Bubeck for the Gaspard Monge Program in Optimization: https://www.fondation-hadamard.fr/en/pgmo-seminars-courses/courses
Lecture 4:
- BCO with a one-point estimate of the gradient
- Kernel methods for BCO
- Some topics to go further (contextual bandits, easy-data bounds)
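To make the one-point gradient estimate concrete, a minimal sketch of the classical spherical estimator; the toy quadratic loss and all sizes are illustrative assumptions (for a quadratic, the delta-smoothing adds no bias, which makes the check clean):

```python
import numpy as np

rng = np.random.default_rng(0)
d, delta, n = 5, 0.1, 1_000_000

f = lambda X: 0.5 * np.einsum('...i,...i->...', X, X)   # toy loss, grad = x
x = rng.normal(size=d)

# One-point gradient estimate: from a single (bandit) evaluation of f per
# round, (d/delta) * f(x + delta*u) * u, with u uniform on the sphere, is
# an unbiased gradient estimate of the delta-smoothed loss.
U = rng.normal(size=(n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)            # uniform directions
est = (d / delta) * np.mean(f(x + delta * U)[:, None] * U, axis=0)

print(np.round(est, 2))   # close to...
print(np.round(x, 2))     # ...the true gradient, but with huge variance
```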
Bandit Convex Optimization, PGMO Lecture 3
Sebastien Bubeck | 2018-04-19
Lectures on Bandit Convex Optimization by Sebastien Bubeck for the Gaspard Monge Program in Optimization: https://www.fondation-hadamard.fr/en/pgmo-seminars-courses/courses

Lecture 3:
- Online combinatorial optimization (full information, semi-bandit, bandit)
- Crash course on interior point methods (starts at 1:01:40)
- Linear bandits after Abernethy-Hazan-Rakhlin; optimal bound via the entropic barrier of Bubeck-Eldan

Bandit Convex Optimization, PGMO Lecture 2
Sebastien Bubeck | 2018-04-04
Lectures on Bandit Convex Optimization by Sebastien Bubeck for the Gaspard Monge Program in Optimization: https://www.fondation-hadamard.fr/en/pgmo-seminars-courses/courses
Lecture 2:
- Introduction to the mirror descent algorithm
- Continuous-time and discrete-time analyses
- Multi-armed bandit regularizers (entropy, INF, and log-barrier) and their associated regret bounds (classical bound, optimal bound, and small-loss bound)
- Connection with online algorithms (in particular metrical task systems)

Bandit Convex Optimization, PGMO Lecture 1
Sebastien Bubeck | 2018-03-24
Lectures on Bandit Convex Optimization by Sebastien Bubeck for the Gaspard Monge Program in Optimization: https://www.fondation-hadamard.fr/en/pgmo-seminars-courses/courses
Lecture 1:
- Introduction to regret analysis
- Basic multiplicative weights intuition and analysis
- Game-theoretic / information-theoretic approach to regret analysis (both with "full information" and "bandit feedback")