Join me on a deep dive to understand the most successful neural network ever invented: the transformer. Transformers, originally invented for natural language translation, are now everywhere. They have fast taken over the world of machine learning (and the world more generally) and are now used for almost every application, not the least of which is ChatGPT.
In this video I take a more constructive approach to explaining the transformer: starting from a simple convolutional neural network, I will step through all of the changes that need to be made, along with the motivations for why these changes need to be made.
*By "from scratch" I mean "from a comprehensive mastery of the intricacies of convolutional neural network training dynamics". Here is a refresher on CNNs: youtube.com/watch?v=8iIdWHjleIs
Chapters: 00:00 Intro 01:13 CNNs for text 05:28 Pairwise Convolutions 07:54 Self-Attention 13:39 Optimizations
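As a taste of the operation the video builds up to, here is a minimal sketch of single-head self-attention in NumPy. The function name, dimensions, and weight matrices are my own toy choices, not from the video:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Minimal single-head, unmasked self-attention over a sequence.

    x: (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_head) projection matrices
    """
    q = x @ Wq                                   # queries
    k = x @ Wk                                   # keys
    v = x @ Wv                                   # values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                           # weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                      # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)                                 # (5, 4): one output per token
```

The pairwise score matrix is what connects this back to the "pairwise convolutions" framing: every token gets to look at every other token, with learned weights deciding how much.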
A better way to think about Taylor series #SoMEpi | Algorithmic Simplicity | 2024-08-05 | #somepi #someπ
0:00 - Intro 1:06 - The Fundamental Theorem of Calculus 2:27 - Deriving Taylor's Polynomial 7:33 - Approximation Error Convergence Analysis 12:21 - Deriving the Generalized Taylor's Polynomial
Taylor's polynomial expansion is a core part of high-school level calculus. However, I was never satisfied with the way it was taught to me, as the motivation for it seemed to come out of nowhere. In this video, I show how Taylor's polynomial, an explicit formula for the error of the polynomial approximation, and a generalized version of Taylor's polynomial with multiple centres are all the result of just applying the fundamental theorem of calculus over and over again.
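The repeated-FTC trick described above can be written out directly (my notation, not necessarily the video's). Start from the fundamental theorem of calculus and integrate by parts, choosing the antiderivative v = t - x at each step:

```latex
% One application of the fundamental theorem of calculus:
f(x) = f(a) + \int_a^x f'(t)\,dt
% Integrate by parts with u = f'(t), v = t - x:
f(x) = f(a) + f'(a)(x - a) + \int_a^x f''(t)(x - t)\,dt
% Repeating n times yields Taylor's polynomial plus an explicit error term:
f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k
     + \int_a^x \frac{f^{(n+1)}(t)}{n!}(x - t)^n\,dt
```

The final integral is the standard integral form of the remainder, which is the "explicit formula for the error" mentioned above.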
While I was working on this video, I was contacted by the team from GiveInternet, which is a charity organization that aims to provide internet access to students in under-developed countries. They offered to do a collaboration with me, but I didn't have time to look into their organization properly before this video went out, so I did not accept. Nevertheless, it seemed like a good organization, so if you want to donate money to their cause you can do so here: giveinternet.org/AlgorithmicSimplicity . To be clear, they have not sponsored this video in any way.

MAMBA from Scratch: Neural Nets Better and Faster than Transformers | Algorithmic Simplicity | 2024-05-01

Mamba is a new neural network architecture that came out this year, and it performs better than transformers at language modelling! This is probably the most exciting development in AI since 2017. In this video I explain how to derive Mamba from the perspective of linear RNNs. And don't worry, there's no state space model theory needed!
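The key idea behind parallelizing a linear RNN can be sketched in a few lines of plain Python. A linear recurrence h_t = a_t * h_{t-1} + b_t is a composition of affine maps, and composing affine maps is associative, which is what makes a parallel scan possible. This is a toy check of that claim with my own notation, not code from the video:

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b; composition is associative,
    which is what lets a linear RNN be evaluated with a parallel scan."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

def sequential_rnn(a, b, h0=0.0):
    """Plain linear recurrence h_t = a_t * h_{t-1} + b_t, step by step."""
    h, out = h0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out

def scan_rnn(a, b, h0=0.0):
    """Same states, built from prefix compositions of (a_t, b_t) pairs.
    Accumulated left to right here for clarity; since combine() is
    associative, the prefixes could equally be combined in a balanced
    tree, giving O(log n) parallel depth."""
    out, acc = [], None
    for step in zip(a, b):
        acc = step if acc is None else combine(acc, step)
        A, B = acc                 # h_t = A * h0 + B
        out.append(A * h0 + B)
    return out

a = [0.9, 0.5, 1.1, 0.8]
b = [1.0, -2.0, 0.5, 3.0]
assert all(abs(x - y) < 1e-9 for x, y in zip(sequential_rnn(a, b), scan_rnn(a, b)))
```

The non-linearity of a standard RNN breaks exactly this associativity, which is why making the recurrence linear is the crucial first step.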
00:00 Intro 01:33 Recurrent Neural Networks 05:24 Linear Recurrent Neural Networks 06:57 Parallelizing Linear RNNs 15:33 Vanishing and Exploding Gradients 19:08 Stable initialization 21:53 State Space Models 24:33 Mamba 25:26 The High Performance Memory Trick 27:35 The Mamba Drama

Why Does Diffusion Work Better than Auto-Regression? | Algorithmic Simplicity | 2024-02-16

Have you ever wondered how generative AI actually works? Well the short answer is, in exactly the same way as regular AI!
In this video I break down the state of the art in generative AI - Auto-regressors and Denoising Diffusion models - and explain how this seemingly magical technology is all the result of curve fitting, like the rest of machine learning.
Come learn the differences (and similarities!) between auto-regression and diffusion, why these methods are needed to perform generation of complex natural data, and why diffusion models work better for image generation but are not used for text generation.
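As a concrete anchor for the diffusion half of that comparison, here is the forward (noising) process in NumPy; a denoising model would then be trained to predict the added noise from the noised sample. The schedule numbers and names are my own toy choices, not from the video:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear noise schedule: alpha_bar[t] shrinks from ~1 towards 0,
# so later timesteps are mostly noise.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise_sample(x0, t):
    """Forward diffusion: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps.
    Returns the noised sample and the noise a model would learn to predict."""
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

x0 = rng.normal(size=(8,))        # stand-in for an image
xt, eps = noise_sample(x0, t=50)
# A denoiser f(xt, t) would be trained to minimize ||f(xt, t) - eps||^2.
```

This is the "predict the noise instead of the image" setup mentioned in the chapter list below; the autoregressive alternative instead predicts the next token given the previous ones.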
Chapters: 00:00 Intro to Generative AI 02:40 Why Naïve Generation Doesn't Work 03:52 Auto-regression 08:32 Generalized Auto-regression 11:43 Denoising Diffusion 14:19 Optimizations 14:30 Re-using Models and Causal Architectures 16:35 Diffusion Models Predict the Noise Instead of the Image 18:19 Conditional Generation 19:08 Classifier-free Guidance

Why do Convolutional Neural Networks work so well? | Algorithmic Simplicity | 2022-10-29

While deep learning has existed since the 1970s, it wasn't until 2010 that deep learning exploded in popularity, to the point that deep neural networks are now used ubiquitously for all machine learning tasks. The reason for this explosion is the invention of the convolutional neural network. This remarkably simple architecture allowed neural networks to be trained on new kinds of data which were previously thought impossible to learn from.
In this video I discuss what a convolutional neural network is, why it is needed, what it can and cannot do, and why it works so damn well.
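The contrast the video draws can be made concrete with a parameter count: a dense layer on even a small image needs millions of weights, while a convolution reuses one small kernel at every position. The numbers and the tiny reference implementation below are my own illustration, not from the video:

```python
import numpy as np

H = W = 64                          # a small 64x64 grayscale image
dense_params = (H * W) * (H * W)    # fully connected: every pixel -> every pixel
conv_params = 3 * 3                 # one shared 3x3 kernel, slid over the image
print(dense_params, conv_params)    # 16777216 vs 9

def conv2d(img, kernel):
    """'Valid' 2D convolution (strictly, cross-correlation) with one shared kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(16.0).reshape(4, 4)
edge = np.array([[1.0, -1.0]])      # tiny horizontal-difference kernel
print(conv2d(img, edge))            # constant -1s: the image rises by 1 per column
```

The weight sharing is both why convolutions sidestep the curse of dimensionality and why they respect the spatial structure of images: the same local pattern detector runs everywhere.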
00:00 Intro 01:18 The curse of dimensionality 06:39 Convolutional neural networks 13:09 The spatial structure of images 15:06 Conclusion

But what is a neural network REALLY? | Algorithmic Simplicity | 2022-08-16

My submission for 2022 #SoME2. In this video I try to explain what a neural network is in the simplest way possible. That means no linear algebra, no calculus, and definitely no statistics. The aim is to be accessible to absolutely anyone.
00:00 Intro 00:47 Gauss & Parametric Regression 02:59 Fitting a Straight Line 06:39 Defining a 1-layer Neural Network 09:29 Defining a 2-layer Neural Network
Part of the motivation for making this video is to try to dispel some of the misunderstandings around #deeplearning and to highlight 1) just how simple the neural network algorithm actually is and 2) just how NOT like a human brain it is.
I also haven't seen Gauss's original discovery of parametric regression presented anywhere before, and I think it's a fun story to highlight just how far (and how little) data science has come in 200 years.
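In the spirit of the Gauss story, here is least-squares fitting of a straight line, the starting point the video builds its neural networks from. This is a generic closed-form fit on toy data of my own, not the video's example:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit y ~ m*x + c via the closed-form normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    m = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    c = ybar - m * xbar
    return m, c

# Noiseless toy data on the line y = 2x + 1: the fit recovers it exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
m, c = fit_line(x, y)
print(m, c)  # 2.0 1.0
```

A neural network replaces the straight line m*x + c with a more flexible parametric curve, but the fitting idea (pick parameters that minimize the error on the data) stays exactly the same.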
***************************
In full disclosure, planets do not orbit in straight lines, and Gauss did not fit a straight line to Ceres' positions, but rather an ellipse (in 3D).