Join me on a deep dive to understand the most successful neural network ever invented: the transformer. Transformers, originally invented for natural language translation, are now everywhere. They have fast taken over the world of machine learning (and the world more generally) and are now used for almost every application, not the least of which is ChatGPT.
In this video I take a more constructive approach to explaining the transformer: starting from a simple convolutional neural network, I will step through all of the changes that need to be made, along with the motivations for why these changes need to be made.
*By "from scratch" I mean "from a comprehensive mastery of the intricacies of convolutional neural network training dynamics". Here is a refresher on CNNs: youtube.com/watch?v=8iIdWHjleIs
Chapters: 00:00 Intro 01:13 CNNs for text 05:28 Pairwise Convolutions 07:54 Self-Attention 13:39 Optimizations
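As a taste of the operation the video builds up to, here is a minimal sketch of single-head self-attention in NumPy. The function name, dimensions, and weight matrices are my own toy choices, not from the video:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Minimal single-head, unmasked self-attention over a sequence.

    x: (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_head) projection matrices
    """
    q = x @ Wq                                   # queries
    k = x @ Wk                                   # keys
    v = x @ Wv                                   # values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                           # weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                      # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)                                 # (5, 4): one output per token
```

The pairwise score matrix is what connects this back to the "pairwise convolutions" framing: every token gets to look at every other token, with learned weights deciding how much.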
A better way to think about Taylor series #SoMEpi | Algorithmic Simplicity | 2024-08-05 | #somepi #someπ
0:00 - Intro 1:06 - The Fundamental Theorem of Calculus 2:27 - Deriving Taylor's Polynomial 7:33 - Approximation Error Convergence Analysis 12:21 - Deriving the Generalized Taylor's Polynomial
Taylor's polynomial expansion is a core part of high-school level calculus. However, I was never satisfied with the way it was taught to me, as the motivation for it seemed to come out of nowhere. In this video, I show how Taylor's polynomial, an explicit formula for the error of the polynomial approximation, and a generalized version of Taylor's polynomial with multiple centres are all the result of just applying the fundamental theorem of calculus over and over again.
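The repeated-FTC trick described above can be written out directly (my notation, not necessarily the video's). Start from the fundamental theorem of calculus and integrate by parts, choosing the antiderivative v = t - x at each step:

```latex
% One application of the fundamental theorem of calculus:
f(x) = f(a) + \int_a^x f'(t)\,dt
% Integrate by parts with u = f'(t), v = t - x:
f(x) = f(a) + f'(a)(x - a) + \int_a^x f''(t)(x - t)\,dt
% Repeating n times yields Taylor's polynomial plus an explicit error term:
f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k
     + \int_a^x \frac{f^{(n+1)}(t)}{n!}(x - t)^n\,dt
```

The final integral is the standard integral form of the remainder, which is the "explicit formula for the error" mentioned above.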
While I was working on this video, I was contacted by the team from GiveInternet, which is a charity organization that aims to provide internet access to students in under-developed countries. They offered to do a collaboration with me, but I didn't have time to look into their organization properly before this video went out, so I did not accept. Nevertheless, it seemed like a good organization, so if you want to donate money to their cause you can do so here: giveinternet.org/AlgorithmicSimplicity . To be clear, they have not sponsored this video in any way.

MAMBA from Scratch: Neural Nets Better and Faster than Transformers | Algorithmic Simplicity | 2024-05-01

Mamba is a new neural network architecture that came out this year, and it performs better than transformers at language modelling! This is probably the most exciting development in AI since 2017. In this video I explain how to derive Mamba from the perspective of linear RNNs. And don't worry, there's no state space model theory needed!
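The key idea behind parallelizing a linear RNN can be sketched in a few lines of plain Python. A linear recurrence h_t = a_t * h_{t-1} + b_t is a composition of affine maps, and composing affine maps is associative, which is what makes a parallel scan possible. This is a toy check of that claim with my own notation, not code from the video:

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b; composition is associative,
    which is what lets a linear RNN be evaluated with a parallel scan."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

def sequential_rnn(a, b, h0=0.0):
    """Plain linear recurrence h_t = a_t * h_{t-1} + b_t, step by step."""
    h, out = h0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out

def scan_rnn(a, b, h0=0.0):
    """Same states, built from prefix compositions of (a_t, b_t) pairs.
    Accumulated left to right here for clarity; since combine() is
    associative, the prefixes could equally be combined in a balanced
    tree, giving O(log n) parallel depth."""
    out, acc = [], None
    for step in zip(a, b):
        acc = step if acc is None else combine(acc, step)
        A, B = acc                 # h_t = A * h0 + B
        out.append(A * h0 + B)
    return out

a = [0.9, 0.5, 1.1, 0.8]
b = [1.0, -2.0, 0.5, 3.0]
assert all(abs(x - y) < 1e-9 for x, y in zip(sequential_rnn(a, b), scan_rnn(a, b)))
```

The non-linearity of a standard RNN breaks exactly this associativity, which is why making the recurrence linear is the crucial first step.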
00:00 Intro 01:33 Recurrent Neural Networks 05:24 Linear Recurrent Neural Networks 06:57 Parallelizing Linear RNNs 15:33 Vanishing and Exploding Gradients 19:08 Stable initialization 21:53 State Space Models 24:33 Mamba 25:26 The High Performance Memory Trick 27:35 The Mamba Drama

Why Does Diffusion Work Better than Auto-Regression? | Algorithmic Simplicity | 2024-02-16

Have you ever wondered how generative AI actually works? Well the short answer is, in exactly the same way as regular AI!
In this video I break down the state of the art in generative AI - Auto-regressors and Denoising Diffusion models - and explain how this seemingly magical technology is all the result of curve fitting, like the rest of machine learning.
Come learn the differences (and similarities!) between auto-regression and diffusion, why these methods are needed to perform generation of complex natural data, and why diffusion models work better for image generation but are not used for text generation.
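As a concrete anchor for the diffusion half of that comparison, here is the forward (noising) process in NumPy; a denoising model would then be trained to predict the added noise from the noised sample. The schedule numbers and names are my own toy choices, not from the video:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear noise schedule: alpha_bar[t] shrinks from ~1 towards 0,
# so later timesteps are mostly noise.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise_sample(x0, t):
    """Forward diffusion: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps.
    Returns the noised sample and the noise a model would learn to predict."""
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

x0 = rng.normal(size=(8,))        # stand-in for an image
xt, eps = noise_sample(x0, t=50)
# A denoiser f(xt, t) would be trained to minimize ||f(xt, t) - eps||^2.
```

This is the "predict the noise instead of the image" setup mentioned in the chapter list below; the autoregressive alternative instead predicts the next token given the previous ones.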
Chapters: 00:00 Intro to Generative AI 02:40 Why Naïve Generation Doesn't Work 03:52 Auto-regression 08:32 Generalized Auto-regression 11:43 Denoising Diffusion 14:19 Optimizations 14:30 Re-using Models and Causal Architectures 16:35 Diffusion Models Predict the Noise Instead of the Image 18:19 Conditional Generation 19:08 Classifier-free Guidance

Why do Convolutional Neural Networks work so well? | Algorithmic Simplicity | 2022-10-29

While deep learning has existed since the 1970s, it wasn't until 2010 that deep learning exploded in popularity, to the point that deep neural networks are now used ubiquitously for all machine learning tasks. The reason for this explosion is the invention of the convolutional neural network. This remarkably simple architecture allowed neural networks to be trained on new kinds of data which were previously thought impossible to learn from.
In this video I discuss what a convolutional neural network is, why it is needed, what it can and cannot do, and why it works so damn well.
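The contrast the video draws can be made concrete with a parameter count: a dense layer on even a small image needs millions of weights, while a convolution reuses one small kernel at every position. The numbers and the tiny reference implementation below are my own illustration, not from the video:

```python
import numpy as np

H = W = 64                          # a small 64x64 grayscale image
dense_params = (H * W) * (H * W)    # fully connected: every pixel -> every pixel
conv_params = 3 * 3                 # one shared 3x3 kernel, slid over the image
print(dense_params, conv_params)    # 16777216 vs 9

def conv2d(img, kernel):
    """'Valid' 2D convolution (strictly, cross-correlation) with one shared kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(16.0).reshape(4, 4)
edge = np.array([[1.0, -1.0]])      # tiny horizontal-difference kernel
print(conv2d(img, edge))            # constant -1s: the image rises by 1 per column
```

The weight sharing is both why convolutions sidestep the curse of dimensionality and why they respect the spatial structure of images: the same local pattern detector runs everywhere.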
00:00 Intro 01:18 The curse of dimensionality 06:39 Convolutional neural networks 13:09 The spatial structure of images 15:06 Conclusion

But what is a neural network REALLY? | Algorithmic Simplicity | 2022-08-16

My submission for 2022 #SoME2. In this video I try to explain what a neural network is in the simplest way possible. That means no linear algebra, no calculus, and definitely no statistics. The aim is to be accessible to absolutely anyone.
00:00 Intro 00:47 Gauss & Parametric Regression 02:59 Fitting a Straight Line 06:39 Defining a 1-layer Neural Network 09:29 Defining a 2-layer Neural Network
Part of the motivation for making this video is to try to dispel some of the misunderstandings around #deeplearning and to highlight 1) just how simple the neural network algorithm actually is and 2) just how NOT like a human brain it is.
I also haven't seen Gauss's original discovery of parametric regression presented anywhere before, and I think it's a fun story to highlight just how far (and how little) data science has come in 200 years.
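In the spirit of the Gauss story, here is least-squares fitting of a straight line, the starting point the video builds its neural networks from. This is a generic closed-form fit on toy data of my own, not the video's example:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit y ~ m*x + c via the closed-form normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    m = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    c = ybar - m * xbar
    return m, c

# Noiseless toy data on the line y = 2x + 1: the fit recovers it exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
m, c = fit_line(x, y)
print(m, c)  # 2.0 1.0
```

A neural network replaces the straight line m*x + c with a more flexible parametric curve, but the fitting idea (pick parameters that minimize the error on the data) stays exactly the same.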
***************************
In full disclosure, planets do not orbit in straight lines, and Gauss did not fit a straight line to Ceres' positions, but rather an ellipse (in 3D).