@algorithmicsimplicity
Algorithmic Simplicity | MAMBA from Scratch: Neural Nets Better and Faster than Transformers | Uploaded May 2024 | Updated October 2024.
Mamba is a new neural network architecture that came out this year, and it performs better than transformers at language modelling! This is probably the most exciting development in AI since 2017. In this video I explain how to derive Mamba from the perspective of linear RNNs. And don't worry, there's no state space model theory needed!
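As a quick illustration of the linear RNN idea the video builds on, here is a minimal sketch (my own, not code from the video or the Mamba paper) of the recurrence h_t = a_t * h_{t-1} + b_t, computed once sequentially and once with an associative combine operator; the associativity of that operator is what lets the recurrence be parallelized. The function names (seq_scan, assoc_scan, combine) are illustrative only, and the hidden state is a scalar for simplicity.

# Minimal linear RNN sketch: h[t] = a[t] * h[t-1] + b[t], h[-1] = 0.
# Computed sequentially and via an associative scan to show they agree.
import numpy as np

def seq_scan(a, b):
    # Plain sequential recurrence.
    h = np.zeros_like(b)
    prev = 0.0
    for t in range(len(b)):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h

def combine(left, right):
    # Compose two recurrence steps (a1, b1) then (a2, b2):
    # h -> a2 * (a1 * h + b1) + b2 = (a2 * a1) * h + (a2 * b1 + b2).
    a1, b1 = left
    a2, b2 = right
    return a2 * a1, a2 * b1 + b2

def assoc_scan(a, b):
    # Inclusive scan with the associative operator above. Written serially
    # here, but the same operator can be evaluated as a parallel prefix scan
    # in O(log T) depth, which is the point of using a *linear* RNN.
    out_a, out_b = np.empty_like(a), np.empty_like(b)
    acc = (1.0, 0.0)  # identity element for the combine operator
    for t in range(len(b)):
        acc = combine(acc, (a[t], b[t]))
        out_a[t], out_b[t] = acc
    return out_b  # the b-component of the pair is the hidden state h[t]

rng = np.random.default_rng(0)
a, b = rng.uniform(0.9, 1.0, 8), rng.normal(size=8)
assert np.allclose(seq_scan(a, b), assoc_scan(a, b))

The parallel-prefix-scan version of this computation is what the "Parallelizing Linear RNNs" chapter at 06:57 covers; this snippet only checks that the sequential and scan formulations produce the same hidden states.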

Mamba paper: openreview.net/forum?id=AL1fq05o7H
Linear RNN paper: openreview.net/forum?id=M3Yd3QyRG4

#mamba
#deeplearning
#largelanguagemodels

00:00 Intro
01:33 Recurrent Neural Networks
05:24 Linear Recurrent Neural Networks
06:57 Parallelizing Linear RNNs
15:33 Vanishing and Exploding Gradients
19:08 Stable initialization
21:53 State Space Models
24:33 Mamba
25:26 The High Performance Memory Trick
27:35 The Mamba Drama