@algorithmicsimplicity
Algorithmic Simplicity | MAMBA from Scratch: Neural Nets Better and Faster than Transformers | Uploaded May 2024 | Updated October 2024.
Mamba is a new neural network architecture that came out this year, and it performs better than transformers at language modelling! This is probably the most exciting development in AI since 2017. In this video I explain how to derive Mamba from the perspective of linear RNNs. And don't worry, there's no state space model theory needed!
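As a quick illustration of the linear RNN idea the video builds on, here is a minimal sketch (my own, not code from the video or the Mamba paper) of the recurrence h_t = a_t * h_{t-1} + b_t, computed once sequentially and once with an associative combine operator; the associativity of that operator is what lets the recurrence be parallelized. The function names (seq_scan, assoc_scan, combine) are illustrative only, and the hidden state is a scalar for simplicity.

# Minimal linear RNN sketch: h[t] = a[t] * h[t-1] + b[t], h[-1] = 0.
# Computed sequentially and via an associative scan to show they agree.
import numpy as np

def seq_scan(a, b):
    # Plain sequential recurrence.
    h = np.zeros_like(b)
    prev = 0.0
    for t in range(len(b)):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h

def combine(left, right):
    # Compose two recurrence steps (a1, b1) then (a2, b2):
    # h -> a2 * (a1 * h + b1) + b2 = (a2 * a1) * h + (a2 * b1 + b2).
    a1, b1 = left
    a2, b2 = right
    return a2 * a1, a2 * b1 + b2

def assoc_scan(a, b):
    # Inclusive scan with the associative operator above. Written serially
    # here, but the same operator can be evaluated as a parallel prefix scan
    # in O(log T) depth, which is the point of using a *linear* RNN.
    out_a, out_b = np.empty_like(a), np.empty_like(b)
    acc = (1.0, 0.0)  # identity element for the combine operator
    for t in range(len(b)):
        acc = combine(acc, (a[t], b[t]))
        out_a[t], out_b[t] = acc
    return out_b  # the b-component of the pair is the hidden state h[t]

rng = np.random.default_rng(0)
a, b = rng.uniform(0.9, 1.0, 8), rng.normal(size=8)
assert np.allclose(seq_scan(a, b), assoc_scan(a, b))

The parallel-prefix-scan version of this computation is what the "Parallelizing Linear RNNs" chapter at 06:57 covers; this snippet only checks that the sequential and scan formulations produce the same hidden states.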

Mamba paper: openreview.net/forum?id=AL1fq05o7H
Linear RNN paper: openreview.net/forum?id=M3Yd3QyRG4

#mamba
#deeplearning
#largelanguagemodels

00:00 Intro
01:33 Recurrent Neural Networks
05:24 Linear Recurrent Neural Networks
06:57 Parallelizing Linear RNNs
15:33 Vanishing and Exploding Gradients
19:08 Stable initialization
21:53 State Space Models
24:33 Mamba
25:26 The High Performance Memory Trick
27:35 The Mamba Drama