OpenAI Sora and DiTs: Scalable Diffusion Models with Transformers @gabrielmongaras
OpenAI Sora and DiTs: Scalable Diffusion Models with Transformers
Gabriel Mongaras 2024-02-18 | Sora: openai.com/sora Sora paper (Video generation models as world simulators): openai.com/research/video-generation-models-as-world-simulators DiTs - Scalable Diffusion Models with Transformers paper: arxiv.org/abs/2212.09748 My notes: drive.google.com/file/d/1h2pcgkrI0b6965f1xjf4kTyhhvxZNM3b/view?usp=drive_link
Deterministic Image Editing with DDPM Inversion, DDIM Inversion, Null Inversion and Prompt-to-Prompt
Gabriel Mongaras 2024-07-31 | Null-text Inversion for Editing Real Images using Guided Diffusion Models: arxiv.org/abs/2211.09794 An Edit Friendly DDPM Noise Space: Inversion and Manipulations: arxiv.org/abs/2304.06140 Prompt-to-Prompt Image Editing with Cross Attention Control: arxiv.org/abs/2208.01626 00:00 Intro 01:24 Current image editing techniques 11:42 Deriving DDPM and DDIM 23:08 DDIM inversion 32:46 Null inversion 47:15 DDPM inversion 1:01:18 Prompt-to-prompt 1:10:52 Conclusion
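The DDIM derivation and inversion covered in this video can be sketched numerically. A minimal NumPy sketch under my own naming and with the noise prediction eps held fixed in place of a real network (in practice eps comes from the trained denoiser at each step):

```python
import numpy as np

def ddim_step(x_t, eps, abar_t, abar_prev):
    """One deterministic DDIM step (eta = 0): predict x0, then jump to the previous noise level."""
    x0_pred = (x_t - np.sqrt(1 - abar_t) * eps) / np.sqrt(abar_t)
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1 - abar_prev) * eps

def ddim_invert_step(x_t, eps, abar_t, abar_next):
    """The same update run in reverse: map a (partially) clean sample back toward noise."""
    x0_pred = (x_t - np.sqrt(1 - abar_t) * eps) / np.sqrt(abar_t)
    return np.sqrt(abar_next) * x0_pred + np.sqrt(1 - abar_next) * eps
```

With eps fixed, inversion followed by the forward step is an exact round trip, which is what makes DDIM inversion deterministic; the error in practice comes from eps changing between the two passes.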
Attending to Topological Spaces: The Cellular Transformer
Gabriel Mongaras 2024-07-22 | Paper here: arxiv.org/abs/2405.14094 Notes: drive.google.com/file/d/12g_KkHqXD6mEDILJzYbCC08i8cDHITfC/view?usp=drive_link 00:00 Intro 01:39 Cellular complexes 07:26 K-cochain 13:26 Defining structure on the cell 20:28 Cellular transformer 34:18 Positional encodings and outro
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Gabriel Mongaras 2024-07-12 | Paper here: arxiv.org/abs/2407.04620 Code!: github.com/test-time-training/ttt-lm-pytorch Notes: drive.google.com/file/d/127a1UBm_IN_WMKG-DmEvfJ8Pja-9BwDk/view?usp=drive_link 00:00 Intro 04:40 Problem with RNNs 06:38 Meta learning and method idea 09:13 Update rule and RNN inner loop 15:07 Learning the loss function outer loop 21:21 Parallelizing training 30:05 Results
WARP: On the Benefits of Weight Averaged Rewarded Policies
Gabriel Mongaras 2024-07-06 | Paper here: arxiv.org/abs/2406.16768 Notes: drive.google.com/file/d/11UK7mEZwNVUMYuXwvOTfaqHhN8zSYm5M/view?usp=drive_link 00:00 Intro and RLHF 17:30 Problems with RLHF 21:08 Overview of their method 23:47 EMA 28:00 Combining policies with SLERP 37:34 Linear interpolation towards initialization 40:32 Code 44:16 Results
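The three weight-averaging operations in the chapter list (EMA anchor, SLERP merging, linear interpolation toward init) can be sketched on flat weight vectors standing in for full model parameters; names and shapes here are my own:

```python
import numpy as np

def ema(avg, new, beta=0.99):
    """Exponential moving average of policy weights (the anchor)."""
    return beta * avg + (1 - beta) * new

def slerp(w0, w1, t):
    """Spherical linear interpolation between two weight vectors."""
    cos = np.dot(w0, w1) / (np.linalg.norm(w0) * np.linalg.norm(w1))
    omega = np.arccos(np.clip(cos, -1.0, 1.0))
    if np.isclose(omega, 0.0):          # nearly parallel: fall back to LERP
        return (1 - t) * w0 + t * w1
    return (np.sin((1 - t) * omega) * w0 + np.sin(t * omega) * w1) / np.sin(omega)

def lerp_to_init(w, w_init, eta):
    """Linear interpolation back toward the initialization."""
    return (1 - eta) * w + eta * w_init
```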
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Gabriel Mongaras 2024-06-25 | Paper: arxiv.org/abs/2308.07926 Paper page: qiuyu96.github.io/CoDeF Code: github.com/qiuyu96/CoDeF My notes: drive.google.com/file/d/10PMKdd5XBd6Y60HlRB9IW9naR2bWziDT/view?usp=drive_link 00:00 Intro 03:00 Method overview 08:40 Method details 15:24 Tricks done for training and how to actually train this thing 19:24 Flow loss and masking 25:10 Conclusion
Mamba 2 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Gabriel Mongaras 2024-06-16 | Paper here: arxiv.org/abs/2405.21060 Code!: github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba2.py Notes: drive.google.com/file/d/1--XGPFeXQyx4CPxgYjzR4qrLd-baLWQC/view?usp=sharing 00:00 Intro 01:45 SSMs 08:00 Quadratic form of an SSM 15:02 Expanded form of an SSM 24:00 Attention - it's all you need?? 29:55 Kernel attention 32:50 Linear attention 34:32 Relating attention to SSMs 38:35 Defining the M matrix 43:48 Splitting the M matrix 46:30 Off diagonal decomposition 54:00 Recurrent form of the off diagonal 1:03:30 Combining the M matrix blocks and code 1:06:22 Complexity and other analysis
CoPE - Contextual Position Encoding: Learning to Count What's Important
Gabriel Mongaras 2024-06-04 | Paper: arxiv.org/abs/2405.18719 My notes: drive.google.com/file/d/1y9VHZc7MLqc6t2SHHdlVTYeW3czmmRbl/view?usp=sharing 00:00 Intro 02:44 Background 09:58 CoPE 24:50 Code 32:16 Results
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Gabriel Mongaras 2024-05-28 | Paper: arxiv.org/abs/2403.03100 Demo: speechresearch.github.io/naturalspeech3 Code: huggingface.co/spaces/amphion/naturalspeech3_facodec My notes: drive.google.com/file/d/1xnzErd_86B6eLwqpLckhoEQKqkxFPyM_/view?usp=drive_link 00:00 Intro 05:34 Architecture overview 18:45 GRL and subspace independence 24:45 Discrete diffusion model 41:00 Factorized diffusion model 44:00 Conclusion and results
xLSTM: Extended Long Short-Term Memory
Gabriel Mongaras 2024-05-17 | Paper: arxiv.org/abs/2405.04517 My notes: drive.google.com/file/d/1wFYvU_1oUWcCNuQ91zTpSGAeNUsPjlt3/view?usp=drive_link 00:00 Intro 05:44 LSTM 13:38 Problems paper addresses 14:12 sLSTM 23:00 sLSTM Memory mixing 27:08 mLSTM 35:14 Results and stuff
KAN: Kolmogorov-Arnold Networks
Gabriel Mongaras 2024-05-04 | Paper: arxiv.org/abs/2404.19756 Spline Video: https://m.youtube.com/watch?v=qhQrRCJ-mVg My notes: drive.google.com/file/d/1twcIF13nG8Qc10_qeDqCZ4NaUh9tFsAH/view?usp=drive_link 00:00 Intro 00:45 MLPs and Intuition 05:12 Splines 19:02 KAN Formulation 28:00 Potential Downsides to KANs 32:09 Results
LADD: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Gabriel Mongaras 2024-04-29 | Paper: arxiv.org/abs/2403.12015 My notes: drive.google.com/file/d/1s1-nnWR_ZR26PNSAoZR1Xj3nuD9UZlvR/view?usp=sharing 00:00 Intro 01:31 Diffusion Models 08:08 Latent Diffusion Models 10:04 Distillation 12:02 Adversarial Diffusion Distillation (ADD) 17:06 Latent Adversarial Diffusion Distillation (LADD) 22:20 Results
Visual AutoRegressive Modeling: Scalable Image Generation via Next-Scale Prediction
Gabriel Mongaras 2024-04-21 | Paper: arxiv.org/abs/2404.02905 Demo: https://var.vision/ Code: github.com/FoundationVision/VAR My notes: drive.google.com/file/d/1qym3JG-0xqEgQhdvkt9N17o-ZzUWy2sn/view?usp=drive_link 00:00 Intro 00:53 DiTs 04:06 Autoregressive Image Transformers 06:23 Tokenization problem with AR ViTs 08:43 VAE 10:47 Discrete Quantization - VQGAN 16:42 Visual Autoregressive Modeling 21:31 Causal Inference with VAR 24:02 Losses 25:16 Residual Modeling 33:26 Summary 34:11 Results
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Gabriel Mongaras 2024-04-14 | Paper: arxiv.org/abs/2404.07143 My notes: drive.google.com/file/d/1plWJDwHTZkRK9PDdvaLMnZjFR6fVvNLH/view?usp=drive_link 00:00 Intro 07:17 Model intuition 11:00 Memory retrieval operation 16:29 Hidden state updates 21:58 Delta update 24:10 Is it causal? 25:26 Combining local attention and RNN 27:26 Results 30:25 Sampling and Conclusion
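The memory retrieval and delta-rule update from the chapter list can be sketched with a linear-attention compressive memory. A toy sketch under assumptions of my own (ELU+1 feature map, small fixed dimensions, function names hypothetical):

```python
import numpy as np

def elu1(x):
    """ELU + 1 feature map, keeping features positive for linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def retrieve(M, z, Q):
    """Read from the compressive memory M with normalizer z: normalized linear attention."""
    sQ = elu1(Q)                               # (n, d_k)
    return (sQ @ M) / (sQ @ z[:, None] + 1e-8) # (n, d_v)

def update_delta(M, z, K, V):
    """Delta-rule write: only store the part of V the memory does not already predict."""
    sK = elu1(K)
    V_pred = (sK @ M) / (sK @ z[:, None] + 1e-8)
    return M + sK.T @ (V - V_pred), z + sK.sum(axis=0)
```

Writing a key/value pair into an empty memory and then querying with the same key recovers the value, which is the sense in which the memory acts as an associative store.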
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Gabriel Mongaras 2024-04-08 | Paper: arxiv.org/abs/2404.02258 My notes: drive.google.com/file/d/1o4v5te1yfuK_FQPvvS8SR55Sysg04dYK/view?usp=drive_link 00:00 Intro 06:02 Mixture of Experts (MoE) 15:12 Mixture of Depths (MoD) 17:04 The gradients must flow! 22:40 Autoregressive Sampling 33:58 Results
Q* AGI Achieved (Apr Fools)
Gabriel Mongaras 2024-04-01 | Q* paper link: link.springer.com/content/pdf/10.1007/BF00992698.pdf April fools 😏
Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Gabriel Mongaras 2024-03-28 | Website paper: stability.ai/news/stable-diffusion-3-research-paper Paper: arxiv.org/abs/2403.03206 My notes: drive.google.com/file/d/1n8rSM3OuOkzDBlXdK5VBrnADnEXp4xXv/view?usp=drive_link 00:00 Intro 01:58 DDPM 13:16 ODE/SDE formulation and score 18:09 ODE intuition 21:38 Rectified Flows 27:46 Sampling from a diffusion model 29:16 Going to the latent space 32:17 CLIP 37:53 Model architecture 56:18 Results and stuff
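The rectified-flow pieces from the chapter list (straight-line noising path, constant velocity target, ODE sampling) can be sketched in a few lines. A minimal sketch with my own naming, using a constant-velocity oracle in place of the trained model:

```python
import numpy as np

def rf_interpolate(x0, noise, t):
    """Straight-line path between data x0 (t=0) and noise (t=1)."""
    return (1 - t) * x0 + t * noise

def rf_velocity_target(x0, noise):
    """The constant velocity along that path; the model regresses onto this."""
    return noise - x0

def euler_sample(x1, v_fn, steps=10):
    """Integrate dx/dt = v from t=1 (noise) back to t=0 (data) with Euler steps."""
    x, dt = x1, 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        x = x - dt * v_fn(x, t)
    return x
```

Because the path is a straight line, a perfect velocity model makes sampling exact in any number of Euler steps, which is the motivation for using rectified flows with few-step samplers.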
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Gabriel Mongaras 2024-03-21 | My notes: drive.google.com/file/d/1l2B4m8tDVchfsplIbps4-9533fcxqubF/view?usp=drive_link Paper: arxiv.org/abs/2403.03507 00:00 Intro 02:44 Intuition and proof of low rank 12:28 GaLore intuition 16:38 More GaLore intuition 21:20 GaLore algorithm 27:50 Algorithm analysis 33:00 Results
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits and BitNet
Gabriel Mongaras 2024-03-06 | My notes: BitNet: drive.google.com/file/d/1iA2tISamkfQq4jgZZBBSH1MN3Bgtc99_/view?usp=sharing Era of 1-bit LLMs: drive.google.com/file/d/1iNy91MTP53kTCSkeqHBqMOSePPyoYvCD/view?usp=sharing BitNet: Scaling 1-bit Transformers for Large Language Models: arxiv.org/abs/2310.11453 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits: arxiv.org/abs/2402.17764 00:00 Intro 03:10 BitLinear Intuition 08:05 Weight Quantization 10:35 Activation Quantization 16:30 Matrix Multiplication and Dequantizing 23:08 Model Parallelism with Group Quantization and Normalization 32:36 Other Training Stuff 37:11 BitNet Results 39:11 The Era of 1-Bit LLMs
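The weight and activation quantization chapters can be sketched as follows; a minimal sketch of absmean ternary weight quantization and absmax activation quantization (helper names and tolerances are my own):

```python
import numpy as np

def weight_quant_ternary(W, eps=1e-5):
    """Absmean quantization: scale by mean |W|, round, clip to the ternary set {-1, 0, 1}."""
    gamma = np.abs(W).mean() + eps
    Wq = np.clip(np.round(W / gamma), -1, 1)
    return Wq, gamma            # gamma rescales the output after the cheap ternary matmul

def activation_quant_absmax(x, bits=8, eps=1e-5):
    """Absmax quantization of activations to a signed integer range (127 for 8 bits)."""
    Q = 2 ** (bits - 1) - 1
    scale = Q / (np.abs(x).max() + eps)
    return np.clip(np.round(x * scale), -Q, Q), scale
```

With ternary weights the matrix multiply reduces to additions and subtractions, which is where the inference savings come from.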
DoRA: Weight-Decomposed Low-Rank Adaptation
Gabriel Mongaras 2024-02-23 | Paper: arxiv.org/abs/2402.09353 My notes: drive.google.com/file/d/1hA56lNtz7jxQPWIxBpnDUsiLFaFZlyyP/view?usp=sharing
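The DoRA reparameterization splits a weight update into a direction (adapted with a low-rank LoRA term) and per-column magnitudes. A minimal sketch, with shapes and names chosen for illustration:

```python
import numpy as np

def dora_reparam(W0, B, A, m):
    """DoRA-style weight: direction from W0 + BA, columns rescaled to learned magnitudes m."""
    V = W0 + B @ A                                   # low-rank adapted direction
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)
    return m * (V / col_norm)                        # column j gets magnitude m[0, j]
```

With the low-rank term zero and m set to the column norms of W0, the reparameterization reproduces W0 exactly, so training starts from the pretrained weights.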
A Decoder-only Foundation Model For Time-series Forecasting
Gabriel Mongaras 2024-02-07 | Paper: arxiv.org/abs/2310.10688 Notes: drive.google.com/file/d/1fmk5Z5VJkqHvEbNXlq1OiIBP317NqNfN/view?usp=sharing
Lumiere: A Space-Time Diffusion Model for Video Generation
Gabriel Mongaras 2024-02-02 | Paper: arxiv.org/abs/2401.12945 Demo: lumiere-video.github.io Notes: drive.google.com/file/d/1fJl-ijVy6KML1YwM_9UVVU-MSfipDIqe/view?usp=sharing
Exphormer: Sparse Transformers for Graphs
Gabriel Mongaras 2024-01-29 | Paper here: arxiv.org/abs/2303.06147 Notes: drive.google.com/file/d/1eXoXtPgJYKBTKd7oN8StuBLW453yWJ3f/view?usp=drive_link
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Gabriel Mongaras 2024-01-24 | Paper here: arxiv.org/abs/2401.10774 demo: sites.google.com/view/medusa-llm Notes: drive.google.com/file/d/1eOminZIC4wrjjWIBnSroxBYduCXzs86E/view?usp=drive_link
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
Gabriel Mongaras 2024-01-18 | Paper here: arxiv.org/abs/2401.00935 Notes: drive.google.com/file/d/1eAiAhbmvczYQwHqHHv-GJeDlX-WlZBBI/view?usp=sharing
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Gabriel Mongaras 2024-01-04 | Paper here: arxiv.org/abs/2312.12742 Code here: github.com/annosubmission/GRC-Cache Notes: drive.google.com/file/d/1cgR14tZmrF3lQROMT_2RUig2dBfhqU9z/view?usp=sharing
Translatotron 3: Speech to Speech Translation with Monolingual Data
Gabriel Mongaras 2023-12-27 | Translatotron 3: arxiv.org/abs/2305.17547 Translatotron 2: arxiv.org/abs/2107.08661 Demo: google-research.github.io/lingvo-lab/translatotron3 Notes: Translatotron 3: drive.google.com/file/d/1EfOCuKp9yeLBzhxjsiTWuoYaVToBbgon/view?usp=sharing Translatotron 2: drive.google.com/file/d/1zPrIvZspMWpWPaFhvgpM2DEYzsvTL8R6/view?usp=drive_link
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Gabriel Mongaras 2023-12-12 | Paper here: arxiv.org/abs/2312.00752 The annotated S4: srush.github.io/annotated-s4 Notes: drive.google.com/file/d/1aoaKj3kuTtpHi0OzinXZGyZIFxhqp514/view?usp=sharing
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Gabriel Mongaras 2023-12-06 | Paper Link: arxiv.org/abs/2310.04378 My Notes: drive.google.com/file/d/1aUDxMSWNAqMkg0P91Ms1vu4yCeTtzSsR/view?usp=sharing
Adversarial Diffusion Distillation
Gabriel Mongaras 2023-11-30 | Paper Link: arxiv.org/abs/2311.17042 Stability Link: stability.ai/research/adversarial-diffusion-distillation My Notes: drive.google.com/file/d/1a7EZpQ-4_jjt7Fic1EQlyGnHOX1xB9Af/view?usp=sharing
Unsupervised Discovery of Semantic Latent Directions in Diffusion Models
Gabriel Mongaras 2023-11-21 | Paper found here: arxiv.org/abs/2302.12469 My notes: drive.google.com/file/d/1_wFtrtxZk7ZYq6-FfUILET3Nga8KCzsz/view?usp=drive_link
DALL-E 3 - Improving Image Generation with Better Captions
Gabriel Mongaras 2023-11-20 | Blog post here: openai.com/dall-e-3 My notes: drive.google.com/file/d/1_lSM24dNSdzAvP8MKaKIyfbsASn4UfYe/view?usp=sharing
LRM: Large Reconstruction Model for Single Image to 3D
Gabriel Mongaras 2023-11-13 | Paper found here: arxiv.org/abs/2311.04400 My notes: drive.google.com/file/d/1_cI6cYIm8QZrv0lhfYBG7ULXc4szr8Hg/view?usp=sharing
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Gabriel Mongaras 2023-11-06 | Paper found here: arxiv.org/abs/2310.17680v1 My chicken scratch: drive.google.com/file/d/1ErA6RsKW__uxmlprgdIO13PRCU69Q-U5/view?usp=drive_link
Matryoshka Diffusion Models Explained
Gabriel Mongaras 2023-10-30 | Paper found here: arxiv.org/abs/2310.15111
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Gabriel Mongaras 2023-10-22 | Paper: arxiv.org/abs/2310.00704 Code: github.com/yangdongchao/UniAudio Demo: https://dongchaoyang.top/UniAudio_demo/
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Gabriel Mongaras 2023-10-16 | Paper found here: arxiv.org/abs/2309.14717v2
StreamingLLM - Efficient Streaming Language Models with Attention Sinks Explained
Gabriel Mongaras 2023-10-07 | Paper found here: arxiv.org/abs/2309.17453 Code found here: github.com/mit-han-lab/streaming-llm
FreeU: Free Lunch in Diffusion U-Net Explained
Gabriel Mongaras 2023-09-24 | Paper found here: arxiv.org/abs/2309.11497
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation Explained
Gabriel Mongaras 2023-09-17 | Paper found here: arxiv.org/abs/2309.06380
Llama/Wizard LM Finetuning with Huggingface on RunPod
Gabriel Mongaras 2023-09-16 | A demo I made to show how to fine-tune a WizardLM model with Huggingface and peft. Presentation: docs.google.com/presentation/d/17TyDtImkcXnIXwd6CDoYxCXBprvtD_n1I3RlImJg8gQ/edit?usp=sharing Github: github.com/gmongaras/Wizard_QLoRA_Finetuning
2x Faster Language Model Pre-training via Masked Structural Growth
Gabriel Mongaras 2023-09-10 | Paper found here: arxiv.org/abs/2305.02869
Bayesian Flow Networks (BFN) Explained
Gabriel Mongaras 2023-09-03 | Paper found here: arxiv.org/abs/2308.07037
WizardLM: Empowering Large Language Models to Follow Complex Instructions Explained
Gabriel Mongaras 2023-08-27 | Paper found here: arxiv.org/abs/2304.12244 Code release: github.com/nlpxucan/WizardLM
From Sparse to Soft Mixtures of Experts Explained
Gabriel Mongaras 2023-08-21 | Paper found here: arxiv.org/abs/2308.00951
BK-SDM: Architecturally Compressed Stable Diffusion for Efficient T2I Generation Explained
Gabriel Mongaras 2023-08-16 | Paper found here: openreview.net/forum?id=bOVydU0XKC
Direct Preference Optimization (DPO): Your Language Model is Secretly a Reward Model Explained
Gabriel Mongaras 2023-08-10 | Paper found here: arxiv.org/abs/2305.18290
Universal and Transferable Adversarial Attacks on Aligned Language Models Explained
Gabriel Mongaras 2023-08-06 | Paper found here: arxiv.org/abs/2307.15043 Demo here: llm-attacks.org
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Explained
Gabriel Mongaras 2023-08-01 | Paper found here: arxiv.org/abs/2307.01952
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations Explained
Gabriel Mongaras 2023-07-30 | Paper found here: arxiv.org/abs/2108.01073