41:42 TMLR: On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective Gabriel Mongaras 783 views - 4 weeks ago
39:07 Energy-Based Transformers are Scalable Learners and Thinkers Gabriel Mongaras 2.6K views - 8 months ago
59:58 ATLAS: Learning to Optimally Memorize the Context at Test Time Gabriel Mongaras 1.3K views - 8 months ago
47:19 RWKV-7 "Goose" with Expressive Dynamic State Evolution Gabriel Mongaras 1.5K views - 11 months ago
29:34 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Gabriel Mongaras 2.2K views - 1 year ago
40:08 Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Gabriel Mongaras 6K views - 1 year ago
28:26 VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Gabriel Mongaras 948 views - 1 year ago
48:21 MiniMax-01: Scaling Foundation Models with Lightning Attention Gabriel Mongaras 1.8K views - 1 year ago
45:05 Byte Latent Transformer: Patches Scale Better Than Tokens Gabriel Mongaras 3.2K views - 1 year ago
25:22 TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Gabriel Mongaras 1.8K views - 1 year ago
32:31 Round and Round We Go! What makes Rotary Positional Encodings useful? Gabriel Mongaras 2.2K views - 1 year ago
1:13:10 Deterministic Image Editing with DDPM Inversion, DDIM Inversion, Null Inversion and Prompt-to-Prompt Gabriel Mongaras 3.3K views - 1 year ago
42:25 Attending to Topological Spaces: The Cellular Transformer Gabriel Mongaras 839 views - 1 year ago
35:52 Learning to (Learn at Test Time): RNNs with Expressive Hidden States Gabriel Mongaras 4.6K views - 1 year ago
52:39 WARP: On the Benefits of Weight Averaged Rewarded Policies Gabriel Mongaras 827 views - 1 year ago
28:52 CoDeF: Content Deformation Fields for Temporally Consistent Video Processing Gabriel Mongaras 937 views - 1 year ago
1:14:43 Mamba 2 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through SSS Duality Gabriel Mongaras 15.3K views - 1 year ago
38:55 CoPE - Contextual Position Encoding: Learning to Count What's Important Gabriel Mongaras 1.6K views - 1 year ago
45:48 NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Gabriel Mongaras 1.2K views - 1 year ago
30:07 LADD: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Gabriel Mongaras 1.4K views - 1 year ago
37:00 Visual AutoRegressive Modeling: Scalable Image Generation via Next-Scale Prediction Gabriel Mongaras 8.6K views - 1 year ago
32:49 Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Gabriel Mongaras 4K views - 1 year ago
40:14 Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Gabriel Mongaras 2.6K views - 1 year ago
1:02:30 Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Gabriel Mongaras 9K views - 1 year ago
37:08 GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Gabriel Mongaras 1.6K views - 1 year ago
46:25 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits and BitNet Gabriel Mongaras 6.4K views - 2 years ago
1:02:38 OpenAI Sora and DiTs: Scalable Diffusion Models with Transformers Gabriel Mongaras 14.9K views - 2 years ago
33:55 A Decoder-only Foundation Model For Time-series Forecasting Gabriel Mongaras 6.8K views - 2 years ago
37:30 Lumiere: A Space-Time Diffusion Model for Video Generation Gabriel Mongaras 776 views - 2 years ago
25:56 Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads Gabriel Mongaras 2.9K views - 2 years ago
40:23 Boundary Attention: Learning to Find Faint Boundaries at Any Resolution Gabriel Mongaras 529 views - 2 years ago
29:38 Cached Transformers: Improving Transformers with Differentiable Memory Cache Gabriel Mongaras 929 views - 2 years ago
39:02 Translatotron 3: Speech to Speech Translation with Monolingual Data Gabriel Mongaras 1.3K views - 2 years ago
44:02 Mamba: Linear-Time Sequence Modeling with Selective State Spaces Gabriel Mongaras 10.6K views - 2 years ago
47:32 Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference Gabriel Mongaras 2.8K views - 2 years ago
40:51 Unsupervised Discovery of Semantic Latent Directions in Diffusion Models Gabriel Mongaras 800 views - 2 years ago
18:45 DALL-E 3 - Improving Image Generation with Better Captions Gabriel Mongaras 682 views - 2 years ago
38:18 LRM: Large Reconstruction Model for Single Image to 3D Gabriel Mongaras 2.2K views - 2 years ago
30:46 CodeFusion: A Pre-trained Diffusion Model for Code Generation Gabriel Mongaras 887 views - 2 years ago
36:04 UniAudio: An Audio Foundation Model Toward Universal Audio Generation Gabriel Mongaras 1K views - 2 years ago
57:43 QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Gabriel Mongaras 2.3K views - 2 years ago
33:27 StreamingLLM - Efficient Streaming Language Models with Attention Sinks Explained Gabriel Mongaras 2.5K views - 2 years ago
26:26 InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation Explained Gabriel Mongaras 1.1K views - 2 years ago