Gabriel Mongaras

@UCYUq87t77YNTG5m256fOXeQ - 14.5K subscribers

Just some guy exploring and making videos about current AI topics.

TMLR: On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

1K views - 5 months ago

Hierarchical Reasoning Models

12K views - 10 months ago

Energy-Based Transformers are Scalable Learners and Thinkers

2.7K views - 11 months ago

Fast and Simplex: 2-Simplicial Attention in Triton

1.4K views - 11 months ago

Hardware-Efficient Attention for Fast Decoding

1.2K views - 1 year ago

ATLAS: Learning to Optimally Memorize the Context at Test Time

1.3K views - 1 year ago

Coding Stable Diffusion 3 From Scratch

3.1K views - 1 year ago

Intro to Attention and Its Forms

4.2K views - 1 year ago

RWKV-7 "Goose" with Expressive Dynamic State Evolution

1.6K views - 1 year ago

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

2.2K views - 1 year ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

6.3K views - 1 year ago

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

954 views - 1 year ago

DeepSeek-V3

29K views - 1 year ago

Titans: Learning to Memorize at Test Time

4.1K views - 1 year ago

MiniMax-01: Scaling Foundation Models with Lightning Attention

1.8K views - 1 year ago

Memory Layers at Scale

1.7K views - 1 year ago

Byte Latent Transformer: Patches Scale Better Than Tokens

3.2K views - 1 year ago

Scaling up Masked Diffusion Models on Text

1K views - 1 year ago

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

1.8K views - 1 year ago

Round and Round We Go! What makes Rotary Positional Encodings useful?

2.2K views - 1 year ago

Deterministic Image Editing with DDPM Inversion, DDIM Inversion, Null Inversion and Prompt-to-Prompt

3.4K views - 1 year ago

Attending to Topological Spaces: The Cellular Transformer

850 views - 1 year ago

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

4.7K views - 1 year ago

WARP: On the Benefits of Weight Averaged Rewarded Policies

830 views - 1 year ago

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

951 views - 2 years ago

Mamba 2 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through SSS Duality

16K views - 2 years ago

CoPE - Contextual Position Encoding: Learning to Count What's Important

1.6K views - 2 years ago

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

1.2K views - 2 years ago

xLSTM: Extended Long Short-Term Memory

2.5K views - 2 years ago

KAN: Kolmogorov-Arnold Networks

58K views - 2 years ago

LADD: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

1.4K views - 2 years ago

Visual AutoRegressive Modeling: Scalable Image Generation via Next-Scale Prediction

8.9K views - 2 years ago

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

4K views - 2 years ago

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

2.6K views - 2 years ago

Q* AGI Achieved (Apr Fools)

823 views - 2 years ago

Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

9.3K views - 2 years ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

1.6K views - 2 years ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits and BitNet

6.3K views - 2 years ago

DoRA: Weight-Decomposed Low-Rank Adaptation

2.9K views - 2 years ago

OpenAI Sora and DiTs: Scalable Diffusion Models with Transformers

14K views - 2 years ago

A Decoder-only Foundation Model For Time-series Forecasting

6.9K views - 2 years ago

Lumiere: A Space-Time Diffusion Model for Video Generation

790 views - 2 years ago

Exphormer: Sparse Transformers for Graphs

612 views - 2 years ago

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

2.9K views - 2 years ago

Boundary Attention: Learning to Find Faint Boundaries at Any Resolution

532 views - 2 years ago

Cached Transformers: Improving Transformers with Differentiable Memory Cache

934 views - 2 years ago

Translatotron 3: Speech to Speech Translation with Monolingual Data

1.3K views - 2 years ago

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

10K views - 2 years ago

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

2.9K views - 2 years ago

Adversarial Diffusion Distillation

2.3K views - 2 years ago

Unsupervised Discovery of Semantic Latent Directions in Diffusion Models

809 views - 2 years ago

DALL-E 3 - Improving Image Generation with Better Captions

683 views - 2 years ago

LRM: Large Reconstruction Model for Single Image to 3D

2.3K views - 2 years ago

CodeFusion: A Pre-trained Diffusion Model for Code Generation

888 views - 2 years ago

Matryoshka Diffusion Models Explained

746 views - 2 years ago

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

1K views - 2 years ago

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

2.2K views - 2 years ago

StreamingLLM - Efficient Streaming Language Models with Attention Sinks Explained

2.5K views - 2 years ago

FreeU: Free Lunch in Diffusion U-Net Explained

2.6K views - 2 years ago

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation Explained

1K views - 2 years ago