5:46:05 Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation Umar Jamil 132.4K views - 1 year ago
58:04 Attention is all you need (Transformer) - Model explanation (including math), Inference and Training Umar Jamil 701.8K views - 3 years ago
1:14:29 Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math Umar Jamil 62.1K views - 2 years ago
2:15:13 Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code. Umar Jamil 71K views - 2 years ago
7:38:18 Flash Attention derived and coded from first principles with Triton (Python) Umar Jamil 84.1K views - 1 year ago
2:59:24 Coding a Transformer from scratch on PyTorch, with full explanation, training and inference. Umar Jamil 369.7K views - 3 years ago
49:24 Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW) Umar Jamil 87.1K views - 2 years ago
1:26:21 Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer Umar Jamil 40.8K views - 2 years ago
54:52 BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token Umar Jamil 80.8K views - 2 years ago
27:12 Variational Autoencoder - Model, ELBO, loss function and maths explained easily! Umar Jamil 70.4K views - 2 years ago
1:12:53 Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code Umar Jamil 38.8K views - 2 years ago
1:19:37 Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Umar Jamil 62.3K views - 1 year ago
48:46 Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math Umar Jamil 36.4K views - 2 years ago
50:55 Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training Umar Jamil 54.8K views - 2 years ago
0:15 KOFXV | Geese Howard Cool Combo Video by Umar Jamil | Combo Video (Optimal) Umar Jamil 103 views - 2 years ago
1:15:39 Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem Umar Jamil 39.5K views - 2 years ago