Soroush Mehraban

@UCCCzAbwp5De5wfiP7oGJtBQ - 6.2K subscribers

STARS (WACV'26): Self-supervised Tuning for 3D Action Recognition in Skeleton Sequences.

129 views - 2 weeks ago

FastHMR (WACV'26): Accelerating Human Mesh Recovery via Token & Layer Merging with Diffusion Decoding

124 views - 2 weeks ago

TRELLIS: One Latent for Any 3D Asset

396 views - 1 month ago

LightlyTrain - Train Better Models, Faster - No Labels Needed

710 views - 10 months ago

One-step Diffusion with Distribution Matching Distillation

1.9K views - 1 year ago

Variational Score Distillation (VSD) Helps Create Amazing 3D Scenes From Text Prompts

599 views - 1 year ago

Dream-in-4D: Paper Explained!

377 views - 1 year ago

FreeU - Paper Explained

758 views - 1 year ago

AnimateDiff - Paper explained!

694 views - 1 year ago

DreamFusion: Text-to-3D using 2D Diffusion

1.5K views - 1 year ago

Null-text Inversion for Editing Real Images using Guided Diffusion Models

1.1K views - 1 year ago

Prompt-to-Prompt (P2P) image Editing - Method Explained

787 views - 1 year ago

Denoising Diffusion Null-Space Model (DDNM) - Method Explained

834 views - 1 year ago

Autoregressive Image Generation without Vector Quantization

2.2K views - 1 year ago

Diffusion Models (DDPM & DDIM) - Easily explained!

27.5K views - 1 year ago

GLIGEN (CVPR2023): Open-Set Grounded Text-to-Image Generation

865 views - 1 year ago

The Entropy Enigma: Success and Failure of Entropy Minimization

790 views - 1 year ago

Tent: Fully Test-time Adaptation by Entropy Minimization

866 views - 1 year ago

VPD (ICCV2023): Unleashing Text-to-Image Diffusion Models for Visual Perception

389 views - 1 year ago

TokenHMR (CVPR2024): Advancing Human Mesh Recovery with a Tokenized Pose Representation

764 views - 1 year ago

SHViT (CVPR2024): Single-Head Vision Transformer with Memory Efficient Macro Design

1.5K views - 1 year ago

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

1.3K views - 1 year ago

FastV: An Image is Worth 1/2 Tokens After Layer 2

829 views - 1 year ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

1.9K views - 2 years ago

PoseGPT (ChatPose): Chatting about 3D Human Pose

1.2K views - 2 years ago

MotionAGFormer (WACV2024): Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network

1.5K views - 2 years ago

HD-GCN (ICCV2023): Skeleton-Based Action Recognition

3.1K views - 2 years ago

ST-GCN: Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

8.4K views - 2 years ago

Graph Convolutional Networks (GCN): From CNN point of view

15.7K views - 2 years ago

DINO: Self-Supervised Vision Transformers

9.4K views - 2 years ago

MoCo (+ v2): Unsupervised learning in computer vision

5.4K views - 2 years ago

ViTPose: 2D Human Pose Estimation

5.7K views - 2 years ago

TrackFormer: Multi-Object Tracking with Transformers

7.6K views - 2 years ago

MetaFormer is Actually What You Need for Vision

1.4K views - 2 years ago

ConvNet beats Vision Transformers (ConvNeXt) Paper explained

3.2K views - 2 years ago

Swin Transformer V2 - Paper explained

5.8K views - 2 years ago

Masked Autoencoders (MAE) Paper Explained

9.2K views - 2 years ago

Relative Position Bias (+ PyTorch Implementation)

6.2K views - 2 years ago

Swin Transformer - Paper Explained

23.1K views - 3 years ago

Vision Transformer (ViT) Paper Explained

5.3K views - 3 years ago

Convolutional Block Attention Module (CBAM) Paper Explained

15.1K views - 3 years ago

Squeeze-and-Excitation Networks (SENet) paper explained

11.6K views - 3 years ago

Faster R-CNN: Faster than Fast R-CNN!

12.6K views - 3 years ago

Receptive Fields: Why 3x3 conv layer is the best?

10.5K views - 3 years ago

Fast R-CNN: Everything you need to know from the paper

22K views - 3 years ago

R-CNN: Clearly EXPLAINED!

62.6K views - 3 years ago