Neel Nanda

@UCBMJ0D-omcRay8dh4QT0doQ - 11.6K subscribers

How Aligned Is Claude? A Live Review of the Opus 4.5 System Card

2.1K views - 1 month ago

Bitter Lesson-Pilled Interp: A Live Paper Review (Activation Oracles & PCD)

3.2K views - 2 months ago

Creating Models Worth Interpreting

1.8K views - 2 months ago

How Reasoning Models Break Mechanistic Interpretability Techniques

2.9K views - 2 months ago

Can Interpretability Control Model Training?

1.2K views - 3 months ago

Science of Misalignment

1.5K views - 3 months ago

How Will Mech Interp Help Make AGI Safe?

1.9K views - 3 months ago

What Matters Right Now In Mechanistic Interpretability?

4.8K views - 3 months ago

What Happened With Sparse Autoencoders?

6K views - 3 months ago

The Story of Mech Interp

5.3K views - 3 months ago

What do models learn during finetuning? A model diffing paper walkthrough w/ Clement & Julian

3.7K views - 3 months ago

Can LLMs Introspect? A Live Paper Review

3.3K views - 3 months ago

How To Interpret Chain Of Thought: A Walkthrough

3.9K views - 6 months ago

How To Think About Thinking Models

9.5K views - 10 months ago

Neel Does Research (Vibe Coding Edition)

6.8K views - 10 months ago

4 Philosophies of Interpretability

3K views - 10 months ago

A Walkthrough of Copy Suppression w/ Callum McDougall, Arthur Conmy & Cody Rushing Part 2/3

1.6K views - 2 years ago

A Walkthrough of Copy Suppression w/ Callum McDougall, Arthur Conmy & Cody Rushing Part 3/3

477 views - 2 years ago

A Walkthrough of Copy Suppression w/ Callum McDougall, Arthur Conmy & Cody Rushing Part 1/3

1.9K views - 2 years ago

A Walkthrough of Automated Circuit Discovery w/ Arthur Conmy Part 1/3

4.3K views - 2 years ago

A Walkthrough of Automated Circuit Discovery w/ Arthur Conmy Part 2/3

1.4K views - 2 years ago

A Walkthrough of Automated Circuit Discovery w/ Arthur Conmy Part 3/3

799 views - 2 years ago

Realtime Research Walkthrough: Parenthesis Balancing in 1L Toy Language Model (Part 2)

848 views - 2 years ago

Realtime Research Walkthrough: Parenthesis Balancing in 1L Toy Language Model (Part 1)

1.8K views - 2 years ago

Real-Time Research Walkthrough: Mover Heads Part 2/2

873 views - 2 years ago

Real-Time Research Walkthrough: Mover Heads Part 1/2

5.9K views - 2 years ago

Real-Time Research Walkthrough: Addition in GPT-J

2.8K views - 2 years ago

A Walkthrough of Finding Neurons In A Haystack w/ Wes Gurnee Part 1/3

2.3K views - 2 years ago

A Walkthrough of Finding Neurons In A Haystack w/ Wes Gurnee Part 2/3

942 views - 2 years ago

A Walkthrough of Finding Neurons In A Haystack w/ Wes Gurnee Part 3/3

723 views - 2 years ago

A Walkthrough of Aligning Causal Variables and Distributed Representations w/ Atticus Geiger (1/3)

2.8K views - 2 years ago

A Walkthrough of Aligning Causal Variables and Distributed Representations w/ Atticus Geiger (2/3)

1.3K views - 2 years ago

A Walkthrough of Aligning Causal Variables and Distributed Representations w/ Atticus Geiger (3/3)

1.1K views - 2 years ago

Implementing GPT-2 From Scratch (Transformer Walkthrough Part 2/2)

20.1K views - 2 years ago

What is a Transformer? (Transformer Walkthrough Part 1/2)

34K views - 2 years ago

A Walkthrough of Progress Measures for Grokking via Mechanistic Interpretability: Why? (Part 3/3)

1.4K views - 2 years ago

A Walkthrough of Progress Measures for Grokking via Mechanistic Interpretability: How? (Part 2/3)

2K views - 2 years ago

A Walkthrough of Progress Measures for Grokking via Mechanistic Interpretability: What? (Part 1/3)

7.4K views - 2 years ago

A Walkthrough of Reverse-Engineering Modular Addition: Why does it grok? (Part 3/3)

1.6K views - 2 years ago

A Walkthrough of Reverse-Engineering Modular Addition: The Fourier Multiplication Algorithm Part 2/3

2.9K views - 2 years ago

A Walkthrough of Reverse-Engineering Modular Addition: Model Training (Part 1/3)

5.6K views - 2 years ago

Project Advising Call: Memorisation in GPT-2 Small (w/ Tessa Barton + Kushal Jain)

1.2K views - 3 years ago

A Walkthrough of Toy Models of Superposition w/ Jess Smith

8.8K views - 3 years ago

A Walkthrough of In-Context Learning and Induction Heads Part 1 of 2 (w/ Charles Frye)

6.1K views - 3 years ago

A Walkthrough of Interpretability in the Wild Part 2/2: Deep Dive (w/ authors Kevin, Arthur & Alex)

1.9K views - 3 years ago

A Walkthrough of Interpretability in the Wild Part 1/2: Overview (w/ authors Kevin, Arthur, Alex)

6.7K views - 3 years ago

Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?

7.8K views - 3 years ago

A Walkthrough of A Mathematical Framework for Transformer Circuits

44.7K views - 3 years ago