3:56 Sören Mindermann - The International AI Safety Report 2026 [Alignment Workshop] FAR.AI 106 views - 3 days ago
4:51 Stefan Heimersheim - Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes FAR.AI 144 views - 5 days ago
4:55 Thomas Clarke - Safe Widespread Adoption of AI [Alignment Workshop] FAR.AI 100 views - 1 week ago
10:18 Stephen Casper - ML Researchers as Policymakers [Alignment Workshop] FAR.AI 441 views - 1 week ago
4:40 Kellin Pelrine - Truth and Falsehood Symmetric in AI Persuasion - But does it have to be? [Alignment Workshop] FAR.AI 197 views - 2 weeks ago
5:06 Matija Franklin - Distributed AGI Safety in Emerging Agent Economies [Alignment Workshop] FAR.AI 225 views - 2 weeks ago
7:45 Vincent Conitzer - AI Testing Should Account for Sophisticated Strategic Behaviour [Alignment Workshop] FAR.AI 248 views - 2 weeks ago
10:02 Zachary Kenton - A Vision for Scalable Oversight [Alignment Workshop] FAR.AI 354 views - 3 weeks ago
10:13 Neel Nanda - Our Pivot To Pragmatic Interpretability [Alignment Workshop] FAR.AI 1.5K views - 3 weeks ago
10:13 Simon Möller - Implementing the Code of Practice [Alignment Workshop] FAR.AI 112 views - 4 weeks ago
7:33 Matthieu Delescluse - AI Safety at the EU AI Office [Alignment Workshop] FAR.AI 78 views - 4 weeks ago
50:29 Gillian Hadfield - AI Regulatory Capacity with Independent Verification Organizations [Alignment Workshop] FAR.AI 222 views - 1 month ago
33:03 Rohin Shah - How to Theorize So Empiricists Will Listen [Alignment Workshop] FAR.AI 744 views - 1 month ago
4:43 Yisen Wang - Finding & Reactivating Safety Mechanisms of Post-Trained LLMs [Alignment Workshop] FAR.AI 152 views - 1 month ago
5:37 Chenhao Tan - Automating Mechanistic Interpretability [Alignment Workshop] FAR.AI 278 views - 1 month ago
5:07 Yonatan Belinkov - Toward Scalable and Actionable Interpretability [Alignment Workshop] FAR.AI 233 views - 1 month ago
4:54 Bryce Cai - The State and the Science of AI-Bio Evals [Alignment Workshop] FAR.AI 195 views - 1 month ago
5:36 Atoosa Kasirzadeh - Hidden Pitfalls of AI Scientist Agents [Alignment Workshop] FAR.AI 470 views - 1 month ago
5:26 Andy Zou - Current State of AI Agent Security [Alignment Workshop] FAR.AI 266 views - 1 month ago
5:49 Kamalika Chaudhuri - Privacy and Security Challenges in AI Agents [Alignment Workshop] FAR.AI 119 views - 1 month ago
10:25 Dawn Song - Frontier AI in Cybersecurity: Risks, Challenges & Future Directions [Alignment Workshop] FAR.AI 199 views - 1 month ago
17:19 Yoshua Bengio - Fireside Chat with Yoshua Bengio [Alignment Workshop] FAR.AI 529 views - 1 month ago
5:43 Adam Kalai - Consensus Sampling for Safer Generative AI [Alignment Workshop] FAR.AI 185 views - 1 month ago
5:11 Daniel Kang - AI Agent Benchmarks Are Broken [Alignment Workshop] FAR.AI 310 views - 1 month ago
5:23 Cozmin Ududec - Toy Models for Task-Horizon Scaling [Alignment Workshop] FAR.AI 154 views - 1 month ago
5:08 Chirag Agarwal - Polarity-Aware Probing for Quantifying Latent Alignment in LMs [Alignment Workshop] FAR.AI 128 views - 1 month ago
5:15 Niloofar Mireshghallah - What Does It Mean for Agentic AI to Preserve Privacy? [Alignment Workshop] FAR.AI 272 views - 1 month ago
30:57 Yoshua Bengio - Disentangling Agency & Predictive Power Without Solving ELK [Alignment Workshop] FAR.AI 636 views - 1 month ago
9:30 Anna Gausen - Measuring AI Systems’ Ability to Influence Humans [Alignment Workshop] FAR.AI 305 views - 1 month ago
5:08 Santosh Vempala - Why Language Models Hallucinate [Alignment Workshop] FAR.AI 314 views - 1 month ago
5:19 Natasha Jaques - Multi-agent RL for Provably Robust LLM Safety [Alignment Workshop] FAR.AI 398 views - 1 month ago
10:22 Marius Hobbhahn - Eval Awareness is Becoming a Problem [Alignment Workshop] FAR.AI 592 views - 1 month ago
5:05 Chris Cundy - Peril and Potentials of Training with Lie Detectors [Alignment Workshop] FAR.AI 275 views - 1 month ago
5:42 Sarah Schwettmann - Scalable Oversight and Understanding [Alignment Workshop] FAR.AI 257 views - 1 month ago
10:25 Stephen Casper - Powerful Open-Weight AI Models: Wonderful, Terrible & Inevitable [Alignment Workshop] FAR.AI 671 views - 1 month ago
55:19 Owain Evans - Weird Generalizations and Backdoors: New Ways to Corrupt LLMs FAR.AI 615 views - 1 month ago
9:22 Asa Cooper Stickland - AI Control Needs Redteaming [Alignment Workshop] FAR.AI 390 views - 1 month ago
9:25 Adam Gleave - STACK: Adversarial Attacks on LLM Safeguard Pipelines [AAAI 2026] FAR.AI 388 views - 1 month ago
9:41 Tomek Korbak - Chain of Thought Monitorability for AI Safety [Alignment Workshop] FAR.AI 665 views - 3 months ago
22:58 Adam Gleave - AI in 2025: Faster Progress, Harder Problems [Alignment Workshop] FAR.AI 1.1K views - 3 months ago
9:51 Sam Bowman - Lessons Learned from the First Misalignment Safety Case [Alignment Workshop] FAR.AI 811 views - 4 months ago
10:18 Maja Trębacz - Scalable Oversight: A Practical Approach to Verifying Code at Scale [Alignment Workshop] FAR.AI 320 views - 4 months ago
10:20 Neel Nanda - Our Pivot To Pragmatic Interpretability [Alignment Workshop] FAR.AI 2.7K views - 4 months ago
10:01 Anka Reuel - How do we know what AI can (and can't) do? [Alignment Workshop] FAR.AI 456 views - 4 months ago
21:30 Yoshua Bengio - An Argument for the Safety of Scientist AI [UK AISI Alignment Conference] FAR.AI 396 views - 4 months ago
30:58 Alex Bores - How the States Should Regulate AI [Journalism Workshop] FAR.AI 365 views - 4 months ago