3:56 Sören Mindermann - The International AI Safety Report 2026 [Alignment Workshop] FAR.AI 106 views - 3 days ago
4:51 Stefan Heimersheim - Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes FAR.AI 144 views - 5 days ago
4:55 Thomas Clarke - Safe Widespread Adoption of AI [Alignment Workshop] FAR.AI 100 views - 1 week ago
10:18 Stephen Casper - ML Researchers as Policymakers [Alignment Workshop] FAR.AI 441 views - 1 week ago
4:40 Kellin Pelrine - Truth and Falsehood Symmetric in AI Persuasion - But does it have to be? [Alignment Workshop] FAR.AI 197 views - 2 weeks ago
5:06 Matija Franklin - Distributed AGI Safety in Emerging Agent Economies [Alignment Workshop] FAR.AI 225 views - 2 weeks ago
7:45 Vincent Conitzer - AI Testing Should Account for Sophisticated Strategic Behaviour [Alignment Workshop] FAR.AI 248 views - 2 weeks ago
10:02 Zachary Kenton - A Vision for Scalable Oversight [Alignment Workshop] FAR.AI 354 views - 3 weeks ago
10:13 Neel Nanda - Our Pivot To Pragmatic Interpretability [Alignment Workshop] FAR.AI 1.5K views - 3 weeks ago
10:13 Simon Möller - Implementing the Code of Practice [Alignment Workshop] FAR.AI 112 views - 4 weeks ago
7:33 Matthieu Delescluse - AI Safety at the EU AI Office [Alignment Workshop] FAR.AI 78 views - 4 weeks ago
50:29 Gillian Hadfield - AI Regulatory Capacity with Independent Verification Organizations [Alignment Workshop] FAR.AI 222 views - 1 month ago
33:03 Rohin Shah - How to Theorize So Empiricists Will Listen [Alignment Workshop] FAR.AI 744 views - 1 month ago
4:43 Yisen Wang - Finding & Reactivating Safety Mechanisms of Post-Trained LLMs [Alignment Workshop] FAR.AI 152 views - 1 month ago
5:37 Chenhao Tan - Automating Mechanistic Interpretability [Alignment Workshop] FAR.AI 278 views - 1 month ago
5:07 Yonatan Belinkov - Toward Scalable and Actionable Interpretability [Alignment Workshop] FAR.AI 233 views - 1 month ago
4:54 Bryce Cai - The State and the Science of AI-Bio Evals [Alignment Workshop] FAR.AI 195 views - 1 month ago
5:36 Atoosa Kasirzadeh - Hidden Pitfalls of AI Scientist Agents [Alignment Workshop] FAR.AI 470 views - 1 month ago
5:26 Andy Zou - Current State of AI Agent Security [Alignment Workshop] FAR.AI 266 views - 1 month ago
5:49 Kamalika Chaudhuri - Privacy and Security Challenges in AI Agents [Alignment Workshop] FAR.AI 119 views - 1 month ago
10:25 Dawn Song - Frontier AI in Cybersecurity: Risks, Challenges & Future Directions [Alignment Workshop] FAR.AI 199 views - 1 month ago
17:19 Yoshua Bengio - Fireside Chat with Yoshua Bengio [Alignment Workshop] FAR.AI 529 views - 1 month ago
5:43 Adam Kalai - Consensus Sampling for Safer Generative AI [Alignment Workshop] FAR.AI 185 views - 1 month ago
5:11 Daniel Kang - AI Agent Benchmarks Are Broken [Alignment Workshop] FAR.AI 310 views - 1 month ago
5:23 Cozmin Ududec - Toy Models for Task-Horizon Scaling [Alignment Workshop] FAR.AI 154 views - 1 month ago
5:08 Chirag Agarwal - Polarity-Aware Probing for Quantifying Latent Alignment in LMs [Alignment Workshop] FAR.AI 128 views - 1 month ago
5:15 Niloofar Mireshghallah - What Does It Mean for Agentic AI to Preserve Privacy? [Alignment Workshop] FAR.AI 272 views - 1 month ago
30:57 Yoshua Bengio - Disentangling Agency & Predictive Power Without Solving ELK [Alignment Workshop] FAR.AI 636 views - 1 month ago
9:30 Anna Gausen - Measuring AI Systems’ Ability to Influence Humans [Alignment Workshop] FAR.AI 305 views - 1 month ago
5:08 Santosh Vempala - Why Language Models Hallucinate [Alignment Workshop] FAR.AI 314 views - 1 month ago
5:19 Natasha Jaques - Multi-agent RL for Provably Robust LLM Safety [Alignment Workshop] FAR.AI 398 views - 1 month ago
10:22 Marius Hobbhahn - Eval Awareness is Becoming a Problem [Alignment Workshop] FAR.AI 592 views - 1 month ago
5:05 Chris Cundy - Peril and Potentials of Training with Lie Detectors [Alignment Workshop] FAR.AI 275 views - 1 month ago
5:42 Sarah Schwettmann - Scalable Oversight and Understanding [Alignment Workshop] FAR.AI 257 views - 1 month ago
10:25 Stephen Casper - Powerful Open-Weight AI Models: Wonderful, Terrible & Inevitable [Alignment Workshop] FAR.AI 671 views - 1 month ago
55:19 Owain Evans - Weird Generalizations and Backdoors: New Ways to Corrupt LLMs FAR.AI 615 views - 1 month ago
9:22 Asa Cooper Stickland - AI Control Needs Redteaming [Alignment Workshop] FAR.AI 390 views - 1 month ago
9:25 Adam Gleave - STACK: Adversarial Attacks on LLM Safeguard Pipelines [AAAI 2026] FAR.AI 388 views - 1 month ago
9:41 Tomek Korbak - Chain of Thought Monitorability for AI Safety [Alignment Workshop] FAR.AI 665 views - 3 months ago
22:58 Adam Gleave - AI in 2025: Faster Progress, Harder Problems [Alignment Workshop] FAR.AI 1.1K views - 3 months ago
9:51 Sam Bowman - Lessons Learned from the First Misalignment Safety Case [Alignment Workshop] FAR.AI 811 views - 4 months ago
10:18 Maja Trębacz - Scalable Oversight: A Practical Approach to Verifying Code at Scale [Alignment Workshop] FAR.AI 320 views - 4 months ago
10:20 Neel Nanda - Our Pivot To Pragmatic Interpretability [Alignment Workshop] FAR.AI 2.7K views - 4 months ago
10:01 Anka Reuel - How do we know what AI can (and can't) do? [Alignment Workshop] FAR.AI 456 views - 4 months ago
21:30 Yoshua Bengio - An Argument for the Safety of Scientist AI [UK AISI Alignment Conference] FAR.AI 396 views - 4 months ago
30:58 Alex Bores - How the States Should Regulate AI [Journalism Workshop] FAR.AI 365 views - 4 months ago