16:09 Koen van der Veen - PySyft v2 - AI Auditing of Private Models and User Logs FAR․AI 101 views - 6 days ago
17:23 Shahin Tajik - Physical Verification of AI Systems against Nation-state Adversaries FAR․AI 83 views - 1 week ago
4:21 Aengus Lynch - AI evaluators give wrong labels when they disagree with the consequences FAR․AI 177 views - 1 week ago
24:46 Buck Shlegeris - Can we use permissions management to mitigate our threats? How much novel security FAR․AI 228 views - 1 week ago
20:16 Bing-Jyue Chen - Efficient Zero-Knowledge Proofs for AI Inference FAR․AI 312 views - 3 weeks ago
14:42 Adam Chlipala - End-to-End Formal Certification of Computing Infrastructure FAR․AI 168 views - 3 weeks ago
5:27 Roy Rinberg - Information bottlenecks and small-model imitation via Question-Asking FAR․AI 70 views - 4 weeks ago
5:04 Jonathan Bostock - Report From The Jungles Of Untrusted Monitoring FAR․AI 166 views - 1 month ago
30:19 Aryan Bhatt - The high stakes control roadmap: now to the singularity FAR․AI 303 views - 1 month ago
14:44 Anka Reuel - Beyond Leaderboards: Building Policy-Grade Evaluations for AI Agents FAR․AI 222 views - 1 month ago
11:55 Patricia Paskov - The Multi-Agent Gap: Risks, Evaluation, and Governance FAR․AI 163 views - 1 month ago
16:03 Stephen Casper - Non-Consensual AI Deepfakes: AI Safety's Trial by Fire FAR․AI 308 views - 1 month ago
10:42 Kellin Pelrine - Radicalization and Child Sexual Abuse: LLM Persuasion Risks FAR․AI 84 views - 1 month ago
4:55 Usman Anwar - Steganography with Applications to LLM Monitoring [Alignment Workshop] FAR․AI 184 views - 1 month ago
5:18 Edward Hughes - Emergent Risks from AI Scientists and How to Find Them [Alignment Workshop] FAR․AI 215 views - 1 month ago
5:37 Belinda Li - Introspection for Interpretability and Alignment [Alignment Workshop] FAR․AI 462 views - 1 month ago
4:59 Joachim Schaeffer - Measuring Control Intervention Awareness Across Frontier LLMs [Alignment Worksho FAR․AI 119 views - 2 months ago
4:42 Robert McCarthy - Tracking Threats to CoT Monitorability [Alignment Workshop] FAR․AI 191 views - 2 months ago
5:23 Jimmy Farrell - The EU Code of Practice: A Civil Society Perspective [Alignment Workshop] FAR․AI 145 views - 2 months ago
4:46 Aleksandr Bowkis - Automating Alignment is Hard [Alignment Workshop] FAR․AI 374 views - 2 months ago
4:55 Rory Greig - Amplified Oversight / Debate as a Mitigation for Reward Hacking [Alignment Workshop] FAR․AI 147 views - 2 months ago
5:51 Konstantinos Voudouris - Systematic Human Error in Debate Protocols [Alignment Workshop] FAR․AI 261 views - 2 months ago
5:53 Zhijing Jin - My AI Safety Agenda: Supporting Middle Powers [Alignment Workshop] FAR․AI 213 views - 2 months ago
5:22 Patricia Paskov - Biological Agentic Evaluations at RAND [Alignment Workshop] FAR․AI 207 views - 2 months ago
10:32 Robert Trager - Instantiating International Governance of Advanced AI [Alignment Workshop] FAR․AI 287 views - 2 months ago
10:07 Joseph Bloom - Future Oversight is a Key Crux in AI Safety [Alignment Workshop] FAR․AI 207 views - 2 months ago
44:24 Anne Neuberger - Fireside Chat with Anne Neuberger: AI and National Security [Alignment Workshop] FAR․AI 394 views - 2 months ago
13:23 Dominic Rizzo - Silicon Roots of Trust: Attestation You'd Want Even If Nobody Required It FAR․AI 224 views - 2 months ago
10:06 Christopher Summerfield - Lessons from a Chimp: AI "Scheming" & the Quest for Ape Language [Alignmen FAR․AI 315 views - 2 months ago
5:46 Andrew Gordon Wilson - Epiplexity: A New Measure of Information for OOD Generalization [Alignment Wo FAR․AI 571 views - 2 months ago
3:56 Sören Mindermann - The International AI Safety Report 2026 [Alignment Workshop] FAR․AI 155 views - 2 months ago
4:51 Stefan Heimersheim - Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes FAR․AI 198 views - 2 months ago
4:55 Thomas Clarke - Safe Widespread Adoption of AI [Alignment Workshop] FAR․AI 134 views - 2 months ago
10:18 Stephen Casper - ML Researchers as Policymakers [Alignment Workshop] FAR․AI 573 views - 2 months ago