research

Cooking with OpenAI’s Research Chief: AGI, o1, Evals, and Scaling Laws — Mark Chen

Latent Space, 2026

Description

In this episode, we have Mark Chen, Chief Research Officer at OpenAI, joining us to make Korean tofu stew and flambé shrimp. From scaling laws and why pre-training is not dead, to the o1 reasoning bet, evals crisis, research taste, long-context learning, and the future of end-to-end AI research, we cover what it takes to push models toward the frontier.

We talk about:
• Why Mark still believes in scaling laws and the exponential
• How OpenAI chooses research bets and allocates compute
• Why reasoning became one of OpenAI’s biggest bets
• How to develop research taste without a traditional ML background
• Why evals are in crisis and benchmarks can be misleading
• What it takes for models to do long-horizon, real-world work
• How AI could change the future of research itself

00:00 – Intro
00:13 – Welcome to Latent Space Cooking
00:28 – The Soup Story: Meta, OpenAI, and Research Recruiting
01:52 – From Trading to AI Research
03:21 – Developing Research Taste
04:26 – AlphaGo, Move 37, and the Rise of Agents
05:23 – RL, Hard-to-Grade Tasks, and Superhuman Evals
08:17 – Cooking Begins on the Impulse Stove
08:53 – Scaling Laws, Pre-Training, and Bear Takes
10:26 – Why Reasoning Became a Core Bet
12:33 – OpenAI’s Research Roadmap and Compute Allocation
15:48 – What Makes a Great Researcher
17:44 – Top Researchers vs. Top Engineers
19:33 – The Evals Crisis and Benchmark-Maxing
21:32 – Building Better Benchmarks
23:40 – Jakub Pachocki, Humor, and “Dumb IOI Gold Medalists”
24:34 – Jagged Intelligence and Why Models Miss Easy Tasks
25:56 – Long Context, Compaction, and Long-Horizon Learning
27:14 – Shrimp Flambé and One-Shot Learning
28:36 – Low-Hanging Fruit vs. New Research Bets
29:39 – Continual Learning and the Road to AGI
31:32 – Multimodal Models and One Architecture
32:36 – Vibe Researching and End-to-End AI Research
34:36 – Failed Bets, Postmortems, and OpenAI’s Alpha
37:07 – Final Seasoning and Taste Test
37:53 – Overrated vs. Underrated AI Research
38:42 – Plating and First Taste
39:15 – External Evals: The Soup Judge
41:00 – ChatGPT Prep and Closing

In this video

OpenAI