
The Agent Cloud: Databricks’ Bet on the Future of AI — Matei Zaharia and Reynold Xin

Latent Space · 2026

Description

From open-sourcing the layer above coding agents to rethinking databases for the agent era, Databricks cofounders Matei Zaharia and Reynold Xin are pushing the company beyond the lakehouse into a full data-and-AI operating system. In this episode, Matei and Reynold join swyx after Data + AI Summit to unpack Omnigent, LTAP, Lakebase, agent security, open formats, Mosaic, and why databases may matter more than ever once AI agents start doing real work.

We go deep on Omnigent: Databricks’ open-source meta-harness for combining, controlling, and sharing agents across Claude Code, Codex, Cursor, Pi, custom agents, and internal tools. Matei explains why coding agents and enterprise agents run into the same problems: portability, collaboration, session history, security, spend controls, and the need for a common API above every harness.

Then Reynold walks through Databricks’ database dream: why CDC is brittle enough to joke that it means “continuous data corruption,” why HTAP has been the holy grail of database engineering, and why Databricks thinks LTAP gets most of the benefits by unifying the storage layer instead of collapsing every query engine.

We also cover Databricks’ infrastructure scale, the culture behind rapid prototyping, the difference between tech and enterprise customers, Databricks vs Snowflake, whether vector databases should have ever existed, the Mosaic model strategy, Genie, AI Runtime, RL fine-tuning, and the thesis that traditional software gets rewritten once the data is in the right place and agents sit on top.
We discuss:
• Why Databricks built Omnigent as a meta-harness above existing AI agents
• Why coding agents and custom enterprise agents need the same infrastructure
• The common API for agent sessions, files, streams, tool calls, and cancellation
• Why persistent sessions, cloud sandboxes, sharing, search, and collaboration matter
• Why Databricks open-sourced Omnigent instead of keeping it proprietary
• Databricks’ internal agent usage, cloud sandboxes, and coding workflows
• The scale of Databricks: 50–60 million virtual machines a day and exabytes before breakfast
• Why agent security needs contextual and stateful policies
• How an agent could read confidential docs, install a compromised npm package, and leak data
• Why spend control matters when an agent can burn $500 reading logs
• Startup opportunities around coding-agent analytics, quality, skills, and spend
• LTAP, Lakebase, and why Databricks wants to rethink the database stack
• OLTP vs OLAP, CDC, and why data pipelines break at 3 a.m.
• Why HTAP has historically been the holy grail of database engineering
• Why Databricks thinks LTAP is “HTAP done right”
• How writing transactional data into column-oriented formats changes analytics
• Why agents need live operational context from databases, not just telemetry
• How Databricks prototypes strategic systems without endless process
• Enterprise vs tech customers, governance, procurement, and DIY culture
• The “second system syndrome” risk of rewriting a database engine
• Building a database engine from a decade of traces and quadrillions of data points
• Why vector databases should never have been a separate category
• Why Databricks thinks open formats and AI changed the race with Snowflake
• The Mosaic story, DBRX, Genie, document parsing models, and specialized model training
• Why model customization and RL fine-tuning may become mainstream
• Why “get the data there, slap some agent on top” may rewrite traditional software

Matei Zaharia
• LinkedIn: https://www.linkedin.com/in/mateizaharia
• X: https://x.com/matei_zaharia

Reynold Xin
• LinkedIn: https://www.linkedin.com/in/rxin
• X: https://x.com/rxin

Databricks
• Website: https://www.databricks.com
• X: https://x.com/databricks

Timestamps
00:00:00 Hook
00:01:13 Introduction
00:03:35 Omnigent and the Agent Infrastructure Layer
00:09:52 Agent Clouds, Common APIs, and Open Source
00:18:05 Databricks Scale and Internal AI Workflows
00:19:16 Agent Security, Governance, and Spend Controls
00:28:47 LTAP and the Database Dream
00:31:43 CDC, HTAP, and Why Data Pipelines Break
00:35:18 Lakebase, Parquet, and Live Data for Agents
00:38:00 Databricks’ Culture of Fast Prototyping
00:44:53 The Dream Engine and Rewriting the Database Stack
00:52:15 Vector Databases, Query Engines, and LTAP
00:53:49 Databricks vs Snowflake
00:59:01 Mosaic, DBRX, Genie, and Specialized Models
01:04:24 Context, AI Runtime, and RL Fine-Tuning
01:07:28 Why Data + Agents May Rewrite Software
01:08:22 Closing Thoughts