# AI Product Engineer - ClickStack Observability Platform
ClickHouse is hiring an AI Product Engineer to build agentic capabilities on top of a petabyte-scale observability platform, with a focus on developer experience.
## What you'll do
- Build agents that investigate incidents. They surface anomalies, answer "why is production broken?", and use ClickStack as their substrate.
- Write skills, not just prompts. Build a library of reusable skills that captures how the team debugs, finds root causes, writes ClickHouse queries, and runs incident response.
- Own the agent stack end-to-end: context engineering, tool design, evals, tracing, and cost. You are responsible for whether the agent works in production.
- Make ClickStack a great place to run AI workloads. Build the MCP servers, SDKs, and integrations that let customers' agents read telemetry, take action, and stay observable.
- Work in the open. Collaborate with OSS contributors and customers, debug their problems, and feed learnings back into the product.
- Tackle the hard parts: latency, cost, context window limits, eval coverage, and hallucinations on real telemetry.
## Who you are
- Have built agents for long enough to have opinions about context engineering, tool design, evals, and framework limitations.
- Think in production terms: p99 latency, cost per task, and system sustainability.
- Move quickly, ship often, and learn from failures.
- Care about developer tools and understand what makes for good DX.
- Do well with ambiguity and ownership.
## What you bring
- 5+ years of software engineering experience, including 1–2 years on LLM-powered systems or agents in production.
- Strong backend skills in TypeScript/Node.js and/or Python.
- Hands-on experience building agents: multi-step tool use, planning, memory, error recovery.
- Experience designing skills (Markdown-based workflow encodings, Anthropic-style or similar).
- Experience with MCP: building servers, designing tools, thinking through auth, scoping, and observability.
- Strong evals practice: golden sets, LLM-as-judge, regression detection.
- SQL proficiency: you can write ClickHouse queries directly.
- Comfort with Docker and Kubernetes.
- Active in open source and the developer community.
## Bonus
- Built or operated production agents in observability, incident response, or SRE.
- Strong opinions on agent observability and ideas for improving it.
- Experience with prompt caching, context compaction, or other techniques for handling production telemetry volumes.
- Experience with columnar databases and event ingestion pipelines.
- Contributed to or maintained an open source AI/agent project.
- Familiarity with Go, Rust, or other systems languages.