What you'll do

About this role

# Models Team – AI Infrastructure Engineer

Nebius is building a full-stack AI cloud platform. The Models Team is responsible for onboarding state-of-the-art open-source models into Nebius TokenFactory and serving large-scale AI models efficiently and reliably in production.

## Responsibilities
You will work on advanced inference and systems optimization techniques including:

- Cache-aware routing
- NUMA-aware deployments
- KV-cache offloading
- Disaggregated serving architectures
- Autoscaling with high-speed model loading over InfiniBand / RoCE

The team maintains and extends forks of leading inference frameworks such as vLLM and TRT-LLM. You'll invest heavily in tooling and automation, including performance testing frameworks, hyperparameter optimization, observability tooling, and automated rollout pipelines.

You'll collaborate closely with model builders, open-source communities, Nebius Cloud teams, and hardware vendors to continuously improve serving infrastructure.

## Team
The team consists of eight engineers distributed across Europe (Netherlands, UK, Germany, Latvia). Workflows are optimized for remote collaboration with in-person meetings every one to two months.

## Technical Stack
Primarily Go and Python for building and scaling backend systems. Work sits at the intersection of distributed systems, high-performance computing, and modern AI infrastructure.

Skills & experience

Senior, Python, Go, LLM serving, Distributed systems, Kubernetes, vLLM, TRT-LLM, KV cache management, Speculative decoding, Quantization, InfiniBand, RoCE, Performance benchmarking, High-performance networking

Where you'll work

Location

Address

Amsterdam · Noord-Holland · 1076 ES · No exact address

Senior Software Developer: Models Team (Token Factory)