dutchstartup.ai

Job opening

ML Infrastructure Engineer

Full-time · Posted on 14 Jun 2026


What you'll do

About this role

## ML/AI Engineer - GPU Platform Benchmarking

Nebius is seeking a highly skilled ML/AI Engineer to lead and support benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware for various deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development.

### Responsibilities:

  • Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level
  • Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm)
  • Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks
  • Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads
  • Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimisations on performance and scalability
  • Develop tools and dashboards to visualise performance metrics, bottlenecks, and trends
  • Contribute to internal tooling, frameworks, and best practices

### Required Experience:

  • Profound understanding of theoretical foundations of machine learning
  • Deep understanding of the performance aspects of training and inference for large neural networks (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.)
  • Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM)
  • Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries
  • Familiarity with containerized environments (e.g., Docker, Kubernetes)
  • Strong communication skills and the ability to work independently

### Ways to Stand Out:

  • Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT)
  • Experience in Python and performance profiling tools (e.g., Nsight, nvprof, perf)
  • Familiarity with cloud ML platforms such as AWS, GCP, and Azure ML
  • Contributions to open-source ML benchmarking tools

Skills & experience

Senior · PyTorch · JAX · Megatron-LM · TensorRT-LLM · CUDA · NCCL · Docker · Kubernetes · vLLM · SGLang · TensorRT · Python · Nsight · nvprof · perf · AWS · GCP · Azure ML · ROCm

Where you'll work

About Nebius Group

Nebius Group, based in Amsterdam, is a technology company focused on delivering full-stack AI cloud infrastructure. The company offers GPU clusters, cloud platforms, and developer tools for managing the full machine learning lifecycle, from data processing to fine-tuning and inference.
