dutchstartup.ai

Job posting

ML Infrastructure Engineer

Full-time · Posted on 15 Jun 2026


What you will do

About this role

We are seeking a highly skilled ML/AI Engineer to join our team to lead and support benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware for various deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development.

## Responsibilities

  • Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level.
  • Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
  • Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.
  • Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
  • Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimisations on performance and scalability.
  • Develop tools and dashboards to visualise performance metrics, bottlenecks, and trends.
  • Contribute to internal tooling, frameworks, and best practices.

## Requirements

  • A profound understanding of theoretical foundations of machine learning
  • Deep understanding of the performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.)
  • Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM)
  • Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries
  • Familiarity with containerized environments (e.g., Docker, Kubernetes).
  • Strong communication and ability to work independently

## Preferred Qualifications

  • Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT)
  • Experience in Python and performance profiling tools (e.g., Nsight, nvprof, perf).
  • Familiarity with cloud ML platforms like AWS, GCP, Azure ML
  • Contributions to open-source ML benchmarking tools

Skills & experience

Senior · PyTorch · JAX · Megatron-LM · TensorRT · CUDA · NCCL · ROCm · Docker · Kubernetes · Python · vLLM · SGLang · AWS · GCP · Azure ML · Nsight · nvprof · perf

Where you will work

About Nebius Group

Nebius Group, based in Amsterdam, is a technology company focused on delivering full-stack AI cloud infrastructure. The company offers GPU clusters, cloud platforms, and developer tools for managing the full machine learning lifecycle, from data processing to fine-tuning and inference.
