dutchstartup.ai

Job opening

HPC System Engineer

Full-time · Posted 15 Jun 2026


What you will do

About this role

# Systems Engineer (Cloudmeter)

Nebius is seeking a highly skilled Systems Engineer (Cloudmeter) to join the team that benchmarks GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware across diverse deep learning and AI frameworks, enabling data-driven decisions for platform optimization and next-generation hardware development.

## Responsibilities

  • Work closely with hardware and development teams to profile and analyze GPU performance at the system and kernel level
  • Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm)
  • Conduct acceptance tests for new GPU clusters to ensure hardware and software meet performance, stability, and compatibility requirements for AI workloads
  • Run experiments on diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimizations on performance and scalability

## Requirements

  • Proficient in Unix/Linux, plus Python and Bash for automation
  • Strong understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries
  • Proven ability to troubleshoot complex system issues, including hardware, software, and networking problems
  • Familiarity with containerized environments (e.g., Docker, Kubernetes)

## Nice to have

  • Experience with modern deep learning frameworks (PyTorch, JAX, vLLM, TensorRT-LLM)
  • Experience with job schedulers and resource managers (Slurm, Volcano, etc.)

Skills & experience

Senior · Unix/Linux · Python · Bash · CUDA · NCCL · Docker · Kubernetes · PyTorch · JAX · vLLM · TensorRT-LLM · Slurm · Volcano · ROCm

Where you will work

About Nebius Group

Nebius Group, based in Amsterdam, is a technology company focused on delivering full-stack AI cloud infrastructure. The company offers GPU clusters, cloud platforms, and developer tools for managing the full machine learning lifecycle, from data processing to fine-tuning and inference.


More at this company

More jobs at Nebius Group

  • Senior Software Engineer (Token Factory) · Full-time
  • Technical Product Manager - Soperator · Full-time
  • AI/ML Specialist Solutions Architect · Full-time
  • Staff / Principal Applied AI Researcher (Agentic Search) · Full-time
  • ML Infrastructure Engineer · Full-time
  • Senior ML Engineer (AI Research) · Full-time

Keep exploring

Similar jobs

  • Software Engineer, Data Infrastructure & Acquisition · Veldhoven · Full-time
  • AI Business Analyst · Veldhoven · Full-time
  • Lead Data Engineer · Full-time
  • AI Solutions Engineer · Nijmegen · Full-time
  • Senior Data Engineer Pricing · Full-time
  • Staff Officer (Data Scientist) - NATO 2030 · Full-time