What you will do

About this role

We are seeking a highly skilled ML/AI Engineer to lead and support the benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware across deep learning and AI frameworks, enabling data-driven decisions for platform optimization and next-generation hardware development.

## Responsibilities

Work closely with hardware and development teams to profile and analyze GPU performance at the system and kernel level.
Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
Debug and optimize ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.
Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimizations on performance and scalability.
Develop tools and dashboards to visualize performance metrics, bottlenecks, and trends.
Contribute to internal tooling, frameworks, and best practices.

## Requirements

A profound understanding of the theoretical foundations of machine learning.
Deep understanding of performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimizations, dynamic batching, etc.).
Extensive experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM).
Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries.
Familiarity with containerized environments (e.g., Docker, Kubernetes).
Strong communication skills and ability to work independently.

## Preferred Qualifications

Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT-LLM).
Experience with Python and performance profiling tools (e.g., Nsight, nvprof, perf).
Familiarity with cloud ML platforms like AWS, GCP, Azure ML.
Contributions to open-source ML benchmarking tools.

Skills & experience

Senior, PyTorch, JAX, Megatron-LM, TensorRT, CUDA, NCCL, ROCm, Docker, Kubernetes, Python, vLLM, SGLang, AWS, GCP, Azure ML, Nsight, nvprof, perf

Where you will work

About Nebius Group

Nebius Group, headquartered in Amsterdam, is a technology company focused on delivering full-stack AI cloud infrastructure. The company offers GPU clusters, cloud platforms, and developer tools for managing the entire machine learning lifecycle, from data processing to fine-tuning and inference.
