# Senior/Staff ML Engineer - AI R&D
## About Nebius
Nebius is leading a new era in cloud infrastructure for the global AI economy. We build a fully integrated AI cloud platform that supports developers and enterprises from data and model training to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.
## The Role
This role is with Nebius AI R&D, a team focused on applied AI research. Our work includes:
- Applying reinforcement learning for agent training in long-context multi-turn scenarios
- Scaling task data collection for reinforcement learning with software engineering (SWE) agents
- Building decontaminated evaluation for SWE agents
- Investigating how test-time guided search can be used to build more capable agents

We are seeking senior and staff-level ML engineers to work on research in areas such as:
- Guided search and reinforcement learning for agentic systems
- Reinforcement learning for reasoning models
- Web-scale problem collection for agent training
- Efficient model distillation
## Responsibilities
- Conduct experiments to find efficient ways to train large language models on traces of interactions with diverse environments
- Explore methods for guided generation and search in trajectory space
- Develop approaches for collecting relevant data at web scale and find efficient ways to leverage this data in model post-training
- Run experiments with different reinforcement learning configurations in verifiable domains
- Explore methods for training AI agents on tasks with non-verifiable reward signals
## Required
- Deep understanding of theoretical foundations in machine learning and reinforcement learning
- Deep expertise in modern deep learning for language understanding and generation
- Significant experience training large models across multiple compute nodes
- Strong software engineering skills (especially Python)
- Deep experience with modern deep learning frameworks (e.g., JAX)
- Strong communication and leadership skills
- Experience designing, conducting, and analyzing machine learning experiments with statistical rigor
- Ability to formulate research questions, design experiments to test hypotheses, and draw meaningful conclusions
- Ability to clearly document research findings and contribute to technical publications
## Nice to Have
- Experience with deep reinforcement learning for LLMs (reward modeling, DPO, PPO)
- Familiarity with key ideas in the LLM space (RoPE, ZeRO/FSDP, FlashAttention, quantization)
- Bachelor's degree in Computer Science, Artificial Intelligence, Data Science, or a related field (Master's or PhD recommended)
- Track record of building and shipping products in a dynamic startup-like environment
- Experience designing complex systems such as large-scale distributed data processing or high-load web services
- Open-source projects demonstrating engineering skills
- Excellent written and spoken English
- Knowledge of contemporary software development practices (CI/CD, version control, unit testing)