dutchstartup.ai

Job posting

AI Research Engineer (Kernel & Inference Optimization) - 100% Remote Worldwide

Full-time · Remote · Posted on 14 Jun 2026

What you'll do

About this role

# AI Model Serving & Inference Engineer

Join Tether's AI model team and drive innovation in model serving and inference architectures for advanced AI systems. You will optimize model deployment and inference strategies to deliver highly responsive, efficient, and scalable performance across real-world applications, working on systems ranging from resource-efficient models for limited hardware to complex multi-modal architectures.

## Responsibilities

  • Design and deploy state-of-the-art model serving architectures that deliver high throughput and low latency while optimizing memory usage across diverse environments, including resource-constrained devices and edge platforms
  • Build, run, and monitor controlled inference tests in simulated and live production environments, tracking key performance indicators such as response latency, throughput, memory consumption, and error rates
  • Identify and prepare high-quality test datasets and simulation scenarios tailored to real-world deployment challenges on low-resource devices
  • Analyze computational efficiency and diagnose bottlenecks in the serving pipeline by monitoring processing and memory metrics
  • Work closely with cross-functional teams to integrate optimized serving and inference frameworks into production pipelines designed for edge and on-device applications

## Requirements

  • Degree in Computer Science or a related field; ideally a PhD in NLP, Machine Learning, or a related field, with a solid track record in AI R&D and publications at top-tier conferences
  • Knowledge of Metal Shading Language (MSL), with the ability to write custom compute shaders from scratch
  • Proven experience in low-level kernel optimizations and inference optimization on mobile devices
  • Deep understanding of modern model serving architectures and inference optimization techniques
  • Strong expertise in writing GPU kernels for mobile devices (smartphones) and deep understanding of model serving frameworks
  • Practical experience developing and deploying end-to-end inference pipelines on resource-constrained devices
  • Knowledge of distributed inference systems, Diffusion Models, Vision Transformers, Pruning, Quantization, Flash Attention, KV Cache, and Speculative Decoding

Skills & experience

Senior · Metal Shading Language (MSL) · GPU kernel optimization · Machine Learning · Model serving · Inference optimization · Mobile device optimization · Distributed Inference Systems · Tensor Parallelism · Pipeline Parallelism · Expert Parallelism · Diffusion Models · Vision Transformers · Pruning · Quantization · Flash Attention · KV Cache · Speculative Decoding

More at this company

More jobs

  • AI Inference Engineer QVAC (100% remote Worldwide) · Full-time
  • Research Engineer Intern (Multimodal LLM) · Internship
  • Research Engineer Intern (Multimodal LLM) · Internship

Explore further

Similar jobs

  • Software Engineer, Data Infrastructure & Acquisition · Veldhoven · Full-time
  • AI Business Analyst · Veldhoven · Full-time
  • Lead Data Engineer · Full-time
  • AI Solutions Engineer · Nijmegen · Full-time
  • Senior Data Engineer Pricing · Full-time
  • Staff Officer (Data Scientist) - NATO 2030 · Full-time