Doğaç Eldenk - Attention Drift – What speculative decoding models learn
Cohere, 2026
Description
00:00 Seminar Welcome
00:53 Talk Overview
01:22 Why Inference Is Hard
02:55 Speculative Decoding Basics
06:58 EAGLETree and MTP
08:59 Attention Sinks Primer
10:08 Attention Drift Discovery
15:14 Magnitude Mismatch Clues
17:51 Post Norm Fix
20:48 Training Time Tests
24:02 Gated Attention Experiments
29:19 Architectural Improvements
31:25 Q and A Practical Serving
34:29 How We Found It
35:52 Templates and Prompt Length
39:24 Long Context Sliding Window
44:56 Production Impact
45:58 Open Questions
47:29 Key Takeaways

Speculative decoding speeds up LLM inference by having a small draft model propose tokens that the large target model then verifies, but drafters degrade sharply under template perturbations and on long contexts. We identify a new phenomenon, attention drift: as the drafter generates within a speculation chain, its attention shifts away from the prompt and onto its own recent tokens. We trace this to hidden-state magnitudes accumulating across drafting steps and fix it with a post-norm architecture, EAGLE 3.1, that improves both resilience and performance.

Bio: Doğaç is a Master's student in Northwestern University's Computer Science program who is joining Fal as a Machine Learning Engineer. His work focuses on inference acceleration, from speculative decoding to agentic GPU kernel optimization and discovery.

This session is brought to you by the Cohere Labs Open Science Community, a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Harsha Nelaturu and Andrej Jovanović, leads of our ML Systems and Theory group, for their dedication in organizing this event.

If you're interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.

Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommunityApp).
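The abstract's magnitude argument can be illustrated with a toy model. This is a sketch under stated assumptions, not EAGLE 3.1 or the authors' code. It assumes, as in EAGLE-style drafters, that each draft step consumes the hidden state produced by the previous draft step. It also replaces the real drafter layer with a hypothetical elementwise `tanh` block and uses RMS normalization without learned parameters. Under pre-norm only the block's input is normalized, so the residual stream grows at every step of the speculation chain. Under post-norm the output is renormalized after the residual add, so its norm stays pinned near sqrt(d).

```python
import math

EPS = 1e-6


def rms_norm(x):
    """Scale a vector to unit root-mean-square (no learned gain; a simplification)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + EPS)
    return [v / rms for v in x]


def block(x):
    """Hypothetical stand-in for a drafter layer: elementwise tanh."""
    return [math.tanh(v) for v in x]


def l2(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))


def drafting_norms(h0, steps, post_norm):
    """Feed each draft step's output hidden state back in as the next step's
    input (assumed EAGLE-style recursion) and record its L2 norm per step."""
    h, norms = list(h0), []
    for _ in range(steps):
        if post_norm:
            # Post-norm: normalize after the residual add, resetting the scale each step.
            h = rms_norm([a + b for a, b in zip(h, block(h))])
        else:
            # Pre-norm: only the block input is normalized; the residual stream accumulates.
            h = [a + b for a, b in zip(h, block(rms_norm(h)))]
        norms.append(l2(h))
    return norms


h0 = [1.0, -0.5, 0.25, -2.0]  # arbitrary starting hidden state, d = 4
pre = drafting_norms(h0, steps=8, post_norm=False)
post = drafting_norms(h0, steps=8, post_norm=True)
print("pre-norm :", [round(n, 2) for n in pre])
print("post-norm:", [round(n, 2) for n in post])
```

In this toy, every pre-norm update has the same sign as the hidden state it is added to, so the norm strictly increases with chain depth, while the post-norm norm stays near sqrt(4) = 2. Larger hidden states also produce larger query-key dot products, which is one plausible route by which accumulated magnitude could pull attention toward the drafter's own recent tokens; that link is offered here as an illustration, not as the talk's analysis.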