Researchers from Mila, Microsoft Research, and McGill University have developed a breakthrough technique that could revolutionize how large language models handle extended reasoning tasks.
The Core Problem
Current reasoning models face a fundamental bottleneck: the longer they think, the larger the context they must attend over, so computational cost grows quadratically with the length of the reasoning trace (arXiv). This makes extended reasoning prohibitively expensive and limits the sophistication of AI problem-solving.
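As a back-of-the-envelope count (our notation, not the paper's): when a model generates a reasoning trace of length n with full self-attention, token t attends to roughly t earlier tokens, so the total attention work is

$$
\mathrm{cost}(n) \;\approx\; \sum_{t=1}^{n} t \;=\; \frac{n(n+1)}{2} \;=\; O(n^2),
$$

while key-value cache memory grows as O(n). Doubling the thinking length roughly quadruples the compute.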
The Markovian Solution
The team's approach, called "Markovian Thinking," fundamentally changes how models reason by conditioning on a constant-size state rather than an ever-growing context (arXiv). They implemented this through "Delethink," a training environment that structures reasoning into fixed-size chunks.
Instead of maintaining one continuous chain of thought, the model reasons in chunks of fixed size (e.g., 8K tokens). At each chunk boundary, the context resets and reasoning continues with only a short "carryover" from the previous chunk (arXiv). The model learns to compress essential information into this textual state to maintain reasoning continuity.
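A minimal sketch of what such a chunked loop could look like, assuming a token-level `generate` callback; the names, chunk sizes, and stopping logic are illustrative placeholders, not the paper's actual Delethink implementation:

```python
from typing import Callable, List

CHUNK_TOKENS = 8_192     # fixed thinking budget per chunk (e.g., 8K tokens)
CARRYOVER_TOKENS = 512   # short textual state passed to the next chunk
MAX_CHUNKS = 12          # ~96K total thinking tokens in this example


def markovian_reason(
    question: List[int],
    generate: Callable[[List[int], int], List[int]],
    is_done: Callable[[List[int]], bool],
) -> List[int]:
    """Reason in fixed-size chunks, carrying only a short state across resets."""
    carryover: List[int] = []
    for _ in range(MAX_CHUNKS):
        # Each chunk conditions only on the question plus the carryover,
        # so the attention context stays constant-size as thinking grows.
        context = question + carryover
        chunk = generate(context, CHUNK_TOKENS)

        if is_done(chunk):
            return chunk  # final chunk contains the answer

        # Context reset: keep only the tail of this chunk as the textual
        # state the model must learn to pack with what it still needs.
        carryover = chunk[-CARRYOVER_TOKENS:]
    return carryover  # thinking budget exhausted
```

Because the context never grows beyond question + carryover + chunk, each chunk costs the same to generate no matter how long the overall reasoning runs.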
Remarkable Results
The technique delivers dramatic efficiency gains:
- Linear compute scaling instead of quadratic, with constant memory use regardless of thinking length (arXiv); a toy scaling comparison follows this list
- At one million thinking tokens, Delethink achieves a 17× reduction in computational operations (arXiv)
- Training cost for 96K-token reasoning drops from an estimated 27 H100-months to about 7 H100-months (arXiv)
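To see where the linear-versus-quadratic gap comes from, here is a toy count of query-key attention interactions under both regimes; it ignores the prompt, MLP layers, and other per-token costs, so it illustrates the scaling rather than reproducing the paper's exact 17× figure:

```python
# Toy comparison of how attention cost scales with thinking length.
# Counts only query-key interactions; not a full FLOP model.

CHUNK = 8_192   # illustrative fixed chunk size
CARRY = 512     # illustrative carryover size


def standard_cost(n: int) -> int:
    """Full-context reasoning: token t attends to ~t prior tokens -> O(n^2)."""
    return n * (n + 1) // 2


def chunked_cost(n: int) -> int:
    """Chunked reasoning: each chunk attends within a bounded window -> O(n)."""
    window = CHUNK + CARRY
    chunks = -(-n // CHUNK)  # ceiling division
    return chunks * (window * (window + 1) // 2)


for n in (96_000, 1_000_000):
    print(f"{n:>9} tokens  standard={standard_cost(n):.3e}  chunked={chunked_cost(n):.3e}")
```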
Performance matches or exceeds traditional approaches while enabling reasoning far beyond training limits. The researchers trained a 1.5B-parameter model to think for up to 96K tokens, achieving 49% accuracy on challenging AIME mathematics problems (arXiv).
Why It Works
Surprisingly, the team found that existing reasoning models already exhibit natural Markovian behavior when tested zero-shot, providing strong initialization for training (arXiv). This suggests the approach could be broadly applicable to current model architectures.
Implications
By decoupling thinking length from context size, this paradigm opens the door to next-generation reasoning models that can think for millions of tokens with linear compute and constant memory (arXiv). This could enable previously impossible applications requiring extended reasoning, complex decision workflows, and long-term strategic planning.
The research demonstrates that efficient long-context reasoning is achievable through clever environmental design rather than just architectural improvements.