December 19, 2025

Our New Research Paper Shows AI Reasoning Efficiency Can Be Increased by Up to 74x

New research shows autonomous AI agents don’t need bigger models, just better reasoning structure.

As AI models become better at “thinking,” the cost of that thinking has quietly become one of the biggest bottlenecks in the industry. OpenServ Labs says it has found a way around it.

Today, OpenServ and Coyotiv released a new research paper based on the BRAID (Bounded Reasoning for Autonomous Inference and Decisions) framework, demonstrating up to 99% reasoning accuracy and up to 74x Performance per Dollar (PPD) gains compared to traditional approaches. The results are backed by quantitative benchmarks across AdvancedIF, GSM-Hard, and SCALE MultiChallenge.

The implication is blunt: better AI reasoning doesn’t require bigger models. Smaller, cheaper models with BRAID can match or exceed larger models using traditional prompting, challenging assumptions about parameter count.

The Problem: AI Can Reason, But It Can’t Do It Cheaply

Modern “thinking models” rely heavily on long chain-of-thought outputs. That approach improves accuracy, but it also explodes token usage, increases latency, and drives up inference costs. Even worse, models often drift away from instructions, forcing developers to babysit prompts and iterate endlessly.

“Right now, we’re asking models to reason in natural language, which is incredibly inefficient,” said Armağan Amcalar, CEO of Coyotiv, CTO of OpenServ Labs, and lead author of the paper.

“Natural language is great for humans. It’s a terrible medium for machine reasoning. BRAID is like giving every driver a GPS instead of a printed map. The agent can chart its route before moving, take the best path twice as often, and use a quarter of the fuel.”

The Insight: Models Already Understand Structure Better Than Prose

Instead of letting models “think out loud,” BRAID replaces free-form reasoning with bounded, machine-readable reasoning graphs, expressed using Mermaid diagrams. These diagrams encode logic as explicit flows: steps, branches, checks, and verification loops. The result is a reasoning process that is:

  • Deterministic instead of verbose
  • Compact instead of token-heavy
  • Far less prone to context drift

Here’s a simplified example of the Mermaid format BRAID uses:

flowchart TD
    A[Read constraints] --> B{Check condition 1}
    B -->|Yes| C[Apply rule A]
    B -->|No| D[Apply rule B]
    C --> E[Verify solution]
    D --> E
    E --> F[Output answer]

This approach enforces a more deterministic step structure while avoiding unnecessary token usage — each token serves a specific role in constructing the diagram. Because the reasoning structure is clearer, smaller and cheaper models can reliably execute it.
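To make the idea concrete, here is a minimal sketch of how a bounded reasoning graph like the one above could be executed deterministically. The node labels mirror the example diagram; the data structure and function names are hypothetical illustrations, not the actual BRAID implementation.

```python
# Hypothetical sketch: walking the example reasoning graph deterministically.
# Node labels mirror the Mermaid diagram above; this is NOT the BRAID API.

GRAPH = {
    "A": ("step", "Read constraints", "B"),
    "B": ("branch", "Check condition 1", {"Yes": "C", "No": "D"}),
    "C": ("step", "Apply rule A", "E"),
    "D": ("step", "Apply rule B", "E"),
    "E": ("step", "Verify solution", "F"),
    "F": ("step", "Output answer", None),
}

def run(graph, condition_holds):
    """Walk the graph from node A, recording each reasoning step taken."""
    trace, node = [], "A"
    while node is not None:
        kind, label, nxt = graph[node]
        trace.append(label)
        if kind == "branch":
            # A branch node picks its successor from the edge labels.
            node = nxt["Yes" if condition_holds else "No"]
        else:
            node = nxt
    return trace

print(run(GRAPH, condition_holds=True))
# ['Read constraints', 'Check condition 1', 'Apply rule A', 'Verify solution', 'Output answer']
```

Because every token in the plan maps to a node or an edge, the execution path is fully determined by the graph plus the branch conditions, rather than by free-form prose.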

The Results: Small Models, Big Efficiency Gains

The paper’s authors — Armağan Amcalar and Dr. Eyüp Çınar (Eskişehir Osmangazi University) — introduce a new metric: Performance per Dollar (PPD), measuring how much reasoning performance you get for every dollar spent.
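As described here, the metric reduces to a simple ratio. The sketch below uses made-up numbers purely for illustration; the paper's exact formula and benchmark figures may differ.

```python
# Hedged sketch of the Performance-per-Dollar idea: reasoning performance
# (e.g., benchmark accuracy) divided by inference cost in dollars.
# All numbers below are illustrative, not taken from the paper.

def performance_per_dollar(accuracy: float, cost_usd: float) -> float:
    """Accuracy per dollar of inference spend (illustrative definition)."""
    return accuracy / cost_usd

baseline = performance_per_dollar(accuracy=0.90, cost_usd=1.00)   # 0.9
braid    = performance_per_dollar(accuracy=0.95, cost_usd=0.025)  # 38.0

print(round(braid / baseline, 1))  # 42.2 — a relative gain inside the reported 30-74x range
```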

In several benchmark scenarios:

  • Large, expensive models generate a reasoning plan once
  • Low-cost “nano” models execute that plan repeatedly
  • The system achieves 30–74x higher performance per dollar than a GPT-5-class baseline

The paper calls this the BRAID Parity Effect: with bounded reasoning, small models can match or exceed the reasoning accuracy of models one or two tiers larger that rely on classic prompting.
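The plan-once, execute-many economics can be sketched as a simple amortization. The function and the prices below are hypothetical illustrations of the scheme described above, not figures from the paper.

```python
# Illustrative sketch of the amortized-cost economics: a large model writes
# the reasoning plan once, and a cheap "nano" model executes it many times.
# Prices are made up for illustration.

def amortized_cost_per_task(plan_cost: float, exec_cost: float, n_tasks: int) -> float:
    """Average dollar cost per task when one reasoning plan is reused across n_tasks."""
    return plan_cost / n_tasks + exec_cost

# One $0.50 planning call amortized over 1,000 executions at $0.002 each:
print(amortized_cost_per_task(0.50, 0.002, 1000))  # 0.0025
```

The planning cost shrinks toward zero as reuse grows, which is why the per-task economics favor pairing one expensive planner with many cheap executors.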

Why This Matters Now

Autonomous AI agents are moving fast — from browsers and copilots to enterprise workflows and usage-based pricing models. But reasoning costs scale linearly with usage. Without a breakthrough, autonomy hits a wall.

“Reasoning cost is one of the biggest hidden blockers to real autonomy,” Amcalar said.

“If you can reason faster and cheaper, you unlock experimentation. You can run 30 different solution paths for the price of one. That’s how agents become truly autonomous.”

He argues that reducing reasoning cost is not just an optimization problem, but a prerequisite for the next phase of AI systems.

Built for Production, Not Just Papers

The study:

  • Uses recent benchmarks with low data-leakage risk
  • Includes safeguards like numerical masking to prevent shortcut solutions
  • Reflects production-style economics, including amortized costs for reused reasoning plans
  • Has been tested with industry partners in real agent workflows
  • Has already been used by companies and governments
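Numerical masking, as a general safeguard, replaces the literal numbers in a benchmark problem with symbolic placeholders so a model cannot lean on memorized answers. The sketch below illustrates the general technique only; the paper's actual masking procedure may differ.

```python
import re

# Hedged sketch of numerical masking as a general benchmark safeguard:
# substitute each number with a placeholder, keeping a mapping so the
# answer can be reconstructed later. Not the paper's exact procedure.

def mask_numbers(problem: str) -> tuple[str, dict]:
    """Replace each numeric literal with a placeholder N1, N2, ..."""
    mapping = {}
    def repl(match):
        key = f"N{len(mapping) + 1}"
        mapping[key] = match.group(0)
        return key
    masked = re.sub(r"\d+(?:\.\d+)?", repl, problem)
    return masked, mapping

masked, mapping = mask_numbers("Ann has 12 apples and gives away 5.")
print(masked)   # Ann has N1 apples and gives away N2.
print(mapping)  # {'N1': '12', 'N2': '5'}
```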

The full paper and detailed benchmarks are available at arxiv.org.


Contact: