Leanstral by Mistral AI: The AI That Proves Your Code Is Correct

Niels
Niels Co-founder
Published on Mar 17, 2026Updated on Mar 17, 2026

Leanstral by Mistral AI: The AI That Proves Your Code Is Correct

Logo Mistral AI

On March 16, 2026, Mistral AI released Leanstral, the first open-source AI agent specifically designed for Lean 4, a proof assistant used in both mathematics and software engineering. In a landscape crowded with AI code generation tools, Leanstral stands apart with a radical promise: not just generating code, but mathematically proving it is correct.

But why should you care about formal verification? How does Leanstral actually perform against Claude and open-source alternatives? And what concrete applications does this technology unlock for businesses and engineering teams? This guide breaks it all down.

What Is Leanstral and Why Formal Verification Matters

The Problem with "Vibe Coding"

AI code generation agents (Copilot, Cursor, Claude Code) produce functional code most of the time. But in high-stakes domains (aerospace, finance, smart contracts, mathematical research), "it looks like it works" is not enough. Human review of AI-generated code is expensive, slow, and requires rare expertise.

Leanstral offers an alternative: the agent writes the code and produces a formal proof of its correctness. The Lean 4 compiler acts as a binary verifier. Either the proof compiles, or it does not. No gray area.

Lean 4: The Language Behind Leanstral

Lean 4, developed by Leonardo de Moura (formerly Microsoft Research), is both a proof assistant and a functional programming language. It is used by Google DeepMind (AlphaProof, IMO silver medal in 2024), Amazon (Cedar policy verification), and a community of over 10,000 members on Zulip.

The Mathlib library, which formalizes mathematics in Lean, has over 20,000 contributions and received $15 million in funding in 2025.

How Leanstral Works: Architecture and Technical Specifications

A Mixture-of-Experts Model Optimized for Proofs

Leanstral is built on a Sparse Mixture-of-Experts (MoE) architecture with the following specifications:

Specification

Value

Full name

Leanstral-120B-A6B-2603

Total parameters

~119 billion (128 experts)

Active parameters per token

~6.5 billion (4 of 128 experts)

Architecture

Sparse Mixture-of-Experts

Base model family

Mistral Small 4

Context window

256K tokens

Input modalities

Text and images

License

Apache 2.0

The principle is straightforward: each token activates only 4 of the 128 expert modules. This gives Leanstral the knowledge capacity of a 119-billion-parameter model at the inference cost of a 6.5-billion-parameter one. This 18x efficiency ratio is what enables dramatically lower costs compared to the competition.

Native Lean Compiler Integration via MCP

Unlike generalist models that produce text resembling Lean code, Leanstral interacts directly with the Lean 4 compiler via the MCP (Model Context Protocol). In practice, the agent can:

  • Check types in the Lean compiler

  • Execute proof tactics and observe results

  • Analyze error messages

  • Iteratively refine proofs in a live interactive loop

This native integration is a decisive technical advantage. The model does not "guess" proofs: it builds them in dialogue with the verifier.

Leanstral Benchmarks: Performance Against Claude and Open-Source Models

The FLTEval Benchmark: Realistic Conditions

Mistral AI introduced FLTEval, a new benchmark designed to evaluate proof engineering in real-world repository conditions. It is based on the FLT (Fermat's Last Theorem) project at Imperial College London, led by Professor Kevin Buzzard, with 55 contributors and EPSRC funding through 2029.

Unlike MiniF2F (which targets isolated competition math problems), FLTEval measures the ability to complete formal proofs in a realistic environment with imports, library dependencies, and multi-file proof structures.

Leanstral vs Claude: The Cost-Performance Ratio

Model

Cost per FLTEval run ($)

FLTEval Score

Leanstral pass@1

18

21.9

Leanstral pass@2

36

26.3

Leanstral pass@4

72

29.3

Leanstral pass@8

145

31.0

Leanstral pass@16

290

31.9

Claude Haiku 4.5

184

23.0

Claude Sonnet 4.6

549

23.7

Claude Opus 4.6

1,650

39.6

The numbers speak for themselves:

  • Leanstral pass@2 ($36) beats Sonnet 4.6 ($549) by 2.6 points, at 1/15th the cost.

  • Leanstral pass@2 beats Haiku 4.5 ($184) by 3.3 points, at 1/5th the cost.

  • Leanstral pass@16 ($290) beats Sonnet by 8 points, at half the cost.

  • Claude Opus 4.6 remains the quality leader (39.6) but costs $1,650, which is 46x more than Leanstral pass@2.

Leanstral vs Open-Source Models: Active Parameter Efficiency

Model

Active Parameters

FLTEval Score (best pass)

GLM5-744B-A40B

40B

~16.6 (plateaus)

Kimi-K2.5-1T-A32B

32B

~20.1 (plateaus)

Leanstral pass@1

6.5B

21.9

Qwen3.5-397B-A17B

17B

25.4 (pass@4)

Leanstral pass@2

6.5B

26.3

Leanstral pass@4

6.5B

29.3

With only 6.5 billion active parameters, Leanstral outperforms models using 5 to 6 times more. Qwen3.5 needs 4 attempts and 17 billion active parameters to reach 25.4. Leanstral surpasses that score on its second attempt.

Cost Comparison: Leanstral vs Alternatives

Solution

Estimated Cost

FLTEval Score

Open Source

Leanstral pass@2

$36

26.3

Yes (Apache 2.0)

Leanstral pass@16

$290

31.9

Yes (Apache 2.0)

Claude Haiku 4.5

$184

23.0

No

Claude Sonnet 4.6

$549

23.7

No

Claude Opus 4.6

$1,650

39.6

No

Self-hosted Leanstral

Hardware: 4x A100/H100

Same

Yes

Concrete Use Cases for Leanstral in Business

Smart Contract Verification and DeFi Security

Bugs in DeFi code have cost billions of dollars in recent years. Formal verification is the gold standard for guaranteeing that a smart contract does exactly what it claims. With Leanstral, the cost of a formal audit drops dramatically: a proof of correctness for $36 instead of hundreds of dollars with proprietary alternatives.

Mission-Critical Software: Aerospace, Finance, Healthcare

In industries where a software bug can cost lives or millions, formal verification is not a luxury but a regulatory requirement. Leanstral enables development teams to specify expected behavior in Lean 4, then automatically generate compliance proofs. The compiler then verifies the proof is valid.

Collaborative Mathematical Research

The FLT project (formalizing Fermat's Last Theorem) and Mathlib illustrate Leanstral's potential to accelerate formalized research. Researchers can delegate routine proofs to the agent and focus on the creative aspects of their work.

Cross-Language Proof Migration

One of the use cases demonstrated by Mistral AI is translating proofs from Rocq (formerly Coq) to Lean 4, preserving semantics and notation. This facilitates migrating academic or industrial projects from one ecosystem to another.

Verifying AI-Generated Code

The most strategic use case: formally verifying that code produced by AI agents (Copilot, Cursor, etc.) is correct. Leanstral embodies the vision of "trustworthy vibe coding" where humans specify what they want and the machine proves compliance.

Three Ways to Access Leanstral Today

1. Mistral Vibe (Zero Setup)

The /leanstall command in the Mistral Vibe CLI (version 2.5.0, released March 16, 2026) automatically configures the Leanstral agent. This is the fastest way to try it out.

2. Free API (Limited Period)

The labs-leanstral-2603 endpoint is available for free for a limited period. Mistral AI wants to collect real-world feedback to improve future versions.

3. Self-Hosting (Open-Source Weights)

Model weights are published under the Apache 2.0 license on Hugging Face (mistralai/Leanstral-120B-A6B-2603). Recommended setup: 4 A100 80GB or H100 GPUs, with vLLM and Flash Attention. Note: the Hugging Face page showed a temporary 404 error at launch.

Leanstral Limitations: What to Know Before Adopting

A Specialized Model, Not a General-Purpose Code Assistant

Leanstral is designed exclusively for Lean 4. It does not replace your general-purpose coding tools (Copilot, Claude Code, Cursor). If you need an assistant for Python, TypeScript, or SQL, this is not the right tool.

Claude Opus 4.6 Still Leads on Raw Quality

With an FLTEval score of 39.6 versus 31.9 for Leanstral pass@16, Opus maintains a significant lead. If your absolute priority is maximum accuracy and budget is not a constraint, Opus remains the best choice. The Hacker News community highlighted this point: a model specifically trained for this task should, in theory, outperform a generalist model.

The Performance Curve May Plateau

Some observers note that Leanstral's performance gains appear to diminish beyond pass@8. The jump from pass@8 (31.0) to pass@16 (31.9) represents only a 0.9-point gain for a doubling in cost.

Infrastructure Requirements for Self-Hosting

Self-hosting requires 4 high-end GPUs (A100 or H100), which represents a significant hardware investment. For teams without this infrastructure, the free API or Mistral Vibe remain the most accessible options.

Should You Use Leanstral for Formal Verification Projects?

Leanstral fills a genuine gap in the ecosystem. Before its release, options for AI-assisted formal verification were limited to expensive proprietary models (Claude Opus) or generalist models not optimized for Lean 4.

Leanstral sits at the intersection of three qualities: open source (Apache 2.0), specifically trained for proof engineering, and cost-effective. No other model currently occupies that exact space.

For CTOs and engineering leaders evaluating formal verification as part of their software quality strategy, Leanstral represents an accessible entry point. For researchers in formalized mathematics, it is an accelerator. And for the Lean 4 ecosystem as a whole, it is a strong signal that specialized AI for formal proofs is becoming a practical reality.

The real question is no longer "is AI-assisted formal verification possible?" but "how production-ready is it?" With Leanstral, Mistral AI offers a first answer.

logo emelia

Discover Emelia, your all-in-one prospecting tool.

logo emelia

Clear, transparent prices without hidden fees

No commitment, prices to help you increase your prospecting.

Start

€37

/month

Unlimited email sending

Connect 1 LinkedIn Accounts

Unlimited LinkedIn Actions

Email Warmup Included

Unlimited Scraping

Unlimited contacts

Grow

Best seller
arrow-right
€97

/month

Unlimited email sending

Up to 5 LinkedIn Accounts

Unlimited LinkedIn Actions

Unlimited Warmup

Unlimited contacts

1 CRM Integration

Scale

€297

/month

Unlimited email sending

Up to 20 LinkedIn Accounts

Unlimited LinkedIn Actions

Unlimited Warmup

Unlimited contacts

Multi CRM Integrations

Unlimited API Calls

Credits(optional)

You don't need credits if you just want to send emails or do actions on LinkedIn

May use it for :

Find Emails

AI Action

Phone Finder

Verify Emails

1,000
5,000
10,000
50,000
100,000
1,000 Emails found
1,000 AI Actions
20 Number
4,000 Verify
19per month

Discover other articles that might interest you !

See all articles
NielsNiels Co-founder
Read more
MathieuMathieu Co-founder
Read more
NielsNiels Co-founder
Read more
MarieMarie Head Of Sales
Read more
Tips and training
Published on Dec 5, 2022

Few things to avoid in your campaigns

NielsNiels Co-founder
Read more
Made with ❤ for Growth Marketers by Growth Marketers
Copyright © 2026 Emelia All Rights Reserved