Niels Co-founder

Published on Mar 17, 2026Updated on Apr 9, 2026

Find and contact your future customers

All-in-one prospecting platform

Try for free →

Back to hub

Leanstral by Mistral AI: The AI That Proves Your Code Is Correct

Niels Co-founder

Published on Mar 17, 2026Updated on Apr 9, 2026

On March 16, 2026, Mistral AI released Leanstral, the first open-source AI agent specifically designed for Lean 4, a proof assistant used in both mathematics and software engineering. In a landscape crowded with AI code generation tools, Leanstral stands apart with a radical promise: not just generating code, but mathematically proving it is correct.

But why should you care about formal verification? How does Leanstral actually perform against Claude and open-source alternatives? And what concrete applications does this technology unlock for businesses and engineering teams? This guide breaks it all down.

What Is Leanstral and Why Formal Verification Matters

The Problem with "Vibe Coding"

AI code generation agents (Copilot, Cursor, Claude Code) produce functional code most of the time. But in high-stakes domains (aerospace, finance, smart contracts, mathematical research), "it looks like it works" is not enough. Human review of AI-generated code is expensive, slow, and requires rare expertise.

Leanstral offers an alternative: the agent writes the code and produces a formal proof of its correctness. The Lean 4 compiler acts as a binary verifier. Either the proof compiles, or it does not. No gray area.

Lean 4: The Language Behind Leanstral

Lean 4, developed by Leonardo de Moura (formerly Microsoft Research), is both a proof assistant and a functional programming language. It is used by Google DeepMind (AlphaProof, IMO silver medal in 2024), Amazon (Cedar policy verification), and a community of over 10,000 members on Zulip.

The Mathlib library, which formalizes mathematics in Lean, has over 20,000 contributions and received $15 million in funding in 2025.

How Leanstral Works: Architecture and Technical Specifications

A Mixture-of-Experts Model Optimized for Proofs

Leanstral is built on a Sparse Mixture-of-Experts (MoE) architecture with the following specifications:

Specification	Value
Full name	Leanstral-120B-A6B-2603
Total parameters	~119 billion (128 experts)
Active parameters per token	~6.5 billion (4 of 128 experts)
Architecture	Sparse Mixture-of-Experts
Base model family	Mistral Small 4
Context window	256K tokens
Input modalities	Text and images
License	Apache 2.0

The principle is straightforward: each token activates only 4 of the 128 expert modules. This gives Leanstral the knowledge capacity of a 119-billion-parameter model at the inference cost of a 6.5-billion-parameter one. This 18x efficiency ratio is what enables dramatically lower costs compared to the competition.

Native Lean Compiler Integration via MCP

Unlike generalist models that produce text resembling Lean code, Leanstral interacts directly with the Lean 4 compiler via the MCP (Model Context Protocol). In practice, the agent can:

Check types in the Lean compiler
Execute proof tactics and observe results
Analyze error messages
Iteratively refine proofs in a live interactive loop

This native integration is a decisive technical advantage. The model does not "guess" proofs: it builds them in dialogue with the verifier.

Leanstral Benchmarks: Performance Against Claude and Open-Source Models

The FLTEval Benchmark: Realistic Conditions

Mistral AI introduced FLTEval, a new benchmark designed to evaluate proof engineering in real-world repository conditions. It is based on the FLT (Fermat's Last Theorem) project at Imperial College London, led by Professor Kevin Buzzard, with 55 contributors and EPSRC funding through 2029.

Unlike MiniF2F (which targets isolated competition math problems), FLTEval measures the ability to complete formal proofs in a realistic environment with imports, library dependencies, and multi-file proof structures.

Leanstral vs Claude: The Cost-Performance Ratio

Model	Cost per FLTEval run ($)	FLTEval Score
Leanstral pass@1	18	21.9
Leanstral pass@2	36	26.3
Leanstral pass@4	72	29.3
Leanstral pass@8	145	31.0
Leanstral pass@16	290	31.9
Claude Haiku 4.5	184	23.0
Claude Sonnet 4.6	549	23.7
Claude Opus 4.6	1,650	39.6

The numbers speak for themselves:

Leanstral pass@2 ($36) beats Sonnet 4.6 ($549) by 2.6 points, at 1/15th the cost.
Leanstral pass@2 beats Haiku 4.5 ($184) by 3.3 points, at 1/5th the cost.
Leanstral pass@16 ($290) beats Sonnet by 8 points, at half the cost.
Claude Opus 4.6 remains the quality leader (39.6) but costs $1,650, which is 46x more than Leanstral pass@2.

Leanstral vs Open-Source Models: Active Parameter Efficiency

Model	Active Parameters	FLTEval Score (best pass)
GLM5-744B-A40B	40B	~16.6 (plateaus)
Kimi-K2.5-1T-A32B	32B	~20.1 (plateaus)
Leanstral pass@1	6.5B	21.9
Qwen3.5-397B-A17B	17B	25.4 (pass@4)
Leanstral pass@2	6.5B	26.3
Leanstral pass@4	6.5B	29.3

With only 6.5 billion active parameters, Leanstral outperforms models using 5 to 6 times more. Qwen3.5 needs 4 attempts and 17 billion active parameters to reach 25.4. Leanstral surpasses that score on its second attempt.

Cost Comparison: Leanstral vs Alternatives

Solution	Estimated Cost	FLTEval Score	Open Source
Leanstral pass@2	$36	26.3	Yes (Apache 2.0)
Leanstral pass@16	$290	31.9	Yes (Apache 2.0)
Claude Haiku 4.5	$184	23.0	No
Claude Sonnet 4.6	$549	23.7	No
Claude Opus 4.6	$1,650	39.6	No
Self-hosted Leanstral	Hardware: 4x A100/H100	Same	Yes

Concrete Use Cases for Leanstral in Business

Smart Contract Verification and DeFi Security

Bugs in DeFi code have cost billions of dollars in recent years. Formal verification is the gold standard for guaranteeing that a smart contract does exactly what it claims. With Leanstral, the cost of a formal audit drops dramatically: a proof of correctness for $36 instead of hundreds of dollars with proprietary alternatives.

Mission-Critical Software: Aerospace, Finance, Healthcare

In industries where a software bug can cost lives or millions, formal verification is not a luxury but a regulatory requirement. Leanstral enables development teams to specify expected behavior in Lean 4, then automatically generate compliance proofs. The compiler then verifies the proof is valid.

Collaborative Mathematical Research

The FLT project (formalizing Fermat's Last Theorem) and Mathlib illustrate Leanstral's potential to accelerate formalized research. Researchers can delegate routine proofs to the agent and focus on the creative aspects of their work.

Cross-Language Proof Migration

One of the use cases demonstrated by Mistral AI is translating proofs from Rocq (formerly Coq) to Lean 4, preserving semantics and notation. This facilitates migrating academic or industrial projects from one ecosystem to another.

Verifying AI-Generated Code

The most strategic use case: formally verifying that code produced by AI agents (Copilot, Cursor, etc.) is correct. Leanstral embodies the vision of "trustworthy vibe coding" where humans specify what they want and the machine proves compliance.

Three Ways to Access Leanstral Today

1. Mistral Vibe (Zero Setup)

The /leanstall command in the Mistral Vibe CLI (version 2.5.0, released March 16, 2026) automatically configures the Leanstral agent. This is the fastest way to try it out.

2. Free API (Limited Period)

The labs-leanstral-2603 endpoint is available for free for a limited period. Mistral AI wants to collect real-world feedback to improve future versions.

3. Self-Hosting (Open-Source Weights)

Model weights are published under the Apache 2.0 license on Hugging Face (mistralai/Leanstral-120B-A6B-2603). Recommended setup: 4 A100 80GB or H100 GPUs, with vLLM and Flash Attention. Note: the Hugging Face page showed a temporary 404 error at launch.

Leanstral Limitations: What to Know Before Adopting

A Specialized Model, Not a General-Purpose Code Assistant

Leanstral is designed exclusively for Lean 4. It does not replace your general-purpose coding tools (Copilot, Claude Code, Cursor). If you need an assistant for Python, TypeScript, or SQL, this is not the right tool.

Claude Opus 4.6 Still Leads on Raw Quality

With an FLTEval score of 39.6 versus 31.9 for Leanstral pass@16, Opus maintains a significant lead. If your absolute priority is maximum accuracy and budget is not a constraint, Opus remains the best choice. The Hacker News community highlighted this point: a model specifically trained for this task should, in theory, outperform a generalist model.

The Performance Curve May Plateau

Some observers note that Leanstral's performance gains appear to diminish beyond pass@8. The jump from pass@8 (31.0) to pass@16 (31.9) represents only a 0.9-point gain for a doubling in cost.

Infrastructure Requirements for Self-Hosting

Self-hosting requires 4 high-end GPUs (A100 or H100), which represents a significant hardware investment. For teams without this infrastructure, the free API or Mistral Vibe remain the most accessible options.

Should You Use Leanstral for Formal Verification Projects?

Leanstral fills a genuine gap in the ecosystem. Before its release, options for AI-assisted formal verification were limited to expensive proprietary models (Claude Opus) or generalist models not optimized for Lean 4.

Leanstral sits at the intersection of three qualities: open source (Apache 2.0), specifically trained for proof engineering, and cost-effective. No other model currently occupies that exact space.

For CTOs and engineering leaders evaluating formal verification as part of their software quality strategy, Leanstral represents an accessible entry point. For researchers in formalized mathematics, it is an accelerator. And for the Lean 4 ecosystem as a whole, it is a strong signal that specialized AI for formal proofs is becoming a practical reality.

The real question is no longer "is AI-assisted formal verification possible?" but "how production-ready is it?" With Leanstral, Mistral AI offers a first answer.

Discover Emelia, your all-in-one prospecting tool.

Launch my campaign

Clear, transparent prices without hidden fees

No commitment, prices to help you increase your prospecting.

Start

€37

/month

Unlimited email sending

Connect 1 LinkedIn Accounts

Unlimited LinkedIn Actions

Email Warmup Included

Unlimited Scraping

Unlimited contacts

Grow

Best seller

€97

/month

Unlimited email sending

Up to 5 LinkedIn Accounts

Unlimited LinkedIn Actions

Unlimited Warmup

Unlimited contacts

1 CRM Integration

Scale

€297

/month

Unlimited email sending

Up to 20 LinkedIn Accounts

Unlimited LinkedIn Actions

Unlimited Warmup

Unlimited contacts

Multi CRM Integrations

Unlimited API Calls

Credits(optional)

You don't need credits if you just want to send emails or do actions on LinkedIn

May use it for :

Find Emails

AI Action

Phone Finder

Verify Emails

€19per month

1,000

1,000 Emails found

1,000 AI Actions

20 Number

4,000 Verify

5,000

10,000

50,000

100,000

1,000 Emails found

1,000 AI Actions

20 Number

4,000 Verify

€19per month

Discover other articles that might interest you !

See all articles

Software

Published on Jun 24, 2025

Kaspr vs Waalaxy: The Champions Redefining B2B Prospecting

Mathieu Co-founder

B2B Prospecting

Published on Jun 30, 2025

Zopto vs Waalaxy: Comparison of LinkedIn automation tools

Niels Co-founder

Software

Published on Jul 6, 2025

Kaspr vs RocketReach: The Ultimate Comparison of B2B Prospecting Tools for 2026

Niels Co-founder

B2B Prospecting

Published on Jun 26, 2025

Clearbit vs Cognism: Common Features and Differences

Niels Co-founder

B2B Prospecting

Published on Jul 2, 2025

Overloop vs Waalaxy vs Emelia: Which Tool Will Boost your B2B Prospecting?

Niels Co-founder

Software

Published on Jun 30, 2025

Salesflow vs Waalaxy: The Ultimate Battle of 2026

Niels Co-founder

Made with ❤ for Growth Marketers by Growth Marketers

Find and contact your future customers

Leanstral by Mistral AI: The AI That Proves Your Code Is Correct

What Is Leanstral and Why Formal Verification Matters

The Problem with "Vibe Coding"

Lean 4: The Language Behind Leanstral

How Leanstral Works: Architecture and Technical Specifications

A Mixture-of-Experts Model Optimized for Proofs

Native Lean Compiler Integration via MCP

Leanstral Benchmarks: Performance Against Claude and Open-Source Models

The FLTEval Benchmark: Realistic Conditions

Leanstral vs Claude: The Cost-Performance Ratio

Leanstral vs Open-Source Models: Active Parameter Efficiency

Cost Comparison: Leanstral vs Alternatives

Concrete Use Cases for Leanstral in Business

Smart Contract Verification and DeFi Security

Mission-Critical Software: Aerospace, Finance, Healthcare

Collaborative Mathematical Research

Cross-Language Proof Migration

Verifying AI-Generated Code

Three Ways to Access Leanstral Today

1. Mistral Vibe (Zero Setup)

2. Free API (Limited Period)

3. Self-Hosting (Open-Source Weights)

Leanstral Limitations: What to Know Before Adopting

A Specialized Model, Not a General-Purpose Code Assistant

Claude Opus 4.6 Still Leads on Raw Quality

The Performance Curve May Plateau

Infrastructure Requirements for Self-Hosting

Should You Use Leanstral for Formal Verification Projects?

Discover Emelia, your all-in-one prospecting tool.

Clear, transparent prices without hidden fees

Start

Grow

Scale

Credits(optional)

Discover other articles that might interest you !

Kaspr vs Waalaxy: The Champions Redefining B2B Prospecting

Zopto vs Waalaxy: Comparison of LinkedIn automation tools

Kaspr vs RocketReach: The Ultimate Comparison of B2B Prospecting Tools for 2026

Clearbit vs Cognism: Common Features and Differences

Overloop vs Waalaxy vs Emelia: Which Tool Will Boost your B2B Prospecting?

Salesflow vs Waalaxy: The Ultimate Battle of 2026

Useful links

About

Features

Follow us

Partners