At Emelia, we embed artificial intelligence at the core of our B2B prospecting platform: AI-assisted cold email writing, data enrichment, lead scoring. Every LLM call passes through critical infrastructure that must be fast, reliable, and optimizable. That is why we keep a close eye on the LLMOps ecosystem, and TensorZero immediately caught our attention. We have started evaluating it internally on parallel projects, and here is our complete analysis.
TensorZero is an open-source LLMOps stack built primarily in Rust that unifies five essential building blocks of the LLM application lifecycle: the gateway, observability, optimization, evaluation, and experimentation. The stated ambition is clear: transform LLM applications from simple API wrappers into "defensible AI products."
The project was founded in January 2024 by Gabriel Bianconi (CEO, former CPO at Ondo Finance, Stanford BS & MS in Computer Science) and Viraj Mehta (CTO, CMU PhD in reinforcement learning, specialized in nuclear fusion and LLMs). The open-source release came in September 2024, and the repository immediately became the #1 trending repository of the week on GitHub, jumping from 3,000 to over 9,700 stars in a matter of months.
Today, TensorZero has accumulated 11,100 stars on GitHub, 769 forks, 124 contributors, and has raised $7.3 million in seed funding from FirstMark Capital, Bessemer Venture Partners, Bedrock, and DRW.
TensorZero's origin story is unusual. CTO Viraj Mehta spent years applying reinforcement learning (RL) to nuclear fusion reactors. In that domain, every data point costs roughly $30,000 for 5 seconds of collection. This extreme constraint forged an obsession: extracting maximum value from every available data point.
When Mehta and Bianconi started working together, they reconceptualized LLM applications as Partially Observable Markov Decision Processes (POMDPs). This theoretical framework, borrowed from RL research, models an LLM application as an agent making decisions in an uncertain environment, collecting feedback, and improving continuously.
| POMDP Element | LLM Equivalent |
|---|---|
| Agent (policy π) | The LLM functions themselves |
| State Space (S) | Unobserved factors (user intent, external systems) |
| Action Space (A) | LLM function outputs (text, tool calls) |
| Observation Space (O) | LLM function inputs (history, variables) |
| Transition (T) | Non-LLM application code + real-world changes |
| Reward (R) | Business KPIs (conversion, satisfaction, accuracy) |
This framing is not just academic. It structures TensorZero's entire architecture and enables what the team calls the data flywheel: a self-reinforcing learning loop.
The data flywheel is the central concept that sets TensorZero apart from competitors. It is a continuous cycle in four stages:
1. Collect. Every production inference is recorded in a structured format in ClickHouse. TensorZero does not store raw prompts: it records input variables, outputs, and feedback. This makes the stored data provider-agnostic, so the same dataset can be reused to fine-tune an OpenAI, Anthropic, or open-source model.
2. Optimize. Collected data is used to generate policy variants: prompt updates, supervised fine-tuning (SFT), preference fine-tuning (DPO), reinforcement learning from human feedback (RLHF), or inference-time optimization (Dynamic In-Context Learning, Best-of-N, Mixture-of-N).
3. Evaluate. Offline evaluations (backtests) on historical data validate each variant before deployment. TensorZero offers static evaluations (heuristics + LLM judges) and dynamic evaluations (end-to-end workflows).
4. Loop. Production traffic automatically generates new variants, evaluates them against KPIs, and closes the loop. Engineers focus on high-level decisions: what data to feed in, what feedback signals to use, what behaviors to incentivize.
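To make the collect-and-loop stages concrete, here is a minimal sketch of the payloads an application might send to a self-hosted gateway. The gateway address, function name, and metric name are placeholder assumptions, and the request shapes are simplified; check the field names against the current TensorZero HTTP API documentation before relying on them.

```python
# Illustrative payload builders for a self-hosted TensorZero gateway.
# All names below (address, function, metric) are assumptions for this sketch.
GATEWAY_URL = "http://localhost:3000"  # assumed gateway address

def build_inference_payload(function_name: str, user_message: str) -> dict:
    # Stage 1 (Collect): POST this to f"{GATEWAY_URL}/inference". The gateway
    # stores structured inputs and outputs in ClickHouse, not provider-specific
    # raw prompts, so the data stays reusable across providers.
    return {
        "function_name": function_name,
        "input": {"messages": [{"role": "user", "content": user_message}]},
    }

def build_feedback_payload(inference_id: str, metric_name: str, value) -> dict:
    # Stage 4 (Loop): POST this to f"{GATEWAY_URL}/feedback" once the business
    # outcome (e.g. the prospect replied to the email) is known.
    return {
        "inference_id": inference_id,
        "metric_name": metric_name,
        "value": value,
    }

payload = build_inference_payload("draft_cold_email", "Write a follow-up for ACME.")
```

Tying each inference to a later feedback event via its `inference_id` is what turns ordinary production traffic into labeled training data.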
The gateway is the component through which all LLM calls pass. Built in Rust, it delivers spectacular performance: under 1 ms P99 latency, even at 10,000 requests per second. For comparison, LiteLLM (written in Python) fails entirely at 1,000 QPS and already adds 25 to 100 times more latency than TensorZero at just 100 QPS.
| Metric | LiteLLM (100 QPS) | LiteLLM (500 QPS) | LiteLLM (1,000 QPS) | TensorZero (10,000 QPS) |
|---|---|---|---|---|
| Mean latency | 4.91 ms | 7.45 ms | Failure | 0.37 ms |
| P50 | 4.83 ms | 5.81 ms | Failure | 0.35 ms |
| P90 | 5.26 ms | 10.02 ms | Failure | 0.50 ms |
| P99 | 5.87 ms | 39.69 ms | Failure | 0.94 ms |
These benchmarks were run on the same AWS c7i.xlarge instance (4 vCPUs, 8 GB RAM), according to the official TensorZero documentation.
The gateway supports streaming, tool use, structured generation (JSON mode), batch inference, multimodal inputs, caching, automatic retries, fallbacks, load balancing, and granular timeouts. It is compatible with the OpenAI SDK, TensorZero's Python client, or a raw HTTP API.
TensorZero natively supports nearly twenty major providers: OpenAI, Anthropic, AWS Bedrock, AWS SageMaker, Azure OpenAI, DeepSeek, Fireworks, GCP Vertex AI (Anthropic and Gemini), Google AI Studio, Groq, Hyperbolic, Mistral, OpenRouter, SGLang, TGI, Together AI, vLLM, xAI (Grok), and any OpenAI-compatible API (including Ollama for local models).
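To illustrate how multi-provider routing and fallbacks are expressed, here is a hypothetical `tensorzero.toml` fragment. Every name in it (the model alias, deployment ID, endpoint) is an invented placeholder, and the exact provider fields should be verified against the configuration reference:

```toml
# Hypothetical fragment: one logical model served by two providers,
# tried in the order given by the routing list (fallback on failure).
[models.fast_model]
routing = ["openai", "azure"]

[models.fast_model.providers.openai]
type = "openai"
model_name = "gpt-4o-mini"

[models.fast_model.providers.azure]
type = "azure"
deployment_id = "gpt-4o-mini"
endpoint = "https://example.openai.azure.com"
```

Because the application references the logical `fast_model`, swapping or reordering providers is a configuration change, not a code change.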
All inferences and feedback are stored in your own ClickHouse database, a columnar database optimized for analytics at scale. TensorZero provides an open-source UI for monitoring, with the ability to drill down to individual inferences or observe aggregate patterns. OpenTelemetry (OTLP) export to third-party tools is also supported.
A crucial capability: you can replay historical inferences with new prompts or models for counterfactual analysis. This means you can test a new prompt against six months of past data without sending a single request to an LLM.
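The idea behind replay can be sketched without any TensorZero-specific API: because inputs are stored as variables rather than as rendered prompts, a candidate template can be re-rendered over historical rows entirely offline. The snippet below is a simplified illustration of the principle, not TensorZero's actual replay interface.

```python
def replay_prompt(historical_rows: list[dict], new_template: str) -> list[str]:
    # Re-render a candidate prompt template against stored input variables.
    # No LLM call is needed to inspect what *would* have been sent.
    return [new_template.format(**row["input_variables"]) for row in historical_rows]

# Hypothetical rows, shaped like the structured records described above.
rows = [
    {"input_variables": {"name": "Ada", "product": "Emelia"}},
    {"input_variables": {"name": "Alan", "product": "Emelia"}},
]
rendered = replay_prompt(rows, "Write a short intro email to {name} about {product}.")
```

In practice the re-rendered prompts would then be scored offline (by heuristics or an LLM judge) before the new template is promoted to production.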
TensorZero distinguishes itself through the depth of its optimization capabilities:
Model optimization: supervised fine-tuning (SFT), preference fine-tuning (DPO), reinforcement learning from human feedback (RLHF).
Prompt optimization: MIPROv2, DSPy integration, GEPA. These methods automatically generate optimized prompt variants from your production data.
Inference-time optimization: Dynamic In-Context Learning (DICL), which dynamically selects the best examples to inject into context; Best-of-N sampling, which generates N responses and selects the best; Mixture-of-N, which combines outputs from multiple models; and Chain-of-Thought (CoT).
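Best-of-N is simple enough to sketch generically. Here `generate` and `score` are stand-ins for an LLM call and a judge (in TensorZero, judges can themselves be LLM functions); this is an illustration of the technique, not the library's implementation.

```python
import random

def best_of_n(generate, score, prompt: str, n: int = 4, seed: int = 0) -> str:
    # Draw n candidate completions, keep the one the judge scores highest.
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

# Stand-ins for demonstration: a "model" that emits drafts of varying length,
# and a judge that rewards brevity.
fake_generate = lambda prompt, rng: prompt + " " + "very " * rng.randint(0, 5) + "short reply"
brevity_judge = lambda text: -len(text)

best = best_of_n(fake_generate, brevity_judge, "Subject:", n=8)
```

The trade-off is explicit: N times the inference cost buys a higher chance that at least one candidate satisfies the judge.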
All optimizations rely on structured production data tied to real KPIs. This is not lab optimization: it is continuous improvement based on real-world usage.
TensorZero offers two types of evaluations:
Static evaluations: heuristics and LLM judges. These are the equivalent of unit tests for your LLM functions. LLM judges are themselves TensorZero functions, meaning they can be optimized using the same mechanisms.
Dynamic evaluations: end-to-end workflows, the equivalent of integration tests. They verify that the entire chain works correctly.
Experimentation is native in TensorZero. You can run rigorous A/B tests (randomized controlled trials) across models, prompts, providers, and hyperparameters. The system automatically manages randomization in complex multi-turn workflows.
Going further, TensorZero supports adaptive experimentation with multi-armed bandits, which progressively allocate more traffic to the best-performing variants.
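A multi-armed bandit can be sketched in a few lines. This epsilon-greedy version is a generic illustration of the idea (TensorZero's own allocation strategy may differ): a small fraction of traffic keeps exploring, while the rest goes to the variant with the best observed mean reward.

```python
import random

class EpsilonGreedyRouter:
    """Adaptive traffic allocation sketch: explore random variants with
    probability epsilon, otherwise exploit the best observed mean reward."""

    def __init__(self, variants, epsilon: float = 0.1, seed: int = 0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.pulls = {v: 0 for v in variants}   # times each variant was served
        self.total = {v: 0.0 for v in variants} # cumulative reward per variant

    def choose(self) -> str:
        untried = [v for v in self.pulls if self.pulls[v] == 0]
        if untried:  # serve every variant at least once before exploiting
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.pulls, key=lambda v: self.total[v] / self.pulls[v])

    def record(self, variant: str, reward: float) -> None:
        self.pulls[variant] += 1
        self.total[variant] += reward
```

Over time, the better-performing variant absorbs most of the traffic while the exploration budget keeps the comparison honest.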
Launched in preview in version 2026.1.7 (February 2026), Autopilot is described as "Claude Code for LLM engineering." It is an automated AI engineer that operates on top of the TensorZero stack.
Autopilot can:
- Analyze millions of inferences to surface error patterns and optimization opportunities
- Recommend models and inference strategies to improve quality, cost, and latency
- Generate and refine prompts based on human feedback, metrics, and evaluations
- Drive optimization workflows including fine-tuning, RL, and knowledge distillation
- Set up evaluations, prevent regressions, and align LLM judges to real-world scenarios
- Run A/B tests to validate changes and identify winners
The TensorZero team claims that Autopilot has already produced "substantial performance improvements in use cases ranging from data extraction to customer support agents."
Autopilot is currently invite-only (waitlist). It represents the future monetization layer on top of the fully free open-source stack.
The choice of Rust is deliberate. The team includes Aaron Hill, a Rust compiler maintainer, which speaks to the depth of technical expertise.
Here is why Rust is a decisive advantage for an LLM gateway:
The gateway is a hot path. Every LLM call flows through this layer. Even small latency additions multiply across millions of daily requests. At 10,000 QPS, a Python gateway adds 50 to 100 ms of pure overhead per request.
Memory safety without garbage collection pauses. Rust's ownership model eliminates both GC pauses (which cause latency spikes) and memory bugs (which cause production crashes).
Fearless concurrency. Rust's type system catches data races at compile time, not in production. For a concurrent gateway routing requests across dozens of LLM providers, this eliminates an entire category of bugs.
Operational predictability. Rust's performance is deterministic. Python's GIL (Global Interpreter Lock) creates throughput ceilings that only surface under production load.
How does TensorZero stack up against the most popular alternatives?
| Feature | TensorZero | LangSmith | Langfuse | LiteLLM |
|---|---|---|---|---|
| Open source | 100% (Apache 2.0) | Partial (LangSmith is commercial) | Partial (paid features) | Partial (enterprise tier) |
| Self-hosted | Yes | Partially | Yes | Yes |
| LLM Gateway | Yes | Via LangChain | No | Yes |
| Observability | Full OSS UI | Paid | Full UI (partial OSS) | Third-party integrations |
| Evaluations | Built-in | Paid | Built-in | No |
| A/B Testing | Native | No | No | No |
| Fine-tuning | Built-in (SFT, DPO, RLHF) | No | No | No |
| Inference-time opt | Yes (DICL, BoN, MoN) | No | No | No |
| Performance | < 1 ms P99 at 10K QPS | Slow (Python) | N/A | Fails at 1K QPS |
| Pricing | 100% free | Paid | Freemium | Freemium |
TensorZero cleanly separates application engineering from LLM optimization, where LangChain blends the two. LangSmith requires a separate paid subscription; TensorZero's observability is entirely free and open source. TensorZero is built for production; LangChain excels at rapid prototyping. Critically, TensorZero is language-agnostic (HTTP API), while LangChain only supports Python and JavaScript, according to the TensorZero comparison documentation.
Langfuse does not offer an LLM gateway, meaning you need to combine Langfuse with another tool for that function. Langfuse has a more mature observability UI and a more advanced playground, but TensorZero is significantly stronger on optimization. The two tools can actually be combined, per the official comparison page.
The performance gap is the most striking point. TensorZero handles 10,000 QPS with under 1 ms at P99; LiteLLM fails at 1,000 QPS. Beyond the gateway, TensorZero adds evaluations, A/B testing, and optimization that none of the alternatives offer, according to the benchmark documentation. LiteLLM does support over 100 models compared to about twenty for TensorZero (extensible via any OpenAI-compatible API).
A case study published by TensorZero describes deployment at a major European bank. Engineers were required to write detailed changelogs for every GitLab merge request, a task most skipped. TensorZero was integrated into GitLab CI/CD pipelines using Dynamic In-Context Learning (DICL): every human correction to an AI-generated changelog automatically fed future requests. The entire setup was deployed fully on-premise with TensorZero + Ollama, with no data leaving the bank's infrastructure.
On a named entity recognition task, a GPT-4o Mini model optimized with TensorZero outperformed unoptimized GPT-4o, at a fraction of the cost and latency, using only a small training dataset. This is evidence that systematic optimization can compensate for raw model size.
TensorZero demonstrated the ability to reverse-engineer the LLM client of Cursor (valued at $9.9 billion) to observe the prompts being used, A/B test different models, and optimize its own prompts and models. This demonstration went viral with over 11,700 views.
For a tool like Emelia, TensorZero's potential use cases are numerous:
- Optimizing AI-generated writing: every AI-generated email can be evaluated via user feedback (open rate, reply rate), feeding the data flywheel to continuously improve quality
- A/B testing models: testing GPT-4o vs Claude 3.7 vs Mistral on the same writing task, with real production metrics
- Cost reduction: using fine-tuning to achieve GPT-4o performance with a lighter, cheaper model
- Minimal latency: the Rust gateway ensures that LLM infrastructure is never the bottleneck
TensorZero is distributed under the Apache 2.0 license. No features are paywalled. Self-hosting is free: you bring your own LLM API keys, and TensorZero adds zero cost on top. Even enterprise support is free: a simple email to hello@tensorzero.com gets you a dedicated Slack or Teams channel.
Gabriel Bianconi explained to VentureBeat: "We realized very early on that we needed to make this open source, to give enterprises the confidence to do this." The open-source strategy directly addresses the vendor lock-in fear that enterprises feel about their sensitive AI data.
Future monetization will come from a managed service (Autopilot) that will include GPU infrastructure for fine-tuning, automated experiment management, and proactive optimization suggestions. FirstMark's Matt Turck summarized the situation in a tweet: "Been thinking about feedback loops in AI forever and those guys are the real deal."
The team, based in Brooklyn, New York, packs a remarkable depth of expertise:
| Name | Role | Background |
|---|---|---|
| Gabriel Bianconi | CEO | CPO at Ondo Finance (DeFi, $1B+ AUM); Stanford BS & MS |
| Viraj Mehta | CTO | CMU PhD (RL for nuclear fusion + LLMs); Stanford BS & MS |
| Aaron Hill | Engineer | Rust compiler maintainer; AWS, Svix |
| Alan Mishler | Researcher | VP at J.P. Morgan AI Research; CMU PhD; 1,300+ citations |
| Andrew Jesson | Researcher | Columbia postdoc, Oxford PhD (LLMs); 4,000+ citations |
| Antoine Toussaint | Engineer | Staff SWE (Shopify, Illumio); ex-quant; Princeton PhD |
| Michelle Hui | ML/Product | Wing/Alphabet, UN; Cornell BS & MS |
| Shuyang Li | Engineer | Staff SWE at Google (LLM infra, search); Palantir |
| Simeon Lee | Design | Head of Design at Merge; design engineer AI/devtools |
Let us be honest about the limitations:
Learning curve. TensorZero uses a Configuration-as-Code (GitOps) approach for prompt management. If your team is not comfortable with TOML configuration files and Git workflows, onboarding will be slower than with a GUI-based tool like Langfuse or LangSmith.
Less mature observability UI. TensorZero's user interface is functional but less polished than Langfuse or LangSmith. If your priority is an elegant dashboard for non-technical stakeholders, other tools are better suited.
No dynamic routing by latency or cost. LiteLLM offers dynamic routing based on provider latency or cost. TensorZero only supports static routing for now.
Limited native provider count. With around 20 native providers versus over 100 for LiteLLM, TensorZero covers the major players but not the long tail. Any OpenAI-compatible API can be added, however.
No native SSO/access control. For large organizations, the lack of built-in SSO requires adding Nginx or OAuth2 Proxy, which complicates deployment.
Autopilot still in preview. The most promising feature is invite-only and not yet stabilized.
TensorZero is probably not the right fit for:

- Teams just getting started with LLMs who want a quick prototype (LangChain is better suited)
- Organizations that need a full GUI for non-technical users
- Low-volume projects that do not justify the ClickHouse infrastructure
- Teams without a DevOps/GitOps culture

Conversely, TensorZero is a strong fit for:

- Teams operating LLM applications in production at medium to large scale
- Organizations that want to continuously optimize models with production data
- Companies with strict performance requirements (latency, throughput)
- Teams with data sovereignty constraints (self-hosted, on-premise)
- AI startups looking to build a competitive moat through the learning loop
TensorZero is the most ambitious project in the open-source LLMOps ecosystem. Where most tools focus on a single aspect (observability for Langfuse, gateway for LiteLLM, prototyping for LangChain), TensorZero aims for complete integration of the LLM lifecycle in a single coherent stack.
The technical bet is bold: building in Rust for unmatched performance, modeling LLM applications as POMDPs to maximize learning, and making everything 100% free and open source. With 11,100 stars, $7.3 million in funding, and a team that includes a Rust compiler maintainer, a J.P. Morgan AI Research VP, and Staff engineers from Google and Shopify, the project has the means to match its ambitions.
As the LLM gateway guide from getmaxim.ai noted: "TensorZero targets teams with strong DevOps cultures that treat AI infrastructure with the same rigor as traditional backend systems."
For teams building serious LLM applications, the question is no longer whether you need an LLMOps stack, but which one to choose. TensorZero makes a compelling case for being that choice.