Cloudflare /crawl: One API Call to Crawl an Entire Website

Niels, Co-founder
Published on March 11, 2026

At Emelia, our B2B prospecting tool, and Bridgers, our digital agency specializing in AI solutions, we build data pipelines that feed AI models every day. Web content extraction, prospect enrichment, automated competitive intelligence: web crawling sits at the core of our workflows. When Cloudflare drops an endpoint that can ingest an entire website in a single API call, it deserves a deep dive.

On March 10, 2026, Cloudflare launched /crawl, a new endpoint built into its Browser Rendering service. The announcement tweet from @CloudflareDev blew past 2 million impressions, 7,800 likes, and 8,600 bookmarks within 24 hours. The pitch is brutally simple: "One API call and an entire site crawled." No scripts. No browser management. Just the content in HTML, Markdown, or JSON.

How Does Cloudflare's /crawl API Work?

The system uses an asynchronous two-step process.

Step 1: Start the crawl. Send a POST request with a starting URL. The API immediately returns a job ID.

Step 2: Fetch results. Poll the API with that job ID using GET requests. Results stream in as pages are processed, with cursor-based pagination for large crawls.
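The two steps above can be sketched in Python. Only the endpoint paths shown in Cloudflare's curl examples are used; the cursor query-parameter name is an assumption for illustration, and the actual HTTP calls are left to your client of choice:

```python
API_BASE = "https://api.cloudflare.com/client/v4/accounts"


def start_crawl_request(account_id: str, url: str, limit: int = 10, formats=None):
    """Build the POST endpoint and JSON body that start a crawl job."""
    body = {"url": url, "limit": limit, "formats": formats or ["markdown"]}
    return f"{API_BASE}/{account_id}/browser-rendering/crawl", body


def poll_url(account_id: str, job_id: str, cursor: str = None) -> str:
    """Build the GET endpoint used to poll a job. The 'cursor' query
    parameter name is a guess standing in for whatever pagination token
    the API actually returns."""
    url = f"{API_BASE}/{account_id}/browser-rendering/crawl/{job_id}"
    return f"{url}?cursor={cursor}" if cursor else url
```

Poll until the job reports completion, carrying the pagination cursor forward between requests on large crawls.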

The crawler automatically discovers URLs from three sources: the starting URL, the site's sitemap, and links found on each page. It respects robots.txt by default and identifies itself as a bot, a point Kathy Liao, Product Manager at Cloudflare, emphasized repeatedly when facing community pushback.

Key Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| url | String | Starting URL (required) |
| limit | Number | Maximum pages to crawl (default: 10, max: 100,000) |
| depth | Number | Maximum crawl depth (max: 100,000) |
| formats | Array | Output formats: html, markdown, json |
| render | Boolean | Execute JavaScript (default: true) |
| source | String | URL discovery: all, sitemaps, links |
| maxAge | Number | Cache duration in seconds (max: 7 days) |
| includePatterns | Array | Wildcard patterns to filter included URLs |
| excludePatterns | Array | Wildcard patterns to exclude URLs |
| modifiedSince | Number | Unix timestamp; only crawl pages modified after this date |

The render: false option is a standout feature: it disables the headless browser and performs a simple HTTP fetch instead, making it significantly faster and cheaper. During the beta period, this mode is free.

Guide: Crawl a Website in One Line of Code

Here is how to launch a full crawl with curl:

```bash
# Step 1: Start the crawl
curl -X POST \
  "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "formats": ["markdown", "html"],
    "render": true
  }'

# Response: { "success": true, "result": "job-id-xxx" }

# Step 2: Fetch results
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/job-id-xxx" \
  -H "Authorization: Bearer {api_token}"
```

Each page in the response includes the URL, title, status, and content in your requested formats. For dynamic sites built with React, Vue, or Angular, the render: true mode launches a real headless Chrome instance that executes JavaScript before extracting content.
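A typical post-processing step is to keep only the pages that crawled successfully. A minimal sketch, assuming per-page fields named url, status, and a content map keyed by format (the exact response schema may differ):

```python
def successful_markdown(pages: list) -> dict:
    """Map URL -> markdown for pages that crawled successfully.
    Field names are assumptions based on this article, not the
    authoritative response schema."""
    return {
        p["url"]: p["content"]["markdown"]
        for p in pages
        if p.get("status") == 200 and "markdown" in p.get("content", {})
    }


pages = [
    {"url": "https://example.com/", "title": "Home", "status": 200,
     "content": {"markdown": "# Home"}},
    {"url": "https://example.com/missing", "status": 404, "content": {}},
]
# Only the 200 page survives the filter.
```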

For structured JSON extraction, you can provide a prompt or a schema:

```json
{
  "url": "https://shop.example.com",
  "formats": ["json"],
  "jsonOptions": {
    "prompt": "Extract the product name, price, and description",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "product",
        "schema": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "price": { "type": "number" },
            "description": { "type": "string" }
          }
        }
      }
    }
  }
}
```

This structured extraction uses Workers AI under the hood, which incurs additional costs.

Cloudflare Browser Rendering: Pricing and Limits

One of Cloudflare's strongest selling points is the price. Here is the full breakdown:

Free Plan (Workers Free)

| Feature | Limit |
| --- | --- |
| Browser time | 10 minutes per day |
| /crawl jobs per day | 5 |
| Max pages per crawl | 100 |
| REST API requests | 6 per minute |
| Concurrent browsers | 3 |

Paid Plan (Workers Paid, $5/month)

| Feature | Limit |
| --- | --- |
| Browser hours included | 10 hours/month |
| Extra browser time | $0.09/hour |
| REST API requests | 600 per minute |
| Concurrent browsers | 30 |
| Max pages per crawl | 100,000 |

The render: false mode (no JavaScript execution) is free during the beta and will later follow standard Workers pricing. Crawl jobs have a maximum runtime of 7 days, and results remain available for 14 days.

To put this in perspective: with the $5/month paid plan, you get 10 hours of browser rendering time included. If a 100-page crawl takes roughly 5 minutes of browser time, you can crawl approximately 12,000 pages per month for five dollars. Compare that to Firecrawl's Standard plan at $47/month for 100,000 pages, and the economics become compelling at scale.
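The arithmetic behind that estimate, as a quick sanity check:

```python
# Back-of-envelope check of the numbers above. The crawl speed
# (5 browser-minutes per 100 pages) is the article's rough assumption.
included_hours = 10            # browser hours in the $5/mo Workers Paid plan
minutes_per_100_pages = 5      # assumed browser time for a 100-page crawl

crawls_per_month = included_hours * 60 // minutes_per_100_pages
pages_per_month = crawls_per_month * 100
cost_per_1k_pages = 5 / (pages_per_month / 1000)  # roughly $0.42
```

At roughly $0.42 per thousand pages, the time-based billing undercuts per-page pricing by a wide margin, provided your pages render quickly.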

Cloudflare /crawl vs Firecrawl vs Crawl4AI: Full Comparison

The web crawling market for AI applications is heating up fast. Here is how Cloudflare stacks up against the competition.

| Feature | Cloudflare /crawl | Firecrawl | Crawl4AI | Jina Reader |
| --- | --- | --- | --- | --- |
| Entry price | Free ($5/mo for paid plan) | Free (500 pages), then $19/mo | Free (open source) | Free (20 req/min without key) |
| Volume pricing | $0.09/browser hour | $47/mo (100k pages), $599/mo (1M pages) | Free (self-hosted) | Token-based (from $0.01/1M tokens) |
| Multi-page crawl | Yes (up to 100,000 pages) | Yes | Yes | No (single page) |
| Crawl depth | Up to 100,000 levels | Configurable | Configurable | N/A |
| Output formats | HTML, Markdown, JSON | HTML, Markdown, JSON, Screenshot | HTML, Markdown, JSON | Markdown, HTML |
| JavaScript rendering | Yes (headless Chrome) | Yes | Yes (Playwright/Chromium) | Yes (Puppeteer) |
| Structured AI extraction | Yes (Workers AI) | Yes (LLM extract) | Yes (LLM strategies) | No |
| Respects robots.txt | Yes (by default) | Optional | Configurable | Yes |
| Concurrent requests | 30 (paid plan) | 5 to 150 depending on plan | Unlimited (self-hosted) | 2 to 500 depending on plan |
| Infrastructure | Serverless (Cloudflare edge) | Cloud SaaS | Self-hosted or Docker | Cloud SaaS |
| Open source | No | No | Yes (Apache 2.0) | Partially |

When to Choose Cloudflare /crawl

If you are already in the Cloudflare ecosystem (Workers, R2, KV), integration is seamless. The cost-per-page is unbeatable for high-volume crawls thanks to time-based billing instead of per-page pricing. The render: false mode, free during beta, is perfect for static sites.

When to Choose Firecrawl

Firecrawl excels in developer experience with polished SDKs and AI-oriented features (LLM extraction, screenshots, site mapping). If you need a plug-and-play tool and do not want to manage infrastructure, it is a strong choice. However, per-page costs add up quickly at scale.

When to Choose Crawl4AI

With over 61,000 GitHub stars, Crawl4AI is the pick for teams that want total control. Open source, self-hosted, no rate limits imposed. Ideal for AI training pipelines or research projects on tight budgets.

When to Choose Jina Reader

Jina Reader is perfect for single-page conversion to LLM-friendly formats. Prepend https://r.jina.ai/ to any URL and you get clean Markdown. No native multi-page crawl, but unmatched simplicity for basic use cases.

Extract Website Data for AI with Cloudflare

Cloudflare's timing is not accidental. Demand for structured web data to feed AI models is exploding.

The crawl-to-refer ratio (how many times an AI bot visits a site versus how many visitors it sends back) has reached staggering levels: 1,700:1 for OpenAI, 73,000:1 for Anthropic according to Cloudflare's own data. AI bots are consuming web content at an industrial scale, and developers need reliable tools to do the same.

RAG Pipelines (Retrieval-Augmented Generation)

The most obvious use case is building knowledge bases for RAG systems. With /crawl, you can ingest an entire product documentation site in Markdown, chunk it, vectorize it, and inject it into an index so your AI agents answer with precision.
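As a sketch of the chunking step, here is a naive heading-aware splitter for the Markdown output; production pipelines would chunk by token count and add overlap between chunks:

```python
import re


def chunk_markdown(md: str, max_chars: int = 800) -> list:
    """Naive chunker: split on top-level and second-level headings,
    then cap each section at max_chars characters."""
    sections = re.split(r"\n(?=#{1,2} )", md)
    chunks = []
    for sec in sections:
        for i in range(0, len(sec), max_chars):
            chunks.append(sec[i:i + max_chars].strip())
    return [c for c in chunks if c]
```

Each chunk then gets embedded and written to your vector index alongside its source URL.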

Automated Competitive Intelligence

Periodically crawl competitor websites to detect price changes, new products, or positioning shifts. The modifiedSince parameter lets you fetch only pages modified since your last crawl, enabling efficient differential crawls.

Large-Scale SEO Auditing

Extract all pages from a site to analyze title tags, meta descriptions, heading structure, internal links, and 404 errors. The JSON format with structured AI extraction delivers directly actionable data.
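A toy audit pass over the crawled HTML might look like this (regex-based for brevity; a real audit should use a proper HTML parser):

```python
import re


def audit_page(html: str) -> dict:
    """Toy SEO checks on a crawled page: title presence and length,
    and the number of h1 elements."""
    title = re.search(r"<title>(.*?)</title>", html, re.S | re.I)
    h1s = re.findall(r"<h1[^>]*>", html, re.I)
    return {
        "title": title.group(1).strip() if title else None,
        "title_too_long": bool(title) and len(title.group(1)) > 60,
        "h1_count": len(h1s),
    }
```

Run it over every page in the crawl results and you have the skeleton of a site-wide audit report.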

Real-World Use Cases for Cloudflare /crawl

Beyond theoretical scenarios, here are concrete use cases we are already seeing:

Content migration. Switching CMS platforms? Crawl the old site in Markdown, clean up the content, and import it into the new system. No more manual exports or unreliable plugins.

Compliance monitoring. Legal teams can automatically track legal notices, terms of service, and privacy policies across a portfolio of websites.

Training dataset construction. Machine Learning teams can build text corpora from public sources while respecting robots.txt, for fine-tuning specialized models.

Editorial content analysis. Marketing teams can analyze competitor content strategies: What topics do they cover? How frequently do they publish? What keywords are they targeting?

Knowledge base generation. Customer support teams can crawl their own documentation to build searchable knowledge bases. Feed the Markdown output into a vector database, connect it to a chatbot, and your support agents (human or AI) get instant access to every page of your docs.

Price monitoring at scale. E-commerce teams can track pricing across dozens of competitor sites. Use the JSON format with a prompt like "Extract product name and price" and get structured data ready for analysis, without writing custom parsers for each site.

The Cloudflare Irony: Selling the Lock and the Lockpick

The announcement sparked passionate reactions across the developer community: the company that built its reputation on anti-bot protection is now selling a crawling tool, and SRE engineers were quick to call out the contradiction.

A viral tweet from @TukiFromKL (496,000 impressions, 3,700 likes) called it the "biggest betrayal in tech this year." Kathy Liao's response from Cloudflare was immediate and unambiguous.

Cloudflare's position is clear: /crawl identifies as a bot, respects robots.txt, and does not bypass any anti-bot protections. If a site owner blocks bots, the crawl will fail. This is an approach that gives content owners control, unlike some crawlers that attempt to masquerade as human browsers.

Under the Hood: Technical Architecture

For developers who want to understand the internals, here are the key technical details.

The /crawl endpoint runs on Cloudflare's Browser Rendering infrastructure, which spins up headless Chrome instances across Cloudflare's global edge network. When you run a crawl with render: true, each page loads in a real browser instance, JavaScript executes, AJAX requests complete, and the final DOM is captured. This is what makes the tool capable of handling modern Single Page Applications (SPAs).

With render: false, the process is fundamentally different: Cloudflare performs a simple HTTP fetch via Workers, no browser involved. The result is raw HTML (no JavaScript rendering), but speed and cost are incomparable. This mode is ideal for documentation sites, static blogs, or any site that generates its HTML server-side.

The caching system is well designed. The maxAge parameter controls how long results are cached in R2 (Cloudflare's object storage). Matches are exact on URL. If you crawl the same site twice within the cache window, the second request is near-instant and consumes no browser time.

The modifiedSince parameter deserves special attention. It takes a Unix timestamp and only crawls pages modified after that date. Combined with caching, this enables extremely efficient differential crawls: one full initial pass, then incremental updates.
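A differential crawl then boils down to remembering when the last pass ran and passing that timestamp along. A minimal sketch:

```python
import time


def differential_payload(url: str, last_crawl_unix: int) -> dict:
    """Build a crawl request that only fetches pages modified since
    the last pass, via the modifiedSince parameter described above."""
    return {
        "url": url,
        "modifiedSince": last_crawl_unix,
        "formats": ["markdown"],
    }


# e.g. "everything changed in the last 7 days"
payload = differential_payload("https://example.com",
                               int(time.time()) - 7 * 24 * 3600)
```

Persist the timestamp of each run (in KV, R2, or a database) and feed it into the next one.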

Finally, the filtering patterns (includePatterns and excludePatterns) use wildcards with * (one segment) and ** (all segments). For example, to crawl only a site's documentation: includePatterns: ["/docs/**"] and excludePatterns: ["/docs/legacy/**"]. Exclude rules always take priority over include rules.
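The documented precedence (exclude beats include) is easy to mirror client-side when pre-filtering URL lists. This sketch uses Python's fnmatch, whose * crosses path segments, so it approximates ** rather than the single-segment *:

```python
from fnmatch import fnmatch


def allowed(path: str, include: list, exclude: list) -> bool:
    """Mirror the documented precedence: exclude rules always win
    over include rules. An empty include list allows everything
    not excluded."""
    if any(fnmatch(path, pat) for pat in exclude):
        return False
    return any(fnmatch(path, pat) for pat in include) if include else True
```

With include ["/docs/*"] and exclude ["/docs/legacy/*"], a legacy-docs URL is rejected even though the include pattern matches it.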

The crawler also supports authentication via custom headers, cookies, and HTTP Basic Auth, letting you crawl password-protected staging environments or authenticated sections of a site. You can set a custom userAgent string and use rejectResourceTypes to block images, media, or fonts for faster crawls.

What Cloudflare /crawl Does Not Do

For a complete picture, here are the current limitations:

No image extraction. The /crawl endpoint returns text content only (HTML, Markdown, JSON). For screenshots, you need the separate /screenshot endpoint.

No protection bypass. If a site uses CAPTCHAs, Bot Fight Mode, or Cloudflare challenges, the crawl will be blocked. This is by design.

Open beta. The API is in open beta. Bugs exist. Some developers report "Crawl job not found" errors immediately after creating a job.

Limited free tier. The 5-job-per-day and 100-page-per-job limits on the free plan are restrictive for production use. The $5/month paid plan is nearly essential.

Who Should Use Cloudflare /crawl?

This is for you if you are building data pipelines for AI, need to programmatically crawl entire sites, are already in the Cloudflare ecosystem, or are looking for a cheaper alternative to Firecrawl at scale.

Skip it if you need to bypass anti-bot protections (this is not the tool for that), only need single-page conversion (Jina Reader will be simpler), or need total control over infrastructure (self-hosted Crawl4AI will be a better fit).

How to Get Started with Cloudflare /crawl

Here are the steps to start using the API:

  1. Create a Cloudflare account at dash.cloudflare.com (free)

  2. Generate an API token with Browser Rendering permissions in your account settings

  3. Get your Account ID from the Workers dashboard

  4. Launch your first crawl using the curl request described above

  5. Upgrade to Workers Paid ($5/month) if you exceed free plan limits

The official documentation is available at developers.cloudflare.com/browser-rendering and covers all parameters, output formats, and advanced use cases.

The web is becoming an API for language models. Cloudflare, which handles over 20% of global web traffic, just built one of the most powerful taps to access it. And at $5 a month, that tap is open to everyone.
