At Emelia, our B2B prospecting tool, and Bridgers, our digital agency specializing in AI solutions, we build data pipelines that feed AI models every day. Web content extraction, prospect enrichment, automated competitive intelligence: web crawling sits at the core of our workflows. When Cloudflare drops an endpoint that can ingest an entire website in a single API call, it deserves a deep dive.
On March 10, 2026, Cloudflare launched /crawl, a new endpoint built into its Browser Rendering service. The announcement tweet from @CloudflareDev blew past 2 million impressions, 7,800 likes, and 8,600 bookmarks within 24 hours. The pitch is brutally simple: "One API call and an entire site crawled." No scripts. No browser management. Just the content in HTML, Markdown, or JSON.
The system uses an asynchronous two-step process.
Step 1: Start the crawl. Send a POST request with a starting URL. The API immediately returns a job ID.
Step 2: Fetch results. Poll the API with that job ID using GET requests. Results stream in as pages are processed, with cursor-based pagination for large crawls.
The crawler automatically discovers URLs from three sources: the starting URL, the site's sitemap, and links found on each page. It respects robots.txt by default and identifies itself as a bot, a point that Kathy Liao, Product Manager at Cloudflare, emphasized repeatedly in the face of community pushback.
| Parameter | Type | Description |
|---|---|---|
| url | String | Starting URL (required) |
| limit | Number | Maximum pages to crawl (default: 10, max: 100,000) |
| depth | Number | Maximum crawl depth (max: 100,000) |
| formats | Array | Output formats: html, markdown, json |
| render | Boolean | Execute JavaScript (default: true) |
| source | String | URL discovery: all, sitemaps, links |
| maxAge | Number | Cache duration in seconds (max: 7 days) |
| includePatterns | Array | Wildcard patterns to filter included URLs |
| excludePatterns | Array | Wildcard patterns to exclude URLs |
| modifiedSince | Number | Unix timestamp; only crawl pages modified after this date |
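Several of these parameters combine naturally. As an illustration (all values here are made up), a request body for an incremental, documentation-only crawl of a static site might look like this:

```json
{
  "url": "https://example.com",
  "limit": 500,
  "formats": ["markdown"],
  "render": false,
  "source": "sitemaps",
  "includePatterns": ["/docs/**"],
  "excludePatterns": ["/docs/legacy/**"],
  "modifiedSince": 1735689600,
  "maxAge": 86400
}
```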
The render: false option is a standout feature: it disables the headless browser and performs a simple HTTP fetch instead, making it significantly faster and cheaper. During the beta period, this mode is free.
Here is how to launch a full crawl with curl:
```bash
# Step 1: Start the crawl
curl -X POST \
  "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "formats": ["markdown", "html"],
    "render": true
  }'

# Response: { "success": true, "result": "job-id-xxx" }

# Step 2: Fetch results
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/job-id-xxx" \
  -H "Authorization: Bearer {api_token}"
```
Each page in the response includes the URL, title, status, and content in your requested formats. For dynamic sites built with React, Vue, or Angular, the render: true mode launches a real headless Chrome instance that executes JavaScript before extracting content.
For structured JSON extraction, you can provide a prompt or a schema:
```json
{
  "url": "https://shop.example.com",
  "formats": ["json"],
  "jsonOptions": {
    "prompt": "Extract the product name, price, and description",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "product",
        "schema": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "price": { "type": "number" },
            "description": { "type": "string" }
          }
        }
      }
    }
  }
}
```
This structured extraction uses Workers AI under the hood, which incurs additional costs.
One of Cloudflare's strongest selling points is the price. Here is the full breakdown:
Free plan:

| Feature | Limit |
|---|---|
| Browser time | 10 minutes per day |
| /crawl jobs per day | 5 |
| Max pages per crawl | 100 |
| REST API requests | 6 per minute |
| Concurrent browsers | 3 |

Workers Paid plan ($5/month):

| Feature | Limit |
|---|---|
| Browser hours included | 10 hours/month |
| Extra browser time | $0.09/hour |
| REST API requests | 600 per minute |
| Concurrent browsers | 30 |
| Max pages per crawl | 100,000 |
The render: false mode (no JavaScript execution) is free during the beta and will later follow standard Workers pricing. Crawl jobs have a maximum runtime of 7 days, and results remain available for 14 days.
To put this in perspective: with the $5/month paid plan, you get 10 hours of browser rendering time included. If a 100-page crawl takes roughly 5 minutes of browser time, you can crawl approximately 12,000 pages per month for five dollars. Compare that to Firecrawl's Standard plan at $47/month for 100,000 pages, and the economics become compelling at scale.
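The back-of-the-envelope math is easy to reproduce. The 5-minutes-per-100-pages figure is the article's rough estimate, not a guarantee:

```python
included_hours = 10      # browser hours included in the $5/month Workers Paid plan
minutes_per_crawl = 5    # rough browser time for a 100-page crawl with render: true
pages_per_crawl = 100

crawls_per_month = included_hours * 60 // minutes_per_crawl   # 120 crawls
pages_per_month = crawls_per_month * pages_per_crawl          # 12,000 pages

print(pages_per_month)  # 12000
cost_per_1k_pages = 5 / (pages_per_month / 1000)
print(round(cost_per_1k_pages, 3))  # roughly $0.417 per 1,000 pages
```

At roughly 42 cents per thousand rendered pages before extra browser time kicks in, the comparison with per-page pricing models is straightforward.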
The web crawling market for AI applications is heating up fast. Here is how Cloudflare stacks up against the competition.
| Feature | Cloudflare /crawl | Firecrawl | Crawl4AI | Jina Reader |
|---|---|---|---|---|
| Entry price | Free ($5/mo for paid plan) | Free (500 pages), then $19/mo | Free (open source) | Free (20 req/min without key) |
| Volume pricing | $0.09/browser hour | $47/mo (100k pages), $599/mo (1M pages) | Free (self-hosted) | Token-based (from $0.01/1M tokens) |
| Multi-page crawl | Yes (up to 100,000 pages) | Yes | Yes | No (single page) |
| Crawl depth | Up to 100,000 levels | Configurable | Configurable | N/A |
| Output formats | HTML, Markdown, JSON | HTML, Markdown, JSON, Screenshot | HTML, Markdown, JSON | Markdown, HTML |
| JavaScript rendering | Yes (headless Chrome) | Yes | Yes (Playwright/Chromium) | Yes (Puppeteer) |
| Structured AI extraction | Yes (Workers AI) | Yes (LLM extract) | Yes (LLM strategies) | No |
| Respects robots.txt | Yes (by default) | Optional | Configurable | Yes |
| Concurrent requests | 30 (paid plan) | 5 to 150 depending on plan | Unlimited (self-hosted) | 2 to 500 depending on plan |
| Infrastructure | Serverless (Cloudflare edge) | Cloud SaaS | Self-hosted or Docker | Cloud SaaS |
| Open source | No | No | Yes (Apache 2.0) | Partially |
If you are already in the Cloudflare ecosystem (Workers, R2, KV), integration is seamless. The cost-per-page is unbeatable for high-volume crawls thanks to time-based billing instead of per-page pricing. The render: false mode, free during beta, is perfect for static sites.
Firecrawl excels in developer experience with polished SDKs and AI-oriented features (LLM extraction, screenshots, site mapping). If you need a plug-and-play tool and do not want to manage infrastructure, it is a strong choice. However, per-page costs add up quickly at scale.
With over 61,000 GitHub stars, Crawl4AI is the pick for teams that want total control. Open source, self-hosted, no rate limits imposed. Ideal for AI training pipelines or research projects on tight budgets.
Jina Reader is perfect for single-page conversion to LLM-friendly formats. Prepend https://r.jina.ai/ to any URL and you get clean Markdown. No native multi-page crawl, but unmatched simplicity for basic use cases.
Cloudflare's timing is not accidental. Demand for structured web data to feed AI models is exploding.
The crawl-to-refer ratio (how many times an AI bot visits a site versus how many visitors it sends back) has reached staggering levels: 1,700:1 for OpenAI, 73,000:1 for Anthropic according to Cloudflare's own data. AI bots are consuming web content at an industrial scale, and developers need reliable tools to do the same.
The most obvious use case is building knowledge bases for RAG systems. With /crawl, you can ingest an entire product documentation site in Markdown, chunk it, vectorize it, and inject it into an index so your AI agents answer with precision.
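To sketch the ingestion side of that pipeline, here is a minimal chunker of the kind you would run on /crawl's Markdown output before embedding. It uses fixed-size chunks with overlap for simplicity; production pipelines more often split on headings, and the sizes here are arbitrary:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split crawled Markdown into overlapping chunks ready for embedding."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap   # overlap preserves context across boundaries
    return chunks

doc = "# Docs\n" + "Lorem ipsum. " * 100   # stand-in for one crawled page
chunks = chunk_text(doc, size=200, overlap=50)
print(len(chunks), all(len(c) <= 200 for c in chunks))
```

Each chunk then gets embedded and written to your vector index alongside the page URL from the crawl results, so answers can cite their source page.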
Periodically crawl competitor websites to detect price changes, new products, or positioning shifts. The modifiedSince parameter lets you fetch only pages modified since your last crawl, enabling efficient differential crawls.
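In practice that means persisting the timestamp of each run and passing it as modifiedSince on the next one. A minimal Python sketch (the payload keys match the parameter table above; the URL and storage of the timestamp are up to you):

```python
import json
import time

def build_incremental_payload(url: str, last_run_ts: int) -> str:
    """Request body for a differential crawl: only pages changed since last run."""
    return json.dumps({
        "url": url,
        "formats": ["markdown"],
        "modifiedSince": last_run_ts,   # Unix timestamp, per the parameter table
    })

# e.g. the previous crawl ran one week ago
last_run = int(time.time()) - 7 * 24 * 3600
payload = build_incremental_payload("https://competitor.example", last_run)
print("modifiedSince" in payload)
```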
Extract all pages from a site to analyze title tags, meta descriptions, heading structure, internal links, and 404 errors. The JSON format with structured AI extraction delivers directly actionable data.
Beyond theoretical scenarios, here are concrete use cases we are already seeing:
Content migration. Switching CMS platforms? Crawl the old site in Markdown, clean up the content, and import it into the new system. No more manual exports or unreliable plugins.
Compliance monitoring. Legal teams can automatically track legal notices, terms of service, and privacy policies across a portfolio of websites.
Training dataset construction. Machine Learning teams can build text corpora from public sources while respecting robots.txt, for fine-tuning specialized models.
Editorial content analysis. Marketing teams can analyze competitor content strategies: What topics do they cover? How frequently do they publish? What keywords are they targeting?
Knowledge base generation. Customer support teams can crawl their own documentation to build searchable knowledge bases. Feed the Markdown output into a vector database, connect it to a chatbot, and your support agents (human or AI) get instant access to every page of your docs.
Price monitoring at scale. E-commerce teams can track pricing across dozens of competitor sites. Use the JSON format with a prompt like "Extract product name and price" and get structured data ready for analysis, without writing custom parsers for each site.
The announcement sparked passionate reactions across the developer community: the company that built its reputation on anti-bot protection is now selling a crawling tool, an irony that SRE engineers were quick to point out.
A viral tweet from @TukiFromKL (496,000 impressions, 3,700 likes) called it the "biggest betrayal in tech this year." Kathy Liao's response for Cloudflare was immediate and unambiguous.
Cloudflare's position is clear: /crawl identifies as a bot, respects robots.txt, and does not bypass any anti-bot protections. If a site owner blocks bots, the crawl will fail. This is an approach that gives content owners control, unlike some crawlers that attempt to masquerade as human browsers.
For developers who want to understand the internals, here are the key technical details.
The /crawl endpoint runs on Cloudflare's Browser Rendering infrastructure, which spins up headless Chrome instances across Cloudflare's global edge network. When you run a crawl with render: true, each page loads in a real browser instance, JavaScript executes, AJAX requests complete, and the final DOM is captured. This is what makes the tool capable of handling modern Single Page Applications (SPAs).
With render: false, the process is fundamentally different: Cloudflare performs a simple HTTP fetch via Workers, no browser involved. The result is raw HTML (no JavaScript rendering), but speed and cost are incomparable. This mode is ideal for documentation sites, static blogs, or any site that generates its HTML server-side.
The caching system is well designed. The maxAge parameter controls how long results are cached in R2 (Cloudflare's object storage). Matches are exact on URL. If you crawl the same site twice within the cache window, the second request is near-instant and consumes no browser time.
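The maxAge behavior amounts to a freshness check keyed on the exact URL. An illustrative sketch of that logic (this is not Cloudflare's code, and the in-memory dict stands in for R2):

```python
import time

CACHE: dict[str, tuple[float, str]] = {}   # url -> (stored_at, content)

def fetch_with_cache(url: str, max_age: float, fetch) -> str:
    """Serve from cache if the entry for this exact URL is younger than max_age."""
    entry = CACHE.get(url)
    now = time.time()
    if entry and now - entry[0] < max_age:
        return entry[1]              # cache hit: no browser time consumed
    content = fetch(url)             # cache miss: do the real fetch
    CACHE[url] = (now, content)
    return content

calls = []
fake_fetch = lambda u: (calls.append(u), "<html>")[1]
fetch_with_cache("https://example.com", 3600, fake_fetch)
fetch_with_cache("https://example.com", 3600, fake_fetch)
print(len(calls))  # 1: the second request was a cache hit
```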
The modifiedSince parameter deserves special attention. It takes a Unix timestamp and only crawls pages modified after that date. Combined with caching, this enables extremely efficient differential crawls: one full initial pass, then incremental updates.
Finally, the filtering patterns (includePatterns and excludePatterns) use wildcards with * (one segment) and ** (all segments). For example, to crawl only a site's documentation: includePatterns: ["/docs/**"] and excludePatterns: ["/docs/legacy/**"]. Exclude rules always take priority over include rules.
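The semantics described above, `*` confined to one path segment, `**` spanning any number of segments, and exclude rules beating include rules, can be modeled in a few lines. This is a behavioral sketch based on the article's description, not Cloudflare's implementation:

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Replace ** first (matches across segments), then * (within one segment).
    escaped = re.escape(pattern).replace(r"\*\*", ".*").replace(r"\*", "[^/]*")
    return re.compile("^" + escaped + "$")

def allowed(path: str, include: list[str], exclude: list[str]) -> bool:
    if any(pattern_to_regex(p).match(path) for p in exclude):
        return False   # exclude always wins over include
    return any(pattern_to_regex(p).match(path) for p in include)

print(allowed("/docs/api/auth", ["/docs/**"], ["/docs/legacy/**"]))   # True
print(allowed("/docs/legacy/v1", ["/docs/**"], ["/docs/legacy/**"]))  # False
print(allowed("/docs/a/b", ["/docs/*"], []))                          # False: * stops at /
```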
The crawler also supports authentication via custom headers, cookies, and HTTP Basic Auth, letting you crawl password-protected staging environments or authenticated sections of a site. You can set a custom userAgent string and use rejectResourceTypes to block images, media, or fonts for faster crawls.
For a complete picture, here are the current limitations:
No image extraction. The /crawl endpoint returns text content only (HTML, Markdown, JSON). For screenshots, you need the separate /screenshot endpoint.
No protection bypass. If a site uses CAPTCHAs, Bot Fight Mode, or Cloudflare challenges, the crawl will be blocked. This is by design.
Open beta. The API is in open beta. Bugs exist. Some developers report "Crawl job not found" errors immediately after creating a job.
Limited free tier. The 5-job-per-day and 100-page-per-job limits on the free plan are restrictive for production use. The $5/month paid plan is nearly essential.
This is for you if you are building data pipelines for AI, need to programmatically crawl entire sites, are already in the Cloudflare ecosystem, or are looking for a cheaper alternative to Firecrawl at scale.
Skip it if you need to bypass anti-bot protections (this is not the tool for that), only need single-page conversion (Jina Reader will be simpler), or need total control over infrastructure (self-hosted Crawl4AI will be a better fit).
Here are the steps to start using the API:
1. Create a Cloudflare account at dash.cloudflare.com (free)
2. Generate an API token with Browser Rendering permissions in your account settings
3. Get your Account ID from the Workers dashboard
4. Launch your first crawl using the curl request described above
5. Upgrade to Workers Paid ($5/month) if you exceed free plan limits
The official documentation is available at developers.cloudflare.com/browser-rendering and covers all parameters, output formats, and advanced use cases.
The web is becoming an API for language models. Cloudflare, which handles over 20% of global web traffic, just built one of the most powerful taps to access it. And at $5 a month, that tap is open to everyone.
