At Emelia, our B2B prospecting tool, and Bridgers, our digital agency specializing in AI solutions, we build data pipelines that feed AI models every day. Web content extraction, prospect enrichment, automated competitive intelligence: web crawling sits at the core of our workflows. When Cloudflare drops an endpoint that can ingest an entire website in a single API call, it deserves a deep dive.
On March 10, 2026, Cloudflare launched /crawl, a new endpoint built into its Browser Rendering service. The announcement tweet from @CloudflareDev blew past 2 million impressions, 7,800 likes, and 8,600 bookmarks within 24 hours. The pitch is brutally simple: "One API call and an entire site crawled." No scripts. No browser management. Just the content in HTML, Markdown, or JSON.
The system uses an asynchronous two-step process.
Step 1: Start the crawl. Send a POST request with a starting URL. The API immediately returns a job ID.
Step 2: Fetch results. Poll the API with that job ID using GET requests. Results stream in as pages are processed, with cursor-based pagination for large crawls.
The crawler automatically discovers URLs from three sources: the starting URL, the site's sitemap, and links found on each page. It respects robots.txt by default and identifies itself as a bot, a point that Kathy Liao, Product Manager at Cloudflare, emphasized repeatedly in the face of community pushback.
| Parameter | Type | Description |
|---|---|---|
| url | String | Starting URL (required) |
| limit | Number | Maximum pages to crawl (default: 10, max: 100,000) |
| depth | Number | Maximum crawl depth (max: 100,000) |
| formats | Array | Output formats: html, markdown, json |
| render | Boolean | Execute JavaScript (default: true) |
| source | String | URL discovery: all, sitemaps, links |
| maxAge | Number | Cache duration in seconds (max: 7 days) |
| includePatterns | Array | Wildcard patterns to filter included URLs |
| excludePatterns | Array | Wildcard patterns to exclude URLs |
| modifiedSince | Number | Unix timestamp; only crawl pages modified after this date |
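Several of these parameters combine naturally. As an illustration (all values here are made up), a request body for an incremental, documentation-only crawl of a static site might look like this:

```json
{
  "url": "https://example.com",
  "limit": 500,
  "formats": ["markdown"],
  "render": false,
  "source": "sitemaps",
  "includePatterns": ["/docs/**"],
  "excludePatterns": ["/docs/legacy/**"],
  "modifiedSince": 1735689600,
  "maxAge": 86400
}
```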
The render: false option is a standout feature: it disables the headless browser and performs a simple HTTP fetch instead, making it significantly faster and cheaper. During the beta period, this mode is free.
Here is how to launch a full crawl with curl:
```bash
# Step 1: Start the crawl
curl -X POST \
  "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "formats": ["markdown", "html"],
    "render": true
  }'

# Response: { "success": true, "result": "job-id-xxx" }

# Step 2: Fetch results
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/job-id-xxx" \
  -H "Authorization: Bearer {api_token}"
```
Each page in the response includes the URL, title, status, and content in your requested formats. For dynamic sites built with React, Vue, or Angular, the render: true mode launches a real headless Chrome instance that executes JavaScript before extracting content.
For structured JSON extraction, you can provide a prompt or a schema:
```json
{
  "url": "https://shop.example.com",
  "formats": ["json"],
  "jsonOptions": {
    "prompt": "Extract the product name, price, and description",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "product",
        "schema": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "price": { "type": "number" },
            "description": { "type": "string" }
          }
        }
      }
    }
  }
}
```
This structured extraction uses Workers AI under the hood, which incurs additional costs.
One of Cloudflare's strongest selling points is the price. Here is the full breakdown:
Free plan:

| Feature | Limit |
|---|---|
| Browser time | 10 minutes per day |
| /crawl jobs per day | 5 |
| Max pages per crawl | 100 |
| REST API requests | 6 per minute |
| Concurrent browsers | 3 |

Workers Paid plan ($5/month):

| Feature | Limit |
|---|---|
| Browser hours included | 10 hours/month |
| Extra browser time | $0.09/hour |
| REST API requests | 600 per minute |
| Concurrent browsers | 30 |
| Max pages per crawl | 100,000 |
The render: false mode (no JavaScript execution) is free during the beta and will later follow standard Workers pricing. Crawl jobs have a maximum runtime of 7 days, and results remain available for 14 days.
To put this in perspective: with the $5/month paid plan, you get 10 hours of browser rendering time included. If a 100-page crawl takes roughly 5 minutes of browser time, you can crawl approximately 12,000 pages per month for five dollars. Compare that to Firecrawl's Standard plan at $47/month for 100,000 pages, and the economics become compelling at scale.
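The back-of-the-envelope math is easy to reproduce. The 5-minutes-per-100-pages figure is the article's rough estimate, not a guarantee:

```python
included_hours = 10      # browser hours included in the $5/month Workers Paid plan
minutes_per_crawl = 5    # rough browser time for a 100-page crawl with render: true
pages_per_crawl = 100

crawls_per_month = included_hours * 60 // minutes_per_crawl   # 120 crawls
pages_per_month = crawls_per_month * pages_per_crawl          # 12,000 pages

print(pages_per_month)  # 12000
cost_per_1k_pages = 5 / (pages_per_month / 1000)
print(round(cost_per_1k_pages, 3))  # roughly $0.417 per 1,000 pages
```

At roughly 42 cents per thousand rendered pages before extra browser time kicks in, the comparison with per-page pricing models is straightforward.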
The web crawling market for AI applications is heating up fast. Here is how Cloudflare stacks up against the competition.
| Feature | Cloudflare /crawl | Firecrawl | Crawl4AI | Jina Reader |
|---|---|---|---|---|
| Entry price | Free ($5/mo for paid plan) | Free (500 pages), then $19/mo | Free (open source) | Free (20 req/min without key) |
| Volume pricing | $0.09/browser hour | $47/mo (100k pages), $599/mo (1M pages) | Free (self-hosted) | Token-based (from $0.01/1M tokens) |
| Multi-page crawl | Yes (up to 100,000 pages) | Yes | Yes | No (single page) |
| Crawl depth | Up to 100,000 levels | Configurable | Configurable | N/A |
| Output formats | HTML, Markdown, JSON | HTML, Markdown, JSON, Screenshot | HTML, Markdown, JSON | Markdown, HTML |
| JavaScript rendering | Yes (headless Chrome) | Yes | Yes (Playwright/Chromium) | Yes (Puppeteer) |
| Structured AI extraction | Yes (Workers AI) | Yes (LLM extract) | Yes (LLM strategies) | No |
| Respects robots.txt | Yes (by default) | Optional | Configurable | Yes |
| Concurrent requests | 30 (paid plan) | 5 to 150 depending on plan | Unlimited (self-hosted) | 2 to 500 depending on plan |
| Infrastructure | Serverless (Cloudflare edge) | Cloud SaaS | Self-hosted or Docker | Cloud SaaS |
| Open source | No | No | Yes (Apache 2.0) | Partially |
If you are already in the Cloudflare ecosystem (Workers, R2, KV), integration is seamless. The cost-per-page is unbeatable for high-volume crawls thanks to time-based billing instead of per-page pricing. The render: false mode, free during beta, is perfect for static sites.
Firecrawl excels in developer experience with polished SDKs and AI-oriented features (LLM extraction, screenshots, site mapping). If you need a plug-and-play tool and do not want to manage infrastructure, it is a strong choice. However, per-page costs add up quickly at scale.
With over 61,000 GitHub stars, Crawl4AI is the pick for teams that want total control. Open source, self-hosted, no rate limits imposed. Ideal for AI training pipelines or research projects on tight budgets.
Jina Reader is perfect for single-page conversion to LLM-friendly formats. Prepend https://r.jina.ai/ to any URL and you get clean Markdown. No native multi-page crawl, but unmatched simplicity for basic use cases.
Cloudflare's timing is not accidental. Demand for structured web data to feed AI models is exploding.
The crawl-to-refer ratio (how many times an AI bot visits a site versus how many visitors it sends back) has reached staggering levels: 1,700:1 for OpenAI, 73,000:1 for Anthropic according to Cloudflare's own data. AI bots are consuming web content at an industrial scale, and developers need reliable tools to do the same.
The most obvious use case is building knowledge bases for RAG systems. With /crawl, you can ingest an entire product documentation site in Markdown, chunk it, vectorize it, and inject it into an index so your AI agents answer with precision.
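To sketch the ingestion side of that pipeline, here is a minimal chunker of the kind you would run on /crawl's Markdown output before embedding. It uses fixed-size chunks with overlap for simplicity; production pipelines more often split on headings, and the sizes here are arbitrary:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split crawled Markdown into overlapping chunks ready for embedding."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap   # overlap preserves context across boundaries
    return chunks

doc = "# Docs\n" + "Lorem ipsum. " * 100   # stand-in for one crawled page
chunks = chunk_text(doc, size=200, overlap=50)
print(len(chunks), all(len(c) <= 200 for c in chunks))
```

Each chunk then gets embedded and written to your vector index alongside the page URL from the crawl results, so answers can cite their source page.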
Periodically crawl competitor websites to detect price changes, new products, or positioning shifts. The modifiedSince parameter lets you fetch only pages modified since your last crawl, enabling efficient differential crawls.
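In practice that means persisting the timestamp of each run and passing it as modifiedSince on the next one. A minimal Python sketch (the payload keys match the parameter table above; the URL and storage of the timestamp are up to you):

```python
import json
import time

def build_incremental_payload(url: str, last_run_ts: int) -> str:
    """Request body for a differential crawl: only pages changed since last run."""
    return json.dumps({
        "url": url,
        "formats": ["markdown"],
        "modifiedSince": last_run_ts,   # Unix timestamp, per the parameter table
    })

# e.g. the previous crawl ran one week ago
last_run = int(time.time()) - 7 * 24 * 3600
payload = build_incremental_payload("https://competitor.example", last_run)
print("modifiedSince" in payload)
```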
Extract all pages from a site to analyze title tags, meta descriptions, heading structure, internal links, and 404 errors. The JSON format with structured AI extraction delivers directly actionable data.
Beyond theoretical scenarios, here are concrete use cases we are already seeing:
Content migration. Switching CMS platforms? Crawl the old site in Markdown, clean up the content, and import it into the new system. No more manual exports or unreliable plugins.
Compliance monitoring. Legal teams can automatically track legal notices, terms of service, and privacy policies across a portfolio of websites.
Training dataset construction. Machine Learning teams can build text corpora from public sources while respecting robots.txt, for fine-tuning specialized models.
Editorial content analysis. Marketing teams can analyze competitor content strategies: What topics do they cover? How frequently do they publish? What keywords are they targeting?
Knowledge base generation. Customer support teams can crawl their own documentation to build searchable knowledge bases. Feed the Markdown output into a vector database, connect it to a chatbot, and your support agents (human or AI) get instant access to every page of your docs.
Price monitoring at scale. E-commerce teams can track pricing across dozens of competitor sites. Use the JSON format with a prompt like "Extract product name and price" and get structured data ready for analysis, without writing custom parsers for each site.
The announcement sparked passionate reactions across the developer community: the company that built its reputation on anti-bot protection is now selling a crawling tool, an irony that SRE engineers were quick to point out.
A viral tweet from @TukiFromKL (496,000 impressions, 3,700 likes) called it the "biggest betrayal in tech this year." Kathy Liao's response for Cloudflare was immediate and unambiguous.
Cloudflare's position is clear: /crawl identifies as a bot, respects robots.txt, and does not bypass any anti-bot protections. If a site owner blocks bots, the crawl will fail. This is an approach that gives content owners control, unlike some crawlers that attempt to masquerade as human browsers.
For developers who want to understand the internals, here are the key technical details.
The /crawl endpoint runs on Cloudflare's Browser Rendering infrastructure, which spins up headless Chrome instances across Cloudflare's global edge network. When you run a crawl with render: true, each page loads in a real browser instance, JavaScript executes, AJAX requests complete, and the final DOM is captured. This is what makes the tool capable of handling modern Single Page Applications (SPAs).
With render: false, the process is fundamentally different: Cloudflare performs a simple HTTP fetch via Workers, no browser involved. The result is raw HTML (no JavaScript rendering), but speed and cost are incomparable. This mode is ideal for documentation sites, static blogs, or any site that generates its HTML server-side.
The caching system is well designed. The maxAge parameter controls how long results are cached in R2 (Cloudflare's object storage). Matches are exact on URL. If you crawl the same site twice within the cache window, the second request is near-instant and consumes no browser time.
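The maxAge behavior amounts to a freshness check keyed on the exact URL. An illustrative sketch of that logic (this is not Cloudflare's code, and the in-memory dict stands in for R2):

```python
import time

CACHE: dict[str, tuple[float, str]] = {}   # url -> (stored_at, content)

def fetch_with_cache(url: str, max_age: float, fetch) -> str:
    """Serve from cache if the entry for this exact URL is younger than max_age."""
    entry = CACHE.get(url)
    now = time.time()
    if entry and now - entry[0] < max_age:
        return entry[1]              # cache hit: no browser time consumed
    content = fetch(url)             # cache miss: do the real fetch
    CACHE[url] = (now, content)
    return content

calls = []
fake_fetch = lambda u: (calls.append(u), "<html>")[1]
fetch_with_cache("https://example.com", 3600, fake_fetch)
fetch_with_cache("https://example.com", 3600, fake_fetch)
print(len(calls))  # 1: the second request was a cache hit
```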
The modifiedSince parameter deserves special attention. It takes a Unix timestamp and only crawls pages modified after that date. Combined with caching, this enables extremely efficient differential crawls: one full initial pass, then incremental updates.
Finally, the filtering patterns (includePatterns and excludePatterns) use wildcards with * (one segment) and ** (all segments). For example, to crawl only a site's documentation: includePatterns: ["/docs/**"] and excludePatterns: ["/docs/legacy/**"]. Exclude rules always take priority over include rules.
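The semantics described above, `*` confined to one path segment, `**` spanning any number of segments, and exclude rules beating include rules, can be modeled in a few lines. This is a behavioral sketch based on the article's description, not Cloudflare's implementation:

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Replace ** first (matches across segments), then * (within one segment).
    escaped = re.escape(pattern).replace(r"\*\*", ".*").replace(r"\*", "[^/]*")
    return re.compile("^" + escaped + "$")

def allowed(path: str, include: list[str], exclude: list[str]) -> bool:
    if any(pattern_to_regex(p).match(path) for p in exclude):
        return False   # exclude always wins over include
    return any(pattern_to_regex(p).match(path) for p in include)

print(allowed("/docs/api/auth", ["/docs/**"], ["/docs/legacy/**"]))   # True
print(allowed("/docs/legacy/v1", ["/docs/**"], ["/docs/legacy/**"]))  # False
print(allowed("/docs/a/b", ["/docs/*"], []))                          # False: * stops at /
```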
The crawler also supports authentication via custom headers, cookies, and HTTP Basic Auth, letting you crawl password-protected staging environments or authenticated sections of a site. You can set a custom userAgent string and use rejectResourceTypes to block images, media, or fonts for faster crawls.
For a complete picture, here are the current limitations:
No image extraction. The /crawl endpoint returns text content only (HTML, Markdown, JSON). For screenshots, you need the separate /screenshot endpoint.
No protection bypass. If a site uses CAPTCHAs, Bot Fight Mode, or Cloudflare challenges, the crawl will be blocked. This is by design.
Open beta. The API is in open beta. Bugs exist. Some developers report "Crawl job not found" errors immediately after creating a job.
Limited free tier. The 5-job-per-day and 100-page-per-job limits on the free plan are restrictive for production use. The $5/month paid plan is nearly essential.
This is for you if you are building data pipelines for AI, need to programmatically crawl entire sites, are already in the Cloudflare ecosystem, or are looking for a cheaper alternative to Firecrawl at scale.
Skip it if you need to bypass anti-bot protections (this is not the tool for that), only need single-page conversion (Jina Reader will be simpler), or need total control over infrastructure (self-hosted Crawl4AI will be a better fit).
Here are the steps to start using the API:
1. Create a Cloudflare account at dash.cloudflare.com (free)
2. Generate an API token with Browser Rendering permissions in your account settings
3. Get your Account ID from the Workers dashboard
4. Launch your first crawl using the curl request described above
5. Upgrade to Workers Paid ($5/month) if you exceed free plan limits
The official documentation is available at developers.cloudflare.com/browser-rendering and covers all parameters, output formats, and advanced use cases.
The web is becoming an API for language models. Cloudflare, which handles over 20% of global web traffic, just built one of the most powerful taps to access it. And at $5 a month, that tap is open to everyone.
