Ground Truth.
AI, checked against the source.

News · 2026-07-04

Cloudflare Now Lets Sites Block AI Training Bots While Keeping Search

Cloudflare has split its AI bot controls into three separately toggleable categories, letting site owners block bots that scrape content to train AI models while still allowing search engines to index their pages. The change, live as of July 1, 2026, and covered by Overcentral and SEO Kreativ, is available on Cloudflare's free tier.

Key facts

Some background helps here, because the plumbing of the web is invisible to most people who use it. A large share of the internet's traffic is not humans clicking links but automated programs called crawlers or bots, which fetch pages on their own. Cloudflare sits in front of a huge portion of the world's websites as a kind of security and delivery layer, which gives it an unusual vantage point: it can see the bots arriving and decide, on the site owner's behalf, which ones to let through. That is what makes this change meaningful rather than symbolic.

For years, site owners who wanted to block AI bots faced a blunt instrument: block everything with a given user-agent string, or nothing. That was a real problem because "AI bot" covers wildly different behavior. A search crawler indexes a page and later sends a human reader back to it via a search result - the classic bargain the web has run on for two decades, where a site lets Google read its pages in exchange for the visitors Google sends back. A training crawler, by contrast, scrapes a page once to feed a model's training data and typically sends nothing back, ever. The content gets absorbed into a model, and the site that produced it never sees a visitor or a dollar in return. Until now, a site could not easily allow one and block the other, because both often used similar infrastructure and site owners lacked fine-grained tools to tell them apart.
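
A minimal Python sketch makes the old limitation concrete. It assumes a site filters requests by checking the User-Agent header against a blocklist of tokens. The crawler name `ExampleBot` and its header string are hypothetical, and the sketch also assumes that one operator sends both search-indexing and training fetches under one identical string. Under those assumptions, a single rule blocks both kinds of traffic or neither, never just one.

```python
# Hypothetical sketch of all-or-nothing User-Agent blocking.
# "ExampleBot" is an illustrative name, not a real crawler.


def is_blocked(user_agent: str, blocked_tokens: set[str]) -> bool:
    """Block the request if any blocklisted token appears in the User-Agent header."""
    return any(token in user_agent for token in blocked_tokens)


# Assumption: one operator sends search-indexing and training-scrape fetches
# under an identical User-Agent string, so the header alone cannot tell them apart.
SHARED_UA = "Mozilla/5.0 (compatible; ExampleBot/1.0)"
fetches = {"search": SHARED_UA, "training": SHARED_UA}

for blocked_tokens in (set(), {"ExampleBot"}):
    outcomes = {purpose: is_blocked(ua, blocked_tokens) for purpose, ua in fetches.items()}
    print(sorted(blocked_tokens), outcomes)
# [] {'search': False, 'training': False}
# ['ExampleBot'] {'search': True, 'training': True}
```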

Cloudflare's fix separates AI-related traffic into three named lanes. Search crawlers cover bots indexing pages for search results - the Googlebot-style traffic that still sends visitors back to the source. Agent crawlers cover AI assistants that fetch a page live, in the moment, to answer a specific user's question - the fetch-on-demand behavior that has become common as chatbots browse the web in real time to answer queries. Training crawlers cover bots that scrape pages specifically to build a model's training dataset, with no equivalent traffic returned to the publisher. A site owner can now flip these three switches independently, allowing Google-style indexing while shutting the door on model-training scrapers - a distinction that simply did not exist as a toggle before.
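
The three independent switches can be modeled as one boolean per category. This is a hypothetical sketch, not Cloudflare's configuration format or API. It assumes a request has already been classified into one of the three lanes (the reporting does not cover how Cloudflare identifies bots), and the enum and setting names are illustrative.

```python
from enum import Enum


class CrawlerCategory(Enum):
    """The three lanes described in the article (names are illustrative)."""
    SEARCH = "search"      # indexes pages, sends readers back via results
    AGENT = "agent"        # fetches live to answer one user's question
    TRAINING = "training"  # scrapes pages to build a training dataset


# Hypothetical site settings: one independent switch per category.
# True means "allow", False means "block".
site_settings = {
    CrawlerCategory.SEARCH: True,
    CrawlerCategory.AGENT: True,
    CrawlerCategory.TRAINING: False,
}


def allow_request(category: CrawlerCategory, settings: dict[CrawlerCategory, bool]) -> bool:
    """Return whether a crawler in the given category may fetch the page."""
    return settings[category]


for category in CrawlerCategory:
    verdict = "allow" if allow_request(category, site_settings) else "block"
    print(f"{category.value}: {verdict}")
# search: allow
# agent: allow
# training: block
```

The switches share no state: turning off training leaves search untouched. That is the separation the single user-agent rule could not express.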

The analogy is a store owner deciding who gets through the front door. She wants foot traffic from the mall directory that lists her shop, because that directory sends her customers (search). She is fine with someone popping in to answer a customer's specific question on the spot (agent). But she does not want a competitor sending a truck to photograph her entire inventory for their own catalog, taking everything and giving nothing back (training). Same front door, three very different kinds of visitors, and now three different locks she can set independently.

Why it matters: publishers have spent the last two years watching AI crawlers harvest their content for free while search-referral traffic, the thing that actually pays their bills through ads and subscriptions, keeps declining. When a chatbot answers a question using a publisher's reporting without sending anyone to the publisher's site, the economic engine of the open web starts to sputter. Putting this control on the free tier removes the excuse that only large, well-resourced publishers could afford to fight back; a one-person blog now has the same three switches as a major news organization (related: The Race to Turn Documents into AI-Ready Text, on the broader scramble over how content becomes AI training material). The sharper edge of the change arrives on September 15, 2026, when Cloudflare will start blocking agent and training crawlers by default on ad-supported pages while search crawlers remain allowed. Across a huge share of the web, that flips the starting assumption for those two kinds of bots from allowed-unless-blocked to blocked-unless-allowed: site owners now have to explicitly let them in rather than explicitly keep them out.
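
The September 15 default can be expressed as a small precedence rule. The sketch below is hypothetical. It encodes only what the reporting states: from September 15, 2026, agent and training crawlers are blocked by default on ad-supported pages, while search crawlers stay allowed. It also assumes that an explicit site-owner choice overrides the default and that every other case defaults to allow. Cloudflare's actual behavior may differ, including by plan tier.

```python
from datetime import date

# Start date for the default-block as reported; treat as an assumption to verify.
DEFAULT_BLOCK_START = date(2026, 9, 15)
DEFAULT_BLOCKED_CATEGORIES = {"agent", "training"}


def is_allowed(
    category: str,
    page_is_ad_supported: bool,
    today: date,
    explicit: dict[str, bool],
) -> bool:
    """Return True to allow a crawler category on a page, False to block it.

    Assumed precedence: an explicit site-owner choice wins; otherwise agent and
    training crawlers are blocked on ad-supported pages from the start date;
    anything else falls back to allow.
    """
    if category in explicit:
        return explicit[category]
    default_blocked = (
        category in DEFAULT_BLOCKED_CATEGORIES
        and page_is_ad_supported
        and today >= DEFAULT_BLOCK_START
    )
    return not default_blocked


after = date(2026, 9, 16)
print(is_allowed("training", True, after, {}))                  # False: blocked by default
print(is_allowed("search", True, after, {}))                    # True: search stays allowed
print(is_allowed("training", True, after, {"training": True}))  # True: owner opted in
print(is_allowed("training", False, after, {}))                 # True: page not ad-supported
```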

The honest caveat: the default-block only applies to ad-supported pages, so coverage is not universal, and the reporting here is based on rollout coverage rather than Cloudflare's full technical documentation, so specifics could shift by plan tier. It also does not solve the separate problem of bots that simply ignore robots-style signals altogether - Cloudflare's controls only work against crawlers that respect them, which is an enforcement question distinct from the policy question. A determined scraper that lies about who it is remains a harder problem than a checkbox can fix. See also AI Agents for background on how these live-fetching agent bots actually work.
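
The enforcement gap can be sketched in a few lines, assuming the simplest possible classifier: one that trusts the self-declared User-Agent header. The tokens and settings below are hypothetical, and real bot-management systems may rely on additional signals the reporting does not describe. The sketch only shows why a rule keyed on declared identity cannot catch a scraper that misdeclares.

```python
from typing import Optional

# Hypothetical tokens a classifier might trust; both names are illustrative.
DECLARED_CATEGORY = {
    "ExampleSearchBot": "search",
    "ExampleTrainBot": "training",
}
SETTINGS = {"search": True, "agent": True, "training": False}


def classify(user_agent: str) -> Optional[str]:
    """Return the category of the first known token in the header, else None."""
    for token, category in DECLARED_CATEGORY.items():
        if token in user_agent:
            return category
    return None


def decide(user_agent: str) -> str:
    """Apply the site's per-category switch to whatever the header claims."""
    category = classify(user_agent)
    if category is None:
        return "unclassified"  # would need other signals, not modeled here
    return "allow" if SETTINGS[category] else "block"


print(decide("ExampleTrainBot/2.0"))  # block: an honest training crawler
# A training scraper that lies about its identity sends the search token instead.
print(decide("ExampleSearchBot/1.0"))  # allow: the rule cannot see the lie
```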


Key questions

What changed in Cloudflare's AI bot controls?

As of July 1, 2026, Cloudflare split AI bot controls into three separate switches - search crawlers, live agent crawlers, and training crawlers - so site owners can allow one type while blocking another.

Will this cost extra to use?

No, the new controls are available on Cloudflare's free tier, not gated behind a paid plan.

When do training and agent bots get blocked by default?

Starting September 15, 2026, Cloudflare will block agent and training crawlers by default on ad-supported pages, while search crawlers remain allowed.

Cite this

APA

Ground Truth. (2026, July 4). Cloudflare Now Lets Sites Block AI Training Bots While Keeping Search. Ground Truth. https://groundtruth.day/news/cloudflare-splits-ai-crawlers-into-three-switches.html

BibTeX

@misc{groundtruth:cloudflare-splits-ai-crawlers-into-three-switches,
  title  = {Cloudflare Now Lets Sites Block AI Training Bots While Keeping Search},
  author = {{Ground Truth}},
  year   = {2026},
  month  = {jul},
  url    = {https://groundtruth.day/news/cloudflare-splits-ai-crawlers-into-three-switches.html}
}

Topics: Cloudflare · web · AI crawlers · publishers · policy