Zypact Research · 8 min read

27% of Websites Silently Block ChatGPT — Is Yours One of Them?

We audited 217 B2B websites. More than 1 in 4 are blocking at least one major AI crawler — without knowing it.

Most website owners obsess over Google rankings. They check their PageSpeed score, fix broken links, and submit sitemaps. But in 2026, there's a new category of crawler you've probably never thought about: the bots that feed AI assistants. In our audit, 27% of B2B sites blocked at least one of them, and 14% blocked OpenAI's GPTBot specifically.

The best-known of those crawlers belong to ChatGPT.

And if ChatGPT can't read your website, it can't cite your brand when a potential buyer asks “what's the best [your category] tool?”


What Are LLM Crawlers?

LLM crawlers are automated bots that AI companies send to read and index web content. Just as Googlebot crawls your site to index it for search, AI companies send their own bots to gather training data and to power real-time citations. According to OpenAI's GPTBot documentation, GPTBot is used to “improve AI models” and “train them to be helpful, accurate, and safe.”

Crawler           AI System                         What It Does
GPTBot            OpenAI / ChatGPT                  Training data
OAI-SearchBot     ChatGPT Search                    Real-time web citations
ChatGPT-User      ChatGPT                           Live browsing during conversations
ClaudeBot         Anthropic / Claude                Training data
Claude-SearchBot  Claude                            Search citations
PerplexityBot     Perplexity                        Real-time citations
Google-Extended   Gemini                            AI training data (a robots.txt token, not a separate bot)
Googlebot         Google Search / AI Overviews      Search indexing and AI-generated answers

If any of these bots can't access your site, your brand becomes invisible in that AI system's answers.


Why Are Sites Blocking AI Crawlers?

Most AI crawler blocks are accidental. In our audit of 217 B2B websites, bot protection systems, robots.txt rules, and hosting-level rate limits accounted for nearly all crawlability issues — with the site owner unaware in every case.

1. Cloudflare and CDN Bot Protection

Many websites use Cloudflare's “Bot Fight Mode” or similar WAF rules to block malicious bots. The problem? These rules often classify AI crawlers as suspicious traffic. According to Cloudflare's bot management documentation, bot scores are assigned based on behavioral analysis, and a low score means the request is likely automated. AI crawlers, which fetch pages rapidly and systematically, often receive low scores that trigger automatic blocks.

GPTBot, ClaudeBot, and PerplexityBot are relatively new. Many Cloudflare rule sets were written before they existed — so they get caught in the net.

2. robots.txt Rules

The robots.txt file tells crawlers what they can and can't access. Many sites have rules like:

User-agent: *
Disallow: /

This tells every crawler that honors robots.txt to stay out, AI crawlers included. Some sites added specific blocks after OpenAI launched GPTBot in 2023, without realizing other AI crawlers would follow. Anthropic's documentation explicitly states that ClaudeBot respects robots.txt directives.
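To see how a given robots.txt reads to each AI crawler, a minimal sketch using Python's standard-library parser is shown here. The function name `disallowed_crawlers` and the example.com URL are illustrative assumptions, and real crawlers use their own parsers, so treat the result as an approximation rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens taken from the table earlier in this article.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
               "Claude-SearchBot", "PerplexityBot", "Google-Extended"]


def disallowed_crawlers(robots_txt: str,
                        url: str = "https://example.com/") -> list[str]:
    """Return the crawler tokens that this robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in AI_CRAWLERS if not parser.can_fetch(name, url)]


# The blanket rule shown above shuts out every crawler in the list.
blanket = "User-agent: *\nDisallow: /\n"
print(disallowed_crawlers(blanket))

# A GPTBot-only block leaves the other crawlers untouched.
gptbot_only = "User-agent: GPTBot\nDisallow: /\n"
print(disallowed_crawlers(gptbot_only))  # ['GPTBot']
```

Pasting your own robots.txt content into the function gives a quick first read on which crawlers it turns away.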

3. Rate Limiting and IP Blocking

If your hosting provider rate-limits unknown user agents, AI crawlers may get blocked before they can read your content. They hit your site, get a 429 or 403 error, and move on — never indexing your pages.


The 27% Problem — Our Data

Between March and May 2026, we audited 217 B2B websites across the US, Canada, and UK using Zypact's LLM Crawler Audit. Every site was tested with real HTTP requests that presented the user agent of each of the 8 major AI crawlers.

Finding                              Result
Block at least 1 LLM crawler         27% of sites
Block GPTBot specifically            14% of sites
Block PerplexityBot                  11% of sites
Block all AI crawlers entirely       8% of sites
Average TTFB (accessible sites)      187ms

Methodology: Each site was tested with direct HTTP requests simulating each crawler's user agent string, cross-referenced with robots.txt parsing. A site was classified as “blocked” for a crawler if it returned HTTP 403 or 429 to that user agent, or if its robots.txt contained a Disallow rule covering that user agent. CDN-level blocks were identified through content-length analysis.
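The classification rule in the methodology can be sketched as a small function. This is an illustration of the stated rule only, not Zypact's actual implementation: the names `classify` and `blocks_any` are hypothetical, and the content-length analysis used for CDN-level blocks is not covered.

```python
def classify(status_code: int, robots_allows: bool) -> str:
    """Label one (site, crawler) pair using the rule stated in the methodology."""
    if not robots_allows:
        return "blocked: robots.txt Disallow"
    if status_code in (403, 429):
        return f"blocked: HTTP {status_code}"
    return "accessible"


def blocks_any(results: dict[str, str]) -> bool:
    """True if at least one crawler is blocked (the article's headline metric)."""
    return any(label.startswith("blocked") for label in results.values())


# Made-up inputs for a single site, purely for illustration.
site = {
    "GPTBot": classify(200, robots_allows=False),
    "ClaudeBot": classify(403, robots_allows=True),
    "PerplexityBot": classify(200, robots_allows=True),
}
print(site)
print(blocks_any(site))  # True
```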

The companies with blocked crawlers? They're still investing in SEO, publishing content, and wondering why they never appear in AI-generated answers for their category. The answer is sitting in their robots.txt or Cloudflare settings.


How to Check If Your Site Is Blocked

The manual way is painful. You'd need to simulate HTTP requests from each bot's user agent, check your robots.txt, and test your CDN configuration.
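For readers who want to try the manual route, here is a rough sketch of a single probe using only the Python standard library. The user agent strings are simplified placeholders rather than the vendors' full published strings, and the helper names `build_request` and `probe` are assumptions. A request sent from your own machine also will not come from the crawler's real IP ranges, so a firewall that verifies IPs may treat it differently from the genuine bot.

```python
import time
import urllib.error
import urllib.request

# Simplified placeholder strings; check each vendor's documentation for the
# exact user agent it currently sends.
USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that presents the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe(url: str, user_agent: str, timeout: float = 10.0) -> tuple[int, float]:
    """Return (HTTP status, approximate seconds until the first body byte)."""
    request = build_request(url, user_agent)
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read(1)  # wait for the first byte of the body
            return response.status, time.perf_counter() - start
    except urllib.error.HTTPError as err:
        # 403 and 429 responses arrive here; they are the block signals.
        return err.code, time.perf_counter() - start


if __name__ == "__main__":
    for name, agent in USER_AGENTS.items():
        status, seconds = probe("https://example.com/", agent)
        print(f"{name}: HTTP {status}, about {seconds * 1000:.0f}ms")
```

The timing includes DNS lookup and connection setup, so it is only a rough stand-in for TTFB. Connection failures raise `urllib.error.URLError` and are left unhandled in this sketch.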

Or you can use Zypact's free LLM Crawler Audit — it tests all 8 major AI crawlers in 20 seconds.

Here's what a clean audit looks like:

✅ GPTBot          — accessible · TTFB 32ms · HTTP 200
✅ OAI-SearchBot   — accessible · TTFB 44ms · HTTP 200
✅ ClaudeBot       — accessible · TTFB 28ms · HTTP 200
✅ PerplexityBot   — accessible · TTFB 51ms · HTTP 200
✅ Google-Extended — accessible · TTFB 38ms · HTTP 200

Crawlability Score: 100/100

Here's what a blocked site looks like:

❌ GPTBot          — BLOCKED · robots.txt Disallow
❌ ClaudeBot       — BLOCKED · CDN/WAF rule
✅ PerplexityBot   — accessible · TTFB 284ms · HTTP 200

Crawlability Score: 37/100

How to Fix It

Fix 1 — Update your robots.txt

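Check your robots.txt (visit yourdomain.com/robots.txt) and add an explicit allow group for each AI crawler. A compliant crawler follows the most specific group that names it, so these groups take precedence over a blanket User-agent: * rule: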

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
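Before deploying, it can help to confirm that the new groups behave as intended next to whatever rules the file already contains. The sketch below generates the groups listed above and checks them against a blanket disallow with Python's standard-library parser. The helper `allow_groups`, the `SomeOtherBot` token, and the example.com URL are illustrative assumptions, and each crawler's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
               "Claude-SearchBot", "PerplexityBot", "Google-Extended"]


def allow_groups(names: list[str]) -> str:
    """Build one 'User-agent / Allow' group per crawler, separated by blank lines."""
    return "\n".join(f"User-agent: {name}\nAllow: /\n" for name in names)


# Assume the site already ships a blanket block and the new groups are appended.
existing = "User-agent: *\nDisallow: /\n"
combined = existing + "\n" + allow_groups(AI_CRAWLERS)

parser = RobotFileParser()
parser.parse(combined.splitlines())

# The named crawlers should print True; the unnamed bot falls back to the
# blanket rule and should print False.
for name in AI_CRAWLERS + ["SomeOtherBot"]:
    print(name, parser.can_fetch(name, "https://example.com/pricing"))
```

If a named crawler still comes back as disallowed, look for an older group in the file that mentions the same user agent with a Disallow rule.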

Fix 2 — Review Your Cloudflare Settings

  1. Go to Security → Bots
  2. Turn off “Bot Fight Mode”, or, if your plan includes Super Bot Fight Mode, limit blocking to “Definitely automated” traffic only
  3. Add firewall rules to explicitly allow AI crawler user agents
  4. Check your WAF rules for any broad bot blocks

Fix 3 — Check Your Hosting Provider

Some managed hosting providers (WP Engine, Kinsta, SiteGround) have aggressive bot protection enabled by default. Contact your provider and ask them to whitelist AI crawler user agents.


Does Crawlability Guarantee AI Citations?

No — but it's the prerequisite.

Think of it like this: if Googlebot can't crawl your site, you won't rank on Google. Crawlability is the floor, not the ceiling.

Once AI crawlers can access your content, you still need to optimize it for AI citation. That's where GEO Score comes in — measuring whether your content structure makes it easy for LLMs to extract and cite your expertise.

But none of that matters if the front door is locked.


Frequently Asked Questions

What is GPTBot?

GPTBot is the web crawler operated by OpenAI. It crawls websites to collect training data for OpenAI's models; real-time citations in ChatGPT Search are handled by a separate crawler, OAI-SearchBot. Its user agent string contains the token GPTBot (for example, GPTBot/1.0). OpenAI provides an official GPTBot documentation page with instructions for site owners.

Does blocking GPTBot affect my SEO on Google?

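No. GPTBot is entirely separate from Googlebot. Blocking GPTBot has no impact on your Google search rankings. However, it keeps your content out of the data GPTBot collects for OpenAI's models, which limits your visibility in ChatGPT; citations in ChatGPT Search additionally depend on OAI-SearchBot.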

Should I allow all AI crawlers on my site?

Generally yes, if you want to appear in AI-generated answers. The exception is if you have proprietary or paywalled content — in which case selective blocking by URL path makes more sense than blanket blocks.

Can ChatGPT cite pages it cannot crawl?

Only if the content was included in its training data before the block was implemented. For real-time citations (ChatGPT Search, Perplexity), the crawler must be able to access the page at citation time.

How often do AI crawlers revisit websites?

Based on observations from our crawler audits, high-authority sites are typically revisited within weeks rather than months. Lower-traffic sites may be crawled less frequently — the exact cadence varies by AI system and is not publicly documented.

What is the difference between GPTBot and OAI-SearchBot?

GPTBot primarily collects training data. OAI-SearchBot powers real-time citations in ChatGPT Search. Both need access to your site for full ChatGPT visibility.


Check Your Site in 20 Seconds

Zypact's LLM Crawler Audit is free, instant, and requires no signup.

It tests all 8 major AI crawlers and gives you a Crawlability Score with specific recommendations for any issues found.

Test your site on Zypact — it's free

The brands showing up in ChatGPT answers right now didn't get lucky. They made sure the door was open.

Zypact is an AI visibility platform that helps B2B brands track and improve their presence in ChatGPT, Perplexity, and Gemini. Start for free →
