The ClaudeBot Split: Why Your Robots.txt is Probably Killing Your AI Visibility
- Adam Gold
- May 26
TL;DR: In May 2026, Anthropic split its web crawler into three distinct agents: ClaudeBot, Claude-User, and Claude-SearchBot. Many UK SMEs are accidentally blocking their AI visibility by using outdated "blanket" robots.txt rules. To secure citations in Claude’s search results, you must allow Claude-SearchBot and Claude-User while selectively managing ClaudeBot. This article explains how to configure your site to feed AI engines the right signals while protecting your training data.
Key Takeaways
The Trinity: Anthropic now uses three separate bots; blocking the wrong one kills your visibility in Claude's search answers.
Selective Access: Use specific robots.txt directives to allow search indexing while blocking AI model training.
IGS > Keywords: AI engines prioritise an Information Gain Score (IGS) over simple keyword density.
llms.txt Limitations: Currently, llms.txt is a developer preference file, not a primary ranking or citation signal.
Strategic Advantage: Only 14% of businesses track AI visibility, so early adopters can claim the majority of local AI search citations.
What is the Anthropic crawler split of May 2026?
Anthropic’s recent update has split its web crawling activity across three specialised user-agents: ClaudeBot for training, Claude-User for live user requests, and Claude-SearchBot for search indexing. This lets website owners decide, agent by agent, whether their content should train future models or simply appear as a cited source in live AI search results.
As a specialist in AI SEO services, I’ve seen first-hand how this shift is reshaping the landscape. For years, we treated "Claude" as a single entity. Now, it’s a tiered system. If you want your business to be the one Claude recommends when a user asks for a "web designer in Bedford," you have to understand which door you’re leaving open.
Defining the Three New User-Agents
ClaudeBot (Training): This bot collects data specifically to train and refine Anthropic's future Large Language Models (LLMs). It does not directly drive citations.
Claude-User (Live Retrieval): Triggered when a user asks a specific question. It fetches your page "on-demand" to provide a real-time answer with a link.
Claude-SearchBot (Search Indexer): The backbone of Claude’s native search engine. It indexes your site so Claude knows you exist before a user even asks.
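If you want to see which of the three agents is actually visiting your site, a rough first step is to label the User-Agent header in your server logs. The snippet below is a minimal sketch, not an official detection method: it assumes each crawler includes its name somewhere in the header, the function name classify_claude_agent is my own, and the sample header strings in the usage lines are made up for illustration.

```python
from typing import Optional

# Role labels follow the three-way split described in this article.
# Keys are lower-case name tokens; none is a substring of another.
CLAUDE_AGENT_ROLES = {
    "claudebot": "model training",
    "claude-user": "live retrieval",
    "claude-searchbot": "search indexing",
}


def classify_claude_agent(user_agent_header: str) -> Optional[str]:
    """Return the role of an Anthropic crawler, or None for other traffic.

    Assumption: the crawler's name appears somewhere in the header.
    """
    header = user_agent_header.lower()
    for token, role in CLAUDE_AGENT_ROLES.items():
        if token in header:
            return role
    return None


# Hypothetical header strings, for illustration only.
for sample in ("Mozilla/5.0 (compatible; ClaudeBot/1.0)",
               "Claude-User/1.0",
               "Googlebot/2.1"):
    print(sample, "->", classify_claude_agent(sample))
```

User-Agent headers can be spoofed, so treat the result as a hint about who is knocking rather than proof.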
Why are blanket robots.txt rules hiding your site from Claude?
Many businesses use a blanket "Disallow: /", either under the wildcard User-agent "*" or under a catch-all entry such as "Anthropic" or "ClaudeBot," which can inadvertently block the newer search and retrieval agents. This "scorched earth" approach can make your website invisible to Claude’s search capabilities, preventing your content from being cited as a trustworthy source in AI-generated answers.
I feel your pain: managing a robots.txt file used to be a "set and forget" task. But in 2026, it’s the frontline of your startup's SEO strategy. If you block everything, you aren't just protecting your data; you are opting out of the new search economy.
The Risk of the "All or Nothing" Approach
When you block the entire Anthropic family, you lose:
Citation Traffic: High-intent users who click the links inside Claude's answers.
Brand Authority: If your competitor is cited and you aren't, the AI implicitly flags them as the "expert."
Accuracy: Without access to your site, Claude might rely on outdated or third-party data to describe your services.

How to implement the perfect robots.txt for AI SEO?
To maximise AI visibility while controlling data usage, you should explicitly allow Claude-SearchBot and Claude-User while disallowing ClaudeBot if you wish to opt out of training. This setup ensures that your business remains eligible for citations and search inclusion without contributing your intellectual property to the model's underlying training set.
Here is the technical snippet I recommend for our web design and SEO services clients:
# Block the training bot (Optional)
User-agent: ClaudeBot
Disallow: /
# ALLOW the citation and search bots (Critical for GEO)
User-agent: Claude-User
Allow: /
User-agent: Claude-SearchBot
Allow: /
This simple distinction tells Anthropic: "You can show my business to your users, but you can't use my proprietary insights to build your next model for free." It’s a proactive way to maintain control over your digital assets.
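Before you deploy a change like this, it is worth checking that the file says what you think it says. The sketch below uses Python's standard-library robots.txt parser to compare the selective rules with a blanket wildcard block. Two assumptions to flag: the URL is a placeholder, and Python's parser is only a stand-in, so it confirms how a standards-following parser reads the file, not how Anthropic's crawlers behave.

```python
from urllib.robotparser import RobotFileParser

# The selective rules recommended in this article.
SELECTIVE_RULES = """\
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /
"""

# A blanket rule of the "all or nothing" kind.
BLANKET_RULES = """\
User-agent: *
Disallow: /
"""

CLAUDE_AGENTS = ("ClaudeBot", "Claude-User", "Claude-SearchBot")


def check_access(robots_txt, url="https://example.com/services"):
    """Map each Claude agent to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in CLAUDE_AGENTS}


selective = check_access(SELECTIVE_RULES)
blanket = check_access(BLANKET_RULES)

for label, result in (("selective", selective), ("blanket", blanket)):
    for agent, allowed in result.items():
        print(f"{label:9} {agent:17} {'allowed' if allowed else 'blocked'}")
```

With the selective rules, only ClaudeBot is reported as blocked; with the wildcard rule, all three agents are.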
Why is llms.txt a dev-tool rather than a citation signal?
While llms.txt has gained popularity as a way to provide AI-friendly summaries, it currently serves primarily as a "developer preference" file rather than a primary ranking or citation signal. AI search engines still rely on robots.txt for access control and sophisticated indexing crawlers to determine the relevance and authority of a page for a specific query.
Insider chatter is awash with talk about llms.txt. Think of it as a brochure left on the front desk: it’s helpful if a developer is looking for a quick summary, but the "inspectors" (the crawlers) want to see the whole building. You shouldn't ignore it, but you certainly shouldn't rely on it to drive your AI visibility.
The Role of llms.txt in 2026
Human-in-the-loop: It helps developers understand how to interact with your data.
Summarisation: It can provide a "shortcut" for AI agents, but it doesn't replace the need for high-quality, structured content.
Contextual Guardrails: It’s a great place to state your brand’s "rules of engagement" for AI systems.
Mastering the AI mix: Cosine Similarity vs Information Gain Score
AI visibility in 2026 is governed by two core concepts: Cosine Similarity (how well your content matches the user’s intent) and Information Gain Score (how much unique, new value you provide compared to other sources). To get cited, your content must be semantically close to the question but uniquely different from the "sea of sameness" found in top search results.
Cosine Similarity is a mathematical measure of how closely two pieces of text point in the same direction once an AI has plotted them as vectors on a multi-dimensional map. If a user asks a question, the AI looks for the content that sits "closest" to that query.
Information Gain Score (IGS) is the new king of the hill. If ten websites say the same thing, the AI will likely cite the one that adds a new perspective, a unique case study, or original data. This is why our AI SEO services focus on "Experience" and "Expertise": the human elements that AI can't just hallucinate.
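To make "closest" concrete, here is a minimal sketch of the calculation itself. The three-number vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a model, not from hand-picked values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings" for a query and two candidate pages.
query = [0.9, 0.1, 0.3]
page_on_topic = [0.8, 0.2, 0.4]
page_off_topic = [0.1, 0.9, 0.2]

print("on topic: ", round(cosine_similarity(query, page_on_topic), 3))
print("off topic:", round(cosine_similarity(query, page_off_topic), 3))
```

For non-negative vectors like these, a score near 1 means the page points the same way as the query and a score near 0 means it is unrelated. Here the on-topic page scores far higher than the off-topic one.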
How to achieve High Information Gain
Original Data: Share your own benchmarks or industry findings.
Contrarian Views: Challenge the "standard" advice with real-world examples.
Deep Specificity: Don't just say "we do SEO"; explain exactly how you handled the ClaudeBot split for a local UK startup.
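There is no public formula for an Information Gain Score that I can point you to, so treat the following purely as a toy proxy of my own: it measures what share of a page's distinct words do not appear in any competitor page. The function names and sample sentences are hypothetical, and no AI engine is claimed to score content this way. It simply illustrates why a "me too" page looks redundant next to a specific one.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def novelty_ratio(candidate, competitors):
    """Share of the candidate's distinct words that no competitor uses."""
    candidate_terms = tokenize(candidate)
    if not candidate_terms:
        return 0.0
    seen = set()
    for page in competitors:
        seen |= tokenize(page)
    return len(candidate_terms - seen) / len(candidate_terms)


# Hypothetical sample copy, for illustration only.
competitors = [
    "we offer seo services for small businesses",
    "seo services and web design for small businesses",
]
generic = "we offer seo services and web design"
specific = ("we audited robots txt rules for a bedford startup "
            "after the claudebot split")

print("generic: ", novelty_ratio(generic, competitors))
print("specific:", round(novelty_ratio(specific, competitors), 2))
```

The generic sentence adds no words the competitors lack, while most of the specific sentence is new. Word overlap is a crude stand-in for genuine insight, but the direction of travel is the same.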

The FCWD Quality Challenge: Half-Term Special
Since it's half-term here in the UK, I know many business owners are using this "quiet" week to catch up on the admin and marketing tasks that usually get pushed to the side.
If you're looking to supercharge your digital presence, I’m offering a special opportunity. Use the promo code FREEPROMO26 to get a free AI Visibility Audit. We’ll look at your robots.txt, check your citation rates across Claude and ChatGPT, and show you exactly where your competitors are outmanoeuvring you. Don't let the technical shifts of 2026 leave you behind while you're busy running your business.
About the Author
Adam Gold BA FCMI
Adam Gold is the founder of Full Circle Website Design Ltd, a UK digital marketing consultancy specialising in website design, SEO, and AI search visibility (GEO). As a Fellow of the Chartered Management Institute, Adam combines strategic marketing expertise with technical delivery. He helps UK SMEs transition from traditional search to the AI-first era, ensuring their businesses are not just ranked, but cited by the tools their customers are using every day. Follow Adam on LinkedIn.
Frequently Asked Questions
Does blocking ClaudeBot stop me from appearing in Claude's search?
No, provided you specifically allow Claude-SearchBot and Claude-User. Blocking ClaudeBot only prevents your content from being used to train Anthropic's future AI models; it does not remove you from their live search results if the other agents are permitted.
How do I know if my site has high Information Gain?
Content with high Information Gain typically includes original research, unique case studies, or perspectives not found in the top 10 search results. If your page provides the same facts as your competitors in a similar order, your Information Gain Score will likely be low.
Is llms.txt more important than robots.txt?
No. In 2026, robots.txt remains the standard way to tell AI crawlers what they may access. Well-behaved crawlers are expected to honour it, although compliance is voluntary rather than legally or technically enforced. While llms.txt is useful for providing AI-friendly summaries and instructions, it does not replace the access rules set in robots.txt.
What is the "Claude-User" bot used for?
The Claude-User user-agent is used for real-time web retrieval. When a user asks Claude a question that requires current information, this bot fetches the relevant pages to provide an answer with citations, directly driving referral traffic to your site.