Web Crawler Directory & Bot Database
A database of 151 active web crawlers, AI bots, SEO tools, social bots, and scrapers used by the CrawlerCheck core engine. Get detailed profiles, copy robots.txt rules, and check whether your URL is allowed or blocked by each one.
Currently viewing 12 of 151 crawlers and bots
Filter by safety to see specific recommendations for each category.
AI2Bot is the crawler for the Allen Institute for AI (AI2). It collects data for open-source AI research and datasets.
A specific AI2 crawler used to build the Dolma dataset, a massive open corpus for training language models.
The web crawler for Big Sur AI, an e-commerce AI platform used to gather product data and market insights.
An agent associated with ChatGPT's operational tasks or specific plugin interactions.
ChatGPT-User is the user-agent used when a human user of ChatGPT explicitly asks the AI to browse a specific webpage.
Version 2.0 of the ChatGPT-User agent, representing updated browsing capabilities of the ChatGPT model.
Claude-SearchBot is a crawler used to index content specifically for search-related features within the Claude ecosystem.
Similar to Claude-Web, this agent represents direct user-initiated browsing requests from the Claude interface.
Claude-Web is used when a Claude AI user asks the model to visit a specific URL to answer a question.
The main crawler for Anthropic. It scrapes the web to build the training dataset for the Claude family of AI models.
A crawler used by DigitalOcean, likely for internal GenAI product testing or data gathering.
DuckAssistBot is used by DuckDuckGo for its AI-assisted search features, generating instant answers from specific sources.
Check URL for 151 crawlers and bots
Verify if the crawlers currently in view are allowed or blocked on a specific URL.
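The same allowed-or-blocked check can be reproduced locally with Python's standard-library robots.txt parser. This is a minimal sketch; the robots.txt content and the example URL are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot site-wide,
# leaves every other crawler unrestricted.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether a given user agent may fetch a given URL.
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))  # True
```

Note that `can_fetch` matches user-agent tokens case-insensitively, so it mirrors how well-behaved crawlers interpret your rules.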
⚠️ Caution: Advanced Configuration
Modifying your robots.txt file effectively controls who can access your website. Incorrect rules can accidentally de-index your entire site from search engines like Google. This tool generates valid syntax rules based on your selection. It does not analyze your specific website needs.
We strongly suggest testing any changes in Google Search Console or with CrawlerCheck before deploying to production.
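To illustrate how small the difference between a targeted rule and a site-wide de-indexing can be, compare these two hypothetical snippets. The first blocks a single bot; the second, with a wildcard user-agent, blocks every crawler, including Googlebot:

```
# Blocks only AhrefsBot:
User-agent: AhrefsBot
Disallow: /

# Blocks EVERY crawler, including Googlebot -- easy to add by accident:
User-agent: *
Disallow: /
```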
Generated robots.txt snippet for the 151 currently viewed bots.
Select one of the options below to Disallow or Allow the bots.
Generating rules to BLOCK all bots currently in the list.
Review the list above. We recommend blocking bots marked as 'Unsafe' and carefully evaluating the bots marked as 'Caution'.
This is a live generated robots.txt based on the filters you selected above.
User-agent: ai2bot
User-agent: ai2bot-dolma
User-agent: big-sur-ai
User-agent: chatgpt-operator
User-agent: chatgpt-user
User-agent: chatgpt-user-2-0
User-agent: claude-searchbot
User-agent: claude-user
User-agent: claude-web
User-agent: claudebot
User-agent: digitaloceangenai-crawler
User-agent: duckassistbot
User-agent: gptbot
User-agent: google-extended
User-agent: grok
User-agent: liner-bot
User-agent: mistralai-user
User-agent: mistralai-user-1-0
User-agent: mycentralaiscraperbot
User-agent: oai-searchbot
User-agent: perplexity-user
User-agent: perplexity-user-1-0
User-agent: perplexitybot
User-agent: sbintuitionsbot
User-agent: youbot
User-agent: anthropic-ai
User-agent: cohere-ai
User-agent: cohere-training-data-crawler
User-agent: img2dataset
User-agent: quillbot-com
User-agent: aliyunsecbot
User-agent: amazonbot
User-agent: google-cloudvertexbot
User-agent: google-inspectiontool
User-agent: googleother
User-agent: googleother-image
User-agent: googleother-video
User-agent: googlebot-discovery
User-agent: googlebot-image
User-agent: googlebot-news
User-agent: googlebot-video
User-agent: storebot-google
User-agent: turnitinbot
User-agent: viennatinybot
User-agent: archive-org-bot
User-agent: ia-archiver
User-agent: ia-archiver-web-archive-org
User-agent: peer39-crawler-1-0
User-agent: ahrefsbot
User-agent: audigentadbot
User-agent: awariorssbot
User-agent: awariosmartbot
User-agent: brandwatch
User-agent: chrome-lighthouse
User-agent: dataforseobot
User-agent: dotbot
User-agent: google-page-speed-insights
User-agent: jetslide
User-agent: mj12bot
User-agent: meltwater
User-agent: neticlebot
User-agent: netvibes
User-agent: screaming-frog-seo-spider
User-agent: searchmetricsbot
User-agent: semrushbot
User-agent: semrushbot-ocob
User-agent: semrushbotswa
User-agent: sidetrade-indexer-bot
User-agent: peer39-crawler
User-agent: amazon-kendra
User-agent: arquivo-web-crawler
User-agent: blexbot
User-agent: barkrowler
User-agent: bravest
User-agent: bytespider
User-agent: ccbot
User-agent: cotoyogi
User-agent: crawlspace
User-agent: diffbot
User-agent: echobot-bot
User-agent: echoboxbot
User-agent: factset-spyderbot
User-agent: friendlycrawler
User-agent: icc-crawler
User-agent: isscyberriskcrawler
User-agent: imagesiftbot
User-agent: jenkersbot
User-agent: kangaroo-bot
User-agent: livelapbot
User-agent: mauibot
User-agent: moodlebot
User-agent: newsnow
User-agent: novaact
User-agent: pangubot
User-agent: poseidon-research-crawler
User-agent: qualifiedbot
User-agent: scrapy
User-agent: seekportbot
User-agent: seekr
User-agent: seekrbot
User-agent: taragroup-intelligent-bot
User-agent: timpibot
User-agent: turnitin
User-agent: velenpublicwebcrawler
User-agent: webzio-extended
User-agent: coccocbot-web
User-agent: crawler4j
User-agent: hada-news
User-agent: iaskspider
User-agent: iaskspider-2-0
User-agent: imediaethics-org
User-agent: imgproxy
User-agent: magpie-crawler
User-agent: netestate-imprint-crawler
User-agent: news-please
User-agent: omgili
User-agent: omgilibot
User-agent: yacy
User-agent: yacybot
User-agent: applebot
User-agent: applebot-extended
User-agent: aspiegelbot
User-agent: baiduspider
User-agent: bingbot
User-agent: duckduckbot
User-agent: googlebot
User-agent: mojeek
User-agent: mojeekbot
User-agent: petalbot
User-agent: seznamhomepagecrawler
User-agent: slurp
User-agent: teoma
User-agent: yahoo-blogs
User-agent: yahoo-feedseeker
User-agent: yahoo-mmcrawler
User-agent: yahooseeker
User-agent: yandex
User-agent: yandexadditional
User-agent: yandexadditionalbot
User-agent: yandexbot
User-agent: baidu
User-agent: facebookbot
User-agent: meta-externalagent
User-agent: meta-externalfetcher
User-agent: pinterestbot
User-agent: quora-bot
User-agent: slackbot
User-agent: twitterbot
User-agent: facebookexternalhit
User-agent: meta-externalagent-lowercase
User-agent: meta-externalfetcher-lowercase
Disallow: /
Copy and paste these rules into your website's robots.txt file to block the identified bots.
Resource & Impact Analysis
Managing bot traffic is about more than just security. It's about optimizing your infrastructure and protecting your digital assets. Unchecked crawler activity can have significant downstream effects on your website's performance and business metrics.
📉 Server Load & Bandwidth
Every request from a bot consumes CPU cycles, RAM, and bandwidth. Aggressive scrapers can simulate a DDoS attack, slowing down your site for real human users and increasing your hosting costs, especially on metered cloud platforms.
💰 Crawl Budget Waste
Search engines like Google assign a "Crawl Budget" to your site: a limit on how many pages they will crawl in a given timeframe. If low-value bots clog your server queues, Googlebot may reduce its crawl rate, delaying the indexing of your new content.
🤖 AI & Data Privacy
Modern AI bots (like GPTBot and CCBot) scrape your content to train Large Language Models. While not malicious, they use your intellectual property without providing traffic back. Blocking them lets you opt out of having your data used for AI training.
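As an example, a minimal opt-out targeting the AI training crawlers mentioned above might look like the snippet below. The tokens GPTBot (OpenAI), CCBot (Common Crawl), ClaudeBot (Anthropic), and Google-Extended (Google's AI-training control) are the user-agent names published by the respective vendors; search crawlers such as Googlebot are unaffected by these rules.

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```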
🕵️ Competitive Intelligence
Many "SEO Tools" and commercial scrapers are used by competitors to monitor your pricing, copy your content strategy, or analyze your site structure. Restricting these bots protects your business intelligence.
Understanding Web Crawlers & Bots
Web crawlers (also known as spiders or bots) are automated software programs that browse the internet. CrawlerCheck classifies them into distinct categories to help you decide which ones to allow and which to block.
Search Engine Bots
Bots like Googlebot and Bingbot are essential for your website's visibility. They index your content so it appears in search results. Blocking these will remove your site from search engines.
AI Data Scrapers
Bots like GPTBot (OpenAI), ClaudeBot (Anthropic), and PerplexityBot (Perplexity AI) crawl the web to collect data for training Large Language Models (LLMs). Blocking them prevents your content from being used to train AI, but does not affect your search rankings.
SEO Tools & Scrapers
Marketing tools like Ahrefs and Semrush scan your site to analyze backlinks and SEO health. While useful for SEO audits, aggressive scrapers can consume server bandwidth and impact performance.
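If you want to throttle an SEO crawler rather than block it outright, the non-standard Crawl-delay directive is one option. Support varies by bot: AhrefsBot documents that it honors it, while Googlebot ignores the directive entirely, so treat this as a best-effort request.

```
# Ask AhrefsBot to wait 10 seconds between requests.
# Crawl-delay is non-standard; not all bots respect it.
User-agent: AhrefsBot
Crawl-delay: 10
```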
Support the Project
You are invited to leave a review and suggest improvements to CrawlerCheck and the Directory. With your help, we can make them even better.