Scrapers Directory & Bot Database

Browse 50 active scrapers in our database. Get detailed profiles, copy robots.txt rules, and check whether your URL allows or blocks them.


Currently viewing 12 of 50 crawlers and bots

What do these safety ratings mean?

Filter by safety to see specific recommendations for each category.

Scrapers

Amazon Kendra is an intelligent enterprise search service powered by machine learning.

Scrapers

The web crawler for Arquivo.pt, the Portuguese web archive, preserving the history of the Portuguese web.

BLEXBot
Unsafe
Scrapers

BLEXBot is a crawler for WebMeUp, an SEO tool. It is often considered aggressive and of low value to webmasters.

Barkrowler
Caution
Scrapers

Barkrowler is a crawler used by various research projects and datasets.

Bravest
Safe
Scrapers

The web crawler for Brave Software, used to support Brave Search and AI initiatives.

Bytespider
Unsafe
Scrapers

The aggressive web crawler operated by ByteDance (parent company of TikTok). It is known for high crawl rates and is pri...

CCBot
Caution
Scrapers

CCBot is the crawler for Common Crawl, a non-profit that scrapes the web to provide open datasets. While useful for rese...

Cotoyogi
Unsafe
Scrapers

A crawler service. Details are sparse, but it is often seen in server logs.

Crawlspace
Caution
Scrapers

A crawler associated with digital archiving or data preservation.

Diffbot
Caution
Scrapers

Diffbot uses computer vision and NLP to extract structured data from web pages.

Echobot Bot
Caution
Scrapers

The web crawler for Echobot (now Dealfront), a sales intelligence platform used to gather business data.

EchoboxBot
Caution
Scrapers

EchoboxBot is used by Echobox, a social publishing automation tool for publishers.

Check URL for Scrapers

Verify if the crawlers currently in view are allowed or blocked on a specific URL.
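A check like this can be approximated offline with Python's standard `urllib.robotparser`. The sketch below is not how CrawlerCheck itself works; the helper names `check_bots` and `blocked_percentage`, the sample rules, and the example URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def check_bots(robots_txt: str, url: str, bots: list[str]) -> dict[str, bool]:
    """Map each bot token to True if the rules allow it to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


def blocked_percentage(results: dict[str, bool]) -> float:
    """Share of checked bots that are blocked, as a percentage."""
    if not results:
        return 0.0
    blocked = sum(1 for allowed in results.values() if not allowed)
    return 100.0 * blocked / len(results)


# Hypothetical robots.txt content that blocks two of the listed bots.
SAMPLE_RULES = """\
User-agent: blexbot
Disallow: /

User-agent: bytespider
Disallow: /
"""

results = check_bots(
    SAMPLE_RULES,
    "https://example.com/pricing",
    ["BLEXBot", "Bytespider", "CCBot"],
)
for bot, allowed in results.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
print(f"{blocked_percentage(results):.0f}% blocked")
```

A bot with no matching group and no `User-agent: *` fallback is treated as allowed, which is why CCBot passes in this sample.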


⚠️ Caution: Advanced Configuration

Modifying your robots.txt file changes which crawlers are asked to stay away from your website. Incorrect rules can accidentally de-index your entire site from search engines like Google. This tool generates syntactically valid rules based on your selection; it does not analyze your specific website's needs.

We strongly suggest testing any changes in Google Search Console or with CrawlerCheck before deploying to production.

Generated robots.txt snippet for the currently viewed bots (50)

Select one of the options below to Disallow or Allow the bots.

Generating rules to BLOCK all bots currently in the list.

Review the list above. We recommend blocking bots marked as 'Unsafe' and carefully evaluating the bots marked as 'Caution'.

This is a live generated robots.txt based on the filters you selected above.

User-agent: amazon-kendra
Disallow: /
User-agent: arquivo-web-crawler
Disallow: /
User-agent: blexbot
Disallow: /
User-agent: barkrowler
Disallow: /
User-agent: bravest
Disallow: /
User-agent: bytespider
Disallow: /
User-agent: ccbot
Disallow: /
User-agent: cotoyogi
Disallow: /
User-agent: crawlspace
Disallow: /
User-agent: diffbot
Disallow: /
User-agent: echobot-bot
Disallow: /
User-agent: echoboxbot
Disallow: /
User-agent: factset-spyderbot
Disallow: /
User-agent: friendlycrawler
Disallow: /
User-agent: icc-crawler
Disallow: /
User-agent: isscyberriskcrawler
Disallow: /
User-agent: imagesiftbot
Disallow: /
User-agent: jenkersbot
Disallow: /
User-agent: kangaroo-bot
Disallow: /
User-agent: livelapbot
Disallow: /
User-agent: mauibot
Disallow: /
User-agent: moodlebot
Disallow: /
User-agent: newsnow
Disallow: /
User-agent: novaact
Disallow: /
User-agent: pangubot
Disallow: /
User-agent: poseidon-research-crawler
Disallow: /
User-agent: qualifiedbot
Disallow: /
User-agent: scrapy
Disallow: /
User-agent: seekportbot
Disallow: /
User-agent: seekr
Disallow: /
User-agent: seekrbot
Disallow: /
User-agent: taragroup-intelligent-bot
Disallow: /
User-agent: timpibot
Disallow: /
User-agent: turnitin
Disallow: /
User-agent: velenpublicwebcrawler
Disallow: /
User-agent: webzio-extended
Disallow: /
User-agent: coccocbot-web
Disallow: /
User-agent: crawler4j
Disallow: /
User-agent: hada-news
Disallow: /
User-agent: iaskspider
Disallow: /
User-agent: iaskspider-2-0
Disallow: /
User-agent: imediaethics-org
Disallow: /
User-agent: imgproxy
Disallow: /
User-agent: magpie-crawler
Disallow: /
User-agent: netestate-imprint-crawler
Disallow: /
User-agent: news-please
Disallow: /
User-agent: omgili
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: yacy
Disallow: /
User-agent: yacybot
Disallow: /

Copy and paste these rules into your website's robots.txt file to block the identified bots.
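Rules in this shape are simple to produce from a list of bot tokens. The following is a minimal sketch, not the generator used on this page; the function name `build_rules` and its `allow` flag are hypothetical. It lowercases tokens to match the listing above, which is safe because crawlers are expected to match user-agent tokens case-insensitively.

```python
def build_rules(tokens: list[str], allow: bool = False) -> str:
    """Build one robots.txt group per bot token, blocking or allowing the whole site."""
    directive = "Allow" if allow else "Disallow"
    seen: set[str] = set()
    groups: list[str] = []
    for token in tokens:
        key = token.strip().lower()
        if not key or key in seen:
            continue  # skip blanks and duplicates
        seen.add(key)
        groups.append(f"User-agent: {key}\n{directive}: /")
    if not groups:
        return ""
    # A blank line between groups keeps the file easy to read.
    return "\n\n".join(groups) + "\n"


print(build_rules(["BLEXBot", "Bytespider", "blexbot"]))
```

The duplicate `blexbot` entry is dropped, so each bot gets exactly one group.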

Resource & Impact Analysis

Managing bot traffic is about more than just security. It's about optimizing your infrastructure and protecting your digital assets. Unchecked crawler activity can have significant downstream effects on your website's performance and business metrics.

📉 Server Load & Bandwidth

Every request from a bot consumes CPU cycles, RAM, and bandwidth. Aggressive scrapers can simulate a DDoS attack, slowing down your site for real human users and increasing your hosting costs, especially on metered cloud platforms.
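One way to see how much load specific bots generate is to count their requests in your access log. The sketch below assumes a combined log format in which the user agent is the last quoted field; the function name `count_bot_hits` and the sample log lines are illustrative, not taken from a real server.

```python
import re
from collections import Counter

# Assumption: the user agent is the final double-quoted field on each line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(log_lines, tokens):
    """Count requests whose user agent contains one of the given bot tokens."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # attribute each request to at most one bot
    return hits


# Illustrative log lines; addresses and user-agent strings are made up.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /a HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /b HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0)"',
    '192.0.2.9 - - [01/Jan/2025:00:00:04 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"',
]

hits = count_bot_hits(SAMPLE_LOG, ["Bytespider", "BLEXBot", "CCBot"])
for bot, count in hits.most_common():
    print(f"{bot}: {count} requests")
```

Substring matching on the user agent is only a heuristic, since any client can send any user-agent string.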

💰 Crawl Budget Waste

Search engines like Google assign a "Crawl Budget" to your site: a limit on how many pages they will crawl in a given timeframe. If low-value bots clog your server queues, Googlebot may reduce its crawl rate, delaying the indexing of your new content.

🤖 AI & Data Privacy

Modern AI bots (like GPTBot and CCBot) scrape your content to train Large Language Models. While not malicious, they use your intellectual property without providing traffic back. Blocking them allows you to opt out of having your data used for AI training.

🕵️ Competitive Intelligence

Many "SEO Tools" and commercial scrapers are used by competitors to monitor your pricing, copy your content strategy, or analyze your site structure. Restricting these bots protects your business intelligence.

Understanding Web Crawlers & Bots

Web crawlers (also known as spiders or bots) are automated software programs that browse the internet. CrawlerCheck classifies them into distinct categories to help you decide which ones to allow and which to block.

Search Engine Bots

Bots like Googlebot and Bingbot are essential for your website's visibility. They index your content so it appears in search results. Blocking them can remove your site from search results.

AI Data Scrapers

Bots like GPTBot (OpenAI), ClaudeBot (Anthropic), and PerplexityBot (Perplexity AI) crawl the web to collect data for training Large Language Models (LLMs). Blocking them prevents your content from being used to train AI, but does not affect your search rankings.

SEO Tools & Scrapers

Marketing tools like Ahrefs and Semrush scan your site to analyze backlinks and SEO health. While useful for SEO audits, aggressive scrapers can consume server bandwidth and impact performance.
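The three categories above can be applied to raw user-agent strings with a simple lookup. This is a rough sketch, not CrawlerCheck's classifier: the `CATEGORIES` table and `classify` function are hypothetical, and the table covers only the bots named in this section (with `AhrefsBot` and `SemrushBot` as the user-agent tokens for Ahrefs and Semrush).

```python
# Hypothetical lookup table limited to the bots named in this section.
CATEGORIES = {
    "Search Engine Bots": ("googlebot", "bingbot"),
    "AI Data Scrapers": ("gptbot", "claudebot", "perplexitybot"),
    "SEO Tools & Scrapers": ("ahrefsbot", "semrushbot"),
}


def classify(user_agent: str) -> str:
    """Return the category whose token appears in the user agent, else 'Unknown'."""
    ua = user_agent.lower()
    for category, tokens in CATEGORIES.items():
        if any(token in ua for token in tokens):
            return category
    return "Unknown"


for ua in (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; AhrefsBot/7.0)",
    "curl/8.0",
):
    print(f"{classify(ua)}: {ua}")
```

A real classifier would need a much larger table and, ideally, verification of the requesting IP address, since user-agent strings can be spoofed.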
