
    Complete Crawler List For AI User-Agents [Dec 2025]

    By Awais, December 6, 2025

    AI visibility plays a crucial role for SEOs, and this starts with controlling AI crawlers. If AI crawlers can’t access your pages, you’re invisible to AI discovery engines.

    On the flip side, unmonitored AI crawlers can overwhelm servers with excessive requests, causing crashes and unexpected hosting bills.

    User-agent strings are essential for controlling which AI crawlers can access your website, but official documentation is often outdated, incomplete, or missing entirely. So, we curated a verified list of AI crawlers from our actual server logs as a useful reference.

    Every user-agent is validated against official IP lists when available, ensuring accuracy. We will maintain and update this list to catch new crawlers and changes to existing ones.

    The Complete Verified AI Crawler List (December 2025)

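    Each entry lists the crawler’s purpose, its crawl rate on Search Engine Journal (SEJ) in pages per hour, whether an official IP list is available for verification, an example robots.txt rule, and the complete user-agent string.
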
    GPTBot
    • Purpose: AI training data collection for GPT models (ChatGPT, GPT-4o)
    • Crawl rate on SEJ: 100 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: GPTBot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)

    ChatGPT-User
    • Purpose: AI agent for real-time web browsing when users interact with ChatGPT
    • Crawl rate on SEJ: 2,400 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: ChatGPT-User
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

    OAI-SearchBot
    • Purpose: AI search indexing for ChatGPT search features (not for training)
    • Crawl rate on SEJ: 150 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: OAI-SearchBot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot

    ClaudeBot
    • Purpose: AI training data collection for Claude models
    • Crawl rate on SEJ: 500 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: ClaudeBot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

    Claude-User
    • Purpose: AI agent for real-time web access when Claude users browse
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: Claude-User
        Disallow: /sample-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-User/1.0; +Claude-User@anthropic.com)

    Claude-SearchBot
    • Purpose: AI search indexing for Claude search capabilities
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: Claude-SearchBot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +https://www.anthropic.com)

    Google-CloudVertexBot
    • Purpose: AI agent for Vertex AI Agent Builder (crawls only at the site owner’s request)
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: Google-CloudVertexBot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Google-CloudVertexBot; +https://cloud.google.com/enterprise-search)

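    Google-Extended
    • Purpose: robots.txt token controlling AI training usage of Googlebot-crawled content (not a separate crawler, so no crawl rate, IP list, or user-agent string applies)
    • robots.txt example:
        User-agent: Google-Extended
        Allow: /
        Disallow: /private-folder
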
    Gemini-Deep-Research
    • Purpose: AI research agent for Google Gemini’s Deep Research feature
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: Gemini-Deep-Research
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research; +https://gemini.google/overview/deep-research/) Chrome/135.0.0.0 Safari/537.36

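    Gemini chat (no crawler name listed)
    • Purpose: Google Gemini’s chat when a user asks to open a webpage
    • Crawl rate on SEJ: <10 pages/hour
    • Complete user agent: Google
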
    Bingbot
    • Purpose: Powers Bing Search and Bing Chat (Copilot) AI answers
    • Crawl rate on SEJ: 1,300 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: BingBot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36

    Applebot-Extended
    • Purpose: Doesn’t crawl; controls how Apple uses data collected by Applebot
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: Applebot-Extended
        Allow: /
        Disallow: /private-folder
    • Complete user agent (requests arrive as Applebot): Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)

    PerplexityBot
    • Purpose: AI search indexing for Perplexity’s answer engine
    • Crawl rate on SEJ: 150 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: PerplexityBot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

    Perplexity-User
    • Purpose: AI agent for real-time browsing when Perplexity users request information
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: Perplexity-User
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)

    Meta-ExternalAgent
    • Purpose: AI training data collection for Meta’s LLMs (Llama, etc.)
    • Crawl rate on SEJ: 1,100 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: meta-externalagent
        Allow: /
        Disallow: /private-folder
    • Complete user agent: meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

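    Meta-WebIndexer
    • Purpose: Used to improve Meta AI search
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: Meta-WebIndexer
        Allow: /
        Disallow: /private-folder
    • Complete user agent: meta-webindexer/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
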
    Bytespider
    • Purpose: AI training data for ByteDance’s LLMs for products like TikTok
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: Bytespider
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/)

    Amazonbot
    • Purpose: AI training for Alexa and other Amazon AI services
    • Crawl rate on SEJ: 1,050 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: Amazonbot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36

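    DuckAssistBot
    • Purpose: AI search indexing for the DuckDuckGo search engine
    • Crawl rate on SEJ: 20 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: DuckAssistBot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html)
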
    MistralAI-User
    • Purpose: Mistral’s real-time citation fetcher for the “Le Chat” assistant
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: MistralAI-User
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MistralAI-User/1.0; +https://docs.mistral.ai/robots)

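    Webz.io
    • Purpose: Data extraction and web scraping used by other AI training companies (formerly known as Omgili)
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: webzio
        Allow: /
        Disallow: /private-folder
    • Complete user agent: webzio (+https://webz.io/bot.html)
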
    Diffbot
    • Purpose: Data extraction and web scraping used by companies all over the world
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: Diffbot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)

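    ICC-Crawler
    • Purpose: AI and machine learning data collection
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Not available
    • robots.txt example:
        User-agent: ICC-Crawler
        Allow: /
        Disallow: /private-folder
    • Complete user agent: ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html)
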
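    CCBot
    • Purpose: Crawler for Common Crawl’s open web archive, which multiple AI companies use as training data
    • Crawl rate on SEJ: <10 pages/hour
    • Verified IP list: Official IP list
    • robots.txt example:
        User-agent: CCBot
        Allow: /
        Disallow: /private-folder
    • Complete user agent: CCBot/2.0 (https://commoncrawl.org/faq/)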

    The user-agent strings above have all been verified against Search Engine Journal server logs.
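
    Each example rule pairs "Allow: /" with a narrower "Disallow" path. Under the robots.txt standard (RFC 9309), the most specific (longest) matching rule wins, and Allow wins a tie, so the crawler may fetch everything except the disallowed folder. The Python sketch below illustrates that precedence for plain path prefixes only; it ignores "*" and "$" wildcards and user-agent group selection, and the function names are our own, not part of any library.

```python
def parse_group(text: str) -> list[tuple[str, str]]:
    """Collect the Allow/Disallow lines of a single robots.txt group."""
    rules = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key in ("allow", "disallow"):
            rules.append((key, value.strip()))
    return rules


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Simplified RFC 9309 precedence: longest matching prefix wins, Allow wins ties."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):  # an empty Disallow blocks nothing
            is_allow = directive == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


group = """User-agent: GPTBot
Allow: /
Disallow: /private-folder"""

rules = parse_group(group)
print(is_allowed(rules, "/blog/post"))            # True: only "Allow: /" matches
print(is_allowed(rules, "/private-folder/file"))  # False: the longer Disallow wins
```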

    Popular AI Agent Crawlers With Unidentifiable User Agents

    We’ve found that the following didn’t identify themselves:

    • you.com.
    • ChatGPT’s agent Operator.
    • Bing’s Copilot chat.
    • Grok.
    • DeepSeek.

    There is no way to track these crawlers’ visits other than by identifying their explicit IP addresses.

    We set up a trap page (e.g., /specific-page-for-you-com/) and used the on-page chat to prompt you.com to visit it, allowing us to locate the corresponding visit record and IP address in our server logs. Below is the screenshot:

    Screenshot by author, December 2025
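
    Once the AI tool has visited the trap page, its IP address can be pulled out of the access log with a short script. This is a minimal Python sketch, assuming Apache’s “combined” log format; the trap path is the example from above, the log lines are made up for illustration, and the function name is hypothetical.

```python
import re

# Apache "combined" log format:
# IP ident user [time] "METHOD PATH PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]*)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)


def find_trap_visits(lines, trap_path):
    """Return (ip, timestamp, user_agent) for every request to trap_path."""
    visits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(4).startswith(trap_path):
            visits.append((m.group(1), m.group(2), m.group(6)))
    return visits


# Made-up log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [06/Dec/2025:12:00:00 +0000] "GET /specific-page-for-you-com/ HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '198.51.100.2 - - [06/Dec/2025:12:00:05 +0000] "GET /about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

for ip, when, ua in find_trap_visits(SAMPLE_LOG, "/specific-page-for-you-com/"):
    print(ip, when, ua)
```

    Because the trap URL is never linked anywhere, any request to it should come from the tool you prompted, which is what makes the IP attributable.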

    What About Agentic AI Browsers?

    Unfortunately, AI browsers such as Comet or ChatGPT’s Atlas don’t identify themselves in the user-agent string, so their visits blend in with normal users’ visits and can’t be singled out in server logs.

    ChatGPT’s Atlas browser user-agent string in server log records (Screenshot by author, December 2025)

    This is disappointing for SEOs, because tracking agentic browser visits to a website is important from a reporting point of view.

    How To Check What’s Crawling Your Server

    Depending on your hosting service, you may have a user interface (UI) that makes it easy to access and review server logs.

    If your host doesn’t offer this, you can download the server log files via FTP (on Linux servers running Apache, the access log is usually at /var/log/apache2/access.log) or ask your host’s support team to send them to you.

    Once you have the log file, you can view and analyze it in Google Sheets (if the file is in CSV format), in Screaming Frog’s Log File Analyser, or, if the file is smaller than 100 MB, with Gemini.
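
    If you prefer a quick scripted first pass, the Python sketch below counts requests per AI crawler by matching product tokens from the list above against each log line’s user-agent field. It assumes Apache’s “combined” log format, covers only a subset of the crawlers listed, and uses made-up sample lines. Since user agents can be spoofed, treat these counts as claims until the IPs are verified.

```python
import re
from collections import Counter

# Apache "combined" log format; group 1 = client IP, group 2 = user-agent.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

# A subset of product tokens from the crawler list above (extend as needed).
AI_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "PerplexityBot", "Perplexity-User", "meta-externalagent",
    "Bytespider", "Amazonbot", "CCBot",
]


def count_ai_hits(lines):
    """Count requests per AI crawler token, matched case-insensitively."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        ua = m.group(2).lower()
        for token in AI_TOKENS:
            if token.lower() in ua:
                counts[token] += 1
                break  # attribute each request to at most one crawler
    return counts


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [06/Dec/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)"',
    '192.0.2.11 - - [06/Dec/2025:10:00:01 +0000] "GET /a HTTP/1.1" 200 128 "-" "CCBot/2.0 (https://commoncrawl.org/faq/)"',
    '192.0.2.12 - - [06/Dec/2025:10:00:02 +0000] "GET /b HTTP/1.1" 200 64 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for token, hits in count_ai_hits(SAMPLE_LOG).most_common():
    print(f"{token}: {hits}")
```

    To run it on a real log, pass an open file object instead of the sample list, for example one opened with encoding="utf-8" and errors="replace" so that stray bytes don’t stop the scan.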

    How To Verify Legitimate Vs. Fake Bots

    Fake crawlers can spoof legitimate user agents to bypass restrictions and scrape content aggressively. For example, anyone can impersonate ClaudeBot by sending a request from the terminal on their laptop, and your server log will record it as a ClaudeBot visit:

    curl -A 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)' https://example.com

    Verification can help save server bandwidth and prevent illegal content harvesting. The most reliable verification method is checking the IP address of the request.

    Check each request’s IP address against the officially declared IP lists referenced in the crawler list above. If it matches, allow the request; otherwise, block it.
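
    In code, this check boils down to testing whether the request IP falls inside any published CIDR range. The sketch below uses Python’s standard ipaddress module; the ranges are placeholder documentation prefixes (RFC 5737), not real crawler IPs, so you would substitute the ranges from each vendor’s official list.

```python
import ipaddress

# Placeholder ranges: RFC 5737 documentation prefixes, NOT real crawler IPs.
# Replace them with the ranges published in each vendor's official IP list.
OFFICIAL_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/25"],
}


def is_verified(bot, ip):
    """True only if ip lies inside one of the published ranges for bot."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    return any(addr in ipaddress.ip_network(cidr) for cidr in OFFICIAL_RANGES.get(bot, []))


print(is_verified("ClaudeBot", "198.51.100.20"))  # True: inside the listed range
print(is_verified("ClaudeBot", "203.0.113.50"))   # False: e.g. a spoofed ClaudeBot user agent
print(is_verified("UnknownBot", "192.0.2.1"))     # False: no published ranges
```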

    Many firewalls can handle this: you allowlist the verified IPs so legitimate bot requests pass through, and block all other requests that use an AI crawler’s name in their user-agent string.

    For example, in WordPress, you can use the free Wordfence plugin to allowlist legitimate IPs from the official lists (as above) and add custom blocking rules, as shown below:

    Allowlist IP setting in Wordfence
    Block User agent setting in Wordfence

    The allowlist rule takes precedence: it lets legitimate crawlers pass through, while the user-agent blocking rules stop any impersonation request coming from other IPs.
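
    The order of the two rules is what makes this setup work. Here is a minimal Python sketch of the same decision logic; it is only an illustration of the rule order, not how Wordfence works internally, and it assumes a single shared allowlist (placeholder RFC 5737 ranges) plus a hypothetical list of blocked user-agent tokens.

```python
import ipaddress

# Placeholder allowlist: RFC 5737 documentation prefixes standing in for the
# ranges from the crawlers' official IP lists.
ALLOWLIST = [ipaddress.ip_network(c) for c in ("192.0.2.0/24", "198.51.100.0/24")]

# Hypothetical user-agent tokens (lowercase) to refuse from non-allowlisted IPs.
BLOCKED_UA_TOKENS = ("gptbot", "claudebot", "perplexitybot", "ccbot")


def decide(ip, user_agent):
    """Return "allow" or "block" for a single request."""
    addr = ipaddress.ip_address(ip)
    # Rule 1 (checked first): allowlisted IPs always pass.
    if any(addr in net for net in ALLOWLIST):
        return "allow"
    # Rule 2: a non-allowlisted request claiming to be an AI crawler is an impersonator.
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return "block"
    # Rule 3: ordinary visitors are unaffected.
    return "allow"


claudebot_ua = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"
print(decide("198.51.100.9", claudebot_ua))  # allow: IP is on the allowlist
print(decide("203.0.113.7", claudebot_ua))   # block: same user agent from another IP
print(decide("203.0.113.7", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # allow
```

    A stricter variant would map each crawler to its own ranges, so that an IP from one vendor’s list can’t pass as another vendor’s crawler.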

    However, please note that IP addresses can also be spoofed; if both the user agent and the IP address are spoofed, you won’t be able to block the request this way.

    Conclusion: Stay In Control Of AI Crawlers For Reliable AI Visibility

    AI crawlers are now part of our web ecosystem, and the bots listed here represent the major AI platforms currently indexing the web, although this list is likely to grow.

    Check your server logs regularly to see what’s actually hitting your site, and make sure you don’t inadvertently block AI crawlers if visibility in AI search engines is important for your business. If you don’t want AI crawlers to access your content, block them via robots.txt using the user-agent name.
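
    A robots.txt that blocks specific AI crawlers is simply one group per user-agent name with “Disallow: /”. The small Python helper below is a hypothetical convenience function, not part of any tool; it writes such a file from names taken from the list above. Which crawlers to block is your decision, and tokens such as Google-Extended or Applebot-Extended can go in the same list.

```python
def build_robots_txt(blocked_agents):
    """Return robots.txt text with one 'Disallow: /' group per user-agent name."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    # Groups are separated by a blank line; an empty list yields an empty file.
    return "\n\n".join(groups) + "\n" if groups else ""


print(build_robots_txt(["GPTBot", "CCBot"]), end="")
```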

    We’ll keep this list updated as new crawlers emerge and existing ones change, so we recommend bookmarking this URL or revisiting this article regularly to keep your AI crawler list up to date.



    Featured Image: BestForBest/Shutterstock
