User Agent Classifier API

Classify HTTP User-Agent strings to determine if traffic is human, bot, or AI/LLM crawler.

Known Bots: 784 (patterns in classifier database)
AI/LLM Crawlers: 159 (GPTBot, ClaudeBot, CCBot, etc.)
Categories: 16 (aligned with AWS WAF taxonomy)

How it works

Send a POST request to /classify with a JSON body containing a User-Agent string (userAgent) and, optionally, an IP address (ip). The API runs the input through multiple detection layers:

Layer | What it does
Pattern Matching | Matches against 784 known bot patterns from arcjet/well-known-bots, crawler-user-agents, and ai-robots-txt
IP → ASN | Looks up the Autonomous System to identify the network operator
Bot IP Verification | Checks if the IP matches officially published ranges (Google, Bing, OpenAI)
Datacenter Detection | Identifies if the IP belongs to AWS, GCP, or other cloud providers
Confidence Scoring | Combines all signals into high / medium / low confidence
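
To make the layering concrete, here is a minimal Python sketch of three of the layers (pattern matching, bot IP verification, confidence scoring). It is not the service's implementation: the two patterns, the CIDR ranges, the "ua_pattern_match" signal name, and the scoring rule are all assumptions chosen for illustration, and the ASN and datacenter layers are left out.

```python
import ipaddress
import re

# Hypothetical, hand-picked patterns. The real database holds 784 entries.
BOT_PATTERNS = {
    "Googlebot": (re.compile(r"Googlebot/\d", re.IGNORECASE), "search_engine"),
    "GPTBot": (re.compile(r"GPTBot", re.IGNORECASE), "ai"),
}

# Illustrative CIDR ranges only (assumption). A real deployment would load
# the ranges each operator publishes instead of hard-coding them.
VERIFIED_RANGES = {
    "Googlebot": [ipaddress.ip_network("66.249.64.0/19")],
    "GPTBot": [ipaddress.ip_network("192.0.2.0/24")],  # documentation range
}


def classify_sketch(user_agent, ip=None):
    """Run a User-Agent (and optional IP) through three simplified layers."""
    signals = []
    bot_name = None
    category = None

    # Layer 1: pattern matching on the User-Agent string.
    for name, (pattern, cat) in BOT_PATTERNS.items():
        if pattern.search(user_agent):
            bot_name, category = name, cat
            signals.append("ua_pattern_match")  # hypothetical signal name
            break

    # Layer 2: check whether the IP falls inside the matched bot's ranges.
    verified = False
    if bot_name is not None and ip is not None:
        addr = ipaddress.ip_address(ip)
        verified = any(addr in net for net in VERIFIED_RANGES.get(bot_name, []))
        if verified:
            signals.append("verified_bot_ip")

    # Layer 3: assumed scoring rule, not the service's actual rule.
    if verified:
        confidence = "high"
    elif bot_name is not None:
        confidence = "medium"
    else:
        confidence = "low"

    return {
        "isBot": bot_name is not None,
        "category": category,
        "botName": bot_name,
        "verified": verified,
        "confidence": confidence,
        "signals": signals,
    }
```

Under these assumptions, a Googlebot User-Agent arriving from 66.249.66.1 is both matched and verified, while the same User-Agent from an unrelated address is matched but left unverified, which is the case the IP layer exists to catch.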

Quick example

curl -X POST https://ua-api.lab.c2comms.cloud/classify \
  -H "Content-Type: application/json" \
  -d '{
    "userAgent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "ip": "66.249.66.1"
  }'

Response:

{
  "classification": {
    "isBot": true,
    "isLLM": false,
    "category": "search_engine",
    "botName": "Google Crawler",
    "verified": true,
    "confidence": "high",
    "signals": ["verified_bot_ip"]
  }
}
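
The same request can be issued from Python using only the standard library. This is a sketch rather than an official client: the function names are invented here, the endpoint and field names are copied from the curl example above, and error handling (HTTP errors, timeouts, malformed JSON) is left to the caller.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://ua-api.lab.c2comms.cloud/classify"


def build_request(user_agent, ip=None):
    """Build the POST request. The ip field is sent only when provided."""
    payload = {"userAgent": user_agent}
    if ip is not None:
        payload["ip"] = ip
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_classification(body):
    """Extract the "classification" object from a JSON response body."""
    return json.loads(body)["classification"]


def classify(user_agent, ip=None, timeout=10.0):
    """Send the request and return the classification as a dict."""
    request = build_request(user_agent, ip)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_classification(response.read())
```

Calling classify("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "66.249.66.1") would send the same body as the curl command. Building and parsing are kept in separate functions so they can be exercised without network access.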

Categories

advertising, ai, archiver, content_fetcher, feed_fetcher, http_library, link_checker, monitoring, page_preview, scraping_framework, search_engine, security, seo, social_media, tool, webhooks
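
One way a caller might act on the category field is a small policy function. The sketch below is purely illustrative: the three outcomes ("allow", "block", "challenge") and the choice of blocked categories are assumptions, not behaviour of the API. Only the category names and the response fields come from this page.

```python
# The 16 category names listed above.
CATEGORIES = frozenset({
    "advertising", "ai", "archiver", "content_fetcher", "feed_fetcher",
    "http_library", "link_checker", "monitoring", "page_preview",
    "scraping_framework", "search_engine", "security", "seo",
    "social_media", "tool", "webhooks",
})

# Hypothetical site policy: categories this example site refuses outright.
BLOCKED_CATEGORIES = frozenset({"scraping_framework", "http_library"})


def decide(classification):
    """Map a classification dict to "allow", "block", or "challenge"."""
    if not classification.get("isBot"):
        return "allow"  # treated as human traffic
    category = classification.get("category")
    if category not in CATEGORIES:
        return "challenge"  # missing or unrecognised category
    if category in BLOCKED_CATEGORIES:
        return "block"
    if classification.get("verified"):
        return "allow"  # bot whose IP matched a published range
    return "challenge"  # claims to be a bot but is unverified
```

With this policy, the Googlebot response from the quick example (category search_engine, verified true) is allowed, while an unverified bot in any non-blocked category is challenged.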

Data sources

Source | Purpose | License
arcjet/well-known-bots | Primary bot DB (600+ bots) | Apache 2.0
monperrus/crawler-user-agents | Additional regex patterns | MIT
ai-robots-txt/ai.robots.txt | AI/LLM bot identification | MIT
Google / Bing / OpenAI | Bot IP verification ranges | Public
ip-location-db ASN MMDB | IP → ASN lookup | CC0