keyhunter/docs/superpowers/specs/2026-04-04-keyhunter-design.md
2026-04-06 13:21:39 +03:00

KeyHunter - Design Specification

Overview

KeyHunter is a comprehensive, modular API key scanner built in Go, focused on detecting and validating API keys from 100+ LLM/AI providers. It combines native scanning capabilities with external tool integration (TruffleHog, Gitleaks), OSINT/recon modules, a web dashboard, and Telegram bot notifications.

Architecture

Approach: Plugin-based architecture. Core scanner engine with providers defined as YAML files (compile-time embedded). Single binary distribution.

Directory Structure

keyhunter/
├── cmd/keyhunter/         # CLI entrypoint (cobra)
├── pkg/
│   ├── engine/            # Core scanning engine
│   │   ├── scanner.go     # Orchestrator: takes input, runs providers
│   │   ├── matcher.go     # Regex + entropy matching
│   │   └── verifier.go    # Active key verification (--verify flag)
│   ├── provider/          # Provider registry & loader
│   │   ├── registry.go    # Loads and manages providers
│   │   ├── types.go       # Provider interface definitions
│   │   └── builtin/       # Compile-time embedded provider YAML files
│   ├── input/             # Input source adapters
│   │   ├── file.go        # File/directory scanning
│   │   ├── git.go         # Git history/diff scanning
│   │   ├── stdin.go       # Pipe/stdin support
│   │   ├── url.go         # URL fetch
│   │   └── remote.go      # GitHub/GitLab API, paste sites
│   ├── output/            # Output formatters
│   │   ├── table.go       # Colored terminal table
│   │   ├── json.go        # JSON export
│   │   ├── sarif.go       # SARIF (CI/CD compatible)
│   │   └── csv.go         # CSV export
│   ├── adapter/           # External tool parsers
│   │   ├── trufflehog.go  # TruffleHog JSON output parser
│   │   └── gitleaks.go    # Gitleaks JSON output parser
│   ├── recon/             # OSINT/Recon engine (80+ sources)
│   │   ├── engine.go      # Recon orchestrator
│   │   ├── ratelimit.go   # Rate limiting & politeness
│   │   │
│   │   │  # --- IoT & Internet Search Engines ---
│   │   ├── shodan.go      # Shodan API client
│   │   ├── censys.go      # Censys API client
│   │   ├── zoomeye.go     # ZoomEye (Chinese IoT scanner)
│   │   ├── fofa.go        # FOFA (Chinese IoT scanner)
│   │   ├── netlas.go      # Netlas.io (HTTP body search)
│   │   ├── binaryedge.go  # BinaryEdge scanner
│   │   │
│   │   │  # --- Code Hosting & Snippets ---
│   │   ├── github.go      # GitHub code search / dorks
│   │   ├── gitlab.go      # GitLab search
│   │   ├── gist.go        # GitHub Gist search
│   │   ├── bitbucket.go   # Bitbucket code search
│   │   ├── codeberg.go    # Codeberg/Gitea search
│   │   ├── gitea.go       # Self-hosted Gitea instances
│   │   ├── replit.go      # Replit public repls
│   │   ├── codesandbox.go # CodeSandbox projects
│   │   ├── stackblitz.go  # StackBlitz projects
│   │   ├── codepen.go     # CodePen pens
│   │   ├── jsfiddle.go    # JSFiddle snippets
│   │   ├── glitch.go      # Glitch public projects
│   │   ├── observable.go  # Observable notebooks
│   │   ├── huggingface.go # HuggingFace Spaces/repos
│   │   ├── kaggle.go      # Kaggle notebooks/datasets
│   │   ├── jupyter.go     # nbviewer / Jupyter notebooks
│   │   ├── gitpod.go      # Gitpod workspace snapshots
│   │   │
│   │   │  # --- Search Engine Dorking ---
│   │   ├── google.go      # Google Custom Search / SerpAPI dorking
│   │   ├── bing.go        # Bing Web Search API dorking
│   │   ├── duckduckgo.go  # DuckDuckGo search
│   │   ├── yandex.go      # Yandex XML Search
│   │   ├── brave.go       # Brave Search API
│   │   │
│   │   │  # --- Paste Sites ---
│   │   ├── paste.go       # Multi-paste aggregator (pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.)
│   │   │
│   │   │  # --- Package Registries ---
│   │   ├── npm.go         # npm registry scanning
│   │   ├── pypi.go        # PyPI package scanning
│   │   ├── rubygems.go    # RubyGems scanning
│   │   ├── crates.go      # crates.io (Rust)
│   │   ├── maven.go       # Maven Central (Java)
│   │   ├── nuget.go       # NuGet (.NET)
│   │   ├── packagist.go   # Packagist (PHP)
│   │   ├── goproxy.go     # Go module proxy
│   │   │
│   │   │  # --- Container & Infra ---
│   │   ├── docker.go      # Docker Hub image/layer scanning
│   │   ├── kubernetes.go  # Exposed K8s dashboards & configs
│   │   ├── terraform.go   # Terraform state files & registry
│   │   ├── helm.go        # Artifact Hub / Helm charts
│   │   ├── ansible.go     # Ansible Galaxy collections
│   │   │
│   │   │  # --- Cloud Storage ---
│   │   ├── s3.go          # AWS S3 bucket enumeration
│   │   ├── gcs.go         # Google Cloud Storage buckets
│   │   ├── azureblob.go   # Azure Blob Storage
│   │   ├── spaces.go      # DigitalOcean Spaces
│   │   ├── backblaze.go   # Backblaze B2
│   │   ├── minio.go       # Self-hosted MinIO instances
│   │   ├── grayhat.go     # GrayHatWarfare (bucket search engine)
│   │   │
│   │   │  # --- CI/CD Log Leaks ---
│   │   ├── travisci.go    # Travis CI public build logs
│   │   ├── circleci.go    # CircleCI build logs
│   │   ├── ghactions.go   # GitHub Actions workflow logs
│   │   ├── jenkins.go     # Exposed Jenkins instances
│   │   ├── gitlabci.go    # GitLab CI/CD pipeline logs
│   │   │
│   │   │  # --- Web Archives ---
│   │   ├── wayback.go     # Wayback Machine CDX API
│   │   ├── commoncrawl.go # CommonCrawl index & WARC
│   │   │
│   │   │  # --- Forums & Documentation ---
│   │   ├── stackoverflow.go # Stack Overflow / Stack Exchange API
│   │   ├── reddit.go      # Reddit search
│   │   ├── hackernews.go  # HN Algolia API
│   │   ├── devto.go       # dev.to articles
│   │   ├── medium.go      # Medium articles
│   │   ├── telegram_recon.go # Telegram public channels
│   │   ├── discord.go     # Discord indexed content
│   │   │
│   │   │  # --- Collaboration Tools ---
│   │   ├── notion.go      # Notion public pages
│   │   ├── confluence.go  # Confluence public spaces
│   │   ├── trello.go      # Trello public boards
│   │   ├── googledocs.go  # Google Docs/Sheets public
│   │   │
│   │   │  # --- Frontend & JS Leaks ---
│   │   ├── sourcemaps.go  # JS source map extraction
│   │   ├── webpack.go     # Webpack/Vite bundle scanning
│   │   ├── dotenv_web.go  # Exposed .env files on web servers
│   │   ├── swagger.go     # Exposed Swagger/OpenAPI docs
│   │   ├── deploys.go     # Vercel/Netlify preview deployments
│   │   │
│   │   │  # --- Log Aggregators ---
│   │   ├── elasticsearch.go # Exposed Elasticsearch/Kibana
│   │   ├── grafana.go     # Exposed Grafana dashboards
│   │   ├── sentry.go      # Exposed Sentry instances
│   │   │
│   │   │  # --- Threat Intelligence ---
│   │   ├── virustotal.go  # VirusTotal file/URL search
│   │   ├── intelx.go      # Intelligence X aggregated search
│   │   ├── urlhaus.go     # URLhaus abuse.ch
│   │   │
│   │   │  # --- Mobile Apps ---
│   │   ├── apk.go         # APK download & decompile scanning
│   │   │
│   │   │  # --- DNS/Subdomain ---
│   │   ├── crtsh.go       # Certificate Transparency (crt.sh)
│   │   ├── subdomain.go   # Subdomain config endpoint probing
│   │   │
│   │   │  # --- API Marketplaces ---
│   │   ├── postman.go     # Postman public collections/workspaces
│   │   ├── swaggerhub.go  # SwaggerHub published APIs
│   │   └── rapidapi.go    # RapidAPI public endpoints
│   │
│   ├── dorks/             # Dork management
│   │   ├── loader.go      # YAML dork loader
│   │   ├── runner.go      # Dork execution engine
│   │   └── builtin/       # Embedded dork YAML files
│   ├── notify/            # Notification modules
│   │   ├── telegram.go    # Telegram bot
│   │   ├── webhook.go     # Generic webhook
│   │   └── slack.go       # Slack
│   └── web/               # Web dashboard
│       ├── server.go      # Embedded HTTP server
│       ├── api.go         # REST API
│       └── static/        # Frontend assets (htmx + tailwind)
├── providers/             # Provider YAML definitions (embedded)
│   ├── openai.yaml
│   ├── anthropic.yaml
│   └── ... (108 providers)
├── dorks/                 # Dork YAML definitions (embedded)
│   ├── github.yaml        # GitHub code search dorks
│   ├── gitlab.yaml        # GitLab search dorks
│   ├── shodan.yaml        # Shodan IoT dorks
│   ├── censys.yaml        # Censys dorks
│   ├── zoomeye.yaml       # ZoomEye dorks
│   ├── fofa.yaml          # FOFA dorks
│   ├── google.yaml        # Google dorking queries
│   ├── bing.yaml          # Bing dorking queries
│   └── generic.yaml       # Multi-source keyword dorks
├── configs/               # Example config files
└── docs/
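
The tree lists matcher.go as "Regex + entropy matching". As a rough illustration of the entropy half, the sketch below computes Shannon entropy over the bytes of a candidate string. The function names and the idea of a per-pattern threshold are assumptions made for this example; the specification does not define them.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per byte.
// Random-looking key material scores higher than repetitive text.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// looksRandom is a hypothetical helper: it accepts a regex hit only if
// its entropy reaches minBits. The threshold is an assumption and would
// need tuning per pattern.
func looksRandom(candidate string, minBits float64) bool {
	return shannonEntropy(candidate) >= minBits
}

func main() {
	for _, s := range []string{"aaaaaaaa", "abcdefgh", "sk-demo-x9Qz71LmPw"} {
		fmt.Printf("%-20s %.2f bits/byte\n", s, shannonEntropy(s))
	}
}
```

A check like this is usually applied after a regex hit, so that placeholder values such as repeated characters can be discarded before verification.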

Data Flow

Input Source -> Scanner Engine -> Provider Matcher -> (optional) Verifier -> Output Formatter + Notifier
                                                                         -> SQLite DB (persist)
                                                                         -> Web Dashboard (serve)
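
The flow above can be read as a chain of small stages. The following is a minimal sketch of that chain, not the actual engine: the types Chunk, Finding, Matcher, Verifier, and Sink, and the demo pattern, are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Chunk is a unit of input handed to the scanner (hypothetical type).
type Chunk struct {
	Source string
	Data   string
}

// Finding is one matched key candidate (hypothetical type).
type Finding struct {
	Source   string
	Provider string
	Key      string
	Verified bool
}

// Stage signatures, one per arrow in the data flow.
type Matcher func(Chunk) []Finding
type Verifier func(Finding) Finding
type Sink func(Finding)

// runPipeline pushes every chunk through match, the optional verify
// step, and then fans each finding out to all sinks (formatter,
// notifier, database, and so on). It returns the number of findings.
func runPipeline(chunks []Chunk, match Matcher, verify Verifier, sinks ...Sink) int {
	total := 0
	for _, c := range chunks {
		for _, f := range match(c) {
			if verify != nil {
				f = verify(f)
			}
			for _, s := range sinks {
				s(f)
			}
			total++
		}
	}
	return total
}

// demoPattern is a placeholder regex, not a real provider pattern.
var demoPattern = regexp.MustCompile(`sk-demo-[a-z0-9]{8}`)

func demoMatcher(c Chunk) []Finding {
	var out []Finding
	for _, k := range demoPattern.FindAllString(c.Data, -1) {
		out = append(out, Finding{Source: c.Source, Provider: "demo", Key: k})
	}
	return out
}

func main() {
	chunks := []Chunk{{Source: "config.txt", Data: "token=sk-demo-abcd1234"}}
	printer := func(f Finding) { fmt.Println(f.Source, f.Provider, f.Key) }
	fmt.Println("findings:", runPipeline(chunks, demoMatcher, nil, printer))
}
```

Passing nil for the verifier corresponds to running without --verify; each additional sink corresponds to one of the output branches in the diagram.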

Provider YAML Schema

id: string                  # Unique provider ID
name: string                # Display name
category: enum              # frontier | mid-tier | emerging | chinese | infrastructure | gateway | self-hosted
website: string             # API base URL
confidence: enum            # high | medium | low

patterns:
  - id: string              # Unique pattern ID
    name: string            # Human-readable name
    regex: string           # Detection regex
    confidence: enum        # high | medium | low
    description: string     # Pattern description

keywords: []string          # Pre-filtering keywords (performance optimization)

verify:
  enabled: bool
  method: string            # HTTP method
  url: string               # Verification endpoint
  headers: map              # Headers with {{key}} template
  success_codes: []int
  failure_codes: []int
  extract:                  # Additional info extraction on success
    - field: string
      path: string          # JSON path

metadata:
  docs: string              # API docs URL
  key_url: string           # Key management URL
  env_vars: []string        # Common environment variable names
  revoke_url: string        # Key revocation URL
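
To illustrate how the verify block might be consumed, the sketch below substitutes the {{key}} placeholder into the configured headers and maps a response status onto the success and failure code lists. The Go struct, the function names, the "unknown" result for unlisted codes, and the example.com endpoint are assumptions for this example; only the field names come from the schema.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// VerifySpec mirrors the verify: block of the provider schema.
type VerifySpec struct {
	Enabled      bool
	Method       string
	URL          string
	Headers      map[string]string
	SuccessCodes []int
	FailureCodes []int
}

// buildVerifyRequest creates the HTTP request for one candidate key,
// replacing every {{key}} placeholder in the header templates.
func buildVerifyRequest(spec VerifySpec, key string) (*http.Request, error) {
	req, err := http.NewRequest(spec.Method, spec.URL, nil)
	if err != nil {
		return nil, err
	}
	for name, tmpl := range spec.Headers {
		req.Header.Set(name, strings.ReplaceAll(tmpl, "{{key}}", key))
	}
	return req, nil
}

// classify maps a status code to a verdict. Treating unlisted codes as
// "unknown" is an assumption of this sketch.
func classify(spec VerifySpec, status int) string {
	for _, c := range spec.SuccessCodes {
		if c == status {
			return "valid"
		}
	}
	for _, c := range spec.FailureCodes {
		if c == status {
			return "invalid"
		}
	}
	return "unknown"
}

func main() {
	spec := VerifySpec{
		Enabled:      true,
		Method:       http.MethodGet,
		URL:          "https://api.example.com/v1/models", // placeholder endpoint
		Headers:      map[string]string{"Authorization": "Bearer {{key}}"},
		SuccessCodes: []int{200},
		FailureCodes: []int{401, 403},
	}
	req, err := buildVerifyRequest(spec, "sk-demo-abcdefgh1234")
	if err != nil {
		panic(err)
	}
	// The request is only built here, never sent.
	fmt.Println(req.Method, req.URL, req.Header.Get("Authorization"))
	fmt.Println(classify(spec, 401))
}
```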

CLI Command Structure

Core Commands

# Scanning
keyhunter scan path <dir>
keyhunter scan file <file>
keyhunter scan git <repo> [--since=<duration>]
keyhunter scan stdin
keyhunter scan url <url>
keyhunter scan clipboard

# Verification
keyhunter verify <key>
keyhunter verify --file <keyfile>

# External Tool Import
keyhunter import trufflehog <json>
keyhunter import gitleaks <json>
keyhunter import generic --format=csv <file>

# OSINT/Recon — IoT & Internet Scanners
keyhunter recon shodan [--query|--dork]
keyhunter recon censys [--query]
keyhunter recon zoomeye [--query]
keyhunter recon fofa [--query]
keyhunter recon netlas [--query]
keyhunter recon binaryedge [--query]

# OSINT/Recon — Code Hosting & Snippets
keyhunter recon github [--dork=auto|custom]
keyhunter recon gitlab [--dork=auto|custom]
keyhunter recon gist [--query]
keyhunter recon bitbucket [--query|--workspace]
keyhunter recon codeberg [--query]
keyhunter recon gitea [--instances-from=shodan|file]
keyhunter recon replit [--query]
keyhunter recon codesandbox [--query]
keyhunter recon stackblitz [--query]
keyhunter recon codepen [--query]
keyhunter recon jsfiddle [--query]
keyhunter recon glitch [--query]
keyhunter recon huggingface [--query|--spaces|--repos]
keyhunter recon kaggle [--query|--notebooks]
keyhunter recon jupyter [--query]
keyhunter recon observable [--query]

# OSINT/Recon — Search Engine Dorking
keyhunter recon google [--dork=auto|custom]
keyhunter recon bing [--dork=auto|custom]
keyhunter recon duckduckgo [--query]
keyhunter recon yandex [--query]
keyhunter recon brave [--query]

# OSINT/Recon — Paste Sites
keyhunter recon paste [--sources=pastebin,dpaste,paste.ee,rentry,hastebin,ix.io,all]

# OSINT/Recon — Package Registries
keyhunter recon npm [--query|--recent]
keyhunter recon pypi [--query|--recent]
keyhunter recon rubygems [--query]
keyhunter recon crates [--query]
keyhunter recon maven [--query]
keyhunter recon nuget [--query]
keyhunter recon packagist [--query]
keyhunter recon goproxy [--query]

# OSINT/Recon — Container & Infrastructure
keyhunter recon docker [--query|--image|--layers]
keyhunter recon kubernetes [--shodan|--github]
keyhunter recon terraform [--github|--registry]
keyhunter recon helm [--query]
keyhunter recon ansible [--query]

# OSINT/Recon — Cloud Storage
keyhunter recon s3 [--wordlist|--domain]
keyhunter recon gcs [--wordlist|--domain]
keyhunter recon azure [--wordlist|--domain]
keyhunter recon spaces [--wordlist]
keyhunter recon minio [--shodan]
keyhunter recon grayhat [--query]   # GrayHatWarfare bucket search

# OSINT/Recon — CI/CD Logs
keyhunter recon travis [--org|--repo]
keyhunter recon circleci [--org|--repo]
keyhunter recon ghactions [--org|--repo]
keyhunter recon jenkins [--shodan|--url]
keyhunter recon gitlabci [--project]

# OSINT/Recon — Web Archives
keyhunter recon wayback [--domain|--url]
keyhunter recon commoncrawl [--domain|--pattern]

# OSINT/Recon — Forums & Documentation
keyhunter recon stackoverflow [--query]
keyhunter recon reddit [--query|--subreddit]
keyhunter recon hackernews [--query]
keyhunter recon devto [--query|--tag]
keyhunter recon medium [--query]
keyhunter recon telegram-groups [--channel|--query]

# OSINT/Recon — Collaboration Tools
keyhunter recon notion [--query]        # Google dorking
keyhunter recon confluence [--shodan|--url]
keyhunter recon trello [--query]
keyhunter recon googledocs [--query]    # Google dorking

# OSINT/Recon — Frontend & JS Leaks
keyhunter recon sourcemaps [--domain|--url]
keyhunter recon webpack [--domain|--url]
keyhunter recon dotenv [--domain-list|--url]  # Exposed .env files
keyhunter recon swagger [--shodan|--domain]
keyhunter recon deploys [--domain]      # Vercel/Netlify previews

# OSINT/Recon — Log Aggregators
keyhunter recon elasticsearch [--shodan|--url]
keyhunter recon grafana [--shodan|--url]
keyhunter recon sentry [--shodan|--url]

# OSINT/Recon — Threat Intelligence
keyhunter recon virustotal [--query]
keyhunter recon intelx [--query]
keyhunter recon urlhaus [--query]

# OSINT/Recon — Mobile Apps
keyhunter recon apk [--package|--query|--file]

# OSINT/Recon — DNS/Subdomain
keyhunter recon crtsh [--domain]
keyhunter recon subdomain [--domain] [--probe-configs]

# OSINT/Recon — API Marketplaces
keyhunter recon postman [--query|--workspace]
keyhunter recon swaggerhub [--query]

# OSINT/Recon — Full Sweep
keyhunter recon full [--providers] [--categories=all|code|cloud|forums|cicd|...]

# Dork Management
keyhunter dorks list [--source]
keyhunter dorks add <source> <query>
keyhunter dorks run <source> [--category]
keyhunter dorks export

# Key Management (full key access)
keyhunter keys list [--unmask] [--provider=X] [--status=active|revoked]
keyhunter keys show <id>
keyhunter keys export --format=json|csv
keyhunter keys copy <id>
keyhunter keys verify <id>
keyhunter keys delete <id>

# Provider Management
keyhunter providers list [--category]
keyhunter providers info <id>
keyhunter providers stats

# Web Dashboard & Telegram
keyhunter serve [--port] [--telegram]

# Scheduled Scanning
keyhunter schedule add --name --cron --command --notify
keyhunter schedule list
keyhunter schedule remove <name>

# Config & Hooks
keyhunter config init
keyhunter config set <key> <value>
keyhunter hook install
keyhunter hook uninstall

Scan Flags

--providers=<list>       Filter by provider IDs
--category=<cat>         Filter by provider category
--confidence=<level>     Minimum confidence level
--exclude=<patterns>     Exclude file patterns
--verify                 Enable active key verification
--verify-timeout=<dur>   Verification timeout (default: 10s)
--workers=<n>            Parallel workers (default: CPU count)
--output=<format>        Output format: table|json|sarif|csv
--unmask                 Show full API keys without masking (default: masked)
--notify=<channel>       Send results to: telegram|webhook|slack
--stealth                Stealth mode: UA rotation, increased delays
--respect-robots         Respect robots.txt (default: true)

Exit Codes

  • 0 — Clean, no keys found
  • 1 — Keys found
  • 2 — Error
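
A small sketch of how a scan result could be mapped onto these exit codes follows. The constant and function names are hypothetical, and giving errors precedence over findings is an assumption the specification does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// Exit codes as listed in the specification.
const (
	ExitClean     = 0 // no keys found
	ExitKeysFound = 1 // at least one key found
	ExitError     = 2 // the scan itself failed
)

// errScanFailed is a stand-in error used by the example.
var errScanFailed = errors.New("scan failed")

// exitCode picks the process exit code. Errors win over findings here,
// which is an assumption of this sketch.
func exitCode(findings int, err error) int {
	if err != nil {
		return ExitError
	}
	if findings > 0 {
		return ExitKeysFound
	}
	return ExitClean
}

func main() {
	fmt.Println(exitCode(0, nil))
	fmt.Println(exitCode(3, nil))
	fmt.Println(exitCode(3, errScanFailed))
	// A real CLI would end with os.Exit(exitCode(...)).
}
```

A non-zero code on findings lets a CI job or pre-commit hook fail the build without parsing the output.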

Dork YAML Schema

source: string             # github | gitlab | shodan | censys | zoomeye | fofa | google | bing
dorks:
  - id: string
    query: string          # Search query
    description: string
    providers: []string    # Optional: related provider IDs

Built-in dork categories: GitHub (code search, filename, language), GitLab (snippets, projects), Shodan (exposed proxies, dashboards), Censys (HTTP body search).

Web Dashboard

Stack: Go embed + htmx + Tailwind CSS (zero JS framework dependency)

Pages:

  • / — Dashboard overview with summary statistics
  • /scans — Scan history list
  • /scans/:id — Scan detail with found keys
  • /keys — All found keys (filterable table)
  • /keys/:id — Key detail (provider, confidence, verify status)
  • /recon — OSINT scan launcher and results
  • /providers — Provider list and statistics
  • /dorks — Dork management
  • /settings — Configuration (tokens, API keys)
  • /api/v1/* — REST API for programmatic access

Storage: SQLite (embedded, AES-256 encrypted)
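
The specification names AES-256 but not the mode or the layer at which encryption is applied. As one possible reading, the sketch below encrypts a single key value with AES-256-GCM from the Go standard library before it would be written to a database column. The choice of GCM, field-level encryption, the function names, and the nonce-prefixed blob layout are all assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD; a 32-byte secret selects AES-256.
func newGCM(secret []byte) (cipher.AEAD, error) {
	if len(secret) != 32 {
		return nil, errors.New("secret must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(secret)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptKey returns nonce || ciphertext for the given plaintext key.
func encryptKey(secret []byte, plaintext string) ([]byte, error) {
	gcm, err := newGCM(secret)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext to nonce, so the nonce travels with it.
	return gcm.Seal(nonce, nonce, []byte(plaintext), nil), nil
}

// decryptKey reverses encryptKey and fails if the blob was modified.
func decryptKey(secret, blob []byte) (string, error) {
	gcm, err := newGCM(secret)
	if err != nil {
		return "", err
	}
	if len(blob) < gcm.NonceSize() {
		return "", errors.New("ciphertext too short")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	pt, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return "", err
	}
	return string(pt), nil
}

func main() {
	secret := make([]byte, 32)
	if _, err := rand.Read(secret); err != nil {
		panic(err)
	}
	blob, err := encryptKey(secret, "sk-demo-abcdefgh1234")
	if err != nil {
		panic(err)
	}
	plain, err := decryptKey(secret, blob)
	fmt.Println(len(blob), plain, err)
}
```

An authenticated mode such as GCM also detects tampering with stored values, which plain block encryption would not.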

Telegram Bot

Commands:

  • /scan <url/path> — Remote scan trigger
  • /verify <key> — Key verification
  • /recon github <dork> — GitHub dork execution
  • /status — Active scan status
  • /stats — General statistics
  • /subscribe — Auto-notification on new key findings
  • /unsubscribe — Disable notifications
  • /providers — Provider list
  • /help — Help

Auto-notifications: New key found, recon complete, scheduled scan results, verify results.

LLM Provider Coverage (108 Providers)

Tier 1 — Frontier (12)

OpenAI, Anthropic, Google AI (Gemini), Google Vertex AI, AWS Bedrock, Azure OpenAI, Meta AI (Llama API), xAI (Grok), Cohere, Mistral AI, Inflection AI, AI21 Labs

Tier 2 — Inference Platforms (14)

Together AI, Fireworks AI, Groq, Replicate, Anyscale, DeepInfra, Lepton AI, Modal, Baseten, Cerebrium, NovitaAI, Sambanova, OctoAI, Friendli AI

Tier 3 — Specialized/Vertical (12)

Perplexity, You.com, Voyage AI, Jina AI, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability AI, Runway ML, Midjourney, HuggingFace

Tier 4 — Chinese/Regional (16)

DeepSeek, Baichuan, Zhipu AI (GLM), Moonshot AI (Kimi), Yi (01.AI), Qwen (Alibaba Cloud), Baidu (ERNIE/Wenxin), ByteDance (Doubao), SenseTime, iFlytek (Spark), MiniMax, Stepfun, 360 AI, Kuaishou (Kling), Tencent Hunyuan, SiliconFlow

Tier 5 — Infrastructure/Gateway (11)

Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, AI Gateway (Kong), BricksAI, Aether, Not Diamond

Tier 6 — Emerging/Niche (15)

Reka AI, Aleph Alpha, Writer, Jasper AI, Typeface, Comet ML, Weights & Biases, LangSmith (LangChain), Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon AI, Lamini

Tier 7 — Code & Dev Tools (10)

GitHub Copilot, Cursor, Tabnine, Codeium/Windsurf, Sourcegraph Cody, Amazon CodeWhisperer, Replit AI, Codestral (Mistral), IBM watsonx.ai, Oracle AI

Tier 8 — Self-Hosted/Open Infra (10)

Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-generation-webui, TensorRT-LLM, Triton Inference Server, Jan AI

Tier 9 — Enterprise/Legacy (8)

Salesforce Einstein, ServiceNow AI, SAP AI Core, Palantir AIP, Databricks (DBRX), Snowflake Cortex, Oracle Generative AI, HPE GreenLake AI

Performance

  • Worker pool: parallel scanning (default: CPU count, configurable via --workers=N)
  • Keyword pre-filtering before regex (target: roughly 10x speedup on large files)
  • mmap for large file reading
  • Delta-based git scanning (only changed files between commits)
  • Source-based rate limiting in recon module
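
The first two points can be sketched together: a cheap substring check decides whether the regex runs at all, and a worker pool spreads the inputs across goroutines. The type and function names and the demo pattern are hypothetical, and the real engine may organize this differently.

```go
package main

import (
	"fmt"
	"regexp"
	"runtime"
	"sort"
	"strings"
	"sync"
)

// pattern pairs pre-filter keywords with the more expensive regex.
type pattern struct {
	keywords []string
	re       *regexp.Regexp
}

// find runs the regex only when at least one keyword is present.
func (p pattern) find(content string) []string {
	for _, kw := range p.keywords {
		if strings.Contains(content, kw) {
			return p.re.FindAllString(content, -1)
		}
	}
	return nil
}

// scanAll distributes contents over a pool of workers. A non-positive
// worker count falls back to the CPU count, like the --workers default.
func scanAll(contents []string, p pattern, workers int) []string {
	if workers <= 0 {
		workers = runtime.NumCPU()
	}
	jobs := make(chan string)
	results := make(chan []string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				results <- p.find(c)
			}
		}()
	}
	go func() {
		for _, c := range contents {
			jobs <- c
		}
		close(jobs)
	}()
	go func() {
		wg.Wait()
		close(results)
	}()
	var all []string
	for r := range results {
		all = append(all, r...)
	}
	sort.Strings(all) // workers finish in any order; sort for stable output
	return all
}

// demo is a placeholder pattern, not a real provider definition.
var demo = pattern{
	keywords: []string{"sk-demo-"},
	re:       regexp.MustCompile(`sk-demo-[a-z0-9]{8}`),
}

func main() {
	inputs := []string{"nothing here", "x sk-demo-aaaa1111 y", "sk-demo-bbbb2222"}
	fmt.Println(scanAll(inputs, demo, 0))
}
```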

Key Visibility & Access

Full (unmasked) API keys are accessible through multiple channels:

  1. CLI --unmask flag: keyhunter scan path . --unmask shows full keys in the terminal table
  2. JSON/CSV/SARIF export: always contains full keys, e.g. keyhunter scan path . --output=json
  3. keyhunter keys command — Dedicated key management:
    • keyhunter keys list — all found keys (masked by default)
    • keyhunter keys list --unmask — all found keys (full)
    • keyhunter keys show <id> — single key full detail (always unmasked)
    • keyhunter keys export --format=json — export all keys with full values
    • keyhunter keys copy <id> — copy full key to clipboard
    • keyhunter keys verify <id> — verify and show full detail
  4. Web Dashboard: /keys/:id detail page with "Reveal Key" toggle button (auth required)
  5. Telegram Bot: /key <id> returns full key detail in private chat
  6. SQLite DB — Full keys always stored (encrypted), queryable via API

Default behavior: keys are masked in terminal output to protect against shoulder surfing. When you need the real key (to test, verify, or report), use --unmask, a JSON export, or keys show.

Security

  • Key masking in terminal output by default (first 8 + last 4 chars, middle ***)
  • --unmask flag to reveal full keys when needed
  • SQLite database AES-256 encrypted (full keys stored encrypted)
  • Telegram/Shodan tokens encrypted in config
  • No key values written to logs during --verify
  • Optional basic auth / token auth for web dashboard
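
The masking rule in the first point (first 8 characters, last 4 characters, *** in between) could be implemented roughly as follows. The function name and the handling of keys too short to mask safely are assumptions of this sketch.

```go
package main

import "fmt"

// maskKey keeps the first 8 and last 4 characters and replaces the
// middle with "***". Keys of 12 characters or fewer are fully masked,
// since showing head and tail would reveal the whole key; that fallback
// is an assumption. Byte indexing is used, which assumes ASCII keys.
func maskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return "***"
	}
	return key[:head] + "***" + key[len(key)-tail:]
}

func main() {
	fmt.Println(maskKey("sk-demo-abcdefgh1234"))
	fmt.Println(maskKey("short"))
}
```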

Rate Limiting & Ethics

  • GitHub API: 30 req/min (auth), 10 req/min (unauth)
  • Shodan/Censys: respect API plan limits
  • Paste sites: 1 req/2sec politeness delay
  • --stealth flag: UA rotation, increased spacing
  • --respect-robots: robots.txt compliance (default: on)
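
The per-source limits above translate into a minimum spacing between requests: 30 req/min is one request every 2 seconds, and 10 req/min is one every 6 seconds. The sketch below books request slots per source and reports how long a caller should wait. The type and function names are hypothetical, and the caller is expected to sleep for the returned duration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// intervalFor converts a requests-per-minute budget into a spacing.
func intervalFor(reqPerMin int) time.Duration {
	return time.Minute / time.Duration(reqPerMin)
}

// sourceLimiter tracks the next free request slot for each source.
type sourceLimiter struct {
	mu       sync.Mutex
	interval map[string]time.Duration
	next     map[string]time.Time
}

func newSourceLimiter() *sourceLimiter {
	return &sourceLimiter{
		interval: map[string]time.Duration{
			"github": intervalFor(30), // authenticated limit from the list above
			"paste":  2 * time.Second, // 1 request per 2 seconds
		},
		next: map[string]time.Time{},
	}
}

// reserve books the next slot for source and returns how long the
// caller must wait, measured from now, before sending the request.
func (l *sourceLimiter) reserve(source string, now time.Time) time.Duration {
	l.mu.Lock()
	defer l.mu.Unlock()
	slot := l.next[source]
	if slot.Before(now) {
		slot = now
	}
	l.next[source] = slot.Add(l.interval[source])
	return slot.Sub(now)
}

// demoStart is a fixed reference time so the example is deterministic.
var demoStart = time.Unix(0, 0)

func main() {
	l := newSourceLimiter()
	for i := 0; i < 3; i++ {
		fmt.Println("paste request", i, "waits", l.reserve("paste", demoStart))
	}
}
```

Passing the current time in as a parameter keeps the limiter testable without real sleeps; a --stealth mode could simply scale the intervals up.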

Error Handling

  • Verify timeout: 10s default, configurable
  • Network errors: 3 retries with exponential backoff
  • Partial results: failed sources don't block others
  • Graceful degradation on all external dependencies
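
The retry rule could look roughly like the sketch below. It interprets "3 retries" as one initial attempt plus up to three more, with the delay doubling each time; that interpretation, the function name, and the base delay are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a stand-in for a temporary network failure.
var errTransient = errors.New("transient network error")

// demoBase is a short base delay so the example finishes quickly.
const demoBase = time.Millisecond

// withRetry runs op once and then up to `retries` more times, sleeping
// base, 2*base, 4*base, ... between attempts.
func withRetry(retries int, base time.Duration, op func() error) error {
	var err error
	for attempt := 0; attempt <= retries; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < retries {
			time.Sleep(base << attempt)
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", retries, err)
}

func main() {
	calls := 0
	err := withRetry(3, demoBase, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

When one recon source exhausts its retries, the caller can record the error and continue with the remaining sources, which matches the partial-results rule above.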