# KeyHunter - Design Specification

## Overview

KeyHunter is a comprehensive, modular API key scanner built in Go, focused on detecting and validating API keys from 100+ LLM/AI providers. It combines native scanning capabilities with external tool integration (TruffleHog, Gitleaks), OSINT/recon modules, a web dashboard, and Telegram bot notifications.

## Architecture

**Approach:** Plugin-based architecture. Core scanner engine with providers defined as YAML files (compile-time embedded). Single binary distribution.

### Directory Structure

```
keyhunter/
├── cmd/keyhunter/              # CLI entrypoint (cobra)
├── pkg/
│   ├── engine/                 # Core scanning engine
│   │   ├── scanner.go          # Orchestrator - takes input, runs providers
│   │   ├── matcher.go          # Regex + entropy matching
│   │   └── verifier.go         # Active key verification (--verify flag)
│   ├── provider/               # Provider registry & loader
│   │   ├── registry.go         # Loads and manages providers
│   │   ├── types.go            # Provider interface definitions
│   │   └── builtin/            # Compile-time embedded provider YAMLs
│   ├── input/                  # Input source adapters
│   │   ├── file.go             # File/directory scanning
│   │   ├── git.go              # Git history/diff scanning
│   │   ├── stdin.go            # Pipe/stdin support
│   │   ├── url.go              # URL fetch
│   │   └── remote.go           # GitHub/GitLab API, paste sites
│   ├── output/                 # Output formatters
│   │   ├── table.go            # Colored terminal table
│   │   ├── json.go             # JSON export
│   │   ├── sarif.go            # SARIF (CI/CD compatible)
│   │   └── csv.go              # CSV export
│   ├── adapter/                # External tool parsers
│   │   ├── trufflehog.go       # TruffleHog JSON output parser
│   │   └── gitleaks.go         # Gitleaks JSON output parser
│   ├── recon/                  # OSINT/Recon engine (80+ sources)
│   │   ├── engine.go           # Recon orchestrator
│   │   ├── ratelimit.go        # Rate limiting & politeness
│   │   │
│   │   │   # --- IoT & Internet Search Engines ---
│   │   ├── shodan.go           # Shodan API client
│   │   ├── censys.go           # Censys API client
│   │   ├── zoomeye.go          # ZoomEye (Chinese IoT scanner)
│   │   ├── fofa.go             # FOFA (Chinese IoT scanner)
│   │   ├── netlas.go           # Netlas.io (HTTP body search)
│   │   ├── binaryedge.go       # BinaryEdge scanner
│   │   │
│   │   │   # --- Code Hosting & Snippets ---
│   │   ├── github.go           # GitHub code search / dorks
│   │   ├── gitlab.go           # GitLab search
│   │   ├── gist.go             # GitHub Gist search
│   │   ├── bitbucket.go        # Bitbucket code search
│   │   ├── codeberg.go         # Codeberg/Gitea search
│   │   ├── gitea.go            # Self-hosted Gitea instances
│   │   ├── replit.go           # Replit public repls
│   │   ├── codesandbox.go      # CodeSandbox projects
│   │   ├── stackblitz.go       # StackBlitz projects
│   │   ├── codepen.go          # CodePen pens
│   │   ├── jsfiddle.go         # JSFiddle snippets
│   │   ├── glitch.go           # Glitch public projects
│   │   ├── observable.go       # Observable notebooks
│   │   ├── huggingface.go      # HuggingFace Spaces/repos
│   │   ├── kaggle.go           # Kaggle notebooks/datasets
│   │   ├── jupyter.go          # nbviewer / Jupyter notebooks
│   │   ├── gitpod.go           # Gitpod workspace snapshots
│   │   │
│   │   │   # --- Search Engine Dorking ---
│   │   ├── google.go           # Google Custom Search / SerpAPI dorking
│   │   ├── bing.go             # Bing Web Search API dorking
│   │   ├── duckduckgo.go       # DuckDuckGo search
│   │   ├── yandex.go           # Yandex XML Search
│   │   ├── brave.go            # Brave Search API
│   │   │
│   │   │   # --- Paste Sites ---
│   │   ├── paste.go            # Multi-paste aggregator (pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.)
│   │   │
│   │   │   # --- Package Registries ---
│   │   ├── npm.go              # npm registry scanning
│   │   ├── pypi.go             # PyPI package scanning
│   │   ├── rubygems.go         # RubyGems scanning
│   │   ├── crates.go           # crates.io (Rust)
│   │   ├── maven.go            # Maven Central (Java)
│   │   ├── nuget.go            # NuGet (.NET)
│   │   ├── packagist.go        # Packagist (PHP)
│   │   ├── goproxy.go          # Go module proxy
│   │   │
│   │   │   # --- Container & Infra ---
│   │   ├── docker.go           # Docker Hub image/layer scanning
│   │   ├── kubernetes.go       # Exposed K8s dashboards & configs
│   │   ├── terraform.go        # Terraform state files & registry
│   │   ├── helm.go             # Artifact Hub / Helm charts
│   │   ├── ansible.go          # Ansible Galaxy collections
│   │   │
│   │   │   # --- Cloud Storage ---
│   │   ├── s3.go               # AWS S3 bucket enumeration
│   │   ├── gcs.go              # Google Cloud Storage buckets
│   │   ├── azureblob.go        # Azure Blob Storage
│   │   ├── spaces.go           # DigitalOcean Spaces
│   │   ├── backblaze.go        # Backblaze B2
│   │   ├── minio.go            # Self-hosted MinIO instances
│   │   ├── grayhat.go          # GrayHatWarfare (bucket search engine)
│   │   │
│   │   │   # --- CI/CD Log Leaks ---
│   │   ├── travisci.go         # Travis CI public build logs
│   │   ├── circleci.go         # CircleCI build logs
│   │   ├── ghactions.go        # GitHub Actions workflow logs
│   │   ├── jenkins.go          # Exposed Jenkins instances
│   │   ├── gitlabci.go         # GitLab CI/CD pipeline logs
│   │   │
│   │   │   # --- Web Archives ---
│   │   ├── wayback.go          # Wayback Machine CDX API
│   │   ├── commoncrawl.go      # CommonCrawl index & WARC
│   │   │
│   │   │   # --- Forums & Documentation ---
│   │   ├── stackoverflow.go    # Stack Overflow / Stack Exchange API
│   │   ├── reddit.go           # Reddit search
│   │   ├── hackernews.go       # HN Algolia API
│   │   ├── devto.go            # dev.to articles
│   │   ├── medium.go           # Medium articles
│   │   ├── telegram_recon.go   # Telegram public channels
│   │   ├── discord.go          # Discord indexed content
│   │   │
│   │   │   # --- Collaboration Tools ---
│   │   ├── notion.go           # Notion public pages
│   │   ├── confluence.go       # Confluence public spaces
│   │   ├── trello.go           # Trello public boards
│   │   ├── googledocs.go       # Google Docs/Sheets public
│   │   │
│   │   │   # --- Frontend & JS Leaks ---
│   │   ├── sourcemaps.go       # JS source map extraction
│   │   ├── webpack.go          # Webpack/Vite bundle scanning
│   │   ├── dotenv_web.go       # Exposed .env files on web servers
│   │   ├── swagger.go          # Exposed Swagger/OpenAPI docs
│   │   ├── deploys.go          # Vercel/Netlify preview deployments
│   │   │
│   │   │   # --- Log Aggregators ---
│   │   ├── elasticsearch.go    # Exposed Elasticsearch/Kibana
│   │   ├── grafana.go          # Exposed Grafana dashboards
│   │   ├── sentry.go           # Exposed Sentry instances
│   │   │
│   │   │   # --- Threat Intelligence ---
│   │   ├── virustotal.go       # VirusTotal file/URL search
│   │   ├── intelx.go           # Intelligence X aggregated search
│   │   ├── urlhaus.go          # URLhaus abuse.ch
│   │   │
│   │   │   # --- Mobile Apps ---
│   │   ├── apk.go              # APK download & decompile scanning
│   │   │
│   │   │   # --- DNS/Subdomain ---
│   │   ├── crtsh.go            # Certificate Transparency (crt.sh)
│   │   ├── subdomain.go        # Subdomain config endpoint probing
│   │   │
│   │   │   # --- API Marketplaces ---
│   │   ├── postman.go          # Postman public collections/workspaces
│   │   ├── swaggerhub.go       # SwaggerHub published APIs
│   │   └── rapidapi.go         # RapidAPI public endpoints
│   │
│   ├── dorks/                  # Dork management
│   │   ├── loader.go           # YAML dork loader
│   │   ├── runner.go           # Dork execution engine
│   │   └── builtin/            # Embedded dork YAMLs
│   ├── notify/                 # Notification modules
│   │   ├── telegram.go         # Telegram bot
│   │   ├── webhook.go          # Generic webhook
│   │   └── slack.go            # Slack
│   └── web/                    # Web dashboard
│       ├── server.go           # Embedded HTTP server
│       ├── api.go              # REST API
│       └── static/             # Frontend assets (htmx + tailwind)
├── providers/                  # Provider YAML definitions (embedded)
│   ├── openai.yaml
│   ├── anthropic.yaml
│   └── ... (108 providers)
├── dorks/                      # Dork YAML definitions (embedded)
│   ├── github.yaml             # GitHub code search dorks
│   ├── gitlab.yaml             # GitLab search dorks
│   ├── shodan.yaml             # Shodan IoT dorks
│   ├── censys.yaml             # Censys dorks
│   ├── zoomeye.yaml            # ZoomEye dorks
│   ├── fofa.yaml               # FOFA dorks
│   ├── google.yaml             # Google dorking queries
│   ├── bing.yaml               # Bing dorking queries
│   └── generic.yaml            # Multi-source keyword dorks
├── configs/                    # Example config files
└── docs/
```

### Data Flow

```
Input Source -> Scanner Engine -> Provider Matcher -> (optional) Verifier -> Output Formatter + Notifier -> SQLite DB (persist) -> Web Dashboard (serve)
```

## Provider YAML Schema

```yaml
id: string                # Unique provider ID
name: string              # Display name
category: enum            # frontier | mid-tier | emerging | chinese | infrastructure | gateway | self-hosted
website: string           # API base URL
confidence: enum          # high | medium | low

patterns:
  - id: string            # Unique pattern ID
    name: string          # Human-readable name
    regex: string         # Detection regex
    confidence: enum      # high | medium | low
    description: string   # Pattern description

keywords: []string        # Pre-filtering keywords (performance optimization)

verify:
  enabled: bool
  method: string          # HTTP method
  url: string             # Verification endpoint
  headers: map            # Headers with {{key}} template
  success_codes: []int
  failure_codes: []int
  extract:                # Additional info extraction on success
    - field: string
      path: string        # JSON path

metadata:
  docs: string            # API docs URL
  key_url: string         # Key management URL
  env_vars: []string      # Common environment variable names
  revoke_url: string      # Key revocation URL
```

## CLI Command Structure

### Core Commands

```bash
# Scanning
keyhunter scan path
keyhunter scan file
keyhunter scan git [--since=]
keyhunter scan stdin
keyhunter scan url
keyhunter scan clipboard

# Verification
keyhunter verify
keyhunter verify --file

# External Tool Import
keyhunter import trufflehog
keyhunter import gitleaks
keyhunter import generic --format=csv

# OSINT/Recon - IoT & Internet Scanners
keyhunter recon shodan [--query|--dork]
keyhunter recon censys [--query]
keyhunter recon zoomeye [--query]
keyhunter recon fofa [--query]
keyhunter recon netlas [--query]
keyhunter recon binaryedge [--query]

# OSINT/Recon - Code Hosting & Snippets
keyhunter recon github [--dork=auto|custom]
keyhunter recon gitlab [--dork=auto|custom]
keyhunter recon gist [--query]
keyhunter recon bitbucket [--query|--workspace]
keyhunter recon codeberg [--query]
keyhunter recon gitea [--instances-from=shodan|file]
keyhunter recon replit [--query]
keyhunter recon codesandbox [--query]
keyhunter recon stackblitz [--query]
keyhunter recon codepen [--query]
keyhunter recon jsfiddle [--query]
keyhunter recon glitch [--query]
keyhunter recon huggingface [--query|--spaces|--repos]
keyhunter recon kaggle [--query|--notebooks]
keyhunter recon jupyter [--query]
keyhunter recon observable [--query]

# OSINT/Recon - Search Engine Dorking
keyhunter recon google [--dork=auto|custom]
keyhunter recon bing [--dork=auto|custom]
keyhunter recon duckduckgo [--query]
keyhunter recon yandex [--query]
keyhunter recon brave [--query]

# OSINT/Recon - Paste Sites
keyhunter recon paste [--sources=pastebin,dpaste,paste.ee,rentry,hastebin,ix.io,all]

# OSINT/Recon - Package Registries
keyhunter recon npm [--query|--recent]
keyhunter recon pypi [--query|--recent]
keyhunter recon rubygems [--query]
keyhunter recon crates [--query]
keyhunter recon maven [--query]
keyhunter recon nuget [--query]
keyhunter recon packagist [--query]
keyhunter recon goproxy [--query]

# OSINT/Recon - Container & Infrastructure
keyhunter recon docker [--query|--image|--layers]
keyhunter recon kubernetes [--shodan|--github]
keyhunter recon terraform [--github|--registry]
keyhunter recon helm [--query]
keyhunter recon ansible [--query]

# OSINT/Recon - Cloud Storage
keyhunter recon s3 [--wordlist|--domain]
keyhunter recon gcs [--wordlist|--domain]
keyhunter recon azure [--wordlist|--domain]
keyhunter recon spaces [--wordlist]
keyhunter recon minio [--shodan]
keyhunter recon grayhat [--query]              # GrayHatWarfare bucket search

# OSINT/Recon - CI/CD Logs
keyhunter recon travis [--org|--repo]
keyhunter recon circleci [--org|--repo]
keyhunter recon ghactions [--org|--repo]
keyhunter recon jenkins [--shodan|--url]
keyhunter recon gitlabci [--project]

# OSINT/Recon - Web Archives
keyhunter recon wayback [--domain|--url]
keyhunter recon commoncrawl [--domain|--pattern]

# OSINT/Recon - Forums & Documentation
keyhunter recon stackoverflow [--query]
keyhunter recon reddit [--query|--subreddit]
keyhunter recon hackernews [--query]
keyhunter recon devto [--query|--tag]
keyhunter recon medium [--query]
keyhunter recon telegram-groups [--channel|--query]

# OSINT/Recon - Collaboration Tools
keyhunter recon notion [--query]               # Google dorking
keyhunter recon confluence [--shodan|--url]
keyhunter recon trello [--query]
keyhunter recon googledocs [--query]           # Google dorking

# OSINT/Recon - Frontend & JS Leaks
keyhunter recon sourcemaps [--domain|--url]
keyhunter recon webpack [--domain|--url]
keyhunter recon dotenv [--domain-list|--url]   # Exposed .env files
keyhunter recon swagger [--shodan|--domain]
keyhunter recon deploys [--domain]             # Vercel/Netlify previews

# OSINT/Recon - Log Aggregators
keyhunter recon elasticsearch [--shodan|--url]
keyhunter recon grafana [--shodan|--url]
keyhunter recon sentry [--shodan|--url]

# OSINT/Recon - Threat Intelligence
keyhunter recon virustotal [--query]
keyhunter recon intelx [--query]
keyhunter recon urlhaus [--query]

# OSINT/Recon - Mobile Apps
keyhunter recon apk [--package|--query|--file]

# OSINT/Recon - DNS/Subdomain
keyhunter recon crtsh [--domain]
keyhunter recon subdomain [--domain] [--probe-configs]

# OSINT/Recon - API Marketplaces
keyhunter recon postman [--query|--workspace]
keyhunter recon swaggerhub [--query]

# OSINT/Recon - Full Sweep
keyhunter recon full [--providers] [--categories=all|code|cloud|forums|cicd|...]

# Dork Management
keyhunter dorks list [--source]
keyhunter dorks add
keyhunter dorks run [--category]
keyhunter dorks export

# Key Management (full key access)
keyhunter keys list [--unmask] [--provider=X] [--status=active|revoked]
keyhunter keys show
keyhunter keys export --format=json|csv
keyhunter keys copy
keyhunter keys verify
keyhunter keys delete

# Provider Management
keyhunter providers list [--category]
keyhunter providers info
keyhunter providers stats

# Web Dashboard & Telegram
keyhunter serve [--port] [--telegram]

# Scheduled Scanning
keyhunter schedule add --name --cron --command --notify
keyhunter schedule list
keyhunter schedule remove

# Config & Hooks
keyhunter config init
keyhunter config set
keyhunter hook install
keyhunter hook uninstall
```

### Scan Flags

```
--providers=          Filter by provider IDs
--category=           Filter by provider category
--confidence=         Minimum confidence level
--exclude=            Exclude file patterns
--verify              Enable active key verification
--verify-timeout=     Verification timeout (default: 10s)
--workers=            Parallel workers (default: CPU count)
--output=             Output format: table|json|sarif|csv
--unmask              Show full API keys without masking (default: masked)
--notify=             Send results to: telegram|webhook|slack
--stealth             Stealth mode: UA rotation, increased delays
--respect-robots      Respect robots.txt (default: true)
```

### Exit Codes

- `0`: Clean, no keys found
- `1`: Keys found
- `2`: Error

## Dork YAML Schema

```yaml
source: string            # github | gitlab | shodan | censys
dorks:
  - id: string
    query: string         # Search query
    description: string
    providers: []string   # Optional: related provider IDs
```

Built-in dork categories: GitHub (code search, filename, language), GitLab (snippets, projects), Shodan (exposed proxies, dashboards), Censys (HTTP body search).
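The dork schema above could be modeled in Go roughly as follows. This is a minimal sketch, not the project's actual loader: the type names `DorkFile` and `Dork`, the `Validate` helper, and the sample query are hypothetical, and the `yaml` struct tags assume a YAML library is used to decode the files (none is imported here, so the sketch runs with the standard library only).

```go
package main

import (
	"fmt"
	"strings"
)

// Dork mirrors one entry of the `dorks` list in the schema.
type Dork struct {
	ID          string   `yaml:"id"`
	Query       string   `yaml:"query"`
	Description string   `yaml:"description"`
	Providers   []string `yaml:"providers"` // optional
}

// DorkFile mirrors one dork YAML document (hypothetical type name).
type DorkFile struct {
	Source string `yaml:"source"`
	Dorks  []Dork `yaml:"dorks"`
}

// Validate is a hypothetical sanity check run after decoding:
// it requires a source, non-empty ids and queries, and unique ids.
func (f DorkFile) Validate() error {
	if strings.TrimSpace(f.Source) == "" {
		return fmt.Errorf("missing source")
	}
	seen := make(map[string]bool)
	for i, d := range f.Dorks {
		switch {
		case strings.TrimSpace(d.ID) == "":
			return fmt.Errorf("dork %d: missing id", i)
		case strings.TrimSpace(d.Query) == "":
			return fmt.Errorf("dork %q: missing query", d.ID)
		case seen[d.ID]:
			return fmt.Errorf("duplicate dork id %q", d.ID)
		}
		seen[d.ID] = true
	}
	return nil
}

func main() {
	// Sample values are made up for illustration.
	f := DorkFile{
		Source: "github",
		Dorks: []Dork{
			{ID: "env-file-key", Query: `filename:.env "API_KEY"`, Providers: []string{"openai"}},
		},
	}
	fmt.Println("validation error:", f.Validate())
}
```

Validating at load time keeps malformed or duplicate dork entries from reaching the runner, which matters when the files are embedded at compile time and cannot be patched in the field.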
## Web Dashboard

**Stack:** Go embed + htmx + Tailwind CSS (zero JS framework dependency)

**Pages:**

- `/`: Dashboard overview with summary statistics
- `/scans`: Scan history list
- `/scans/:id`: Scan detail with found keys
- `/keys`: All found keys (filterable table)
- `/keys/:id`: Key detail (provider, confidence, verify status)
- `/recon`: OSINT scan launcher and results
- `/providers`: Provider list and statistics
- `/dorks`: Dork management
- `/settings`: Configuration (tokens, API keys)
- `/api/v1/*`: REST API for programmatic access

**Storage:** SQLite (embedded, AES-256 encrypted)

## Telegram Bot

**Commands:**

- `/scan`: Remote scan trigger
- `/verify`: Key verification
- `/recon github`: GitHub dork execution
- `/status`: Active scan status
- `/stats`: General statistics
- `/subscribe`: Auto-notification on new key findings
- `/unsubscribe`: Disable notifications
- `/providers`: Provider list
- `/help`: Help

**Auto-notifications:** New key found, recon complete, scheduled scan results, verify results.
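Dispatching the bot commands listed above starts with splitting an incoming message into a command name and its arguments. The sketch below shows one way to do that with the standard library only; `Command` and `ParseCommand` are hypothetical names, and no Telegram client library is assumed. It also strips the `@BotName` suffix that Telegram appends to commands in group chats.

```go
package main

import (
	"fmt"
	"strings"
)

// Command is a parsed bot message (hypothetical type).
type Command struct {
	Name string   // command without the leading slash, lowercased
	Args []string // remaining whitespace-separated tokens
}

// ParseCommand splits message text into a command and its arguments.
// It returns ok=false when the text does not start with a slash command.
func ParseCommand(text string) (cmd Command, ok bool) {
	fields := strings.Fields(text)
	if len(fields) == 0 || !strings.HasPrefix(fields[0], "/") {
		return Command{}, false
	}
	name := strings.TrimPrefix(fields[0], "/")
	// In group chats a command may arrive as "/scan@SomeBot"; drop the suffix.
	if i := strings.IndexByte(name, '@'); i >= 0 {
		name = name[:i]
	}
	if name == "" {
		return Command{}, false
	}
	return Command{Name: strings.ToLower(name), Args: fields[1:]}, true
}

func main() {
	for _, msg := range []string{"/recon github openai", "/status@ExampleBot", "hello"} {
		cmd, ok := ParseCommand(msg)
		fmt.Printf("%q -> ok=%v name=%q args=%v\n", msg, ok, cmd.Name, cmd.Args)
	}
}
```

A dispatcher can then switch on `cmd.Name` (`scan`, `verify`, `recon`, and so on) and treat `cmd.Args[0]` as the sub-source for `/recon`.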
## LLM Provider Coverage (108 Providers)

### Tier 1 - Frontier (12)

OpenAI, Anthropic, Google AI (Gemini), Google Vertex AI, AWS Bedrock, Azure OpenAI, Meta AI (Llama API), xAI (Grok), Cohere, Mistral AI, Inflection AI, AI21 Labs

### Tier 2 - Inference Platforms (14)

Together AI, Fireworks AI, Groq, Replicate, Anyscale, DeepInfra, Lepton AI, Modal, Baseten, Cerebrium, NovitaAI, Sambanova, OctoAI, Friendli AI

### Tier 3 - Specialized/Vertical (12)

Perplexity, You.com, Voyage AI, Jina AI, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability AI, Runway ML, Midjourney, HuggingFace

### Tier 4 - Chinese/Regional (16)

DeepSeek, Baichuan, Zhipu AI (GLM), Moonshot AI (Kimi), Yi (01.AI), Qwen (Alibaba Cloud), Baidu (ERNIE/Wenxin), ByteDance (Doubao), SenseTime, iFlytek (Spark), MiniMax, Stepfun, 360 AI, Kuaishou (Kling), Tencent Hunyuan, SiliconFlow

### Tier 5 - Infrastructure/Gateway (11)

Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, AI Gateway (Kong), BricksAI, Aether, Not Diamond

### Tier 6 - Emerging/Niche (15)

Reka AI, Aleph Alpha, Writer, Jasper AI, Typeface, Comet ML, Weights & Biases, LangSmith (LangChain), Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon AI, Lamini

### Tier 7 - Code & Dev Tools (10)

GitHub Copilot, Cursor, Tabnine, Codeium/Windsurf, Sourcegraph Cody, Amazon CodeWhisperer, Replit AI, Codestral (Mistral), IBM watsonx.ai, Oracle AI

### Tier 8 - Self-Hosted/Open Infra (10)

Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-generation-webui, TensorRT-LLM, Triton Inference Server, Jan AI

### Tier 9 - Enterprise/Legacy (8)

Salesforce Einstein, ServiceNow AI, SAP AI Core, Palantir AIP, Databricks (DBRX), Snowflake Cortex, Oracle Generative AI, HPE GreenLake AI

## Performance

- Worker pool: parallel scanning (default: CPU count, configurable via `--workers=N`)
- Keyword pre-filtering before regex (10x speedup on large files)
- `mmap` for large file reading
- Delta-based git scanning (only changed files between commits)
- Source-based rate limiting in recon module

## Key Visibility & Access

Full (unmasked) API keys are accessible through multiple channels:

1. **CLI `--unmask` flag**: `keyhunter scan path . --unmask` shows full keys in terminal table
2. **JSON/CSV/SARIF export**: Always contains full keys: `keyhunter scan path . -o json`
3. **`keyhunter keys` command**: Dedicated key management:
   - `keyhunter keys list`: all found keys (masked by default)
   - `keyhunter keys list --unmask`: all found keys (full)
   - `keyhunter keys show`: single key full detail (always unmasked)
   - `keyhunter keys export --format=json`: export all keys with full values
   - `keyhunter keys copy`: copy full key to clipboard
   - `keyhunter keys verify`: verify and show full detail
4. **Web Dashboard**: `/keys/:id` detail page with "Reveal Key" toggle button (auth required)
5. **Telegram Bot**: `/key` returns full key detail in private chat
6. **SQLite DB**: Full keys always stored (encrypted), queryable via API

Default behavior: masked in terminal for shoulder-surfing protection. When you need the real key (to test, verify, or report): `--unmask`, JSON export, or `keys show`.
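The default masking can be sketched as follows. The format (first 8 and last 4 characters kept, middle replaced by `***`) is the one stated in the Security section; the function names `Mask` and `Display`, and the choice to hide keys of 12 characters or fewer entirely, are assumptions made for illustration.

```go
package main

import "fmt"

// Mask keeps the first 8 and last 4 characters of a key and replaces
// the middle with "***". Keys too short to mask meaningfully are fully
// hidden (an assumption; the spec does not cover this case).
func Mask(key string) string {
	const head, tail = 8, 4
	r := []rune(key)
	if len(r) <= head+tail {
		return "***"
	}
	return string(r[:head]) + "***" + string(r[len(r)-tail:])
}

// Display returns the value to print in the terminal table,
// honoring the --unmask flag.
func Display(key string, unmask bool) string {
	if unmask {
		return key
	}
	return Mask(key)
}

func main() {
	key := "sk-proj-abcdefghijklmnop1234" // made-up value, not a real key
	fmt.Println(Display(key, false))
	fmt.Println(Display(key, true))
}
```

Masking only at the display layer keeps the stored value intact, so `keys show`, exports, and verification can still work with the full key.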
## Security

- Key masking in terminal output by default (first 8 + last 4 chars, middle `***`)
- `--unmask` flag to reveal full keys when needed
- SQLite database AES-256 encrypted (full keys stored encrypted)
- Telegram/Shodan tokens encrypted in config
- No key values written to logs during `--verify`
- Optional basic auth / token auth for web dashboard

## Rate Limiting & Ethics

- GitHub API: 30 req/min (auth), 10 req/min (unauth)
- Shodan/Censys: respect API plan limits
- Paste sites: 1 req/2sec politeness delay
- `--stealth` flag: UA rotation, increased spacing
- `--respect-robots`: robots.txt compliance (default: on)

## Error Handling

- Verify timeout: 10s default, configurable
- Network errors: 3 retries with exponential backoff
- Partial results: failed sources don't block others
- Graceful degradation on all external dependencies