KeyHunter - Design Specification
Overview
KeyHunter is a comprehensive, modular API key scanner built in Go, focused on detecting and validating API keys from 100+ LLM/AI providers. It combines native scanning capabilities with external tool integration (TruffleHog, Gitleaks), OSINT/recon modules, a web dashboard, and Telegram bot notifications.
Architecture
Approach: Plugin-based architecture. Core scanner engine with providers defined as YAML files (compile-time embedded). Single binary distribution.
Directory Structure
keyhunter/
├── cmd/keyhunter/ # CLI entrypoint (cobra)
├── pkg/
│ ├── engine/ # Core scanning engine
│ │ ├── scanner.go # Orchestrator - takes input, runs providers
│ │ ├── matcher.go # Regex + entropy matching
│ │ └── verifier.go # Active key verification (--verify flag)
│ ├── provider/ # Provider registry & loader
│ │ ├── registry.go # Loads and manages providers
│ │ ├── types.go # Provider interface definitions
│ │ └── builtin/ # Compile-time embedded provider YAML files
│ ├── input/ # Input source adapters
│ │ ├── file.go # File/directory scanning
│ │ ├── git.go # Git history/diff scanning
│ │ ├── stdin.go # Pipe/stdin support
│ │ ├── url.go # URL fetch
│ │ └── remote.go # GitHub/GitLab API, paste sites
│ ├── output/ # Output formatters
│ │ ├── table.go # Colored terminal table
│ │ ├── json.go # JSON export
│ │ ├── sarif.go # SARIF (CI/CD compatible)
│ │ └── csv.go # CSV export
│ ├── adapter/ # External tool parsers
│ │ ├── trufflehog.go # TruffleHog JSON output parser
│ │ └── gitleaks.go # Gitleaks JSON output parser
│ ├── recon/ # OSINT/Recon engine (80+ sources)
│ │ ├── engine.go # Recon orchestrator
│ │ ├── ratelimit.go # Rate limiting & politeness
│ │ │
│ │ │ # --- IoT & Internet Search Engines ---
│ │ ├── shodan.go # Shodan API client
│ │ ├── censys.go # Censys API client
│ │ ├── zoomeye.go # ZoomEye (Chinese IoT scanner)
│ │ ├── fofa.go # FOFA (Chinese IoT scanner)
│ │ ├── netlas.go # Netlas.io (HTTP body search)
│ │ ├── binaryedge.go # BinaryEdge scanner
│ │ │
│ │ │ # --- Code Hosting & Snippets ---
│ │ ├── github.go # GitHub code search / dorks
│ │ ├── gitlab.go # GitLab search
│ │ ├── gist.go # GitHub Gist search
│ │ ├── bitbucket.go # Bitbucket code search
│ │ ├── codeberg.go # Codeberg/Gitea search
│ │ ├── gitea.go # Self-hosted Gitea instances
│ │ ├── replit.go # Replit public repls
│ │ ├── codesandbox.go # CodeSandbox projects
│ │ ├── stackblitz.go # StackBlitz projects
│ │ ├── codepen.go # CodePen pens
│ │ ├── jsfiddle.go # JSFiddle snippets
│ │ ├── glitch.go # Glitch public projects
│ │ ├── observable.go # Observable notebooks
│ │ ├── huggingface.go # HuggingFace Spaces/repos
│ │ ├── kaggle.go # Kaggle notebooks/datasets
│ │ ├── jupyter.go # nbviewer / Jupyter notebooks
│ │ ├── gitpod.go # Gitpod workspace snapshots
│ │ │
│ │ │ # --- Search Engine Dorking ---
│ │ ├── google.go # Google Custom Search / SerpAPI dorking
│ │ ├── bing.go # Bing Web Search API dorking
│ │ ├── duckduckgo.go # DuckDuckGo search
│ │ ├── yandex.go # Yandex XML Search
│ │ ├── brave.go # Brave Search API
│ │ │
│ │ │ # --- Paste Sites ---
│ │ ├── paste.go # Multi-paste aggregator (pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.)
│ │ │
│ │ │ # --- Package Registries ---
│ │ ├── npm.go # npm registry scanning
│ │ ├── pypi.go # PyPI package scanning
│ │ ├── rubygems.go # RubyGems scanning
│ │ ├── crates.go # crates.io (Rust)
│ │ ├── maven.go # Maven Central (Java)
│ │ ├── nuget.go # NuGet (.NET)
│ │ ├── packagist.go # Packagist (PHP)
│ │ ├── goproxy.go # Go module proxy
│ │ │
│ │ │ # --- Container & Infra ---
│ │ ├── docker.go # Docker Hub image/layer scanning
│ │ ├── kubernetes.go # Exposed K8s dashboards & configs
│ │ ├── terraform.go # Terraform state files & registry
│ │ ├── helm.go # Artifact Hub / Helm charts
│ │ ├── ansible.go # Ansible Galaxy collections
│ │ │
│ │ │ # --- Cloud Storage ---
│ │ ├── s3.go # AWS S3 bucket enumeration
│ │ ├── gcs.go # Google Cloud Storage buckets
│ │ ├── azureblob.go # Azure Blob Storage
│ │ ├── spaces.go # DigitalOcean Spaces
│ │ ├── backblaze.go # Backblaze B2
│ │ ├── minio.go # Self-hosted MinIO instances
│ │ ├── grayhat.go # GrayHatWarfare (bucket search engine)
│ │ │
│ │ │ # --- CI/CD Log Leaks ---
│ │ ├── travisci.go # Travis CI public build logs
│ │ ├── circleci.go # CircleCI build logs
│ │ ├── ghactions.go # GitHub Actions workflow logs
│ │ ├── jenkins.go # Exposed Jenkins instances
│ │ ├── gitlabci.go # GitLab CI/CD pipeline logs
│ │ │
│ │ │ # --- Web Archives ---
│ │ ├── wayback.go # Wayback Machine CDX API
│ │ ├── commoncrawl.go # CommonCrawl index & WARC
│ │ │
│ │ │ # --- Forums & Documentation ---
│ │ ├── stackoverflow.go # Stack Overflow / Stack Exchange API
│ │ ├── reddit.go # Reddit search
│ │ ├── hackernews.go # HN Algolia API
│ │ ├── devto.go # dev.to articles
│ │ ├── medium.go # Medium articles
│ │ ├── telegram_recon.go # Telegram public channels
│ │ ├── discord.go # Discord indexed content
│ │ │
│ │ │ # --- Collaboration Tools ---
│ │ ├── notion.go # Notion public pages
│ │ ├── confluence.go # Confluence public spaces
│ │ ├── trello.go # Trello public boards
│ │ ├── googledocs.go # Google Docs/Sheets public
│ │ │
│ │ │ # --- Frontend & JS Leaks ---
│ │ ├── sourcemaps.go # JS source map extraction
│ │ ├── webpack.go # Webpack/Vite bundle scanning
│ │ ├── dotenv_web.go # Exposed .env files on web servers
│ │ ├── swagger.go # Exposed Swagger/OpenAPI docs
│ │ ├── deploys.go # Vercel/Netlify preview deployments
│ │ │
│ │ │ # --- Log Aggregators ---
│ │ ├── elasticsearch.go # Exposed Elasticsearch/Kibana
│ │ ├── grafana.go # Exposed Grafana dashboards
│ │ ├── sentry.go # Exposed Sentry instances
│ │ │
│ │ │ # --- Threat Intelligence ---
│ │ ├── virustotal.go # VirusTotal file/URL search
│ │ ├── intelx.go # Intelligence X aggregated search
│ │ ├── urlhaus.go # URLhaus abuse.ch
│ │ │
│ │ │ # --- Mobile Apps ---
│ │ ├── apk.go # APK download & decompile scanning
│ │ │
│ │ │ # --- DNS/Subdomain ---
│ │ ├── crtsh.go # Certificate Transparency (crt.sh)
│ │ ├── subdomain.go # Subdomain config endpoint probing
│ │ │
│ │ │ # --- API Marketplaces ---
│ │ ├── postman.go # Postman public collections/workspaces
│ │ ├── swaggerhub.go # SwaggerHub published APIs
│ │ └── rapidapi.go # RapidAPI public endpoints
│ │
│ ├── dorks/ # Dork management
│ │ ├── loader.go # YAML dork loader
│ │ ├── runner.go # Dork execution engine
│ │ └── builtin/ # Embedded dork YAML files
│ ├── notify/ # Notification modules
│ │ ├── telegram.go # Telegram bot
│ │ ├── webhook.go # Generic webhook
│ │ └── slack.go # Slack
│ └── web/ # Web dashboard
│   ├── server.go # Embedded HTTP server
│   ├── api.go # REST API
│   └── static/ # Frontend assets (htmx + tailwind)
├── providers/ # Provider YAML definitions (embedded)
│ ├── openai.yaml
│ ├── anthropic.yaml
│ └── ... (108 providers)
├── dorks/ # Dork YAML definitions (embedded)
│ ├── github.yaml # GitHub code search dorks
│ ├── gitlab.yaml # GitLab search dorks
│ ├── shodan.yaml # Shodan IoT dorks
│ ├── censys.yaml # Censys dorks
│ ├── zoomeye.yaml # ZoomEye dorks
│ ├── fofa.yaml # FOFA dorks
│ ├── google.yaml # Google dorking queries
│ ├── bing.yaml # Bing dorking queries
│ └── generic.yaml # Multi-source keyword dorks
├── configs/ # Example config files
└── docs/
Data Flow
Input Source -> Scanner Engine -> Provider Matcher -> (optional) Verifier -> Output Formatter + Notifier
-> SQLite DB (persist)
-> Web Dashboard (serve)
Provider YAML Schema
id: string                # Unique provider ID
name: string              # Display name
category: enum            # frontier | mid-tier | emerging | chinese | infrastructure | gateway | self-hosted
website: string           # API base URL
confidence: enum          # high | medium | low
patterns:
  - id: string            # Unique pattern ID
    name: string          # Human-readable name
    regex: string         # Detection regex
    confidence: enum      # high | medium | low
    description: string   # Pattern description
keywords: []string        # Pre-filtering keywords (performance optimization)
verify:
  enabled: bool
  method: string          # HTTP method
  url: string             # Verification endpoint
  headers: map            # Headers with {{key}} template
  success_codes: []int
  failure_codes: []int
  extract:                # Additional info extraction on success
    - field: string
      path: string        # JSON path
metadata:
  docs: string            # API docs URL
  key_url: string         # Key management URL
  env_vars: []string      # Common environment variable names
  revoke_url: string      # Key revocation URL
CLI Command Structure
Core Commands
# Scanning
keyhunter scan path <dir>
keyhunter scan file <file>
keyhunter scan git <repo> [--since=<duration>]
keyhunter scan stdin
keyhunter scan url <url>
keyhunter scan clipboard
# Verification
keyhunter verify <key>
keyhunter verify --file <keyfile>
# External Tool Import
keyhunter import trufflehog <json>
keyhunter import gitleaks <json>
keyhunter import generic --format=csv <file>
# OSINT/Recon — IoT & Internet Scanners
keyhunter recon shodan [--query|--dork]
keyhunter recon censys [--query]
keyhunter recon zoomeye [--query]
keyhunter recon fofa [--query]
keyhunter recon netlas [--query]
keyhunter recon binaryedge [--query]
# OSINT/Recon — Code Hosting & Snippets
keyhunter recon github [--dork=auto|custom]
keyhunter recon gitlab [--dork=auto|custom]
keyhunter recon gist [--query]
keyhunter recon bitbucket [--query|--workspace]
keyhunter recon codeberg [--query]
keyhunter recon gitea [--instances-from=shodan|file]
keyhunter recon replit [--query]
keyhunter recon codesandbox [--query]
keyhunter recon stackblitz [--query]
keyhunter recon codepen [--query]
keyhunter recon jsfiddle [--query]
keyhunter recon glitch [--query]
keyhunter recon huggingface [--query|--spaces|--repos]
keyhunter recon kaggle [--query|--notebooks]
keyhunter recon jupyter [--query]
keyhunter recon observable [--query]
# OSINT/Recon — Search Engine Dorking
keyhunter recon google [--dork=auto|custom]
keyhunter recon bing [--dork=auto|custom]
keyhunter recon duckduckgo [--query]
keyhunter recon yandex [--query]
keyhunter recon brave [--query]
# OSINT/Recon — Paste Sites
keyhunter recon paste [--sources=pastebin,dpaste,paste.ee,rentry,hastebin,ix.io,all]
# OSINT/Recon — Package Registries
keyhunter recon npm [--query|--recent]
keyhunter recon pypi [--query|--recent]
keyhunter recon rubygems [--query]
keyhunter recon crates [--query]
keyhunter recon maven [--query]
keyhunter recon nuget [--query]
keyhunter recon packagist [--query]
keyhunter recon goproxy [--query]
# OSINT/Recon — Container & Infrastructure
keyhunter recon docker [--query|--image|--layers]
keyhunter recon kubernetes [--shodan|--github]
keyhunter recon terraform [--github|--registry]
keyhunter recon helm [--query]
keyhunter recon ansible [--query]
# OSINT/Recon — Cloud Storage
keyhunter recon s3 [--wordlist|--domain]
keyhunter recon gcs [--wordlist|--domain]
keyhunter recon azure [--wordlist|--domain]
keyhunter recon spaces [--wordlist]
keyhunter recon minio [--shodan]
keyhunter recon grayhat [--query] # GrayHatWarfare bucket search
# OSINT/Recon — CI/CD Logs
keyhunter recon travis [--org|--repo]
keyhunter recon circleci [--org|--repo]
keyhunter recon ghactions [--org|--repo]
keyhunter recon jenkins [--shodan|--url]
keyhunter recon gitlabci [--project]
# OSINT/Recon — Web Archives
keyhunter recon wayback [--domain|--url]
keyhunter recon commoncrawl [--domain|--pattern]
# OSINT/Recon — Forums & Documentation
keyhunter recon stackoverflow [--query]
keyhunter recon reddit [--query|--subreddit]
keyhunter recon hackernews [--query]
keyhunter recon devto [--query|--tag]
keyhunter recon medium [--query]
keyhunter recon telegram-groups [--channel|--query]
# OSINT/Recon — Collaboration Tools
keyhunter recon notion [--query] # Google dorking
keyhunter recon confluence [--shodan|--url]
keyhunter recon trello [--query]
keyhunter recon googledocs [--query] # Google dorking
# OSINT/Recon — Frontend & JS Leaks
keyhunter recon sourcemaps [--domain|--url]
keyhunter recon webpack [--domain|--url]
keyhunter recon dotenv [--domain-list|--url] # Exposed .env files
keyhunter recon swagger [--shodan|--domain]
keyhunter recon deploys [--domain] # Vercel/Netlify previews
# OSINT/Recon — Log Aggregators
keyhunter recon elasticsearch [--shodan|--url]
keyhunter recon grafana [--shodan|--url]
keyhunter recon sentry [--shodan|--url]
# OSINT/Recon — Threat Intelligence
keyhunter recon virustotal [--query]
keyhunter recon intelx [--query]
keyhunter recon urlhaus [--query]
# OSINT/Recon — Mobile Apps
keyhunter recon apk [--package|--query|--file]
# OSINT/Recon — DNS/Subdomain
keyhunter recon crtsh [--domain]
keyhunter recon subdomain [--domain] [--probe-configs]
# OSINT/Recon — API Marketplaces
keyhunter recon postman [--query|--workspace]
keyhunter recon swaggerhub [--query]
# OSINT/Recon — Full Sweep
keyhunter recon full [--providers] [--categories=all|code|cloud|forums|cicd|...]
# Dork Management
keyhunter dorks list [--source]
keyhunter dorks add <source> <query>
keyhunter dorks run <source> [--category]
keyhunter dorks export
# Key Management (full key access)
keyhunter keys list [--unmask] [--provider=X] [--status=active|revoked]
keyhunter keys show <id>
keyhunter keys export --format=json|csv
keyhunter keys copy <id>
keyhunter keys verify <id>
keyhunter keys delete <id>
# Provider Management
keyhunter providers list [--category]
keyhunter providers info <id>
keyhunter providers stats
# Web Dashboard & Telegram
keyhunter serve [--port] [--telegram]
# Scheduled Scanning
keyhunter schedule add --name --cron --command --notify
keyhunter schedule list
keyhunter schedule remove <name>
# Config & Hooks
keyhunter config init
keyhunter config set <key> <value>
keyhunter hook install
keyhunter hook uninstall
Scan Flags
--providers=<list> Filter by provider IDs
--category=<cat> Filter by provider category
--confidence=<level> Minimum confidence level
--exclude=<patterns> Exclude file patterns
--verify Enable active key verification
--verify-timeout=<dur> Verification timeout (default: 10s)
--workers=<n> Parallel workers (default: CPU count)
--output=<format> Output format: table|json|sarif|csv
--unmask Show full API keys without masking (default: masked)
--notify=<channel> Send results to: telegram|webhook|slack
--stealth Stealth mode: UA rotation, increased delays
--respect-robots Respect robots.txt (default: true)
Exit Codes
- 0: Clean, no keys found
- 1: Keys found
- 2: Error
Dork YAML Schema
source: string            # github | gitlab | shodan | censys
dorks:
  - id: string
    query: string         # Search query
    description: string
    providers: []string   # Optional: related provider IDs
Built-in dork categories: GitHub (code search, filename, language), GitLab (snippets, projects), Shodan (exposed proxies, dashboards), Censys (HTTP body search).
Web Dashboard
Stack: Go embed + htmx + Tailwind CSS (zero JS framework dependency)
Pages:
- / - Dashboard overview with summary statistics
- /scans - Scan history list
- /scans/:id - Scan detail with found keys
- /keys - All found keys (filterable table)
- /keys/:id - Key detail (provider, confidence, verify status)
- /recon - OSINT scan launcher and results
- /providers - Provider list and statistics
- /dorks - Dork management
- /settings - Configuration (tokens, API keys)
- /api/v1/* - REST API for programmatic access
Storage: SQLite (embedded, AES-256 encrypted)
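The spec states AES-256 but not the cipher mode or whether encryption happens per value or for the whole database file. One possible reading, sketched here, is per-value encryption with AES-256-GCM from the standard library. The GCM mode, the nonce-prefixed layout and the function names are all assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD and insists on a 32-byte key,
// which is what selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns nonce || ciphertext || tag for the given plaintext.
func encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext to its first argument, so the nonce
	// ends up as a prefix of the stored value.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt reverses encrypt and fails if the value was tampered with.
func decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("ciphertext too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32) // in practice derived from user configuration
	if _, err := rand.Read(key); err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	sealed, err := encrypt(key, []byte("exk_abcdEFGH12345678"))
	if err != nil {
		fmt.Println("encrypt failed:", err)
		return
	}
	plain, err := decrypt(key, sealed)
	fmt.Println(string(plain), err)
}
```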
Telegram Bot
Commands:
- /scan <url/path> - Remote scan trigger
- /verify <key> - Key verification
- /recon github <dork> - GitHub dork execution
- /status - Active scan status
- /stats - General statistics
- /subscribe - Auto-notification on new key findings
- /unsubscribe - Disable notifications
- /providers - Provider list
- /help - Help
Auto-notifications: New key found, recon complete, scheduled scan results, verify results.
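Before dispatching a bot command, the incoming message text has to be split into a command name and its arguments. The sketch below shows one way to do that. `parseCommand` is a hypothetical helper, and it works on the raw message string only, without assuming any particular Telegram client library.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCommand splits a chat message such as "/recon github <dork>" into a
// lower-cased command name and its arguments. ok is false when the text is
// not a command.
func parseCommand(text string) (name string, args []string, ok bool) {
	fields := strings.Fields(text)
	if len(fields) == 0 || !strings.HasPrefix(fields[0], "/") {
		return "", nil, false
	}
	name = strings.TrimPrefix(fields[0], "/")
	// In group chats a command may carry an "@botname" suffix; drop it.
	if i := strings.IndexByte(name, '@'); i >= 0 {
		name = name[:i]
	}
	if name == "" {
		return "", nil, false
	}
	return strings.ToLower(name), fields[1:], true
}

func main() {
	for _, msg := range []string{"/verify some-key", "/recon github filename:.env", "hello"} {
		name, args, ok := parseCommand(msg)
		fmt.Printf("%q -> name=%q args=%q ok=%v\n", msg, name, args, ok)
	}
}
```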
LLM Provider Coverage (108 Providers)
Tier 1 — Frontier (12)
OpenAI, Anthropic, Google AI (Gemini), Google Vertex AI, AWS Bedrock, Azure OpenAI, Meta AI (Llama API), xAI (Grok), Cohere, Mistral AI, Inflection AI, AI21 Labs
Tier 2 — Inference Platforms (14)
Together AI, Fireworks AI, Groq, Replicate, Anyscale, DeepInfra, Lepton AI, Modal, Baseten, Cerebrium, NovitaAI, Sambanova, OctoAI, Friendli AI
Tier 3 — Specialized/Vertical (12)
Perplexity, You.com, Voyage AI, Jina AI, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability AI, Runway ML, Midjourney, HuggingFace
Tier 4 — Chinese/Regional (16)
DeepSeek, Baichuan, Zhipu AI (GLM), Moonshot AI (Kimi), Yi (01.AI), Qwen (Alibaba Cloud), Baidu (ERNIE/Wenxin), ByteDance (Doubao), SenseTime, iFlytek (Spark), MiniMax, Stepfun, 360 AI, Kuaishou (Kling), Tencent Hunyuan, SiliconFlow
Tier 5 — Infrastructure/Gateway (11)
Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, AI Gateway (Kong), BricksAI, Aether, Not Diamond
Tier 6 — Emerging/Niche (15)
Reka AI, Aleph Alpha, Writer, Jasper AI, Typeface, Comet ML, Weights & Biases, LangSmith (LangChain), Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon AI, Lamini
Tier 7 — Code & Dev Tools (10)
GitHub Copilot, Cursor, Tabnine, Codeium/Windsurf, Sourcegraph Cody, Amazon CodeWhisperer, Replit AI, Codestral (Mistral), IBM watsonx.ai, Oracle AI
Tier 8 — Self-Hosted/Open Infra (10)
Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-generation-webui, TensorRT-LLM, Triton Inference Server, Jan AI
Tier 9 — Enterprise/Legacy (8)
Salesforce Einstein, ServiceNow AI, SAP AI Core, Palantir AIP, Databricks (DBRX), Snowflake Cortex, Oracle Generative AI, HPE GreenLake AI
Performance
- Worker pool: parallel scanning (default: CPU count, configurable via --workers=N)
- Keyword pre-filtering before regex (10x speedup on large files)
- mmap for large file reading
- Delta-based git scanning (only changed files between commits)
- Source-based rate limiting in recon module
Key Visibility & Access
Full (unmasked) API keys are accessible through multiple channels:
- CLI --unmask flag - keyhunter scan path . --unmask shows full keys in terminal table
- JSON/CSV/SARIF export - Always contains full keys: keyhunter scan path . --output=json
- keyhunter keys command - Dedicated key management:
  - keyhunter keys list - all found keys (masked by default)
  - keyhunter keys list --unmask - all found keys (full)
  - keyhunter keys show <id> - single key full detail (always unmasked)
  - keyhunter keys export --format=json - export all keys with full values
  - keyhunter keys copy <id> - copy full key to clipboard
  - keyhunter keys verify <id> - verify and show full detail
- Web Dashboard - /keys/:id detail page with "Reveal Key" toggle button (auth required)
- Telegram Bot - /key <id> returns full key detail in private chat
- SQLite DB - Full keys always stored (encrypted), queryable via API
Default behavior: masked in terminal for shoulder-surfing protection.
When you need the real key (to test, verify, or report): --unmask, JSON export, or keys show.
Security
- Key masking in terminal output by default (first 8 + last 4 chars, middle ***)
- --unmask flag to reveal full keys when needed
- SQLite database AES-256 encrypted (full keys stored encrypted)
- Telegram/Shodan tokens encrypted in config
- No key values written to logs during --verify
- Optional basic auth / token auth for web dashboard
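The masking rule for terminal output (first 8 and last 4 characters kept, middle replaced by `***`) can be sketched as below. `maskKey` is a hypothetical name. Fully masking keys of 12 characters or fewer is an assumption, since the spec does not cover short keys, and the function indexes bytes on the assumption that keys are ASCII.

```go
package main

import "fmt"

// maskKey keeps the first 8 and last 4 characters of a key and replaces
// the middle with "***". Keys too short to hide anything that way are
// masked completely (an assumption; the spec does not cover this case).
func maskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return "***"
	}
	return key[:head] + "***" + key[len(key)-tail:]
}

func main() {
	fmt.Println(maskKey("exk_abcdEFGH12345678")) // made-up key format
	fmt.Println(maskKey("short"))
}
```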
Rate Limiting & Ethics
- GitHub API: 30 req/min (auth), 10 req/min (unauth)
- Shodan/Censys: respect API plan limits
- Paste sites: 1 req/2sec politeness delay
- --stealth flag: UA rotation, increased spacing
- --respect-robots: robots.txt compliance (default: on)
Error Handling
- Verify timeout: 10s default, configurable
- Network errors: 3 retries with exponential backoff
- Partial results: failed sources don't block others
- Graceful degradation on all external dependencies