# KeyHunter

## What This Is

KeyHunter is a comprehensive, modular API key scanner built in Go, focused on detecting and validating API keys from 108+ LLM/AI providers. It combines native scanning with external tool integration (TruffleHog, Gitleaks), OSINT/recon across 80+ internet sources, a web dashboard, and Telegram bot notifications. It is designed for red teams, DevSecOps, bug bounty hunters, and security researchers.

## Core Value

Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.

## Requirements

### Validated

(None yet; ship to validate)

### Active

- [ ] Core scanning engine (regex + entropy + keyword pre-filtering)
- [ ] 108 provider YAML definitions with patterns, keywords, verify endpoints
- [ ] Plugin-based architecture: providers as YAML, compile-time embedded
- [ ] Multiple input sources: file, dir, git history, stdin, URL, clipboard
- [ ] Active key verification via `--verify` flag (off by default)
- [ ] Full key access: `--unmask`, JSON export, `keys show`, web dashboard, Telegram
- [ ] CLI with Cobra: scan, verify, import, recon, keys, serve, dorks, providers, config, hook, schedule
- [ ] TruffleHog & Gitleaks JSON import adapters
- [ ] OSINT/Recon engine: 80+ sources across 18 categories
- [ ] IoT scanners: Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge
- [ ] Code hosting: GitHub, GitLab, Bitbucket, Codeberg, Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, etc.
- [ ] Search engine dorking: Google, Bing, DuckDuckGo, Yandex, Brave
- [ ] Paste site aggregator: Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.
- [ ] Package registries: npm, PyPI, RubyGems, crates.io, Maven, NuGet, Packagist, Go proxy
- [ ] Container/infra: Docker Hub layers, K8s configs, Terraform state, Helm, Ansible
- [ ] Cloud storage: S3, GCS, Azure Blob, DO Spaces, MinIO, GrayHatWarfare
- [ ] CI/CD logs: Travis, CircleCI, GitHub Actions, Jenkins, GitLab CI
- [ ] Web archives: Wayback Machine, CommonCrawl
- [ ] Forums: StackOverflow, Reddit, HackerNews, dev.to, Medium, Telegram groups, Discord
- [ ] Collaboration: Notion, Confluence, Trello, Google Docs
- [ ] Frontend/JS: Source maps, webpack bundles, exposed .env, Swagger, deploy previews
- [ ] Log aggregators: Elasticsearch/Kibana, Grafana, Sentry
- [ ] Threat intel: VirusTotal, IntelX, URLhaus
- [ ] Mobile: APK decompile scanning
- [ ] DNS/Subdomain: crt.sh, config endpoint probing
- [ ] API marketplaces: Postman, SwaggerHub
- [ ] Built-in dork engine: 150+ dorks in YAML (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, etc.)
- [ ] Web dashboard: htmx + Tailwind + SQLite, scans/keys/recon/providers/dorks/settings pages
- [ ] Telegram bot: /scan, /verify, /recon, /status, /stats, /subscribe, /key
- [ ] Scheduled scanning: cron-based recurring scans with auto-notify
- [ ] Pre-commit hook & CI/CD integration (SARIF output)
- [ ] Output formats: table (colored), JSON, SARIF, CSV
- [ ] SQLite storage with AES-256 encryption
- [ ] Worker pool parallelism, keyword pre-filtering, mmap, delta-based git scanning
- [ ] Rate limiting per source, stealth mode, robots.txt respect

### Out of Scope

- GUI desktop app: CLI + web dashboard is sufficient
- Real-time streaming API: batch scanning is the primary mode
- Key rotation/remediation: KeyHunter finds keys, doesn't manage them
- Paid SaaS version: open-source tool only
- Windows native: Linux/macOS primary, Windows via WSL/Docker

## Context

- AI-related credential leaks grew 81% YoY in 2025 (GitGuardian report)
- 28M credentials leaked on GitHub in 2025
- Best existing tools (TruffleHog, Gitleaks) cover at most ~15 LLM providers
- No dedicated tool covers 100+ LLM providers with detection + verification + OSINT
- LiteLLM supports 107 providers, so our 108-provider list covers the market comprehensively
- High-confidence key prefixes exist for: OpenAI (`sk-proj-`), Anthropic (`sk-ant-api03-`), HuggingFace (`hf_`), Groq (`gsk_`), Replicate (`r8_`), Perplexity (`pplx-`), Fireworks (`fw_`), Google AI (`AIza`)
- Many providers (Mistral, Cohere, Together AI, Chinese providers) use generic keys with no distinctive prefix, so keyword-based detection is needed

## Constraints

- **Language**: Go 1.22+ (single binary distribution, performance, TruffleHog/Gitleaks ecosystem alignment)
- **Architecture**: Plugin-based (providers as YAML files, compile-time embedded via Go embed)
- **Storage**: SQLite (zero-dependency embedded database, AES-256 encrypted)
- **Web stack**: htmx + Tailwind CSS (no JS framework dependency, embedded in binary)
- **CLI framework**: Cobra (industry standard for Go CLIs)
- **Verification**: Must be opt-in via the `--verify` flag; passive scanning by default for legal safety
- **Key masking**: Masked output by default, `--unmask` for full keys (shoulder-surfing protection)

## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Go over Python/Rust | Single binary, performance, ecosystem alignment with TruffleHog/Gitleaks | Pending |
| Plugin architecture (YAML providers) | Community extensibility; providers are added as YAML, with no Go code changes | Pending |
| Compile-time embed over runtime plugins | Single binary advantage, no external dependency loading | Pending |
| SQLite over PostgreSQL | Zero dependency, embedded, sufficient for local tool | Pending |
| htmx over React/Vue | Minimal JS, embedded in binary, server-rendered simplicity | Pending |
| Keyword pre-filtering before regex | 10x performance improvement on large codebases (TruffleHog's approach) | Pending |
| YAML dorks alongside YAML providers | Consistent extensibility pattern; the community can add dorks the same way | Pending |
| Configurable verification (`--verify`) | Legal safety: passive scanning by default, active only when explicitly requested | Pending |

## Evolution

This document evolves at phase transitions and milestone boundaries.

**After each phase transition:**

1. Requirements invalidated? -> Move to Out of Scope with reason
2. Requirements validated? -> Move to Validated with phase reference
3. New requirements emerged? -> Add to Active
4. Decisions to log? -> Add to Key Decisions
5. "What This Is" still accurate? -> Update if drifted

**After each milestone:**

1. Full review of all sections
2. Core Value check: still the right priority?
3. Audit Out of Scope: reasons still valid?
4. Update Context with current state

---
*Last updated: 2026-04-04 after initialization*