KeyHunter
What This Is
KeyHunter is a comprehensive, modular API key scanner built in Go, focused on detecting and validating API keys from 108+ LLM/AI providers. It combines native scanning with external tool integration (TruffleHog, Gitleaks), OSINT/recon across 80+ internet sources, a web dashboard, and Telegram bot notifications. Designed for red teams, DevSecOps, bug bounty hunters, and security researchers.
Core Value
Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
Requirements
Validated
(None yet — ship to validate)
Active
- Core scanning engine (regex + entropy + keyword pre-filtering)
- 108 provider YAML definitions with patterns, keywords, verify endpoints
- Plugin-based architecture — providers as YAML, compile-time embedded
- Multiple input sources: file, dir, git history, stdin, URL, clipboard
- Active key verification via `--verify` flag (off by default)
- Full key access: `--unmask`, JSON export, `keys show`, web dashboard, Telegram
- CLI with Cobra: scan, verify, import, recon, keys, serve, dorks, providers, config, hook, schedule
- TruffleHog & Gitleaks JSON import adapters
- OSINT/Recon engine: 80+ sources across 18 categories
  - IoT scanners: Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge
  - Code hosting: GitHub, GitLab, Bitbucket, Codeberg, Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, etc.
  - Search engine dorking: Google, Bing, DuckDuckGo, Yandex, Brave
  - Paste site aggregator: Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.
  - Package registries: npm, PyPI, RubyGems, crates.io, Maven, NuGet, Packagist, Go proxy
  - Container/infra: Docker Hub layers, K8s configs, Terraform state, Helm, Ansible
  - Cloud storage: S3, GCS, Azure Blob, DO Spaces, MinIO, GrayHatWarfare
  - CI/CD logs: Travis, CircleCI, GitHub Actions, Jenkins, GitLab CI
  - Web archives: Wayback Machine, CommonCrawl
  - Forums: StackOverflow, Reddit, HackerNews, dev.to, Medium, Telegram groups, Discord
  - Collaboration: Notion, Confluence, Trello, Google Docs
  - Frontend/JS: Source maps, webpack bundles, exposed .env, Swagger, deploy previews
  - Log aggregators: Elasticsearch/Kibana, Grafana, Sentry
  - Threat intel: VirusTotal, IntelX, URLhaus
  - Mobile: APK decompile scanning
  - DNS/Subdomain: crt.sh, config endpoint probing
  - API marketplaces: Postman, SwaggerHub
- Built-in dork engine: 150+ dorks in YAML (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, etc.)
- Web dashboard: htmx + Tailwind + SQLite, scans/keys/recon/providers/dorks/settings pages
- Telegram bot: /scan, /verify, /recon, /status, /stats, /subscribe, /key
- Scheduled scanning: cron-based recurring scans with auto-notify
- Pre-commit hook & CI/CD integration (SARIF output)
- Output formats: table (colored), JSON, SARIF, CSV
- SQLite storage with AES-256 encryption
- Worker pool parallelism, keyword pre-filtering, mmap, delta-based git scanning
- Rate limiting per source, stealth mode, robots.txt respect
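The core scanning engine listed above (keyword pre-filtering, then regex, then an entropy check) could look roughly like the sketch below. This is a minimal illustration, not KeyHunter's actual code: the `Detector` and `Finding` types, the `demoDetectors` helper, the example pattern, and the 3.0 bits-per-character threshold are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

// Detector is a hypothetical, simplified stand-in for one provider definition.
type Detector struct {
	Provider   string
	Keywords   []string       // cheap substrings checked before the regex runs
	Pattern    *regexp.Regexp // candidate-key pattern
	MinEntropy float64        // bits per character; illustrative threshold
}

// Finding is one candidate key located in the input.
type Finding struct {
	Provider string
	Line     int // 1-based line number
	Match    string
}

// shannonEntropy returns the Shannon entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// containsAny reports whether the lowercased line contains any keyword.
func containsAny(lowerLine string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(lowerLine, strings.ToLower(k)) {
			return true
		}
	}
	return false
}

// scan runs the three stages line by line: keyword pre-filter,
// regex match, then an entropy check to drop low-randomness placeholders.
func scan(text string, detectors []Detector) []Finding {
	var findings []Finding
	for i, line := range strings.Split(text, "\n") {
		lower := strings.ToLower(line)
		for _, d := range detectors {
			if !containsAny(lower, d.Keywords) {
				continue // skip the regex entirely when no keyword is present
			}
			for _, m := range d.Pattern.FindAllString(line, -1) {
				if shannonEntropy(m) >= d.MinEntropy {
					findings = append(findings, Finding{Provider: d.Provider, Line: i + 1, Match: m})
				}
			}
		}
	}
	return findings
}

// demoDetectors returns one illustrative detector (pattern is an assumption).
func demoDetectors() []Detector {
	return []Detector{{
		Provider:   "HuggingFace",
		Keywords:   []string{"hf_"},
		Pattern:    regexp.MustCompile(`hf_[A-Za-z0-9]{20,}`),
		MinEntropy: 3.0,
	}}
}

func main() {
	input := "token = hf_aB3dE5fG7hJ9kL1mN2pQ4\nplaceholder = hf_aaaaaaaaaaaaaaaaaaaaaaaa\nunrelated line"
	for _, f := range scan(input, demoDetectors()) {
		fmt.Printf("%s candidate on line %d (%d chars)\n", f.Provider, f.Line, len(f.Match))
	}
}
```

Checking keywords first means the regex only runs on lines that could plausibly contain a key, and the entropy stage filters out repetitive placeholder values that match the pattern.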
Out of Scope
- GUI desktop app — CLI + web dashboard is sufficient
- Real-time streaming API — batch scanning is the primary mode
- Key rotation/remediation — KeyHunter finds keys, doesn't manage them
- Paid SaaS version — open-source tool only
- Windows native — Linux/macOS primary, Windows via WSL/Docker
Context
- AI-related credential leaks grew 81% YoY in 2025 (GitGuardian report)
- 28M credentials leaked on GitHub in 2025
- Best existing tools (TruffleHog, Gitleaks) cover at most ~15 LLM providers
- No dedicated tool covers 100+ LLM providers with detection + verification + OSINT
- LiteLLM supports 107 providers — our 108-provider list covers the market comprehensively
- High-confidence key prefixes exist for: OpenAI (sk-proj-), Anthropic (sk-ant-api03-), HuggingFace (hf_), Groq (gsk_), Replicate (r8_), Perplexity (pplx-), Fireworks (fw_), Google AI (AIza)
- Many providers (Mistral, Cohere, Together AI, Chinese providers) use generic keys — keyword-based detection needed
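The high-confidence prefixes listed above lend themselves to a simple lookup table. The following is a sketch only: the prefixes come from the list, while the `prefixRule` type and `classify` function are hypothetical names, and a real detector would also validate key length and character set. A token that matches no prefix is the case where keyword-based detection would take over.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixRule pairs a key prefix with its provider (hypothetical type).
type prefixRule struct {
	Prefix   string
	Provider string
}

// prefixRules holds the high-confidence prefixes named in the Context list.
var prefixRules = []prefixRule{
	{"sk-proj-", "OpenAI"},
	{"sk-ant-api03-", "Anthropic"},
	{"hf_", "HuggingFace"},
	{"gsk_", "Groq"},
	{"r8_", "Replicate"},
	{"pplx-", "Perplexity"},
	{"fw_", "Fireworks"},
	{"AIza", "Google AI"},
}

// classify returns the provider whose prefix matches token.
// If several prefixes match, the longest one wins.
// ok is false when no prefix matches, i.e. the token looks generic.
func classify(token string) (provider string, ok bool) {
	best := -1
	for i, r := range prefixRules {
		if !strings.HasPrefix(token, r.Prefix) {
			continue
		}
		if best < 0 || len(r.Prefix) > len(prefixRules[best].Prefix) {
			best = i
		}
	}
	if best < 0 {
		return "", false
	}
	return prefixRules[best].Provider, true
}

func main() {
	for _, tok := range []string{"sk-proj-EXAMPLE", "gsk_EXAMPLE", "0123456789abcdef"} {
		if p, ok := classify(tok); ok {
			fmt.Printf("%s -> %s\n", tok, p)
		} else {
			fmt.Printf("%s -> no prefix match, needs keyword context\n", tok)
		}
	}
}
```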
Constraints
- Language: Go 1.22+ — single binary distribution, performance, TruffleHog/Gitleaks ecosystem alignment
- Architecture: Plugin-based — providers as YAML files, compile-time embedded via Go embed
- Storage: SQLite — zero-dependency embedded database, AES-256 encrypted
- Web stack: htmx + Tailwind CSS — no JS framework dependency, embedded in binary
- CLI framework: Cobra — industry standard for Go CLIs
- Verification: Must be opt-in (`--verify` flag) — passive scanning by default for legal safety
- Key masking: Default masked output, `--unmask` for full keys — shoulder surfing protection
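As an illustration of the masking and opt-in verification constraints, here is a minimal sketch. It uses the standard library `flag` package instead of Cobra to stay self-contained, and the `maskKey` function, the 8/4 visible-character split, and the `...` separator are assumptions rather than KeyHunter's actual output format.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// Number of characters left visible at each end (illustrative values).
const (
	visibleHead = 8
	visibleTail = 4
)

// maskKey returns key unchanged when unmask is true. Otherwise it keeps
// the first visibleHead and last visibleTail characters and hides the
// middle. Keys too short to mask safely are replaced entirely by '*'.
func maskKey(key string, unmask bool) string {
	if unmask {
		return key
	}
	r := []rune(key)
	if len(r) <= visibleHead+visibleTail {
		return strings.Repeat("*", len(r))
	}
	return string(r[:visibleHead]) + "..." + string(r[len(r)-visibleTail:])
}

func main() {
	// Both flags default to false: masked output and passive scanning.
	unmask := flag.Bool("unmask", false, "print full keys instead of masked ones")
	verify := flag.Bool("verify", false, "actively verify keys against provider endpoints")
	flag.Parse()

	fmt.Println(maskKey("sk-proj-abcdefghijklmnop", *unmask))
	if !*verify {
		fmt.Println("verification skipped (passive scan)")
	}
}
```

Defaulting both flags to false keeps the safe behavior as the path of least resistance: a user has to ask explicitly for full keys or for network calls to provider endpoints.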
Key Decisions
| Decision | Rationale | Outcome |
|---|---|---|
| Go over Python/Rust | Single binary, performance, ecosystem alignment with TruffleHog/Gitleaks | — Pending |
| Plugin architecture (YAML providers) | Community extensibility, easy to add providers without writing Go code (a rebuild re-embeds the YAML) | — Pending |
| Compile-time embed over runtime plugins | Single binary advantage, no external dependency loading | — Pending |
| SQLite over PostgreSQL | Zero dependency, embedded, sufficient for local tool | — Pending |
| htmx over React/Vue | Minimal JS, embedded in binary, server-rendered simplicity | — Pending |
| Keyword pre-filtering before regex | 10x performance improvement on large codebases (TruffleHog's approach) | — Pending |
| YAML dorks alongside YAML providers | Consistent extensibility pattern, community can add dorks same way | — Pending |
| Configurable verification (--verify) | Legal safety — passive scanning by default, active only when explicitly requested | — Pending |
Evolution
This document evolves at phase transitions and milestone boundaries.
After each phase transition:
- Requirements invalidated? -> Move to Out of Scope with reason
- Requirements validated? -> Move to Validated with phase reference
- New requirements emerged? -> Add to Active
- Decisions to log? -> Add to Key Decisions
- "What This Is" still accurate? -> Update if drifted
After each milestone:
- Full review of all sections
- Core Value check — still the right priority?
- Audit Out of Scope — reasons still valid?
- Update Context with current state
Last updated: 2026-04-04 after initialization