From c8e744cb48a66dbe1a3f50a5f5beee649b8bf01d Mon Sep 17 00:00:00 2001
From: salvacybersec
Date: Sat, 4 Apr 2026 18:54:39 +0300
Subject: [PATCH] docs: initialize project

---
 .planning/PROJECT.md | 114 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 .planning/PROJECT.md

diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md
new file mode 100644
index 0000000..4fcbdfa
--- /dev/null
+++ b/.planning/PROJECT.md
@@ -0,0 +1,114 @@
+# KeyHunter
+
+## What This Is
+
+KeyHunter is a comprehensive, modular API key scanner built in Go, focused on detecting and validating API keys from 108+ LLM/AI providers. It combines native scanning with external tool integration (TruffleHog, Gitleaks), OSINT/recon across 80+ internet sources, a web dashboard, and Telegram bot notifications. Designed for red teams, DevSecOps, bug bounty hunters, and security researchers.
+
+## Core Value
+
+Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
+
+## Requirements
+
+### Validated
+
+(None yet — ship to validate)
+
+### Active
+
+- [ ] Core scanning engine (regex + entropy + keyword pre-filtering)
+- [ ] 108 provider YAML definitions with patterns, keywords, verify endpoints
+- [ ] Plugin-based architecture — providers as YAML, compile-time embedded
+- [ ] Multiple input sources: file, dir, git history, stdin, URL, clipboard
+- [ ] Active key verification via `--verify` flag (off by default)
+- [ ] Full key access: `--unmask`, JSON export, `keys show`, web dashboard, Telegram
+- [ ] CLI with Cobra: scan, verify, import, recon, keys, serve, dorks, providers, config, hook, schedule
+- [ ] TruffleHog & Gitleaks JSON import adapters
+- [ ] OSINT/Recon engine: 80+ sources across 18 categories
+- [ ] IoT scanners: Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge
+- [ ] Code hosting: GitHub, GitLab, Bitbucket, Codeberg, Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, etc.
+- [ ] Search engine dorking: Google, Bing, DuckDuckGo, Yandex, Brave
+- [ ] Paste site aggregator: Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.
+- [ ] Package registries: npm, PyPI, RubyGems, crates.io, Maven, NuGet, Packagist, Go proxy
+- [ ] Container/infra: Docker Hub layers, K8s configs, Terraform state, Helm, Ansible
+- [ ] Cloud storage: S3, GCS, Azure Blob, DO Spaces, MinIO, GrayHatWarfare
+- [ ] CI/CD logs: Travis, CircleCI, GitHub Actions, Jenkins, GitLab CI
+- [ ] Web archives: Wayback Machine, CommonCrawl
+- [ ] Forums: StackOverflow, Reddit, HackerNews, dev.to, Medium, Telegram groups, Discord
+- [ ] Collaboration: Notion, Confluence, Trello, Google Docs
+- [ ] Frontend/JS: Source maps, webpack bundles, exposed .env, Swagger, deploy previews
+- [ ] Log aggregators: Elasticsearch/Kibana, Grafana, Sentry
+- [ ] Threat intel: VirusTotal, IntelX, URLhaus
+- [ ] Mobile: APK decompile scanning
+- [ ] DNS/Subdomain: crt.sh, config endpoint probing
+- [ ] API marketplaces: Postman, SwaggerHub
+- [ ] Built-in dork engine: 150+ dorks in YAML (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, etc.)
+- [ ] Web dashboard: htmx + Tailwind + SQLite, scans/keys/recon/providers/dorks/settings pages
+- [ ] Telegram bot: /scan, /verify, /recon, /status, /stats, /subscribe, /key
+- [ ] Scheduled scanning: cron-based recurring scans with auto-notify
+- [ ] Pre-commit hook & CI/CD integration (SARIF output)
+- [ ] Output formats: table (colored), JSON, SARIF, CSV
+- [ ] SQLite storage with AES-256 encryption
+- [ ] Worker pool parallelism, keyword pre-filtering, mmap, delta-based git scanning
+- [ ] Rate limiting per source, stealth mode, robots.txt respect
+
+### Out of Scope
+
+- GUI desktop app — CLI + web dashboard is sufficient
+- Real-time streaming API — batch scanning is the primary mode
+- Key rotation/remediation — KeyHunter finds keys, doesn't manage them
+- Paid SaaS version — open-source tool only
+- Windows native — Linux/macOS primary, Windows via WSL/Docker
+
+## Context
+
+- AI-related credential leaks grew 81% YoY in 2025 (GitGuardian report)
+- 28M credentials leaked on GitHub in 2025
+- Best existing tools (TruffleHog, Gitleaks) cover at most ~15 LLM providers
+- No dedicated tool covers 100+ LLM providers with detection + verification + OSINT
+- LiteLLM supports 107 providers — our 108-provider list covers the market comprehensively
+- High-confidence key prefixes exist for: OpenAI (sk-proj-), Anthropic (sk-ant-api03-), HuggingFace (hf_), Groq (gsk_), Replicate (r8_), Perplexity (pplx-), Fireworks (fw_), Google AI (AIza)
+- Many providers (Mistral, Cohere, Together AI, Chinese providers) use generic keys — keyword-based detection needed
+
+## Constraints
+
+- **Language**: Go 1.22+ — single binary distribution, performance, TruffleHog/Gitleaks ecosystem alignment
+- **Architecture**: Plugin-based — providers as YAML files, compile-time embedded via Go embed
+- **Storage**: SQLite — zero-dependency embedded database, AES-256 encrypted
+- **Web stack**: htmx + Tailwind CSS — no JS framework dependency, embedded in binary
+- **CLI framework**: Cobra — industry standard for Go CLIs
+- **Verification**: Must be opt-in (`--verify` flag) — passive scanning by default for legal safety
+- **Key masking**: Default masked output, `--unmask` for full keys — shoulder surfing protection
+
+## Key Decisions
+
+| Decision | Rationale | Outcome |
+|----------|-----------|---------|
+| Go over Python/Rust | Single binary, performance, ecosystem alignment with TruffleHog/Gitleaks | — Pending |
+| Plugin architecture (YAML providers) | Community extensibility, easy to add providers without writing Go code | — Pending |
+| Compile-time embed over runtime plugins | Single binary advantage, no external dependency loading | — Pending |
+| SQLite over PostgreSQL | Zero dependency, embedded, sufficient for local tool | — Pending |
+| htmx over React/Vue | Minimal JS, embedded in binary, server-rendered simplicity | — Pending |
+| Keyword pre-filtering before regex | 10x performance improvement on large codebases (TruffleHog's approach) | — Pending |
+| YAML dorks alongside YAML providers | Consistent extensibility pattern, community can add dorks same way | — Pending |
+| Configurable verification (--verify) | Legal safety — passive scanning by default, active only when explicitly requested | — Pending |
+
+## Evolution
+
+This document evolves at phase transitions and milestone boundaries.
+
+**After each phase transition:**
+1. Requirements invalidated? -> Move to Out of Scope with reason
+2. Requirements validated? -> Move to Validated with phase reference
+3. New requirements emerged? -> Add to Active
+4. Decisions to log? -> Add to Key Decisions
+5. "What This Is" still accurate? -> Update if drifted
+
+**After each milestone:**
+1. Full review of all sections
+2. Core Value check — still the right priority?
+3. Audit Out of Scope — reasons still valid?
+4. Update Context with current state
+
+---
+*Last updated: 2026-04-04 after initialization*