docs: initialize project
.planning/PROJECT.md
# KeyHunter

## What This Is

KeyHunter is a comprehensive, modular API key scanner built in Go, focused on detecting and validating API keys from 108+ LLM/AI providers. It combines native scanning with external tool integration (TruffleHog, Gitleaks), OSINT/recon across 80+ internet sources, a web dashboard, and Telegram bot notifications. Designed for red teams, DevSecOps, bug bounty hunters, and security researchers.

## Core Value

Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.

## Requirements

### Validated

(None yet — ship to validate)

### Active

- [ ] Core scanning engine (regex + entropy + keyword pre-filtering)
- [ ] 108 provider YAML definitions with patterns, keywords, verify endpoints
- [ ] Plugin-based architecture — providers as YAML, compile-time embedded
- [ ] Multiple input sources: file, dir, git history, stdin, URL, clipboard
- [ ] Active key verification via `--verify` flag (off by default)
- [ ] Full key access: `--unmask`, JSON export, `keys show`, web dashboard, Telegram
- [ ] CLI with Cobra: scan, verify, import, recon, keys, serve, dorks, providers, config, hook, schedule
- [ ] TruffleHog & Gitleaks JSON import adapters
- [ ] OSINT/Recon engine: 80+ sources across 18 categories
- [ ] IoT scanners: Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge
- [ ] Code hosting: GitHub, GitLab, Bitbucket, Codeberg, Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, etc.
- [ ] Search engine dorking: Google, Bing, DuckDuckGo, Yandex, Brave
- [ ] Paste site aggregator: Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.
- [ ] Package registries: npm, PyPI, RubyGems, crates.io, Maven, NuGet, Packagist, Go proxy
- [ ] Container/infra: Docker Hub layers, K8s configs, Terraform state, Helm, Ansible
- [ ] Cloud storage: S3, GCS, Azure Blob, DO Spaces, MinIO, GrayHatWarfare
- [ ] CI/CD logs: Travis, CircleCI, GitHub Actions, Jenkins, GitLab CI
- [ ] Web archives: Wayback Machine, CommonCrawl
- [ ] Forums: StackOverflow, Reddit, HackerNews, dev.to, Medium, Telegram groups, Discord
- [ ] Collaboration: Notion, Confluence, Trello, Google Docs
- [ ] Frontend/JS: Source maps, webpack bundles, exposed .env, Swagger, deploy previews
- [ ] Log aggregators: Elasticsearch/Kibana, Grafana, Sentry
- [ ] Threat intel: VirusTotal, IntelX, URLhaus
- [ ] Mobile: APK decompile scanning
- [ ] DNS/Subdomain: crt.sh, config endpoint probing
- [ ] API marketplaces: Postman, SwaggerHub
- [ ] Built-in dork engine: 150+ dorks in YAML (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, etc.)
- [ ] Web dashboard: htmx + Tailwind + SQLite, scans/keys/recon/providers/dorks/settings pages
- [ ] Telegram bot: /scan, /verify, /recon, /status, /stats, /subscribe, /key
- [ ] Scheduled scanning: cron-based recurring scans with auto-notify
- [ ] Pre-commit hook & CI/CD integration (SARIF output)
- [ ] Output formats: table (colored), JSON, SARIF, CSV
- [ ] SQLite storage with AES-256 encryption
- [ ] Worker pool parallelism, keyword pre-filtering, mmap, delta-based git scanning
- [ ] Rate limiting per source, stealth mode, robots.txt respect
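
The first Active item pairs regex matching with an entropy check. A minimal sketch of how those two stages could fit together is shown here. The pattern `candidateRe`, the function names, and the 3.0 bits-per-character threshold are illustrative assumptions, not KeyHunter's actual rules.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// candidateRe is a hypothetical generic pattern: an "sk-" prefix followed by
// at least 20 URL-safe characters. Real provider patterns would differ.
var candidateRe = regexp.MustCompile(`sk-[A-Za-z0-9_-]{20,}`)

// shannonEntropy returns the Shannon entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	if total == 0 {
		return 0
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// findCandidates returns the regex matches in text whose entropy is at least
// minEntropy, which drops low-randomness placeholders.
func findCandidates(text string, minEntropy float64) []string {
	var out []string
	for _, m := range candidateRe.FindAllString(text, -1) {
		if shannonEntropy(m) >= minEntropy {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	text := `key = "sk-aB3dE5fG7hJ9kL1mN2pQ4rS6t"
placeholder = "sk-aaaaaaaaaaaaaaaaaaaaaaaa"`
	for _, c := range findCandidates(text, 3.0) {
		fmt.Println(c)
	}
}
```

Both strings match the pattern, but only the first has enough character variety to pass the entropy threshold. The threshold would need tuning per provider.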

### Out of Scope

- GUI desktop app — CLI + web dashboard is sufficient
- Real-time streaming API — batch scanning is the primary mode
- Key rotation/remediation — KeyHunter finds keys, doesn't manage them
- Paid SaaS version — open-source tool only
- Windows native — Linux/macOS primary, Windows via WSL/Docker

## Context

- AI-related credential leaks grew 81% YoY in 2025 (GitGuardian report)
- 28M credentials leaked on GitHub in 2025
- Best existing tools (TruffleHog, Gitleaks) cover at most ~15 LLM providers
- No dedicated tool covers 100+ LLM providers with detection + verification + OSINT
- LiteLLM supports 107 providers — our 108-provider list covers the market comprehensively
- High-confidence key prefixes exist for: OpenAI (`sk-proj-`), Anthropic (`sk-ant-api03-`), HuggingFace (`hf_`), Groq (`gsk_`), Replicate (`r8_`), Perplexity (`pplx-`), Fireworks (`fw_`), Google AI (`AIza`)
- Many providers (Mistral, Cohere, Together AI, Chinese providers) use generic keys — keyword-based detection needed
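
The high-confidence prefixes listed above lend themselves to a simple lookup. The sketch below is one possible shape for it: the prefix strings come from the list, while the `prefixes` table layout and the `identifyProvider` name are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixes maps the key prefixes named in the Context list to providers.
// The table layout is a hypothetical choice for this sketch.
var prefixes = []struct{ prefix, provider string }{
	{"sk-ant-api03-", "Anthropic"},
	{"sk-proj-", "OpenAI"},
	{"hf_", "HuggingFace"},
	{"gsk_", "Groq"},
	{"r8_", "Replicate"},
	{"pplx-", "Perplexity"},
	{"fw_", "Fireworks"},
	{"AIza", "Google AI"},
}

// identifyProvider returns the provider whose prefix starts token,
// or an empty string when no known prefix matches.
func identifyProvider(token string) string {
	for _, p := range prefixes {
		if strings.HasPrefix(token, p.prefix) {
			return p.provider
		}
	}
	return ""
}

func main() {
	for _, t := range []string{"hf_exampletoken", "gsk_exampletoken", "generic-key-123"} {
		fmt.Printf("%q -> %q\n", t, identifyProvider(t))
	}
}
```

A token with no known prefix returns an empty string, which is the case where keyword-based detection would have to take over.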

## Constraints

- **Language**: Go 1.22+ — single binary distribution, performance, TruffleHog/Gitleaks ecosystem alignment
- **Architecture**: Plugin-based — providers as YAML files, compile-time embedded via Go embed
- **Storage**: SQLite — zero-dependency embedded database, AES-256 encrypted
- **Web stack**: htmx + Tailwind CSS — no JS framework dependency, embedded in binary
- **CLI framework**: Cobra — industry standard for Go CLIs
- **Verification**: Must be opt-in (`--verify` flag) — passive scanning by default for legal safety
- **Key masking**: Default masked output, `--unmask` for full keys — shoulder surfing protection
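
The key-masking constraint could be met with a small helper like the one sketched here. The function name `maskKey`, the choice to keep six leading and four trailing characters, and the assumption that keys are ASCII are all illustrative; the document does not specify the masking format.

```go
package main

import (
	"fmt"
	"strings"
)

// maskKey hides the middle of key unless unmask is true.
// It assumes ASCII keys and keeps 6 leading and 4 trailing characters,
// which is a hypothetical format chosen for this sketch.
func maskKey(key string, unmask bool) string {
	const head, tail = 6, 4
	if unmask {
		return key
	}
	if len(key) <= head+tail {
		// Too short to reveal anything safely: mask everything.
		return strings.Repeat("*", len(key))
	}
	return key[:head] + strings.Repeat("*", len(key)-head-tail) + key[len(key)-tail:]
}

func main() {
	key := "sk-proj-abcdefghijkl"
	fmt.Println(maskKey(key, false)) // default: masked
	fmt.Println(maskKey(key, true))  // with --unmask: full key
}
```

Masking by default means the full key only appears when the user asks for it explicitly, matching the `--unmask` behaviour described above.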

## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Go over Python/Rust | Single binary, performance, ecosystem alignment with TruffleHog/Gitleaks | — Pending |
| Plugin architecture (YAML providers) | Community extensibility, easy to add providers without writing Go code | — Pending |
| Compile-time embed over runtime plugins | Single binary advantage, no external dependency loading | — Pending |
| SQLite over PostgreSQL | Zero dependency, embedded, sufficient for local tool | — Pending |
| htmx over React/Vue | Minimal JS, embedded in binary, server-rendered simplicity | — Pending |
| Keyword pre-filtering before regex | 10x performance improvement on large codebases (TruffleHog's approach) | — Pending |
| YAML dorks alongside YAML providers | Consistent extensibility pattern, community can add dorks same way | — Pending |
| Configurable verification (`--verify`) | Legal safety — passive scanning by default, active only when explicitly requested | — Pending |
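
The keyword pre-filtering decision in the table can be illustrated with a short sketch: a cheap substring check decides whether the more expensive regex runs at all. The `rule` type, the two example rules, and their patterns are hypothetical and stand in for real provider definitions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs cheap literal keywords with a more expensive regex.
// The field names and patterns are assumptions for this sketch.
type rule struct {
	name     string
	keywords []string
	pattern  *regexp.Regexp
}

var rules = []rule{
	{"huggingface", []string{"hf_"}, regexp.MustCompile(`hf_[A-Za-z0-9]{20,}`)},
	{"groq", []string{"gsk_"}, regexp.MustCompile(`gsk_[A-Za-z0-9]{20,}`)},
}

// containsAny reports whether s contains at least one of the keywords.
func containsAny(s string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(s, k) {
			return true
		}
	}
	return false
}

// scan runs a rule's regex only when one of its keywords appears in chunk.
func scan(chunk string) map[string][]string {
	found := make(map[string][]string)
	for _, r := range rules {
		if !containsAny(chunk, r.keywords) {
			continue // pre-filter: skip the regex entirely
		}
		if m := r.pattern.FindAllString(chunk, -1); len(m) > 0 {
			found[r.name] = m
		}
	}
	return found
}

func main() {
	hits := scan("export HF_TOKEN=hf_abcdefghijklmnopqrstuvwx")
	fmt.Println(hits["huggingface"])
}
```

Most chunks of a large codebase contain none of the keywords, so most regexes never run. Any actual speedup depends on the rule set and input.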

## Evolution

This document evolves at phase transitions and milestone boundaries.

**After each phase transition:**

1. Requirements invalidated? -> Move to Out of Scope with reason
2. Requirements validated? -> Move to Validated with phase reference
3. New requirements emerged? -> Add to Active
4. Decisions to log? -> Add to Key Decisions
5. "What This Is" still accurate? -> Update if drifted

**After each milestone:**

1. Full review of all sections
2. Core Value check — still the right priority?
3. Audit Out of Scope — reasons still valid?
4. Update Context with current state

---

*Last updated: 2026-04-04 after initialization*