# Requirements: KeyHunter

**Defined:** 2026-04-04
**Core Value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.

## v1 Requirements

Requirements for the initial release. Each maps to a roadmap phase.

### Core Engine

- [x] **CORE-01**: Scanner engine detects API keys using a keyword pre-filtering + regex matching pipeline
- [x] **CORE-02**: Provider definitions loaded from YAML files embedded at compile time via Go `embed`
- [x] **CORE-03**: Provider registry manages 108+ provider definitions with pattern, keyword, confidence, and verify metadata
- [x] **CORE-04**: Entropy analysis as secondary signal for low-confidence providers (generic key formats)
- [x] **CORE-05**: Worker pool parallelism with configurable worker count (default: CPU count)
- [x] **CORE-06**: Aho-Corasick keyword pre-filter runs before regex for 10x performance on large files
- [x] **CORE-07**: mmap-based large file reading for memory efficiency

### Providers

- [x] **PROV-01**: 12 Tier 1 Frontier provider YAML definitions (OpenAI, Anthropic, Google AI, Vertex, AWS Bedrock, Azure OpenAI, Meta AI, xAI, Cohere, Mistral, Inflection, AI21)
- [x] **PROV-02**: 14 Tier 2 Inference Platform provider definitions (Together, Fireworks, Groq, Replicate, Anyscale, DeepInfra, Lepton, Modal, Baseten, Cerebrium, NovitaAI, Sambanova, OctoAI, Friendli)
- [x] **PROV-03**: 12 Tier 3 Specialized provider definitions (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney, HuggingFace)
- [x] **PROV-04**: 16 Tier 4 Chinese/Regional provider definitions (DeepSeek, Baichuan, Zhipu, Moonshot, Yi, Qwen, Baidu, ByteDance, SenseTime, iFlytek, MiniMax, Stepfun, 360 AI, Kuaishou, Tencent, SiliconFlow)
- [x] **PROV-05**: 11 Tier 5 Infrastructure/Gateway provider definitions (Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, Kong, BricksAI, Aether, Not Diamond)
- [x] **PROV-06**: 15 Tier 6 Emerging/Niche provider definitions (Reka, Aleph Alpha, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon, Lamini)
- [x] **PROV-07**: 10 Tier 7 Code/Dev Tools provider definitions (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
- [x] **PROV-08**: 10 Tier 8 Self-Hosted provider definitions (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan AI)
- [x] **PROV-09**: 8 Tier 9 Enterprise provider definitions (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
- [x] **PROV-10**: Provider YAML schema includes `format_version` and `last_verified` date for pattern health tracking

### Input Sources

- [x] **INPUT-01**: File and directory scanning with recursive traversal and glob exclusion patterns
- [x] **INPUT-02**: Git-aware scanning (full history, branches, stash, delta-based diffs)
- [ ] **INPUT-03**: Git scanning supports the `--since` flag for time-scoped history scans
- [ ] **INPUT-04**: stdin/pipe input support (`cat file | keyhunter scan stdin`)
- [ ] **INPUT-05**: URL fetching to scan content from any remote URL
- [x] **INPUT-06**: Clipboard content scanning

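The glob exclusion in INPUT-01 might look roughly like the following. This is a sketch under assumptions: the helper name `excluded` is hypothetical, paths are treated as slash-separated, and each pattern is tried against both the full path and the base name, which may differ from how KeyHunter actually interprets `--exclude`.

```go
package main

import (
	"fmt"
	"path"
)

// excluded reports whether a slash-separated path matches any exclusion glob.
// Each glob is tried against the full path and against the base name.
// Malformed globs are ignored rather than treated as matches.
func excluded(p string, globs []string) bool {
	base := path.Base(p)
	for _, g := range globs {
		for _, candidate := range []string{p, base} {
			if ok, err := path.Match(g, candidate); err == nil && ok {
				return true
			}
		}
	}
	return false
}

func main() {
	globs := []string{"*.min.js", "vendor/*"}
	for _, p := range []string{"src/app.min.js", "vendor/lib.go", "src/main.go"} {
		fmt.Println(p, excluded(p, globs))
	}
}
```

Note that `path.Match` does not let `*` cross a `/`, so a pattern such as `vendor/*` only covers direct children of `vendor`.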
### Verification

- [x] **VRFY-01**: Active key verification via lightweight API calls when the `--verify` flag is set
- [x] **VRFY-02**: Verification is opt-in only (off by default) with a consent prompt on first use
- [x] **VRFY-03**: Each provider YAML defines verify endpoint, method, headers, success/failure codes
- [x] **VRFY-04**: Verification extracts additional metadata (org, rate limit, permissions) when available
- [x] **VRFY-05**: Configurable verification timeout (default 10s, `--verify-timeout` flag)
- [x] **VRFY-06**: Legal disclaimer and documentation ship with the verification feature

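VRFY-03 and VRFY-05 together suggest a data-driven verifier. The sketch below assumes a hypothetical `verifySpec` struct and a `{{KEY}}` header placeholder, neither of which is confirmed by the provider schema; it runs against a local `httptest` server so no real provider is contacted.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// verifySpec is a hypothetical in-memory form of the verify block in a provider YAML.
type verifySpec struct {
	Method       string
	URL          string
	Headers      map[string]string // values may contain the assumed {{KEY}} placeholder
	SuccessCodes []int
	FailureCodes []int
}

// classify maps an HTTP status to "live", "dead", or "unknown".
func classify(status int, spec verifySpec) string {
	for _, c := range spec.SuccessCodes {
		if c == status {
			return "live"
		}
	}
	for _, c := range spec.FailureCodes {
		if c == status {
			return "dead"
		}
	}
	return "unknown"
}

// verify sends one request with the key substituted into the headers
// and gives up after timeout.
func verify(ctx context.Context, spec verifySpec, key string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, spec.Method, spec.URL, nil)
	if err != nil {
		return "", err
	}
	for name, value := range spec.Headers {
		req.Header.Set(name, strings.ReplaceAll(value, "{{KEY}}", key))
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return classify(resp.StatusCode, spec), nil
}

func main() {
	// Local stand-in for a provider API: accepts exactly one bearer token.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "Bearer good-key" {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusUnauthorized)
	}))
	defer srv.Close()

	spec := verifySpec{
		Method:       http.MethodGet,
		URL:          srv.URL,
		Headers:      map[string]string{"Authorization": "Bearer {{KEY}}"},
		SuccessCodes: []int{200},
		FailureCodes: []int{401, 403},
	}
	for _, key := range []string{"good-key", "bad-key"} {
		result, err := verify(context.Background(), spec, key, 10*time.Second)
		fmt.Println(key, result, err)
	}
}
```

Keeping an explicit "unknown" outcome avoids reporting a key as dead when the provider merely returned a rate-limit or server error.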
### Output & Reporting

- [x] **OUT-01**: Colored terminal table output (default)
- [x] **OUT-02**: JSON output format
- [x] **OUT-03**: SARIF output format (CI/CD compatible)
- [x] **OUT-04**: CSV output format
- [x] **OUT-05**: Key masking by default (first 8 + last 4 chars) with `--unmask` flag for full keys
- [x] **OUT-06**: Exit codes: 0 = clean, 1 = keys found, 2 = error

### Key Management

- [x] **KEYS-01**: `keyhunter keys list` shows all found keys (masked by default, `--unmask` for full values)
- [x] **KEYS-02**: `keyhunter keys show <id>` shows full detail for a single key (always unmasked)
- [x] **KEYS-03**: `keyhunter keys export --format=json|csv` exports all keys with full values
- [x] **KEYS-04**: `keyhunter keys copy <id>` copies the full key to the clipboard
- [x] **KEYS-05**: `keyhunter keys verify <id>` verifies a specific key and shows full detail
- [x] **KEYS-06**: `keyhunter keys delete <id>` removes the key from the database

### External Tool Import

- [ ] **IMP-01**: TruffleHog JSON output parser and importer
- [ ] **IMP-02**: Gitleaks JSON output parser and importer
- [x] **IMP-03**: Generic CSV import for custom tool output

### Storage

- [ ] **STOR-01**: SQLite database for persisting scan results, keys, recon history
- [ ] **STOR-02**: Application-level AES-256 encryption for stored keys and sensitive config
- [ ] **STOR-03**: Encryption key derived from user passphrase via Argon2

### CLI

- [x] **CLI-01**: Cobra-based CLI with commands: `scan`, `verify`, `import`, `recon`, `keys`, `serve`, `dorks`, `providers`, `config`, `hook`, `schedule`
- [x] **CLI-02**: `keyhunter config init` creates `~/.keyhunter.yaml`
- [x] **CLI-03**: `keyhunter config set <key> <value>` for all configuration
- [x] **CLI-04**: `keyhunter providers list/info/stats` for provider management
- [x] **CLI-05**: Scan flags: `--providers`, `--category`, `--confidence`, `--exclude`, `--verify`, `--workers`, `--output`, `--unmask`, `--notify`

### CI/CD Integration

- [x] **CICD-01**: `keyhunter hook install/uninstall` for git pre-commit hooks
- [x] **CICD-02**: SARIF output uploadable to GitHub Security tab

### OSINT/Recon — IoT & Internet Scanners

- [x] **RECON-IOT-01**: Shodan API search and dorking
- [x] **RECON-IOT-02**: Censys API search
- [x] **RECON-IOT-03**: ZoomEye API search
- [x] **RECON-IOT-04**: FOFA API search
- [x] **RECON-IOT-05**: Netlas API search
- [x] **RECON-IOT-06**: BinaryEdge API search

### OSINT/Recon — Code Hosting & Snippets

- [x] **RECON-CODE-01**: GitHub code search with automated dork execution
- [ ] **RECON-CODE-02**: GitLab code search with dork execution
- [ ] **RECON-CODE-03**: GitHub Gist search
- [ ] **RECON-CODE-04**: Bitbucket code search
- [ ] **RECON-CODE-05**: Codeberg/Gitea search (Gitea auto-discovered via Shodan)
- [x] **RECON-CODE-06**: Replit public repl scanning
- [x] **RECON-CODE-07**: CodeSandbox project scanning
- [ ] **RECON-CODE-08**: HuggingFace Spaces and repos scanning
- [ ] **RECON-CODE-09**: Kaggle notebook scanning
- [x] **RECON-CODE-10**: CodePen, JSFiddle, StackBlitz, Glitch, Observable, Gitpod scanning

### OSINT/Recon — Search Engine Dorking

- [x] **RECON-DORK-01**: Google dorking via Custom Search API / SerpAPI with 100+ built-in dorks
- [x] **RECON-DORK-02**: Bing dorking via Azure Cognitive Services
- [x] **RECON-DORK-03**: DuckDuckGo, Yandex, Brave search integration

### OSINT/Recon — Paste Sites

- [x] **RECON-PASTE-01**: Multi-paste aggregator (Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.)

### OSINT/Recon — Package Registries

- [ ] **RECON-PKG-01**: npm registry package scanning (download + extract + grep)
- [ ] **RECON-PKG-02**: PyPI package scanning
- [ ] **RECON-PKG-03**: RubyGems, crates.io, Maven, NuGet, Packagist, Go proxy scanning

### OSINT/Recon — Container & Infrastructure

- [ ] **RECON-INFRA-01**: Docker Hub image layer scanning and build arg extraction
- [ ] **RECON-INFRA-02**: Kubernetes exposed dashboards and public Secret/ConfigMap discovery
- [ ] **RECON-INFRA-03**: Terraform state file and registry module scanning
- [ ] **RECON-INFRA-04**: Helm chart and Ansible Galaxy scanning

### OSINT/Recon — Cloud Storage

- [x] **RECON-CLOUD-01**: AWS S3 bucket enumeration and content scanning
- [x] **RECON-CLOUD-02**: GCS, Azure Blob, DigitalOcean Spaces, Backblaze B2 scanning
- [x] **RECON-CLOUD-03**: Self-hosted MinIO instance discovery via Shodan
- [x] **RECON-CLOUD-04**: GrayHatWarfare bucket search engine integration

### OSINT/Recon — CI/CD Logs

- [ ] **RECON-CI-01**: GitHub Actions workflow log scanning
- [ ] **RECON-CI-02**: Travis CI and CircleCI public build log scanning
- [ ] **RECON-CI-03**: Exposed Jenkins instance discovery and console output scanning
- [ ] **RECON-CI-04**: GitLab CI/CD pipeline trace scanning

### OSINT/Recon — Web Archives

- [ ] **RECON-ARCH-01**: Wayback Machine CDX API historical snapshot scanning
- [ ] **RECON-ARCH-02**: CommonCrawl index and WARC record scanning

### OSINT/Recon — Forums & Documentation

- [ ] **RECON-FORUM-01**: Stack Overflow / Stack Exchange API search
- [ ] **RECON-FORUM-02**: Reddit subreddit search
- [ ] **RECON-FORUM-03**: Hacker News Algolia API search
- [ ] **RECON-FORUM-04**: dev.to and Medium article scanning
- [ ] **RECON-FORUM-05**: Telegram public channel scanning
- [ ] **RECON-FORUM-06**: Discord indexed content search

### OSINT/Recon — Collaboration Tools

- [ ] **RECON-COLLAB-01**: Notion public page scanning (via Google dorking)
- [ ] **RECON-COLLAB-02**: Confluence exposed instance scanning
- [ ] **RECON-COLLAB-03**: Trello public board scanning
- [ ] **RECON-COLLAB-04**: Google Docs/Sheets public document scanning

### OSINT/Recon — Frontend & JS Leaks

- [ ] **RECON-JS-01**: JavaScript source map extraction and scanning
- [ ] **RECON-JS-02**: Webpack/Vite bundle scanning for inlined env vars
- [ ] **RECON-JS-03**: Exposed `.env` file scanning on web servers
- [ ] **RECON-JS-04**: Exposed Swagger/OpenAPI documentation scanning
- [ ] **RECON-JS-05**: Vercel/Netlify deploy preview JS bundle scanning

### OSINT/Recon — Log Aggregators

- [ ] **RECON-LOG-01**: Exposed Elasticsearch/Kibana instance scanning
- [ ] **RECON-LOG-02**: Exposed Grafana dashboard scanning
- [ ] **RECON-LOG-03**: Exposed Sentry instance scanning

### OSINT/Recon — Threat Intelligence

- [ ] **RECON-INTEL-01**: VirusTotal file and URL search
- [ ] **RECON-INTEL-02**: Intelligence X aggregated search
- [ ] **RECON-INTEL-03**: URLhaus search

### OSINT/Recon — Mobile & DNS

- [ ] **RECON-MOBILE-01**: APK download, decompile, and scanning
- [ ] **RECON-DNS-01**: crt.sh Certificate Transparency log subdomain discovery
- [ ] **RECON-DNS-02**: Subdomain config endpoint probing (`.env`, `/api/config`, `/actuator/env`)

### OSINT/Recon — API Marketplaces

- [ ] **RECON-API-01**: Postman public collections and workspaces scanning
- [ ] **RECON-API-02**: SwaggerHub published API scanning

### OSINT/Recon — Infrastructure

- [x] **RECON-INFRA-05**: Per-source rate limiter with configurable limits
- [x] **RECON-INFRA-06**: Stealth mode (`--stealth`) with UA rotation and increased delays
- [x] **RECON-INFRA-07**: robots.txt respect (`--respect-robots`, default on)
- [x] **RECON-INFRA-08**: Recon full command for a parallel sweep across all sources with deduplication

### Dork Engine

- [x] **DORK-01**: YAML-based dork definitions (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, Bing)
- [x] **DORK-02**: 150+ built-in dorks across all sources
- [x] **DORK-03**: `keyhunter dorks list/add/run/export` commands
- [x] **DORK-04**: Category-filtered dork execution (`--category=frontier`)

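Category-filtered execution (DORK-04) reduces to selecting a subset of the loaded definitions before running them. In this sketch the `dork` struct, its field names, and the sample queries are all hypothetical; the actual YAML schema is not shown in this document.

```go
package main

import "fmt"

// dork is a hypothetical in-memory form of one YAML dork definition.
type dork struct {
	Source   string // e.g. "github", "google"
	Category string // e.g. "frontier"
	Query    string
}

// filterDorks returns the dorks matching source and category.
// An empty string acts as a wildcard for that field.
func filterDorks(all []dork, source, category string) []dork {
	var out []dork
	for _, d := range all {
		if source != "" && d.Source != source {
			continue
		}
		if category != "" && d.Category != category {
			continue
		}
		out = append(out, d)
	}
	return out
}

func main() {
	all := []dork{
		{Source: "github", Category: "frontier", Query: `"sk-" filename:.env`},
		{Source: "google", Category: "frontier", Query: `intext:"sk-" filetype:env`},
		{Source: "github", Category: "inference", Query: `"gsk_" filename:config`},
	}
	for _, d := range filterDorks(all, "", "frontier") {
		fmt.Println(d.Source, d.Query)
	}
}
```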
### Web Dashboard

- [ ] **WEB-01**: Embedded HTTP server (chi + htmx + Tailwind CSS)
- [ ] **WEB-02**: Dashboard overview page with summary statistics
- [ ] **WEB-03**: Scan history and scan detail pages
- [ ] **WEB-04**: Key listing page with filtering and "Reveal Key" toggle
- [ ] **WEB-05**: OSINT/Recon launcher and results page
- [ ] **WEB-06**: Provider listing and statistics page
- [ ] **WEB-07**: Dork management page
- [ ] **WEB-08**: Settings configuration page
- [ ] **WEB-09**: REST API (`/api/v1/*`) for programmatic access
- [ ] **WEB-10**: Optional basic auth / token auth
- [ ] **WEB-11**: Server-Sent Events for live scan progress

### Telegram Bot

- [ ] **TELE-01**: `/scan` command for triggering a remote scan
- [ ] **TELE-02**: `/verify` command for key verification
- [ ] **TELE-03**: `/recon` command for dork execution
- [ ] **TELE-04**: `/status`, `/stats`, `/providers`, `/help` commands
- [ ] **TELE-05**: `/subscribe` and `/unsubscribe` for auto-notifications
- [ ] **TELE-06**: `/key <id>` command for full key detail in private chat
- [ ] **TELE-07**: Auto-notification on new key findings

### Scheduled Scanning

- [ ] **SCHED-01**: Cron-based recurring scan scheduling
- [ ] **SCHED-02**: `keyhunter schedule add/list/remove` commands
- [ ] **SCHED-03**: Auto-notify on scheduled scan completion

## v2 Requirements
### Advanced Detection

- **ADV-01**: BPE tokenization-based detection (Betterleaks approach, 98.6% recall)
- **ADV-02**: ML/LLM-based key detection for zero-pattern providers
- **ADV-03**: Custom provider YAML hot-reload without recompile (external dir)

### Additional Integrations

- **INT-01**: Slack notification module
- **INT-02**: Webhook notification module
- **INT-03**: JIRA ticket creation on key findings
- **INT-04**: PagerDuty alert integration

### Advanced OSINT

- **OSINT-01**: Dark web / breach database search (Dehashed, HIBP correlation)
- **OSINT-02**: IPA (iOS) app decompile and scanning
- **OSINT-03**: Backblaze B2 deep scanning
- **OSINT-04**: Rapid7 Open Data integration

## Out of Scope

| Feature | Reason |
|---------|--------|
| GUI desktop app | CLI + web dashboard covers all use cases |
| Key rotation/remediation | KeyHunter detects keys but does not manage them; remediation is a separate concern |
| Automatic key invalidation | Legal exposure, not our responsibility |
| SaaS hosted version | Open-source tool only, no infrastructure to maintain |
| Telemetry/analytics | Privacy-first tool, no phone-home |
| Windows native binary | Linux/macOS primary, Windows via WSL/Docker |
| Real-time streaming API | Batch scanning is primary mode |
| regexp2/PCRE patterns | Catastrophic backtracking risk; Go stdlib `regexp` (RE2) only |

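The RE2-only constraint in the table has a practical consequence for provider patterns: Go's standard `regexp` package rejects backtracking constructs such as backreferences and lookahead at compile time. A minimal sketch, with invented example patterns:

```go
package main

import (
	"fmt"
	"regexp"
)

// supported reports whether expr compiles under Go's RE2-based regexp package.
func supported(expr string) bool {
	_, err := regexp.Compile(expr)
	return err == nil
}

func main() {
	patterns := []string{
		`sk-[A-Za-z0-9]{20,}`, // plain character classes and counts: fine
		`(\w+)-\1`,            // backreference: rejected by RE2
		`key(?=[0-9])`,        // lookahead: rejected by RE2
	}
	for _, p := range patterns {
		fmt.Printf("%-22s supported=%v\n", p, supported(p))
	}
}
```

A pattern that fails this check would have to be rewritten without the unsupported construct before it could ship in a provider definition.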
## Traceability

| Requirement | Phase | Status |
|-------------|-------|--------|
| CORE-01, CORE-02, CORE-03, CORE-04, CORE-05, CORE-06, CORE-07 | Phase 1 | Pending |
| STOR-01, STOR-02, STOR-03 | Phase 1 | Pending |
| CLI-01, CLI-02, CLI-03, CLI-04, CLI-05 | Phase 1 | Pending |
| PROV-10 | Phase 1 | Complete |
| PROV-01, PROV-02 | Phase 2 | Pending |
| PROV-03, PROV-04, PROV-05, PROV-06, PROV-07, PROV-08, PROV-09 | Phase 3 | Pending |
| INPUT-01, INPUT-02, INPUT-03, INPUT-04, INPUT-05, INPUT-06 | Phase 4 | Pending |
| VRFY-01, VRFY-02, VRFY-03, VRFY-04, VRFY-05, VRFY-06 | Phase 5 | Pending |
| OUT-01, OUT-02, OUT-03, OUT-04, OUT-05, OUT-06 | Phase 6 | Pending |
| KEYS-01, KEYS-02, KEYS-03, KEYS-04, KEYS-05, KEYS-06 | Phase 6 | Pending |
| IMP-01, IMP-02, IMP-03 | Phase 7 | Pending |
| CICD-01, CICD-02 | Phase 7 | Pending |
| DORK-01, DORK-02, DORK-03, DORK-04 | Phase 8 | Pending |
| RECON-INFRA-05, RECON-INFRA-06, RECON-INFRA-07, RECON-INFRA-08 | Phase 9 | Pending |
| RECON-CODE-01, RECON-CODE-02, RECON-CODE-03, RECON-CODE-04, RECON-CODE-05 | Phase 10 | Pending |
| RECON-CODE-06, RECON-CODE-07, RECON-CODE-08, RECON-CODE-09, RECON-CODE-10 | Phase 10 | Pending |
| RECON-DORK-01, RECON-DORK-02, RECON-DORK-03 | Phase 11 | Pending |
| RECON-PASTE-01 | Phase 11 | Complete |
| RECON-IOT-01, RECON-IOT-02, RECON-IOT-03, RECON-IOT-04, RECON-IOT-05, RECON-IOT-06 | Phase 12 | Pending |
| RECON-CLOUD-01, RECON-CLOUD-02, RECON-CLOUD-03, RECON-CLOUD-04 | Phase 12 | Pending |
| RECON-PKG-01, RECON-PKG-02, RECON-PKG-03 | Phase 13 | Pending |
| RECON-INFRA-01, RECON-INFRA-02, RECON-INFRA-03, RECON-INFRA-04 | Phase 13 | Pending |
| RECON-CI-01, RECON-CI-02, RECON-CI-03, RECON-CI-04 | Phase 14 | Pending |
| RECON-ARCH-01, RECON-ARCH-02 | Phase 14 | Pending |
| RECON-JS-01, RECON-JS-02, RECON-JS-03, RECON-JS-04, RECON-JS-05 | Phase 14 | Pending |
| RECON-FORUM-01, RECON-FORUM-02, RECON-FORUM-03, RECON-FORUM-04, RECON-FORUM-05, RECON-FORUM-06 | Phase 15 | Pending |
| RECON-COLLAB-01, RECON-COLLAB-02, RECON-COLLAB-03, RECON-COLLAB-04 | Phase 15 | Pending |
| RECON-LOG-01, RECON-LOG-02, RECON-LOG-03 | Phase 15 | Pending |
| RECON-INTEL-01, RECON-INTEL-02, RECON-INTEL-03 | Phase 16 | Pending |
| RECON-MOBILE-01 | Phase 16 | Pending |
| RECON-DNS-01, RECON-DNS-02 | Phase 16 | Pending |
| RECON-API-01, RECON-API-02 | Phase 16 | Pending |
| TELE-01, TELE-02, TELE-03, TELE-04, TELE-05, TELE-06, TELE-07 | Phase 17 | Pending |
| SCHED-01, SCHED-02, SCHED-03 | Phase 17 | Pending |
| WEB-01, WEB-02, WEB-03, WEB-04, WEB-05, WEB-06, WEB-07, WEB-08, WEB-09, WEB-10, WEB-11 | Phase 18 | Pending |

**Coverage:**

- v1 requirements: 146 total (counted from this file; the figure of 120 in PROJECT.md was an estimate made before counting)
- Mapped to phases: 146
- Unmapped: 0

---

*Requirements defined: 2026-04-04*
*Last updated: 2026-04-04 after roadmap creation (18 phases)*