Requirements: KeyHunter
Defined: 2026-04-04
Core Value: Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
v1 Requirements
Requirements for initial release. Each maps to roadmap phases.
Core Engine
- CORE-01: Scanner engine detects API keys using keyword pre-filtering + regex matching pipeline
- CORE-02: Provider definitions loaded from YAML files embedded at compile time via Go embed
- CORE-03: Provider registry manages 108+ provider definitions with pattern, keyword, confidence, and verify metadata
- CORE-04: Entropy analysis as secondary signal for low-confidence providers (generic key formats)
- CORE-05: Worker pool parallelism with configurable worker count (default: CPU count)
- CORE-06: Aho-Corasick keyword pre-filter runs before regex matching, targeting roughly a 10x speedup on large files
- CORE-07: mmap-based large file reading for memory efficiency
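
The pipeline described in CORE-01, CORE-04, and CORE-06 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `Provider`, `Finding`, `Scan`, and `ShannonEntropy` names are hypothetical, and a plain `strings.Contains` loop stands in for the Aho-Corasick automaton so the example stays within the standard library.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

// Provider is a hypothetical, trimmed-down provider definition.
type Provider struct {
	Name     string
	Keywords []string       // cheap substrings checked before the regex
	Pattern  *regexp.Regexp // RE2 pattern from Go's standard library
}

// Finding is one candidate key located in the input.
type Finding struct {
	Provider string
	Match    string
	Entropy  float64 // bits per byte, usable as a secondary signal
}

// ShannonEntropy returns the entropy of s in bits per byte.
func ShannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c > 0 {
			p := float64(c) / n
			h -= p * math.Log2(p)
		}
	}
	return h
}

// Scan runs the keyword pre-filter, then the regex, and attaches entropy.
// strings.Contains is a stand-in for the Aho-Corasick pre-filter.
func Scan(text string, providers []Provider) []Finding {
	var out []Finding
	for _, p := range providers {
		hit := false
		for _, kw := range p.Keywords {
			if strings.Contains(text, kw) {
				hit = true
				break
			}
		}
		if !hit {
			continue // no keyword present, so the regex is never run
		}
		for _, m := range p.Pattern.FindAllString(text, -1) {
			out = append(out, Finding{Provider: p.Name, Match: m, Entropy: ShannonEntropy(m)})
		}
	}
	return out
}

// exampleProviders returns a made-up provider used only for illustration.
func exampleProviders() []Provider {
	return []Provider{{
		Name:     "example",
		Keywords: []string{"ex-"},
		Pattern:  regexp.MustCompile(`ex-[A-Za-z0-9]{16}`),
	}}
}

func main() {
	for _, f := range Scan("token = ex-a1B2c3D4e5F6g7H8", exampleProviders()) {
		fmt.Printf("%s %s %.2f\n", f.Provider, f.Match, f.Entropy)
	}
}
```

The pre-filter matters because most files contain no provider keyword at all, so the comparatively expensive regex pass is skipped for them.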
Providers
- PROV-01: 12 Tier 1 Frontier provider YAML definitions (OpenAI, Anthropic, Google AI, Vertex, AWS Bedrock, Azure OpenAI, Meta AI, xAI, Cohere, Mistral, Inflection, AI21)
- PROV-02: 14 Tier 2 Inference Platform provider definitions (Together, Fireworks, Groq, Replicate, Anyscale, DeepInfra, Lepton, Modal, Baseten, Cerebrium, NovitaAI, Sambanova, OctoAI, Friendli)
- PROV-03: 12 Tier 3 Specialized provider definitions (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney, HuggingFace)
- PROV-04: 16 Tier 4 Chinese/Regional provider definitions (DeepSeek, Baichuan, Zhipu, Moonshot, Yi, Qwen, Baidu, ByteDance, SenseTime, iFlytek, MiniMax, Stepfun, 360 AI, Kuaishou, Tencent, SiliconFlow)
- PROV-05: 11 Tier 5 Infrastructure/Gateway provider definitions (Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, Kong, BricksAI, Aether, Not Diamond)
- PROV-06: 15 Tier 6 Emerging/Niche provider definitions (Reka, Aleph Alpha, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon, Lamini)
- PROV-07: 10 Tier 7 Code/Dev Tools provider definitions (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
- PROV-08: 10 Tier 8 Self-Hosted provider definitions (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan AI)
- PROV-09: 8 Tier 9 Enterprise provider definitions (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
- PROV-10: Provider YAML schema includes format_version and last_verified date for pattern health tracking
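
As an illustration of how the `format_version` and `last_verified` fields from PROV-10 could support pattern health tracking, here is a small sketch. The struct layout, field names, and the `IsStale` helper are assumptions made for this example; the actual YAML schema may differ.

```go
package main

import (
	"fmt"
	"time"
)

const dateLayout = "2006-01-02" // ISO calendar date

// ProviderDef mirrors the metadata named in CORE-03 and PROV-10.
// Field names are hypothetical; the real YAML keys may differ.
type ProviderDef struct {
	Name          string
	FormatVersion int
	LastVerified  string // ISO date, e.g. "2026-04-04"
	Keywords      []string
	Pattern       string
	Confidence    string // e.g. "high", "medium", "low"
}

// IsStale reports whether the pattern was last verified more than
// maxAgeDays before today. Unparseable dates count as stale so that
// the definition is flagged for review rather than silently trusted.
func (p ProviderDef) IsStale(today string, maxAgeDays int) bool {
	last, err := time.Parse(dateLayout, p.LastVerified)
	if err != nil {
		return true
	}
	now, err := time.Parse(dateLayout, today)
	if err != nil {
		return true
	}
	return now.Sub(last) > time.Duration(maxAgeDays)*24*time.Hour
}

func main() {
	p := ProviderDef{Name: "example", FormatVersion: 1, LastVerified: "2026-04-04"}
	fmt.Println(p.IsStale("2026-10-01", 90))
}
```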
Input Sources
- INPUT-01: File and directory scanning with recursive traversal and glob exclusion patterns
- INPUT-02: Git-aware scanning — full history, branches, stash, delta-based diffs
- INPUT-03: Git scanning supports --since flag for time-scoped history scan
- INPUT-04: stdin/pipe input support (cat file | keyhunter scan stdin)
- INPUT-05: URL fetching — scan content from any remote URL
- INPUT-06: Clipboard content scanning
Verification
- VRFY-01: Active key verification via lightweight API calls when --verify flag is set
- VRFY-02: Verification is opt-in only (off by default) with consent prompt on first use
- VRFY-03: Each provider YAML defines verify endpoint, method, headers, success/failure codes
- VRFY-04: Verification extracts additional metadata (org, rate limit, permissions) when available
- VRFY-05: Configurable verification timeout (default 10s, --verify-timeout flag)
- VRFY-06: Legal disclaimer and documentation ship with the verification feature
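
One way the per-provider verify metadata (VRFY-03) and the timeout (VRFY-05) might fit together is sketched below. The `VerifySpec` fields, the `Status` values, and the placeholder URL are assumptions for illustration only; no real provider endpoint is implied.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// VerifySpec is a hypothetical in-memory form of a provider's verify block.
type VerifySpec struct {
	Method       string
	URL          string
	Header       string // header that carries the key
	Prefix       string // e.g. "Bearer "
	SuccessCodes []int
	FailureCodes []int
}

// Status is the outcome of a verification attempt.
type Status string

const (
	Live    Status = "live"
	Dead    Status = "dead"
	Unknown Status = "unknown"
)

// Classify maps an HTTP status code to a verification outcome.
// Codes listed in neither set (for example rate limiting) stay Unknown.
func (s VerifySpec) Classify(code int) Status {
	for _, c := range s.SuccessCodes {
		if c == code {
			return Live
		}
	}
	for _, c := range s.FailureCodes {
		if c == code {
			return Dead
		}
	}
	return Unknown
}

// Verify sends one lightweight request bounded by timeout.
func Verify(ctx context.Context, c *http.Client, s VerifySpec, key string, timeout time.Duration) (Status, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, s.Method, s.URL, nil)
	if err != nil {
		return Unknown, err
	}
	req.Header.Set(s.Header, s.Prefix+key)
	resp, err := c.Do(req)
	if err != nil {
		return Unknown, err
	}
	defer resp.Body.Close()
	return s.Classify(resp.StatusCode), nil
}

func main() {
	spec := VerifySpec{
		Method:       http.MethodGet,
		URL:          "https://api.example.invalid/v1/models", // placeholder, not a real endpoint
		Header:       "Authorization",
		Prefix:       "Bearer ",
		SuccessCodes: []int{200},
		FailureCodes: []int{401, 403},
	}
	// Only the classification step runs here; Verify is not called,
	// so the example performs no network traffic.
	fmt.Println(spec.Classify(200), spec.Classify(401), spec.Classify(429))
}
```

Keeping an explicit `Unknown` outcome avoids reporting a key as dead when the provider merely throttled or errored.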
Output & Reporting
- OUT-01: Colored terminal table output (default)
- OUT-02: JSON output format
- OUT-03: SARIF output format (CI/CD compatible)
- OUT-04: CSV output format
- OUT-05: Key masking by default (first 8 + last 4 chars) with --unmask flag for full keys
- OUT-06: Exit codes: 0=clean, 1=keys found, 2=error
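
The masking rule in OUT-05 and the exit codes in OUT-06 could look roughly like this. The function names, the `...` separator, and the choice to fully mask short keys are assumptions of this sketch rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Exit codes as listed in OUT-06.
const (
	ExitClean     = 0
	ExitKeysFound = 1
	ExitError     = 2
)

// errScanFailed is a made-up error used only by the demo below.
var errScanFailed = errors.New("scan failed")

// MaskKey keeps the first 8 and last 4 characters and hides the rest.
// Keys too short to mask meaningfully are hidden entirely (an assumption).
// Byte slicing is used because API keys are expected to be ASCII.
func MaskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return strings.Repeat("*", len(key))
	}
	return key[:head] + "..." + key[len(key)-tail:]
}

// ExitCode picks the process exit code; an error takes precedence.
func ExitCode(found int, err error) int {
	switch {
	case err != nil:
		return ExitError
	case found > 0:
		return ExitKeysFound
	default:
		return ExitClean
	}
}

func main() {
	fmt.Println(MaskKey("sk-abcdefghijklmnopqrstuvwxyz"))
	// A real CLI would pass this value to os.Exit.
	fmt.Println(ExitCode(0, nil), ExitCode(2, nil), ExitCode(0, errScanFailed))
}
```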
Key Management
- KEYS-01: keyhunter keys list — show all found keys (masked by default, --unmask for full)
- KEYS-02: keyhunter keys show — single key full detail (always unmasked)
- KEYS-03: keyhunter keys export --format=json|csv — export all keys with full values
- KEYS-04: keyhunter keys copy — copy full key to clipboard
- KEYS-05: keyhunter keys verify — verify specific key and show full detail
- KEYS-06: keyhunter keys delete — remove key from database
External Tool Import
- IMP-01: TruffleHog JSON output parser and importer
- IMP-02: Gitleaks JSON output parser and importer
- IMP-03: Generic CSV import for custom tool output
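
For the generic CSV import in IMP-03, a minimal sketch might read a header row and map columns by name. The required column names (`provider`, `key`, `source`) and the `ImportedFinding` type are hypothetical; the actual import format is not specified here.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// ImportedFinding is a hypothetical normalized record from an external tool.
type ImportedFinding struct {
	Provider string
	Key      string
	Source   string
}

// ImportCSV reads a CSV whose header names at least the columns
// "provider", "key", and "source" (assumed names), in any order.
func ImportCSV(r io.Reader) ([]ImportedFinding, error) {
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if err != nil {
		return nil, fmt.Errorf("read header: %w", err)
	}
	col := map[string]int{}
	for i, h := range header {
		col[strings.ToLower(strings.TrimSpace(h))] = i
	}
	for _, name := range []string{"provider", "key", "source"} {
		if _, ok := col[name]; !ok {
			return nil, fmt.Errorf("missing column %q", name)
		}
	}
	var out []ImportedFinding
	for {
		// csv.Reader rejects rows whose field count differs from the header,
		// so indexing by header position is safe.
		rec, err := cr.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		out = append(out, ImportedFinding{
			Provider: rec[col["provider"]],
			Key:      rec[col["key"]],
			Source:   rec[col["source"]],
		})
	}
	return out, nil
}

// ImportCSVString is a convenience wrapper for in-memory input.
func ImportCSVString(s string) ([]ImportedFinding, error) {
	return ImportCSV(strings.NewReader(s))
}

func main() {
	rows, err := ImportCSVString("source,provider,key\nrepo/a.env,example,ex-123\n")
	fmt.Println(rows, err)
}
```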
Storage
- STOR-01: SQLite database for persisting scan results, keys, and recon history
- STOR-02: Application-level AES-256 encryption for stored keys and sensitive config
- STOR-03: Encryption key derived from user passphrase via Argon2
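
A sketch of application-level encryption for STOR-02 follows. The requirement names AES-256 but not a mode, so the use of GCM here is an assumption, as are the function names. The 32-byte key is taken as an input; deriving it from a passphrase with Argon2 (STOR-03) needs a package outside the standard library and is left out.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD from a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt returns nonce || ciphertext || tag for the given plaintext.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst prepends it to the sealed output.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt reverses Encrypt and fails if the blob was tampered with.
func Decrypt(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	// Demo key only; in the design above it would come from Argon2.
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	blob, err := Encrypt(key, []byte("ex-secret-value"))
	if err != nil {
		panic(err)
	}
	pt, err := Decrypt(key, blob)
	fmt.Println(string(pt), err)
}
```

An authenticated mode such as GCM has the side benefit that a corrupted or modified database row fails to decrypt instead of yielding garbage.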
CLI
- CLI-01: Cobra-based CLI with commands: scan, verify, import, recon, keys, serve, dorks, providers, config, hook, schedule
- CLI-02: keyhunter config init creates ~/.keyhunter.yaml
- CLI-03: keyhunter config set for all configuration
- CLI-04: keyhunter providers list/info/stats for provider management
- CLI-05: Scan flags: --providers, --category, --confidence, --exclude, --verify, --workers, --output, --unmask, --notify
CI/CD Integration
- CICD-01: keyhunter hook install/uninstall for git pre-commit hooks
- CICD-02: SARIF output uploadable to GitHub Security tab
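
To make the SARIF requirement (OUT-03, CICD-02) more concrete, the sketch below builds a minimal SARIF 2.1.0 log from findings. The Go type names, the `Finding` fields, and the `keyhunter/<provider>` rule ID scheme are assumptions; only the JSON property names follow the SARIF format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of the SARIF 2.1.0 object model.
type sarifLog struct {
	Schema  string     `json:"$schema"`
	Version string     `json:"version"`
	Runs    []sarifRun `json:"runs"`
}
type sarifRun struct {
	Tool    sarifTool     `json:"tool"`
	Results []sarifResult `json:"results"`
}
type sarifTool struct {
	Driver sarifDriver `json:"driver"`
}
type sarifDriver struct {
	Name string `json:"name"`
}
type sarifResult struct {
	RuleID    string          `json:"ruleId"`
	Level     string          `json:"level"`
	Message   sarifMessage    `json:"message"`
	Locations []sarifLocation `json:"locations"`
}
type sarifMessage struct {
	Text string `json:"text"`
}
type sarifLocation struct {
	PhysicalLocation sarifPhysical `json:"physicalLocation"`
}
type sarifPhysical struct {
	ArtifactLocation sarifArtifact `json:"artifactLocation"`
	Region           sarifRegion   `json:"region"`
}
type sarifArtifact struct {
	URI string `json:"uri"`
}
type sarifRegion struct {
	StartLine int `json:"startLine"`
}

// Finding is a hypothetical scan result; only the masked key is reported.
type Finding struct {
	Provider string
	File     string
	Line     int
	Masked   string
}

// BuildSARIF converts findings into a single-run SARIF log.
func BuildSARIF(findings []Finding) sarifLog {
	results := make([]sarifResult, 0, len(findings)) // non-nil so it encodes as []
	for _, f := range findings {
		results = append(results, sarifResult{
			RuleID:  "keyhunter/" + f.Provider, // assumed rule ID scheme
			Level:   "error",
			Message: sarifMessage{Text: "Possible " + f.Provider + " API key: " + f.Masked},
			Locations: []sarifLocation{{PhysicalLocation: sarifPhysical{
				ArtifactLocation: sarifArtifact{URI: f.File},
				Region:           sarifRegion{StartLine: f.Line},
			}}},
		})
	}
	return sarifLog{
		Schema:  "https://json.schemastore.org/sarif-2.1.0.json",
		Version: "2.1.0",
		Runs:    []sarifRun{{Tool: sarifTool{Driver: sarifDriver{Name: "KeyHunter"}}, Results: results}},
	}
}

func main() {
	log := BuildSARIF([]Finding{{Provider: "example", File: "config/.env", Line: 3, Masked: "ex-abcde...wxyz"}})
	out, err := json.MarshalIndent(log, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```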
OSINT/Recon — IoT & Internet Scanners
- RECON-IOT-01: Shodan API search and dorking
- RECON-IOT-02: Censys API search
- RECON-IOT-03: ZoomEye API search
- RECON-IOT-04: FOFA API search
- RECON-IOT-05: Netlas API search
- RECON-IOT-06: BinaryEdge API search
OSINT/Recon — Code Hosting & Snippets
- RECON-CODE-01: GitHub code search with automated dork execution
- RECON-CODE-02: GitLab code search with dork execution
- RECON-CODE-03: GitHub Gist search
- RECON-CODE-04: Bitbucket code search
- RECON-CODE-05: Codeberg/Gitea search (Gitea auto-discovered via Shodan)
- RECON-CODE-06: Replit public repl scanning
- RECON-CODE-07: CodeSandbox project scanning
- RECON-CODE-08: HuggingFace Spaces and repos scanning
- RECON-CODE-09: Kaggle notebook scanning
- RECON-CODE-10: CodePen, JSFiddle, StackBlitz, Glitch, Observable, Gitpod scanning
OSINT/Recon — Search Engine Dorking
- RECON-DORK-01: Google dorking via Custom Search API / SerpAPI with 100+ built-in dorks
- RECON-DORK-02: Bing dorking via Azure Cognitive Services
- RECON-DORK-03: DuckDuckGo, Yandex, Brave search integration
OSINT/Recon — Paste Sites
- RECON-PASTE-01: Multi-paste aggregator (Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.)
OSINT/Recon — Package Registries
- RECON-PKG-01: npm registry package scanning (download + extract + grep)
- RECON-PKG-02: PyPI package scanning
- RECON-PKG-03: RubyGems, crates.io, Maven, NuGet, Packagist, Go proxy scanning
OSINT/Recon — Container & Infrastructure
- RECON-INFRA-01: Docker Hub image layer scanning and build arg extraction
- RECON-INFRA-02: Kubernetes exposed dashboards and public Secret/ConfigMap discovery
- RECON-INFRA-03: Terraform state file and registry module scanning
- RECON-INFRA-04: Helm chart and Ansible Galaxy scanning
OSINT/Recon — Cloud Storage
- RECON-CLOUD-01: AWS S3 bucket enumeration and content scanning
- RECON-CLOUD-02: GCS, Azure Blob, DigitalOcean Spaces, Backblaze B2 scanning
- RECON-CLOUD-03: Self-hosted MinIO instance discovery via Shodan
- RECON-CLOUD-04: GrayHatWarfare bucket search engine integration
OSINT/Recon — CI/CD Logs
- RECON-CI-01: GitHub Actions workflow log scanning
- RECON-CI-02: Travis CI and CircleCI public build log scanning
- RECON-CI-03: Exposed Jenkins instance discovery and console output scanning
- RECON-CI-04: GitLab CI/CD pipeline trace scanning
OSINT/Recon — Web Archives
- RECON-ARCH-01: Wayback Machine CDX API historical snapshot scanning
- RECON-ARCH-02: CommonCrawl index and WARC record scanning
OSINT/Recon — Forums & Documentation
- RECON-FORUM-01: Stack Overflow / Stack Exchange API search
- RECON-FORUM-02: Reddit subreddit search
- RECON-FORUM-03: Hacker News Algolia API search
- RECON-FORUM-04: dev.to and Medium article scanning
- RECON-FORUM-05: Telegram public channel scanning
- RECON-FORUM-06: Discord indexed content search
OSINT/Recon — Collaboration Tools
- RECON-COLLAB-01: Notion public page scanning (via Google dorking)
- RECON-COLLAB-02: Confluence exposed instance scanning
- RECON-COLLAB-03: Trello public board scanning
- RECON-COLLAB-04: Google Docs/Sheets public document scanning
OSINT/Recon — Frontend & JS Leaks
- RECON-JS-01: JavaScript source map extraction and scanning
- RECON-JS-02: Webpack/Vite bundle scanning for inlined env vars
- RECON-JS-03: Exposed .env file scanning on web servers
- RECON-JS-04: Exposed Swagger/OpenAPI documentation scanning
- RECON-JS-05: Vercel/Netlify deploy preview JS bundle scanning
OSINT/Recon — Log Aggregators
- RECON-LOG-01: Exposed Elasticsearch/Kibana instance scanning
- RECON-LOG-02: Exposed Grafana dashboard scanning
- RECON-LOG-03: Exposed Sentry instance scanning
OSINT/Recon — Threat Intelligence
- RECON-INTEL-01: VirusTotal file and URL search
- RECON-INTEL-02: Intelligence X aggregated search
- RECON-INTEL-03: URLhaus search
OSINT/Recon — Mobile & DNS
- RECON-MOBILE-01: APK download, decompile, and scanning
- RECON-DNS-01: crt.sh Certificate Transparency log subdomain discovery
- RECON-DNS-02: Subdomain config endpoint probing (.env, /api/config, /actuator/env)
OSINT/Recon — API Marketplaces
- RECON-API-01: Postman public collections and workspaces scanning
- RECON-API-02: SwaggerHub published API scanning
OSINT/Recon — Infrastructure
- RECON-INFRA-05: Per-source rate limiter with configurable limits
- RECON-INFRA-06: Stealth mode (--stealth) with UA rotation and increased delays
- RECON-INFRA-07: robots.txt respect (--respect-robots, default on)
- RECON-INFRA-08: Recon full command — parallel sweep across all sources with deduplication
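
The parallel sweep with deduplication in RECON-INFRA-08 might be structured along these lines. The `Source` function type, the `Finding` fields, and the choice of (provider, key) as the deduplication identity are assumptions of this sketch; rate limiting and stealth delays are omitted.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Finding is a hypothetical recon result.
type Finding struct {
	Source   string
	Provider string
	Key      string
}

// Source is a hypothetical recon source that returns its findings.
type Source func() []Finding

// Sweep queries all sources concurrently and keeps one finding per
// (provider, key) pair. Which source gets credited for a duplicate
// depends on goroutine scheduling and is not deterministic.
func Sweep(sources []Source) []Finding {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen = map[string]bool{}
		out  []Finding
	)
	for _, src := range sources {
		wg.Add(1)
		go func(src Source) {
			defer wg.Done()
			for _, f := range src() {
				id := f.Provider + "\x00" + f.Key
				mu.Lock()
				if !seen[id] {
					seen[id] = true
					out = append(out, f)
				}
				mu.Unlock()
			}
		}(src)
	}
	wg.Wait()
	// Sort so the result order does not depend on scheduling.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Provider != out[j].Provider {
			return out[i].Provider < out[j].Provider
		}
		return out[i].Key < out[j].Key
	})
	return out
}

func main() {
	a := func() []Finding { return []Finding{{"github", "example", "k1"}, {"github", "example", "k2"}} }
	b := func() []Finding { return []Finding{{"pastebin", "example", "k2"}} }
	for _, f := range Sweep([]Source{a, b}) {
		fmt.Println(f.Provider, f.Key)
	}
}
```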
Dork Engine
- DORK-01: YAML-based dork definitions (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, Bing)
- DORK-02: 150+ built-in dorks across all sources
- DORK-03: keyhunter dorks list/add/run/export commands
- DORK-04: Category-filtered dork execution (--category=frontier)
Web Dashboard
- WEB-01: Embedded HTTP server (chi + htmx + Tailwind CSS)
- WEB-02: Dashboard overview page with summary statistics
- WEB-03: Scan history and scan detail pages
- WEB-04: Key listing page with filtering and "Reveal Key" toggle
- WEB-05: OSINT/Recon launcher and results page
- WEB-06: Provider listing and statistics page
- WEB-07: Dork management page
- WEB-08: Settings configuration page
- WEB-09: REST API (/api/v1/*) for programmatic access
- WEB-10: Optional basic auth / token auth
- WEB-11: Server-Sent Events for live scan progress
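
Live scan progress over Server-Sent Events (WEB-11) could be served as sketched here. The example uses only `net/http` rather than chi to stay self-contained, and the handler name, the `progress` event name, and the channel-based update feed are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// FormatSSE renders one Server-Sent Events frame. Multi-line data is
// split into one "data:" field per line, and a blank line ends the frame.
func FormatSSE(event, data string) string {
	var b strings.Builder
	if event != "" {
		b.WriteString("event: " + event + "\n")
	}
	for _, line := range strings.Split(data, "\n") {
		b.WriteString("data: " + line + "\n")
	}
	b.WriteString("\n")
	return b.String()
}

// ProgressHandler streams each update as a "progress" event until the
// channel closes or the client disconnects.
func ProgressHandler(updates <-chan string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return
			case msg, open := <-updates:
				if !open {
					return
				}
				fmt.Fprint(w, FormatSSE("progress", msg))
				flusher.Flush() // push the frame to the client immediately
			}
		}
	}
}

func main() {
	// Print a single frame; no server is started in this demo.
	fmt.Print(FormatSSE("progress", `{"done":3,"total":10}`))
}
```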
Telegram Bot
- TELE-01: /scan command — remote scan trigger
- TELE-02: /verify command — key verification
- TELE-03: /recon command — dork execution
- TELE-04: /status, /stats, /providers, /help commands
- TELE-05: /subscribe and /unsubscribe for auto-notifications
- TELE-06: /key command — full key detail in private chat
- TELE-07: Auto-notification on new key findings
Scheduled Scanning
- SCHED-01: Cron-based recurring scan scheduling
- SCHED-02: keyhunter schedule add/list/remove commands
- SCHED-03: Auto-notify on scheduled scan completion
v2 Requirements
Advanced Detection
- ADV-01: BPE tokenization-based detection (Betterleaks approach, 98.6% recall)
- ADV-02: ML/LLM-based key detection for zero-pattern providers
- ADV-03: Custom provider YAML hot-reload without recompile (external dir)
Additional Integrations
- INT-01: Slack notification module
- INT-02: Webhook notification module
- INT-03: JIRA ticket creation on key findings
- INT-04: PagerDuty alert integration
Advanced OSINT
- OSINT-01: Dark web / breach database search (Dehashed, HIBP correlation)
- OSINT-02: IPA (iOS) app decompile and scanning
- OSINT-03: Backblaze B2 deep scanning
- OSINT-04: Rapid7 Open Data integration
Out of Scope
| Feature | Reason |
|---|---|
| GUI desktop app | CLI + web dashboard covers all use cases |
| Key rotation/remediation | KeyHunter detects, doesn't manage — separate concern |
| Automatic key invalidation | Legal exposure, not our responsibility |
| SaaS hosted version | Open-source tool only, no infrastructure to maintain |
| Telemetry/analytics | Privacy-first tool, no phone-home |
| Windows native binary | Linux/macOS primary, Windows via WSL/Docker |
| Real-time streaming API | Batch scanning is primary mode |
| regexp2/PCRE patterns | Catastrophic backtracking risk — Go stdlib regexp (RE2) only |
Traceability
| Requirement | Phase | Status |
|---|---|---|
| CORE-01, CORE-02, CORE-03, CORE-04, CORE-05, CORE-06, CORE-07 | Phase 1 | Pending |
| STOR-01, STOR-02, STOR-03 | Phase 1 | Pending |
| CLI-01, CLI-02, CLI-03, CLI-04, CLI-05 | Phase 1 | Pending |
| PROV-10 | Phase 1 | Complete |
| PROV-01, PROV-02 | Phase 2 | Pending |
| PROV-03, PROV-04, PROV-05, PROV-06, PROV-07, PROV-08, PROV-09 | Phase 3 | Pending |
| INPUT-01, INPUT-02, INPUT-03, INPUT-04, INPUT-05, INPUT-06 | Phase 4 | Pending |
| VRFY-01, VRFY-02, VRFY-03, VRFY-04, VRFY-05, VRFY-06 | Phase 5 | Pending |
| OUT-01, OUT-02, OUT-03, OUT-04, OUT-05, OUT-06 | Phase 6 | Pending |
| KEYS-01, KEYS-02, KEYS-03, KEYS-04, KEYS-05, KEYS-06 | Phase 6 | Pending |
| IMP-01, IMP-02, IMP-03 | Phase 7 | Pending |
| CICD-01, CICD-02 | Phase 7 | Pending |
| DORK-01, DORK-02, DORK-03, DORK-04 | Phase 8 | Pending |
| RECON-INFRA-05, RECON-INFRA-06, RECON-INFRA-07, RECON-INFRA-08 | Phase 9 | Pending |
| RECON-CODE-01, RECON-CODE-02, RECON-CODE-03, RECON-CODE-04, RECON-CODE-05 | Phase 10 | Pending |
| RECON-CODE-06, RECON-CODE-07, RECON-CODE-08, RECON-CODE-09, RECON-CODE-10 | Phase 10 | Pending |
| RECON-DORK-01, RECON-DORK-02, RECON-DORK-03 | Phase 11 | Pending |
| RECON-PASTE-01 | Phase 11 | Complete |
| RECON-IOT-01, RECON-IOT-02, RECON-IOT-03, RECON-IOT-04, RECON-IOT-05, RECON-IOT-06 | Phase 12 | Pending |
| RECON-CLOUD-01, RECON-CLOUD-02, RECON-CLOUD-03, RECON-CLOUD-04 | Phase 12 | Pending |
| RECON-PKG-01, RECON-PKG-02, RECON-PKG-03 | Phase 13 | Pending |
| RECON-INFRA-01, RECON-INFRA-02, RECON-INFRA-03, RECON-INFRA-04 | Phase 13 | Pending |
| RECON-CI-01, RECON-CI-02, RECON-CI-03, RECON-CI-04 | Phase 14 | Pending |
| RECON-ARCH-01, RECON-ARCH-02 | Phase 14 | Pending |
| RECON-JS-01, RECON-JS-02, RECON-JS-03, RECON-JS-04, RECON-JS-05 | Phase 14 | Pending |
| RECON-FORUM-01, RECON-FORUM-02, RECON-FORUM-03, RECON-FORUM-04, RECON-FORUM-05, RECON-FORUM-06 | Phase 15 | Pending |
| RECON-COLLAB-01, RECON-COLLAB-02, RECON-COLLAB-03, RECON-COLLAB-04 | Phase 15 | Pending |
| RECON-LOG-01, RECON-LOG-02, RECON-LOG-03 | Phase 15 | Pending |
| RECON-INTEL-01, RECON-INTEL-02, RECON-INTEL-03 | Phase 16 | Pending |
| RECON-MOBILE-01 | Phase 16 | Pending |
| RECON-DNS-01, RECON-DNS-02 | Phase 16 | Pending |
| RECON-API-01, RECON-API-02 | Phase 16 | Pending |
| TELE-01, TELE-02, TELE-03, TELE-04, TELE-05, TELE-06, TELE-07 | Phase 17 | Pending |
| SCHED-01, SCHED-02, SCHED-03 | Phase 17 | Pending |
| WEB-01, WEB-02, WEB-03, WEB-04, WEB-05, WEB-06, WEB-07, WEB-08, WEB-09, WEB-10, WEB-11 | Phase 18 | Pending |
Coverage:
- v1 requirements: 146 total (counted from this file; the figure of 120 in the PROJECT.md summary was a pre-count estimate)
- Mapped to phases: 146
- Unmapped: 0
Requirements defined: 2026-04-04
Last updated: 2026-04-04 after roadmap creation (18 phases)