# Requirements: KeyHunter

**Defined:** 2026-04-04
**Core Value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.

## v1 Requirements

Requirements for initial release. Each maps to roadmap phases.

### Core Engine

- [x] **CORE-01**: Scanner engine detects API keys using keyword pre-filtering + regex matching pipeline
- [x] **CORE-02**: Provider definitions loaded from YAML files embedded at compile time via Go embed
- [x] **CORE-03**: Provider registry manages 108+ provider definitions with pattern, keyword, confidence, and verify metadata
- [x] **CORE-04**: Entropy analysis as secondary signal for low-confidence providers (generic key formats)
- [x] **CORE-05**: Worker pool parallelism with configurable worker count (default: CPU count)
- [x] **CORE-06**: Aho-Corasick keyword pre-filter runs before regex for 10x performance on large files
- [x] **CORE-07**: mmap-based large file reading for memory efficiency

### Providers

- [x] **PROV-01**: 12 Tier 1 Frontier provider YAML definitions (OpenAI, Anthropic, Google AI, Vertex, AWS Bedrock, Azure OpenAI, Meta AI, xAI, Cohere, Mistral, Inflection, AI21)
- [x] **PROV-02**: 14 Tier 2 Inference Platform provider definitions (Together, Fireworks, Groq, Replicate, Anyscale, DeepInfra, Lepton, Modal, Baseten, Cerebrium, NovitaAI, Sambanova, OctoAI, Friendli)
- [x] **PROV-03**: 12 Tier 3 Specialized provider definitions (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney, HuggingFace)
- [x] **PROV-04**: 16 Tier 4 Chinese/Regional provider definitions (DeepSeek, Baichuan, Zhipu, Moonshot, Yi, Qwen, Baidu, ByteDance, SenseTime, iFlytek, MiniMax, Stepfun, 360 AI, Kuaishou, Tencent, SiliconFlow)
- [x] **PROV-05**: 11 Tier 5 Infrastructure/Gateway provider definitions (Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, Kong, BricksAI, Aether, Not Diamond)
- [x] **PROV-06**: 15 Tier 6 Emerging/Niche provider definitions (Reka, Aleph Alpha, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon, Lamini)
- [x] **PROV-07**: 10 Tier 7 Code/Dev Tools provider definitions (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
- [x] **PROV-08**: 10 Tier 8 Self-Hosted provider definitions (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan AI)
- [x] **PROV-09**: 8 Tier 9 Enterprise provider definitions (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
- [x] **PROV-10**: Provider YAML schema includes format_version and last_verified date for pattern health tracking

### Input Sources

- [x] **INPUT-01**: File and directory scanning with recursive traversal and glob exclusion patterns
- [x] **INPUT-02**: Git-aware scanning: full history, branches, stash, delta-based diffs
- [ ] **INPUT-03**: Git scanning supports --since flag for time-scoped history scan
- [ ] **INPUT-04**: stdin/pipe input support (cat file | keyhunter scan stdin)
- [ ] **INPUT-05**: URL fetching: scan content from any remote URL
- [x] **INPUT-06**: Clipboard content scanning

### Verification

- [x] **VRFY-01**: Active key verification via lightweight API calls when --verify flag is set
- [x] **VRFY-02**: Verification is opt-in only (off by default) with consent prompt on first use
- [x] **VRFY-03**: Each provider YAML defines verify endpoint, method, headers, success/failure codes
- [ ] **VRFY-04**: Verification extracts additional metadata (org, rate limit, permissions) when available
- [ ] **VRFY-05**: Configurable verification timeout (default 10s, --verify-timeout flag)
- [x] **VRFY-06**: Legal disclaimer and documentation ship with the verification feature

### Output & Reporting

- [ ] **OUT-01**: Colored terminal table output (default)
- [ ] **OUT-02**: JSON output format
- [ ] **OUT-03**: SARIF output format (CI/CD compatible)
- [ ] **OUT-04**: CSV output format
- [ ] **OUT-05**: Key masking by default (first 8 + last 4 chars) with --unmask flag for full keys
- [ ] **OUT-06**: Exit codes: 0=clean, 1=keys found, 2=error

### Key Management

- [ ] **KEYS-01**: keyhunter keys list: show all found keys (masked by default, --unmask for full)
- [ ] **KEYS-02**: keyhunter keys show: single key full detail (always unmasked)
- [ ] **KEYS-03**: keyhunter keys export --format=json|csv: export all keys with full values
- [ ] **KEYS-04**: keyhunter keys copy: copy full key to clipboard
- [ ] **KEYS-05**: keyhunter keys verify: verify specific key and show full detail
- [ ] **KEYS-06**: keyhunter keys delete: remove key from database

### External Tool Import

- [ ] **IMP-01**: TruffleHog JSON output parser and importer
- [ ] **IMP-02**: Gitleaks JSON output parser and importer
- [ ] **IMP-03**: Generic CSV import for custom tool output

### Storage

- [ ] **STOR-01**: SQLite database for persisting scan results, keys, recon history
- [ ] **STOR-02**: Application-level AES-256 encryption for stored keys and sensitive config
- [ ] **STOR-03**: Encryption key derived from user passphrase via Argon2

### CLI

- [x] **CLI-01**: Cobra-based CLI with commands: scan, verify, import, recon, keys, serve, dorks, providers, config, hook, schedule
- [x] **CLI-02**: keyhunter config init creates ~/.keyhunter.yaml
- [x] **CLI-03**: keyhunter config set for all configuration
- [x] **CLI-04**: keyhunter providers list/info/stats for provider management
- [x] **CLI-05**: Scan flags: --providers, --category, --confidence, --exclude, --verify, --workers, --output, --unmask, --notify

### CI/CD Integration

- [ ] **CICD-01**: keyhunter hook install/uninstall for git pre-commit hooks
- [ ] **CICD-02**: SARIF output uploadable to GitHub Security tab

### OSINT/Recon: IoT & Internet Scanners

- [ ] **RECON-IOT-01**: Shodan API search and dorking
- [ ] **RECON-IOT-02**: Censys API search
- [ ] **RECON-IOT-03**: ZoomEye API search
- [ ] **RECON-IOT-04**: FOFA API search
- [ ] **RECON-IOT-05**: Netlas API search
- [ ] **RECON-IOT-06**: BinaryEdge API search

### OSINT/Recon: Code Hosting & Snippets

- [ ] **RECON-CODE-01**: GitHub code search with automated dork execution
- [ ] **RECON-CODE-02**: GitLab code search with dork execution
- [ ] **RECON-CODE-03**: GitHub Gist search
- [ ] **RECON-CODE-04**: Bitbucket code search
- [ ] **RECON-CODE-05**: Codeberg/Gitea search (Gitea auto-discovered via Shodan)
- [ ] **RECON-CODE-06**: Replit public repl scanning
- [ ] **RECON-CODE-07**: CodeSandbox project scanning
- [ ] **RECON-CODE-08**: HuggingFace Spaces and repos scanning
- [ ] **RECON-CODE-09**: Kaggle notebook scanning
- [ ] **RECON-CODE-10**: CodePen, JSFiddle, StackBlitz, Glitch, Observable, Gitpod scanning

### OSINT/Recon: Search Engine Dorking

- [ ] **RECON-DORK-01**: Google dorking via Custom Search API / SerpAPI with 100+ built-in dorks
- [ ] **RECON-DORK-02**: Bing dorking via Azure Cognitive Services
- [ ] **RECON-DORK-03**: DuckDuckGo, Yandex, Brave search integration

### OSINT/Recon: Paste Sites

- [ ] **RECON-PASTE-01**: Multi-paste aggregator (Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.)

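
Content fetched from paste sites and the other recon sources would presumably flow through the same detection path that CORE-01 describes. The sketch below is a minimal, hypothetical illustration of that path in Go: a keyword pre-filter, then regex matching, then OUT-05 masking (first 8 + last 4 characters). The `provider` struct, the `demo-` key format, and all helper names are assumptions made for illustration and are not taken from KeyHunter's code. `strings.Contains` stands in for the Aho-Corasick pre-filter of CORE-06, and fully masking short keys is an assumed edge-case policy.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// provider is a hypothetical, trimmed-down stand-in for one YAML provider definition.
type provider struct {
	name     string
	keywords []string
	pattern  *regexp.Regexp
}

// finding records which provider matched and the raw key text.
type finding struct {
	provider string
	key      string
}

// demoProviders returns one made-up provider; the "demo-" format is not a real key format.
func demoProviders() []provider {
	return []provider{{
		name:     "demo",
		keywords: []string{"demo-"},
		pattern:  regexp.MustCompile(`demo-[A-Za-z0-9]{24}`),
	}}
}

// hasKeyword is the cheap pre-filter. A plain substring check is used here
// as a stand-in for an Aho-Corasick automaton.
func hasKeyword(content string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(content, k) {
			return true
		}
	}
	return false
}

// scan runs the regex only for providers whose keyword appears in the content.
func scan(content string, providers []provider) []finding {
	var out []finding
	for _, p := range providers {
		if !hasKeyword(content, p.keywords) {
			continue
		}
		for _, m := range p.pattern.FindAllString(content, -1) {
			out = append(out, finding{provider: p.name, key: m})
		}
	}
	return out
}

// maskKey keeps the first 8 and last 4 bytes and stars out the rest.
// Assumption: keys are ASCII, and keys of 12 bytes or fewer are fully masked.
func maskKey(key string) string {
	if len(key) <= 12 {
		return strings.Repeat("*", len(key))
	}
	return key[:8] + strings.Repeat("*", len(key)-12) + key[len(key)-4:]
}

func main() {
	paste := "config: api_key=demo-ABCDEFGHIJKLMNOPQRSTUVWX # leaked in a paste"
	for _, f := range scan(paste, demoProviders()) {
		fmt.Printf("%s\t%s\n", f.provider, maskKey(f.key))
	}
}
```

Running it prints the provider name and the masked key for the single match in the sample paste. Using the stdlib `regexp` package keeps matching on RE2 semantics, in line with the Out of Scope table.
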
### OSINT/Recon: Package Registries

- [ ] **RECON-PKG-01**: npm registry package scanning (download + extract + grep)
- [ ] **RECON-PKG-02**: PyPI package scanning
- [ ] **RECON-PKG-03**: RubyGems, crates.io, Maven, NuGet, Packagist, Go proxy scanning

### OSINT/Recon: Container & Infrastructure

- [ ] **RECON-INFRA-01**: Docker Hub image layer scanning and build arg extraction
- [ ] **RECON-INFRA-02**: Kubernetes exposed dashboards and public Secret/ConfigMap discovery
- [ ] **RECON-INFRA-03**: Terraform state file and registry module scanning
- [ ] **RECON-INFRA-04**: Helm chart and Ansible Galaxy scanning

### OSINT/Recon: Cloud Storage

- [ ] **RECON-CLOUD-01**: AWS S3 bucket enumeration and content scanning
- [ ] **RECON-CLOUD-02**: GCS, Azure Blob, DigitalOcean Spaces, Backblaze B2 scanning
- [ ] **RECON-CLOUD-03**: Self-hosted MinIO instance discovery via Shodan
- [ ] **RECON-CLOUD-04**: GrayHatWarfare bucket search engine integration

### OSINT/Recon: CI/CD Logs

- [ ] **RECON-CI-01**: GitHub Actions workflow log scanning
- [ ] **RECON-CI-02**: Travis CI and CircleCI public build log scanning
- [ ] **RECON-CI-03**: Exposed Jenkins instance discovery and console output scanning
- [ ] **RECON-CI-04**: GitLab CI/CD pipeline trace scanning

### OSINT/Recon: Web Archives

- [ ] **RECON-ARCH-01**: Wayback Machine CDX API historical snapshot scanning
- [ ] **RECON-ARCH-02**: CommonCrawl index and WARC record scanning

### OSINT/Recon: Forums & Documentation

- [ ] **RECON-FORUM-01**: Stack Overflow / Stack Exchange API search
- [ ] **RECON-FORUM-02**: Reddit subreddit search
- [ ] **RECON-FORUM-03**: Hacker News Algolia API search
- [ ] **RECON-FORUM-04**: dev.to and Medium article scanning
- [ ] **RECON-FORUM-05**: Telegram public channel scanning
- [ ] **RECON-FORUM-06**: Discord indexed content search

### OSINT/Recon: Collaboration Tools

- [ ] **RECON-COLLAB-01**: Notion public page scanning (via Google dorking)
- [ ] **RECON-COLLAB-02**: Confluence exposed instance scanning
- [ ] **RECON-COLLAB-03**: Trello public board scanning
- [ ] **RECON-COLLAB-04**: Google Docs/Sheets public document scanning

### OSINT/Recon: Frontend & JS Leaks

- [ ] **RECON-JS-01**: JavaScript source map extraction and scanning
- [ ] **RECON-JS-02**: Webpack/Vite bundle scanning for inlined env vars
- [ ] **RECON-JS-03**: Exposed .env file scanning on web servers
- [ ] **RECON-JS-04**: Exposed Swagger/OpenAPI documentation scanning
- [ ] **RECON-JS-05**: Vercel/Netlify deploy preview JS bundle scanning

### OSINT/Recon: Log Aggregators

- [ ] **RECON-LOG-01**: Exposed Elasticsearch/Kibana instance scanning
- [ ] **RECON-LOG-02**: Exposed Grafana dashboard scanning
- [ ] **RECON-LOG-03**: Exposed Sentry instance scanning

### OSINT/Recon: Threat Intelligence

- [ ] **RECON-INTEL-01**: VirusTotal file and URL search
- [ ] **RECON-INTEL-02**: Intelligence X aggregated search
- [ ] **RECON-INTEL-03**: URLhaus search

### OSINT/Recon: Mobile & DNS

- [ ] **RECON-MOBILE-01**: APK download, decompile, and scanning
- [ ] **RECON-DNS-01**: crt.sh Certificate Transparency log subdomain discovery
- [ ] **RECON-DNS-02**: Subdomain config endpoint probing (.env, /api/config, /actuator/env)

### OSINT/Recon: API Marketplaces

- [ ] **RECON-API-01**: Postman public collections and workspaces scanning
- [ ] **RECON-API-02**: SwaggerHub published API scanning

### OSINT/Recon: Infrastructure

- [ ] **RECON-INFRA-05**: Per-source rate limiter with configurable limits
- [ ] **RECON-INFRA-06**: Stealth mode (--stealth) with UA rotation and increased delays
- [ ] **RECON-INFRA-07**: robots.txt respect (--respect-robots, default on)
- [ ] **RECON-INFRA-08**: Recon full command: parallel sweep across all sources with deduplication

### Dork Engine

- [ ] **DORK-01**: YAML-based dork definitions (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, Bing)
- [ ] **DORK-02**: 150+ built-in dorks across all sources
- [ ] **DORK-03**: keyhunter dorks list/add/run/export commands
- [ ] **DORK-04**: Category-filtered dork execution (--category=frontier)

### Web Dashboard

- [ ] **WEB-01**: Embedded HTTP server (chi + htmx + Tailwind CSS)
- [ ] **WEB-02**: Dashboard overview page with summary statistics
- [ ] **WEB-03**: Scan history and scan detail pages
- [ ] **WEB-04**: Key listing page with filtering and "Reveal Key" toggle
- [ ] **WEB-05**: OSINT/Recon launcher and results page
- [ ] **WEB-06**: Provider listing and statistics page
- [ ] **WEB-07**: Dork management page
- [ ] **WEB-08**: Settings configuration page
- [ ] **WEB-09**: REST API (/api/v1/*) for programmatic access
- [ ] **WEB-10**: Optional basic auth / token auth
- [ ] **WEB-11**: Server-Sent Events for live scan progress

### Telegram Bot

- [ ] **TELE-01**: /scan command: remote scan trigger
- [ ] **TELE-02**: /verify command: key verification
- [ ] **TELE-03**: /recon command: dork execution
- [ ] **TELE-04**: /status, /stats, /providers, /help commands
- [ ] **TELE-05**: /subscribe and /unsubscribe for auto-notifications
- [ ] **TELE-06**: /key command: full key detail in private chat
- [ ] **TELE-07**: Auto-notification on new key findings

### Scheduled Scanning

- [ ] **SCHED-01**: Cron-based recurring scan scheduling
- [ ] **SCHED-02**: keyhunter schedule add/list/remove commands
- [ ] **SCHED-03**: Auto-notify on scheduled scan completion

## v2 Requirements

### Advanced Detection

- **ADV-01**: BPE tokenization-based detection (Betterleaks approach, 98.6% recall)
- **ADV-02**: ML/LLM-based key detection for zero-pattern providers
- **ADV-03**: Custom provider YAML hot-reload without recompile (external dir)

### Additional Integrations

- **INT-01**: Slack notification module
- **INT-02**: Webhook notification module
- **INT-03**: JIRA ticket creation on key findings
- **INT-04**: PagerDuty alert integration

### Advanced OSINT

- **OSINT-01**: Dark web / breach database search (Dehashed, HIBP correlation)
- **OSINT-02**: IPA (iOS) app decompile and scanning
- **OSINT-03**: Backblaze B2 deep scanning
- **OSINT-04**: Rapid7 Open Data integration

## Out of Scope

| Feature | Reason |
|---------|--------|
| GUI desktop app | CLI + web dashboard covers all use cases |
| Key rotation/remediation | KeyHunter detects, doesn't manage; separate concern |
| Automatic key invalidation | Legal exposure, not our responsibility |
| SaaS hosted version | Open-source tool only, no infrastructure to maintain |
| Telemetry/analytics | Privacy-first tool, no phone-home |
| Windows native binary | Linux/macOS primary, Windows via WSL/Docker |
| Real-time streaming API | Batch scanning is primary mode |
| regexp2/PCRE patterns | Catastrophic backtracking risk; Go stdlib regexp (RE2) only |

## Traceability

| Requirement | Phase | Status |
|-------------|-------|--------|
| CORE-01, CORE-02, CORE-03, CORE-04, CORE-05, CORE-06, CORE-07 | Phase 1 | Pending |
| STOR-01, STOR-02, STOR-03 | Phase 1 | Pending |
| CLI-01, CLI-02, CLI-03, CLI-04, CLI-05 | Phase 1 | Pending |
| PROV-10 | Phase 1 | Complete |
| PROV-01, PROV-02 | Phase 2 | Pending |
| PROV-03, PROV-04, PROV-05, PROV-06, PROV-07, PROV-08, PROV-09 | Phase 3 | Pending |
| INPUT-01, INPUT-02, INPUT-03, INPUT-04, INPUT-05, INPUT-06 | Phase 4 | Pending |
| VRFY-01, VRFY-02, VRFY-03, VRFY-04, VRFY-05, VRFY-06 | Phase 5 | Pending |
| OUT-01, OUT-02, OUT-03, OUT-04, OUT-05, OUT-06 | Phase 6 | Pending |
| KEYS-01, KEYS-02, KEYS-03, KEYS-04, KEYS-05, KEYS-06 | Phase 6 | Pending |
| IMP-01, IMP-02, IMP-03 | Phase 7 | Pending |
| CICD-01, CICD-02 | Phase 7 | Pending |
| DORK-01, DORK-02, DORK-03, DORK-04 | Phase 8 | Pending |
| RECON-INFRA-05, RECON-INFRA-06, RECON-INFRA-07, RECON-INFRA-08 | Phase 9 | Pending |
| RECON-CODE-01, RECON-CODE-02, RECON-CODE-03, RECON-CODE-04, RECON-CODE-05 | Phase 10 | Pending |
| RECON-CODE-06, RECON-CODE-07, RECON-CODE-08, RECON-CODE-09, RECON-CODE-10 | Phase 10 | Pending |
| RECON-DORK-01, RECON-DORK-02, RECON-DORK-03 | Phase 11 | Pending |
| RECON-PASTE-01 | Phase 11 | Pending |
| RECON-IOT-01, RECON-IOT-02, RECON-IOT-03, RECON-IOT-04, RECON-IOT-05, RECON-IOT-06 | Phase 12 | Pending |
| RECON-CLOUD-01, RECON-CLOUD-02, RECON-CLOUD-03, RECON-CLOUD-04 | Phase 12 | Pending |
| RECON-PKG-01, RECON-PKG-02, RECON-PKG-03 | Phase 13 | Pending |
| RECON-INFRA-01, RECON-INFRA-02, RECON-INFRA-03, RECON-INFRA-04 | Phase 13 | Pending |
| RECON-CI-01, RECON-CI-02, RECON-CI-03, RECON-CI-04 | Phase 14 | Pending |
| RECON-ARCH-01, RECON-ARCH-02 | Phase 14 | Pending |
| RECON-JS-01, RECON-JS-02, RECON-JS-03, RECON-JS-04, RECON-JS-05 | Phase 14 | Pending |
| RECON-FORUM-01, RECON-FORUM-02, RECON-FORUM-03, RECON-FORUM-04, RECON-FORUM-05, RECON-FORUM-06 | Phase 15 | Pending |
| RECON-COLLAB-01, RECON-COLLAB-02, RECON-COLLAB-03, RECON-COLLAB-04 | Phase 15 | Pending |
| RECON-LOG-01, RECON-LOG-02, RECON-LOG-03 | Phase 15 | Pending |
| RECON-INTEL-01, RECON-INTEL-02, RECON-INTEL-03 | Phase 16 | Pending |
| RECON-MOBILE-01 | Phase 16 | Pending |
| RECON-DNS-01, RECON-DNS-02 | Phase 16 | Pending |
| RECON-API-01, RECON-API-02 | Phase 16 | Pending |
| TELE-01, TELE-02, TELE-03, TELE-04, TELE-05, TELE-06, TELE-07 | Phase 17 | Pending |
| SCHED-01, SCHED-02, SCHED-03 | Phase 17 | Pending |
| WEB-01, WEB-02, WEB-03, WEB-04, WEB-05, WEB-06, WEB-07, WEB-08, WEB-09, WEB-10, WEB-11 | Phase 18 | Pending |

**Coverage:**

- v1 requirements: 146 total (file count; PROJECT.md summary of 120 was a pre-count estimate)
- Mapped to phases: 146
- Unmapped: 0

---

*Requirements defined: 2026-04-04*
*Last updated: 2026-04-04 after roadmap creation (18 phases)*