KeyHunter

What This Is

KeyHunter is a comprehensive, modular API key scanner built in Go, focused on detecting and validating API keys from 108+ LLM/AI providers. It combines native scanning with external tool integration (TruffleHog, Gitleaks), OSINT/recon across 80+ internet sources, a web dashboard, and Telegram bot notifications. It is designed for red teams, DevSecOps teams, bug bounty hunters, and security researchers.

Core Value

Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.

Requirements

Validated

  • ✓ Core scanning engine (regex + entropy + keyword pre-filtering) — Phase 1
  • ✓ Plugin-based architecture — providers as YAML, compile-time embedded — Phase 1
  • ✓ SQLite storage with AES-256 encryption — Phase 1
  • ✓ CLI with Cobra: scan, providers, config commands — Phase 1
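
The three stages of the core scanning engine (keyword pre-filter, regex, entropy) can be sketched as follows. This is a minimal illustration, not the Phase 1 implementation: the `rule` type, `demoRules`, `scanLine`, the regex length and character set, and the entropy threshold are all assumptions made for the example. Only the `hf_` prefix comes from this document.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

// rule is a hypothetical, simplified stand-in for one provider definition.
type rule struct {
	Provider   string
	Keywords   []string       // lowercase substrings checked before the regex runs
	Pattern    *regexp.Regexp // candidate key pattern
	MinEntropy float64        // bits per byte; the threshold is illustrative
}

// demoRules returns one illustrative rule. The length and character set
// in the pattern are assumptions, not a documented key format.
func demoRules() []rule {
	return []rule{{
		Provider:   "huggingface",
		Keywords:   []string{"hf_"},
		Pattern:    regexp.MustCompile(`hf_[A-Za-z0-9]{20,}`),
		MinEntropy: 3.0,
	}}
}

// shannonEntropy returns the Shannon entropy of s in bits per byte.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// scanLine applies the stages in order: keyword pre-filter, regex, entropy.
func scanLine(line string, rules []rule) []string {
	var hits []string
	lower := strings.ToLower(line)
	for _, r := range rules {
		hasKeyword := false
		for _, kw := range r.Keywords {
			if strings.Contains(lower, kw) {
				hasKeyword = true
				break
			}
		}
		if !hasKeyword {
			continue // no keyword on this line, so the regex never runs
		}
		for _, m := range r.Pattern.FindAllString(line, -1) {
			if shannonEntropy(m) >= r.MinEntropy {
				hits = append(hits, r.Provider+":"+m)
			}
		}
	}
	return hits
}

func main() {
	line := `HF_TOKEN="hf_aB3dE5gH7jK9mN1pQ3sT5vX7"`
	fmt.Println(scanLine(line, demoRules()))
}
```

The entropy check runs last because it is only meaningful on a regex match; a placeholder such as `hf_` followed by a run of one repeated character matches the pattern but falls below the threshold and is dropped.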

Active

  • Core scanning engine (regex + entropy + keyword pre-filtering)
  • 108 provider YAML definitions with patterns, keywords, verify endpoints
  • Plugin-based architecture — providers as YAML, compile-time embedded
  • Multiple input sources: file, dir, git history, stdin, URL, clipboard
  • Active key verification via --verify flag (off by default)
  • Full key access: --unmask, JSON export, keys show, web dashboard, Telegram
  • CLI with Cobra: scan, verify, import, recon, keys, serve, dorks, providers, config, hook, schedule
  • TruffleHog & Gitleaks JSON import adapters
  • OSINT/Recon engine: 80+ sources across 18 categories
  • IoT scanners: Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge
  • Code hosting: GitHub, GitLab, Bitbucket, Codeberg, Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, etc.
  • Search engine dorking: Google, Bing, DuckDuckGo, Yandex, Brave
  • Paste site aggregator: Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.
  • Package registries: npm, PyPI, RubyGems, crates.io, Maven, NuGet, Packagist, Go proxy
  • Container/infra: Docker Hub layers, K8s configs, Terraform state, Helm, Ansible
  • Cloud storage: S3, GCS, Azure Blob, DO Spaces, MinIO, GrayHatWarfare
  • CI/CD logs: Travis, CircleCI, GitHub Actions, Jenkins, GitLab CI
  • Web archives: Wayback Machine, CommonCrawl
  • Forums: StackOverflow, Reddit, HackerNews, dev.to, Medium, Telegram groups, Discord
  • Collaboration: Notion, Confluence, Trello, Google Docs
  • Frontend/JS: Source maps, webpack bundles, exposed .env, Swagger, deploy previews
  • Log aggregators: Elasticsearch/Kibana, Grafana, Sentry
  • Threat intel: VirusTotal, IntelX, URLhaus
  • Mobile: APK decompile scanning
  • DNS/Subdomain: crt.sh, config endpoint probing
  • API marketplaces: Postman, SwaggerHub
  • Built-in dork engine: 150+ dorks in YAML (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, etc.)
  • Web dashboard: htmx + Tailwind + SQLite, scans/keys/recon/providers/dorks/settings pages
  • Telegram bot: /scan, /verify, /recon, /status, /stats, /subscribe, /key
  • Scheduled scanning: cron-based recurring scans with auto-notify
  • Pre-commit hook & CI/CD integration (SARIF output)
  • Output formats: table (colored), JSON, SARIF, CSV
  • SQLite storage with AES-256 encryption
  • Worker pool parallelism, keyword pre-filtering, mmap, delta-based git scanning
  • Rate limiting per source, stealth mode, robots.txt respect
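
Worker pool parallelism, listed above, could take roughly the following shape. This is a sketch under stated assumptions: `runPool`, `scanFunc`, and `demoScan` are hypothetical names, and the per-input scanner is a trivial substring check standing in for the real engine.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// scanFunc is a hypothetical per-input scanner; real scanning would go here.
type scanFunc func(input string) []string

// demoScan is a placeholder scanner that flags any input containing "sk-".
func demoScan(input string) []string {
	if strings.Contains(input, "sk-") {
		return []string{input}
	}
	return nil
}

// runPool fans inputs out to a fixed number of goroutines and collects
// every finding. Results are sorted so the output is deterministic.
func runPool(inputs []string, workers int, scan scanFunc) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan []string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for in := range jobs {
				results <- scan(in)
			}
		}()
	}

	// Feed jobs from a separate goroutine so the collector below can drain
	// results while inputs are still being queued.
	go func() {
		for _, in := range inputs {
			jobs <- in
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var findings []string
	for r := range results {
		findings = append(findings, r...)
	}
	sort.Strings(findings)
	return findings
}

func main() {
	inputs := []string{"config.yaml: key=sk-demo", "README.md: nothing here"}
	fmt.Println(runPool(inputs, 4, demoScan))
}
```

A fixed pool bounds concurrency regardless of input count, which matters when the same mechanism is later combined with per-source rate limiting.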

Out of Scope

  • GUI desktop app — CLI + web dashboard is sufficient
  • Real-time streaming API — batch scanning is the primary mode
  • Key rotation/remediation — KeyHunter finds keys, doesn't manage them
  • Paid SaaS version — open-source tool only
  • Windows native — Linux/macOS primary, Windows via WSL/Docker

Context

  • AI-related credential leaks grew 81% YoY in 2025 (GitGuardian report)
  • 28M credentials leaked on GitHub in 2025
  • Best existing tools (TruffleHog, Gitleaks) cover at most ~15 LLM providers
  • No dedicated tool covers 100+ LLM providers with detection + verification + OSINT
  • LiteLLM supports 107 providers; our 108-provider list covers the market comprehensively
  • High-confidence key prefixes exist for: OpenAI (sk-proj-), Anthropic (sk-ant-api03-), HuggingFace (hf_), Groq (gsk_), Replicate (r8_), Perplexity (pplx-), Fireworks (fw_), Google AI (AIza)
  • Many providers (Mistral, Cohere, Together AI, Chinese providers) use generic keys with no distinctive prefix, so keyword-based detection is needed
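
The high-confidence prefixes listed above lend themselves to a simple lookup. The sketch below uses only the prefixes named in this document; the `knownPrefixes` table and `classify` function are hypothetical names. A prefix match alone says nothing about key length, character set, or whether the key is live.

```go
package main

import (
	"fmt"
	"strings"
)

// knownPrefixes maps the high-confidence prefixes from the Context section
// to their providers.
var knownPrefixes = []struct{ Prefix, Provider string }{
	{"sk-ant-api03-", "Anthropic"},
	{"sk-proj-", "OpenAI"},
	{"hf_", "HuggingFace"},
	{"gsk_", "Groq"},
	{"r8_", "Replicate"},
	{"pplx-", "Perplexity"},
	{"fw_", "Fireworks"},
	{"AIza", "Google AI"},
}

// classify returns the provider whose prefix the candidate starts with.
// The second result is false when no known prefix matches, which is the
// case where keyword-based detection would have to take over.
func classify(candidate string) (string, bool) {
	for _, kp := range knownPrefixes {
		if strings.HasPrefix(candidate, kp.Prefix) {
			return kp.Provider, true
		}
	}
	return "", false
}

func main() {
	for _, c := range []string{"sk-ant-api03-EXAMPLE", "gsk_EXAMPLE", "0123abcd"} {
		provider, ok := classify(c)
		fmt.Printf("%q -> %q (matched=%v)\n", c, provider, ok)
	}
}
```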

Constraints

  • Language: Go 1.22+ — single binary distribution, performance, TruffleHog/Gitleaks ecosystem alignment
  • Architecture: Plugin-based — providers as YAML files, compile-time embedded via Go embed
  • Storage: SQLite — zero-dependency embedded database, AES-256 encrypted
  • Web stack: htmx + Tailwind CSS — no JS framework dependency, embedded in binary
  • CLI framework: Cobra — industry standard for Go CLIs
  • Verification: Must be opt-in (--verify flag) — passive scanning by default for legal safety
  • Key masking: Default masked output, --unmask for full keys — shoulder surfing protection
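
The key masking constraint can be illustrated with a small helper. This is a sketch only: the `maskKey` name and the choice to keep four characters at each end are assumptions, and the `unmask` parameter stands in for whatever the `--unmask` flag sets.

```go
package main

import (
	"fmt"
	"strings"
)

// maskKey hides the middle of a key unless unmask is true.
// Keeping 4 characters at each end is an illustrative choice.
// Indexing is byte-based, which assumes ASCII keys.
func maskKey(key string, unmask bool) string {
	const keep = 4
	if unmask {
		return key
	}
	if len(key) <= 2*keep {
		// Too short to reveal anything safely: mask every character.
		return strings.Repeat("*", len(key))
	}
	return key[:keep] + strings.Repeat("*", len(key)-2*keep) + key[len(key)-keep:]
}

func main() {
	key := "sk-proj-abcdefgh1234"
	fmt.Println(maskKey(key, false)) // default: masked
	fmt.Println(maskKey(key, true))  // equivalent of passing --unmask
}
```

Masking at the output layer rather than at storage time keeps the full key available to the `--unmask`, JSON export, and dashboard paths listed under Active.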

Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Go over Python/Rust | Single binary, performance, ecosystem alignment with TruffleHog/Gitleaks | — Pending |
| Plugin architecture (YAML providers) | Community extensibility, easy to add providers without recompile | — Pending |
| Compile-time embed over runtime plugins | Single binary advantage, no external dependency loading | — Pending |
| SQLite over PostgreSQL | Zero dependency, embedded, sufficient for local tool | — Pending |
| htmx over React/Vue | Minimal JS, embedded in binary, server-rendered simplicity | — Pending |
| Keyword pre-filtering before regex | 10x performance improvement on large codebases (TruffleHog's approach) | — Pending |
| YAML dorks alongside YAML providers | Consistent extensibility pattern, community can add dorks same way | — Pending |
| Configurable verification (--verify) | Legal safety — passive scanning by default, active only when explicitly requested | — Pending |

Evolution

This document evolves at phase transitions and milestone boundaries.

After each phase transition:

  1. Requirements invalidated? -> Move to Out of Scope with reason
  2. Requirements validated? -> Move to Validated with phase reference
  3. New requirements emerged? -> Add to Active
  4. Decisions to log? -> Add to Key Decisions
  5. "What This Is" still accurate? -> Update if drifted

After each milestone:

  1. Full review of all sections
  2. Core Value check — still the right priority?
  3. Audit Out of Scope — reasons still valid?
  4. Update Context with current state

Last updated: 2026-04-05 after Phase 1 completion