KeyHunter

What This Is

KeyHunter is a comprehensive, modular API key scanner built in Go, focused on detecting and validating API keys from 108+ LLM/AI providers. It combines native scanning with external tool integration (TruffleHog, Gitleaks), OSINT/recon across 80+ internet sources, a web dashboard, and Telegram bot notifications. Designed for red teams, DevSecOps, bug bounty hunters, and security researchers.

Core Value

Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.

Requirements

Validated

✓ Core scanning engine (regex + entropy + keyword pre-filtering) — Phase 1
✓ Plugin-based architecture — providers as YAML, compile-time embedded — Phase 1
✓ SQLite storage with AES-256 encryption — Phase 1
✓ CLI with Cobra: scan, providers, config commands — Phase 1
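Below is a minimal sketch of the regex + entropy stage of the Phase 1 engine. It assumes a single generic candidate pattern and an entropy threshold of 3.5 bits per character. Both are illustrative values, not KeyHunter's per-provider YAML patterns or settings. The names shannonEntropy, candidatePattern, minEntropy and findCandidates are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// shannonEntropy returns the Shannon entropy of s in bits per character.
// Random-looking secrets score high; repeated or dictionary-like strings score low.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	counts := make(map[rune]int)
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// candidatePattern is a hypothetical generic pattern: 32+ token characters.
var candidatePattern = regexp.MustCompile(`[A-Za-z0-9_\-]{32,}`)

// minEntropy is an assumed threshold, not a KeyHunter default.
const minEntropy = 3.5

// findCandidates keeps only regex matches whose entropy clears minEntropy.
func findCandidates(text string) []string {
	var out []string
	for _, m := range candidatePattern.FindAllString(text, -1) {
		if shannonEntropy(m) >= minEntropy {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	text := `placeholder = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
api_key = "Zq8vL2mN4xR7tY1wK9pB3cF6hJ0dS5gA"`
	// Both strings match the regex; only the high-entropy one survives.
	fmt.Println(findCandidates(text))
}
```

The entropy check runs after the regex, so it only ever sees short candidate strings, not whole files.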

Active

108 provider YAML definitions with patterns, keywords, verify endpoints
Multiple input sources: file, dir, git history, stdin, URL, clipboard
Active key verification via --verify flag (off by default)
Full key access: --unmask, JSON export, keys show, web dashboard, Telegram
CLI with Cobra: scan, verify, import, recon, keys, serve, dorks, providers, config, hook, schedule
TruffleHog & Gitleaks JSON import adapters
OSINT/Recon engine: 80+ sources across 18 categories
  IoT scanners: Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge
  Code hosting: GitHub, GitLab, Bitbucket, Codeberg, Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, etc.
  Search engine dorking: Google, Bing, DuckDuckGo, Yandex, Brave
  Paste site aggregator: Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, etc.
  Package registries: npm, PyPI, RubyGems, crates.io, Maven, NuGet, Packagist, Go proxy
  Container/infra: Docker Hub layers, K8s configs, Terraform state, Helm, Ansible
  Cloud storage: S3, GCS, Azure Blob, DO Spaces, MinIO, GrayHatWarfare
  CI/CD logs: Travis, CircleCI, GitHub Actions, Jenkins, GitLab CI
  Web archives: Wayback Machine, CommonCrawl
  Forums: Stack Overflow, Reddit, Hacker News, dev.to, Medium, Telegram groups, Discord
  Collaboration: Notion, Confluence, Trello, Google Docs
  Frontend/JS: Source maps, webpack bundles, exposed .env, Swagger, deploy previews
  Log aggregators: Elasticsearch/Kibana, Grafana, Sentry
  Threat intel: VirusTotal, IntelX, URLhaus
  Mobile: APK decompile scanning
  DNS/Subdomain: crt.sh, config endpoint probing
  API marketplaces: Postman, SwaggerHub
Built-in dork engine: 150+ dorks in YAML (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, etc.)
Web dashboard: htmx + Tailwind + SQLite, scans/keys/recon/providers/dorks/settings pages
Telegram bot: /scan, /verify, /recon, /status, /stats, /subscribe, /key
Scheduled scanning: cron-based recurring scans with auto-notify
Pre-commit hook & CI/CD integration (SARIF output)
Output formats: table (colored), JSON, SARIF, CSV
Worker pool parallelism, keyword pre-filtering, mmap, delta-based git scanning
Rate limiting per source, stealth mode, robots.txt respect

Out of Scope

GUI desktop app — CLI + web dashboard is sufficient
Real-time streaming API — batch scanning is the primary mode
Key rotation/remediation — KeyHunter finds keys, doesn't manage them
Paid SaaS version — open-source tool only
Windows native — Linux/macOS primary, Windows via WSL/Docker

Context

AI-related credential leaks grew 81% YoY in 2025 (GitGuardian report)
28M credentials leaked on GitHub in 2025
Best existing tools (TruffleHog, Gitleaks) cover at most ~15 LLM providers
No dedicated tool covers 100+ LLM providers with detection + verification + OSINT
LiteLLM supports 107 providers; our 108-provider list covers the market comprehensively
High-confidence key prefixes exist for: OpenAI (sk-proj-), Anthropic (sk-ant-api03-), HuggingFace (hf_), Groq (gsk_), Replicate (r8_), Perplexity (pplx-), Fireworks (fw_), Google AI (AIza)
Many providers (Mistral, Cohere, Together AI, Chinese providers) use generic keys — keyword-based detection needed
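The split between prefix-based and keyword-based detection described above could look roughly like this. The prefixes are the ones listed; everything else is an assumption for illustration. That includes the tail lengths, the 32-character generic token, and the keyword lists in prefixRules and genericKeywords. None of it reflects the providers' documented key formats or KeyHunter's YAML definitions. The helper classify is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// prefixRules: providers with distinctive prefixes. The tail lengths are
// assumptions, not each provider's documented key format.
var prefixRules = map[string]*regexp.Regexp{
	"openai":      regexp.MustCompile(`sk-proj-[A-Za-z0-9_\-]{20,}`),
	"anthropic":   regexp.MustCompile(`sk-ant-api03-[A-Za-z0-9_\-]{20,}`),
	"huggingface": regexp.MustCompile(`hf_[A-Za-z0-9]{20,}`),
	"groq":        regexp.MustCompile(`gsk_[A-Za-z0-9]{20,}`),
	"replicate":   regexp.MustCompile(`r8_[A-Za-z0-9]{20,}`),
	"perplexity":  regexp.MustCompile(`pplx-[A-Za-z0-9]{20,}`),
	"fireworks":   regexp.MustCompile(`fw_[A-Za-z0-9]{20,}`),
	"google-ai":   regexp.MustCompile(`AIza[A-Za-z0-9_\-]{35}`),
}

// genericToken matches an unprefixed key-like string. On its own it is too
// noisy, so it only counts when a provider keyword is on the same line.
var genericToken = regexp.MustCompile(`\b[A-Za-z0-9]{32,}\b`)

// genericKeywords is a hypothetical keyword list for generic-key providers.
var genericKeywords = map[string][]string{
	"mistral":  {"mistral"},
	"cohere":   {"cohere", "co_api_key"},
	"together": {"together"},
}

// classify returns the (sorted) provider names a single line appears to leak.
func classify(line string) []string {
	var hits []string
	for name, re := range prefixRules {
		if re.MatchString(line) {
			hits = append(hits, name)
		}
	}
	if genericToken.MatchString(line) {
		lower := strings.ToLower(line)
		for name, kws := range genericKeywords {
			for _, kw := range kws {
				if strings.Contains(lower, kw) {
					hits = append(hits, name)
					break
				}
			}
		}
	}
	sort.Strings(hits) // map iteration order is random; sort for stable output
	return hits
}

func main() {
	for _, l := range []string{
		`ANTHROPIC_API_KEY=sk-ant-api03-abcdefghijklmnopqrstuvwxyz`,
		`MISTRAL_API_KEY="Zq8vL2mN4xR7tY1wK9pB3cF6hJ0dS5gA"`,
		`print("hello")`,
	} {
		fmt.Println(classify(l))
	}
}
```

Prefix matches stand on their own. A generic token needs corroborating context, which trades some recall for far fewer false positives.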

Constraints

Language: Go 1.22+ — single binary distribution, performance, TruffleHog/Gitleaks ecosystem alignment
Architecture: Plugin-based — providers as YAML files, compile-time embedded via Go embed
Storage: SQLite — zero-dependency embedded database, AES-256 encrypted
Web stack: htmx + Tailwind CSS — no JS framework dependency, embedded in binary
CLI framework: Cobra — industry standard for Go CLIs
Verification: Must be opt-in (--verify flag) — passive scanning by default for legal safety
Key masking: Masked output by default, --unmask for full keys (shoulder-surfing protection)
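Default masking with an --unmask override might be implemented roughly as below. The values are assumptions: an 8-character visible prefix, a 4-character visible suffix, and a fixed-width fill. maskKey is a hypothetical helper, not an existing KeyHunter function.

```go
package main

import "fmt"

// Assumed widths: the prefix usually identifies the provider (e.g. "sk-proj-"),
// and the suffix lets a user tell two keys from the same provider apart.
const (
	visiblePrefix = 8
	visibleSuffix = 4
	maskFill      = "********" // fixed width, so output does not reveal key length
)

// maskKey returns key unchanged when unmask is true (the --unmask case);
// otherwise it hides everything except a short prefix and suffix.
func maskKey(key string, unmask bool) string {
	if unmask {
		return key
	}
	r := []rune(key)
	if len(r) <= visiblePrefix+visibleSuffix {
		// Too short to reveal any part safely: mask it completely.
		return maskFill
	}
	return string(r[:visiblePrefix]) + maskFill + string(r[len(r)-visibleSuffix:])
}

func main() {
	key := "sk-proj-abcdefghijklmnop1234"
	fmt.Println(maskKey(key, false))
	fmt.Println(maskKey(key, true))
}
```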

Key Decisions

Decision	Rationale	Outcome
Go over Python/Rust	Single binary, performance, ecosystem alignment with TruffleHog/Gitleaks	— Pending
Plugin architecture (YAML providers)	Community extensibility; new providers are added as YAML files without writing Go code	— Pending
Compile-time embed over runtime plugins	Single binary advantage, no external dependency loading	— Pending
SQLite over PostgreSQL	Zero dependency, embedded, sufficient for local tool	— Pending
htmx over React/Vue	Minimal JS, embedded in binary, server-rendered simplicity	— Pending
Keyword pre-filtering before regex	Expected ~10x speedup on large codebases (TruffleHog's approach)	— Pending
YAML dorks alongside YAML providers	Consistent extensibility pattern, community can add dorks same way	— Pending
Configurable verification (--verify)	Legal safety — passive scanning by default, active only when explicitly requested	— Pending
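The keyword pre-filtering decision can be sketched as follows. Each provider's cheap substring keywords gate its regex, so chunks that contain no keyword never reach the regex engine. For clarity this sketch uses plain strings.Contains per keyword. TruffleHog's engine instead matches all keywords in one pass with an Aho-Corasick automaton. The provider struct and the scan function are hypothetical, and the sketch does not measure any speedup.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// provider pairs cheap keywords with the more expensive regex they gate.
// The fields loosely mirror a YAML provider definition but are hypothetical.
type provider struct {
	name     string
	keywords []string
	pattern  *regexp.Regexp
}

var providers = []provider{
	{"groq", []string{"gsk_"}, regexp.MustCompile(`gsk_[A-Za-z0-9]{20,}`)},
	{"huggingface", []string{"hf_"}, regexp.MustCompile(`hf_[A-Za-z0-9]{20,}`)},
}

// scan runs a provider's regex only if one of its keywords occurs in the
// chunk (case-insensitively). regexRuns counts regexes actually executed.
func scan(chunk string) (matches []string, regexRuns int) {
	lower := strings.ToLower(chunk)
	for _, p := range providers {
		hit := false
		for _, kw := range p.keywords {
			if strings.Contains(lower, strings.ToLower(kw)) {
				hit = true
				break
			}
		}
		if !hit {
			continue // no keyword: skip the regex entirely
		}
		regexRuns++
		for _, m := range p.pattern.FindAllString(chunk, -1) {
			matches = append(matches, p.name+": "+m)
		}
	}
	return matches, regexRuns
}

func main() {
	for _, chunk := range []string{
		"README: nothing secret here",
		"GROQ_API_KEY=gsk_abcdefghijklmnopqrstuvwx",
	} {
		m, runs := scan(chunk)
		fmt.Printf("regex runs: %d, matches: %v\n", runs, m)
	}
}
```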

Evolution

This document evolves at phase transitions and milestone boundaries.

After each phase transition:

Requirements invalidated? -> Move to Out of Scope with reason
Requirements validated? -> Move to Validated with phase reference
New requirements emerged? -> Add to Active
Decisions to log? -> Add to Key Decisions
"What This Is" still accurate? -> Update if drifted

After each milestone:

Full review of all sections
Core Value check — still the right priority?
Audit Out of Scope — reasons still valid?
Update Context with current state

Last updated: 2026-04-05 after Phase 1 completion
