# KeyHunter

> The most comprehensive API key scanner for LLM/AI providers. Detect, validate, and monitor leaked API keys across 108+ providers.

[![Go](https://img.shields.io/badge/Go-1.22+-00ADD8?style=flat-square&logo=go)](https://golang.org)
[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
[![Providers](https://img.shields.io/badge/Providers-108+-red?style=flat-square)](providers/)

---

## Why KeyHunter?

Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew **81% year-over-year** in 2025, yet no tool covers more than ~15 LLM providers.

**KeyHunter fills that gap** with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and a growing set of internet sources for leak discovery.

### How It Compares

| Feature | KeyHunter | TruffleHog | Gitleaks | detect-secrets |
|---------|-----------|------------|----------|----------------|
| LLM Providers | **108+** | ~3 | ~5 | ~1 |
| Active Verification | **108+ endpoints** | ~20 types | No | No |
| OSINT/Recon Sources | **18 live** (80+ planned) | No | No | No |
| External Tool Import | **TruffleHog + Gitleaks** | - | - | - |
| Dork Engine | **150 built-in YAML dorks** | No | No | No |
| Pre-commit Hook | **Built-in** | Yes | Yes | Yes |
| SARIF Output | **Yes** | Yes | Yes | No |
| Provider YAML Plugin | **Community-extensible** | Go code only | TOML rules | Python plugins |
| Web Dashboard | Coming soon | No | No | No |
| Telegram Bot | Coming soon | No | No | No |
| Scheduled Scanning | Coming soon | No | No | No |

---

## Features

### Implemented

#### Core Scanning Engine

- **3-stage pipeline** -- AC pre-filter, regex match, entropy scoring
- **ants worker pool** for parallel scanning with configurable worker count
- **108 provider YAML definitions** (Tier 1-9), dual-located with `go:embed`

#### Input Sources

- **File scanning** -- single file analysis
- **Directory scanning** -- recursive traversal with glob exclusions and mmap
- **Git history scanning** -- full commit history analysis
- **stdin/pipe** support -- `echo "sk-proj-..." | keyhunter scan stdin`
- **URL fetching** -- scan any remote URL content
- **Clipboard scanning** -- instant clipboard content analysis

#### Active Verification

- YAML-driven `HTTPVerifier` -- lightweight API calls to verify if detected keys are active
- Permission and scope extraction (org, rate limits, model access)
- Consent prompt and `LEGAL.md` for legal safety
- Configurable via `--verify` flag (off by default)

#### Output Formats

- **Table** -- colored terminal output with key masking (default)
- **JSON** -- full key values for programmatic consumption
- **CSV** -- spreadsheet-compatible export
- **SARIF 2.1.0** -- CI/CD integration (GitHub Code Scanning, etc.)
- Exit codes: `0` (clean), `1` (findings), `2` (error)

#### Key Management

- `keyhunter keys list` -- list all discovered keys (masked by default)
- `keyhunter keys show ` -- full key details
- `keyhunter keys export` -- export in JSON/CSV format
- `keyhunter keys copy ` -- copy key to clipboard
- `keyhunter keys delete ` -- remove a key from the database
- `keyhunter keys verify ` -- verify a specific key

#### External Tool Import

- **TruffleHog v3** JSON import with LLM-specific enrichment
- **Gitleaks** JSON and CSV import
- Deduplication across imports via `(provider, masked_key, source)` hashing

#### Git Pre-commit Hook

- `keyhunter hook install` -- embedded shell script, blocks leaks before commit
- `keyhunter hook uninstall` -- clean removal
- Backup of existing hooks with `--force`

#### Dork Engine

- **150 built-in YAML dorks** across 8 source types (GitHub, GitLab, Google, Shodan, Censys, ZoomEye, FOFA, Bing)
- GitHub live executor with authenticated API
- CLI management: `keyhunter dorks list`, `keyhunter dorks list --source=github`, `keyhunter dorks add`, `keyhunter dorks run`, `keyhunter dorks export`

#### OSINT / Recon Engine (18 Sources Live)

The recon framework provides a `ReconSource` interface with per-source rate limiting, stealth mode, robots.txt compliance, parallel sweep, and result deduplication.

**Code Hosting & Snippets** (live)

- **GitHub** -- code search with automated dorks
- **GitLab** -- code search
- **Bitbucket** -- code search
- **GitHub Gist** -- public gist search
- **Codeberg** -- alternative Git platform search
- **HuggingFace** -- Spaces, repos, model configs (high-yield for LLM keys)
- **Replit** -- public repl search
- **CodeSandbox** -- sandbox search
- **StackBlitz Sandboxes** -- sandbox search
- **Kaggle** -- notebooks and datasets with API keys

**Search Engine Dorking** (live)

- **Google** -- Custom Search API / SerpAPI
- **Bing** -- Azure Cognitive Services search
- **DuckDuckGo** -- HTML scraping fallback
- **Yandex** -- XML API search
- **Brave** -- Brave Search API

**Paste Sites** (live)

- **Pastebin** -- scraping API
- **GistPaste** -- paste search
- **PasteSites** -- multi-paste aggregator

**`recon full`** -- parallel sweep across all 18 live sources with deduplication and unified reporting.

#### CLI Commands

| Command | Status |
|---------|--------|
| `keyhunter scan` | Implemented |
| `keyhunter providers list/info/stats` | Implemented |
| `keyhunter config init/set/get` | Implemented |
| `keyhunter keys list/show/export/copy/delete/verify` | Implemented |
| `keyhunter import` | Implemented |
| `keyhunter hook install/uninstall` | Implemented |
| `keyhunter dorks list/add/run/export` | Implemented |
| `keyhunter recon full/list` | Implemented |
| `keyhunter legal` | Implemented |
| `keyhunter verify` | Stub |
| `keyhunter serve` | Stub |
| `keyhunter schedule` | Stub |

### Coming Soon

The following features are on the roadmap but not yet implemented:

#### Phase 12 -- IoT Scanners & Cloud Storage

- **Shodan** -- exposed LLM proxies, dashboards, API endpoints
- **Censys** -- HTTP body search for leaked credentials
- **ZoomEye** -- IoT scanner
- **FOFA** -- Asian infrastructure scanning
- **Netlas** -- HTTP response body search
- **BinaryEdge** -- internet-wide scan data
- **AWS S3 / GCS / Azure Blob / DigitalOcean Spaces** -- bucket enumeration and scanning

#### Phase 13 -- Package Registries, Containers & IaC

- **npm / PyPI / RubyGems / crates.io / Maven / NuGet** -- package source scanning
- **Docker Hub** -- image layer scanning
- **Terraform / Helm Charts / Ansible** -- IaC scanning

#### Phase 14 -- CI/CD Logs, Web Archives & Frontend Leaks

- **GitHub Actions / Travis CI / CircleCI / Jenkins / GitLab CI** -- public build log scanning
- **Wayback Machine / CommonCrawl** -- historical web archive scanning
- **JS Source Maps / Webpack bundles / exposed .env** -- frontend leak detection

#### Phase 15 -- Forums & Collaboration

- **Stack Overflow / Reddit / Hacker News / dev.to / Medium** -- forum scanning
- **Notion / Confluence / Trello** -- collaboration tool scanning
- **Elasticsearch / Grafana / Sentry** -- exposed log aggregators
- **Telegram groups / Discord** -- public channel scanning

#### Phase 16 -- Threat Intel, Mobile, DNS & API Marketplaces

- **VirusTotal / Intelligence X / URLhaus** -- threat intelligence
- **APK analysis** -- mobile app decompilation
- **crt.sh / subdomain probing** -- DNS/subdomain discovery
- **Postman / SwaggerHub** -- API marketplace scanning

#### Phase 17 -- Telegram Bot & Scheduler

- **Telegram Bot** -- scan triggers, key alerts, recon results
- **Scheduled scanning** -- cron-based recurring scans with auto-notify

#### Phase 18 -- Web Dashboard

- **Web Dashboard** -- htmx + Tailwind, SQLite-backed, real-time scan viewer

---

## Quick Start

### Install

```bash
# From source
go install github.com/salvacybersec/keyhunter@latest

# Binary release (when available)
curl -sSL https://github.com/salvacybersec/keyhunter/releases/latest/download/keyhunter_linux_amd64.tar.gz | tar -xz
sudo mv keyhunter /usr/local/bin/
```

### Basic Usage

```bash
# Scan a directory
keyhunter scan ./my-project/

# Scan with active verification
keyhunter scan ./my-project/ --verify

# Scan git history
keyhunter scan --git .

# Scan from pipe
cat secrets.txt | keyhunter scan stdin

# Scan only specific providers
keyhunter scan . --providers=openai,anthropic,deepseek

# JSON output
keyhunter scan . --output=json > results.json

# SARIF output for CI/CD
keyhunter scan . --output=sarif > keyhunter.sarif

# CSV output
keyhunter scan . --output=csv > results.csv
```

### OSINT / Recon

```bash
# Full sweep across all 18 live sources
keyhunter recon full

# Sweep specific sources only
keyhunter recon full --sources=github,gitlab,gist

# List available recon sources
keyhunter recon list

# Code hosting sources
keyhunter recon full --sources=github
keyhunter recon full --sources=gitlab
keyhunter recon full --sources=bitbucket
keyhunter recon full --sources=gist
keyhunter recon full --sources=codeberg
keyhunter recon full --sources=huggingface
keyhunter recon full --sources=replit
keyhunter recon full --sources=codesandbox
keyhunter recon full --sources=sandboxes
keyhunter recon full --sources=kaggle

# Search engine dorking
keyhunter recon full --sources=google
keyhunter recon full --sources=bing
keyhunter recon full --sources=duckduckgo
keyhunter recon full --sources=yandex
keyhunter recon full --sources=brave

# Paste sites
keyhunter recon full --sources=pastebin
keyhunter recon full --sources=gistpaste
keyhunter recon full --sources=pastesites
```

### Dork Management

```bash
keyhunter dorks list                     # All dorks across all sources
keyhunter dorks list --source=github     # GitHub dorks only
keyhunter dorks list --source=google     # Google dorks only
keyhunter dorks add github 'filename:.env "GROQ_API_KEY"'
keyhunter dorks run google --category=frontier
keyhunter dorks export
```

### Key Management

Keys are masked by default in terminal output (shoulder-surfing protection). Ways to access full key values:

```bash
# Show full keys in scan output
keyhunter scan . --unmask

# JSON export always includes full keys
keyhunter scan . --output=json > results.json

# Key management commands
keyhunter keys list                    # Masked list
keyhunter keys list --unmask           # Full key list
keyhunter keys show                    # Single key full details (always unmasked)
keyhunter keys copy                    # Copy key to clipboard
keyhunter keys export --format=json    # Export all keys with full values
keyhunter keys verify                  # Verify key + show full details
keyhunter keys delete                  # Remove key from database
```

**Example `keyhunter keys show` output:**

```
ID:          a3f7b2c1
Provider:    OpenAI
Pattern:     OpenAI Project Key
Key:         sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234
Confidence:  HIGH
Source:      src/config.py:42
Found:       2026-04-04 14:32:01
Scan ID:     scan_001
Status:      ACTIVE (verified 2026-04-04 14:32:05)
Org:         my-org
Rate Limit:  500 req/min
Revoke URL:  https://platform.openai.com/api-keys
```

### Import External Tools

```bash
# Run TruffleHog, then enrich with KeyHunter
trufflehog git . --json > trufflehog.json
keyhunter import --format=trufflehog trufflehog.json

# Run Gitleaks, then enrich
gitleaks detect -f json -r gitleaks.json
keyhunter import --format=gitleaks gitleaks.json

# Gitleaks CSV
gitleaks detect -f csv -r gitleaks.csv
keyhunter import --format=gitleaks-csv gitleaks.csv
```

### CI/CD Integration

KeyHunter ships with a git **pre-commit hook** that blocks leaks before they land in history, a **GitHub Actions** integration that uploads SARIF findings directly into the repository's Code Scanning tab, and an `import` command that consolidates TruffleHog and Gitleaks output into one normalized database.

```bash
# Install pre-commit hook (scans staged files only)
keyhunter hook install

# GitHub Actions (SARIF output for Code Scanning upload)
keyhunter scan . --output sarif > keyhunter.sarif

# Import findings from other scanners
keyhunter import --format=trufflehog trufflehog.json
keyhunter import --format=gitleaks gitleaks.json

# Exit codes: 0 = clean, 1 = keys found, 2 = error
# Branch on the code so that a scan error is not reported as a finding
keyhunter scan .
case $? in
  0) echo "Clean" ;;
  1) echo "Keys found!" ;;
  *) echo "Scan error" ;;
esac
```

See [docs/CI-CD.md](docs/CI-CD.md) for the full guide, including a copy-paste GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle.

---

## Configuration

```bash
# Initialize config
keyhunter config init    # Creates ~/.keyhunter.yaml

# Set API tokens for recon sources (currently supported)
keyhunter config set recon.github.token "YOUR_GITHUB_TOKEN"
keyhunter config set recon.gitlab.token "YOUR_GITLAB_TOKEN"
keyhunter config set recon.bitbucket.token "YOUR_BITBUCKET_TOKEN"
keyhunter config set recon.huggingface.token "YOUR_HF_TOKEN"
keyhunter config set recon.kaggle.token "YOUR_KAGGLE_TOKEN"
keyhunter config set recon.google.apikey "YOUR_GOOGLE_API_KEY"
keyhunter config set recon.google.cx "YOUR_GOOGLE_CX_ID"
keyhunter config set recon.bing.apikey "YOUR_BING_API_KEY"
keyhunter config set recon.brave.apikey "YOUR_BRAVE_API_KEY"
keyhunter config set recon.yandex.apikey "YOUR_YANDEX_API_KEY"
keyhunter config set recon.yandex.user "YOUR_YANDEX_USER"

# View current config
keyhunter config get recon.github.token
```

### Config File (`~/.keyhunter.yaml`)

```yaml
scan:
  workers: 8
  verify_timeout: 10s
  default_output: table

recon:
  stealth: false
  respect_robots: true
  github:
    token: ""
  gitlab:
    token: ""
  bitbucket:
    token: ""
  huggingface:
    token: ""
  kaggle:
    token: ""
  google:
    apikey: ""
    cx: ""
  bing:
    apikey: ""
  brave:
    apikey: ""
  yandex:
    apikey: ""
    user: ""
```

### Stealth & Ethics Flags

```bash
--stealth            # User-agent rotation, increased request spacing
--respect-robots     # Respect robots.txt (default: on)
```

---

## Supported Providers (108)

### Tier 1 -- Frontier

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| OpenAI | `sk-proj-*`, `sk-svcacct-*` | High | `GET /v1/models` |
| Anthropic | `sk-ant-api03-*` | High | `GET /v1/models` |
| Google AI (Gemini) | `AIza*` | High | `GET /v1/models` |
| Google Vertex AI | OAuth token | Medium | `GET /v1/models` |
| AWS Bedrock | `AKIA*` | High | `GetFoundationModel` |
| Azure OpenAI | 32-char hex | Medium | `GET /openai/deployments` |
| Meta AI | `meta-llama-*` | Medium | `GET /v1/models` |
| xAI (Grok) | `xai-*` | High | `GET /v1/models` |
| Cohere | `co-*` | High | `GET /v1/models` |
| Mistral AI | 32-char generic | Low | `GET /v1/models` |
| Inflection AI | Generic UUID | Low | `GET /api/models` |
| AI21 Labs | Generic key | Low | `GET /v1/models` |

### Tier 2 -- Inference Platforms

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Together AI | Generic key | Low | `GET /v1/models` |
| Fireworks AI | `fw_*` | High | `GET /v1/models` |
| Groq | `gsk_*` | High | `GET /openai/v1/models` |
| Replicate | `r8_*` | High | `GET /v1/predictions` |
| Anyscale | Generic key | Low | `GET /v1/models` |
| DeepInfra | Generic key | Low | `GET /v1/models` |
| Lepton AI | `lpt_*` | High | `GET /v1/models` |
| Modal | Generic token | Low | `GET /api/apps` |
| Baseten | Generic key | Low | `GET /v1/models` |
| Cerebrium | Generic key | Low | `GET /v1/models` |
| NovitaAI | Generic key | Low | `GET /v1/models` |
| Sambanova | Generic key | Low | `GET /v1/models` |
| OctoAI | Generic key | Low | `GET /v1/models` |
| Friendli AI | Generic key | Low | `GET /v1/models` |

### Tier 3 -- Specialized/Vertical

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Perplexity | `pplx-*` | High | `GET /chat/completions` |
| You.com | Generic key | Low | `GET /v1/search` |
| Voyage AI | `voy-*` | High | `GET /v1/models` |
| Jina AI | `jina_*` | High | `GET /v1/models` |
| Unstructured | Generic key | Low | `GET /general/v0/general` |
| AssemblyAI | Generic key | Low | `GET /v2/transcript` |
| Deepgram | Generic key | Low | `GET /v1/projects` |
| ElevenLabs | `el_*` | High | `GET /v1/user` |
| Stability AI | `sk-*` | Medium | `GET /v1/engines/list` |
| Runway ML | Generic key | Low | `GET /v1/models` |
| Midjourney | Generic key | Low | N/A |
| HuggingFace | `hf_*` | High | `GET /api/whoami` |

### Tier 4 -- Chinese/Regional

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| DeepSeek | `sk-*` | Medium | `GET /v1/models` |
| Baichuan | Generic key | Low | `GET /v1/models` |
| Zhipu AI (GLM) | Generic key | Low | `POST /api/paas/v4/chat` |
| Moonshot AI (Kimi) | `sk-*` | Medium | `GET /v1/models` |
| Yi (01.AI) | Generic key | Low | `GET /v1/models` |
| Qwen (Alibaba) | `sk-*` | Medium | `GET /v1/models` |
| Baidu (ERNIE) | API Key + Secret | Medium | Token endpoint |
| ByteDance (Doubao) | Generic key | Low | `GET /v1/models` |
| SenseTime | Generic key | Low | `GET /v1/models` |
| iFlytek (Spark) | API Key + Secret | Medium | WebSocket handshake |
| MiniMax | Generic key | Low | `GET /v1/models` |
| Stepfun | Generic key | Low | `GET /v1/models` |
| 360 AI | Generic key | Low | `GET /v1/models` |
| Kuaishou (Kling) | Generic key | Low | `GET /v1/models` |
| Tencent Hunyuan | SecretId + SecretKey | Medium | `DescribeModels` |
| SiliconFlow | `sf_*` | High | `GET /v1/models` |

### Tier 5 -- Infrastructure/Gateway

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Cloudflare AI | Cloudflare API token | Medium | `GET /ai/models` |
| Vercel AI | `vercel_*` | High | `GET /v1/models` |
| LiteLLM | Generic key | Low | `GET /v1/models` |
| Portkey | Generic key | Low | `GET /v1/models` |
| Helicone | `sk-helicone-*` | High | `GET /v1/models` |
| OpenRouter | `sk-or-*` | High | `GET /api/v1/models` |
| Martian | Generic key | Low | `GET /v1/models` |
| AI Gateway (Kong) | Generic key | Low | Health endpoint |
| BricksAI | Generic key | Low | `GET /v1/models` |
| Aether | Generic key | Low | `GET /v1/models` |
| Not Diamond | Generic key | Low | `GET /v1/models` |

### Tier 6 -- Emerging/Niche

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Reka AI | Generic key | Low | `GET /v1/models` |
| Aleph Alpha | Generic key | Low | `GET /models` |
| Writer | Generic key | Low | `GET /v1/models` |
| Jasper AI | Generic key | Low | N/A |
| Typeface | Generic key | Low | N/A |
| Comet ML | Generic key | Low | `GET /api/rest/v2` |
| Weights & Biases | Generic key | Low | `GET /api/v1/viewer` |
| LangSmith | `ls__*` | High | `GET /api/v1/info` |
| Pinecone | Generic key | Low | `GET /databases` |
| Weaviate | Generic key | Low | `GET /v1/meta` |
| Qdrant | Generic key | Low | `GET /collections` |
| Chroma | Generic key | Low | `GET /api/v1/heartbeat` |
| Milvus | Generic key | Low | `GET /v1/vector/collections` |
| Neon AI | Generic key | Low | N/A |
| Lamini | Generic key | Low | `GET /v1/models` |

### Tier 7 -- Code & Dev Tools

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| GitHub Copilot | `ghu_*`, `ghp_*` | High | `GET /user` |
| Cursor | Generic key | Low | N/A |
| Tabnine | Generic key | Low | N/A |
| Codeium/Windsurf | Generic key | Low | N/A |
| Sourcegraph Cody | `sgp_*` | High | `GET /.api/current-user` |
| Amazon CodeWhisperer | `AKIA*` | High | STS GetCallerIdentity |
| Replit AI | Generic key | Low | N/A |
| Codestral (Mistral) | Generic key | Low | `GET /v1/models` |
| IBM watsonx.ai | `ibm_*` | Medium | IAM token endpoint |
| Oracle AI | Generic key | Low | N/A |

### Tier 8 -- Self-Hosted/Open Infra

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Ollama | N/A (local) | N/A | `GET /api/tags` |
| vLLM | Generic key | Low | `GET /v1/models` |
| LocalAI | Generic key | Low | `GET /v1/models` |
| LM Studio | N/A (local) | N/A | `GET /v1/models` |
| llama.cpp | N/A (local) | N/A | `GET /health` |
| GPT4All | N/A (local) | N/A | N/A |
| text-generation-webui | Generic key | Low | `GET /v1/models` |
| TensorRT-LLM | N/A | N/A | Health endpoint |
| Triton Inference Server | N/A | N/A | `GET /v2/health/ready` |
| Jan AI | N/A (local) | N/A | `GET /v1/models` |

### Tier 9 -- Enterprise/Legacy

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Salesforce Einstein | Generic token | Low | REST API |
| ServiceNow AI | Generic token | Low | REST API |
| SAP AI Core | OAuth token | Low | Token endpoint |
| Palantir AIP | Generic token | Low | REST API |
| Databricks (DBRX) | `dapi*` | High | `GET /api/2.0/clusters` |
| Snowflake Cortex | JWT token | Medium | SQL endpoint |
| Oracle Generative AI | Generic key | Low | REST API |
| HPE GreenLake AI | Generic token | Low | REST API |

---

## Architecture

```
                   +------------------+
                   |   CLI (Cobra)    |
                   +--------+---------+
                            |
             +--------------+--------------+
             |              |              |
    +--------v---+   +------v-----+  +-----v-------+
    | Input      |   | Recon      |  | Import      |
    | Adapters   |   | Engine     |  | Adapters    |
    | - file     |   | (18 live)  |  | - trufflehog|
    | - dir      |   | - Code(10) |  | - gitleaks  |
    | - git      |   | - Search(5)|  +-----+-------+
    | - stdin    |   | - Paste(3) |        |
    | - url      |   +------+-----+        |
    | - clipboard|          |              |
    +--------+---+          |              |
             |              |              |
             +-------+------+--------------+
                     |
             +-------v--------+
             | Scanner Engine |
             | - matcher.go   |
             | - verifier.go  |
             +-------+--------+
                     |
        +------------+-------------+
        |            |             |
  +-----v----+  +----v-----+  +----v-------+
  | Output   |  | Dork     |  | Key        |
  | - table  |  | Engine   |  | Management |
  | - json   |  | - 150    |  | - list     |
  | - sarif  |  |   dorks  |  | - show     |
  | - csv    |  | - 8 src  |  | - export   |
  +----------+  +----------+  +------------+

  +------------------------------------------+
  | Provider Registry (108+ YAML providers)  |
  | Dork Registry (150 YAML dorks)           |
  +------------------------------------------+
```

### Key Design Decisions

- **YAML Providers** -- Adding a new provider = adding a YAML file. No recompile needed for pattern-only changes (when using external provider dir). Built-in providers are embedded at compile time.
- **Keyword Pre-filtering** -- Before running regex, files are scanned for keywords via Aho-Corasick. This provides ~10x speedup on large codebases.
- **Worker Pool** -- Parallel scanning with configurable worker count via ants. Default: CPU count.
- **Delta-based Git Scanning** -- Only scans changes between commits, not entire trees.
- **SQLite Storage** -- All scan results persisted with AES-256 encryption.

---

## Dork Examples (150 Built-in)

### GitHub

```
filename:.env "OPENAI_API_KEY"
filename:.env "ANTHROPIC_API_KEY"
filename:config.yaml "api_key" "sk-"
"sk-proj-" language:python
"sk-ant-api03" language:javascript
filename:docker-compose "API_KEY"
"api_key" extension:ipynb
filename:.toml "api_key" "sk-"
filename:terraform.tfvars "api_key"
```

### Google Dorking

```
"sk-proj-" -github.com -stackoverflow.com
"sk-ant-api03-" filetype:env
"OPENAI_API_KEY" filetype:yml
"ANTHROPIC_API_KEY" filetype:json
inurl:.env "API_KEY"
intitle:"index of" .env
site:pastebin.com "sk-proj-"
site:replit.com "OPENAI_API_KEY"
```

### Shodan (for future IoT recon sources)

```
http.html:"openai" "api_key" port:8080
http.title:"LiteLLM" port:4000
http.html:"ollama" port:11434
http.title:"Kubernetes Dashboard"
```

---

## Use Cases

### Red Team / Pentest

```bash
# Multi-source recon against a target org
keyhunter recon full --sources=github,gitlab,gist,pastebin

# Scan a cloned repository
keyhunter scan ./target-repo/ --verify

# Scan git history for rotated keys
keyhunter scan --git ./target-repo/
```

### DevSecOps / CI Pipeline

```bash
# Pre-commit hook
keyhunter hook install

# GitHub Actions step
- name: KeyHunter Scan
  run: keyhunter scan . --output=sarif > keyhunter.sarif
```

### Bug Bounty

```bash
# Search code hosting platforms for leaked keys
keyhunter recon full --sources=github,gitlab,bitbucket,gist,codeberg
keyhunter recon full --sources=huggingface,kaggle,replit,codesandbox

# Search engine dorking
keyhunter recon full --sources=google,bing,duckduckgo,brave

# Paste site monitoring
keyhunter recon full --sources=pastebin,pastesites,gistpaste
```

---

## Security & Ethics

### Built-in Protections

- Key values **masked by default** in terminal (first 8 + last 4 chars) -- use `--unmask` for full keys
- **Full keys always available** via: `--unmask`, `--output=json`, `keyhunter keys show`
- Database is **AES-256 encrypted** (full keys stored encrypted)
- API tokens stored **encrypted** in config
- No key values written to logs during `--verify`

### Rate Limiting (Recon Sources)

| Source | Rate Limit |
|--------|-----------|
| GitHub API (auth) | 30 req/min |
| GitHub API (unauth) | 10 req/min |
| Google Custom Search | 100/day free, 10K/day paid |
| Bing Search | 1,000/month (free) |
| Brave Search | Per API plan |
| Paste sites | 1 req/2sec |

---

## Contributing

### Adding a New Provider

1. Create `providers/your-provider.yaml`:

   ```yaml
   id: your-provider
   name: Your Provider
   category: emerging
   website: https://api.yourprovider.com
   confidence: medium

   patterns:
     - id: your-provider-key
       name: "Your Provider API Key"
       regex: '\byp_[A-Za-z0-9]{32}\b'
       confidence: high
       description: "Your Provider API key with yp_ prefix"

   keywords:
     - "yp_"
     - "YOUR_PROVIDER_API_KEY"

   verify:
     enabled: true
     method: GET
     url: "https://api.yourprovider.com/v1/models"
     headers:
       Authorization: "Bearer {{key}}"
     success_codes: [200]
     failure_codes: [401, 403]

   metadata:
     docs: "https://docs.yourprovider.com"
     key_url: "https://dashboard.yourprovider.com/keys"
     env_vars: ["YOUR_PROVIDER_API_KEY"]
   ```

2. Run tests: `go test ./pkg/provider/...`
3. Submit a PR

### Adding a New Dork

1. Edit `dorks/.yaml` and add your dork entry
2. Submit a PR

---

## Roadmap

- [x] Core scanning engine (file, dir, git, stdin, url, clipboard)
- [x] 108 provider YAML definitions (Tier 1-9)
- [x] Active verification (YAML-driven HTTPVerifier)
- [x] Output formats: table, JSON, CSV, SARIF 2.1.0
- [x] CLI with Cobra (scan, providers, config, keys, import, hook, dorks, recon, legal)
- [x] TruffleHog & Gitleaks import adapters
- [x] Key management (list, show, export, copy, delete, verify)
- [x] Git pre-commit hook (install/uninstall)
- [x] Dork engine with 150 built-in dorks across 8 sources
- [x] OSINT recon framework with 18 live sources
- [ ] IoT scanners (Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge)
- [ ] Cloud storage scanning (S3, GCS, Azure, DigitalOcean)
- [ ] Package registries (npm, PyPI, RubyGems, crates.io, Maven, NuGet)
- [ ] Container & IaC scanning (Docker Hub, Terraform, Helm, Ansible)
- [ ] CI/CD log scanning (GitHub Actions, Travis, CircleCI, Jenkins, GitLab CI)
- [ ] Web archives (Wayback Machine, CommonCrawl)
- [ ] Frontend leak detection (source maps, webpack, .env exposure)
- [ ] Forums & collaboration tools (Stack Overflow, Reddit, Notion, Trello)
- [ ] Threat intel (VirusTotal, Intelligence X, URLhaus)
- [ ] Telegram bot with auto-notifications
- [ ] Scheduled scanning (cron-based)
- [ ] Web dashboard (htmx + Tailwind + SQLite)
- [ ] Docker image
- [ ] Homebrew formula

---

## Disclaimer

KeyHunter is designed for **authorized security testing**, **defensive security**, **bug bounty programs**, and **educational purposes** only. Always ensure you have proper authorization before scanning any target. Unauthorized access to computer systems is illegal.

---

## License

MIT License - see [LICENSE](LICENSE) for details.