# KeyHunter
> The most comprehensive API key scanner for LLM/AI providers. Detect, validate, and monitor leaked API keys across 108+ providers.

[![Go](https://img.shields.io/badge/Go-1.22+-00ADD8?style=flat-square&logo=go)](https://golang.org)
[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
[![Providers](https://img.shields.io/badge/Providers-108+-red?style=flat-square)](providers/)

---
## Why KeyHunter?
Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew **81% year-over-year** in 2025, yet no tool covers more than ~15 LLM providers.

**KeyHunter fills that gap** with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and a growing set of internet sources for leak discovery.
### How It Compares
| Feature | KeyHunter | TruffleHog | Gitleaks | detect-secrets |
|---------|-----------|------------|----------|----------------|
| LLM Providers | **108+** | ~3 | ~5 | ~1 |
| Active Verification | **108+ endpoints** | ~20 types | No | No |
| OSINT/Recon Sources | **18 live** (80+ planned) | No | No | No |
| External Tool Import | **TruffleHog + Gitleaks** | - | - | - |
| Dork Engine | **150 built-in YAML dorks** | No | No | No |
| Pre-commit Hook | **Built-in** | Yes | Yes | Yes |
| SARIF Output | **Yes** | Yes | Yes | No |
| Provider YAML Plugin | **Community-extensible** | Go code only | TOML rules | Python plugins |
| Web Dashboard | Coming soon | No | No | No |
| Telegram Bot | Coming soon | No | No | No |
| Scheduled Scanning | Coming soon | No | No | No |
---
## Features
### Implemented
#### Core Scanning Engine
- **3-stage pipeline** -- Aho-Corasick (AC) keyword pre-filter, regex match, entropy scoring
- **ants worker pool** for parallel scanning with configurable worker count
- **108 provider YAML definitions** (Tier 1-9), dual-located with `go:embed`
#### Input Sources
- **File scanning** -- single file analysis
- **Directory scanning** -- recursive traversal with glob exclusions and mmap
- **Git history scanning** -- full commit history analysis
- **stdin/pipe** support -- `echo "sk-proj-..." | keyhunter scan stdin`
- **URL fetching** -- scan any remote URL content
- **Clipboard scanning** -- instant clipboard content analysis
#### Active Verification
- YAML-driven `HTTPVerifier` -- lightweight API calls to verify if detected keys are active
- Permission and scope extraction (org, rate limits, model access)
- Consent prompt and `LEGAL.md` for legal safety
- Configurable via `--verify` flag (off by default)
#### Output Formats
- **Table** -- colored terminal output with key masking (default)
- **JSON** -- full key values for programmatic consumption
- **CSV** -- spreadsheet-compatible export
- **SARIF 2.1.0** -- CI/CD integration (GitHub Code Scanning, etc.)
- Exit codes: `0` (clean), `1` (findings), `2` (error)
#### Key Management
- `keyhunter keys list` -- list all discovered keys (masked by default)
- `keyhunter keys show <id>` -- full key details
- `keyhunter keys export` -- export in JSON/CSV format
- `keyhunter keys copy <id>` -- copy key to clipboard
- `keyhunter keys delete <id>` -- remove a key from the database
- `keyhunter keys verify <id>` -- verify a specific key
#### External Tool Import
- **TruffleHog v3** JSON import with LLM-specific enrichment
- **Gitleaks** JSON and CSV import
- Deduplication across imports via `(provider, masked_key, source)` hashing
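
Deduplication over the `(provider, masked_key, source)` tuple might be implemented along these lines. SHA-256 and the NUL field separator are assumptions made for this sketch; the README does not say which hash KeyHunter uses, and the `finding` type is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// finding is a hypothetical record holding only the fields used for dedup.
type finding struct {
	Provider, MaskedKey, Source string
}

// dedupKey hashes the (provider, masked_key, source) tuple.
func dedupKey(f finding) string {
	h := sha256.New()
	for _, part := range []string{f.Provider, f.MaskedKey, f.Source} {
		h.Write([]byte(part))
		h.Write([]byte{0}) // separator, so ("ab","c") and ("a","bc") differ
	}
	return hex.EncodeToString(h.Sum(nil))
}

// dedupe keeps the first occurrence of each tuple, preserving input order.
func dedupe(in []finding) []finding {
	seen := make(map[string]struct{})
	var out []finding
	for _, f := range in {
		k := dedupKey(f)
		if _, ok := seen[k]; ok {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

func main() {
	in := []finding{
		{"openai", "sk-proj-...x234", "src/config.py:42"}, // e.g. from TruffleHog
		{"openai", "sk-proj-...x234", "src/config.py:42"}, // same hit from Gitleaks
		{"openai", "sk-proj-...x234", "deploy/.env:3"},    // same key, other file
	}
	for _, f := range dedupe(in) {
		fmt.Println(f.Provider, f.MaskedKey, f.Source)
	}
}
```

Because the source location is part of the tuple, the same key leaked in two files stays as two findings.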
#### Git Pre-commit Hook
- `keyhunter hook install` -- embedded shell script, blocks leaks before commit
- `keyhunter hook uninstall` -- clean removal
- Backup of existing hooks with `--force`
#### Dork Engine
- **150 built-in YAML dorks** across 8 source types (GitHub, GitLab, Google, Shodan, Censys, ZoomEye, FOFA, Bing)
- GitHub live executor with authenticated API
- CLI management: `keyhunter dorks list`, `keyhunter dorks list --source=github`, `keyhunter dorks add`, `keyhunter dorks run`, `keyhunter dorks export`
#### OSINT / Recon Engine (18 Sources Live)
The recon framework provides a `ReconSource` interface with per-source rate limiting, stealth mode, robots.txt compliance, parallel sweep, and result deduplication.

**Code Hosting & Snippets** (live)

- **GitHub** -- code search with automated dorks
- **GitLab** -- code search
- **Bitbucket** -- code search
- **GitHub Gist** -- public gist search
- **Codeberg** -- alternative Git platform search
- **HuggingFace** -- Spaces, repos, model configs (high-yield for LLM keys)
- **Replit** -- public repl search
- **CodeSandbox** -- sandbox search
- **StackBlitz Sandboxes** -- sandbox search
- **Kaggle** -- notebooks and datasets with API keys

**Search Engine Dorking** (live)

- **Google** -- Custom Search API / SerpAPI
- **Bing** -- Azure Cognitive Services search
- **DuckDuckGo** -- HTML scraping fallback
- **Yandex** -- XML API search
- **Brave** -- Brave Search API

**Paste Sites** (live)

- **Pastebin** -- scraping API
- **GistPaste** -- paste search
- **PasteSites** -- multi-paste aggregator

**`recon full`** -- parallel sweep across all 18 live sources with deduplication and unified reporting.
#### CLI Commands
| Command | Status |
|---------|--------|
| `keyhunter scan` | Implemented |
| `keyhunter providers list/info/stats` | Implemented |
| `keyhunter config init/set/get` | Implemented |
| `keyhunter keys list/show/export/copy/delete/verify` | Implemented |
| `keyhunter import` | Implemented |
| `keyhunter hook install/uninstall` | Implemented |
| `keyhunter dorks list/add/run/export` | Implemented |
| `keyhunter recon full/list` | Implemented |
| `keyhunter legal` | Implemented |
| `keyhunter verify` | Stub |
| `keyhunter serve` | Stub |
| `keyhunter schedule` | Stub |
### Coming Soon
The following features are on the roadmap but not yet implemented:
#### Phase 12 -- IoT Scanners & Cloud Storage
- **Shodan** -- exposed LLM proxies, dashboards, API endpoints
- **Censys** -- HTTP body search for leaked credentials
- **ZoomEye** -- IoT scanner
- **FOFA** -- Asian infrastructure scanning
- **Netlas** -- HTTP response body search
- **BinaryEdge** -- internet-wide scan data
- **AWS S3 / GCS / Azure Blob / DigitalOcean Spaces** -- bucket enumeration and scanning
#### Phase 13 -- Package Registries, Containers & IaC
- **npm / PyPI / RubyGems / crates.io / Maven / NuGet** -- package source scanning
- **Docker Hub** -- image layer scanning
- **Terraform / Helm Charts / Ansible** -- IaC scanning
#### Phase 14 -- CI/CD Logs, Web Archives & Frontend Leaks
- **GitHub Actions / Travis CI / CircleCI / Jenkins / GitLab CI** -- public build log scanning
- **Wayback Machine / CommonCrawl** -- historical web archive scanning
- **JS Source Maps / Webpack bundles / exposed .env** -- frontend leak detection
#### Phase 15 -- Forums & Collaboration
- **Stack Overflow / Reddit / Hacker News / dev.to / Medium** -- forum scanning
- **Notion / Confluence / Trello** -- collaboration tool scanning
- **Elasticsearch / Grafana / Sentry** -- exposed log aggregators
- **Telegram groups / Discord** -- public channel scanning
#### Phase 16 -- Threat Intel, Mobile, DNS & API Marketplaces
- **VirusTotal / Intelligence X / URLhaus** -- threat intelligence
- **APK analysis** -- mobile app decompilation
- **crt.sh / subdomain probing** -- DNS/subdomain discovery
- **Postman / SwaggerHub** -- API marketplace scanning
#### Phase 17 -- Telegram Bot & Scheduler
- **Telegram Bot** -- scan triggers, key alerts, recon results
- **Scheduled scanning** -- cron-based recurring scans with auto-notify
#### Phase 18 -- Web Dashboard
- **Web Dashboard** -- htmx + Tailwind, SQLite-backed, real-time scan viewer
---
## Quick Start
### Install
```bash
# From source
go install github.com/salvacybersec/keyhunter@latest
# Binary release (when available)
curl -sSL https://github.com/salvacybersec/keyhunter/releases/latest/download/keyhunter_linux_amd64.tar.gz | tar -xz
sudo mv keyhunter /usr/local/bin/
```
### Basic Usage
```bash
# Scan a directory
keyhunter scan ./my-project/
# Scan with active verification
keyhunter scan ./my-project/ --verify
# Scan git history
keyhunter scan --git .
# Scan from pipe
cat secrets.txt | keyhunter scan stdin
# Scan only specific providers
keyhunter scan . --providers=openai,anthropic,deepseek
# JSON output
keyhunter scan . --output=json > results.json
# SARIF output for CI/CD
keyhunter scan . --output=sarif > keyhunter.sarif
# CSV output
keyhunter scan . --output=csv > results.csv
```
### OSINT / Recon
```bash
# Full sweep across all 18 live sources
keyhunter recon full
# Sweep specific sources only
keyhunter recon full --sources=github,gitlab,gist
# List available recon sources
keyhunter recon list
# Code hosting sources
keyhunter recon full --sources=github
keyhunter recon full --sources=gitlab
keyhunter recon full --sources=bitbucket
keyhunter recon full --sources=gist
keyhunter recon full --sources=codeberg
keyhunter recon full --sources=huggingface
keyhunter recon full --sources=replit
keyhunter recon full --sources=codesandbox
keyhunter recon full --sources=sandboxes
keyhunter recon full --sources=kaggle
# Search engine dorking
keyhunter recon full --sources=google
keyhunter recon full --sources=bing
keyhunter recon full --sources=duckduckgo
keyhunter recon full --sources=yandex
keyhunter recon full --sources=brave
# Paste sites
keyhunter recon full --sources=pastebin
keyhunter recon full --sources=gistpaste
keyhunter recon full --sources=pastesites
```
### Dork Management
```bash
keyhunter dorks list # All dorks across all sources
keyhunter dorks list --source=github # GitHub dorks only
keyhunter dorks list --source=google # Google dorks only
keyhunter dorks add github 'filename:.env "GROQ_API_KEY"'
keyhunter dorks run google --category=frontier
keyhunter dorks export
```
### Key Management
Keys are masked by default in terminal output to guard against shoulder surfing. To access full key values:
```bash
# Show full keys in scan output
keyhunter scan . --unmask
# JSON export always includes full keys
keyhunter scan . --output=json > results.json
# Key management commands
keyhunter keys list # Masked list
keyhunter keys list --unmask # Full key list
keyhunter keys show <id> # Single key full details (always unmasked)
keyhunter keys copy <id> # Copy key to clipboard
keyhunter keys export --format=json # Export all keys with full values
keyhunter keys verify <id> # Verify key + show full details
keyhunter keys delete <id> # Remove key from database
```
**Example `keyhunter keys show` output:**
```
ID: a3f7b2c1
Provider: OpenAI
Pattern: OpenAI Project Key
Key: sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234
Confidence: HIGH
Source: src/config.py:42
Found: 2026-04-04 14:32:01
Scan ID: scan_001
Status: ACTIVE (verified 2026-04-04 14:32:05)
Org: my-org
Rate Limit: 500 req/min
Revoke URL: https://platform.openai.com/api-keys
```
### Import External Tools
```bash
# Run TruffleHog, then enrich with KeyHunter
trufflehog git . --json > trufflehog.json
keyhunter import --format=trufflehog trufflehog.json
# Run Gitleaks, then enrich
gitleaks detect -f json -r gitleaks.json
keyhunter import --format=gitleaks gitleaks.json
# Gitleaks CSV
gitleaks detect -f csv -r gitleaks.csv
keyhunter import --format=gitleaks-csv gitleaks.csv
```
### CI/CD Integration
KeyHunter ships with a git **pre-commit hook** that blocks leaks before they land in
history, a **GitHub Actions** integration that uploads SARIF findings directly into
the repository's Code Scanning tab, and an `import` command that consolidates
TruffleHog and Gitleaks output into one normalized database.
```bash
# Install pre-commit hook (scans staged files only)
keyhunter hook install
# GitHub Actions (SARIF output for Code Scanning upload)
keyhunter scan . --output=sarif > keyhunter.sarif
# Import findings from other scanners
keyhunter import --format=trufflehog trufflehog.json
keyhunter import --format=gitleaks gitleaks.json
# Exit codes: 0 = clean, 1 = keys found, 2 = error
keyhunter scan .
case $? in
  0) echo "Clean" ;;
  1) echo "Keys found!" ;;
  *) echo "Scan error" ;;
esac
```
See [docs/CI-CD.md](docs/CI-CD.md) for the full guide, including a copy-paste
GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle.

---
## Configuration
```bash
# Initialize config
keyhunter config init
# Creates ~/.keyhunter.yaml
# Set API tokens for recon sources (currently supported)
keyhunter config set recon.github.token "YOUR_GITHUB_TOKEN"
keyhunter config set recon.gitlab.token "YOUR_GITLAB_TOKEN"
keyhunter config set recon.bitbucket.token "YOUR_BITBUCKET_TOKEN"
keyhunter config set recon.huggingface.token "YOUR_HF_TOKEN"
keyhunter config set recon.kaggle.token "YOUR_KAGGLE_TOKEN"
keyhunter config set recon.google.apikey "YOUR_GOOGLE_API_KEY"
keyhunter config set recon.google.cx "YOUR_GOOGLE_CX_ID"
keyhunter config set recon.bing.apikey "YOUR_BING_API_KEY"
keyhunter config set recon.brave.apikey "YOUR_BRAVE_API_KEY"
keyhunter config set recon.yandex.apikey "YOUR_YANDEX_API_KEY"
keyhunter config set recon.yandex.user "YOUR_YANDEX_USER"
# View current config
keyhunter config get recon.github.token
```
### Config File (`~/.keyhunter.yaml`)
```yaml
scan:
  workers: 8
  verify_timeout: 10s
  default_output: table
recon:
  stealth: false
  respect_robots: true
  github:
    token: ""
  gitlab:
    token: ""
  bitbucket:
    token: ""
  huggingface:
    token: ""
  kaggle:
    token: ""
  google:
    apikey: ""
    cx: ""
  bing:
    apikey: ""
  brave:
    apikey: ""
  yandex:
    apikey: ""
    user: ""
```
### Stealth & Ethics Flags
```bash
--stealth # User-agent rotation, increased request spacing
--respect-robots # Respect robots.txt (default: on)
```
---
## Supported Providers (108)
### Tier 1 -- Frontier
| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| OpenAI | `sk-proj-*`, `sk-svcacct-*` | High | `GET /v1/models` |
| Anthropic | `sk-ant-api03-*` | High | `GET /v1/models` |
| Google AI (Gemini) | `AIza*` | High | `GET /v1/models` |
| Google Vertex AI | OAuth token | Medium | `GET /v1/models` |
| AWS Bedrock | `AKIA*` | High | `GetFoundationModel` |
| Azure OpenAI | 32-char hex | Medium | `GET /openai/deployments` |
| Meta AI | `meta-llama-*` | Medium | `GET /v1/models` |
| xAI (Grok) | `xai-*` | High | `GET /v1/models` |
| Cohere | `co-*` | High | `GET /v1/models` |
| Mistral AI | 32-char generic | Low | `GET /v1/models` |
| Inflection AI | Generic UUID | Low | `GET /api/models` |
| AI21 Labs | Generic key | Low | `GET /v1/models` |
### Tier 2 -- Inference Platforms
| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Together AI | Generic key | Low | `GET /v1/models` |
| Fireworks AI | `fw_*` | High | `GET /v1/models` |
| Groq | `gsk_*` | High | `GET /openai/v1/models` |
| Replicate | `r8_*` | High | `GET /v1/predictions` |
| Anyscale | Generic key | Low | `GET /v1/models` |
| DeepInfra | Generic key | Low | `GET /v1/models` |
| Lepton AI | `lpt_*` | High | `GET /v1/models` |
| Modal | Generic token | Low | `GET /api/apps` |
| Baseten | Generic key | Low | `GET /v1/models` |
| Cerebrium | Generic key | Low | `GET /v1/models` |
| NovitaAI | Generic key | Low | `GET /v1/models` |
| Sambanova | Generic key | Low | `GET /v1/models` |
| OctoAI | Generic key | Low | `GET /v1/models` |
| Friendli AI | Generic key | Low | `GET /v1/models` |
### Tier 3 -- Specialized/Vertical
| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Perplexity | `pplx-*` | High | `GET /chat/completions` |
| You.com | Generic key | Low | `GET /v1/search` |
| Voyage AI | `voy-*` | High | `GET /v1/models` |
| Jina AI | `jina_*` | High | `GET /v1/models` |
| Unstructured | Generic key | Low | `GET /general/v0/general` |
| AssemblyAI | Generic key | Low | `GET /v2/transcript` |
| Deepgram | Generic key | Low | `GET /v1/projects` |
| ElevenLabs | `el_*` | High | `GET /v1/user` |
| Stability AI | `sk-*` | Medium | `GET /v1/engines/list` |
| Runway ML | Generic key | Low | `GET /v1/models` |
| Midjourney | Generic key | Low | N/A |
| HuggingFace | `hf_*` | High | `GET /api/whoami` |
### Tier 4 -- Chinese/Regional
| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| DeepSeek | `sk-*` | Medium | `GET /v1/models` |
| Baichuan | Generic key | Low | `GET /v1/models` |
| Zhipu AI (GLM) | Generic key | Low | `POST /api/paas/v4/chat` |
| Moonshot AI (Kimi) | `sk-*` | Medium | `GET /v1/models` |
| Yi (01.AI) | Generic key | Low | `GET /v1/models` |
| Qwen (Alibaba) | `sk-*` | Medium | `GET /v1/models` |
| Baidu (ERNIE) | API Key + Secret | Medium | Token endpoint |
| ByteDance (Doubao) | Generic key | Low | `GET /v1/models` |
| SenseTime | Generic key | Low | `GET /v1/models` |
| iFlytek (Spark) | API Key + Secret | Medium | WebSocket handshake |
| MiniMax | Generic key | Low | `GET /v1/models` |
| Stepfun | Generic key | Low | `GET /v1/models` |
| 360 AI | Generic key | Low | `GET /v1/models` |
| Kuaishou (Kling) | Generic key | Low | `GET /v1/models` |
| Tencent Hunyuan | SecretId + SecretKey | Medium | `DescribeModels` |
| SiliconFlow | `sf_*` | High | `GET /v1/models` |
### Tier 5 -- Infrastructure/Gateway
| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Cloudflare AI | Cloudflare API token | Medium | `GET /ai/models` |
| Vercel AI | `vercel_*` | High | `GET /v1/models` |
| LiteLLM | Generic key | Low | `GET /v1/models` |
| Portkey | Generic key | Low | `GET /v1/models` |
| Helicone | `sk-helicone-*` | High | `GET /v1/models` |
| OpenRouter | `sk-or-*` | High | `GET /api/v1/models` |
| Martian | Generic key | Low | `GET /v1/models` |
| AI Gateway (Kong) | Generic key | Low | Health endpoint |
| BricksAI | Generic key | Low | `GET /v1/models` |
| Aether | Generic key | Low | `GET /v1/models` |
| Not Diamond | Generic key | Low | `GET /v1/models` |
### Tier 6 -- Emerging/Niche
| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Reka AI | Generic key | Low | `GET /v1/models` |
| Aleph Alpha | Generic key | Low | `GET /models` |
| Writer | Generic key | Low | `GET /v1/models` |
| Jasper AI | Generic key | Low | N/A |
| Typeface | Generic key | Low | N/A |
| Comet ML | Generic key | Low | `GET /api/rest/v2` |
| Weights & Biases | Generic key | Low | `GET /api/v1/viewer` |
| LangSmith | `ls__*` | High | `GET /api/v1/info` |
| Pinecone | Generic key | Low | `GET /databases` |
| Weaviate | Generic key | Low | `GET /v1/meta` |
| Qdrant | Generic key | Low | `GET /collections` |
| Chroma | Generic key | Low | `GET /api/v1/heartbeat` |
| Milvus | Generic key | Low | `GET /v1/vector/collections` |
| Neon AI | Generic key | Low | N/A |
| Lamini | Generic key | Low | `GET /v1/models` |
### Tier 7 -- Code & Dev Tools
| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| GitHub Copilot | `ghu_*`, `ghp_*` | High | `GET /user` |
| Cursor | Generic key | Low | N/A |
| Tabnine | Generic key | Low | N/A |
| Codeium/Windsurf | Generic key | Low | N/A |
| Sourcegraph Cody | `sgp_*` | High | `GET /.api/current-user` |
| Amazon CodeWhisperer | `AKIA*` | High | STS GetCallerIdentity |
| Replit AI | Generic key | Low | N/A |
| Codestral (Mistral) | Generic key | Low | `GET /v1/models` |
| IBM watsonx.ai | `ibm_*` | Medium | IAM token endpoint |
| Oracle AI | Generic key | Low | N/A |
### Tier 8 -- Self-Hosted/Open Infra
| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Ollama | N/A (local) | N/A | `GET /api/tags` |
| vLLM | Generic key | Low | `GET /v1/models` |
| LocalAI | Generic key | Low | `GET /v1/models` |
| LM Studio | N/A (local) | N/A | `GET /v1/models` |
| llama.cpp | N/A (local) | N/A | `GET /health` |
| GPT4All | N/A (local) | N/A | N/A |
| text-generation-webui | Generic key | Low | `GET /v1/models` |
| TensorRT-LLM | N/A | N/A | Health endpoint |
| Triton Inference Server | N/A | N/A | `GET /v2/health/ready` |
| Jan AI | N/A (local) | N/A | `GET /v1/models` |
### Tier 9 -- Enterprise/Legacy
| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Salesforce Einstein | Generic token | Low | REST API |
| ServiceNow AI | Generic token | Low | REST API |
| SAP AI Core | OAuth token | Low | Token endpoint |
| Palantir AIP | Generic token | Low | REST API |
| Databricks (DBRX) | `dapi*` | High | `GET /api/2.0/clusters` |
| Snowflake Cortex | JWT token | Medium | SQL endpoint |
| Oracle Generative AI | Generic key | Low | REST API |
| HPE GreenLake AI | Generic token | Low | REST API |
---
## Architecture
```
             +------------------+
             |   CLI (Cobra)    |
             +--------+---------+
                      |
      +---------------+---------------+
      |               |               |
+-----v------+  +-----v------+  +-----v-------+
| Input      |  | Recon      |  | Import      |
| Adapters   |  | Engine     |  | Adapters    |
| - file     |  | (18 live)  |  | - trufflehog|
| - dir      |  | - Code(10) |  | - gitleaks  |
| - git      |  | - Search(5)|  +-----+-------+
| - stdin    |  | - Paste(3) |        |
| - url      |  +-----+------+        |
| - clipboard|        |               |
+-----+------+        |               |
      |               |               |
      +---------------+---------------+
                      |
              +-------v--------+
              | Scanner Engine |
              | - matcher.go   |
              | - verifier.go  |
              +-------+--------+
                      |
         +------------+-------------+
         |            |             |
   +-----v----+  +----v-----+  +----v-------+
   | Output   |  | Dork     |  | Key        |
   | - table  |  | Engine   |  | Management |
   | - json   |  | - 150    |  | - list     |
   | - sarif  |  |   dorks  |  | - show     |
   | - csv    |  | - 8 src  |  | - export   |
   +----------+  +----------+  +------------+

   +------------------------------------------+
   | Provider Registry (108+ YAML providers)  |
   | Dork Registry (150 YAML dorks)           |
   +------------------------------------------+
```
### Key Design Decisions
- **YAML Providers** -- Adding a new provider means adding a YAML file. Pattern-only changes need no recompile when an external provider directory is used; built-in providers are embedded at compile time.
- **Keyword Pre-filtering** -- Before any regex runs, files are scanned for provider keywords with Aho-Corasick, which gives roughly a 10x speedup on large codebases.
- **Worker Pool** -- Parallel scanning with configurable worker count via ants. Default: CPU count.
- **Delta-based Git Scanning** -- Only scans changes between commits, not entire trees.
- **SQLite Storage** -- All scan results persisted with AES-256 encryption.
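
The worker-pool decision can be illustrated with a plain goroutine pool. This sketch deliberately uses only the standard library instead of ants, and `scanAll` and its `scanFn` callback are hypothetical names; it shows the fan-out pattern and the CPU-count default, not KeyHunter's implementation.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// scanAll fans paths out to `workers` goroutines and collects one result
// per path. A non-positive worker count falls back to the CPU count,
// matching the documented default.
func scanAll(paths []string, workers int, scanFn func(string) int) map[string]int {
	if workers <= 0 {
		workers = runtime.NumCPU()
	}
	jobs := make(chan string)
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]int, len(paths))
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				n := scanFn(p) // the expensive part runs outside the lock
				mu.Lock()
				results[p] = n
				mu.Unlock()
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"main.go", "config.yaml", ".env"}
	// A stand-in scan function: pretend only .env contains a finding.
	counts := scanAll(paths, 0, func(p string) int {
		if p == ".env" {
			return 1
		}
		return 0
	})
	for _, p := range paths {
		fmt.Println(p, counts[p])
	}
}
```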
---
## Dork Examples (150 Built-in)
### GitHub
```
filename:.env "OPENAI_API_KEY"
filename:.env "ANTHROPIC_API_KEY"
filename:config.yaml "api_key" "sk-"
"sk-proj-" language:python
"sk-ant-api03" language:javascript
filename:docker-compose "API_KEY"
"api_key" extension:ipynb
filename:.toml "api_key" "sk-"
filename:terraform.tfvars "api_key"
```
### Google Dorking
```
"sk-proj-" -github.com -stackoverflow.com
"sk-ant-api03-" filetype:env
"OPENAI_API_KEY" filetype:yml
"ANTHROPIC_API_KEY" filetype:json
inurl:.env "API_KEY"
intitle:"index of" .env
site:pastebin.com "sk-proj-"
site:replit.com "OPENAI_API_KEY"
```
### Shodan (for future IoT recon sources)
```
http.html:"openai" "api_key" port:8080
http.title:"LiteLLM" port:4000
http.html:"ollama" port:11434
http.title:"Kubernetes Dashboard"
```
---
## Use Cases
### Red Team / Pentest
```bash
# Multi-source recon sweep across code hosting and paste sites
keyhunter recon full --sources=github,gitlab,gist,pastebin
# Scan a cloned repository
keyhunter scan ./target-repo/ --verify
# Scan git history for rotated keys
keyhunter scan --git ./target-repo/
```
### DevSecOps / CI Pipeline
```bash
# Pre-commit hook
keyhunter hook install
# GitHub Actions step (workflow YAML, not a shell command)
- name: KeyHunter Scan
  run: keyhunter scan . --output=sarif > keyhunter.sarif
```
### Bug Bounty
```bash
# Search code hosting platforms for leaked keys
keyhunter recon full --sources=github,gitlab,bitbucket,gist,codeberg
keyhunter recon full --sources=huggingface,kaggle,replit,codesandbox
# Search engine dorking
keyhunter recon full --sources=google,bing,duckduckgo,brave
# Paste site monitoring
keyhunter recon full --sources=pastebin,pastesites,gistpaste
```
---
## Security & Ethics
### Built-in Protections
- Key values **masked by default** in terminal output (first 8 + last 4 chars)
- Full keys are shown only on explicit request: `--unmask`, `--output=json`, or `keyhunter keys show`
- Database is **AES-256 encrypted** (full keys stored encrypted)
- API tokens stored **encrypted** in config
- No key values written to logs during `--verify`
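
The masking rule (first 8 plus last 4 characters) is simple enough to sketch directly. The `...` separator and the handling of very short keys are assumptions; the README does not specify either, and `maskKey` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// maskKey keeps the first 8 and last 4 characters of a key.
// Keys too short to mask meaningfully are fully starred out.
// Byte slicing is used, which assumes ASCII keys.
func maskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return strings.Repeat("*", len(key))
	}
	return key[:head] + "..." + key[len(key)-tail:]
}

func main() {
	// The sample key from the `keyhunter keys show` example in this README.
	fmt.Println(maskKey("sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234"))
	fmt.Println(maskKey("short"))
}
```

For prefixed keys such as `sk-proj-*`, the first 8 characters are exactly the public prefix, so the masked form still identifies the provider without revealing secret material at the front.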
### Rate Limiting (Recon Sources)
| Source | Rate Limit |
|--------|-----------|
| GitHub API (auth) | 30 req/min |
| GitHub API (unauth) | 10 req/min |
| Google Custom Search | 100/day free, 10K/day paid |
| Bing Search | 1,000/month (free) |
| Brave Search | Per API plan |
| Paste sites | 1 req/2sec |
---
## Contributing
### Adding a New Provider
1. Create `providers/your-provider.yaml`:
```yaml
id: your-provider
name: Your Provider
category: emerging
website: https://api.yourprovider.com
confidence: medium
patterns:
  - id: your-provider-key
    name: "Your Provider API Key"
    regex: '\byp_[A-Za-z0-9]{32}\b'
    confidence: high
    description: "Your Provider API key with yp_ prefix"
keywords:
  - "yp_"
  - "YOUR_PROVIDER_API_KEY"
verify:
  enabled: true
  method: GET
  url: "https://api.yourprovider.com/v1/models"
  headers:
    Authorization: "Bearer {{key}}"
  success_codes: [200]
  failure_codes: [401, 403]
metadata:
  docs: "https://docs.yourprovider.com"
  key_url: "https://dashboard.yourprovider.com/keys"
  env_vars: ["YOUR_PROVIDER_API_KEY"]
```
2. Run tests: `go test ./pkg/provider/...`
3. Submit a PR
### Adding a New Dork
1. Edit `dorks/<source>.yaml` and add your dork entry
2. Submit a PR
---
## Roadmap
- [x] Core scanning engine (file, dir, git, stdin, url, clipboard)
- [x] 108 provider YAML definitions (Tier 1-9)
- [x] Active verification (YAML-driven HTTPVerifier)
- [x] Output formats: table, JSON, CSV, SARIF 2.1.0
- [x] CLI with Cobra (scan, providers, config, keys, import, hook, dorks, recon, legal)
- [x] TruffleHog & Gitleaks import adapters
- [x] Key management (list, show, export, copy, delete, verify)
- [x] Git pre-commit hook (install/uninstall)
- [x] Dork engine with 150 built-in dorks across 8 sources
- [x] OSINT recon framework with 18 live sources
- [ ] IoT scanners (Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge)
- [ ] Cloud storage scanning (S3, GCS, Azure, DigitalOcean)
- [ ] Package registries (npm, PyPI, RubyGems, crates.io, Maven, NuGet)
- [ ] Container & IaC scanning (Docker Hub, Terraform, Helm, Ansible)
- [ ] CI/CD log scanning (GitHub Actions, Travis, CircleCI, Jenkins, GitLab CI)
- [ ] Web archives (Wayback Machine, CommonCrawl)
- [ ] Frontend leak detection (source maps, webpack, .env exposure)
- [ ] Forums & collaboration tools (Stack Overflow, Reddit, Notion, Trello)
- [ ] Threat intel (VirusTotal, Intelligence X, URLhaus)
- [ ] Telegram bot with auto-notifications
- [ ] Scheduled scanning (cron-based)
- [ ] Web dashboard (htmx + Tailwind + SQLite)
- [ ] Docker image
- [ ] Homebrew formula
---
## Disclaimer
KeyHunter is designed for **authorized security testing**, **defensive security**, **bug bounty programs**, and **educational purposes** only. Always ensure you have proper authorization before scanning any target. Unauthorized access to computer systems is illegal.

---
## License
MIT License - see [LICENSE](LICENSE) for details.