# KeyHunter

> The most comprehensive API key scanner for LLM/AI providers. Detect, validate, and monitor leaked API keys across 108+ providers.

---

## Why KeyHunter?

Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew **81% year-over-year** in 2025, yet no tool covers more than ~15 LLM providers.

**KeyHunter fills that gap** with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and a growing set of internet sources for leak discovery.

### How It Compares

| Feature | KeyHunter | TruffleHog | Gitleaks | detect-secrets |
|---------|-----------|------------|----------|----------------|
| LLM Providers | **108+** | ~3 | ~5 | ~1 |
| Active Verification | **108+ endpoints** | ~20 types | No | No |
| OSINT/Recon Sources | **18 live** (80+ planned) | No | No | No |
| External Tool Import | **TruffleHog + Gitleaks** | - | - | - |
| Dork Engine | **150 built-in YAML dorks** | No | No | No |
| Pre-commit Hook | **Built-in** | Yes | Yes | Yes |
| SARIF Output | **Yes** | Yes | Yes | No |
| Provider YAML Plugin | **Community-extensible** | Go code only | TOML rules | Python plugins |
| Web Dashboard | Coming soon | No | No | No |
| Telegram Bot | Coming soon | No | No | No |
| Scheduled Scanning | Coming soon | No | No | No |

---

## Features

### Implemented

#### Core Scanning Engine
- **3-stage pipeline** -- Aho-Corasick (AC) keyword pre-filter, regex match, entropy scoring
- **ants worker pool** for parallel scanning with configurable worker count
- **108 provider YAML definitions** (Tier 1-9), dual-located with `go:embed`

#### Input Sources
- **File scanning** -- single file analysis
- **Directory scanning** -- recursive traversal with glob exclusions and mmap
- **Git history scanning** -- full commit history analysis
- **stdin/pipe** support -- `echo "sk-proj-..." | keyhunter scan stdin`
- **URL fetching** -- scan any remote URL content
- **Clipboard scanning** -- instant clipboard content analysis

#### Active Verification
- YAML-driven `HTTPVerifier` -- lightweight API calls to verify if detected keys are active
- Permission and scope extraction (org, rate limits, model access)
- Consent prompt and `LEGAL.md` for legal safety
- Configurable via `--verify` flag (off by default)

#### Output Formats
- **Table** -- colored terminal output with key masking (default)
- **JSON** -- full key values for programmatic consumption
- **CSV** -- spreadsheet-compatible export
- **SARIF 2.1.0** -- CI/CD integration (GitHub Code Scanning, etc.)
- Exit codes: `0` (clean), `1` (findings), `2` (error)

#### Key Management
- `keyhunter keys list` -- list all discovered keys (masked by default)
- `keyhunter keys show <id>` -- full key details
- `keyhunter keys export` -- export in JSON/CSV format
- `keyhunter keys copy <id>` -- copy key to clipboard
- `keyhunter keys delete <id>` -- remove a key from the database
- `keyhunter keys verify <id>` -- verify a specific key

#### External Tool Import
- **TruffleHog v3** JSON import with LLM-specific enrichment
- **Gitleaks** JSON and CSV import
- Deduplication across imports via `(provider, masked_key, source)` hashing

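
A minimal sketch of tuple-based deduplication, assuming SHA-256 as the hash (the source does not name one). The `finding` type and both function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// finding is a hypothetical, minimal record for an imported result.
type finding struct {
	Provider  string
	MaskedKey string
	Source    string
}

// dedupKey hashes the (provider, masked_key, source) tuple. Each field is
// length-prefixed so distinct tuples cannot collide by concatenation.
func dedupKey(f finding) string {
	h := sha256.New()
	for _, field := range []string{f.Provider, f.MaskedKey, f.Source} {
		fmt.Fprintf(h, "%d:%s|", len(field), field)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// dedupe keeps the first finding seen for each tuple hash.
func dedupe(in []finding) []finding {
	seen := make(map[string]struct{})
	var out []finding
	for _, f := range in {
		k := dedupKey(f)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

func main() {
	in := []finding{
		{"openai", "sk-proj-****x234", "src/config.py:42"}, // e.g. from TruffleHog
		{"openai", "sk-proj-****x234", "src/config.py:42"}, // same hit from Gitleaks
		{"openai", "sk-proj-****x234", "deploy/.env:3"},    // same key, other location
	}
	fmt.Println(len(dedupe(in)))
}
```

Because the source location is part of the tuple, the same key found in two files is kept as two findings.
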
#### Git Pre-commit Hook
- `keyhunter hook install` -- embedded shell script, blocks leaks before commit
- `keyhunter hook uninstall` -- clean removal
- Backup of existing hooks with `--force`

#### Dork Engine
- **150 built-in YAML dorks** across 8 source types (GitHub, GitLab, Google, Shodan, Censys, ZoomEye, FOFA, Bing)
- GitHub live executor with authenticated API
- CLI management: `keyhunter dorks list`, `keyhunter dorks list --source=github`, `keyhunter dorks add`, `keyhunter dorks run`, `keyhunter dorks export`

#### OSINT / Recon Engine (18 Sources Live)

The recon framework provides a `ReconSource` interface with per-source rate limiting, stealth mode, robots.txt compliance, parallel sweep, and result deduplication.

**Code Hosting & Snippets** (live)
- **GitHub** -- code search with automated dorks
- **GitLab** -- code search
- **Bitbucket** -- code search
- **GitHub Gist** -- public gist search
- **Codeberg** -- alternative Git platform search
- **HuggingFace** -- Spaces, repos, model configs (high-yield for LLM keys)
- **Replit** -- public repl search
- **CodeSandbox** -- sandbox search
- **StackBlitz Sandboxes** -- sandbox search
- **Kaggle** -- notebooks and datasets with API keys

**Search Engine Dorking** (live)
- **Google** -- Custom Search API / SerpAPI
- **Bing** -- Azure Cognitive Services search
- **DuckDuckGo** -- HTML scraping fallback
- **Yandex** -- XML API search
- **Brave** -- Brave Search API

**Paste Sites** (live)
- **Pastebin** -- scraping API
- **GistPaste** -- paste search
- **PasteSites** -- multi-paste aggregator

**`recon full`** -- parallel sweep across all 18 live sources with deduplication and unified reporting.

#### CLI Commands
| Command | Status |
|---------|--------|
| `keyhunter scan` | Implemented |
| `keyhunter providers list/info/stats` | Implemented |
| `keyhunter config init/set/get` | Implemented |
| `keyhunter keys list/show/export/copy/delete/verify` | Implemented |
| `keyhunter import` | Implemented |
| `keyhunter hook install/uninstall` | Implemented |
| `keyhunter dorks list/add/run/export` | Implemented |
| `keyhunter recon full/list` | Implemented |
| `keyhunter legal` | Implemented |
| `keyhunter verify` | Stub |
| `keyhunter serve` | Stub |
| `keyhunter schedule` | Stub |

### Coming Soon

The following features are on the roadmap but not yet implemented:

#### Phase 12 -- IoT Scanners & Cloud Storage
- **Shodan** -- exposed LLM proxies, dashboards, API endpoints
- **Censys** -- HTTP body search for leaked credentials
- **ZoomEye** -- IoT scanner
- **FOFA** -- Asian infrastructure scanning
- **Netlas** -- HTTP response body search
- **BinaryEdge** -- internet-wide scan data
- **AWS S3 / GCS / Azure Blob / DigitalOcean Spaces** -- bucket enumeration and scanning

#### Phase 13 -- Package Registries, Containers & IaC
- **npm / PyPI / RubyGems / crates.io / Maven / NuGet** -- package source scanning
- **Docker Hub** -- image layer scanning
- **Terraform / Helm Charts / Ansible** -- IaC scanning

#### Phase 14 -- CI/CD Logs, Web Archives & Frontend Leaks
- **GitHub Actions / Travis CI / CircleCI / Jenkins / GitLab CI** -- public build log scanning
- **Wayback Machine / CommonCrawl** -- historical web archive scanning
- **JS Source Maps / Webpack bundles / exposed .env** -- frontend leak detection

#### Phase 15 -- Forums & Collaboration
- **Stack Overflow / Reddit / Hacker News / dev.to / Medium** -- forum scanning
- **Notion / Confluence / Trello** -- collaboration tool scanning
- **Elasticsearch / Grafana / Sentry** -- exposed log aggregators
- **Telegram groups / Discord** -- public channel scanning

#### Phase 16 -- Threat Intel, Mobile, DNS & API Marketplaces
- **VirusTotal / Intelligence X / URLhaus** -- threat intelligence
- **APK analysis** -- mobile app decompilation
- **crt.sh / subdomain probing** -- DNS/subdomain discovery
- **Postman / SwaggerHub** -- API marketplace scanning

#### Phase 17 -- Telegram Bot & Scheduler
- **Telegram Bot** -- scan triggers, key alerts, recon results
- **Scheduled scanning** -- cron-based recurring scans with auto-notify

#### Phase 18 -- Web Dashboard
- **Web Dashboard** -- htmx + Tailwind, SQLite-backed, real-time scan viewer

---

## Quick Start

### Install

```bash
# From source
go install github.com/salvacybersec/keyhunter@latest

# Binary release (when available)
curl -sSL https://github.com/salvacybersec/keyhunter/releases/latest/download/keyhunter_linux_amd64.tar.gz | tar -xz
sudo mv keyhunter /usr/local/bin/
```

### Basic Usage

```bash
# Scan a directory
keyhunter scan ./my-project/

# Scan with active verification
keyhunter scan ./my-project/ --verify

# Scan git history
keyhunter scan --git .

# Scan from pipe
cat secrets.txt | keyhunter scan stdin

# Scan only specific providers
keyhunter scan . --providers=openai,anthropic,deepseek

# JSON output
keyhunter scan . --output=json > results.json

# SARIF output for CI/CD
keyhunter scan . --output=sarif > keyhunter.sarif

# CSV output
keyhunter scan . --output=csv > results.csv
```

### OSINT / Recon

```bash
# Full sweep across all 18 live sources
keyhunter recon full

# Sweep specific sources only
keyhunter recon full --sources=github,gitlab,gist

# List available recon sources
keyhunter recon list

# Code hosting sources
keyhunter recon full --sources=github
keyhunter recon full --sources=gitlab
keyhunter recon full --sources=bitbucket
keyhunter recon full --sources=gist
keyhunter recon full --sources=codeberg
keyhunter recon full --sources=huggingface
keyhunter recon full --sources=replit
keyhunter recon full --sources=codesandbox
keyhunter recon full --sources=sandboxes
keyhunter recon full --sources=kaggle

# Search engine dorking
keyhunter recon full --sources=google
keyhunter recon full --sources=bing
keyhunter recon full --sources=duckduckgo
keyhunter recon full --sources=yandex
keyhunter recon full --sources=brave

# Paste sites
keyhunter recon full --sources=pastebin
keyhunter recon full --sources=gistpaste
keyhunter recon full --sources=pastesites
```

### Dork Management

```bash
keyhunter dorks list                    # All dorks across all sources
keyhunter dorks list --source=github    # GitHub dorks only
keyhunter dorks list --source=google    # Google dorks only
keyhunter dorks add github 'filename:.env "GROQ_API_KEY"'
keyhunter dorks run google --category=frontier
keyhunter dorks export
```

### Key Management

Keys are masked by default in terminal output to protect against shoulder surfing. Full key values are available in the following ways:

```bash
# Show full keys in scan output
keyhunter scan . --unmask

# JSON export always includes full keys
keyhunter scan . --output=json > results.json

# Key management commands
keyhunter keys list                    # Masked list
keyhunter keys list --unmask           # Full key list
keyhunter keys show <id>               # Single key full details (always unmasked)
keyhunter keys copy <id>               # Copy key to clipboard
keyhunter keys export --format=json    # Export all keys with full values
keyhunter keys verify <id>             # Verify key + show full details
keyhunter keys delete <id>             # Remove key from database
```

**Example `keyhunter keys show` output:**
```
ID:          a3f7b2c1
Provider:    OpenAI
Pattern:     OpenAI Project Key
Key:         sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234
Confidence:  HIGH
Source:      src/config.py:42
Found:       2026-04-04 14:32:01
Scan ID:     scan_001
Status:      ACTIVE (verified 2026-04-04 14:32:05)
Org:         my-org
Rate Limit:  500 req/min
Revoke URL:  https://platform.openai.com/api-keys
```

### Import External Tools

```bash
# Run TruffleHog, then enrich with KeyHunter
trufflehog git . --json > trufflehog.json
keyhunter import --format=trufflehog trufflehog.json

# Run Gitleaks, then enrich
gitleaks detect -f json -r gitleaks.json
keyhunter import --format=gitleaks gitleaks.json

# Gitleaks CSV
gitleaks detect -f csv -r gitleaks.csv
keyhunter import --format=gitleaks-csv gitleaks.csv
```

### CI/CD Integration

KeyHunter ships with a git **pre-commit hook** that blocks leaks before they land in
history, a **GitHub Actions** integration that uploads SARIF findings directly into
the repository's Code Scanning tab, and an `import` command that consolidates
TruffleHog and Gitleaks output into one normalized database.

```bash
# Install pre-commit hook (scans staged files only)
keyhunter hook install

# GitHub Actions (SARIF output for Code Scanning upload)
keyhunter scan . --output=sarif > keyhunter.sarif

# Import findings from other scanners
keyhunter import --format=trufflehog trufflehog.json
keyhunter import --format=gitleaks gitleaks.json

# Exit codes: 0 = clean, 1 = keys found, 2 = error
# (a plain `&& ... || ...` chain would report errors as findings)
keyhunter scan .
case $? in
  0) echo "Clean" ;;
  1) echo "Keys found!" ;;
  *) echo "Scan error" ;;
esac
```

See [docs/CI-CD.md](docs/CI-CD.md) for the full guide, including a copy-paste
GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle.

---

## Configuration

```bash
# Initialize config
keyhunter config init
# Creates ~/.keyhunter.yaml

# Set API tokens for recon sources (currently supported)
keyhunter config set recon.github.token "YOUR_GITHUB_TOKEN"
keyhunter config set recon.gitlab.token "YOUR_GITLAB_TOKEN"
keyhunter config set recon.bitbucket.token "YOUR_BITBUCKET_TOKEN"
keyhunter config set recon.huggingface.token "YOUR_HF_TOKEN"
keyhunter config set recon.kaggle.token "YOUR_KAGGLE_TOKEN"
keyhunter config set recon.google.apikey "YOUR_GOOGLE_API_KEY"
keyhunter config set recon.google.cx "YOUR_GOOGLE_CX_ID"
keyhunter config set recon.bing.apikey "YOUR_BING_API_KEY"
keyhunter config set recon.brave.apikey "YOUR_BRAVE_API_KEY"
keyhunter config set recon.yandex.apikey "YOUR_YANDEX_API_KEY"
keyhunter config set recon.yandex.user "YOUR_YANDEX_USER"

# View current config
keyhunter config get recon.github.token
```

### Config File (`~/.keyhunter.yaml`)

```yaml
scan:
  workers: 8
  verify_timeout: 10s
  default_output: table

recon:
  stealth: false
  respect_robots: true
  github:
    token: ""
  gitlab:
    token: ""
  bitbucket:
    token: ""
  huggingface:
    token: ""
  kaggle:
    token: ""
  google:
    apikey: ""
    cx: ""
  bing:
    apikey: ""
  brave:
    apikey: ""
  yandex:
    apikey: ""
    user: ""
```

### Stealth & Ethics Flags
```bash
--stealth            # User-agent rotation, increased request spacing
--respect-robots     # Respect robots.txt (default: on)
```

---

## Supported Providers (108)

### Tier 1 -- Frontier

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| OpenAI | `sk-proj-*`, `sk-svcacct-*` | High | `GET /v1/models` |
| Anthropic | `sk-ant-api03-*` | High | `GET /v1/models` |
| Google AI (Gemini) | `AIza*` | High | `GET /v1/models` |
| Google Vertex AI | OAuth token | Medium | `GET /v1/models` |
| AWS Bedrock | `AKIA*` | High | `GetFoundationModel` |
| Azure OpenAI | 32-char hex | Medium | `GET /openai/deployments` |
| Meta AI | `meta-llama-*` | Medium | `GET /v1/models` |
| xAI (Grok) | `xai-*` | High | `GET /v1/models` |
| Cohere | `co-*` | High | `GET /v1/models` |
| Mistral AI | 32-char generic | Low | `GET /v1/models` |
| Inflection AI | Generic UUID | Low | `GET /api/models` |
| AI21 Labs | Generic key | Low | `GET /v1/models` |

### Tier 2 -- Inference Platforms

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Together AI | Generic key | Low | `GET /v1/models` |
| Fireworks AI | `fw_*` | High | `GET /v1/models` |
| Groq | `gsk_*` | High | `GET /openai/v1/models` |
| Replicate | `r8_*` | High | `GET /v1/predictions` |
| Anyscale | Generic key | Low | `GET /v1/models` |
| DeepInfra | Generic key | Low | `GET /v1/models` |
| Lepton AI | `lpt_*` | High | `GET /v1/models` |
| Modal | Generic token | Low | `GET /api/apps` |
| Baseten | Generic key | Low | `GET /v1/models` |
| Cerebrium | Generic key | Low | `GET /v1/models` |
| NovitaAI | Generic key | Low | `GET /v1/models` |
| Sambanova | Generic key | Low | `GET /v1/models` |
| OctoAI | Generic key | Low | `GET /v1/models` |
| Friendli AI | Generic key | Low | `GET /v1/models` |

### Tier 3 -- Specialized/Vertical

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Perplexity | `pplx-*` | High | `GET /chat/completions` |
| You.com | Generic key | Low | `GET /v1/search` |
| Voyage AI | `voy-*` | High | `GET /v1/models` |
| Jina AI | `jina_*` | High | `GET /v1/models` |
| Unstructured | Generic key | Low | `GET /general/v0/general` |
| AssemblyAI | Generic key | Low | `GET /v2/transcript` |
| Deepgram | Generic key | Low | `GET /v1/projects` |
| ElevenLabs | `el_*` | High | `GET /v1/user` |
| Stability AI | `sk-*` | Medium | `GET /v1/engines/list` |
| Runway ML | Generic key | Low | `GET /v1/models` |
| Midjourney | Generic key | Low | N/A |
| HuggingFace | `hf_*` | High | `GET /api/whoami` |

### Tier 4 -- Chinese/Regional

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| DeepSeek | `sk-*` | Medium | `GET /v1/models` |
| Baichuan | Generic key | Low | `GET /v1/models` |
| Zhipu AI (GLM) | Generic key | Low | `POST /api/paas/v4/chat` |
| Moonshot AI (Kimi) | `sk-*` | Medium | `GET /v1/models` |
| Yi (01.AI) | Generic key | Low | `GET /v1/models` |
| Qwen (Alibaba) | `sk-*` | Medium | `GET /v1/models` |
| Baidu (ERNIE) | API Key + Secret | Medium | Token endpoint |
| ByteDance (Doubao) | Generic key | Low | `GET /v1/models` |
| SenseTime | Generic key | Low | `GET /v1/models` |
| iFlytek (Spark) | API Key + Secret | Medium | WebSocket handshake |
| MiniMax | Generic key | Low | `GET /v1/models` |
| Stepfun | Generic key | Low | `GET /v1/models` |
| 360 AI | Generic key | Low | `GET /v1/models` |
| Kuaishou (Kling) | Generic key | Low | `GET /v1/models` |
| Tencent Hunyuan | SecretId + SecretKey | Medium | `DescribeModels` |
| SiliconFlow | `sf_*` | High | `GET /v1/models` |

### Tier 5 -- Infrastructure/Gateway

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Cloudflare AI | Cloudflare API token | Medium | `GET /ai/models` |
| Vercel AI | `vercel_*` | High | `GET /v1/models` |
| LiteLLM | Generic key | Low | `GET /v1/models` |
| Portkey | Generic key | Low | `GET /v1/models` |
| Helicone | `sk-helicone-*` | High | `GET /v1/models` |
| OpenRouter | `sk-or-*` | High | `GET /api/v1/models` |
| Martian | Generic key | Low | `GET /v1/models` |
| AI Gateway (Kong) | Generic key | Low | Health endpoint |
| BricksAI | Generic key | Low | `GET /v1/models` |
| Aether | Generic key | Low | `GET /v1/models` |
| Not Diamond | Generic key | Low | `GET /v1/models` |

### Tier 6 -- Emerging/Niche

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Reka AI | Generic key | Low | `GET /v1/models` |
| Aleph Alpha | Generic key | Low | `GET /models` |
| Writer | Generic key | Low | `GET /v1/models` |
| Jasper AI | Generic key | Low | N/A |
| Typeface | Generic key | Low | N/A |
| Comet ML | Generic key | Low | `GET /api/rest/v2` |
| Weights & Biases | Generic key | Low | `GET /api/v1/viewer` |
| LangSmith | `ls__*` | High | `GET /api/v1/info` |
| Pinecone | Generic key | Low | `GET /databases` |
| Weaviate | Generic key | Low | `GET /v1/meta` |
| Qdrant | Generic key | Low | `GET /collections` |
| Chroma | Generic key | Low | `GET /api/v1/heartbeat` |
| Milvus | Generic key | Low | `GET /v1/vector/collections` |
| Neon AI | Generic key | Low | N/A |
| Lamini | Generic key | Low | `GET /v1/models` |

### Tier 7 -- Code & Dev Tools

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| GitHub Copilot | `ghu_*`, `ghp_*` | High | `GET /user` |
| Cursor | Generic key | Low | N/A |
| Tabnine | Generic key | Low | N/A |
| Codeium/Windsurf | Generic key | Low | N/A |
| Sourcegraph Cody | `sgp_*` | High | `GET /.api/current-user` |
| Amazon CodeWhisperer | `AKIA*` | High | STS GetCallerIdentity |
| Replit AI | Generic key | Low | N/A |
| Codestral (Mistral) | Generic key | Low | `GET /v1/models` |
| IBM watsonx.ai | `ibm_*` | Medium | IAM token endpoint |
| Oracle AI | Generic key | Low | N/A |

### Tier 8 -- Self-Hosted/Open Infra

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Ollama | N/A (local) | N/A | `GET /api/tags` |
| vLLM | Generic key | Low | `GET /v1/models` |
| LocalAI | Generic key | Low | `GET /v1/models` |
| LM Studio | N/A (local) | N/A | `GET /v1/models` |
| llama.cpp | N/A (local) | N/A | `GET /health` |
| GPT4All | N/A (local) | N/A | N/A |
| text-generation-webui | Generic key | Low | `GET /v1/models` |
| TensorRT-LLM | N/A | N/A | Health endpoint |
| Triton Inference Server | N/A | N/A | `GET /v2/health/ready` |
| Jan AI | N/A (local) | N/A | `GET /v1/models` |

### Tier 9 -- Enterprise/Legacy

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Salesforce Einstein | Generic token | Low | REST API |
| ServiceNow AI | Generic token | Low | REST API |
| SAP AI Core | OAuth token | Low | Token endpoint |
| Palantir AIP | Generic token | Low | REST API |
| Databricks (DBRX) | `dapi*` | High | `GET /api/2.0/clusters` |
| Snowflake Cortex | JWT token | Medium | SQL endpoint |
| Oracle Generative AI | Generic key | Low | REST API |
| HPE GreenLake AI | Generic token | Low | REST API |

---

## Architecture

```
               +------------------+
               |   CLI (Cobra)    |
               +--------+---------+
                        |
       +----------------+----------------+
       |                |                |
+------v------+  +------v------+  +------v-------+
| Input       |  | Recon       |  | Import       |
| Adapters    |  | Engine      |  | Adapters     |
| - file      |  | (18 live)   |  | - trufflehog |
| - dir       |  | - Code(10)  |  | - gitleaks   |
| - git       |  | - Search(5) |  +------+-------+
| - stdin     |  | - Paste(3)  |         |
| - url       |  +------+------+         |
| - clipboard |         |                |
+------+------+         |                |
       |                |                |
       +----------------+----------------+
                        |
                +-------v--------+
                | Scanner Engine |
                | - matcher.go   |
                | - verifier.go  |
                +-------+--------+
                        |
           +------------+-------------+
           |            |             |
     +-----v----+  +----v-----+  +----v-------+
     | Output   |  | Dork     |  | Key        |
     | - table  |  | Engine   |  | Management |
     | - json   |  | - 150    |  | - list     |
     | - sarif  |  |   dorks  |  | - show     |
     | - csv    |  | - 8 src  |  | - export   |
     +----------+  +----------+  +------------+

   +------------------------------------------+
   | Provider Registry (108+ YAML providers)  |
   | Dork Registry (150 YAML dorks)           |
   +------------------------------------------+
```

### Key Design Decisions

- **YAML Providers** -- Adding a provider means adding a YAML file. Pattern-only changes need no recompile when providers are loaded from an external provider directory; built-in providers are embedded at compile time.
- **Keyword Pre-filtering** -- Before any regex runs, files are scanned for provider keywords with Aho-Corasick, which gives a ~10x speedup on large codebases.
- **Worker Pool** -- Parallel scanning via ants with a configurable worker count (default: CPU count).
- **Delta-based Git Scanning** -- Scans only the changes between commits, not entire trees.
- **SQLite Storage** -- All scan results are persisted with AES-256 encryption.

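
The worker-pool decision can be illustrated with a standard-library sketch. It deliberately uses plain goroutines and a channel instead of ants so that it runs without dependencies; `scanAll` and `scanFn` are hypothetical names, and a worker count of zero or less falls back to the CPU count as described above.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// scanAll fans paths out to a fixed number of workers and collects one
// result per path. scanFn is a hypothetical per-file scan function.
func scanAll(paths []string, workers int, scanFn func(string) int) map[string]int {
	if workers <= 0 {
		workers = runtime.NumCPU() // default: CPU count
	}
	jobs := make(chan string)
	results := make(map[string]int, len(paths))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				n := scanFn(p)
				mu.Lock()
				results[p] = n
				mu.Unlock()
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs) // lets the workers exit once the queue drains
	wg.Wait()
	return results
}

func main() {
	// The fake scan function just returns the path length.
	res := scanAll([]string{"a.env", "b.py", "c.go"}, 0, func(p string) int { return len(p) })
	fmt.Println(len(res), res["a.env"])
}
```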

---

## Dork Examples (150 Built-in)

### GitHub
```
filename:.env "OPENAI_API_KEY"
filename:.env "ANTHROPIC_API_KEY"
filename:config.yaml "api_key" "sk-"
"sk-proj-" language:python
"sk-ant-api03" language:javascript
filename:docker-compose "API_KEY"
"api_key" extension:ipynb
filename:.toml "api_key" "sk-"
filename:terraform.tfvars "api_key"
```

### Google Dorking
```
"sk-proj-" -github.com -stackoverflow.com
"sk-ant-api03-" filetype:env
"OPENAI_API_KEY" filetype:yml
"ANTHROPIC_API_KEY" filetype:json
inurl:.env "API_KEY"
intitle:"index of" .env
site:pastebin.com "sk-proj-"
site:replit.com "OPENAI_API_KEY"
```

### Shodan (for future IoT recon sources)
```
http.html:"openai" "api_key" port:8080
http.title:"LiteLLM" port:4000
http.html:"ollama" port:11434
http.title:"Kubernetes Dashboard"
```

---

## Use Cases

### Red Team / Pentest
```bash
# Multi-source recon against a target org
keyhunter recon full --sources=github,gitlab,gist,pastebin

# Scan a cloned repository
keyhunter scan ./target-repo/ --verify

# Scan git history for rotated keys
keyhunter scan --git ./target-repo/
```

### DevSecOps / CI Pipeline
```bash
# Pre-commit hook
keyhunter hook install

# GitHub Actions step
- name: KeyHunter Scan
  run: keyhunter scan . --output=sarif > keyhunter.sarif
```

### Bug Bounty
```bash
# Search code hosting platforms for leaked keys
keyhunter recon full --sources=github,gitlab,bitbucket,gist,codeberg
keyhunter recon full --sources=huggingface,kaggle,replit,codesandbox

# Search engine dorking
keyhunter recon full --sources=google,bing,duckduckgo,brave

# Paste site monitoring
keyhunter recon full --sources=pastebin,pastesites,gistpaste
```

---

## Security & Ethics

### Built-in Protections
- Key values **masked by default** in terminal (first 8 + last 4 chars) -- use `--unmask` for full keys
- **Full keys always available** via: `--unmask`, `--output=json`, `keyhunter keys show`
- Database is **AES-256 encrypted** (full keys stored encrypted)
- API tokens stored **encrypted** in config
- No key values written to logs during `--verify`

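
The masking rule (first 8 + last 4 characters) might look like the sketch below. `maskKey` is a hypothetical name, and fully hiding keys of 12 characters or fewer is an assumption; the source does not say how short keys are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// maskKey keeps the first 8 and last 4 characters and replaces the middle
// with asterisks. Keys too short to mask safely are fully hidden
// (an assumption, not documented behavior).
func maskKey(key string) string {
	const head, tail = 8, 4
	r := []rune(key)
	if len(r) <= head+tail {
		return strings.Repeat("*", len(r))
	}
	return string(r[:head]) + strings.Repeat("*", len(r)-head-tail) + string(r[len(r)-tail:])
}

func main() {
	fmt.Println(maskKey("sk-proj-abcdefghijkl")) // prints "sk-proj-********ijkl"
}
```

For prefixed keys such as `sk-proj-*`, the visible head is mostly the provider prefix, so the mask still shows which provider a finding belongs to.
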
### Rate Limiting (Recon Sources)
| Source | Rate Limit |
|--------|-----------|
| GitHub API (auth) | 30 req/min |
| GitHub API (unauth) | 10 req/min |
| Google Custom Search | 100/day free, 10K/day paid |
| Bing Search | 1,000/month (free) |
| Brave Search | Per API plan |
| Paste sites | 1 req/2sec |


---

## Contributing

### Adding a New Provider

1. Create `providers/your-provider.yaml`:

```yaml
id: your-provider
name: Your Provider
category: emerging
website: https://api.yourprovider.com
confidence: medium

patterns:
  - id: your-provider-key
    name: "Your Provider API Key"
    regex: '\byp_[A-Za-z0-9]{32}\b'
    confidence: high
    description: "Your Provider API key with yp_ prefix"

keywords:
  - "yp_"
  - "YOUR_PROVIDER_API_KEY"

verify:
  enabled: true
  method: GET
  url: "https://api.yourprovider.com/v1/models"
  headers:
    Authorization: "Bearer {{key}}"
  success_codes: [200]
  failure_codes: [401, 403]

metadata:
  docs: "https://docs.yourprovider.com"
  key_url: "https://dashboard.yourprovider.com/keys"
  env_vars: ["YOUR_PROVIDER_API_KEY"]
```

2. Run tests: `go test ./pkg/provider/...`
3. Submit a PR

### Adding a New Dork

1. Edit `dorks/<source>.yaml` and add your dork entry
2. Submit a PR

---

## Roadmap

- [x] Core scanning engine (file, dir, git, stdin, url, clipboard)
- [x] 108 provider YAML definitions (Tier 1-9)
- [x] Active verification (YAML-driven HTTPVerifier)
- [x] Output formats: table, JSON, CSV, SARIF 2.1.0
- [x] CLI with Cobra (scan, providers, config, keys, import, hook, dorks, recon, legal)
- [x] TruffleHog & Gitleaks import adapters
- [x] Key management (list, show, export, copy, delete, verify)
- [x] Git pre-commit hook (install/uninstall)
- [x] Dork engine with 150 built-in dorks across 8 sources
- [x] OSINT recon framework with 18 live sources
- [ ] IoT scanners (Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge)
- [ ] Cloud storage scanning (S3, GCS, Azure, DigitalOcean)
- [ ] Package registries (npm, PyPI, RubyGems, crates.io, Maven, NuGet)
- [ ] Container & IaC scanning (Docker Hub, Terraform, Helm, Ansible)
- [ ] CI/CD log scanning (GitHub Actions, Travis, CircleCI, Jenkins, GitLab CI)
- [ ] Web archives (Wayback Machine, CommonCrawl)
- [ ] Frontend leak detection (source maps, webpack, .env exposure)
- [ ] Forums & collaboration tools (Stack Overflow, Reddit, Notion, Trello)
- [ ] Threat intel (VirusTotal, Intelligence X, URLhaus)
- [ ] Telegram bot with auto-notifications
- [ ] Scheduled scanning (cron-based)
- [ ] Web dashboard (htmx + Tailwind + SQLite)
- [ ] Docker image
- [ ] Homebrew formula

---

## Disclaimer

KeyHunter is designed for **authorized security testing**, **defensive security**, **bug bounty programs**, and **educational purposes** only. Always ensure you have proper authorization before scanning any target. Unauthorized access to computer systems is illegal.

---

## License

MIT License - see [LICENSE](LICENSE) for details.