# KeyHunter

> The most comprehensive API key scanner for LLM/AI providers. Detect, validate, and monitor leaked API keys across 108+ providers.

[![Go](https://img.shields.io/badge/Go-1.22+-00ADD8?style=flat-square&logo=go)](https://golang.org)
[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
[![Providers](https://img.shields.io/badge/Providers-108+-red?style=flat-square)](providers/)

---

## Why KeyHunter?

Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew **81% year-over-year** in 2025, yet no tool covers more than ~15 LLM providers.

**KeyHunter fills that gap** with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and real-time notifications.

### How It Compares

| Feature | KeyHunter | TruffleHog | Gitleaks | detect-secrets |
|---------|-----------|------------|----------|----------------|
| LLM Providers | **108+** | ~3 | ~5 | ~1 |
| Active Verification | **108+ endpoints** | ~20 types | No | No |
| OSINT/Recon | **Shodan, Censys, GitHub, GitLab, Paste, S3** | No | No | No |
| External Tool Import | **TruffleHog + Gitleaks** | - | - | - |
| Web Dashboard | **Built-in** | No | No | No |
| Telegram Bot | **Built-in** | No | No | No |
| Dork Engine | **Built-in YAML dorks** | No | No | No |
| Provider YAML Plugin | **Community-extensible** | Go code only | TOML rules | Python plugins |
| Scheduled Scanning | **Cron-based** | No | No | No |

---

## Features

### Core Scanning
- **File/Directory scanning** with recursive traversal and glob exclusions
- **Git-aware scanning** — full history, branches, stash, delta-based diffs
- **stdin/pipe** support — `cat dump.txt | keyhunter scan stdin`
- **URL fetching** — scan any remote URL content
- **Clipboard scanning** — instant clipboard content analysis
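
Recursive traversal with glob exclusions can be pictured roughly as follows. This is a minimal sketch using only the Go standard library; the function names `excluded` and `collectFiles` and the sample patterns are illustrative assumptions, not KeyHunter's actual internals.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// excluded reports whether the base name of path matches any glob pattern.
func excluded(path string, globs []string) bool {
	base := filepath.Base(path)
	for _, g := range globs {
		if ok, _ := filepath.Match(g, base); ok {
			return true
		}
	}
	return false
}

// collectFiles walks root recursively and returns regular files that do not
// match an exclusion glob. Excluded directories are skipped as a whole.
func collectFiles(root string, globs []string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if path != root && excluded(path, globs) {
			if d.IsDir() {
				return filepath.SkipDir
			}
			return nil
		}
		if d.Type().IsRegular() {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

func main() {
	// Hypothetical exclusion list, chosen only for the demo.
	files, err := collectFiles(".", []string{".git", "node_modules", "*.min.js"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		return
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Skipping an excluded directory with `filepath.SkipDir` avoids descending into large trees such as `node_modules` at all, which matters more for speed than filtering individual files afterwards.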

### OSINT / Recon Engine (80+ Sources, 18 Categories)

**IoT & Internet Scanners**
- **Shodan** — exposed LLM proxies, dashboards, API endpoints
- **Censys** — HTTP body search for leaked credentials
- **ZoomEye** — Chinese IoT scanner, different coverage perspective
- **FOFA** — Asian infrastructure scanning, body content search
- **Netlas** — HTTP response body keyword search
- **BinaryEdge** — internet-wide scan data

**Code Hosting & Snippets**
- **GitHub / GitLab / Bitbucket** — code search with automated dorks
- **Codeberg / Gitea instances** — alternative Git platforms (Gitea auto-discovered via Shodan)
- **Replit / CodeSandbox / StackBlitz / Glitch** — interactive dev environments with hardcoded keys
- **CodePen / JSFiddle / Observable** — browser snippet platforms
- **HuggingFace** — Spaces, repos, model configs (high-yield for LLM keys)
- **Kaggle** — notebooks and datasets with API keys
- **Jupyter / nbviewer** — shared notebooks
- **GitHub Gist** — public gist search
- **Gitpod** — workspace snapshots

**Search Engine Dorking**
- **Google** — Custom Search API / SerpAPI, 100+ built-in dorks
- **Bing** — Azure Cognitive Services search
- **DuckDuckGo / Yandex / Brave** — alternative indexes for broader coverage

**Paste Sites**
- **Multi-paste aggregator** — Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, and more

**Package Registries**
- **npm / PyPI / RubyGems / crates.io / Maven / NuGet / Packagist / Go modules** — download packages, extract source, scan for key patterns

**Container & Infrastructure**
- **Docker Hub** — image layer scanning, build arg extraction
- **Kubernetes** — exposed dashboards, public Secret/ConfigMap YAML files
- **Terraform** — state files (`.tfstate` with plaintext secrets), registry modules
- **Helm Charts / Ansible Galaxy** — default values with credentials

**Cloud Storage**
- **AWS S3 / GCS / Azure Blob / DigitalOcean Spaces / Backblaze B2** — bucket enumeration and content scanning
- **MinIO** — self-hosted instances discovered via Shodan
- **GrayHatWarfare** — searchable database of public bucket objects

**CI/CD Log Leaks**
- **Travis CI / CircleCI** — public build logs with leaked env vars
- **GitHub Actions** — workflow run log scanning
- **Jenkins** — exposed instances (Shodan-discovered), console output
- **GitLab CI/CD** — public pipeline job traces

**Web Archives**
- **Wayback Machine** — historical snapshots of removed `.env` files, config pages
- **CommonCrawl** — massive web crawl data, WARC record scanning

**Forums & Documentation**
- **Stack Overflow** — API + SEDE queries for code snippets with real keys
- **Reddit** — programming subreddit scanning
- **Hacker News** — Algolia API comment search
- **dev.to / Medium** — tutorial articles with hardcoded keys
- **Telegram groups** — public channels sharing configs and "free API keys"
- **Discord** — indexed public server content

**Collaboration Tools**
- **Notion / Confluence** — public pages and spaces with credentials
- **Trello** — public boards with API key cards
- **Google Docs/Sheets** — publicly shared documents

**Frontend & JavaScript Leaks**
- **JS Source Maps** — original source recovery with inlined secrets
- **Webpack / Vite bundles** — `REACT_APP_*`, `NEXT_PUBLIC_*`, `VITE_*` variable extraction
- **Exposed `.env` files** — misconfigured web servers serving dotenv from root
- **Swagger / OpenAPI docs** — real auth examples in API docs
- **Vercel / Netlify previews** — deploy preview JS bundles with production secrets

**Log Aggregators**
- **Elasticsearch / Kibana** — exposed instances with application logs containing API keys
- **Grafana** — exposed dashboards with datasource configs
- **Sentry** — error tracking capturing request headers with keys

**Threat Intelligence**
- **VirusTotal** — uploaded files/scripts containing embedded keys
- **Intelligence X** — aggregated paste, darknet, and leak search
- **URLhaus** — malicious URLs with API keys in parameters

**Mobile Apps**
- **APK analysis** — download, decompile, grep for key patterns (via apktool/jadx)

**DNS / Subdomain Discovery**
- **crt.sh** — Certificate Transparency log for API subdomain discovery
- **Subdomain probing** — config endpoint enumeration (`.env`, `/api/config`, `/actuator/env`)

**API Marketplaces**
- **Postman** — public collections, workspaces, environments
- **SwaggerHub** — published API definitions with example values

**`recon full`** — parallel sweep across all 80+ sources with deduplication and unified reporting
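
A parallel sweep with deduplication might look like the sketch below. The `Finding` and `Source` types are hypothetical stand-ins (the real result schema is not shown in this README), and deduplication here is assumed to be by exact key value.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Finding is a hypothetical result record used only for this sketch.
type Finding struct {
	Source string
	Key    string
}

// Source is a hypothetical recon source: it returns findings for a query.
type Source func(query string) []Finding

// sweep runs every source concurrently and keeps one finding per unique key.
func sweep(query string, sources map[string]Source) []Finding {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen = make(map[string]Finding)
	)
	for name, src := range sources {
		wg.Add(1)
		go func(name string, src Source) {
			defer wg.Done()
			for _, f := range src(query) {
				f.Source = name
				mu.Lock()
				if _, dup := seen[f.Key]; !dup {
					seen[f.Key] = f
				}
				mu.Unlock()
			}
		}(name, src)
	}
	wg.Wait()

	// Sort by key so the unified report has a stable order.
	out := make([]Finding, 0, len(seen))
	for _, f := range seen {
		out = append(out, f)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	sources := map[string]Source{
		"github": func(string) []Finding { return []Finding{{Key: "sk-proj-AAA"}, {Key: "sk-ant-BBB"}} },
		"paste":  func(string) []Finding { return []Finding{{Key: "sk-proj-AAA"}} },
	}
	for _, f := range sweep("openai", sources) {
		fmt.Printf("%s (reported by %s)\n", f.Key, f.Source)
	}
}
```

Because the sources race, which source is credited for a duplicated key is not deterministic in this sketch; a real implementation would likely record every source that reported the key.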

### Active Verification
- Lightweight API calls to verify if detected keys are active
- Permission and scope extraction (org, rate limits, model access)
- Configurable via `--verify` flag (off by default)
- Provider-specific verification endpoints
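
As a rough illustration of a lightweight verification call, the sketch below sends one authenticated `GET /v1/models` request and classifies the response status. The status mapping, the `Bearer` header scheme, and the function names are assumptions for the example; providers differ, and the local `httptest` server stands in for a real API so the code runs offline.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// classify maps an HTTP status code to a coarse key status.
// The mapping is an assumption for illustration only.
func classify(status int) string {
	switch {
	case status >= 200 && status < 300:
		return "ACTIVE"
	case status == http.StatusUnauthorized:
		return "INVALID"
	case status == http.StatusTooManyRequests:
		return "RATE_LIMITED"
	default:
		return "UNKNOWN"
	}
}

// verify sends a single authenticated GET to a model-listing endpoint.
func verify(ctx context.Context, client *http.Client, baseURL, key string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/v1/models", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+key)
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return classify(resp.StatusCode), nil
}

func main() {
	// Local stand-in for a provider API; accepts exactly one key.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer good-key" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	// 10s mirrors the verify_timeout value shown in the sample config.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	for _, key := range []string{"good-key", "bad-key"} {
		status, err := verify(ctx, srv.Client(), srv.URL, key)
		fmt.Println(key, status, err)
	}
}
```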

### External Tool Integration
- **Import TruffleHog** JSON output — enrich with LLM-specific analysis
- **Import Gitleaks** JSON output — cross-reference with 108+ providers
- Generic CSV import for custom tool output

### Notifications & Dashboard
- **Telegram Bot** — scan triggers, key alerts, recon results
- **Web Dashboard** — htmx + Tailwind, SQLite-backed, real-time scan viewer
- **Webhook** — generic HTTP POST notifications
- **Slack** — workspace notifications
- **Scheduled scans** — cron-based recurring scans with auto-notify
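
A generic webhook notification amounts to an HTTP POST with a JSON body. The sketch below shows one way that could work; the `Alert` payload and its field names are hypothetical, since the actual webhook schema is not documented in this README.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Alert is a hypothetical payload shape used only for this sketch.
type Alert struct {
	Provider string `json:"provider"`
	Masked   string `json:"masked_key"`
	Source   string `json:"source"`
}

// encode serialises an alert as JSON.
func encode(a Alert) ([]byte, error) { return json.Marshal(a) }

// notify POSTs the alert and treats any non-2xx response as a failure.
func notify(client *http.Client, url string, a Alert) error {
	body, err := encode(a)
	if err != nil {
		return err
	}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Local receiver so the example runs without a real webhook endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Printf("received %s %s\n", r.Method, body)
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	err := notify(srv.Client(), srv.URL, Alert{
		Provider: "OpenAI",
		Masked:   "sk-proj-...r678",
		Source:   "src/config.py:42",
	})
	fmt.Println("notify error:", err)
}
```

Sending only the masked key in the payload is a deliberate choice in this sketch: webhook receivers and their logs are outside the tool's control.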

---

## Quick Start

### Install

```bash
# From source
go install github.com/keyhunter/keyhunter@latest

# Binary release
curl -sSL https://get.keyhunter.dev | bash

# Docker
docker pull keyhunter/keyhunter:latest
```

### Basic Usage

```bash
# Scan a directory
keyhunter scan path ./my-project/

# Scan with active verification
keyhunter scan path ./my-project/ --verify

# Scan git history (last 30 days)
keyhunter scan git . --since="30 days ago"

# Scan from pipe
cat secrets.txt | keyhunter scan stdin

# Scan only specific providers
keyhunter scan path . --providers=openai,anthropic,deepseek

# JSON output
keyhunter scan path . --output=json > results.json
```

### OSINT / Recon

```bash
# ── IoT & Internet Scanners ──
keyhunter recon shodan --dork="http.title:\"LiteLLM\" port:4000"
keyhunter recon censys --query='services.http.response.body:"sk-proj-"'
keyhunter recon zoomeye --query='app:"Elasticsearch" +"api_key"'
keyhunter recon fofa --query='body="OPENAI_API_KEY"'
keyhunter recon netlas --query='http.body:"sk-ant-"'

# ── Code Hosting ──
keyhunter recon github --dork=auto               # All built-in GitHub dorks
keyhunter recon gitlab --dork=auto
keyhunter recon bitbucket --query="OPENAI_API_KEY"
keyhunter recon replit --query="sk-proj-"         # Public repls
keyhunter recon huggingface --spaces --query="api_key"  # HF Spaces
keyhunter recon kaggle --notebooks --query="openai"
keyhunter recon codesandbox --query="sk-ant-"
keyhunter recon glitch --query="ANTHROPIC_API_KEY"
keyhunter recon gitea --instances-from=shodan     # Auto-discover Gitea instances

# ── Search Engine Dorking ──
keyhunter recon google --dork=auto                # 100+ built-in Google dorks
keyhunter recon google --dork='"sk-proj-" -github.com filetype:env'
keyhunter recon bing --dork=auto
keyhunter recon brave --query="OPENAI_API_KEY filetype:yaml"

# ── Package Registries ──
keyhunter recon npm --recent --query="openai"     # Scan newly published packages
keyhunter recon pypi --recent --query="llm"
keyhunter recon crates --query="api_key"

# ── Cloud Storage ──
keyhunter recon s3 --domain=targetcorp            # S3 bucket enumeration
keyhunter recon gcs --domain=targetcorp           # GCS buckets
keyhunter recon azure --domain=targetcorp         # Azure Blob
keyhunter recon minio --shodan                    # Exposed MinIO instances
keyhunter recon grayhat --query="openai api_key"  # GrayHatWarfare search

# ── CI/CD Logs ──
keyhunter recon ghactions --org=targetcorp        # GitHub Actions logs
keyhunter recon travis --org=targetcorp
keyhunter recon jenkins --shodan                  # Exposed Jenkins instances
keyhunter recon circleci --org=targetcorp

# ── Web Archives ──
keyhunter recon wayback --domain=targetcorp.com   # Wayback Machine
keyhunter recon commoncrawl --domain=targetcorp.com

# ── Frontend & JS ──
keyhunter recon dotenv --domain-list=targets.txt  # Exposed .env files
keyhunter recon sourcemaps --domain=app.target.com  # JS source maps
keyhunter recon webpack --url=https://app.target.com/main.js
keyhunter recon swagger --shodan                  # Exposed Swagger UIs
keyhunter recon deploys --domain=targetcorp       # Vercel/Netlify previews

# ── Forums ──
keyhunter recon stackoverflow --query="sk-proj-"
keyhunter recon reddit --subreddit=openai --query="api key"
keyhunter recon hackernews --query="leaked api key"
keyhunter recon telegram-groups --query="free api key"

# ── Collaboration ──
keyhunter recon notion --query="API_KEY"          # Searched via Google dorks
keyhunter recon confluence --shodan               # Exposed instances
keyhunter recon trello --query="openai api key"

# ── Log Aggregators ──
keyhunter recon elasticsearch --shodan            # Exposed ES instances
keyhunter recon grafana --shodan
keyhunter recon sentry --shodan

# ── Threat Intelligence ──
keyhunter recon virustotal --query="sk-proj-"
keyhunter recon intelx --query="sk-ant-api03"     # Intelligence X
keyhunter recon urlhaus --query="openai"

# ── Mobile Apps ──
keyhunter recon apk --query="ai chatbot"          # APK download + decompile

# ── DNS/Subdomain ──
keyhunter recon crtsh --domain=targetcorp.com     # Cert transparency
keyhunter recon subdomain --domain=targetcorp.com --probe-configs

# ── Full Sweep ──
keyhunter recon full --providers=openai,anthropic  # ALL 80+ sources parallel
keyhunter recon full --categories=code,cloud       # Category-filtered sweep

# ── Dork Management ──
keyhunter dorks list                               # All dorks across all sources
keyhunter dorks list --source=github
keyhunter dorks list --source=google
keyhunter dorks add github 'filename:.env "GROQ_API_KEY"'
keyhunter dorks run google --category=frontier     # Run Google dorks for frontier providers
keyhunter dorks export
```

### Viewing Full API Keys

By default, keys are masked in the terminal to guard against shoulder surfing. The full key can be accessed in the following ways:

```bash
# 1. Show the full key in the CLI with the --unmask flag
keyhunter scan path . --unmask
#  Provider    | Key                                          | Confidence | File          | Line | Status
# ─────────────┼──────────────────────────────────────────────┼────────────┼───────────────┼──────┼────────
#  OpenAI      | sk-proj-abc123def456ghi789jkl012mno345pqr678 | HIGH       | src/config.py | 42   | ACTIVE

# 2. JSON export: always contains the full key
keyhunter scan path . --output=json > results.json

# 3. Key management commands: manage all discovered keys
keyhunter keys list                   # Masked list
keyhunter keys list --unmask          # List with full keys
keyhunter keys show <id>              # Full detail for one key (always unmasked)
keyhunter keys copy <id>              # Copy the key to the clipboard
keyhunter keys export --format=json   # Export all keys with their full values
keyhunter keys verify <id>            # Verify the key and show full detail

# 4. Web Dashboard: "Reveal Key" button on the /keys/:id page
# 5. Telegram Bot: full key via the /key <id> command
```

**Example `keyhunter keys show` output:**
```
 ID:          a3f7b2c1
 Provider:    OpenAI
 Pattern:     OpenAI Project Key
 Key:         sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234
 Confidence:  HIGH
 Source:      src/config.py:42
 Found:       2026-04-04 14:32:01
 Scan ID:     scan_001
 Status:      ACTIVE (verified 2026-04-04 14:32:05)
 Org:         my-org
 Rate Limit:  500 req/min
 Revoke URL:  https://platform.openai.com/api-keys
```

### Verify a Single Key

```bash
keyhunter verify sk-proj-abc123...
# Output:
# Provider:  OpenAI
# Status:    ACTIVE
# Org:       my-org
# Rate Limit: 500 req/min
# Revoke:    https://platform.openai.com/api-keys
```

### Import External Tools

```bash
# Run TruffleHog, then enrich with KeyHunter
trufflehog git file://. --json > trufflehog.json
keyhunter import trufflehog trufflehog.json --verify

# Run Gitleaks, then enrich
gitleaks detect -r gitleaks.json
keyhunter import gitleaks gitleaks.json
```

### Web Dashboard & Telegram Bot

```bash
# Start web dashboard
keyhunter serve --port=8080

# Start with Telegram bot
keyhunter serve --port=8080 --telegram

# Configure Telegram
keyhunter config set telegram.token "YOUR_BOT_TOKEN"
keyhunter config set telegram.chat_id "YOUR_CHAT_ID"
```

### CI/CD Integration

KeyHunter ships with a git **pre-commit hook** that blocks leaks before they land in
history, a **GitHub Actions** integration that uploads SARIF findings directly into
the repository's Code Scanning tab, and an `import` command that consolidates
TruffleHog and Gitleaks output into one normalized database.

```bash
# Install pre-commit hook (scans staged files only)
keyhunter hook install

# GitHub Actions (SARIF output for Code Scanning upload)
keyhunter scan path . --output=sarif > keyhunter.sarif

# Import findings from other scanners
keyhunter import trufflehog trufflehog.json
keyhunter import gitleaks gitleaks.json

# Exit codes: 0 = clean, 1 = keys found, 2 = error
keyhunter scan path . && echo "Clean" || echo "Keys found or scan error"
```

See [docs/CI-CD.md](docs/CI-CD.md) for the full guide, including a copy-paste
GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle.
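
The documented exit codes (0 = clean, 1 = keys found, 2 = error) can also be handled from a wrapper program. The following is a minimal sketch, assuming the `keyhunter` binary is on `PATH` and accepts the `scan path .` form used elsewhere in this README; `describe` and `exitCode` are hypothetical helper names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// describe maps the documented exit codes to a short message.
// Anything other than 0 or 1 is treated as a scan error.
func describe(code int) string {
	switch code {
	case 0:
		return "clean"
	case 1:
		return "keys found"
	default:
		return "scan error"
	}
}

// exitCode extracts the process exit code from the error returned by Run.
// It returns -1 when the command could not be started at all.
func exitCode(err error) int {
	if err == nil {
		return 0
	}
	var ee *exec.ExitError
	if errors.As(err, &ee) {
		return ee.ExitCode()
	}
	return -1
}

func main() {
	cmd := exec.Command("keyhunter", "scan", "path", ".")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	code := exitCode(cmd.Run())
	// If the binary is missing, code is -1 and this reports "scan error".
	// A CI wrapper would typically fail the job for any non-zero code.
	fmt.Println("result:", describe(code))
}
```

Distinguishing code 1 from code 2 lets a pipeline block merges on real findings while surfacing tool failures separately.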

### Scheduled Scanning

```bash
# Daily GitHub recon at 09:00
keyhunter schedule add \
  --name="daily-github" \
  --cron="0 9 * * *" \
  --command="recon github --dork=auto" \
  --notify=telegram

# Hourly paste site monitoring
keyhunter schedule add \
  --name="hourly-paste" \
  --cron="0 * * * *" \
  --command="recon paste --sources=pastebin" \
  --notify=telegram

keyhunter schedule list
keyhunter schedule remove daily-github
```

---

## Configuration

```bash
# Initialize config
keyhunter config init
# Creates ~/.keyhunter.yaml

# Set API keys for recon sources
keyhunter config set shodan.apikey "YOUR_SHODAN_KEY"
keyhunter config set censys.api_id "YOUR_CENSYS_ID"
keyhunter config set censys.api_secret "YOUR_CENSYS_SECRET"
keyhunter config set github.token "YOUR_GITHUB_TOKEN"
keyhunter config set gitlab.token "YOUR_GITLAB_TOKEN"
keyhunter config set zoomeye.apikey "YOUR_ZOOMEYE_KEY"
keyhunter config set fofa.email "YOUR_FOFA_EMAIL"
keyhunter config set fofa.apikey "YOUR_FOFA_KEY"
keyhunter config set netlas.apikey "YOUR_NETLAS_KEY"
keyhunter config set binaryedge.apikey "YOUR_BINARYEDGE_KEY"
keyhunter config set google.cx "YOUR_GOOGLE_CX_ID"
keyhunter config set google.apikey "YOUR_GOOGLE_API_KEY"
keyhunter config set bing.apikey "YOUR_BING_API_KEY"
keyhunter config set brave.apikey "YOUR_BRAVE_API_KEY"
keyhunter config set virustotal.apikey "YOUR_VT_KEY"
keyhunter config set intelx.apikey "YOUR_INTELX_KEY"
keyhunter config set grayhat.apikey "YOUR_GRAYHAT_KEY"
keyhunter config set reddit.client_id "YOUR_REDDIT_ID"
keyhunter config set reddit.client_secret "YOUR_REDDIT_SECRET"
keyhunter config set stackoverflow.apikey "YOUR_SO_KEY"
keyhunter config set kaggle.username "YOUR_KAGGLE_USER"
keyhunter config set kaggle.apikey "YOUR_KAGGLE_KEY"

# Set notification channels
keyhunter config set telegram.token "YOUR_BOT_TOKEN"
keyhunter config set telegram.chat_id "YOUR_CHAT_ID"
keyhunter config set webhook.url "https://your-webhook.com/alert"

# Database encryption
keyhunter config set db.password "YOUR_DB_PASSWORD"
```

### Config File (`~/.keyhunter.yaml`)

```yaml
scan:
  workers: 8
  verify_timeout: 10s
  default_output: table
  respect_robots: true

recon:
  stealth: false
  rate_limits:
    github: 30        # req/min
    shodan: 1         # req/sec
    censys: 5         # req/sec
    zoomeye: 10       # req/sec
    fofa: 1           # req/sec
    netlas: 1         # req/sec
    google: 100       # req/day (Custom Search API)
    bing: 3           # req/sec
    stackoverflow: 30 # req/sec
    hackernews: 100   # req/min
    paste: 0.5        # req/sec
    npm: 10           # req/sec
    pypi: 5           # req/sec
    virustotal: 4     # req/min (free tier)
    intelx: 10        # req/day (free tier)
    grayhat: 5        # req/sec
    wayback: 15       # req/min
    trello: 10        # req/sec
    devto: 1          # req/sec

telegram:
  token: "encrypted:..."
  chat_id: "123456789"
  auto_notify: true

web:
  port: 8080
  auth:
    enabled: false
    username: admin
    password: "encrypted:..."

db:
  path: ~/.keyhunter/keyhunter.db
  encrypted: true
```

---

## Supported Providers (108)

### Tier 1 — Frontier

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| OpenAI | `sk-proj-*`, `sk-svcacct-*` | High | `GET /v1/models` |
| Anthropic | `sk-ant-api03-*` | High | `GET /v1/models` |
| Google AI (Gemini) | `AIza*` | High | `GET /v1/models` |
| Google Vertex AI | OAuth token | Medium | `GET /v1/models` |
| AWS Bedrock | `AKIA*` | High | `GetFoundationModel` |
| Azure OpenAI | 32-char hex | Medium | `GET /openai/deployments` |
| Meta AI | `meta-llama-*` | Medium | `GET /v1/models` |
| xAI (Grok) | `xai-*` | High | `GET /v1/models` |
| Cohere | `co-*` | High | `GET /v1/models` |
| Mistral AI | 32-char generic | Low | `GET /v1/models` |
| Inflection AI | Generic UUID | Low | `GET /api/models` |
| AI21 Labs | Generic key | Low | `GET /v1/models` |

### Tier 2 — Inference Platforms

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Together AI | Generic key | Low | `GET /v1/models` |
| Fireworks AI | `fw_*` | High | `GET /v1/models` |
| Groq | `gsk_*` | High | `GET /openai/v1/models` |
| Replicate | `r8_*` | High | `GET /v1/predictions` |
| Anyscale | Generic key | Low | `GET /v1/models` |
| DeepInfra | Generic key | Low | `GET /v1/models` |
| Lepton AI | `lpt_*` | High | `GET /v1/models` |
| Modal | Generic token | Low | `GET /api/apps` |
| Baseten | Generic key | Low | `GET /v1/models` |
| Cerebrium | Generic key | Low | `GET /v1/models` |
| NovitaAI | Generic key | Low | `GET /v1/models` |
| Sambanova | Generic key | Low | `GET /v1/models` |
| OctoAI | Generic key | Low | `GET /v1/models` |
| Friendli AI | Generic key | Low | `GET /v1/models` |

### Tier 3 — Specialized/Vertical

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Perplexity | `pplx-*` | High | `POST /chat/completions` |
| You.com | Generic key | Low | `GET /v1/search` |
| Voyage AI | `voy-*` | High | `GET /v1/models` |
| Jina AI | `jina_*` | High | `GET /v1/models` |
| Unstructured | Generic key | Low | `GET /general/v0/general` |
| AssemblyAI | Generic key | Low | `GET /v2/transcript` |
| Deepgram | Generic key | Low | `GET /v1/projects` |
| ElevenLabs | `el_*` | High | `GET /v1/user` |
| Stability AI | `sk-*` | Medium | `GET /v1/engines/list` |
| Runway ML | Generic key | Low | `GET /v1/models` |
| Midjourney | Generic key | Low | N/A |
| HuggingFace | `hf_*` | High | `GET /api/whoami` |

### Tier 4 — Chinese/Regional

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| DeepSeek | `sk-*` | Medium | `GET /v1/models` |
| Baichuan | Generic key | Low | `GET /v1/models` |
| Zhipu AI (GLM) | Generic key | Low | `POST /api/paas/v4/chat` |
| Moonshot AI (Kimi) | `sk-*` | Medium | `GET /v1/models` |
| Yi (01.AI) | Generic key | Low | `GET /v1/models` |
| Qwen (Alibaba) | `sk-*` | Medium | `GET /v1/models` |
| Baidu (ERNIE) | API Key + Secret | Medium | Token endpoint |
| ByteDance (Doubao) | Generic key | Low | `GET /v1/models` |
| SenseTime | Generic key | Low | `GET /v1/models` |
| iFlytek (Spark) | API Key + Secret | Medium | WebSocket handshake |
| MiniMax | Generic key | Low | `GET /v1/models` |
| Stepfun | Generic key | Low | `GET /v1/models` |
| 360 AI | Generic key | Low | `GET /v1/models` |
| Kuaishou (Kling) | Generic key | Low | `GET /v1/models` |
| Tencent Hunyuan | SecretId + SecretKey | Medium | `DescribeModels` |
| SiliconFlow | `sf_*` | High | `GET /v1/models` |

### Tier 5 — Infrastructure/Gateway

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Cloudflare AI | Cloudflare API token | Medium | `GET /ai/models` |
| Vercel AI | `vercel_*` | High | `GET /v1/models` |
| LiteLLM | Generic key | Low | `GET /v1/models` |
| Portkey | Generic key | Low | `GET /v1/models` |
| Helicone | `sk-helicone-*` | High | `GET /v1/models` |
| OpenRouter | `sk-or-*` | High | `GET /api/v1/models` |
| Martian | Generic key | Low | `GET /v1/models` |
| AI Gateway (Kong) | Generic key | Low | Health endpoint |
| BricksAI | Generic key | Low | `GET /v1/models` |
| Aether | Generic key | Low | `GET /v1/models` |
| Not Diamond | Generic key | Low | `GET /v1/models` |

### Tier 6 — Emerging/Niche

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Reka AI | Generic key | Low | `GET /v1/models` |
| Aleph Alpha | Generic key | Low | `GET /models` |
| Writer | Generic key | Low | `GET /v1/models` |
| Jasper AI | Generic key | Low | N/A |
| Typeface | Generic key | Low | N/A |
| Comet ML | Generic key | Low | `GET /api/rest/v2` |
| Weights & Biases | Generic key | Low | `GET /api/v1/viewer` |
| LangSmith | `ls__*` | High | `GET /api/v1/info` |
| Pinecone | Generic key | Low | `GET /databases` |
| Weaviate | Generic key | Low | `GET /v1/meta` |
| Qdrant | Generic key | Low | `GET /collections` |
| Chroma | Generic key | Low | `GET /api/v1/heartbeat` |
| Milvus | Generic key | Low | `GET /v1/vector/collections` |
| Neon AI | Generic key | Low | N/A |
| Lamini | Generic key | Low | `GET /v1/models` |

### Tier 7 — Code & Dev Tools

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| GitHub Copilot | `ghu_*`, `ghp_*` | High | `GET /user` |
| Cursor | Generic key | Low | N/A |
| Tabnine | Generic key | Low | N/A |
| Codeium/Windsurf | Generic key | Low | N/A |
| Sourcegraph Cody | `sgp_*` | High | `GET /.api/current-user` |
| Amazon CodeWhisperer | `AKIA*` | High | STS GetCallerIdentity |
| Replit AI | Generic key | Low | N/A |
| Codestral (Mistral) | Generic key | Low | `GET /v1/models` |
| IBM watsonx.ai | `ibm_*` | Medium | IAM token endpoint |
| Oracle AI | Generic key | Low | N/A |

### Tier 8 — Self-Hosted/Open Infra

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Ollama | N/A (local) | N/A | `GET /api/tags` |
| vLLM | Generic key | Low | `GET /v1/models` |
| LocalAI | Generic key | Low | `GET /v1/models` |
| LM Studio | N/A (local) | N/A | `GET /v1/models` |
| llama.cpp | N/A (local) | N/A | `GET /health` |
| GPT4All | N/A (local) | N/A | N/A |
| text-generation-webui | Generic key | Low | `GET /v1/models` |
| TensorRT-LLM | N/A | N/A | Health endpoint |
| Triton Inference Server | N/A | N/A | `GET /v2/health/ready` |
| Jan AI | N/A (local) | N/A | `GET /v1/models` |

### Tier 9 — Enterprise/Legacy

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Salesforce Einstein | Generic token | Low | REST API |
| ServiceNow AI | Generic token | Low | REST API |
| SAP AI Core | OAuth token | Low | Token endpoint |
| Palantir AIP | Generic token | Low | REST API |
| Databricks (DBRX) | `dapi*` | High | `GET /api/2.0/clusters` |
| Snowflake Cortex | JWT token | Medium | SQL endpoint |
| Oracle Generative AI | Generic key | Low | REST API |
| HPE GreenLake AI | Generic token | Low | REST API |

---

## Architecture

```
                    +------------------+
                    |   CLI (Cobra)    |
                    +--------+---------+
                             |
              +--------------+--------------+
              |              |              |
     +--------v--+   +------v-----+  +-----v------+
     | Input      |   | Recon      |  | Import     |
     | Adapters   |   | Engine     |  | Adapters   |
     | - file     |   | (80+ src)  |  | - trufflehog|
     | - git      |   | - IoT (6)  |  | - gitleaks |
     | - stdin    |   | - Code(16) |  | - generic  |
     | - url      |   | - Search(5)|  +-----+------+
     | - clipboard|   | - Paste(8+)|        |
     +--------+---+   | - Pkg (8)  |        |
              |        | - Cloud(7) |        |
              |        | - CI/CD(5) |        |
              |        | - Archive2 |        |
              |        | - Forum(7) |        |
              |        | - Collab(4)|        |
              |        | - JS/FE(5) |        |
              |        | - Logs (3) |        |
              |        | - Intel(3) |        |
              |        | - Mobile(1)|        |
              |        | - DNS (2)  |        |
              |        | - API (3)  |        |
              |        +------+-----+        |
              |               |              |
              +-------+-------+--------------+
                      |
              +-------v--------+
              | Scanner Engine |
              | - matcher.go   |
              | - verifier.go  |
              +-------+--------+
                      |
         +------------+-------------+
         |            |             |
   +-----v----+ +----v-----+ +----v-------+
   | Output   | | Notify   | | Web        |
   | - table  | | - telegram| | Dashboard  |
   | - json   | | - webhook| | - htmx     |
   | - sarif  | | - slack  | | - REST API |
   | - csv    | +----------+ | - SQLite   |
   +----------+              +------------+

   +------------------------------------------+
   | Provider Registry (108+ YAML providers)  |
   | Dork Registry (150+ YAML dorks)          |
   +------------------------------------------+
```

### Key Design Decisions

- **YAML Providers** — Adding a new provider = adding a YAML file. No recompile needed for pattern-only changes (when using external provider dir). Built-in providers are embedded at compile time.
- **Keyword Pre-filtering** — Before running regex, files are scanned for keywords. This provides ~10x speedup on large codebases.
- **Worker Pool** — Parallel scanning with configurable worker count. Default: CPU count.
- **Delta-based Git Scanning** — Only scans changes between commits, not entire trees.
- **SQLite Storage** — All scan results persisted with AES-256 encryption.
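
Keyword pre-filtering, as described above, can be sketched like this: a cheap substring check decides whether the more expensive regex runs at all. The `detector` type and the example pattern are illustrative assumptions, not KeyHunter's real provider rules.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// detector pairs cheap literal keywords with a more expensive regex.
// Both the struct and the pattern below are illustrative only.
type detector struct {
	name     string
	keywords []string
	pattern  *regexp.Regexp
}

// scan runs the regex only when at least one keyword occurs in the content.
func (d detector) scan(content string) []string {
	hit := false
	for _, kw := range d.keywords {
		if strings.Contains(content, kw) {
			hit = true
			break
		}
	}
	if !hit {
		return nil // skipped without invoking the regex engine
	}
	return d.pattern.FindAllString(content, -1)
}

// exampleDetector builds a hypothetical detector for the demo.
func exampleDetector() detector {
	return detector{
		name:     "example-provider",
		keywords: []string{"sk-proj-"},
		pattern:  regexp.MustCompile(`sk-proj-[A-Za-z0-9_-]{20,}`),
	}
}

func main() {
	d := exampleDetector()
	fmt.Println(d.scan(`API_KEY = "sk-proj-abcdefghijklmnopqrstuvwx"`))
	fmt.Println(d.scan(`nothing interesting here`))
}
```

The gain comes from most files containing none of the keywords, so the regex engine never sees them; the actual speedup depends on the codebase and the pattern set.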

---

## Security & Ethics

### Built-in Protections
- Key values **masked by default** in terminal (first 8 + last 4 chars) — use `--unmask` for full keys
- **Full keys always available** via: `--unmask`, `--output=json`, `keyhunter keys show`, web dashboard, Telegram bot
- Database is **AES-256 encrypted** (full keys stored encrypted)
- API tokens stored **encrypted** in config
- No key values written to logs during `--verify`
- Web dashboard supports **basic auth / token auth**
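
The masking rule (first 8 plus last 4 characters) could be implemented along these lines. This is a sketch: the `maskKey` name, the `...` separator, and the handling of very short keys are assumptions, and byte slicing is used on the premise that API keys are ASCII.

```go
package main

import "fmt"

// maskKey keeps the first 8 and last 4 characters and hides the rest.
// Keys too short to mask meaningfully are hidden entirely.
func maskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return "****"
	}
	return key[:head] + "..." + key[len(key)-tail:]
}

func main() {
	fmt.Println(maskKey("sk-proj-abc123def456ghi789jkl012mno345pqr678"))
	fmt.Println(maskKey("short"))
}
```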

### Rate Limiting
| Source | Rate Limit |
|--------|-----------|
| GitHub API (auth) | 30 req/min |
| GitHub API (unauth) | 10 req/min |
| Shodan | Per API plan |
| Censys | 250 queries/day (free) |
| ZoomEye | 10,000 results/month (free) |
| FOFA | 100 results/query (free) |
| Netlas | 50 queries/day (free) |
| Google Custom Search | 100/day free, 10K/day paid |
| Bing Search | 1,000/month (free) |
| Stack Overflow | 300/day (no key), 10K/day (key) |
| HN Algolia | 10,000 req/hour |
| VirusTotal | 4 req/min (free) |
| IntelX | 10 searches/day (free) |
| GrayHatWarfare | Per plan |
| Wayback Machine | ~15 req/min |
| Paste sites | 1 req/2sec |
| npm/PyPI | Generous, be respectful |
| Trello | 100 req/10sec |
| Docker Hub | 100 pulls/6hr (unauth) |

### Stealth & Ethics Flags
```bash
--stealth           # User-agent rotation, increased request spacing
--respect-robots    # Respect robots.txt (default: on)
```

---

## Use Cases

### Red Team / Pentest
```bash
# Full multi-source recon against a target org
keyhunter recon github --query="targetcorp OPENAI_API_KEY"
keyhunter recon gitlab --query="targetcorp api_key"
keyhunter recon shodan --dork='http.html:"targetcorp" "sk-"'
keyhunter recon censys --query='services.http.response.body:"targetcorp" AND "api_key"'
keyhunter recon zoomeye --query='site:targetcorp.com +"api_key"'
keyhunter recon elasticsearch --shodan   # Find exposed ES with leaked keys
keyhunter recon jenkins --shodan         # Exposed Jenkins with build logs
keyhunter recon dotenv --domain-list=targetcorp-subdomains.txt  # .env exposure
keyhunter recon wayback --domain=targetcorp.com  # Historical leaks
keyhunter recon sourcemaps --domain=app.targetcorp.com  # JS source maps
keyhunter recon crtsh --domain=targetcorp.com  # Discover API subdomains
keyhunter recon full --providers=openai,anthropic  # Everything at once
```

### DevSecOps / CI Pipeline
```bash
# Pre-commit hook
keyhunter hook install

# GitHub Actions step
- name: KeyHunter Scan
  run: |
    keyhunter scan path . --output=sarif > keyhunter.sarif
    # Upload to GitHub Security tab
```

### Bug Bounty
```bash
# Comprehensive target recon
keyhunter recon github --org=targetcorp --dork=auto --verify
keyhunter recon gist --query="targetcorp"
keyhunter recon paste --sources=all --query="targetcorp"
keyhunter recon postman --query="targetcorp"
keyhunter recon trello --query="targetcorp api key"
keyhunter recon notion --query="targetcorp API_KEY"
keyhunter recon confluence --shodan
keyhunter recon npm --query="targetcorp"   # Check their published packages
keyhunter recon pypi --query="targetcorp"
keyhunter recon docker --query="targetcorp" --layers  # Docker image layer scan
keyhunter recon apk --query="targetcorp"   # Mobile app decompile
keyhunter recon swagger --domain=api.targetcorp.com
```

### Monitoring / Alerting
```bash
# Continuous monitoring with Telegram alerts
keyhunter schedule add \
  --name="monitor-github" \
  --cron="*/30 * * * *" \
  --command="recon github --dork=auto --providers=openai" \
  --notify=telegram

keyhunter serve --telegram
```

---

## Dork Examples (150+ Built-in)

### GitHub
```
filename:.env "OPENAI_API_KEY"
filename:.env "ANTHROPIC_API_KEY"
filename:config.yaml "api_key" "sk-"
"sk-proj-" language:python
"sk-ant-api03" language:javascript
filename:docker-compose "API_KEY"
"api_key" extension:ipynb
filename:.toml "api_key" "sk-"
filename:terraform.tfvars "api_key"
"kind: Secret" "data:" filename:*.yaml          # K8s secrets
filename:.npmrc "_authToken"                     # npm tokens
filename:requirements.txt "openai" path:.env     # Python projects
```
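
Some dorks above carry a trailing `# ...` annotation, which has to be removed before the query is sent anywhere. A minimal sketch of turning one line into a GitHub search URL follows. `cleanDork` and `githubSearchURL` are hypothetical helpers, and the rule that a space followed by `#` starts a comment is an assumption, not KeyHunter's documented dork format.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// cleanDork drops a trailing "# comment" annotation and surrounding
// whitespace. Assumption: a '#' preceded by a space starts a comment.
func cleanDork(line string) string {
	if i := strings.Index(line, " #"); i >= 0 {
		line = line[:i]
	}
	return strings.TrimSpace(line)
}

// githubSearchURL builds a GitHub code-search URL for one dork,
// percent-encoding the query so quotes and colons survive intact.
func githubSearchURL(dork string) string {
	q := url.Values{}
	q.Set("q", cleanDork(dork))
	q.Set("type", "code")
	return "https://github.com/search?" + q.Encode()
}

func main() {
	dork := `filename:.npmrc "_authToken"                     # npm tokens`
	fmt.Println(githubSearchURL(dork))
}
```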

### GitLab
```
"OPENAI_API_KEY" filename:.env
"sk-ant-" filename:*.py
"api_key" filename:settings.json
```

### Google Dorking
```
"sk-proj-" -github.com -stackoverflow.com        # Outside known code sites
"sk-ant-api03-" filetype:env
"OPENAI_API_KEY" filetype:yml
"ANTHROPIC_API_KEY" filetype:json
inurl:.env "API_KEY"
intitle:"index of" .env
site:pastebin.com "sk-proj-"
site:replit.com "OPENAI_API_KEY"
site:codesandbox.io "sk-ant-"
site:notion.so "API_KEY"
site:trello.com "openai"
site:docs.google.com "sk-proj-"
site:medium.com "ANTHROPIC_API_KEY"
site:dev.to "sk-proj-"
site:huggingface.co "OPENAI_API_KEY"
site:kaggle.com "api_key" "sk-"
intitle:"Swagger UI" "api_key"
inurl:graphql "authorization" "Bearer sk-"
filetype:tfstate "api_key"                       # Terraform state
filetype:ipynb "sk-proj-"                        # Jupyter notebooks
```

### Shodan
```
http.html:"openai" "api_key" port:8080
http.title:"LiteLLM" port:4000
http.html:"ollama" port:11434
http.title:"Kubernetes Dashboard"
"X-Jenkins" "200 OK"
http.title:"Kibana" port:5601
http.title:"Grafana"
http.title:"Swagger UI"
http.title:"Gitea" port:3000
http.html:"PrivateBin"
http.title:"MinIO Browser"
http.title:"Sentry"
http.title:"Confluence"
port:6443 "kube-apiserver"
http.html:"langchain" port:8000
```

### Censys
```
services.http.response.body:"openai" and services.http.response.body:"sk-"
services.http.response.body:"langchain" and services.port:8000
services.http.response.body:"OPENAI_API_KEY"
services.http.response.body:"sk-ant-api03"
```

### ZoomEye
```
app:"Elasticsearch" +"api_key"
app:"Jenkins" +openai
app:"Grafana" +anthropic
app:"Gitea"
```

### FOFA
```
body="sk-proj-"
body="OPENAI_API_KEY"
body="sk-ant-api03"
title="LiteLLM"
title="Swagger UI" && body="api_key"
title="Kibana" && body="authorization"
```

---

## Contributing

### Adding a New Provider

1. Create `providers/your-provider.yaml`:

```yaml
id: your-provider
name: Your Provider
category: emerging
website: https://api.yourprovider.com
confidence: medium

patterns:
  - id: your-provider-key
    name: "Your Provider API Key"
    regex: '\byp_[A-Za-z0-9]{32}\b'
    confidence: high
    description: "Your Provider API key with yp_ prefix"

keywords:
  - "yp_"
  - "YOUR_PROVIDER_API_KEY"

verify:
  enabled: true
  method: GET
  url: "https://api.yourprovider.com/v1/models"
  headers:
    Authorization: "Bearer {{key}}"
  success_codes: [200]
  failure_codes: [401, 403]

metadata:
  docs: "https://docs.yourprovider.com"
  key_url: "https://dashboard.yourprovider.com/keys"
  env_vars: ["YOUR_PROVIDER_API_KEY"]
```

2. Run tests: `go test ./pkg/provider/...`
3. Submit a PR

### Adding a New Dork

1. Edit `dorks/<source>.yaml` and add your dork entry
2. Submit a PR

---

## Roadmap

- [ ] Core scanning engine (file, git, stdin)
- [ ] 108+ provider YAML definitions
- [ ] Active verification for all providers
- [ ] CLI with Cobra (scan, verify, import, recon, serve)
- [ ] TruffleHog & Gitleaks import adapters
- [ ] OSINT/Recon engine (Shodan, Censys, GitHub, GitLab, Paste, S3)
- [ ] Built-in dork engine with 50+ dorks
- [ ] Web dashboard (htmx + Tailwind + SQLite)
- [ ] Telegram bot with auto-notifications
- [ ] Scheduled scanning (cron-based)
- [ ] Pre-commit hook & CI/CD integration (SARIF)
- [ ] Docker image
- [ ] Homebrew formula

---

## Disclaimer

KeyHunter is designed for **authorized security testing**, **defensive security**, **bug bounty programs**, and **educational purposes** only. Always ensure you have proper authorization before scanning any target. Unauthorized access to computer systems is illegal.

---

## License

MIT License - see [LICENSE](LICENSE) for details.