KeyHunter
The most comprehensive API key scanner for LLM/AI providers. Detect, validate, and monitor leaked API keys across 108+ providers.
Why KeyHunter?
Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew 81% year-over-year in 2025, yet no tool covers more than ~15 LLM providers.
KeyHunter fills that gap with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and real-time notifications.
How It Compares
| Feature | KeyHunter | TruffleHog | Gitleaks | detect-secrets |
|---|---|---|---|---|
| LLM Providers | 108+ | ~3 | ~5 | ~1 |
| Active Verification | 108+ endpoints | ~20 types | No | No |
| OSINT/Recon | Shodan, Censys, GitHub, GitLab, Paste, S3 | No | No | No |
| External Tool Import | TruffleHog + Gitleaks | - | - | - |
| Web Dashboard | Built-in | No | No | No |
| Telegram Bot | Built-in | No | No | No |
| Dork Engine | Built-in YAML dorks | No | No | No |
| Provider YAML Plugin | Community-extensible | Go code only | TOML rules | Python plugins |
| Scheduled Scanning | Cron-based | No | No | No |
Features
Core Scanning
- File/Directory scanning with recursive traversal and glob exclusions
- Git-aware scanning — full history, branches, stash, delta-based diffs
- stdin/pipe support: `cat dump.txt | keyhunter scan stdin`
- URL fetching: scan any remote URL content
- Clipboard scanning — instant clipboard content analysis
OSINT / Recon Engine (80+ Sources, 18 Categories)
IoT & Internet Scanners
- Shodan — exposed LLM proxies, dashboards, API endpoints
- Censys — HTTP body search for leaked credentials
- ZoomEye — Chinese IoT scanner, different coverage perspective
- FOFA — Asian infrastructure scanning, body content search
- Netlas — HTTP response body keyword search
- BinaryEdge — internet-wide scan data
Code Hosting & Snippets
- GitHub / GitLab / Bitbucket — code search with automated dorks
- Codeberg / Gitea instances — alternative Git platforms (Gitea auto-discovered via Shodan)
- Replit / CodeSandbox / StackBlitz / Glitch — interactive dev environments with hardcoded keys
- CodePen / JSFiddle / Observable — browser snippet platforms
- HuggingFace — Spaces, repos, model configs (high-yield for LLM keys)
- Kaggle — notebooks and datasets with API keys
- Jupyter / nbviewer — shared notebooks
- GitHub Gist — public gist search
- Gitpod — workspace snapshots
Search Engine Dorking
- Google — Custom Search API / SerpAPI, 100+ built-in dorks
- Bing — Azure Cognitive Services search
- DuckDuckGo / Yandex / Brave — alternative indexes for broader coverage
Paste Sites
- Multi-paste aggregator — Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, and more
Package Registries
- npm / PyPI / RubyGems / crates.io / Maven / NuGet / Packagist / Go modules — download packages, extract source, scan for key patterns
Container & Infrastructure
- Docker Hub — image layer scanning, build arg extraction
- Kubernetes — exposed dashboards, public Secret/ConfigMap YAML files
- Terraform: state files (`.tfstate` with plaintext secrets), registry modules
- Helm Charts / Ansible Galaxy: default values with credentials
Cloud Storage
- AWS S3 / GCS / Azure Blob / DigitalOcean Spaces / Backblaze B2 — bucket enumeration and content scanning
- MinIO — self-hosted instances discovered via Shodan
- GrayHatWarfare — searchable database of public bucket objects
CI/CD Log Leaks
- Travis CI / CircleCI — public build logs with leaked env vars
- GitHub Actions — workflow run log scanning
- Jenkins — exposed instances (Shodan-discovered), console output
- GitLab CI/CD — public pipeline job traces
Web Archives
- Wayback Machine: historical snapshots of removed `.env` files, config pages
- CommonCrawl: massive web crawl data, WARC record scanning
Forums & Documentation
- Stack Overflow — API + SEDE queries for code snippets with real keys
- Reddit — programming subreddit scanning
- Hacker News — Algolia API comment search
- dev.to / Medium — tutorial articles with hardcoded keys
- Telegram groups — public channels sharing configs and "free API keys"
- Discord — indexed public server content
Collaboration Tools
- Notion / Confluence — public pages and spaces with credentials
- Trello — public boards with API key cards
- Google Docs/Sheets — publicly shared documents
Frontend & JavaScript Leaks
- JS Source Maps — original source recovery with inlined secrets
- Webpack / Vite bundles: `REACT_APP_*`, `NEXT_PUBLIC_*`, `VITE_*` variable extraction
- Exposed `.env` files: misconfigured web servers serving dotenv from root
- Swagger / OpenAPI docs: real auth examples in API docs
- Vercel / Netlify previews — deploy preview JS bundles with production secrets
Log Aggregators
- Elasticsearch / Kibana — exposed instances with application logs containing API keys
- Grafana — exposed dashboards with datasource configs
- Sentry — error tracking capturing request headers with keys
Threat Intelligence
- VirusTotal — uploaded files/scripts containing embedded keys
- Intelligence X — aggregated paste, darknet, and leak search
- URLhaus — malicious URLs with API keys in parameters
Mobile Apps
- APK analysis — download, decompile, grep for key patterns (via apktool/jadx)
DNS / Subdomain Discovery
- crt.sh — Certificate Transparency log for API subdomain discovery
- Subdomain probing: config endpoint enumeration (`.env`, `/api/config`, `/actuator/env`)
API Marketplaces
- Postman — public collections, workspaces, environments
- SwaggerHub — published API definitions with example values
recon full — parallel sweep across all 80+ sources with deduplication and unified reporting
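A parallel sweep with deduplication could be sketched roughly as follows. This is not taken from KeyHunter's source: the `Finding` type, the `Source` function type, and `sweep` are hypothetical names, and deduplicating on the (provider, key) pair is an assumption about what counts as a duplicate.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Finding is a hypothetical result record; KeyHunter's real type may differ.
type Finding struct {
	Provider string
	Key      string
	Source   string
}

// Source is a hypothetical recon source that returns whatever it found.
type Source func() []Finding

// sweep runs every source in its own goroutine and keeps the first
// finding received for each (provider, key) pair.
func sweep(sources []Source) []Finding {
	var wg sync.WaitGroup
	results := make(chan Finding)

	for _, src := range sources {
		wg.Add(1)
		go func(s Source) {
			defer wg.Done()
			for _, f := range s() {
				results <- f
			}
		}(src)
	}
	// Close the channel once all sources have finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	seen := make(map[string]bool)
	var unique []Finding
	for f := range results {
		id := f.Provider + "\x00" + f.Key
		if seen[id] {
			continue
		}
		seen[id] = true
		unique = append(unique, f)
	}
	// Sort by key so the report order does not depend on goroutine timing.
	sort.Slice(unique, func(i, j int) bool { return unique[i].Key < unique[j].Key })
	return unique
}

func main() {
	github := func() []Finding {
		return []Finding{{"openai", "sk-proj-EXAMPLE1", "github"}}
	}
	paste := func() []Finding {
		return []Finding{
			{"openai", "sk-proj-EXAMPLE1", "paste"}, // same key as the GitHub hit
			{"anthropic", "sk-ant-EXAMPLE2", "paste"},
		}
	}
	for _, f := range sweep([]Source{github, paste}) {
		fmt.Println(f.Provider, f.Key)
	}
}
```

Which source gets credited for a duplicated key depends on goroutine scheduling in this sketch; a real report would more likely record every source in which a key was seen.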
Active Verification
- Lightweight API calls to verify if detected keys are active
- Permission and scope extraction (org, rate limits, model access)
- Configurable via `--verify` flag (off by default)
- Provider-specific verification endpoints
External Tool Integration
- Import TruffleHog JSON output — enrich with LLM-specific analysis
- Import Gitleaks JSON output — cross-reference with 108+ providers
- Generic CSV import for custom tool output
Notifications & Dashboard
- Telegram Bot — scan triggers, key alerts, recon results
- Web Dashboard — htmx + Tailwind, SQLite-backed, real-time scan viewer
- Webhook — generic HTTP POST notifications
- Slack — workspace notifications
- Scheduled scans — cron-based recurring scans with auto-notify
Quick Start
Install
# From source
go install github.com/keyhunter/keyhunter@latest
# Binary release
curl -sSL https://get.keyhunter.dev | bash
# Docker
docker pull keyhunter/keyhunter:latest
Basic Usage
# Scan a directory
keyhunter scan path ./my-project/
# Scan with active verification
keyhunter scan path ./my-project/ --verify
# Scan git history (last 30 days)
keyhunter scan git . --since="30 days ago"
# Scan from pipe
cat secrets.txt | keyhunter scan stdin
# Scan only specific providers
keyhunter scan path . --providers=openai,anthropic,deepseek
# JSON output
keyhunter scan path . --output=json > results.json
OSINT / Recon
# ── IoT & Internet Scanners ──
keyhunter recon shodan --dork="http.title:\"LiteLLM\" port:4000"
keyhunter recon censys --query='services.http.response.body:"sk-proj-"'
keyhunter recon zoomeye --query='app:"Elasticsearch" +"api_key"'
keyhunter recon fofa --query='body="OPENAI_API_KEY"'
keyhunter recon netlas --query='http.body:"sk-ant-"'
# ── Code Hosting ──
keyhunter recon github --dork=auto # All built-in GitHub dorks
keyhunter recon gitlab --dork=auto
keyhunter recon bitbucket --query="OPENAI_API_KEY"
keyhunter recon replit --query="sk-proj-" # Public repls
keyhunter recon huggingface --spaces --query="api_key" # HF Spaces
keyhunter recon kaggle --notebooks --query="openai"
keyhunter recon codesandbox --query="sk-ant-"
keyhunter recon glitch --query="ANTHROPIC_API_KEY"
keyhunter recon gitea --instances-from=shodan # Auto-discover Gitea instances
# ── Search Engine Dorking ──
keyhunter recon google --dork=auto # 100+ built-in Google dorks
keyhunter recon google --dork='"sk-proj-" -github.com filetype:env'
keyhunter recon bing --dork=auto
keyhunter recon brave --query="OPENAI_API_KEY filetype:yaml"
# ── Package Registries ──
keyhunter recon npm --recent --query="openai" # Scan newly published packages
keyhunter recon pypi --recent --query="llm"
keyhunter recon crates --query="api_key"
# ── Cloud Storage ──
keyhunter recon s3 --domain=targetcorp # S3 bucket enumeration
keyhunter recon gcs --domain=targetcorp # GCS buckets
keyhunter recon azure --domain=targetcorp # Azure Blob
keyhunter recon minio --shodan # Exposed MinIO instances
keyhunter recon grayhat --query="openai api_key" # GrayHatWarfare search
# ── CI/CD Logs ──
keyhunter recon ghactions --org=targetcorp # GitHub Actions logs
keyhunter recon travis --org=targetcorp
keyhunter recon jenkins --shodan # Exposed Jenkins instances
keyhunter recon circleci --org=targetcorp
# ── Web Archives ──
keyhunter recon wayback --domain=targetcorp.com # Wayback Machine
keyhunter recon commoncrawl --domain=targetcorp.com
# ── Frontend & JS ──
keyhunter recon dotenv --domain-list=targets.txt # Exposed .env files
keyhunter recon sourcemaps --domain=app.target.com # JS source maps
keyhunter recon webpack --url=https://app.target.com/main.js
keyhunter recon swagger --shodan # Exposed Swagger UIs
keyhunter recon deploys --domain=targetcorp # Vercel/Netlify previews
# ── Forums ──
keyhunter recon stackoverflow --query="sk-proj-"
keyhunter recon reddit --subreddit=openai --query="api key"
keyhunter recon hackernews --query="leaked api key"
keyhunter recon telegram-groups --query="free api key"
# ── Collaboration ──
keyhunter recon notion --query="API_KEY" # Google dorked
keyhunter recon confluence --shodan # Exposed instances
keyhunter recon trello --query="openai api key"
# ── Log Aggregators ──
keyhunter recon elasticsearch --shodan # Exposed ES instances
keyhunter recon grafana --shodan
keyhunter recon sentry --shodan
# ── Threat Intelligence ──
keyhunter recon virustotal --query="sk-proj-"
keyhunter recon intelx --query="sk-ant-api03" # Intelligence X
keyhunter recon urlhaus --query="openai"
# ── Mobile Apps ──
keyhunter recon apk --query="ai chatbot" # APK download + decompile
# ── DNS/Subdomain ──
keyhunter recon crtsh --domain=targetcorp.com # Cert transparency
keyhunter recon subdomain --domain=targetcorp.com --probe-configs
# ── Full Sweep ──
keyhunter recon full --providers=openai,anthropic # ALL 80+ sources parallel
keyhunter recon full --categories=code,cloud # Category-filtered sweep
# ── Dork Management ──
keyhunter dorks list # All dorks across all sources
keyhunter dorks list --source=github
keyhunter dorks list --source=google
keyhunter dorks add github 'filename:.env "GROQ_API_KEY"'
keyhunter dorks run google --category=frontier # Run Google dorks for frontier providers
keyhunter dorks export
Viewing Full API Keys
By default, keys are masked in the terminal (protection against shoulder surfing). Ways to view the full key:
# 1. Show the full key in the CLI with the --unmask flag
keyhunter scan path . --unmask
# Provider | Key | Confidence | File | Line | Status
# ─────────────┼──────────────────────────────────────────────┼────────────┼───────────────┼──────┼────────
# OpenAI | sk-proj-abc123def456ghi789jkl012mno345pqr678 | HIGH | src/config.py | 42 | ACTIVE
# 2. JSON export: always contains the full key
keyhunter scan path . --output=json > results.json
# 3. Key management commands: manage all discovered keys
keyhunter keys list # Masked list
keyhunter keys list --unmask # List with full keys
keyhunter keys show <id> # Full details for a single key (always unmasked)
keyhunter keys copy <id> # Copy the key to the clipboard
keyhunter keys export --format=json # Export all keys with their full values
keyhunter keys verify <id> # Verify the key and show full details
# 4. Web Dashboard: "Reveal Key" button on the /keys/:id page
# 5. Telegram Bot: full key via the /key <id> command
Example `keyhunter keys show` output:
ID: a3f7b2c1
Provider: OpenAI
Pattern: OpenAI Project Key
Key: sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234
Confidence: HIGH
Source: src/config.py:42
Found: 2026-04-04 14:32:01
Scan ID: scan_001
Status: ACTIVE (verified 2026-04-04 14:32:05)
Org: my-org
Rate Limit: 500 req/min
Revoke URL: https://platform.openai.com/api-keys
Verify a Single Key
keyhunter verify sk-proj-abc123...
# Output:
# Provider: OpenAI
# Status: ACTIVE
# Org: my-org
# Rate Limit: 500 req/min
# Revoke: https://platform.openai.com/api-keys
Import External Tools
# Run TruffleHog, then enrich with KeyHunter
trufflehog git file://. --json > trufflehog.json
keyhunter import trufflehog trufflehog.json --verify
# Run Gitleaks, then enrich
gitleaks detect -r gitleaks.json
keyhunter import gitleaks gitleaks.json
Web Dashboard & Telegram Bot
# Start web dashboard
keyhunter serve --port=8080
# Start with Telegram bot
keyhunter serve --port=8080 --telegram
# Configure Telegram
keyhunter config set telegram.token "YOUR_BOT_TOKEN"
keyhunter config set telegram.chat_id "YOUR_CHAT_ID"
CI/CD Integration
KeyHunter ships with a git pre-commit hook that blocks leaks before they land in
history, a GitHub Actions integration that uploads SARIF findings directly into
the repository's Code Scanning tab, and an import command that consolidates
TruffleHog and Gitleaks output into one normalized database.
# Install pre-commit hook (scans staged files only)
keyhunter hook install
# GitHub Actions (SARIF output for Code Scanning upload)
keyhunter scan . --output sarif > keyhunter.sarif
# Import findings from other scanners
keyhunter import --format=trufflehog trufflehog.json
keyhunter import --format=gitleaks gitleaks.json
# Exit codes: 0 = clean, 1 = keys found, 2 = error
keyhunter scan .; case $? in 0) echo "Clean" ;; 1) echo "Keys found!" ;; *) echo "Scan error" ;; esac
See docs/CI-CD.md for the full guide, including a copy-paste GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle.
Scheduled Scanning
# Daily GitHub recon at 09:00
keyhunter schedule add \
--name="daily-github" \
--cron="0 9 * * *" \
--command="recon github --dork=auto" \
--notify=telegram
# Hourly paste site monitoring
keyhunter schedule add \
--name="hourly-paste" \
--cron="0 * * * *" \
--command="recon paste --sources=pastebin" \
--notify=telegram
keyhunter schedule list
keyhunter schedule remove daily-github
Configuration
# Initialize config
keyhunter config init
# Creates ~/.keyhunter.yaml
# Set API keys for recon sources
keyhunter config set shodan.apikey "YOUR_SHODAN_KEY"
keyhunter config set censys.api_id "YOUR_CENSYS_ID"
keyhunter config set censys.api_secret "YOUR_CENSYS_SECRET"
keyhunter config set github.token "YOUR_GITHUB_TOKEN"
keyhunter config set gitlab.token "YOUR_GITLAB_TOKEN"
keyhunter config set zoomeye.apikey "YOUR_ZOOMEYE_KEY"
keyhunter config set fofa.email "YOUR_FOFA_EMAIL"
keyhunter config set fofa.apikey "YOUR_FOFA_KEY"
keyhunter config set netlas.apikey "YOUR_NETLAS_KEY"
keyhunter config set binaryedge.apikey "YOUR_BINARYEDGE_KEY"
keyhunter config set google.cx "YOUR_GOOGLE_CX_ID"
keyhunter config set google.apikey "YOUR_GOOGLE_API_KEY"
keyhunter config set bing.apikey "YOUR_BING_API_KEY"
keyhunter config set brave.apikey "YOUR_BRAVE_API_KEY"
keyhunter config set virustotal.apikey "YOUR_VT_KEY"
keyhunter config set intelx.apikey "YOUR_INTELX_KEY"
keyhunter config set grayhat.apikey "YOUR_GRAYHAT_KEY"
keyhunter config set reddit.client_id "YOUR_REDDIT_ID"
keyhunter config set reddit.client_secret "YOUR_REDDIT_SECRET"
keyhunter config set stackoverflow.apikey "YOUR_SO_KEY"
keyhunter config set kaggle.username "YOUR_KAGGLE_USER"
keyhunter config set kaggle.apikey "YOUR_KAGGLE_KEY"
# Set notification channels
keyhunter config set telegram.token "YOUR_BOT_TOKEN"
keyhunter config set telegram.chat_id "YOUR_CHAT_ID"
keyhunter config set webhook.url "https://your-webhook.com/alert"
# Database encryption
keyhunter config set db.password "YOUR_DB_PASSWORD"
Config File (~/.keyhunter.yaml)
scan:
  workers: 8
  verify_timeout: 10s
  default_output: table
  respect_robots: true
recon:
  stealth: false
  rate_limits:
    github: 30 # req/min
    shodan: 1 # req/sec
    censys: 5 # req/sec
    zoomeye: 10 # req/sec
    fofa: 1 # req/sec
    netlas: 1 # req/sec
    google: 100 # req/day (Custom Search API)
    bing: 3 # req/sec
    stackoverflow: 30 # req/sec
    hackernews: 100 # req/min
    paste: 0.5 # req/sec
    npm: 10 # req/sec
    pypi: 5 # req/sec
    virustotal: 4 # req/min (free tier)
    intelx: 10 # req/day (free tier)
    grayhat: 5 # req/sec
    wayback: 15 # req/min
    trello: 10 # req/sec
    devto: 1 # req/sec
telegram:
  token: "encrypted:..."
  chat_id: "123456789"
  auto_notify: true
web:
  port: 8080
  auth:
    enabled: false
    username: admin
    password: "encrypted:..."
db:
  path: ~/.keyhunter/keyhunter.db
  encrypted: true
Supported Providers (108)
Tier 1 — Frontier
| Provider | Key Pattern | Confidence | Verify |
|---|---|---|---|
| OpenAI | `sk-proj-*`, `sk-svcacct-*` | High | GET /v1/models |
| Anthropic | `sk-ant-api03-*` | High | GET /v1/models |
| Google AI (Gemini) | `AIza*` | High | GET /v1/models |
| Google Vertex AI | OAuth token | Medium | GET /v1/models |
| AWS Bedrock | `AKIA*` | High | GetFoundationModel |
| Azure OpenAI | 32-char hex | Medium | GET /openai/deployments |
| Meta AI | `meta-llama-*` | Medium | GET /v1/models |
| xAI (Grok) | `xai-*` | High | GET /v1/models |
| Cohere | `co-*` | High | GET /v1/models |
| Mistral AI | 32-char generic | Low | GET /v1/models |
| Inflection AI | Generic UUID | Low | GET /api/models |
| AI21 Labs | Generic key | Low | GET /v1/models |
Tier 2 — Inference Platforms
| Provider | Key Pattern | Confidence | Verify |
|---|---|---|---|
| Together AI | Generic key | Low | GET /v1/models |
| Fireworks AI | `fw_*` | High | GET /v1/models |
| Groq | `gsk_*` | High | GET /openai/v1/models |
| Replicate | `r8_*` | High | GET /v1/predictions |
| Anyscale | Generic key | Low | GET /v1/models |
| DeepInfra | Generic key | Low | GET /v1/models |
| Lepton AI | `lpt_*` | High | GET /v1/models |
| Modal | Generic token | Low | GET /api/apps |
| Baseten | Generic key | Low | GET /v1/models |
| Cerebrium | Generic key | Low | GET /v1/models |
| NovitaAI | Generic key | Low | GET /v1/models |
| Sambanova | Generic key | Low | GET /v1/models |
| OctoAI | Generic key | Low | GET /v1/models |
| Friendli AI | Generic key | Low | GET /v1/models |
Tier 3 — Specialized/Vertical
| Provider | Key Pattern | Confidence | Verify |
|---|---|---|---|
| Perplexity | `pplx-*` | High | GET /chat/completions |
| You.com | Generic key | Low | GET /v1/search |
| Voyage AI | `voy-*` | High | GET /v1/models |
| Jina AI | `jina_*` | High | GET /v1/models |
| Unstructured | Generic key | Low | GET /general/v0/general |
| AssemblyAI | Generic key | Low | GET /v2/transcript |
| Deepgram | Generic key | Low | GET /v1/projects |
| ElevenLabs | `el_*` | High | GET /v1/user |
| Stability AI | `sk-*` | Medium | GET /v1/engines/list |
| Runway ML | Generic key | Low | GET /v1/models |
| Midjourney | Generic key | Low | N/A |
| HuggingFace | `hf_*` | High | GET /api/whoami |
Tier 4 — Chinese/Regional
| Provider | Key Pattern | Confidence | Verify |
|---|---|---|---|
| DeepSeek | `sk-*` | Medium | GET /v1/models |
| Baichuan | Generic key | Low | GET /v1/models |
| Zhipu AI (GLM) | Generic key | Low | POST /api/paas/v4/chat |
| Moonshot AI (Kimi) | `sk-*` | Medium | GET /v1/models |
| Yi (01.AI) | Generic key | Low | GET /v1/models |
| Qwen (Alibaba) | `sk-*` | Medium | GET /v1/models |
| Baidu (ERNIE) | API Key + Secret | Medium | Token endpoint |
| ByteDance (Doubao) | Generic key | Low | GET /v1/models |
| SenseTime | Generic key | Low | GET /v1/models |
| iFlytek (Spark) | API Key + Secret | Medium | WebSocket handshake |
| MiniMax | Generic key | Low | GET /v1/models |
| Stepfun | Generic key | Low | GET /v1/models |
| 360 AI | Generic key | Low | GET /v1/models |
| Kuaishou (Kling) | Generic key | Low | GET /v1/models |
| Tencent Hunyuan | SecretId + SecretKey | Medium | DescribeModels |
| SiliconFlow | `sf_*` | High | GET /v1/models |
Tier 5 — Infrastructure/Gateway
| Provider | Key Pattern | Confidence | Verify |
|---|---|---|---|
| Cloudflare AI | Cloudflare API token | Medium | GET /ai/models |
| Vercel AI | `vercel_*` | High | GET /v1/models |
| LiteLLM | Generic key | Low | GET /v1/models |
| Portkey | Generic key | Low | GET /v1/models |
| Helicone | `sk-helicone-*` | High | GET /v1/models |
| OpenRouter | `sk-or-*` | High | GET /api/v1/models |
| Martian | Generic key | Low | GET /v1/models |
| AI Gateway (Kong) | Generic key | Low | Health endpoint |
| BricksAI | Generic key | Low | GET /v1/models |
| Aether | Generic key | Low | GET /v1/models |
| Not Diamond | Generic key | Low | GET /v1/models |
Tier 6 — Emerging/Niche
| Provider | Key Pattern | Confidence | Verify |
|---|---|---|---|
| Reka AI | Generic key | Low | GET /v1/models |
| Aleph Alpha | Generic key | Low | GET /models |
| Writer | Generic key | Low | GET /v1/models |
| Jasper AI | Generic key | Low | N/A |
| Typeface | Generic key | Low | N/A |
| Comet ML | Generic key | Low | GET /api/rest/v2 |
| Weights & Biases | Generic key | Low | GET /api/v1/viewer |
| LangSmith | `ls__*` | High | GET /api/v1/info |
| Pinecone | Generic key | Low | GET /databases |
| Weaviate | Generic key | Low | GET /v1/meta |
| Qdrant | Generic key | Low | GET /collections |
| Chroma | Generic key | Low | GET /api/v1/heartbeat |
| Milvus | Generic key | Low | GET /v1/vector/collections |
| Neon AI | Generic key | Low | N/A |
| Lamini | Generic key | Low | GET /v1/models |
Tier 7 — Code & Dev Tools
| Provider | Key Pattern | Confidence | Verify |
|---|---|---|---|
| GitHub Copilot | `ghu_*`, `ghp_*` | High | GET /user |
| Cursor | Generic key | Low | N/A |
| Tabnine | Generic key | Low | N/A |
| Codeium/Windsurf | Generic key | Low | N/A |
| Sourcegraph Cody | `sgp_*` | High | GET /.api/current-user |
| Amazon CodeWhisperer | `AKIA*` | High | STS GetCallerIdentity |
| Replit AI | Generic key | Low | N/A |
| Codestral (Mistral) | Generic key | Low | GET /v1/models |
| IBM watsonx.ai | `ibm_*` | Medium | IAM token endpoint |
| Oracle AI | Generic key | Low | N/A |
Tier 8 — Self-Hosted/Open Infra
| Provider | Key Pattern | Confidence | Verify |
|---|---|---|---|
| Ollama | N/A (local) | N/A | GET /api/tags |
| vLLM | Generic key | Low | GET /v1/models |
| LocalAI | Generic key | Low | GET /v1/models |
| LM Studio | N/A (local) | N/A | GET /v1/models |
| llama.cpp | N/A (local) | N/A | GET /health |
| GPT4All | N/A (local) | N/A | N/A |
| text-generation-webui | Generic key | Low | GET /v1/models |
| TensorRT-LLM | N/A | N/A | Health endpoint |
| Triton Inference Server | N/A | N/A | GET /v2/health/ready |
| Jan AI | N/A (local) | N/A | GET /v1/models |
Tier 9 — Enterprise/Legacy
| Provider | Key Pattern | Confidence | Verify |
|---|---|---|---|
| Salesforce Einstein | Generic token | Low | REST API |
| ServiceNow AI | Generic token | Low | REST API |
| SAP AI Core | OAuth token | Low | Token endpoint |
| Palantir AIP | Generic token | Low | REST API |
| Databricks (DBRX) | `dapi*` | High | GET /api/2.0/clusters |
| Snowflake Cortex | JWT token | Medium | SQL endpoint |
| Oracle Generative AI | Generic key | Low | REST API |
| HPE GreenLake AI | Generic token | Low | REST API |
Architecture
                  +------------------+
                  |   CLI (Cobra)    |
                  +---------+--------+
                            |
          +-----------------+------------------+
          |                 |                  |
    +-----v------+   +------v-------+   +------v-------+
    | Input      |   | Recon        |   | Import       |
    | Adapters   |   | Engine       |   | Adapters     |
    | - file     |   | (80+ src)    |   | - trufflehog |
    | - git      |   | - IoT (6)    |   | - gitleaks   |
    | - stdin    |   | - Code (16)  |   | - generic    |
    | - url      |   | - Search (5) |   +------+-------+
    | - clipboard|   | - Paste (8+) |          |
    +-----+------+   | - Pkg (8)    |          |
          |          | - Cloud (7)  |          |
          |          | - CI/CD (5)  |          |
          |          | - Archive (2)|          |
          |          | - Forum (7)  |          |
          |          | - Collab (4) |          |
          |          | - JS/FE (5)  |          |
          |          | - Logs (3)   |          |
          |          | - Intel (3)  |          |
          |          | - Mobile (1) |          |
          |          | - DNS (2)    |          |
          |          | - API (3)    |          |
          |          +------+-------+          |
          |                 |                  |
          +-----------------+------------------+
                            |
                    +-------v--------+
                    | Scanner Engine |
                    | - matcher.go   |
                    | - verifier.go  |
                    +-------+--------+
                            |
            +---------------+----------------+
            |               |                |
      +-----v----+   +------v-----+   +------v-----+
      | Output   |   | Notify     |   | Web        |
      | - table  |   | - telegram |   | Dashboard  |
      | - json   |   | - webhook  |   | - htmx     |
      | - sarif  |   | - slack    |   | - REST API |
      | - csv    |   +------------+   | - SQLite   |
      +----------+                    +------------+
      +------------------------------------------+
      | Provider Registry (108+ YAML providers)  |
      | Dork Registry (50+ YAML dorks)           |
      +------------------------------------------+
Key Design Decisions
- YAML Providers — Adding a new provider = adding a YAML file. No recompile needed for pattern-only changes (when using external provider dir). Built-in providers are embedded at compile time.
- Keyword Pre-filtering — Before running regex, files are scanned for keywords. This provides ~10x speedup on large codebases.
- Worker Pool — Parallel scanning with configurable worker count. Default: CPU count.
- Delta-based Git Scanning — Only scans changes between commits, not entire trees.
- SQLite Storage — All scan results persisted with AES-256 encryption.
Security & Ethics
Built-in Protections
- Key values masked by default in terminal (first 8 + last 4 chars); use `--unmask` for full keys
- Full keys always available via: `--unmask`, `--output=json`, `keyhunter keys show`, web dashboard, Telegram bot
- Database is AES-256 encrypted (full keys stored encrypted)
- API tokens stored encrypted in config
- No key values written to logs during `--verify`
- Web dashboard supports basic auth / token auth
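The "first 8 + last 4" masking rule can be illustrated with a few lines of Go. The `maskKey` name, the `...` separator, and the handling of very short keys are assumptions made for the sketch; the source only states how many characters stay visible.

```go
package main

import (
	"fmt"
	"strings"
)

// maskKey keeps the first 8 and last 4 characters of a key.
// Keys of 12 characters or fewer are fully replaced with asterisks,
// since showing head and tail would reveal the whole value.
func maskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return strings.Repeat("*", len(key))
	}
	return key[:head] + "..." + key[len(key)-tail:]
}

func main() {
	fmt.Println(maskKey("sk-proj-abc123def456ghi789")) // sk-proj-...i789
	fmt.Println(maskKey("short"))                      // *****
}
```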
Rate Limiting
| Source | Rate Limit |
|---|---|
| GitHub API (auth) | 30 req/min |
| GitHub API (unauth) | 10 req/min |
| Shodan | Per API plan |
| Censys | 250 queries/day (free) |
| ZoomEye | 10,000 results/month (free) |
| FOFA | 100 results/query (free) |
| Netlas | 50 queries/day (free) |
| Google Custom Search | 100/day free, 10K/day paid |
| Bing Search | 1,000/month (free) |
| Stack Overflow | 300/day (no key), 10K/day (key) |
| HN Algolia | 10,000 req/hour |
| VirusTotal | 4 req/min (free) |
| IntelX | 10 searches/day (free) |
| GrayHatWarfare | Per plan |
| Wayback Machine | ~15 req/min |
| Paste sites | 1 req/2sec |
| npm/PyPI | Generous, be respectful |
| Trello | 100 req/10sec |
| Docker Hub | 100 pulls/6hr (unauth) |
Stealth & Ethics Flags
--stealth # User-agent rotation, increased request spacing
--respect-robots # Respect robots.txt (default: on)
Use Cases
Red Team / Pentest
# Full multi-source recon against a target org
keyhunter recon github --query="targetcorp OPENAI_API_KEY"
keyhunter recon gitlab --query="targetcorp api_key"
keyhunter recon shodan --dork='http.html:"targetcorp" "sk-"'
keyhunter recon censys --query='services.http.response.body:"targetcorp" AND "api_key"'
keyhunter recon zoomeye --query='site:targetcorp.com +"api_key"'
keyhunter recon elasticsearch --shodan # Find exposed ES with leaked keys
keyhunter recon jenkins --shodan # Exposed Jenkins with build logs
keyhunter recon dotenv --domain-list=targetcorp-subdomains.txt # .env exposure
keyhunter recon wayback --domain=targetcorp.com # Historical leaks
keyhunter recon sourcemaps --domain=app.targetcorp.com # JS source maps
keyhunter recon crtsh --domain=targetcorp.com # Discover API subdomains
keyhunter recon full --providers=openai,anthropic # Everything at once
DevSecOps / CI Pipeline
# Pre-commit hook
keyhunter hook install
# GitHub Actions step
- name: KeyHunter Scan
  run: |
    keyhunter scan path . --output=sarif > keyhunter.sarif
# Upload to GitHub Security tab
Bug Bounty
# Comprehensive target recon
keyhunter recon github --org=targetcorp --dork=auto --verify
keyhunter recon gist --query="targetcorp"
keyhunter recon paste --sources=all --query="targetcorp"
keyhunter recon postman --query="targetcorp"
keyhunter recon trello --query="targetcorp api key"
keyhunter recon notion --query="targetcorp API_KEY"
keyhunter recon confluence --shodan
keyhunter recon npm --query="targetcorp" # Check their published packages
keyhunter recon pypi --query="targetcorp"
keyhunter recon docker --query="targetcorp" --layers # Docker image layer scan
keyhunter recon apk --query="targetcorp" # Mobile app decompile
keyhunter recon swagger --domain=api.targetcorp.com
Monitoring / Alerting
# Continuous monitoring with Telegram alerts
keyhunter schedule add \
--name="monitor-github" \
--cron="*/30 * * * *" \
--command="recon github --dork=auto --providers=openai" \
--notify=telegram
keyhunter serve --telegram
Dork Examples (150+ Built-in)
GitHub
filename:.env "OPENAI_API_KEY"
filename:.env "ANTHROPIC_API_KEY"
filename:config.yaml "api_key" "sk-"
"sk-proj-" language:python
"sk-ant-api03" language:javascript
filename:docker-compose "API_KEY"
"api_key" extension:ipynb
filename:.toml "api_key" "sk-"
filename:terraform.tfvars "api_key"
"kind: Secret" "data:" filename:*.yaml # K8s secrets
filename:.npmrc "_authToken" # npm tokens
filename:requirements.txt "openai" path:.env # Python projects
GitLab
"OPENAI_API_KEY" filename:.env
"sk-ant-" filename:*.py
"api_key" filename:settings.json
Google Dorking
"sk-proj-" -github.com -stackoverflow.com # Outside known code sites
"sk-ant-api03-" filetype:env
"OPENAI_API_KEY" filetype:yml
"ANTHROPIC_API_KEY" filetype:json
inurl:.env "API_KEY"
intitle:"index of" .env
site:pastebin.com "sk-proj-"
site:replit.com "OPENAI_API_KEY"
site:codesandbox.io "sk-ant-"
site:notion.so "API_KEY"
site:trello.com "openai"
site:docs.google.com "sk-proj-"
site:medium.com "ANTHROPIC_API_KEY"
site:dev.to "sk-proj-"
site:huggingface.co "OPENAI_API_KEY"
site:kaggle.com "api_key" "sk-"
intitle:"Swagger UI" "api_key"
inurl:graphql "authorization" "Bearer sk-"
filetype:tfstate "api_key" # Terraform state
filetype:ipynb "sk-proj-" # Jupyter notebooks
Shodan
http.html:"openai" "api_key" port:8080
http.title:"LiteLLM" port:4000
http.html:"ollama" port:11434
http.title:"Kubernetes Dashboard"
"X-Jenkins" "200 OK"
http.title:"Kibana" port:5601
http.title:"Grafana"
http.title:"Swagger UI"
http.title:"Gitea" port:3000
http.html:"PrivateBin"
http.title:"MinIO Browser"
http.title:"Sentry"
http.title:"Confluence"
port:6443 "kube-apiserver"
http.html:"langchain" port:8000
Censys
services.http.response.body:"openai" and services.http.response.body:"sk-"
services.http.response.body:"langchain" and services.port:8000
services.http.response.body:"OPENAI_API_KEY"
services.http.response.body:"sk-ant-api03"
ZoomEye
app:"Elasticsearch" +"api_key"
app:"Jenkins" +openai
app:"Grafana" +anthropic
app:"Gitea"
FOFA
body="sk-proj-"
body="OPENAI_API_KEY"
body="sk-ant-api03"
title="LiteLLM"
title="Swagger UI" && body="api_key"
title="Kibana" && body="authorization"
Contributing
Adding a New Provider
- Create `providers/your-provider.yaml`:
id: your-provider
name: Your Provider
category: emerging
website: https://api.yourprovider.com
confidence: medium
patterns:
  - id: your-provider-key
    name: "Your Provider API Key"
    regex: '\byp_[A-Za-z0-9]{32}\b'
    confidence: high
    description: "Your Provider API key with yp_ prefix"
keywords:
  - "yp_"
  - "YOUR_PROVIDER_API_KEY"
verify:
  enabled: true
  method: GET
  url: "https://api.yourprovider.com/v1/models"
  headers:
    Authorization: "Bearer {{key}}"
  success_codes: [200]
  failure_codes: [401, 403]
metadata:
  docs: "https://docs.yourprovider.com"
  key_url: "https://dashboard.yourprovider.com/keys"
  env_vars: ["YOUR_PROVIDER_API_KEY"]
- Run tests: `go test ./pkg/provider/...`
- Submit a PR
Adding a New Dork
- Edit `dorks/<source>.yaml` and add your dork entry
- Submit a PR
Roadmap
- Core scanning engine (file, git, stdin)
- 108 provider YAML definitions
- Active verification for all providers
- CLI with Cobra (scan, verify, import, recon, serve)
- TruffleHog & Gitleaks import adapters
- OSINT/Recon engine (Shodan, Censys, GitHub, GitLab, Paste, S3)
- Built-in dork engine with 50+ dorks
- Web dashboard (htmx + Tailwind + SQLite)
- Telegram bot with auto-notifications
- Scheduled scanning (cron-based)
- Pre-commit hook & CI/CD integration (SARIF)
- Docker image
- Homebrew formula
Disclaimer
KeyHunter is designed for authorized security testing, defensive security, bug bounty programs, and educational purposes only. Always ensure you have proper authorization before scanning any target. Unauthorized access to computer systems is illegal.
License
MIT License - see LICENSE for details.