# KeyHunter

> The most comprehensive API key scanner for LLM/AI providers. Detect, validate, and monitor leaked API keys across 108+ providers.

[![Go](https://img.shields.io/badge/Go-1.22+-00ADD8?style=flat-square&logo=go)](https://golang.org)
[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
[![Providers](https://img.shields.io/badge/Providers-108+-red?style=flat-square)](providers/)

---

## Why KeyHunter?

Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew **81% year-over-year** in 2025, yet no other tool covers more than ~15 LLM providers.

**KeyHunter fills that gap** with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and real-time notifications.

### How It Compares

| Feature | KeyHunter | TruffleHog | Gitleaks | detect-secrets |
|---------|-----------|------------|----------|----------------|
| LLM Providers | **108+** | ~3 | ~5 | ~1 |
| Active Verification | **108+ endpoints** | ~20 types | No | No |
| OSINT/Recon | **Shodan, Censys, GitHub, GitLab, Paste, S3** | No | No | No |
| External Tool Import | **TruffleHog + Gitleaks** | - | - | - |
| Web Dashboard | **Built-in** | No | No | No |
| Telegram Bot | **Built-in** | No | No | No |
| Dork Engine | **Built-in YAML dorks** | No | No | No |
| Provider YAML Plugin | **Community-extensible** | Go code only | TOML rules | Python plugins |
| Scheduled Scanning | **Cron-based** | No | No | No |

---

## Features

### Core Scanning

- **File/Directory scanning** with recursive traversal and glob exclusions
- **Git-aware scanning** — full history, branches, stash, delta-based diffs
- **stdin/pipe** support — `cat dump.txt | keyhunter scan stdin`
- **URL fetching** — scan any remote URL content
- **Clipboard scanning** — instant clipboard content analysis

### OSINT / Recon Engine (80+ Sources, 18 Categories)

**IoT & Internet Scanners**

- **Shodan** — exposed LLM proxies, dashboards, API endpoints
- **Censys** — HTTP body search for leaked credentials
- **ZoomEye** — Chinese IoT scanner, different coverage perspective
- **FOFA** — Asian infrastructure scanning, body content search
- **Netlas** — HTTP response body keyword search
- **BinaryEdge** — internet-wide scan data

**Code Hosting & Snippets**

- **GitHub / GitLab / Bitbucket** — code search with automated dorks
- **Codeberg / Gitea instances** — alternative Git platforms (Gitea auto-discovered via Shodan)
- **Replit / CodeSandbox / StackBlitz / Glitch** — interactive dev environments with hardcoded keys
- **CodePen / JSFiddle / Observable** — browser snippet platforms
- **HuggingFace** — Spaces, repos, model configs (high-yield for LLM keys)
- **Kaggle** — notebooks and datasets with API keys
- **Jupyter / nbviewer** — shared notebooks
- **GitHub Gist** — public gist search
- **Gitpod** — workspace snapshots

**Search Engine Dorking**

- **Google** — Custom Search API / SerpAPI, 100+ built-in dorks
- **Bing** — Azure Cognitive Services search
- **DuckDuckGo / Yandex / Brave** — alternative indexes for broader coverage

**Paste Sites**

- **Multi-paste aggregator** — Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, and more

**Package Registries**

- **npm / PyPI / RubyGems / crates.io / Maven / NuGet / Packagist / Go modules** — download packages, extract source, scan for key patterns

**Container & Infrastructure**

- **Docker Hub** — image layer scanning, build arg extraction
- **Kubernetes** — exposed dashboards, public Secret/ConfigMap YAML files
- **Terraform** — state files (`.tfstate` with plaintext secrets), registry modules
- **Helm Charts / Ansible Galaxy** — default values with credentials

**Cloud Storage**

- **AWS S3 / GCS / Azure Blob / DigitalOcean Spaces / Backblaze B2** — bucket enumeration and content scanning
- **MinIO** — self-hosted instances discovered via Shodan
- **GrayHatWarfare** — searchable database of public bucket objects

**CI/CD Log Leaks**

- **Travis CI / CircleCI** — public build logs with leaked env vars
- **GitHub Actions** — workflow run log scanning
- **Jenkins** — exposed instances (Shodan-discovered), console output
- **GitLab CI/CD** — public pipeline job traces

**Web Archives**

- **Wayback Machine** — historical snapshots of removed `.env` files, config pages
- **CommonCrawl** — massive web crawl data, WARC record scanning

**Forums & Documentation**

- **Stack Overflow** — API + SEDE queries for code snippets with real keys
- **Reddit** — programming subreddit scanning
- **Hacker News** — Algolia API comment search
- **dev.to / Medium** — tutorial articles with hardcoded keys
- **Telegram groups** — public channels sharing configs and "free API keys"
- **Discord** — indexed public server content

**Collaboration Tools**

- **Notion / Confluence** — public pages and spaces with credentials
- **Trello** — public boards with API key cards
- **Google Docs/Sheets** — publicly shared documents

**Frontend & JavaScript Leaks**

- **JS Source Maps** — original source recovery with inlined secrets
- **Webpack / Vite bundles** — `REACT_APP_*`, `NEXT_PUBLIC_*`, `VITE_*` variable extraction
- **Exposed `.env` files** — misconfigured web servers serving dotenv from root
- **Swagger / OpenAPI docs** — real auth examples in API docs
- **Vercel / Netlify previews** — deploy preview JS bundles with production secrets

**Log Aggregators**

- **Elasticsearch / Kibana** — exposed instances with application logs containing API keys
- **Grafana** — exposed dashboards with datasource configs
- **Sentry** — error tracking capturing request headers with keys

**Threat Intelligence**

- **VirusTotal** — uploaded files/scripts containing embedded keys
- **Intelligence X** — aggregated paste, darknet, and leak search
- **URLhaus** — malicious URLs with API keys in parameters

**Mobile Apps**

- **APK analysis** — download, decompile, grep for key patterns (via apktool/jadx)

**DNS / Subdomain Discovery**

- **crt.sh** — Certificate Transparency log for API subdomain discovery
- **Subdomain probing** — config endpoint enumeration (`.env`, `/api/config`, `/actuator/env`)

**API Marketplaces**

- **Postman** — public collections, workspaces, environments
- **SwaggerHub** — published API definitions with example values

**`recon full`** — parallel sweep across all 80+ sources with deduplication and unified reporting

### Active Verification

- Lightweight API calls to verify if detected keys are active
- Permission and scope extraction (org, rate limits, model access)
- Configurable via `--verify` flag (off by default)
- Provider-specific verification endpoints

### External Tool Integration

- **Import TruffleHog** JSON output — enrich with LLM-specific analysis
- **Import Gitleaks** JSON output — cross-reference with 108+ providers
- Generic CSV import for custom tool output

### Notifications & Dashboard

- **Telegram Bot** — scan triggers, key alerts, recon results
- **Web Dashboard** — htmx + Tailwind, SQLite-backed, real-time scan viewer
- **Webhook** — generic HTTP POST notifications
- **Slack** — workspace notifications
- **Scheduled scans** — cron-based recurring scans with auto-notify

---

## Quick Start

### Install

```bash
# From source
go install github.com/keyhunter/keyhunter@latest

# Binary release
curl -sSL https://get.keyhunter.dev | bash

# Docker
docker pull keyhunter/keyhunter:latest
```

### Basic Usage

```bash
# Scan a directory
keyhunter scan path ./my-project/

# Scan with active verification
keyhunter scan path ./my-project/ --verify

# Scan git history (last 30 days)
keyhunter scan git . --since="30 days ago"

# Scan from pipe
cat secrets.txt | keyhunter scan stdin

# Scan only specific providers
keyhunter scan path . --providers=openai,anthropic,deepseek

# JSON output
keyhunter scan path . --output=json > results.json
```

### OSINT / Recon

```bash
# ── IoT & Internet Scanners ──
keyhunter recon shodan --dork="http.title:\"LiteLLM\" port:4000"
keyhunter recon censys --query='services.http.response.body:"sk-proj-"'
keyhunter recon zoomeye --query='app:"Elasticsearch" +"api_key"'
keyhunter recon fofa --query='body="OPENAI_API_KEY"'
keyhunter recon netlas --query='http.body:"sk-ant-"'

# ── Code Hosting ──
keyhunter recon github --dork=auto  # All built-in GitHub dorks
keyhunter recon gitlab --dork=auto
keyhunter recon bitbucket --query="OPENAI_API_KEY"
keyhunter recon replit --query="sk-proj-"  # Public repls
keyhunter recon huggingface --spaces --query="api_key"  # HF Spaces
keyhunter recon kaggle --notebooks --query="openai"
keyhunter recon codesandbox --query="sk-ant-"
keyhunter recon glitch --query="ANTHROPIC_API_KEY"
keyhunter recon gitea --instances-from=shodan  # Auto-discover Gitea instances

# ── Search Engine Dorking ──
keyhunter recon google --dork=auto  # 100+ built-in Google dorks
keyhunter recon google --dork='"sk-proj-" -github.com filetype:env'
keyhunter recon bing --dork=auto
keyhunter recon brave --query="OPENAI_API_KEY filetype:yaml"

# ── Package Registries ──
keyhunter recon npm --recent --query="openai"  # Scan new packages
keyhunter recon pypi --recent --query="llm"
keyhunter recon crates --query="api_key"

# ── Cloud Storage ──
keyhunter recon s3 --domain=targetcorp  # S3 bucket enumeration
keyhunter recon gcs --domain=targetcorp  # GCS buckets
keyhunter recon azure --domain=targetcorp  # Azure Blob
keyhunter recon minio --shodan  # Exposed MinIO instances
keyhunter recon grayhat --query="openai api_key"  # GrayHatWarfare search

# ── CI/CD Logs ──
keyhunter recon ghactions --org=targetcorp  # GitHub Actions logs
keyhunter recon travis --org=targetcorp
keyhunter recon jenkins --shodan  # Exposed Jenkins instances
keyhunter recon circleci --org=targetcorp

# ── Web Archives ──
keyhunter recon wayback --domain=targetcorp.com  # Wayback Machine
keyhunter recon commoncrawl --domain=targetcorp.com

# ── Frontend & JS ──
keyhunter recon dotenv --domain-list=targets.txt  # Exposed .env files
keyhunter recon sourcemaps --domain=app.target.com  # JS source maps
keyhunter recon webpack --url=https://app.target.com/main.js
keyhunter recon swagger --shodan  # Exposed Swagger UIs
keyhunter recon deploys --domain=targetcorp  # Vercel/Netlify previews

# ── Forums ──
keyhunter recon stackoverflow --query="sk-proj-"
keyhunter recon reddit --subreddit=openai --query="api key"
keyhunter recon hackernews --query="leaked api key"
keyhunter recon telegram-groups --query="free api key"

# ── Collaboration ──
keyhunter recon notion --query="API_KEY"  # Found via Google dorks
keyhunter recon confluence --shodan  # Exposed instances
keyhunter recon trello --query="openai api key"

# ── Log Aggregators ──
keyhunter recon elasticsearch --shodan  # Exposed ES instances
keyhunter recon grafana --shodan
keyhunter recon sentry --shodan

# ── Threat Intelligence ──
keyhunter recon virustotal --query="sk-proj-"
keyhunter recon intelx --query="sk-ant-api03"  # Intelligence X
keyhunter recon urlhaus --query="openai"

# ── Mobile Apps ──
keyhunter recon apk --query="ai chatbot"  # APK download + decompile

# ── DNS/Subdomain ──
keyhunter recon crtsh --domain=targetcorp.com  # Cert transparency
keyhunter recon subdomain --domain=targetcorp.com --probe-configs

# ── Full Sweep ──
keyhunter recon full --providers=openai,anthropic  # ALL 80+ sources in parallel
keyhunter recon full --categories=code,cloud  # Category-filtered sweep

# ── Dork Management ──
keyhunter dorks list  # All dorks across all sources
keyhunter dorks list --source=github
keyhunter dorks list --source=google
keyhunter dorks add github 'filename:.env "GROQ_API_KEY"'
keyhunter dorks run google --category=frontier  # Run Google dorks for frontier providers
keyhunter dorks export
```

### Viewing Full API Keys

By default, keys are masked in the terminal (protection against shoulder surfing).
Ways to access the full key:

```bash
# 1. Show full keys in the CLI with the --unmask flag
keyhunter scan path . --unmask
# Provider     | Key                                          | Confidence | File          | Line | Status
# ─────────────┼──────────────────────────────────────────────┼────────────┼───────────────┼──────┼────────
# OpenAI       | sk-proj-abc123def456ghi789jkl012mno345pqr678 | HIGH       | src/config.py | 42   | ACTIVE

# 2. JSON export: always contains the full key
keyhunter scan path . --output=json > results.json

# 3. Key management commands: manage all discovered keys
keyhunter keys list  # Masked list
keyhunter keys list --unmask  # List with full keys
keyhunter keys show  # Full details for a single key (always unmasked)
keyhunter keys copy  # Copy the key to the clipboard
keyhunter keys export --format=json  # Export all keys with their full values
keyhunter keys verify  # Verify the key and show full details

# 4. Web Dashboard: "Reveal Key" button on the /keys/:id page

# 5. Telegram Bot: full key via the /key command
```

**Example `keyhunter keys show` output:**

```
ID:          a3f7b2c1
Provider:    OpenAI
Pattern:     OpenAI Project Key
Key:         sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234
Confidence:  HIGH
Source:      src/config.py:42
Found:       2026-04-04 14:32:01
Scan ID:     scan_001
Status:      ACTIVE (verified 2026-04-04 14:32:05)
Org:         my-org
Rate Limit:  500 req/min
Revoke URL:  https://platform.openai.com/api-keys
```

### Verify a Single Key

```bash
keyhunter verify sk-proj-abc123...

# Output:
# Provider:   OpenAI
# Status:     ACTIVE
# Org:        my-org
# Rate Limit: 500 req/min
# Revoke:     https://platform.openai.com/api-keys
```

### Import External Tools

```bash
# Run TruffleHog, then enrich with KeyHunter
# (TruffleHog's git source expects a URI, hence file://.)
trufflehog git file://. --json > trufflehog.json
keyhunter import trufflehog trufflehog.json --verify

# Run Gitleaks, then enrich
gitleaks detect -r gitleaks.json
keyhunter import gitleaks gitleaks.json
```

### Web Dashboard & Telegram Bot

```bash
# Start web dashboard
keyhunter serve --port=8080

# Start with Telegram bot
keyhunter serve --port=8080 --telegram

# Configure Telegram
keyhunter config set telegram.token "YOUR_BOT_TOKEN"
keyhunter config set telegram.chat_id "YOUR_CHAT_ID"
```

### CI/CD Integration

KeyHunter ships with a git **pre-commit hook** that blocks leaks before they land in history, a **GitHub Actions** integration that uploads SARIF findings directly into the repository's Code Scanning tab, and an `import` command that consolidates TruffleHog and Gitleaks output into one normalized database.

```bash
# Install pre-commit hook (scans staged files only)
keyhunter hook install

# GitHub Actions (SARIF output for Code Scanning upload)
keyhunter scan . --output sarif > keyhunter.sarif

# Import findings from other scanners
keyhunter import --format=trufflehog trufflehog.json
keyhunter import --format=gitleaks gitleaks.json

# Exit codes: 0 = clean, 1 = keys found, 2 = error
# (a plain `&& ... || ...` chain would report an error as "Keys found!")
keyhunter scan .
case $? in
  0) echo "Clean" ;;
  1) echo "Keys found!" ;;
  *) echo "Scan error" ;;
esac
```

See [docs/CI-CD.md](docs/CI-CD.md) for the full guide, including a copy-paste GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle.

### Scheduled Scanning

```bash
# Daily GitHub recon at 09:00
keyhunter schedule add \
  --name="daily-github" \
  --cron="0 9 * * *" \
  --command="recon github --dork=auto" \
  --notify=telegram

# Hourly paste site monitoring
keyhunter schedule add \
  --name="hourly-paste" \
  --cron="0 * * * *" \
  --command="recon paste --sources=pastebin" \
  --notify=telegram

keyhunter schedule list
keyhunter schedule remove daily-github
```

---

## Configuration

```bash
# Initialize config
keyhunter config init  # Creates ~/.keyhunter.yaml

# Set API keys for recon sources
keyhunter config set shodan.apikey "YOUR_SHODAN_KEY"
keyhunter config set censys.api_id "YOUR_CENSYS_ID"
keyhunter config set censys.api_secret "YOUR_CENSYS_SECRET"
keyhunter config set github.token "YOUR_GITHUB_TOKEN"
keyhunter config set gitlab.token "YOUR_GITLAB_TOKEN"
keyhunter config set zoomeye.apikey "YOUR_ZOOMEYE_KEY"
keyhunter config set fofa.email "YOUR_FOFA_EMAIL"
keyhunter config set fofa.apikey "YOUR_FOFA_KEY"
keyhunter config set netlas.apikey "YOUR_NETLAS_KEY"
keyhunter config set binaryedge.apikey "YOUR_BINARYEDGE_KEY"
keyhunter config set google.cx "YOUR_GOOGLE_CX_ID"
keyhunter config set google.apikey "YOUR_GOOGLE_API_KEY"
keyhunter config set bing.apikey "YOUR_BING_API_KEY"
keyhunter config set brave.apikey "YOUR_BRAVE_API_KEY"
keyhunter config set virustotal.apikey "YOUR_VT_KEY"
keyhunter config set intelx.apikey "YOUR_INTELX_KEY"
keyhunter config set grayhat.apikey "YOUR_GRAYHAT_KEY"
keyhunter config set reddit.client_id "YOUR_REDDIT_ID"
keyhunter config set reddit.client_secret "YOUR_REDDIT_SECRET"
keyhunter config set stackoverflow.apikey "YOUR_SO_KEY"
keyhunter config set kaggle.username "YOUR_KAGGLE_USER"
keyhunter config set kaggle.apikey "YOUR_KAGGLE_KEY"

# Set notification channels
keyhunter config set telegram.token "YOUR_BOT_TOKEN"
keyhunter config set telegram.chat_id "YOUR_CHAT_ID"
keyhunter config set webhook.url "https://your-webhook.com/alert"

# Database encryption
keyhunter config set db.password "YOUR_DB_PASSWORD"
```

### Config File (`~/.keyhunter.yaml`)

```yaml
scan:
  workers: 8
  verify_timeout: 10s
  default_output: table
  respect_robots: true

recon:
  stealth: false
  rate_limits:
    github: 30         # req/min
    shodan: 1          # req/sec
    censys: 5          # req/sec
    zoomeye: 10        # req/sec
    fofa: 1            # req/sec
    netlas: 1          # req/sec
    google: 100        # req/day (Custom Search API)
    bing: 3            # req/sec
    stackoverflow: 30  # req/sec
    hackernews: 100    # req/min
    paste: 0.5         # req/sec
    npm: 10            # req/sec
    pypi: 5            # req/sec
    virustotal: 4      # req/min (free tier)
    intelx: 10         # req/day (free tier)
    grayhat: 5         # req/sec
    wayback: 15        # req/min
    trello: 10         # req/sec
    devto: 1           # req/sec

telegram:
  token: "encrypted:..."
  chat_id: "123456789"
  auto_notify: true

web:
  port: 8080
  auth:
    enabled: false
    username: admin
    password: "encrypted:..."

db:
  path: ~/.keyhunter/keyhunter.db
  encrypted: true
```

---

## Supported Providers (108)

### Tier 1 — Frontier

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| OpenAI | `sk-proj-*`, `sk-svcacct-*` | High | `GET /v1/models` |
| Anthropic | `sk-ant-api03-*` | High | `GET /v1/models` |
| Google AI (Gemini) | `AIza*` | High | `GET /v1/models` |
| Google Vertex AI | OAuth token | Medium | `GET /v1/models` |
| AWS Bedrock | `AKIA*` | High | `GetFoundationModel` |
| Azure OpenAI | 32-char hex | Medium | `GET /openai/deployments` |
| Meta AI | `meta-llama-*` | Medium | `GET /v1/models` |
| xAI (Grok) | `xai-*` | High | `GET /v1/models` |
| Cohere | `co-*` | High | `GET /v1/models` |
| Mistral AI | 32-char generic | Low | `GET /v1/models` |
| Inflection AI | Generic UUID | Low | `GET /api/models` |
| AI21 Labs | Generic key | Low | `GET /v1/models` |

### Tier 2 — Inference Platforms

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Together AI | Generic key | Low | `GET /v1/models` |
| Fireworks AI | `fw_*` | High | `GET /v1/models` |
| Groq | `gsk_*` | High | `GET /openai/v1/models` |
| Replicate | `r8_*` | High | `GET /v1/predictions` |
| Anyscale | Generic key | Low | `GET /v1/models` |
| DeepInfra | Generic key | Low | `GET /v1/models` |
| Lepton AI | `lpt_*` | High | `GET /v1/models` |
| Modal | Generic token | Low | `GET /api/apps` |
| Baseten | Generic key | Low | `GET /v1/models` |
| Cerebrium | Generic key | Low | `GET /v1/models` |
| NovitaAI | Generic key | Low | `GET /v1/models` |
| Sambanova | Generic key | Low | `GET /v1/models` |
| OctoAI | Generic key | Low | `GET /v1/models` |
| Friendli AI | Generic key | Low | `GET /v1/models` |

### Tier 3 — Specialized/Vertical

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Perplexity | `pplx-*` | High | `GET /chat/completions` |
| You.com | Generic key | Low | `GET /v1/search` |
| Voyage AI | `voy-*` | High | `GET /v1/models` |
| Jina AI | `jina_*` | High | `GET /v1/models` |
| Unstructured | Generic key | Low | `GET /general/v0/general` |
| AssemblyAI | Generic key | Low | `GET /v2/transcript` |
| Deepgram | Generic key | Low | `GET /v1/projects` |
| ElevenLabs | `el_*` | High | `GET /v1/user` |
| Stability AI | `sk-*` | Medium | `GET /v1/engines/list` |
| Runway ML | Generic key | Low | `GET /v1/models` |
| Midjourney | Generic key | Low | N/A |
| HuggingFace | `hf_*` | High | `GET /api/whoami` |

### Tier 4 — Chinese/Regional

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| DeepSeek | `sk-*` | Medium | `GET /v1/models` |
| Baichuan | Generic key | Low | `GET /v1/models` |
| Zhipu AI (GLM) | Generic key | Low | `POST /api/paas/v4/chat` |
| Moonshot AI (Kimi) | `sk-*` | Medium | `GET /v1/models` |
| Yi (01.AI) | Generic key | Low | `GET /v1/models` |
| Qwen (Alibaba) | `sk-*` | Medium | `GET /v1/models` |
| Baidu (ERNIE) | API Key + Secret | Medium | Token endpoint |
| ByteDance (Doubao) | Generic key | Low | `GET /v1/models` |
| SenseTime | Generic key | Low | `GET /v1/models` |
| iFlytek (Spark) | API Key + Secret | Medium | WebSocket handshake |
| MiniMax | Generic key | Low | `GET /v1/models` |
| Stepfun | Generic key | Low | `GET /v1/models` |
| 360 AI | Generic key | Low | `GET /v1/models` |
| Kuaishou (Kling) | Generic key | Low | `GET /v1/models` |
| Tencent Hunyuan | SecretId + SecretKey | Medium | `DescribeModels` |
| SiliconFlow | `sf_*` | High | `GET /v1/models` |

### Tier 5 — Infrastructure/Gateway

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Cloudflare AI | Cloudflare API token | Medium | `GET /ai/models` |
| Vercel AI | `vercel_*` | High | `GET /v1/models` |
| LiteLLM | Generic key | Low | `GET /v1/models` |
| Portkey | Generic key | Low | `GET /v1/models` |
| Helicone | `sk-helicone-*` | High | `GET /v1/models` |
| OpenRouter | `sk-or-*` | High | `GET /api/v1/models` |
| Martian | Generic key | Low | `GET /v1/models` |
| AI Gateway (Kong) | Generic key | Low | Health endpoint |
| BricksAI | Generic key | Low | `GET /v1/models` |
| Aether | Generic key | Low | `GET /v1/models` |
| Not Diamond | Generic key | Low | `GET /v1/models` |

### Tier 6 — Emerging/Niche

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Reka AI | Generic key | Low | `GET /v1/models` |
| Aleph Alpha | Generic key | Low | `GET /models` |
| Writer | Generic key | Low | `GET /v1/models` |
| Jasper AI | Generic key | Low | N/A |
| Typeface | Generic key | Low | N/A |
| Comet ML | Generic key | Low | `GET /api/rest/v2` |
| Weights & Biases | Generic key | Low | `GET /api/v1/viewer` |
| LangSmith | `ls__*` | High | `GET /api/v1/info` |
| Pinecone | Generic key | Low | `GET /databases` |
| Weaviate | Generic key | Low | `GET /v1/meta` |
| Qdrant | Generic key | Low | `GET /collections` |
| Chroma | Generic key | Low | `GET /api/v1/heartbeat` |
| Milvus | Generic key | Low | `GET /v1/vector/collections` |
| Neon AI | Generic key | Low | N/A |
| Lamini | Generic key | Low | `GET /v1/models` |

### Tier 7 — Code & Dev Tools

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| GitHub Copilot | `ghu_*`, `ghp_*` | High | `GET /user` |
| Cursor | Generic key | Low | N/A |
| Tabnine | Generic key | Low | N/A |
| Codeium/Windsurf | Generic key | Low | N/A |
| Sourcegraph Cody | `sgp_*` | High | `GET /.api/current-user` |
| Amazon CodeWhisperer | `AKIA*` | High | STS GetCallerIdentity |
| Replit AI | Generic key | Low | N/A |
| Codestral (Mistral) | Generic key | Low | `GET /v1/models` |
| IBM watsonx.ai | `ibm_*` | Medium | IAM token endpoint |
| Oracle AI | Generic key | Low | N/A |

### Tier 8 — Self-Hosted/Open Infra

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Ollama | N/A (local) | N/A | `GET /api/tags` |
| vLLM | Generic key | Low | `GET /v1/models` |
| LocalAI | Generic key | Low | `GET /v1/models` |
| LM Studio | N/A (local) | N/A | `GET /v1/models` |
| llama.cpp | N/A (local) | N/A | `GET /health` |
| GPT4All | N/A (local) | N/A | N/A |
| text-generation-webui | Generic key | Low | `GET /v1/models` |
| TensorRT-LLM | N/A | N/A | Health endpoint |
| Triton Inference Server | N/A | N/A | `GET /v2/health/ready` |
| Jan AI | N/A (local) | N/A | `GET /v1/models` |

### Tier 9 — Enterprise/Legacy

| Provider | Key Pattern | Confidence | Verify |
|----------|-------------|------------|--------|
| Salesforce Einstein | Generic token | Low | REST API |
| ServiceNow AI | Generic token | Low | REST API |
| SAP AI Core | OAuth token | Low | Token endpoint |
| Palantir AIP | Generic token | Low | REST API |
| Databricks (DBRX) | `dapi*` | High | `GET /api/2.0/clusters` |
| Snowflake Cortex | JWT token | Medium | SQL endpoint |
| Oracle Generative AI | Generic key | Low | REST API |
| HPE GreenLake AI | Generic token | Low | REST API |

---

## Architecture

```
                +------------------+
                |   CLI (Cobra)    |
                +--------+---------+
                         |
       +-----------------+-----------------+
       |                 |                 |
+------v-------+  +------v-------+  +------v-------+
| Input        |  | Recon        |  | Import       |
| Adapters     |  | Engine       |  | Adapters     |
| - file       |  | (80+ src)    |  | - trufflehog |
| - git        |  | - IoT (6)    |  | - gitleaks   |
| - stdin      |  | - Code(16)   |  | - generic    |
| - url        |  | - Search(5)  |  +------+-------+
| - clipboard  |  | - Paste(8+)  |         |
+------+-------+  | - Pkg (8)    |         |
       |          | - Cloud(7)   |         |
       |          | - CI/CD(5)   |         |
       |          | - Archive2   |         |
       |          | - Forum(7)   |         |
       |          | - Collab(4)  |         |
       |          | - JS/FE(5)   |         |
       |          | - Logs (3)   |         |
       |          | - Intel(3)   |         |
       |          | - Mobile(1)  |         |
       |          | - DNS (2)    |         |
       |          | - API (3)    |         |
       |          +------+-------+         |
       |                 |                 |
       +-----------------+-----------------+
                         |
                 +-------v--------+
                 | Scanner Engine |
                 | - matcher.go   |
                 | - verifier.go  |
                 +-------+--------+
                         |
         +---------------+---------------+
         |               |               |
   +-----v------+  +-----v------+  +-----v------+
   | Output     |  | Notify     |  | Web        |
   | - table    |  | - telegram |  | Dashboard  |
   | - json     |  | - webhook  |  | - htmx     |
   | - sarif    |  | - slack    |  | - REST API |
   | - csv      |  +------------+  | - SQLite   |
   +------------+                  +------------+

+------------------------------------------+
| Provider Registry (108+ YAML providers)  |
| Dork Registry (50+ YAML dorks)           |
+------------------------------------------+
```

### Key Design Decisions

- **YAML Providers** — Adding a new provider = adding a YAML file. No recompile needed for pattern-only changes (when using external provider dir). Built-in providers are embedded at compile time.
- **Keyword Pre-filtering** — Before running regex, files are scanned for keywords. This provides ~10x speedup on large codebases.
- **Worker Pool** — Parallel scanning with configurable worker count. Default: CPU count.
- **Delta-based Git Scanning** — Only scans changes between commits, not entire trees.
- **SQLite Storage** — All scan results persisted with AES-256 encryption.

---

## Security & Ethics

### Built-in Protections

- Key values **masked by default** in terminal (first 8 + last 4 chars) — use `--unmask` for full keys
- **Full keys always available** via: `--unmask`, `--output=json`, `keyhunter keys show`, web dashboard, Telegram bot
- Database is **AES-256 encrypted** (full keys stored encrypted)
- API tokens stored **encrypted** in config
- No key values written to logs during `--verify`
- Web dashboard supports **basic auth / token auth**

### Rate Limiting

| Source | Rate Limit |
|--------|-----------|
| GitHub API (auth) | 30 req/min |
| GitHub API (unauth) | 10 req/min |
| Shodan | Per API plan |
| Censys | 250 queries/day (free) |
| ZoomEye | 10,000 results/month (free) |
| FOFA | 100 results/query (free) |
| Netlas | 50 queries/day (free) |
| Google Custom Search | 100/day free, 10K/day paid |
| Bing Search | 1,000/month (free) |
| Stack Overflow | 300/day (no key), 10K/day (key) |
| HN Algolia | 10,000 req/hour |
| VirusTotal | 4 req/min (free) |
| IntelX | 10 searches/day (free) |
| GrayHatWarfare | Per plan |
| Wayback Machine | ~15 req/min |
| Paste sites | 1 req/2sec |
| npm/PyPI | Generous, be respectful |
| Trello | 100 req/10sec |
| Docker Hub | 100 pulls/6hr (unauth) |

### Stealth & Ethics Flags

```bash
--stealth          # User-agent rotation, increased request spacing
--respect-robots   # Respect robots.txt (default: on)
```

---

## Use Cases

### Red Team / Pentest

```bash
# Full multi-source recon against a target org
keyhunter recon github --query="targetcorp OPENAI_API_KEY"
keyhunter recon gitlab --query="targetcorp api_key"
keyhunter recon shodan --dork='http.html:"targetcorp" "sk-"'
keyhunter recon censys --query='services.http.response.body:"targetcorp" AND "api_key"'
keyhunter recon zoomeye --query='site:targetcorp.com +"api_key"'
keyhunter recon elasticsearch --shodan  # Find exposed ES with leaked keys
keyhunter recon jenkins --shodan  # Exposed Jenkins with build logs
keyhunter recon dotenv --domain-list=targetcorp-subdomains.txt  # .env exposure
keyhunter recon wayback --domain=targetcorp.com  # Historical leaks
keyhunter recon sourcemaps --domain=app.targetcorp.com  # JS source maps
keyhunter recon crtsh --domain=targetcorp.com  # Discover API subdomains
keyhunter recon full --providers=openai,anthropic  # Everything at once
```

### DevSecOps / CI Pipeline

```bash
# Pre-commit hook
keyhunter hook install

# GitHub Actions step
- name: KeyHunter Scan
  run: |
    keyhunter scan path . --output=sarif > keyhunter.sarif
    # Upload to GitHub Security tab
```

### Bug Bounty

```bash
# Comprehensive target recon
keyhunter recon github --org=targetcorp --dork=auto --verify
keyhunter recon gist --query="targetcorp"
keyhunter recon paste --sources=all --query="targetcorp"
keyhunter recon postman --query="targetcorp"
keyhunter recon trello --query="targetcorp api key"
keyhunter recon notion --query="targetcorp API_KEY"
keyhunter recon confluence --shodan
keyhunter recon npm --query="targetcorp"  # Check their published packages
keyhunter recon pypi --query="targetcorp"
keyhunter recon docker --query="targetcorp" --layers  # Docker image layer scan
keyhunter recon apk --query="targetcorp"  # Mobile app decompile
keyhunter recon swagger --domain=api.targetcorp.com
```

### Monitoring / Alerting

```bash
# Continuous monitoring with Telegram alerts
keyhunter schedule add \
  --name="monitor-github" \
  --cron="*/30 * * * *" \
  --command="recon github --dork=auto --providers=openai" \
  --notify=telegram

keyhunter serve --telegram
```

---

## Dork Examples (150+ Built-in)

### GitHub

```
filename:.env "OPENAI_API_KEY"
filename:.env "ANTHROPIC_API_KEY"
filename:config.yaml "api_key" "sk-"
"sk-proj-" language:python
"sk-ant-api03" language:javascript
filename:docker-compose "API_KEY"
"api_key" extension:ipynb
filename:.toml "api_key" "sk-"
filename:terraform.tfvars "api_key"
"kind: Secret" "data:" filename:*.yaml  # K8s secrets
filename:.npmrc "_authToken"  # npm tokens
filename:requirements.txt "openai" path:.env  # Python projects
```

### GitLab

```
"OPENAI_API_KEY" filename:.env
"sk-ant-" filename:*.py
"api_key" filename:settings.json
```

### Google Dorking

```
"sk-proj-" -github.com -stackoverflow.com  # Outside known code sites
"sk-ant-api03-" filetype:env
"OPENAI_API_KEY" filetype:yml
"ANTHROPIC_API_KEY" filetype:json
inurl:.env "API_KEY"
intitle:"index of" .env
site:pastebin.com "sk-proj-"
site:replit.com "OPENAI_API_KEY"
site:codesandbox.io "sk-ant-"
site:notion.so "API_KEY"
site:trello.com "openai"
site:docs.google.com "sk-proj-"
site:medium.com "ANTHROPIC_API_KEY"
site:dev.to "sk-proj-"
site:huggingface.co "OPENAI_API_KEY"
site:kaggle.com "api_key" "sk-"
intitle:"Swagger UI" "api_key"
inurl:graphql "authorization" "Bearer sk-"
filetype:tfstate "api_key"  # Terraform state
filetype:ipynb "sk-proj-"  # Jupyter notebooks
```

### Shodan

```
http.html:"openai" "api_key" port:8080
http.title:"LiteLLM" port:4000
http.html:"ollama" port:11434
http.title:"Kubernetes Dashboard"
"X-Jenkins" "200 OK"
http.title:"Kibana" port:5601
http.title:"Grafana"
http.title:"Swagger UI"
http.title:"Gitea" port:3000
http.html:"PrivateBin"
http.title:"MinIO Browser"
http.title:"Sentry"
http.title:"Confluence"
port:6443 "kube-apiserver"
http.html:"langchain" port:8000
```

### Censys

```
services.http.response.body:"openai" and services.http.response.body:"sk-"
services.http.response.body:"langchain" and services.port:8000
services.http.response.body:"OPENAI_API_KEY"
services.http.response.body:"sk-ant-api03"
```

### ZoomEye

```
app:"Elasticsearch" +"api_key"
app:"Jenkins" +openai
app:"Grafana" +anthropic
app:"Gitea"
```

### FOFA

```
body="sk-proj-"
body="OPENAI_API_KEY"
body="sk-ant-api03"
title="LiteLLM"
title="Swagger UI" && body="api_key"
title="Kibana" && body="authorization"
```

---

## Contributing

### Adding a New Provider

1. Create `providers/your-provider.yaml`:

   ```yaml
   id: your-provider
   name: Your Provider
   category: emerging
   website: https://api.yourprovider.com
   confidence: medium

   patterns:
     - id: your-provider-key
       name: "Your Provider API Key"
       regex: '\byp_[A-Za-z0-9]{32}\b'
       confidence: high
       description: "Your Provider API key with yp_ prefix"

   keywords:
     - "yp_"
     - "YOUR_PROVIDER_API_KEY"

   verify:
     enabled: true
     method: GET
     url: "https://api.yourprovider.com/v1/models"
     headers:
       Authorization: "Bearer {{key}}"
     success_codes: [200]
     failure_codes: [401, 403]

   metadata:
     docs: "https://docs.yourprovider.com"
     key_url: "https://dashboard.yourprovider.com/keys"
     env_vars: ["YOUR_PROVIDER_API_KEY"]
   ```

2. Run tests: `go test ./pkg/provider/...`
3. Submit a PR

### Adding a New Dork

1. Edit `dorks/.yaml` and add your dork entry
2. Submit a PR

---

## Roadmap

- [ ] Core scanning engine (file, git, stdin)
- [ ] 108 provider YAML definitions
- [ ] Active verification for all providers
- [ ] CLI with Cobra (scan, verify, import, recon, serve)
- [ ] TruffleHog & Gitleaks import adapters
- [ ] OSINT/Recon engine (Shodan, Censys, GitHub, GitLab, Paste, S3)
- [ ] Built-in dork engine with 50+ dorks
- [ ] Web dashboard (htmx + Tailwind + SQLite)
- [ ] Telegram bot with auto-notifications
- [ ] Scheduled scanning (cron-based)
- [ ] Pre-commit hook & CI/CD integration (SARIF)
- [ ] Docker image
- [ ] Homebrew formula

---

## Disclaimer

KeyHunter is designed for **authorized security testing**, **defensive security**, **bug bounty programs**, and **educational purposes** only. Always ensure you have proper authorization before scanning any target. Unauthorized access to computer systems is illegal.

---

## License

MIT License - see [LICENSE](LICENSE) for details.