From 87c5a002035f09197daec3152cd0dbcb49a08495 Mon Sep 17 00:00:00 2001 From: salvacybersec Date: Sun, 5 Apr 2026 23:58:31 +0300 Subject: [PATCH] docs(07-06): link README CI/CD section to full guide - Expand CI/CD Integration section with import examples - Link to docs/CI-CD.md for full walkthrough --- README.md | 1007 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1007 insertions(+) create mode 100644 README.md diff --git a/README.md b/README.md new file mode 100644 index 0000000..20d72c7 --- /dev/null +++ b/README.md @@ -0,0 +1,1007 @@ +# KeyHunter + +> The most comprehensive API key scanner for LLM/AI providers. Detect, validate, and monitor leaked API keys across 108+ providers. + +[![Go](https://img.shields.io/badge/Go-1.22+-00ADD8?style=flat-square&logo=go)](https://golang.org) +[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE) +[![Providers](https://img.shields.io/badge/Providers-108+-red?style=flat-square)](providers/) + +--- + +## Why KeyHunter? + +Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew **81% year-over-year** in 2025, yet no tool covers more than ~15 LLM providers. + +**KeyHunter fills that gap** with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and real-time notifications. 
+ +### How It Compares + +| Feature | KeyHunter | TruffleHog | Gitleaks | detect-secrets | +|---------|-----------|------------|----------|----------------| +| LLM Providers | **108+** | ~3 | ~5 | ~1 | +| Active Verification | **108+ endpoints** | ~20 types | No | No | +| OSINT/Recon | **Shodan, Censys, GitHub, GitLab, Paste, S3** | No | No | No | +| External Tool Import | **TruffleHog + Gitleaks** | - | - | - | +| Web Dashboard | **Built-in** | No | No | No | +| Telegram Bot | **Built-in** | No | No | No | +| Dork Engine | **Built-in YAML dorks** | No | No | No | +| Provider YAML Plugin | **Community-extensible** | Go code only | TOML rules | Python plugins | +| Scheduled Scanning | **Cron-based** | No | No | No | + +--- + +## Features + +### Core Scanning +- **File/Directory scanning** with recursive traversal and glob exclusions +- **Git-aware scanning** — full history, branches, stash, delta-based diffs +- **stdin/pipe** support — `cat dump.txt | keyhunter scan stdin` +- **URL fetching** — scan any remote URL content +- **Clipboard scanning** — instant clipboard content analysis + +### OSINT / Recon Engine (80+ Sources, 18 Categories) + +**IoT & Internet Scanners** +- **Shodan** — exposed LLM proxies, dashboards, API endpoints +- **Censys** — HTTP body search for leaked credentials +- **ZoomEye** — Chinese IoT scanner, different coverage perspective +- **FOFA** — Asian infrastructure scanning, body content search +- **Netlas** — HTTP response body keyword search +- **BinaryEdge** — internet-wide scan data + +**Code Hosting & Snippets** +- **GitHub / GitLab / Bitbucket** — code search with automated dorks +- **Codeberg / Gitea instances** — alternative Git platforms (Gitea auto-discovered via Shodan) +- **Replit / CodeSandbox / StackBlitz / Glitch** — interactive dev environments with hardcoded keys +- **CodePen / JSFiddle / Observable** — browser snippet platforms +- **HuggingFace** — Spaces, repos, model configs (high-yield for LLM keys) +- **Kaggle** — 
notebooks and datasets with API keys +- **Jupyter / nbviewer** — shared notebooks +- **GitHub Gist** — public gist search +- **Gitpod** — workspace snapshots + +**Search Engine Dorking** +- **Google** — Custom Search API / SerpAPI, 100+ built-in dorks +- **Bing** — Azure Cognitive Services search +- **DuckDuckGo / Yandex / Brave** — alternative indexes for broader coverage + +**Paste Sites** +- **Multi-paste aggregator** — Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, and more + +**Package Registries** +- **npm / PyPI / RubyGems / crates.io / Maven / NuGet / Packagist / Go modules** — download packages, extract source, scan for key patterns + +**Container & Infrastructure** +- **Docker Hub** — image layer scanning, build arg extraction +- **Kubernetes** — exposed dashboards, public Secret/ConfigMap YAML files +- **Terraform** — state files (`.tfstate` with plaintext secrets), registry modules +- **Helm Charts / Ansible Galaxy** — default values with credentials + +**Cloud Storage** +- **AWS S3 / GCS / Azure Blob / DigitalOcean Spaces / Backblaze B2** — bucket enumeration and content scanning +- **MinIO** — self-hosted instances discovered via Shodan +- **GrayHatWarfare** — searchable database of public bucket objects + +**CI/CD Log Leaks** +- **Travis CI / CircleCI** — public build logs with leaked env vars +- **GitHub Actions** — workflow run log scanning +- **Jenkins** — exposed instances (Shodan-discovered), console output +- **GitLab CI/CD** — public pipeline job traces + +**Web Archives** +- **Wayback Machine** — historical snapshots of removed `.env` files, config pages +- **CommonCrawl** — massive web crawl data, WARC record scanning + +**Forums & Documentation** +- **Stack Overflow** — API + SEDE queries for code snippets with real keys +- **Reddit** — programming subreddit scanning +- **Hacker News** — Algolia API comment search +- **dev.to / Medium** — tutorial articles with hardcoded keys +- **Telegram groups** — public channels sharing configs 
and "free API keys" +- **Discord** — indexed public server content + +**Collaboration Tools** +- **Notion / Confluence** — public pages and spaces with credentials +- **Trello** — public boards with API key cards +- **Google Docs/Sheets** — publicly shared documents + +**Frontend & JavaScript Leaks** +- **JS Source Maps** — original source recovery with inlined secrets +- **Webpack / Vite bundles** — `REACT_APP_*`, `NEXT_PUBLIC_*`, `VITE_*` variable extraction +- **Exposed `.env` files** — misconfigured web servers serving dotenv from root +- **Swagger / OpenAPI docs** — real auth examples in API docs +- **Vercel / Netlify previews** — deploy preview JS bundles with production secrets + +**Log Aggregators** +- **Elasticsearch / Kibana** — exposed instances with application logs containing API keys +- **Grafana** — exposed dashboards with datasource configs +- **Sentry** — error tracking capturing request headers with keys + +**Threat Intelligence** +- **VirusTotal** — uploaded files/scripts containing embedded keys +- **Intelligence X** — aggregated paste, darknet, and leak search +- **URLhaus** — malicious URLs with API keys in parameters + +**Mobile Apps** +- **APK analysis** — download, decompile, grep for key patterns (via apktool/jadx) + +**DNS / Subdomain Discovery** +- **crt.sh** — Certificate Transparency log for API subdomain discovery +- **Subdomain probing** — config endpoint enumeration (`.env`, `/api/config`, `/actuator/env`) + +**API Marketplaces** +- **Postman** — public collections, workspaces, environments +- **SwaggerHub** — published API definitions with example values + +**`recon full`** — parallel sweep across all 80+ sources with deduplication and unified reporting + +### Active Verification +- Lightweight API calls to verify if detected keys are active +- Permission and scope extraction (org, rate limits, model access) +- Configurable via `--verify` flag (off by default) +- Provider-specific verification endpoints + +### External Tool 
Integration +- **Import TruffleHog** JSON output — enrich with LLM-specific analysis +- **Import Gitleaks** JSON output — cross-reference with 108+ providers +- Generic CSV import for custom tool output + +### Notifications & Dashboard +- **Telegram Bot** — scan triggers, key alerts, recon results +- **Web Dashboard** — htmx + Tailwind, SQLite-backed, real-time scan viewer +- **Webhook** — generic HTTP POST notifications +- **Slack** — workspace notifications +- **Scheduled scans** — cron-based recurring scans with auto-notify + +--- + +## Quick Start + +### Install + +```bash +# From source +go install github.com/keyhunter/keyhunter@latest + +# Binary release +curl -sSL https://get.keyhunter.dev | bash + +# Docker +docker pull keyhunter/keyhunter:latest +``` + +### Basic Usage + +```bash +# Scan a directory +keyhunter scan path ./my-project/ + +# Scan with active verification +keyhunter scan path ./my-project/ --verify + +# Scan git history (last 30 days) +keyhunter scan git . --since="30 days ago" + +# Scan from pipe +cat secrets.txt | keyhunter scan stdin + +# Scan only specific providers +keyhunter scan path . --providers=openai,anthropic,deepseek + +# JSON output +keyhunter scan path . 
--output=json > results.json +``` + +### OSINT / Recon + +```bash +# ── IoT & Internet Scanners ── +keyhunter recon shodan --dork="http.title:\"LiteLLM\" port:4000" +keyhunter recon censys --query='services.http.response.body:"sk-proj-"' +keyhunter recon zoomeye --query='app:"Elasticsearch" +"api_key"' +keyhunter recon fofa --query='body="OPENAI_API_KEY"' +keyhunter recon netlas --query='http.body:"sk-ant-"' + +# ── Code Hosting ── +keyhunter recon github --dork=auto # All built-in GitHub dorks +keyhunter recon gitlab --dork=auto +keyhunter recon bitbucket --query="OPENAI_API_KEY" +keyhunter recon replit --query="sk-proj-" # Public repls +keyhunter recon huggingface --spaces --query="api_key" # HF Spaces +keyhunter recon kaggle --notebooks --query="openai" +keyhunter recon codesandbox --query="sk-ant-" +keyhunter recon glitch --query="ANTHROPIC_API_KEY" +keyhunter recon gitea --instances-from=shodan # Auto-discover Gitea instances + +# ── Search Engine Dorking ── +keyhunter recon google --dork=auto # 100+ built-in Google dorks +keyhunter recon google --dork='"sk-proj-" -github.com filetype:env' +keyhunter recon bing --dork=auto +keyhunter recon brave --query="OPENAI_API_KEY filetype:yaml" + +# ── Package Registries ── +keyhunter recon npm --recent --query="openai" # Scan recently published packages +keyhunter recon pypi --recent --query="llm" +keyhunter recon crates --query="api_key" + +# ── Cloud Storage ── +keyhunter recon s3 --domain=targetcorp # S3 bucket enumeration +keyhunter recon gcs --domain=targetcorp # GCS buckets +keyhunter recon azure --domain=targetcorp # Azure Blob +keyhunter recon minio --shodan # Exposed MinIO instances +keyhunter recon grayhat --query="openai api_key" # GrayHatWarfare search + +# ── CI/CD Logs ── +keyhunter recon ghactions --org=targetcorp # GitHub Actions logs +keyhunter recon travis --org=targetcorp +keyhunter recon jenkins --shodan # Exposed Jenkins instances +keyhunter recon circleci --org=targetcorp + +# ── Web Archives ── 
+keyhunter recon wayback --domain=targetcorp.com # Wayback Machine +keyhunter recon commoncrawl --domain=targetcorp.com + +# ── Frontend & JS ── +keyhunter recon dotenv --domain-list=targets.txt # Exposed .env files +keyhunter recon sourcemaps --domain=app.target.com # JS source maps +keyhunter recon webpack --url=https://app.target.com/main.js +keyhunter recon swagger --shodan # Exposed Swagger UI instances +keyhunter recon deploys --domain=targetcorp # Vercel/Netlify previews + +# ── Forums ── +keyhunter recon stackoverflow --query="sk-proj-" +keyhunter recon reddit --subreddit=openai --query="api key" +keyhunter recon hackernews --query="leaked api key" +keyhunter recon telegram-groups --query="free api key" + +# ── Collaboration ── +keyhunter recon notion --query="API_KEY" # Searched via Google dorks +keyhunter recon confluence --shodan # Exposed instances +keyhunter recon trello --query="openai api key" + +# ── Log Aggregators ── +keyhunter recon elasticsearch --shodan # Exposed ES instances +keyhunter recon grafana --shodan +keyhunter recon sentry --shodan + +# ── Threat Intelligence ── +keyhunter recon virustotal --query="sk-proj-" +keyhunter recon intelx --query="sk-ant-api03" # Intelligence X +keyhunter recon urlhaus --query="openai" + +# ── Mobile Apps ── +keyhunter recon apk --query="ai chatbot" # APK download + decompile + +# ── DNS/Subdomain ── +keyhunter recon crtsh --domain=targetcorp.com # Cert transparency +keyhunter recon subdomain --domain=targetcorp.com --probe-configs + +# ── Full Sweep ── +keyhunter recon full --providers=openai,anthropic # All 80+ sources in parallel +keyhunter recon full --categories=code,cloud # Category-filtered sweep + +# ── Dork Management ── +keyhunter dorks list # All dorks across all sources +keyhunter dorks list --source=github +keyhunter dorks list --source=google +keyhunter dorks add github 'filename:.env "GROQ_API_KEY"' +keyhunter dorks run google --category=frontier # Run Google dorks for frontier providers +keyhunter dorks export +``` 
+ +### Viewing Full API Keys + +By default, keys are masked in the terminal (shoulder-surfing protection). Ways to access the full key: + +```bash +# 1. Show the full key in the CLI with the --unmask flag +keyhunter scan path . --unmask +# Provider | Key | Confidence | File | Line | Status +# ─────────────┼──────────────────────────────────────────────┼────────────┼───────────────┼──────┼──────── +# OpenAI | sk-proj-abc123def456ghi789jkl012mno345pqr678 | HIGH | src/config.py | 42 | ACTIVE + +# 2. JSON export: always contains the full key +keyhunter scan path . --output=json > results.json + +# 3. Key management commands: manage all discovered keys +keyhunter keys list # Masked list +keyhunter keys list --unmask # List with full keys +keyhunter keys show # Full details for a single key (always unmasked) +keyhunter keys copy # Copy the key to the clipboard +keyhunter keys export --format=json # Export all keys with their full values +keyhunter keys verify # Verify the key and show full details + +# 4. Web Dashboard: "Reveal Key" button on the /keys/:id page +# 5. Telegram Bot: full key via the /key command +``` + +**Example `keyhunter keys show` output:** +``` + ID: a3f7b2c1 + Provider: OpenAI + Pattern: OpenAI Project Key + Key: sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234 + Confidence: HIGH + Source: src/config.py:42 + Found: 2026-04-04 14:32:01 + Scan ID: scan_001 + Status: ACTIVE (verified 2026-04-04 14:32:05) + Org: my-org + Rate Limit: 500 req/min + Revoke URL: https://platform.openai.com/api-keys +``` + +### Verify a Single Key + +```bash +keyhunter verify sk-proj-abc123... +# Output: +# Provider: OpenAI +# Status: ACTIVE +# Org: my-org +# Rate Limit: 500 req/min +# Revoke: https://platform.openai.com/api-keys +``` + +### Import External Tools + +```bash +# Run TruffleHog, then enrich with KeyHunter +trufflehog git . 
--json > trufflehog.json +keyhunter import trufflehog trufflehog.json --verify + +# Run Gitleaks, then enrich +gitleaks detect -r gitleaks.json +keyhunter import gitleaks gitleaks.json +``` + +### Web Dashboard & Telegram Bot + +```bash +# Start web dashboard +keyhunter serve --port=8080 + +# Start with Telegram bot +keyhunter serve --port=8080 --telegram + +# Configure Telegram +keyhunter config set telegram.token "YOUR_BOT_TOKEN" +keyhunter config set telegram.chat_id "YOUR_CHAT_ID" +``` + +### CI/CD Integration + +KeyHunter ships with a git **pre-commit hook** that blocks leaks before they land in +history, a **GitHub Actions** integration that uploads SARIF findings directly into +the repository's Code Scanning tab, and an `import` command that consolidates +TruffleHog and Gitleaks output into one normalized database. + +```bash +# Install pre-commit hook (scans staged files only) +keyhunter hook install + +# GitHub Actions (SARIF output for Code Scanning upload) +keyhunter scan . --output sarif > keyhunter.sarif + +# Import findings from other scanners +keyhunter import --format=trufflehog trufflehog.json +keyhunter import --format=gitleaks gitleaks.json + +# Exit codes: 0 = clean, 1 = keys found, 2 = error +keyhunter scan . && echo "Clean" || echo "Keys found!" +``` + +See [docs/CI-CD.md](docs/CI-CD.md) for the full guide, including a copy-paste +GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle. 
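
The `&&`/`||` one-liner in the CI/CD example prints "Keys found!" for any non-zero status, so a scanner error (exit code 2) looks the same as a real finding. A minimal sketch of telling the three documented exit codes apart in a CI step follows. `classify_scan_status` is a hypothetical helper, not part of KeyHunter, and the `scan path . --output=sarif` invocation is copied from the DevSecOps example elsewhere in this README, so treat the exact flags as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with KeyHunter): map a scan exit status
# to a verdict, using the exit codes documented in the README
# (0 = clean, 1 = keys found, 2 = error).
classify_scan_status() {
  case "$1" in
    0) echo "clean" ;;
    1) echo "keys-found" ;;
    2) echo "error" ;;
    *) echo "unknown" ;;
  esac
}

# Example wiring for a CI step. It only runs when a keyhunter binary is on
# PATH, so the snippet stays harmless on machines without the tool.
if command -v keyhunter >/dev/null 2>&1; then
  status=0
  # Assumed invocation, taken from the README's DevSecOps example.
  keyhunter scan path . --output=sarif > keyhunter.sarif || status=$?
  echo "scan verdict: $(classify_scan_status "$status")"
fi
```

Capturing the status with `|| status=$?` keeps the step from aborting under `set -e`, so the pipeline can upload the SARIF file first and decide afterwards whether to fail the job on `keys-found` or `error`.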
+ +### Scheduled Scanning + +```bash +# Daily GitHub recon at 09:00 +keyhunter schedule add \ + --name="daily-github" \ + --cron="0 9 * * *" \ + --command="recon github --dork=auto" \ + --notify=telegram + +# Hourly paste site monitoring +keyhunter schedule add \ + --name="hourly-paste" \ + --cron="0 * * * *" \ + --command="recon paste --sources=pastebin" \ + --notify=telegram + +keyhunter schedule list +keyhunter schedule remove daily-github +``` + +--- + +## Configuration + +```bash +# Initialize config +keyhunter config init +# Creates ~/.keyhunter.yaml + +# Set API keys for recon sources +keyhunter config set shodan.apikey "YOUR_SHODAN_KEY" +keyhunter config set censys.api_id "YOUR_CENSYS_ID" +keyhunter config set censys.api_secret "YOUR_CENSYS_SECRET" +keyhunter config set github.token "YOUR_GITHUB_TOKEN" +keyhunter config set gitlab.token "YOUR_GITLAB_TOKEN" +keyhunter config set zoomeye.apikey "YOUR_ZOOMEYE_KEY" +keyhunter config set fofa.email "YOUR_FOFA_EMAIL" +keyhunter config set fofa.apikey "YOUR_FOFA_KEY" +keyhunter config set netlas.apikey "YOUR_NETLAS_KEY" +keyhunter config set binaryedge.apikey "YOUR_BINARYEDGE_KEY" +keyhunter config set google.cx "YOUR_GOOGLE_CX_ID" +keyhunter config set google.apikey "YOUR_GOOGLE_API_KEY" +keyhunter config set bing.apikey "YOUR_BING_API_KEY" +keyhunter config set brave.apikey "YOUR_BRAVE_API_KEY" +keyhunter config set virustotal.apikey "YOUR_VT_KEY" +keyhunter config set intelx.apikey "YOUR_INTELX_KEY" +keyhunter config set grayhat.apikey "YOUR_GRAYHAT_KEY" +keyhunter config set reddit.client_id "YOUR_REDDIT_ID" +keyhunter config set reddit.client_secret "YOUR_REDDIT_SECRET" +keyhunter config set stackoverflow.apikey "YOUR_SO_KEY" +keyhunter config set kaggle.username "YOUR_KAGGLE_USER" +keyhunter config set kaggle.apikey "YOUR_KAGGLE_KEY" + +# Set notification channels +keyhunter config set telegram.token "YOUR_BOT_TOKEN" +keyhunter config set telegram.chat_id "YOUR_CHAT_ID" +keyhunter config set webhook.url 
"https://your-webhook.com/alert" + +# Database encryption +keyhunter config set db.password "YOUR_DB_PASSWORD" +``` + +### Config File (`~/.keyhunter.yaml`) + +```yaml +scan: + workers: 8 + verify_timeout: 10s + default_output: table + respect_robots: true + +recon: + stealth: false + rate_limits: + github: 30 # req/min + shodan: 1 # req/sec + censys: 5 # req/sec + zoomeye: 10 # req/sec + fofa: 1 # req/sec + netlas: 1 # req/sec + google: 100 # req/day (Custom Search API) + bing: 3 # req/sec + stackoverflow: 30 # req/sec + hackernews: 100 # req/min + paste: 0.5 # req/sec + npm: 10 # req/sec + pypi: 5 # req/sec + virustotal: 4 # req/min (free tier) + intelx: 10 # req/day (free tier) + grayhat: 5 # req/sec + wayback: 15 # req/min + trello: 10 # req/sec + devto: 1 # req/sec + +telegram: + token: "encrypted:..." + chat_id: "123456789" + auto_notify: true + +web: + port: 8080 + auth: + enabled: false + username: admin + password: "encrypted:..." + +db: + path: ~/.keyhunter/keyhunter.db + encrypted: true +``` + +--- + +## Supported Providers (108) + +### Tier 1 — Frontier + +| Provider | Key Pattern | Confidence | Verify | +|----------|-------------|------------|--------| +| OpenAI | `sk-proj-*`, `sk-svcacct-*` | High | `GET /v1/models` | +| Anthropic | `sk-ant-api03-*` | High | `GET /v1/models` | +| Google AI (Gemini) | `AIza*` | High | `GET /v1/models` | +| Google Vertex AI | OAuth token | Medium | `GET /v1/models` | +| AWS Bedrock | `AKIA*` | High | `GetFoundationModel` | +| Azure OpenAI | 32-char hex | Medium | `GET /openai/deployments` | +| Meta AI | `meta-llama-*` | Medium | `GET /v1/models` | +| xAI (Grok) | `xai-*` | High | `GET /v1/models` | +| Cohere | `co-*` | High | `GET /v1/models` | +| Mistral AI | 32-char generic | Low | `GET /v1/models` | +| Inflection AI | Generic UUID | Low | `GET /api/models` | +| AI21 Labs | Generic key | Low | `GET /v1/models` | + +### Tier 2 — Inference Platforms + +| Provider | Key Pattern | Confidence | Verify | 
+|----------|-------------|------------|--------| +| Together AI | Generic key | Low | `GET /v1/models` | +| Fireworks AI | `fw_*` | High | `GET /v1/models` | +| Groq | `gsk_*` | High | `GET /openai/v1/models` | +| Replicate | `r8_*` | High | `GET /v1/predictions` | +| Anyscale | Generic key | Low | `GET /v1/models` | +| DeepInfra | Generic key | Low | `GET /v1/models` | +| Lepton AI | `lpt_*` | High | `GET /v1/models` | +| Modal | Generic token | Low | `GET /api/apps` | +| Baseten | Generic key | Low | `GET /v1/models` | +| Cerebrium | Generic key | Low | `GET /v1/models` | +| NovitaAI | Generic key | Low | `GET /v1/models` | +| Sambanova | Generic key | Low | `GET /v1/models` | +| OctoAI | Generic key | Low | `GET /v1/models` | +| Friendli AI | Generic key | Low | `GET /v1/models` | + +### Tier 3 — Specialized/Vertical + +| Provider | Key Pattern | Confidence | Verify | +|----------|-------------|------------|--------| +| Perplexity | `pplx-*` | High | `GET /chat/completions` | +| You.com | Generic key | Low | `GET /v1/search` | +| Voyage AI | `voy-*` | High | `GET /v1/models` | +| Jina AI | `jina_*` | High | `GET /v1/models` | +| Unstructured | Generic key | Low | `GET /general/v0/general` | +| AssemblyAI | Generic key | Low | `GET /v2/transcript` | +| Deepgram | Generic key | Low | `GET /v1/projects` | +| ElevenLabs | `el_*` | High | `GET /v1/user` | +| Stability AI | `sk-*` | Medium | `GET /v1/engines/list` | +| Runway ML | Generic key | Low | `GET /v1/models` | +| Midjourney | Generic key | Low | N/A | +| HuggingFace | `hf_*` | High | `GET /api/whoami` | + +### Tier 4 — Chinese/Regional + +| Provider | Key Pattern | Confidence | Verify | +|----------|-------------|------------|--------| +| DeepSeek | `sk-*` | Medium | `GET /v1/models` | +| Baichuan | Generic key | Low | `GET /v1/models` | +| Zhipu AI (GLM) | Generic key | Low | `POST /api/paas/v4/chat` | +| Moonshot AI (Kimi) | `sk-*` | Medium | `GET /v1/models` | +| Yi (01.AI) | Generic key | Low | `GET 
/v1/models` | +| Qwen (Alibaba) | `sk-*` | Medium | `GET /v1/models` | +| Baidu (ERNIE) | API Key + Secret | Medium | Token endpoint | +| ByteDance (Doubao) | Generic key | Low | `GET /v1/models` | +| SenseTime | Generic key | Low | `GET /v1/models` | +| iFlytek (Spark) | API Key + Secret | Medium | WebSocket handshake | +| MiniMax | Generic key | Low | `GET /v1/models` | +| Stepfun | Generic key | Low | `GET /v1/models` | +| 360 AI | Generic key | Low | `GET /v1/models` | +| Kuaishou (Kling) | Generic key | Low | `GET /v1/models` | +| Tencent Hunyuan | SecretId + SecretKey | Medium | `DescribeModels` | +| SiliconFlow | `sf_*` | High | `GET /v1/models` | + +### Tier 5 — Infrastructure/Gateway + +| Provider | Key Pattern | Confidence | Verify | +|----------|-------------|------------|--------| +| Cloudflare AI | Cloudflare API token | Medium | `GET /ai/models` | +| Vercel AI | `vercel_*` | High | `GET /v1/models` | +| LiteLLM | Generic key | Low | `GET /v1/models` | +| Portkey | Generic key | Low | `GET /v1/models` | +| Helicone | `sk-helicone-*` | High | `GET /v1/models` | +| OpenRouter | `sk-or-*` | High | `GET /api/v1/models` | +| Martian | Generic key | Low | `GET /v1/models` | +| AI Gateway (Kong) | Generic key | Low | Health endpoint | +| BricksAI | Generic key | Low | `GET /v1/models` | +| Aether | Generic key | Low | `GET /v1/models` | +| Not Diamond | Generic key | Low | `GET /v1/models` | + +### Tier 6 — Emerging/Niche + +| Provider | Key Pattern | Confidence | Verify | +|----------|-------------|------------|--------| +| Reka AI | Generic key | Low | `GET /v1/models` | +| Aleph Alpha | Generic key | Low | `GET /models` | +| Writer | Generic key | Low | `GET /v1/models` | +| Jasper AI | Generic key | Low | N/A | +| Typeface | Generic key | Low | N/A | +| Comet ML | Generic key | Low | `GET /api/rest/v2` | +| Weights & Biases | Generic key | Low | `GET /api/v1/viewer` | +| LangSmith | `ls__*` | High | `GET /api/v1/info` | +| Pinecone | Generic key | Low | 
`GET /databases` | +| Weaviate | Generic key | Low | `GET /v1/meta` | +| Qdrant | Generic key | Low | `GET /collections` | +| Chroma | Generic key | Low | `GET /api/v1/heartbeat` | +| Milvus | Generic key | Low | `GET /v1/vector/collections` | +| Neon AI | Generic key | Low | N/A | +| Lamini | Generic key | Low | `GET /v1/models` | + +### Tier 7 — Code & Dev Tools + +| Provider | Key Pattern | Confidence | Verify | +|----------|-------------|------------|--------| +| GitHub Copilot | `ghu_*`, `ghp_*` | High | `GET /user` | +| Cursor | Generic key | Low | N/A | +| Tabnine | Generic key | Low | N/A | +| Codeium/Windsurf | Generic key | Low | N/A | +| Sourcegraph Cody | `sgp_*` | High | `GET /.api/current-user` | +| Amazon CodeWhisperer | `AKIA*` | High | STS GetCallerIdentity | +| Replit AI | Generic key | Low | N/A | +| Codestral (Mistral) | Generic key | Low | `GET /v1/models` | +| IBM watsonx.ai | `ibm_*` | Medium | IAM token endpoint | +| Oracle AI | Generic key | Low | N/A | + +### Tier 8 — Self-Hosted/Open Infra + +| Provider | Key Pattern | Confidence | Verify | +|----------|-------------|------------|--------| +| Ollama | N/A (local) | N/A | `GET /api/tags` | +| vLLM | Generic key | Low | `GET /v1/models` | +| LocalAI | Generic key | Low | `GET /v1/models` | +| LM Studio | N/A (local) | N/A | `GET /v1/models` | +| llama.cpp | N/A (local) | N/A | `GET /health` | +| GPT4All | N/A (local) | N/A | N/A | +| text-generation-webui | Generic key | Low | `GET /v1/models` | +| TensorRT-LLM | N/A | N/A | Health endpoint | +| Triton Inference Server | N/A | N/A | `GET /v2/health/ready` | +| Jan AI | N/A (local) | N/A | `GET /v1/models` | + +### Tier 9 — Enterprise/Legacy + +| Provider | Key Pattern | Confidence | Verify | +|----------|-------------|------------|--------| +| Salesforce Einstein | Generic token | Low | REST API | +| ServiceNow AI | Generic token | Low | REST API | +| SAP AI Core | OAuth token | Low | Token endpoint | +| Palantir AIP | Generic token | Low | 
REST API | +| Databricks (DBRX) | `dapi*` | High | `GET /api/2.0/clusters` | +| Snowflake Cortex | JWT token | Medium | SQL endpoint | +| Oracle Generative AI | Generic key | Low | REST API | +| HPE GreenLake AI | Generic token | Low | REST API | + +--- + +## Architecture + +``` + +------------------+ + | CLI (Cobra) | + +--------+---------+ + | + +--------------+--------------+ + | | | + +--------v--+ +------v-----+ +-----v------+ + | Input | | Recon | | Import | + | Adapters | | Engine | | Adapters | + | - file | | (80+ src) | | - trufflehog| + | - git | | - IoT (6) | | - gitleaks | + | - stdin | | - Code(16) | | - generic | + | - url | | - Search(5)| +-----+------+ + | - clipboard| | - Paste(8+)| | + +--------+---+ | - Pkg (8) | | + | | - Cloud(7) | | + | | - CI/CD(5) | | + | | - Archive2 | | + | | - Forum(7) | | + | | - Collab(4)| | + | | - JS/FE(5) | | + | | - Logs (3) | | + | | - Intel(3) | | + | | - Mobile(1)| | + | | - DNS (2) | | + | | - API (3) | | + | +------+-----+ | + | | | + +-------+-------+--------------+ + | + +-------v--------+ + | Scanner Engine | + | - matcher.go | + | - verifier.go | + +-------+--------+ + | + +------------+-------------+ + | | | + +-----v----+ +----v-----+ +----v-------+ + | Output | | Notify | | Web | + | - table | | - telegram| | Dashboard | + | - json | | - webhook| | - htmx | + | - sarif | | - slack | | - REST API | + | - csv | +----------+ | - SQLite | + +----------+ +------------+ + + +------------------------------------------+ + | Provider Registry (108+ YAML providers) | + | Dork Registry (50+ YAML dorks) | + +------------------------------------------+ +``` + +### Key Design Decisions + +- **YAML Providers** — Adding a new provider = adding a YAML file. No recompile needed for pattern-only changes (when using external provider dir). Built-in providers are embedded at compile time. +- **Keyword Pre-filtering** — Before running regex, files are scanned for keywords. This provides ~10x speedup on large codebases. 
+- **Worker Pool** — Parallel scanning with configurable worker count. Default: CPU count. +- **Delta-based Git Scanning** — Only scans changes between commits, not entire trees. +- **SQLite Storage** — All scan results persisted with AES-256 encryption. + +--- + +## Security & Ethics + +### Built-in Protections +- Key values **masked by default** in terminal (first 8 + last 4 chars) — use `--unmask` for full keys +- **Full keys always available** via: `--unmask`, `--output=json`, `keyhunter keys show`, web dashboard, Telegram bot +- Database is **AES-256 encrypted** (full keys stored encrypted) +- API tokens stored **encrypted** in config +- No key values written to logs during `--verify` +- Web dashboard supports **basic auth / token auth** + +### Rate Limiting +| Source | Rate Limit | +|--------|-----------| +| GitHub API (auth) | 30 req/min | +| GitHub API (unauth) | 10 req/min | +| Shodan | Per API plan | +| Censys | 250 queries/day (free) | +| ZoomEye | 10,000 results/month (free) | +| FOFA | 100 results/query (free) | +| Netlas | 50 queries/day (free) | +| Google Custom Search | 100/day free, 10K/day paid | +| Bing Search | 1,000/month (free) | +| Stack Overflow | 300/day (no key), 10K/day (key) | +| HN Algolia | 10,000 req/hour | +| VirusTotal | 4 req/min (free) | +| IntelX | 10 searches/day (free) | +| GrayHatWarfare | Per plan | +| Wayback Machine | ~15 req/min | +| Paste sites | 1 req/2sec | +| npm/PyPI | Generous, be respectful | +| Trello | 100 req/10sec | +| Docker Hub | 100 pulls/6hr (unauth) | + +### Stealth & Ethics Flags +```bash +--stealth # User-agent rotation, increased request spacing +--respect-robots # Respect robots.txt (default: on) +``` + +--- + +## Use Cases + +### Red Team / Pentest +```bash +# Full multi-source recon against a target org +keyhunter recon github --query="targetcorp OPENAI_API_KEY" +keyhunter recon gitlab --query="targetcorp api_key" +keyhunter recon shodan --dork='http.html:"targetcorp" "sk-"' +keyhunter recon censys 
--query='services.http.response.body:"targetcorp" AND "api_key"' +keyhunter recon zoomeye --query='site:targetcorp.com +"api_key"' +keyhunter recon elasticsearch --shodan # Find exposed ES with leaked keys +keyhunter recon jenkins --shodan # Exposed Jenkins with build logs +keyhunter recon dotenv --domain-list=targetcorp-subdomains.txt # .env exposure +keyhunter recon wayback --domain=targetcorp.com # Historical leaks +keyhunter recon sourcemaps --domain=app.targetcorp.com # JS source maps +keyhunter recon crtsh --domain=targetcorp.com # Discover API subdomains +keyhunter recon full --providers=openai,anthropic # Everything at once +``` + +### DevSecOps / CI Pipeline +```bash +# Pre-commit hook +keyhunter hook install + +# GitHub Actions step +- name: KeyHunter Scan + run: | + keyhunter scan path . --output=sarif > keyhunter.sarif + # Upload to GitHub Security tab +``` + +### Bug Bounty +```bash +# Comprehensive target recon +keyhunter recon github --org=targetcorp --dork=auto --verify +keyhunter recon gist --query="targetcorp" +keyhunter recon paste --sources=all --query="targetcorp" +keyhunter recon postman --query="targetcorp" +keyhunter recon trello --query="targetcorp api key" +keyhunter recon notion --query="targetcorp API_KEY" +keyhunter recon confluence --shodan +keyhunter recon npm --query="targetcorp" # Check their published packages +keyhunter recon pypi --query="targetcorp" +keyhunter recon docker --query="targetcorp" --layers # Docker image layer scan +keyhunter recon apk --query="targetcorp" # Mobile app decompile +keyhunter recon swagger --domain=api.targetcorp.com +``` + +### Monitoring / Alerting +```bash +# Continuous monitoring with Telegram alerts +keyhunter schedule add \ + --name="monitor-github" \ + --cron="*/30 * * * *" \ + --command="recon github --dork=auto --providers=openai" \ + --notify=telegram + +keyhunter serve --telegram +``` + +--- + +## Dork Examples (150+ Built-in) + +### GitHub +``` +filename:.env "OPENAI_API_KEY" +filename:.env 
"ANTHROPIC_API_KEY" +filename:config.yaml "api_key" "sk-" +"sk-proj-" language:python +"sk-ant-api03" language:javascript +filename:docker-compose "API_KEY" +"api_key" extension:ipynb +filename:.toml "api_key" "sk-" +filename:terraform.tfvars "api_key" +"kind: Secret" "data:" filename:*.yaml # K8s secrets +filename:.npmrc "_authToken" # npm tokens +filename:requirements.txt "openai" path:.env # Python projects +``` + +### GitLab +``` +"OPENAI_API_KEY" filename:.env +"sk-ant-" filename:*.py +"api_key" filename:settings.json +``` + +### Google Dorking +``` +"sk-proj-" -github.com -stackoverflow.com # Outside known code sites +"sk-ant-api03-" filetype:env +"OPENAI_API_KEY" filetype:yml +"ANTHROPIC_API_KEY" filetype:json +inurl:.env "API_KEY" +intitle:"index of" .env +site:pastebin.com "sk-proj-" +site:replit.com "OPENAI_API_KEY" +site:codesandbox.io "sk-ant-" +site:notion.so "API_KEY" +site:trello.com "openai" +site:docs.google.com "sk-proj-" +site:medium.com "ANTHROPIC_API_KEY" +site:dev.to "sk-proj-" +site:huggingface.co "OPENAI_API_KEY" +site:kaggle.com "api_key" "sk-" +intitle:"Swagger UI" "api_key" +inurl:graphql "authorization" "Bearer sk-" +filetype:tfstate "api_key" # Terraform state +filetype:ipynb "sk-proj-" # Jupyter notebooks +``` + +### Shodan +``` +http.html:"openai" "api_key" port:8080 +http.title:"LiteLLM" port:4000 +http.html:"ollama" port:11434 +http.title:"Kubernetes Dashboard" +"X-Jenkins" "200 OK" +http.title:"Kibana" port:5601 +http.title:"Grafana" +http.title:"Swagger UI" +http.title:"Gitea" port:3000 +http.html:"PrivateBin" +http.title:"MinIO Browser" +http.title:"Sentry" +http.title:"Confluence" +port:6443 "kube-apiserver" +http.html:"langchain" port:8000 +``` + +### Censys +``` +services.http.response.body:"openai" and services.http.response.body:"sk-" +services.http.response.body:"langchain" and services.port:8000 +services.http.response.body:"OPENAI_API_KEY" +services.http.response.body:"sk-ant-api03" +``` + +### ZoomEye +``` 
+app:"Elasticsearch" +"api_key" +app:"Jenkins" +openai +app:"Grafana" +anthropic +app:"Gitea" +``` + +### FOFA +``` +body="sk-proj-" +body="OPENAI_API_KEY" +body="sk-ant-api03" +title="LiteLLM" +title="Swagger UI" && body="api_key" +title="Kibana" && body="authorization" +``` + +--- + +## Contributing + +### Adding a New Provider + +1. Create `providers/your-provider.yaml`: + +```yaml +id: your-provider +name: Your Provider +category: emerging +website: https://api.yourprovider.com +confidence: medium + +patterns: + - id: your-provider-key + name: "Your Provider API Key" + regex: '\byp_[A-Za-z0-9]{32}\b' + confidence: high + description: "Your Provider API key with yp_ prefix" + +keywords: + - "yp_" + - "YOUR_PROVIDER_API_KEY" + +verify: + enabled: true + method: GET + url: "https://api.yourprovider.com/v1/models" + headers: + Authorization: "Bearer {{key}}" + success_codes: [200] + failure_codes: [401, 403] + +metadata: + docs: "https://docs.yourprovider.com" + key_url: "https://dashboard.yourprovider.com/keys" + env_vars: ["YOUR_PROVIDER_API_KEY"] +``` + +2. Run tests: `go test ./pkg/provider/...` +3. Submit a PR + +### Adding a New Dork + +1. Edit `dorks/.yaml` and add your dork entry +2. Submit a PR + +--- + +## Roadmap + +- [ ] Core scanning engine (file, git, stdin) +- [ ] 108 provider YAML definitions +- [ ] Active verification for all providers +- [ ] CLI with Cobra (scan, verify, import, recon, serve) +- [ ] TruffleHog & Gitleaks import adapters +- [ ] OSINT/Recon engine (Shodan, Censys, GitHub, GitLab, Paste, S3) +- [ ] Built-in dork engine with 50+ dorks +- [ ] Web dashboard (htmx + Tailwind + SQLite) +- [ ] Telegram bot with auto-notifications +- [ ] Scheduled scanning (cron-based) +- [ ] Pre-commit hook & CI/CD integration (SARIF) +- [ ] Docker image +- [ ] Homebrew formula + +--- + +## Disclaimer + +KeyHunter is designed for **authorized security testing**, **defensive security**, **bug bounty programs**, and **educational purposes** only. 
Always obtain explicit authorization from the system owner before scanning any target. Unauthorized access to computer systems is illegal. +--- + +## License + +MIT License - see [LICENSE](LICENSE) for details.