salvacybersec 9755b3756a feat(08-02): add 25 GitHub dorks for infrastructure, emerging, enterprise categories

- infrastructure.yaml: 10 dorks covering Tier 5 gateways (OpenRouter,
  LiteLLM, Portkey, Helicone, Cloudflare AI, Vercel AI) and Tier 8
  self-hosted (Ollama, vLLM, LocalAI)
- emerging.yaml: 10 dorks covering Tier 4 Chinese providers (DeepSeek,
  Moonshot, Qwen, Zhipu, MiniMax) and Tier 6 vector DBs (Pinecone,
  Weaviate, Qdrant, Chroma) plus Writer.com
- enterprise.yaml: 5 dorks covering Tier 7 dev tools (Codeium, Tabnine)
  and Tier 9 enterprise (Databricks, Snowflake Cortex, IBM watsonx)
- Registry now loads 50 total GitHub dorks across all 5 categories,
  mirrored in both dorks/github/ and pkg/dorks/definitions/github/

2026-04-06 00:20:52 +03:00

.planning

docs(08-01): complete dork engine foundation plan

2026-04-06 00:17:53 +03:00

cmd

feat(07-04): wire keyhunter import command with dedup and DB persist

2026-04-05 23:59:39 +03:00

docs

docs(07-06): add CI/CD integration guide

2026-04-05 23:58:31 +03:00

dorks

feat(08-02): add 25 GitHub dorks for infrastructure, emerging, enterprise categories

2026-04-06 00:20:52 +03:00

pkg

feat(08-02): add 25 GitHub dorks for infrastructure, emerging, enterprise categories

2026-04-06 00:20:52 +03:00

providers

feat(05-04): extend Tier 1 provider verify specs

2026-04-05 15:46:30 +03:00

testdata

test(07-03): SARIF GitHub code scanning validation

2026-04-05 23:55:38 +03:00

CLAUDE.md

docs: create roadmap (18 phases)

2026-04-04 19:12:41 +03:00

go.mod

feat(06-01): add Formatter interface, Registry, and TTY color detection

2026-04-05 18:41:23 +03:00

go.sum

feat(05-01): extend VerifySpec and Finding, add gjson dep

2026-04-05 15:41:13 +03:00

LEGAL.md

feat(05-02): add LEGAL.md, embed it, and wire keyhunter legal command

2026-04-05 15:46:11 +03:00

main.go

feat(01-01): create main.go, test scaffolding, and testdata fixtures

2026-04-05 00:04:42 +03:00

README.md

docs(07-06): link README CI/CD section to full guide

2026-04-05 23:58:31 +03:00

tools.go

chore(01-01): initialize Go module with Phase 1 dependencies

2026-04-05 00:04:06 +03:00

README.md

KeyHunter

The most comprehensive API key scanner for LLM/AI providers. Detect, validate, and monitor leaked API keys across 108+ providers.

Why KeyHunter?

Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew 81% year-over-year in 2025, yet no tool covers more than ~15 LLM providers.

KeyHunter fills that gap with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and real-time notifications.

How It Compares

Feature	KeyHunter	TruffleHog	Gitleaks	detect-secrets
LLM Providers	108+	~3	~5	~1
Active Verification	108+ endpoints	~20 types	No	No
OSINT/Recon	Shodan, Censys, GitHub, GitLab, Paste, S3	No	No	No
External Tool Import	TruffleHog + Gitleaks	-	-	-
Web Dashboard	Built-in	No	No	No
Telegram Bot	Built-in	No	No	No
Dork Engine	Built-in YAML dorks	No	No	No
Provider YAML Plugin	Community-extensible	Go code only	TOML rules	Python plugins
Scheduled Scanning	Cron-based	No	No	No

Features

Core Scanning

File/Directory scanning with recursive traversal and glob exclusions
Git-aware scanning — full history, branches, stash, delta-based diffs
stdin/pipe support — cat dump.txt | keyhunter scan stdin
URL fetching — scan any remote URL content
Clipboard scanning — instant clipboard content analysis

OSINT / Recon Engine (80+ Sources, 18 Categories)

IoT & Internet Scanners

Shodan — exposed LLM proxies, dashboards, API endpoints
Censys — HTTP body search for leaked credentials
ZoomEye — Chinese IoT scanner, different coverage perspective
FOFA — Asian infrastructure scanning, body content search
Netlas — HTTP response body keyword search
BinaryEdge — internet-wide scan data

Code Hosting & Snippets

GitHub / GitLab / Bitbucket — code search with automated dorks
Codeberg / Gitea instances — alternative Git platforms (Gitea auto-discovered via Shodan)
Replit / CodeSandbox / StackBlitz / Glitch — interactive dev environments with hardcoded keys
CodePen / JSFiddle / Observable — browser snippet platforms
HuggingFace — Spaces, repos, model configs (high-yield for LLM keys)
Kaggle — notebooks and datasets with API keys
Jupyter / nbviewer — shared notebooks
GitHub Gist — public gist search
Gitpod — workspace snapshots

Search Engine Dorking

Google — Custom Search API / SerpAPI, 100+ built-in dorks
Bing — Azure Cognitive Services search
DuckDuckGo / Yandex / Brave — alternative indexes for broader coverage

Paste Sites

Multi-paste aggregator — Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, and more

Package Registries

npm / PyPI / RubyGems / crates.io / Maven / NuGet / Packagist / Go modules — download packages, extract source, scan for key patterns

Container & Infrastructure

Docker Hub — image layer scanning, build arg extraction
Kubernetes — exposed dashboards, public Secret/ConfigMap YAML files
Terraform — state files (.tfstate with plaintext secrets), registry modules
Helm Charts / Ansible Galaxy — default values with credentials

Cloud Storage

AWS S3 / GCS / Azure Blob / DigitalOcean Spaces / Backblaze B2 — bucket enumeration and content scanning
MinIO — self-hosted instances discovered via Shodan
GrayHatWarfare — searchable database of public bucket objects

CI/CD Log Leaks

Travis CI / CircleCI — public build logs with leaked env vars
GitHub Actions — workflow run log scanning
Jenkins — exposed instances (Shodan-discovered), console output
GitLab CI/CD — public pipeline job traces

Web Archives

Wayback Machine — historical snapshots of removed .env files, config pages
CommonCrawl — massive web crawl data, WARC record scanning

Forums & Documentation

Stack Overflow — API + SEDE queries for code snippets with real keys
Reddit — programming subreddit scanning
Hacker News — Algolia API comment search
dev.to / Medium — tutorial articles with hardcoded keys
Telegram groups — public channels sharing configs and "free API keys"
Discord — indexed public server content

Collaboration Tools

Notion / Confluence — public pages and spaces with credentials
Trello — public boards with API key cards
Google Docs/Sheets — publicly shared documents

Frontend & JavaScript Leaks

JS Source Maps — original source recovery with inlined secrets
Webpack / Vite bundles — REACT_APP_*, NEXT_PUBLIC_*, VITE_* variable extraction
Exposed .env files — misconfigured web servers serving dotenv from root
Swagger / OpenAPI docs — real auth examples in API docs
Vercel / Netlify previews — deploy preview JS bundles with production secrets

Log Aggregators

Elasticsearch / Kibana — exposed instances with application logs containing API keys
Grafana — exposed dashboards with datasource configs
Sentry — error tracking capturing request headers with keys

Threat Intelligence

VirusTotal — uploaded files/scripts containing embedded keys
Intelligence X — aggregated paste, darknet, and leak search
URLhaus — malicious URLs with API keys in parameters

Mobile Apps

APK analysis — download, decompile, grep for key patterns (via apktool/jadx)

DNS / Subdomain Discovery

crt.sh — Certificate Transparency log for API subdomain discovery
Subdomain probing — config endpoint enumeration (.env, /api/config, /actuator/env)

API Marketplaces

Postman — public collections, workspaces, environments
SwaggerHub — published API definitions with example values

recon full — parallel sweep across all 80+ sources with deduplication and unified reporting
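
The parallel sweep with deduplication described above can be sketched as a fan-out over goroutines followed by a single collector. This is a minimal illustration, not KeyHunter's actual implementation: the `Finding` and `Source` types and the `sweep` function are hypothetical names, and deduplicating on the raw key value is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Finding is a hypothetical, simplified result record.
type Finding struct {
	Source string
	Key    string
}

// Source is a hypothetical recon source: it returns findings for a query.
type Source func(query string) []Finding

// sweep queries every source in its own goroutine and keeps the first
// finding received for each distinct key value.
func sweep(query string, sources []Source) []Finding {
	results := make(chan Finding)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(s Source) {
			defer wg.Done()
			for _, f := range s(query) {
				results <- f
			}
		}(src)
	}
	// Close the channel once every source has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	seen := make(map[string]bool)
	var unique []Finding
	for f := range results {
		if !seen[f.Key] {
			seen[f.Key] = true
			unique = append(unique, f)
		}
	}
	// Goroutine ordering is not deterministic, so sort for stable output.
	sort.Slice(unique, func(i, j int) bool { return unique[i].Key < unique[j].Key })
	return unique
}

func main() {
	// Stub sources standing in for real network-backed ones.
	github := func(q string) []Finding {
		return []Finding{{"github", "sk-proj-EXAMPLE1"}, {"github", "sk-proj-EXAMPLE2"}}
	}
	paste := func(q string) []Finding {
		return []Finding{{"paste", "sk-proj-EXAMPLE1"}}
	}
	for _, f := range sweep("sk-proj-", []Source{github, paste}) {
		fmt.Println(f.Key)
	}
}
```

Because only the first sighting of a key is kept, which source gets the credit for a duplicate depends on scheduling; a real implementation would more likely merge the source lists.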

Active Verification

Lightweight API calls to verify if detected keys are active
Permission and scope extraction (org, rate limits, model access)
Configurable via --verify flag (off by default)
Provider-specific verification endpoints

External Tool Integration

Import TruffleHog JSON output — enrich with LLM-specific analysis
Import Gitleaks JSON output — cross-reference with 108+ providers
Generic CSV import for custom tool output

Notifications & Dashboard

Telegram Bot — scan triggers, key alerts, recon results
Web Dashboard — htmx + Tailwind, SQLite-backed, real-time scan viewer
Webhook — generic HTTP POST notifications
Slack — workspace notifications
Scheduled scans — cron-based recurring scans with auto-notify

Quick Start

Install

# From source
go install github.com/keyhunter/keyhunter@latest

# Binary release
curl -sSL https://get.keyhunter.dev | bash

# Docker
docker pull keyhunter/keyhunter:latest

Basic Usage

# Scan a directory
keyhunter scan path ./my-project/

# Scan with active verification
keyhunter scan path ./my-project/ --verify

# Scan git history (last 30 days)
keyhunter scan git . --since="30 days ago"

# Scan from pipe
cat secrets.txt | keyhunter scan stdin

# Scan only specific providers
keyhunter scan path . --providers=openai,anthropic,deepseek

# JSON output
keyhunter scan path . --output=json > results.json

OSINT / Recon

# ── IoT & Internet Scanners ──
keyhunter recon shodan --dork="http.title:\"LiteLLM\" port:4000"
keyhunter recon censys --query='services.http.response.body:"sk-proj-"'
keyhunter recon zoomeye --query='app:"Elasticsearch" +"api_key"'
keyhunter recon fofa --query='body="OPENAI_API_KEY"'
keyhunter recon netlas --query='http.body:"sk-ant-"'

# ── Code Hosting ──
keyhunter recon github --dork=auto               # All built-in GitHub dorks
keyhunter recon gitlab --dork=auto
keyhunter recon bitbucket --query="OPENAI_API_KEY"
keyhunter recon replit --query="sk-proj-"         # Public repls
keyhunter recon huggingface --spaces --query="api_key"  # HF Spaces
keyhunter recon kaggle --notebooks --query="openai"
keyhunter recon codesandbox --query="sk-ant-"
keyhunter recon glitch --query="ANTHROPIC_API_KEY"
keyhunter recon gitea --instances-from=shodan     # Auto-discover Gitea instances

# ── Search Engine Dorking ──
keyhunter recon google --dork=auto                # 100+ built-in Google dorks
keyhunter recon google --dork='"sk-proj-" -github.com filetype:env'
keyhunter recon bing --dork=auto
keyhunter recon brave --query="OPENAI_API_KEY filetype:yaml"

# ── Package Registries ──
keyhunter recon npm --recent --query="openai"     # Scan recently published packages
keyhunter recon pypi --recent --query="llm"
keyhunter recon crates --query="api_key"

# ── Cloud Storage ──
keyhunter recon s3 --domain=targetcorp            # S3 bucket enumeration
keyhunter recon gcs --domain=targetcorp           # GCS buckets
keyhunter recon azure --domain=targetcorp         # Azure Blob
keyhunter recon minio --shodan                    # Exposed MinIO instances
keyhunter recon grayhat --query="openai api_key"  # GrayHatWarfare search

# ── CI/CD Logs ──
keyhunter recon ghactions --org=targetcorp        # GitHub Actions logs
keyhunter recon travis --org=targetcorp
keyhunter recon jenkins --shodan                  # Exposed Jenkins instances
keyhunter recon circleci --org=targetcorp

# ── Web Archives ──
keyhunter recon wayback --domain=targetcorp.com   # Wayback Machine
keyhunter recon commoncrawl --domain=targetcorp.com

# ── Frontend & JS ──
keyhunter recon dotenv --domain-list=targets.txt  # Exposed .env files
keyhunter recon sourcemaps --domain=app.target.com  # JS source maps
keyhunter recon webpack --url=https://app.target.com/main.js
keyhunter recon swagger --shodan                  # Exposed Swagger UI's
keyhunter recon deploys --domain=targetcorp       # Vercel/Netlify previews

# ── Forums ──
keyhunter recon stackoverflow --query="sk-proj-"
keyhunter recon reddit --subreddit=openai --query="api key"
keyhunter recon hackernews --query="leaked api key"
keyhunter recon telegram-groups --query="free api key"

# ── Collaboration ──
keyhunter recon notion --query="API_KEY"          # Google dorked
keyhunter recon confluence --shodan               # Exposed instances
keyhunter recon trello --query="openai api key"

# ── Log Aggregators ──
keyhunter recon elasticsearch --shodan            # Exposed ES instances
keyhunter recon grafana --shodan
keyhunter recon sentry --shodan

# ── Threat Intelligence ──
keyhunter recon virustotal --query="sk-proj-"
keyhunter recon intelx --query="sk-ant-api03"     # Intelligence X
keyhunter recon urlhaus --query="openai"

# ── Mobile Apps ──
keyhunter recon apk --query="ai chatbot"          # APK download + decompile

# ── DNS/Subdomain ──
keyhunter recon crtsh --domain=targetcorp.com     # Cert transparency
keyhunter recon subdomain --domain=targetcorp.com --probe-configs

# ── Full Sweep ──
keyhunter recon full --providers=openai,anthropic  # ALL 80+ sources parallel
keyhunter recon full --categories=code,cloud       # Category-filtered sweep

# ── Dork Management ──
keyhunter dorks list                               # All dorks across all sources
keyhunter dorks list --source=github
keyhunter dorks list --source=google
keyhunter dorks add github 'filename:.env "GROQ_API_KEY"'
keyhunter dorks run google --category=frontier     # Run Google dorks for frontier providers
keyhunter dorks export

Viewing Full API Keys

By default, keys are masked in the terminal (protection against shoulder surfing). Ways to access the full key:

# 1. Show the full key in the CLI with the --unmask flag
keyhunter scan path . --unmask
#  Provider    | Key                                          | Confidence | File          | Line | Status
# ─────────────┼──────────────────────────────────────────────┼────────────┼───────────────┼──────┼────────
#  OpenAI      | sk-proj-abc123def456ghi789jkl012mno345pqr678 | HIGH       | src/config.py | 42   | ACTIVE

# 2. JSON export: always contains the full key
keyhunter scan path . --output=json > results.json

# 3. Key management commands: manage all discovered keys
keyhunter keys list                   # Masked list
keyhunter keys list --unmask          # List with full keys
keyhunter keys show <id>              # Full details for a single key (always unmasked)
keyhunter keys copy <id>              # Copy the key to the clipboard
keyhunter keys export --format=json   # Export all keys with their full values
keyhunter keys verify <id>            # Verify the key and show full details

# 4. Web Dashboard: "Reveal Key" button on the /keys/:id page
# 5. Telegram Bot: full key via the /key <id> command

Example keyhunter keys show output:

 ID:          a3f7b2c1
 Provider:    OpenAI
 Pattern:     OpenAI Project Key
 Key:         sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234
 Confidence:  HIGH
 Source:      src/config.py:42
 Found:       2026-04-04 14:32:01
 Scan ID:     scan_001
 Status:      ACTIVE (verified 2026-04-04 14:32:05)
 Org:         my-org
 Rate Limit:  500 req/min
 Revoke URL:  https://platform.openai.com/api-keys

Verify a Single Key

keyhunter verify sk-proj-abc123...
# Output:
# Provider:  OpenAI
# Status:    ACTIVE
# Org:       my-org
# Rate Limit: 500 req/min
# Revoke:    https://platform.openai.com/api-keys

Import External Tools

# Run TruffleHog, then enrich with KeyHunter
trufflehog git . --json > trufflehog.json
keyhunter import trufflehog trufflehog.json --verify

# Run Gitleaks, then enrich
gitleaks detect -r gitleaks.json
keyhunter import gitleaks gitleaks.json
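
To illustrate what an import adapter has to do, here is a minimal sketch that reads TruffleHog's JSON output (one JSON object per line) and drops repeated secrets. The field names `DetectorName`, `Raw`, and `Verified` are the ones TruffleHog v3 emits; the type and function names are hypothetical, and deduplicating on `Raw` is an assumption rather than KeyHunter's documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// truffleHogRecord holds only the fields this sketch reads;
// other fields in the TruffleHog output are ignored.
type truffleHogRecord struct {
	DetectorName string `json:"DetectorName"`
	Raw          string `json:"Raw"`
	Verified     bool   `json:"Verified"`
}

// parseTruffleHog decodes a stream of JSON objects and keeps the first
// record seen for each distinct raw secret.
func parseTruffleHog(r io.Reader) ([]truffleHogRecord, error) {
	dec := json.NewDecoder(r)
	seen := make(map[string]bool)
	var out []truffleHogRecord
	for {
		var rec truffleHogRecord
		err := dec.Decode(&rec)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		if rec.Raw == "" || seen[rec.Raw] {
			continue // skip empty or already-imported secrets
		}
		seen[rec.Raw] = true
		out = append(out, rec)
	}
}

// parseTruffleHogString is a convenience wrapper for in-memory input.
func parseTruffleHogString(s string) ([]truffleHogRecord, error) {
	return parseTruffleHog(strings.NewReader(s))
}

func main() {
	input := `{"DetectorName":"OpenAI","Raw":"sk-proj-EXAMPLE","Verified":true}
{"DetectorName":"OpenAI","Raw":"sk-proj-EXAMPLE","Verified":true}
{"DetectorName":"Anthropic","Raw":"sk-ant-EXAMPLE","Verified":false}`
	recs, err := parseTruffleHogString(input)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, r := range recs {
		fmt.Printf("%s verified=%t\n", r.DetectorName, r.Verified)
	}
}
```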

Web Dashboard & Telegram Bot

# Start web dashboard
keyhunter serve --port=8080

# Start with Telegram bot
keyhunter serve --port=8080 --telegram

# Configure Telegram
keyhunter config set telegram.token "YOUR_BOT_TOKEN"
keyhunter config set telegram.chat_id "YOUR_CHAT_ID"

CI/CD Integration

KeyHunter ships with a git pre-commit hook that blocks leaks before they land in history, a GitHub Actions integration that uploads SARIF findings directly into the repository's Code Scanning tab, and an import command that consolidates TruffleHog and Gitleaks output into one normalized database.

# Install pre-commit hook (scans staged files only)
keyhunter hook install

# GitHub Actions (SARIF output for Code Scanning upload)
keyhunter scan . --output sarif > keyhunter.sarif

# Import findings from other scanners
keyhunter import --format=trufflehog trufflehog.json
keyhunter import --format=gitleaks   gitleaks.json

# Exit codes: 0 = clean, 1 = keys found, 2 = error
keyhunter scan . && echo "Clean" || echo "Keys found or scan error (exit code $?)"

See docs/CI-CD.md for the full guide, including a copy-paste GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle.
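
A CI wrapper usually needs to tell "keys found" apart from "the scanner itself failed". The sketch below shows one way to do that from Go using the exit codes listed above (0 = clean, 1 = keys found, 2 = error). The function names are hypothetical, and the program assumes a `keyhunter` binary on `PATH`; when the binary is missing it reports that instead of an exit code.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// describeExit maps the documented exit codes to a short label.
func describeExit(code int) string {
	switch code {
	case 0:
		return "clean"
	case 1:
		return "keys found"
	case 2:
		return "scanner error"
	default:
		return "unexpected exit code"
	}
}

// runScan executes the command and returns its exit code. A non-nil error
// means the process could not be started at all (for example, binary not found).
func runScan(name string, args ...string) (int, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	return -1, err
}

func main() {
	code, err := runScan("keyhunter", "scan", "path", ".")
	if err != nil {
		fmt.Println("could not run keyhunter:", err)
		return
	}
	fmt.Println(describeExit(code))
}
```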

Scheduled Scanning

# Daily GitHub recon at 09:00
keyhunter schedule add \
  --name="daily-github" \
  --cron="0 9 * * *" \
  --command="recon github --dork=auto" \
  --notify=telegram

# Hourly paste site monitoring
keyhunter schedule add \
  --name="hourly-paste" \
  --cron="0 * * * *" \
  --command="recon paste --sources=pastebin" \
  --notify=telegram

keyhunter schedule list
keyhunter schedule remove daily-github

Configuration

# Initialize config
keyhunter config init
# Creates ~/.keyhunter.yaml

# Set API keys for recon sources
keyhunter config set shodan.apikey "YOUR_SHODAN_KEY"
keyhunter config set censys.api_id "YOUR_CENSYS_ID"
keyhunter config set censys.api_secret "YOUR_CENSYS_SECRET"
keyhunter config set github.token "YOUR_GITHUB_TOKEN"
keyhunter config set gitlab.token "YOUR_GITLAB_TOKEN"
keyhunter config set zoomeye.apikey "YOUR_ZOOMEYE_KEY"
keyhunter config set fofa.email "YOUR_FOFA_EMAIL"
keyhunter config set fofa.apikey "YOUR_FOFA_KEY"
keyhunter config set netlas.apikey "YOUR_NETLAS_KEY"
keyhunter config set binaryedge.apikey "YOUR_BINARYEDGE_KEY"
keyhunter config set google.cx "YOUR_GOOGLE_CX_ID"
keyhunter config set google.apikey "YOUR_GOOGLE_API_KEY"
keyhunter config set bing.apikey "YOUR_BING_API_KEY"
keyhunter config set brave.apikey "YOUR_BRAVE_API_KEY"
keyhunter config set virustotal.apikey "YOUR_VT_KEY"
keyhunter config set intelx.apikey "YOUR_INTELX_KEY"
keyhunter config set grayhat.apikey "YOUR_GRAYHAT_KEY"
keyhunter config set reddit.client_id "YOUR_REDDIT_ID"
keyhunter config set reddit.client_secret "YOUR_REDDIT_SECRET"
keyhunter config set stackoverflow.apikey "YOUR_SO_KEY"
keyhunter config set kaggle.username "YOUR_KAGGLE_USER"
keyhunter config set kaggle.apikey "YOUR_KAGGLE_KEY"

# Set notification channels
keyhunter config set telegram.token "YOUR_BOT_TOKEN"
keyhunter config set telegram.chat_id "YOUR_CHAT_ID"
keyhunter config set webhook.url "https://your-webhook.com/alert"

# Database encryption
keyhunter config set db.password "YOUR_DB_PASSWORD"

Config File (`~/.keyhunter.yaml`)

scan:
  workers: 8
  verify_timeout: 10s
  default_output: table
  respect_robots: true

recon:
  stealth: false
  rate_limits:
    github: 30        # req/min
    shodan: 1         # req/sec
    censys: 5         # req/sec
    zoomeye: 10       # req/sec
    fofa: 1           # req/sec
    netlas: 1         # req/sec
    google: 100       # req/day (Custom Search API)
    bing: 3           # req/sec
    stackoverflow: 30 # req/sec
    hackernews: 100   # req/min
    paste: 0.5        # req/sec
    npm: 10           # req/sec
    pypi: 5           # req/sec
    virustotal: 4     # req/min (free tier)
    intelx: 10        # req/day (free tier)
    grayhat: 5        # req/sec
    wayback: 15       # req/min
    trello: 10        # req/sec
    devto: 1          # req/sec

telegram:
  token: "encrypted:..."
  chat_id: "123456789"
  auto_notify: true

web:
  port: 8080
  auth:
    enabled: false
    username: admin
    password: "encrypted:..."

db:
  path: ~/.keyhunter/keyhunter.db
  encrypted: true

Supported Providers (108)

Tier 1 — Frontier

Provider	Key Pattern	Confidence	Verify
OpenAI	`sk-proj-`, `sk-svcacct-`	High	`GET /v1/models`
Anthropic	`sk-ant-api03-*`	High	`GET /v1/models`
Google AI (Gemini)	`AIza*`	High	`GET /v1/models`
Google Vertex AI	OAuth token	Medium	`GET /v1/models`
AWS Bedrock	`AKIA*`	High	`GetFoundationModel`
Azure OpenAI	32-char hex	Medium	`GET /openai/deployments`
Meta AI	`meta-llama-*`	Medium	`GET /v1/models`
xAI (Grok)	`xai-*`	High	`GET /v1/models`
Cohere	`co-*`	High	`GET /v1/models`
Mistral AI	32-char generic	Low	`GET /v1/models`
Inflection AI	Generic UUID	Low	`GET /api/models`
AI21 Labs	Generic key	Low	`GET /v1/models`

Tier 2 — Inference Platforms

Provider	Key Pattern	Confidence	Verify
Together AI	Generic key	Low	`GET /v1/models`
Fireworks AI	`fw_*`	High	`GET /v1/models`
Groq	`gsk_*`	High	`GET /openai/v1/models`
Replicate	`r8_*`	High	`GET /v1/predictions`
Anyscale	Generic key	Low	`GET /v1/models`
DeepInfra	Generic key	Low	`GET /v1/models`
Lepton AI	`lpt_*`	High	`GET /v1/models`
Modal	Generic token	Low	`GET /api/apps`
Baseten	Generic key	Low	`GET /v1/models`
Cerebrium	Generic key	Low	`GET /v1/models`
NovitaAI	Generic key	Low	`GET /v1/models`
Sambanova	Generic key	Low	`GET /v1/models`
OctoAI	Generic key	Low	`GET /v1/models`
Friendli AI	Generic key	Low	`GET /v1/models`

Tier 3 — Specialized/Vertical

Provider	Key Pattern	Confidence	Verify
Perplexity	`pplx-*`	High	`GET /chat/completions`
You.com	Generic key	Low	`GET /v1/search`
Voyage AI	`voy-*`	High	`GET /v1/models`
Jina AI	`jina_*`	High	`GET /v1/models`
Unstructured	Generic key	Low	`GET /general/v0/general`
AssemblyAI	Generic key	Low	`GET /v2/transcript`
Deepgram	Generic key	Low	`GET /v1/projects`
ElevenLabs	`el_*`	High	`GET /v1/user`
Stability AI	`sk-*`	Medium	`GET /v1/engines/list`
Runway ML	Generic key	Low	`GET /v1/models`
Midjourney	Generic key	Low	N/A
HuggingFace	`hf_*`	High	`GET /api/whoami`

Tier 4 — Chinese/Regional

Provider	Key Pattern	Confidence	Verify
DeepSeek	`sk-*`	Medium	`GET /v1/models`
Baichuan	Generic key	Low	`GET /v1/models`
Zhipu AI (GLM)	Generic key	Low	`POST /api/paas/v4/chat`
Moonshot AI (Kimi)	`sk-*`	Medium	`GET /v1/models`
Yi (01.AI)	Generic key	Low	`GET /v1/models`
Qwen (Alibaba)	`sk-*`	Medium	`GET /v1/models`
Baidu (ERNIE)	API Key + Secret	Medium	Token endpoint
ByteDance (Doubao)	Generic key	Low	`GET /v1/models`
SenseTime	Generic key	Low	`GET /v1/models`
iFlytek (Spark)	API Key + Secret	Medium	WebSocket handshake
MiniMax	Generic key	Low	`GET /v1/models`
Stepfun	Generic key	Low	`GET /v1/models`
360 AI	Generic key	Low	`GET /v1/models`
Kuaishou (Kling)	Generic key	Low	`GET /v1/models`
Tencent Hunyuan	SecretId + SecretKey	Medium	`DescribeModels`
SiliconFlow	`sf_*`	High	`GET /v1/models`

Tier 5 — Infrastructure/Gateway

Provider	Key Pattern	Confidence	Verify
Cloudflare AI	Cloudflare API token	Medium	`GET /ai/models`
Vercel AI	`vercel_*`	High	`GET /v1/models`
LiteLLM	Generic key	Low	`GET /v1/models`
Portkey	Generic key	Low	`GET /v1/models`
Helicone	`sk-helicone-*`	High	`GET /v1/models`
OpenRouter	`sk-or-*`	High	`GET /api/v1/models`
Martian	Generic key	Low	`GET /v1/models`
AI Gateway (Kong)	Generic key	Low	Health endpoint
BricksAI	Generic key	Low	`GET /v1/models`
Aether	Generic key	Low	`GET /v1/models`
Not Diamond	Generic key	Low	`GET /v1/models`

Tier 6 — Emerging/Niche

Provider	Key Pattern	Confidence	Verify
Reka AI	Generic key	Low	`GET /v1/models`
Aleph Alpha	Generic key	Low	`GET /models`
Writer	Generic key	Low	`GET /v1/models`
Jasper AI	Generic key	Low	N/A
Typeface	Generic key	Low	N/A
Comet ML	Generic key	Low	`GET /api/rest/v2`
Weights & Biases	Generic key	Low	`GET /api/v1/viewer`
LangSmith	`ls__*`	High	`GET /api/v1/info`
Pinecone	Generic key	Low	`GET /databases`
Weaviate	Generic key	Low	`GET /v1/meta`
Qdrant	Generic key	Low	`GET /collections`
Chroma	Generic key	Low	`GET /api/v1/heartbeat`
Milvus	Generic key	Low	`GET /v1/vector/collections`
Neon AI	Generic key	Low	N/A
Lamini	Generic key	Low	`GET /v1/models`

Tier 7 — Code & Dev Tools

Provider	Key Pattern	Confidence	Verify
GitHub Copilot	`ghu_`, `ghp_`	High	`GET /user`
Cursor	Generic key	Low	N/A
Tabnine	Generic key	Low	N/A
Codeium/Windsurf	Generic key	Low	N/A
Sourcegraph Cody	`sgp_*`	High	`GET /.api/current-user`
Amazon CodeWhisperer	`AKIA*`	High	STS GetCallerIdentity
Replit AI	Generic key	Low	N/A
Codestral (Mistral)	Generic key	Low	`GET /v1/models`
IBM watsonx.ai	`ibm_*`	Medium	IAM token endpoint
Oracle AI	Generic key	Low	N/A

Tier 8 — Self-Hosted/Open Infra

Provider	Key Pattern	Confidence	Verify
Ollama	N/A (local)	N/A	`GET /api/tags`
vLLM	Generic key	Low	`GET /v1/models`
LocalAI	Generic key	Low	`GET /v1/models`
LM Studio	N/A (local)	N/A	`GET /v1/models`
llama.cpp	N/A (local)	N/A	`GET /health`
GPT4All	N/A (local)	N/A	N/A
text-generation-webui	Generic key	Low	`GET /v1/models`
TensorRT-LLM	N/A	N/A	Health endpoint
Triton Inference Server	N/A	N/A	`GET /v2/health/ready`
Jan AI	N/A (local)	N/A	`GET /v1/models`

Tier 9 — Enterprise/Legacy

Provider	Key Pattern	Confidence	Verify
Salesforce Einstein	Generic token	Low	REST API
ServiceNow AI	Generic token	Low	REST API
SAP AI Core	OAuth token	Low	Token endpoint
Palantir AIP	Generic token	Low	REST API
Databricks (DBRX)	`dapi*`	High	`GET /api/2.0/clusters`
Snowflake Cortex	JWT token	Medium	SQL endpoint
Oracle Generative AI	Generic key	Low	REST API
HPE GreenLake AI	Generic token	Low	REST API

Architecture

                    +------------------+
                    |   CLI (Cobra)    |
                    +--------+---------+
                             |
              +--------------+--------------+
              |              |              |
     +--------v--+   +------v-----+  +-----v------+
     | Input      |   | Recon      |  | Import     |
     | Adapters   |   | Engine     |  | Adapters   |
     | - file     |   | (80+ src)  |  | - trufflehog|
     | - git      |   | - IoT (6)  |  | - gitleaks |
     | - stdin    |   | - Code(16) |  | - generic  |
     | - url      |   | - Search(5)|  +-----+------+
     | - clipboard|   | - Paste(8+)|        |
     +--------+---+   | - Pkg (8)  |        |
              |        | - Cloud(7) |        |
              |        | - CI/CD(5) |        |
              |        | - Archive2 |        |
              |        | - Forum(7) |        |
              |        | - Collab(4)|        |
              |        | - JS/FE(5) |        |
              |        | - Logs (3) |        |
              |        | - Intel(3) |        |
              |        | - Mobile(1)|        |
              |        | - DNS (2)  |        |
              |        | - API (3)  |        |
              |        +------+-----+        |
              |               |              |
              +-------+-------+--------------+
                      |
              +-------v--------+
              | Scanner Engine |
              | - matcher.go   |
              | - verifier.go  |
              +-------+--------+
                      |
         +------------+-------------+
         |            |             |
   +-----v----+ +----v-----+ +----v-------+
   | Output   | | Notify   | | Web        |
   | - table  | | - telegram| | Dashboard  |
   | - json   | | - webhook| | - htmx     |
   | - sarif  | | - slack  | | - REST API |
   | - csv    | +----------+ | - SQLite   |
   +----------+              +------------+

   +------------------------------------------+
   | Provider Registry (108+ YAML providers)  |
   | Dork Registry (50+ YAML dorks)           |
   +------------------------------------------+

Key Design Decisions

YAML Providers — Adding a new provider = adding a YAML file. No recompile needed for pattern-only changes (when using external provider dir). Built-in providers are embedded at compile time.
Keyword Pre-filtering — Before running regex, files are scanned for keywords. This provides ~10x speedup on large codebases.
Worker Pool — Parallel scanning with configurable worker count. Default: CPU count.
Delta-based Git Scanning — Only scans changes between commits, not entire trees.
SQLite Storage — All scan results persisted with AES-256 encryption.
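
The keyword pre-filtering idea can be sketched in a few lines: a cheap substring check decides whether the more expensive regex runs at all. This is an illustrative sketch only; the `provider` struct and function names are hypothetical, and the pattern and keywords are borrowed from the example provider definition in the Contributing section.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// provider is a hypothetical in-memory form of a provider definition.
type provider struct {
	Name     string
	Keywords []string
	Pattern  *regexp.Regexp
}

// containsAny reports whether s contains at least one keyword.
func containsAny(s string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(s, k) {
			return true
		}
	}
	return false
}

// scan runs a provider's regex only when one of its keywords is present,
// and returns the matches grouped by provider name.
func scan(content string, providers []provider) map[string][]string {
	hits := make(map[string][]string)
	for _, p := range providers {
		if !containsAny(content, p.Keywords) {
			continue // pre-filter: skip the regex entirely
		}
		if m := p.Pattern.FindAllString(content, -1); len(m) > 0 {
			hits[p.Name] = m
		}
	}
	return hits
}

// exampleProviders returns a single provider mirroring the sample YAML.
func exampleProviders() []provider {
	return []provider{{
		Name:     "your-provider",
		Keywords: []string{"yp_", "YOUR_PROVIDER_API_KEY"},
		Pattern:  regexp.MustCompile(`\byp_[A-Za-z0-9]{32}\b`),
	}}
}

func main() {
	content := "token = yp_" + strings.Repeat("a", 32)
	fmt.Println(scan(content, exampleProviders()))
	fmt.Println(scan("no credentials here", exampleProviders()))
}
```

The benefit depends on how rarely keywords appear in the scanned files; content that contains a keyword still pays the full regex cost.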

Security & Ethics

Built-in Protections

Key values masked by default in terminal (first 8 + last 4 chars) — use --unmask for full keys
Full keys always available via: --unmask, --output=json, keyhunter keys show, web dashboard, Telegram bot
Database is AES-256 encrypted (full keys stored encrypted)
API tokens stored encrypted in config
No key values written to logs during --verify
Web dashboard supports basic auth / token auth
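
As an illustration of the masking rule above (first 8 and last 4 characters kept), here is a minimal sketch. The `maskKey` name, the `...` separator, and the handling of very short keys are assumptions; the README does not specify them.

```go
package main

import (
	"fmt"
	"strings"
)

// maskKey keeps the first 8 and last 4 bytes of a key and hides the rest.
// Keys of 12 bytes or fewer are masked entirely so that nothing is revealed.
// Byte slicing is safe here because API keys are assumed to be ASCII.
func maskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return strings.Repeat("*", len(key))
	}
	return key[:head] + "..." + key[len(key)-tail:]
}

func main() {
	fmt.Println(maskKey("sk-proj-abc123def456ghi789jkl012mno345pqr678")) // sk-proj-...r678
	fmt.Println(maskKey("short"))                                        // *****
}
```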

Rate Limiting

Source	Rate Limit
GitHub API (auth)	30 req/min
GitHub API (unauth)	10 req/min
Shodan	Per API plan
Censys	250 queries/day (free)
ZoomEye	10,000 results/month (free)
FOFA	100 results/query (free)
Netlas	50 queries/day (free)
Google Custom Search	100/day free, 10K/day paid
Bing Search	1,000/month (free)
Stack Overflow	300/day (no key), 10K/day (key)
HN Algolia	10,000 req/hour
VirusTotal	4 req/min (free)
IntelX	10 searches/day (free)
GrayHatWarfare	Per plan
Wayback Machine	~15 req/min
Paste sites	1 req/2sec
npm/PyPI	Generous, be respectful
Trello	100 req/10sec
Docker Hub	100 pulls/6hr (unauth)
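
Limits such as "30 req/min" or "1 req/2sec" translate into a minimum spacing between requests. The sketch below shows that conversion and a simple ticker-based throttle; `perRequest` and `throttle` are hypothetical names, and this is not a description of how KeyHunter enforces its limits.

```go
package main

import (
	"fmt"
	"time"
)

// perRequest converts "n requests per window" into the spacing between requests.
func perRequest(n float64, window time.Duration) time.Duration {
	return time.Duration(float64(window) / n)
}

// throttle calls fn once per item, waiting for a ticker between calls so that
// consecutive calls are spaced roughly one interval apart.
func throttle(items []string, interval time.Duration, fn func(string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i, item := range items {
		if i > 0 {
			<-ticker.C
		}
		fn(item)
	}
}

func main() {
	fmt.Println(perRequest(30, time.Minute))  // GitHub API (auth): 2s
	fmt.Println(perRequest(0.5, time.Second)) // Paste sites: 2s

	// Shortened interval so the demo finishes quickly.
	throttle([]string{"dork-1", "dork-2", "dork-3"}, 50*time.Millisecond, func(d string) {
		fmt.Println("query", d)
	})
}
```

Daily or monthly quotas (for example Censys or Bing on the free tier) are budgets rather than rates, so they would need a persistent counter in addition to spacing.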

Stealth & Ethics Flags

--stealth           # User-agent rotation, increased request spacing
--respect-robots    # Respect robots.txt (default: on)

Use Cases

Red Team / Pentest

# Full multi-source recon against a target org
keyhunter recon github --query="targetcorp OPENAI_API_KEY"
keyhunter recon gitlab --query="targetcorp api_key"
keyhunter recon shodan --dork='http.html:"targetcorp" "sk-"'
keyhunter recon censys --query='services.http.response.body:"targetcorp" AND "api_key"'
keyhunter recon zoomeye --query='site:targetcorp.com +"api_key"'
keyhunter recon elasticsearch --shodan   # Find exposed ES with leaked keys
keyhunter recon jenkins --shodan         # Exposed Jenkins with build logs
keyhunter recon dotenv --domain-list=targetcorp-subdomains.txt  # .env exposure
keyhunter recon wayback --domain=targetcorp.com  # Historical leaks
keyhunter recon sourcemaps --domain=app.targetcorp.com  # JS source maps
keyhunter recon crtsh --domain=targetcorp.com  # Discover API subdomains
keyhunter recon full --providers=openai,anthropic  # Everything at once

DevSecOps / CI Pipeline

# Pre-commit hook
keyhunter hook install

# GitHub Actions step
- name: KeyHunter Scan
  run: |
    keyhunter scan path . --output=sarif > keyhunter.sarif
    # Upload to GitHub Security tab

Bug Bounty

# Comprehensive target recon
keyhunter recon github --org=targetcorp --dork=auto --verify
keyhunter recon gist --query="targetcorp"
keyhunter recon paste --sources=all --query="targetcorp"
keyhunter recon postman --query="targetcorp"
keyhunter recon trello --query="targetcorp api key"
keyhunter recon notion --query="targetcorp API_KEY"
keyhunter recon confluence --shodan
keyhunter recon npm --query="targetcorp"   # Check their published packages
keyhunter recon pypi --query="targetcorp"
keyhunter recon docker --query="targetcorp" --layers  # Docker image layer scan
keyhunter recon apk --query="targetcorp"   # Mobile app decompile
keyhunter recon swagger --domain=api.targetcorp.com

Monitoring / Alerting

# Continuous monitoring with Telegram alerts
keyhunter schedule add \
  --name="monitor-github" \
  --cron="*/30 * * * *" \
  --command="recon github --dork=auto --providers=openai" \
  --notify=telegram

keyhunter serve --telegram

Dork Examples (150+ Built-in)

GitHub

filename:.env "OPENAI_API_KEY"
filename:.env "ANTHROPIC_API_KEY"
filename:config.yaml "api_key" "sk-"
"sk-proj-" language:python
"sk-ant-api03" language:javascript
filename:docker-compose "API_KEY"
"api_key" extension:ipynb
filename:.toml "api_key" "sk-"
filename:terraform.tfvars "api_key"
"kind: Secret" "data:" filename:*.yaml          # K8s secrets
filename:.npmrc "_authToken"                     # npm tokens
filename:requirements.txt "openai" path:.env     # Python projects

GitLab

"OPENAI_API_KEY" filename:.env
"sk-ant-" filename:*.py
"api_key" filename:settings.json

Google Dorking

"sk-proj-" -github.com -stackoverflow.com        # Outside known code sites
"sk-ant-api03-" filetype:env
"OPENAI_API_KEY" filetype:yml
"ANTHROPIC_API_KEY" filetype:json
inurl:.env "API_KEY"
intitle:"index of" .env
site:pastebin.com "sk-proj-"
site:replit.com "OPENAI_API_KEY"
site:codesandbox.io "sk-ant-"
site:notion.so "API_KEY"
site:trello.com "openai"
site:docs.google.com "sk-proj-"
site:medium.com "ANTHROPIC_API_KEY"
site:dev.to "sk-proj-"
site:huggingface.co "OPENAI_API_KEY"
site:kaggle.com "api_key" "sk-"
intitle:"Swagger UI" "api_key"
inurl:graphql "authorization" "Bearer sk-"
filetype:tfstate "api_key"                       # Terraform state
filetype:ipynb "sk-proj-"                        # Jupyter notebooks

Shodan

http.html:"openai" "api_key" port:8080
http.title:"LiteLLM" port:4000
http.html:"ollama" port:11434
http.title:"Kubernetes Dashboard"
"X-Jenkins" "200 OK"
http.title:"Kibana" port:5601
http.title:"Grafana"
http.title:"Swagger UI"
http.title:"Gitea" port:3000
http.html:"PrivateBin"
http.title:"MinIO Browser"
http.title:"Sentry"
http.title:"Confluence"
port:6443 "kube-apiserver"
http.html:"langchain" port:8000

Censys

services.http.response.body:"openai" and services.http.response.body:"sk-"
services.http.response.body:"langchain" and services.port:8000
services.http.response.body:"OPENAI_API_KEY"
services.http.response.body:"sk-ant-api03"

ZoomEye

app:"Elasticsearch" +"api_key"
app:"Jenkins" +openai
app:"Grafana" +anthropic
app:"Gitea"

FOFA

body="sk-proj-"
body="OPENAI_API_KEY"
body="sk-ant-api03"
title="LiteLLM"
title="Swagger UI" && body="api_key"
title="Kibana" && body="authorization"

Contributing

Adding a New Provider

Create providers/your-provider.yaml:

id: your-provider
name: Your Provider
category: emerging
website: https://api.yourprovider.com
confidence: medium

patterns:
  - id: your-provider-key
    name: "Your Provider API Key"
    regex: '\byp_[A-Za-z0-9]{32}\b'
    confidence: high
    description: "Your Provider API key with yp_ prefix"

keywords:
  - "yp_"
  - "YOUR_PROVIDER_API_KEY"

verify:
  enabled: true
  method: GET
  url: "https://api.yourprovider.com/v1/models"
  headers:
    Authorization: "Bearer {{key}}"
  success_codes: [200]
  failure_codes: [401, 403]

metadata:
  docs: "https://docs.yourprovider.com"
  key_url: "https://dashboard.yourprovider.com/keys"
  env_vars: ["YOUR_PROVIDER_API_KEY"]

Run tests: go test ./pkg/provider/...
Submit a PR
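
To make the `verify` block of the provider definition more concrete, here is a sketch of how such a spec could be turned into an HTTP request and a status. It is an assumption-laden illustration: the `verifySpec` struct, the `classify` and `verify` functions, and the `active` / `invalid` / `unknown` labels are hypothetical, and a local `httptest` server stands in for the provider API so the example runs offline.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// verifySpec is a hypothetical in-memory form of the YAML verify block.
type verifySpec struct {
	Method       string
	URL          string
	Headers      map[string]string
	SuccessCodes []int
	FailureCodes []int
}

// classify maps an HTTP status code to a verification result.
func classify(spec verifySpec, status int) string {
	for _, c := range spec.SuccessCodes {
		if c == status {
			return "active"
		}
	}
	for _, c := range spec.FailureCodes {
		if c == status {
			return "invalid"
		}
	}
	return "unknown"
}

// verify substitutes the key into the header templates, sends the request,
// and classifies the response status.
func verify(client *http.Client, spec verifySpec, key string) (string, error) {
	req, err := http.NewRequest(spec.Method, spec.URL, nil)
	if err != nil {
		return "", err
	}
	for name, tmpl := range spec.Headers {
		req.Header.Set(name, strings.ReplaceAll(tmpl, "{{key}}", key))
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return classify(spec, resp.StatusCode), nil
}

func main() {
	// Local stand-in for the provider API: accepts exactly one bearer token.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "Bearer yp_valid" {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusUnauthorized)
	}))
	defer srv.Close()

	spec := verifySpec{
		Method:       "GET",
		URL:          srv.URL + "/v1/models",
		Headers:      map[string]string{"Authorization": "Bearer {{key}}"},
		SuccessCodes: []int{200},
		FailureCodes: []int{401, 403},
	}
	for _, key := range []string{"yp_valid", "yp_revoked"} {
		status, err := verify(srv.Client(), spec, key)
		fmt.Println(key, status, err)
	}
}
```

Treating any status outside both lists as `unknown` avoids reporting a key as dead when the provider is merely rate limiting or temporarily unavailable.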

Adding a New Dork

Edit the relevant YAML file under dorks/<source>/ (for example dorks/github/infrastructure.yaml) and add your dork entry
Submit a PR

Roadmap

Core scanning engine (file, git, stdin)
108 provider YAML definitions
Active verification for all providers
CLI with Cobra (scan, verify, import, recon, serve)
TruffleHog & Gitleaks import adapters
OSINT/Recon engine (Shodan, Censys, GitHub, GitLab, Paste, S3)
Built-in dork engine with 50+ dorks
Web dashboard (htmx + Tailwind + SQLite)
Telegram bot with auto-notifications
Scheduled scanning (cron-based)
Pre-commit hook & CI/CD integration (SARIF)
Docker image
Homebrew formula

Disclaimer

KeyHunter is designed for authorized security testing, defensive security, bug bounty programs, and educational purposes only. Always ensure you have proper authorization before scanning any target. Unauthorized access to computer systems is illegal.

License

MIT License - see LICENSE for details.
