diff --git a/README.md b/README.md
index 20d72c7..7b6c324 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew **81% year-over-year** in 2025, yet no tool covers more than ~15 LLM providers.
-**KeyHunter fills that gap** with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and real-time notifications.
+**KeyHunter fills that gap** with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and a growing set of internet sources for leak discovery.
 ### How It Compares
@@ -20,138 +20,158 @@ Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) we
 |---------|-----------|------------|----------|----------------|
 | LLM Providers | **108+** | ~3 | ~5 | ~1 |
 | Active Verification | **108+ endpoints** | ~20 types | No | No |
-| OSINT/Recon | **Shodan, Censys, GitHub, GitLab, Paste, S3** | No | No | No |
+| OSINT/Recon Sources | **18 live** (80+ planned) | No | No | No |
 | External Tool Import | **TruffleHog + Gitleaks** | - | - | - |
-| Web Dashboard | **Built-in** | No | No | No |
-| Telegram Bot | **Built-in** | No | No | No |
-| Dork Engine | **Built-in YAML dorks** | No | No | No |
+| Dork Engine | **150 built-in YAML dorks** | No | No | No |
+| Pre-commit Hook | **Built-in** | Yes | Yes | Yes |
+| SARIF Output | **Yes** | Yes | Yes | No |
 | Provider YAML Plugin | **Community-extensible** | Go code only | TOML rules | Python plugins |
-| Scheduled Scanning | **Cron-based** | No | No | No |
+| Web Dashboard | Coming soon | No | No | No |
+| Telegram Bot | Coming soon | No | No | No |
+| Scheduled Scanning | Coming soon | No | No | No |
 ---
 ## Features
-### Core Scanning
-- **File/Directory scanning** with recursive traversal and glob exclusions
-- **Git-aware scanning** — full history, branches, stash, delta-based diffs
-- **stdin/pipe** support — `cat dump.txt | keyhunter scan stdin`
-- **URL fetching** — scan any remote URL content
-- **Clipboard scanning** — instant clipboard content analysis
+### Implemented
-### OSINT / Recon Engine (80+ Sources, 18 Categories)
+#### Core Scanning Engine
+- **3-stage pipeline** -- Aho-Corasick (AC) keyword pre-filter, regex match, entropy scoring
+- **Worker pool** (`ants`) for parallel scanning with a configurable worker count
+- **108 provider YAML definitions** (Tier 1-9), dual-located with `go:embed`
-**IoT & Internet Scanners**
-- **Shodan** — exposed LLM proxies, dashboards, API endpoints
-- **Censys** — HTTP body search for leaked credentials
-- **ZoomEye** — Chinese IoT scanner, different coverage perspective
-- **FOFA** — Asian infrastructure scanning, body content search
-- **Netlas** — HTTP response body keyword search
-- **BinaryEdge** — internet-wide scan data
+#### Input Sources
+- **File scanning** -- single file analysis
+- **Directory scanning** -- recursive traversal with glob exclusions and memory-mapped (mmap) file reads
+- **Git history scanning** -- full commit history analysis
+- **stdin/pipe** support -- `echo "sk-proj-..." | keyhunter scan stdin`
+- **URL fetching** -- scan any remote URL content
+- **Clipboard scanning** -- instant clipboard content analysis
-**Code Hosting & Snippets**
-- **GitHub / GitLab / Bitbucket** — code search with automated dorks
-- **Codeberg / Gitea instances** — alternative Git platforms (Gitea auto-discovered via Shodan)
-- **Replit / CodeSandbox / StackBlitz / Glitch** — interactive dev environments with hardcoded keys
-- **CodePen / JSFiddle / Observable** — browser snippet platforms
-- **HuggingFace** — Spaces, repos, model configs (high-yield for LLM keys)
-- **Kaggle** — notebooks and datasets with API keys
-- **Jupyter / nbviewer** — shared notebooks
-- **GitHub Gist** — public gist search
-- **Gitpod** — workspace snapshots
-
-**Search Engine Dorking**
-- **Google** — Custom Search API / SerpAPI, 100+ built-in dorks
-- **Bing** — Azure Cognitive Services search
-- **DuckDuckGo / Yandex / Brave** — alternative indexes for broader coverage
-
-**Paste Sites**
-- **Multi-paste aggregator** — Pastebin, dpaste, paste.ee, rentry, hastebin, ix.io, and more
-
-**Package Registries**
-- **npm / PyPI / RubyGems / crates.io / Maven / NuGet / Packagist / Go modules** — download packages, extract source, scan for key patterns
-
-**Container & Infrastructure**
-- **Docker Hub** — image layer scanning, build arg extraction
-- **Kubernetes** — exposed dashboards, public Secret/ConfigMap YAML files
-- **Terraform** — state files (`.tfstate` with plaintext secrets), registry modules
-- **Helm Charts / Ansible Galaxy** — default values with credentials
-
-**Cloud Storage**
-- **AWS S3 / GCS / Azure Blob / DigitalOcean Spaces / Backblaze B2** — bucket enumeration and content scanning
-- **MinIO** — self-hosted instances discovered via Shodan
-- **GrayHatWarfare** — searchable database of public bucket objects
-
-**CI/CD Log Leaks**
-- **Travis CI / CircleCI** — public build logs with leaked env vars
-- **GitHub Actions** — workflow run log scanning
-- **Jenkins** — exposed instances (Shodan-discovered), console output
-- **GitLab CI/CD** — public pipeline job traces
-
-**Web Archives**
-- **Wayback Machine** — historical snapshots of removed `.env` files, config pages
-- **CommonCrawl** — massive web crawl data, WARC record scanning
-
-**Forums & Documentation**
-- **Stack Overflow** — API + SEDE queries for code snippets with real keys
-- **Reddit** — programming subreddit scanning
-- **Hacker News** — Algolia API comment search
-- **dev.to / Medium** — tutorial articles with hardcoded keys
-- **Telegram groups** — public channels sharing configs and "free API keys"
-- **Discord** — indexed public server content
-
-**Collaboration Tools**
-- **Notion / Confluence** — public pages and spaces with credentials
-- **Trello** — public boards with API key cards
-- **Google Docs/Sheets** — publicly shared documents
-
-**Frontend & JavaScript Leaks**
-- **JS Source Maps** — original source recovery with inlined secrets
-- **Webpack / Vite bundles** — `REACT_APP_*`, `NEXT_PUBLIC_*`, `VITE_*` variable extraction
-- **Exposed `.env` files** — misconfigured web servers serving dotenv from root
-- **Swagger / OpenAPI docs** — real auth examples in API docs
-- **Vercel / Netlify previews** — deploy preview JS bundles with production secrets
-
-**Log Aggregators**
-- **Elasticsearch / Kibana** — exposed instances with application logs containing API keys
-- **Grafana** — exposed dashboards with datasource configs
-- **Sentry** — error tracking capturing request headers with keys
-
-**Threat Intelligence**
-- **VirusTotal** — uploaded files/scripts containing embedded keys
-- **Intelligence X** — aggregated paste, darknet, and leak search
-- **URLhaus** — malicious URLs with API keys in parameters
-
-**Mobile Apps**
-- **APK analysis** — download, decompile, grep for key patterns (via apktool/jadx)
-
-**DNS / Subdomain Discovery**
-- **crt.sh** — Certificate Transparency log for API subdomain discovery
-- **Subdomain probing** — config endpoint enumeration (`.env`, `/api/config`, `/actuator/env`)
-
-**API Marketplaces**
-- **Postman** — public collections, workspaces, environments
-- **SwaggerHub** — published API definitions with example values
-
-**`recon full`** — parallel sweep across all 80+ sources with deduplication and unified reporting
-
-### Active Verification
-- Lightweight API calls to verify if detected keys are active
+#### Active Verification
+- YAML-driven `HTTPVerifier` -- lightweight API calls to check whether detected keys are active
 - Permission and scope extraction (org, rate limits, model access)
+- Consent prompt and `LEGAL.md` for legal safety
 - Configurable via `--verify` flag (off by default)
-- Provider-specific verification endpoints
-### External Tool Integration
-- **Import TruffleHog** JSON output — enrich with LLM-specific analysis
-- **Import Gitleaks** JSON output — cross-reference with 108+ providers
-- Generic CSV import for custom tool output
+#### Output Formats
+- **Table** -- colored terminal output with key masking (default)
+- **JSON** -- full key values for programmatic consumption
+- **CSV** -- spreadsheet-compatible export
+- **SARIF 2.1.0** -- CI/CD integration (GitHub Code Scanning, etc.)
+- Exit codes: `0` (clean), `1` (findings), `2` (error)
-### Notifications & Dashboard
-- **Telegram Bot** — scan triggers, key alerts, recon results
-- **Web Dashboard** — htmx + Tailwind, SQLite-backed, real-time scan viewer
-- **Webhook** — generic HTTP POST notifications
-- **Slack** — workspace notifications
-- **Scheduled scans** — cron-based recurring scans with auto-notify
+#### Key Management
+- `keyhunter keys list` -- list all discovered keys (masked by default)
+- `keyhunter keys show ` -- full key details
+- `keyhunter keys export` -- export in JSON/CSV format
+- `keyhunter keys copy ` -- copy key to clipboard
+- `keyhunter keys delete ` -- remove a key from the database
+- `keyhunter keys verify ` -- verify a specific key
+
+#### External Tool Import
+- **TruffleHog v3** JSON import with LLM-specific enrichment
+- **Gitleaks** JSON and CSV import
+- Deduplication across imports via `(provider, masked_key, source)` hashing
+
+#### Git Pre-commit Hook
+- `keyhunter hook install` -- installs an embedded shell script that blocks commits containing leaked keys
+- `keyhunter hook uninstall` -- removes the hook cleanly
+- `--force` backs up an existing hook before replacing it
+
+#### Dork Engine
+- **150 built-in YAML dorks** across 8 source types (GitHub, GitLab, Google, Shodan, Censys, ZoomEye, FOFA, Bing)
+- Live execution of GitHub dorks against the authenticated GitHub API
+- CLI management: `keyhunter dorks list`, `keyhunter dorks list --source=github`, `keyhunter dorks add`, `keyhunter dorks run`, `keyhunter dorks export`
+
+#### OSINT / Recon Engine (18 Sources Live)
+
+The recon framework provides a `ReconSource` interface with per-source rate limiting, stealth mode, robots.txt compliance, parallel sweep, and result deduplication.
+
+**Code Hosting & Snippets** (live)
+- **GitHub** -- code search with automated dorks
+- **GitLab** -- code search
+- **Bitbucket** -- code search
+- **GitHub Gist** -- public gist search
+- **Codeberg** -- alternative Git platform search
+- **HuggingFace** -- Spaces, repos, model configs (high-yield for LLM keys)
+- **Replit** -- public repl search
+- **CodeSandbox** -- sandbox search
+- **StackBlitz Sandboxes** -- sandbox search
+- **Kaggle** -- notebooks and datasets with API keys
+
+**Search Engine Dorking** (live)
+- **Google** -- Custom Search API / SerpAPI
+- **Bing** -- Azure Cognitive Services search
+- **DuckDuckGo** -- HTML scraping fallback
+- **Yandex** -- XML API search
+- **Brave** -- Brave Search API
+
+**Paste Sites** (live)
+- **Pastebin** -- scraping API
+- **GistPaste** -- paste search
+- **PasteSites** -- multi-paste aggregator
+
+**`recon full`** -- parallel sweep across all 18 live sources with deduplication and unified reporting.
+
+#### CLI Commands
+| Command | Status |
+|---------|--------|
+| `keyhunter scan` | Implemented |
+| `keyhunter providers list/info/stats` | Implemented |
+| `keyhunter config init/set/get` | Implemented |
+| `keyhunter keys list/show/export/copy/delete/verify` | Implemented |
+| `keyhunter import` | Implemented |
+| `keyhunter hook install/uninstall` | Implemented |
+| `keyhunter dorks list/add/run/export` | Implemented |
+| `keyhunter recon full/list` | Implemented |
+| `keyhunter legal` | Implemented |
+| `keyhunter verify` | Stub |
+| `keyhunter serve` | Stub |
+| `keyhunter schedule` | Stub |
+
+### Coming Soon
+
+The following features are on the roadmap but not yet implemented:
+
+#### Phase 12 -- IoT Scanners & Cloud Storage
+- **Shodan** -- exposed LLM proxies, dashboards, API endpoints
+- **Censys** -- HTTP body search for leaked credentials
+- **ZoomEye** -- IoT scanner
+- **FOFA** -- Asian infrastructure scanning
+- **Netlas** -- HTTP response body search
+- **BinaryEdge** -- internet-wide scan data
+- **AWS S3 / GCS / Azure Blob / DigitalOcean Spaces** -- bucket enumeration and scanning
+
+#### Phase 13 -- Package Registries, Containers & IaC
+- **npm / PyPI / RubyGems / crates.io / Maven / NuGet** -- package source scanning
+- **Docker Hub** -- image layer scanning
+- **Terraform / Helm Charts / Ansible** -- IaC scanning
+
+#### Phase 14 -- CI/CD Logs, Web Archives & Frontend Leaks
+- **GitHub Actions / Travis CI / CircleCI / Jenkins / GitLab CI** -- public build log scanning
+- **Wayback Machine / CommonCrawl** -- historical web archive scanning
+- **JS Source Maps / Webpack bundles / exposed `.env`** -- frontend leak detection
+
+#### Phase 15 -- Forums & Collaboration
+- **Stack Overflow / Reddit / Hacker News / dev.to / Medium** -- forum scanning
+- **Notion / Confluence / Trello** -- collaboration tool scanning
+- **Elasticsearch / Grafana / Sentry** -- exposed log aggregators
+- **Telegram groups / Discord** -- public channel scanning
+
+#### Phase 16 -- Threat Intel, Mobile, DNS & API Marketplaces
+- **VirusTotal / Intelligence X / URLhaus** -- threat intelligence
+- **APK analysis** -- mobile app decompilation
+- **crt.sh / subdomain probing** -- DNS/subdomain discovery
+- **Postman / SwaggerHub** -- API marketplace scanning
+
+#### Phase 17 -- Telegram Bot & Scheduler
+- **Telegram Bot** -- scan triggers, key alerts, recon results
+- **Scheduled scanning** -- cron-based recurring scans with auto-notify
+
+#### Phase 18 -- Web Dashboard
+- **Web Dashboard** -- htmx + Tailwind, SQLite-backed, real-time scan viewer
 ---
@@ -161,161 +181,111 @@ Existing tools like TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) we
 ```bash
 # From source
-go install github.com/keyhunter/keyhunter@latest
+go install github.com/salvacybersec/keyhunter@latest
-# Binary release
-curl -sSL https://get.keyhunter.dev | bash
-
-# Docker
-docker pull keyhunter/keyhunter:latest
+# Binary release (when available)
+curl -sSL https://github.com/salvacybersec/keyhunter/releases/latest/download/keyhunter_linux_amd64.tar.gz | tar -xz
+sudo mv keyhunter /usr/local/bin/
 ```
 ### Basic Usage
 ```bash
 # Scan a directory
-keyhunter scan path ./my-project/
+keyhunter scan ./my-project/
 # Scan with active verification
-keyhunter scan path ./my-project/ --verify
+keyhunter scan ./my-project/ --verify
-# Scan git history (last 30 days)
-keyhunter scan git . --since="30 days ago"
+# Scan git history
+keyhunter scan --git .
 # Scan from pipe
 cat secrets.txt | keyhunter scan stdin
 # Scan only specific providers
-keyhunter scan path . --providers=openai,anthropic,deepseek
+keyhunter scan . --providers=openai,anthropic,deepseek
 # JSON output
-keyhunter scan path . --output=json > results.json
+keyhunter scan . --output=json > results.json
+
+# SARIF output for CI/CD
+keyhunter scan . --output=sarif > keyhunter.sarif
+
+# CSV output
+keyhunter scan . --output=csv > results.csv
 ```
 ### OSINT / Recon
 ```bash
-# ── IoT & Internet Scanners ──
-keyhunter recon shodan --dork="http.title:\"LiteLLM\" port:4000"
-keyhunter recon censys --query='services.http.response.body:"sk-proj-"'
-keyhunter recon zoomeye --query='app:"Elasticsearch" +"api_key"'
-keyhunter recon fofa --query='body="OPENAI_API_KEY"'
-keyhunter recon netlas --query='http.body:"sk-ant-"'
+# Full sweep across all 18 live sources
+keyhunter recon full
-# ── Code Hosting ──
-keyhunter recon github --dork=auto # Tum built-in GitHub dork'lari
-keyhunter recon gitlab --dork=auto
-keyhunter recon bitbucket --query="OPENAI_API_KEY"
-keyhunter recon replit --query="sk-proj-" # Public repl'ler
-keyhunter recon huggingface --spaces --query="api_key" # HF Spaces
-keyhunter recon kaggle --notebooks --query="openai"
-keyhunter recon codesandbox --query="sk-ant-"
-keyhunter recon glitch --query="ANTHROPIC_API_KEY"
-keyhunter recon gitea --instances-from=shodan # Auto-discover Gitea instances
+# Sweep specific sources only
+keyhunter recon full --sources=github,gitlab,gist
-# ── Search Engine Dorking ──
-keyhunter recon google --dork=auto # 100+ built-in Google dorks
-keyhunter recon google --dork='"sk-proj-" -github.com filetype:env'
-keyhunter recon bing --dork=auto
-keyhunter recon brave --query="OPENAI_API_KEY filetype:yaml"
+# List available recon sources
+keyhunter recon list
-# ── Package Registries ──
-keyhunter recon npm --recent --query="openai" # Scan yeni paketler
-keyhunter recon pypi --recent --query="llm"
-keyhunter recon crates --query="api_key"
+# Code hosting sources
+keyhunter recon full --sources=github
+keyhunter recon full --sources=gitlab
+keyhunter recon full --sources=bitbucket
+keyhunter recon full --sources=gist
+keyhunter recon full --sources=codeberg
+keyhunter recon full --sources=huggingface
+keyhunter recon full --sources=replit
+keyhunter recon full --sources=codesandbox
+keyhunter recon full --sources=sandboxes
+keyhunter recon full --sources=kaggle
-# ── Cloud Storage ──
-keyhunter recon s3 --domain=targetcorp # S3 bucket enumeration
-keyhunter recon gcs --domain=targetcorp # GCS buckets
-keyhunter recon azure --domain=targetcorp # Azure Blob
-keyhunter recon minio --shodan # Exposed MinIO instances
-keyhunter recon grayhat --query="openai api_key" # GrayHatWarfare search
+# Search engine dorking
+keyhunter recon full --sources=google
+keyhunter recon full --sources=bing
+keyhunter recon full --sources=duckduckgo
+keyhunter recon full --sources=yandex
+keyhunter recon full --sources=brave
-# ── CI/CD Logs ──
-keyhunter recon ghactions --org=targetcorp # GitHub Actions logs
-keyhunter recon travis --org=targetcorp
-keyhunter recon jenkins --shodan # Exposed Jenkins instances
-keyhunter recon circleci --org=targetcorp
+# Paste sites
+keyhunter recon full --sources=pastebin
+keyhunter recon full --sources=gistpaste
+keyhunter recon full --sources=pastesites
+```
-# ── Web Archives ──
-keyhunter recon wayback --domain=targetcorp.com # Wayback Machine
-keyhunter recon commoncrawl --domain=targetcorp.com
+### Dork Management
-# ── Frontend & JS ──
-keyhunter recon dotenv --domain-list=targets.txt # Exposed .env files
-keyhunter recon sourcemaps --domain=app.target.com # JS source maps
-keyhunter recon webpack --url=https://app.target.com/main.js
-keyhunter recon swagger --shodan # Exposed Swagger UI's
-keyhunter recon deploys --domain=targetcorp # Vercel/Netlify previews
-
-# ── Forums ──
-keyhunter recon stackoverflow --query="sk-proj-"
-keyhunter recon reddit --subreddit=openai --query="api key"
-keyhunter recon hackernews --query="leaked api key"
-keyhunter recon telegram-groups --query="free api key"
-
-# ── Collaboration ──
-keyhunter recon notion --query="API_KEY" # Google dorked
-keyhunter recon confluence --shodan # Exposed instances
-keyhunter recon trello --query="openai api key"
-
-# ── Log Aggregators ──
-keyhunter recon elasticsearch --shodan # Exposed ES instances
-keyhunter recon grafana --shodan
-keyhunter recon sentry --shodan
-
-# ── Threat Intelligence ──
-keyhunter recon virustotal --query="sk-proj-"
-keyhunter recon intelx --query="sk-ant-api03" # Intelligence X
-keyhunter recon urlhaus --query="openai"
-
-# ── Mobile Apps ──
-keyhunter recon apk --query="ai chatbot" # APK download + decompile
-
-# ── DNS/Subdomain ──
-keyhunter recon crtsh --domain=targetcorp.com # Cert transparency
-keyhunter recon subdomain --domain=targetcorp.com --probe-configs
-
-# ── Full Sweep ──
-keyhunter recon full --providers=openai,anthropic # ALL 80+ sources parallel
-keyhunter recon full --categories=code,cloud # Category-filtered sweep
-
-# ── Dork Management ──
-keyhunter dorks list # All dorks across all sources
-keyhunter dorks list --source=github
-keyhunter dorks list --source=google
+```bash
+keyhunter dorks list # All dorks across all sources
+keyhunter dorks list --source=github # GitHub dorks only
+keyhunter dorks list --source=google # Google dorks only
 keyhunter dorks add github 'filename:.env "GROQ_API_KEY"'
-keyhunter dorks run google --category=frontier # Run Google dorks for frontier providers
+keyhunter dorks run google --category=frontier
 keyhunter dorks export
 ```
-### Viewing Full API Keys
+### Key Management
-Default olarak key'ler terminalde maskelenir (omuz surfing koruması). Gerçek key'e erişim yolları:
+Keys are masked by default in terminal output to guard against shoulder surfing. To view full key values:
 ```bash
-# 1. CLI'da --unmask flag'i ile tam key gör
-keyhunter scan path . --unmask
-# Provider | Key | Confidence | File | Line | Status
-# ─────────────┼──────────────────────────────────────────────┼────────────┼───────────────┼──────┼────────
-# OpenAI | sk-proj-abc123def456ghi789jkl012mno345pqr678 | HIGH | src/config.py | 42 | ACTIVE
+# Show full keys in scan output
+keyhunter scan . --unmask
-# 2. JSON export — her zaman tam key içerir
-keyhunter scan path . --output=json > results.json
+# JSON export always includes full keys
+keyhunter scan . --output=json > results.json
-# 3. Key management komutu — bulunan tüm key'leri yönet
-keyhunter keys list # Maskelenmiş liste
-keyhunter keys list --unmask # Tam key'li liste
-keyhunter keys show # Tek key tam detay (her zaman unmasked)
-keyhunter keys copy # Key'i clipboard'a kopyala
-keyhunter keys export --format=json # Tüm key'leri tam değerleriyle export et
-keyhunter keys verify # Key'i doğrula + tam detay göster
-
-# 4. Web Dashboard — /keys/:id sayfasında "Reveal Key" butonu
-# 5. Telegram Bot — /key komutu ile tam key
+# Key management commands
+keyhunter keys list # Masked list
+keyhunter keys list --unmask # Full key list
+keyhunter keys show # Single key full details (always unmasked)
+keyhunter keys copy # Copy key to clipboard
+keyhunter keys export --format=json # Export all keys with full values
+keyhunter keys verify # Verify key + show full details
+keyhunter keys delete # Remove key from database
 ```
-**Örnek `keyhunter keys show` çıktısı:**
+**Example `keyhunter keys show` output:**
 ```
 ID: a3f7b2c1
 Provider: OpenAI
@@ -331,42 +301,20 @@ keyhunter keys verify # Key'i doğrula + tam detay göster
 Revoke URL: https://platform.openai.com/api-keys
 ```
-### Verify a Single Key
-
-```bash
-keyhunter verify sk-proj-abc123...
-# Output:
-# Provider: OpenAI
-# Status: ACTIVE
-# Org: my-org
-# Rate Limit: 500 req/min
-# Revoke: https://platform.openai.com/api-keys
-```
-
 ### Import External Tools
 ```bash
 # Run TruffleHog, then enrich with KeyHunter
 trufflehog git . --json > trufflehog.json
-keyhunter import trufflehog trufflehog.json --verify
+keyhunter import --format=trufflehog trufflehog.json
 # Run Gitleaks, then enrich
-gitleaks detect -r gitleaks.json
-keyhunter import gitleaks gitleaks.json
-```
+gitleaks detect -f json -r gitleaks.json
+keyhunter import --format=gitleaks gitleaks.json
-### Web Dashboard & Telegram Bot
-
-```bash
-# Start web dashboard
-keyhunter serve --port=8080
-
-# Start with Telegram bot
-keyhunter serve --port=8080 --telegram
-
-# Configure Telegram
-keyhunter config set telegram.token "YOUR_BOT_TOKEN"
-keyhunter config set telegram.chat_id "YOUR_CHAT_ID"
+# Gitleaks CSV
+gitleaks detect -f csv -r gitleaks.csv
+keyhunter import --format=gitleaks-csv gitleaks.csv
 ```
 ### CI/CD Integration
@@ -394,27 +342,6 @@ keyhunter scan . && echo "Clean" || echo "Keys found!"
 See [docs/CI-CD.md](docs/CI-CD.md) for the full guide, including a copy-paste GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle.
-### Scheduled Scanning
-
-```bash
-# Daily GitHub recon at 09:00
-keyhunter schedule add \
-  --name="daily-github" \
-  --cron="0 9 * * *" \
-  --command="recon github --dork=auto" \
-  --notify=telegram
-
-# Hourly paste site monitoring
-keyhunter schedule add \
-  --name="hourly-paste" \
-  --cron="0 * * * *" \
-  --command="recon paste --sources=pastebin" \
-  --notify=telegram
-
-keyhunter schedule list
-keyhunter schedule remove daily-github
-```
-
 ---
 ## Configuration
@@ -424,37 +351,21 @@ keyhunter schedule remove daily-github
 keyhunter config init # Creates ~/.keyhunter.yaml
-# Set API keys for recon sources
-keyhunter config set shodan.apikey "YOUR_SHODAN_KEY"
-keyhunter config set censys.api_id "YOUR_CENSYS_ID"
-keyhunter config set censys.api_secret "YOUR_CENSYS_SECRET"
-keyhunter config set github.token "YOUR_GITHUB_TOKEN"
-keyhunter config set gitlab.token "YOUR_GITLAB_TOKEN"
-keyhunter config set zoomeye.apikey "YOUR_ZOOMEYE_KEY"
-keyhunter config set fofa.email "YOUR_FOFA_EMAIL"
-keyhunter config set fofa.apikey "YOUR_FOFA_KEY"
-keyhunter config set netlas.apikey "YOUR_NETLAS_KEY"
-keyhunter config set binaryedge.apikey "YOUR_BINARYEDGE_KEY"
-keyhunter config set google.cx "YOUR_GOOGLE_CX_ID"
-keyhunter config set google.apikey "YOUR_GOOGLE_API_KEY"
-keyhunter config set bing.apikey "YOUR_BING_API_KEY"
-keyhunter config set brave.apikey "YOUR_BRAVE_API_KEY"
-keyhunter config set virustotal.apikey "YOUR_VT_KEY"
-keyhunter config set intelx.apikey "YOUR_INTELX_KEY"
-keyhunter config set grayhat.apikey "YOUR_GRAYHAT_KEY"
-keyhunter config set reddit.client_id "YOUR_REDDIT_ID"
-keyhunter config set reddit.client_secret "YOUR_REDDIT_SECRET"
-keyhunter config set stackoverflow.apikey "YOUR_SO_KEY"
-keyhunter config set kaggle.username "YOUR_KAGGLE_USER"
-keyhunter config set kaggle.apikey "YOUR_KAGGLE_KEY"
+# Set API tokens for recon sources (currently supported)
+keyhunter config set recon.github.token "YOUR_GITHUB_TOKEN"
+keyhunter config set recon.gitlab.token "YOUR_GITLAB_TOKEN"
+keyhunter config set recon.bitbucket.token "YOUR_BITBUCKET_TOKEN"
+keyhunter config set recon.huggingface.token "YOUR_HF_TOKEN"
+keyhunter config set recon.kaggle.token "YOUR_KAGGLE_TOKEN"
+keyhunter config set recon.google.apikey "YOUR_GOOGLE_API_KEY"
+keyhunter config set recon.google.cx "YOUR_GOOGLE_CX_ID"
+keyhunter config set recon.bing.apikey "YOUR_BING_API_KEY"
+keyhunter config set recon.brave.apikey "YOUR_BRAVE_API_KEY"
+keyhunter config set recon.yandex.apikey "YOUR_YANDEX_API_KEY"
+keyhunter config set recon.yandex.user "YOUR_YANDEX_USER"
-# Set notification channels
-keyhunter config set telegram.token "YOUR_BOT_TOKEN"
-keyhunter config set telegram.chat_id "YOUR_CHAT_ID"
-keyhunter config set webhook.url "https://your-webhook.com/alert"
-
-# Database encryption
-keyhunter config set db.password "YOUR_DB_PASSWORD"
+# Read back a single config value
+keyhunter config get recon.github.token
 ```
 ### Config File (`~/.keyhunter.yaml`)
@@ -464,53 +375,43 @@ scan:
   workers: 8
   verify_timeout: 10s
   default_output: table
-  respect_robots: true
 recon:
   stealth: false
-  rate_limits:
-    github: 30 # req/min
-    shodan: 1 # req/sec
-    censys: 5 # req/sec
-    zoomeye: 10 # req/sec
-    fofa: 1 # req/sec
-    netlas: 1 # req/sec
-    google: 100 # req/day (Custom Search API)
-    bing: 3 # req/sec
-    stackoverflow: 30 # req/sec
-    hackernews: 100 # req/min
-    paste: 0.5 # req/sec
-    npm: 10 # req/sec
-    pypi: 5 # req/sec
-    virustotal: 4 # req/min (free tier)
-    intelx: 10 # req/day (free tier)
-    grayhat: 5 # req/sec
-    wayback: 15 # req/min
-    trello: 10 # req/sec
-    devto: 1 # req/sec
+  respect_robots: true
+  github:
+    token: ""
+  gitlab:
+    token: ""
+  bitbucket:
+    token: ""
+  huggingface:
+    token: ""
+  kaggle:
+    token: ""
+  google:
+    apikey: ""
+    cx: ""
+  bing:
+    apikey: ""
+  brave:
+    apikey: ""
+  yandex:
+    apikey: ""
+    user: ""
+```
-telegram:
-  token: "encrypted:..."
-  chat_id: "123456789"
-  auto_notify: true
-
-web:
-  port: 8080
-  auth:
-    enabled: false
-    username: admin
-    password: "encrypted:..."
-
-db:
-  path: ~/.keyhunter/keyhunter.db
-  encrypted: true
+### Stealth & Ethics Flags
+```bash
+--stealth # User-agent rotation, increased request spacing
+--respect-robots # Respect robots.txt (default: on)
 ```
 ---
 ## Supported Providers (108)
-### Tier 1 — Frontier
+### Tier 1 -- Frontier
 | Provider | Key Pattern | Confidence | Verify |
 |----------|-------------|------------|--------|
@@ -527,7 +428,7 @@ db:
 | Inflection AI | Generic UUID | Low | `GET /api/models` |
 | AI21 Labs | Generic key | Low | `GET /v1/models` |
-### Tier 2 — Inference Platforms
+### Tier 2 -- Inference Platforms
 | Provider | Key Pattern | Confidence | Verify |
 |----------|-------------|------------|--------|
@@ -546,7 +447,7 @@ db:
 | OctoAI | Generic key | Low | `GET /v1/models` |
 | Friendli AI | Generic key | Low | `GET /v1/models` |
-### Tier 3 — Specialized/Vertical
+### Tier 3 -- Specialized/Vertical
 | Provider | Key Pattern | Confidence | Verify |
 |----------|-------------|------------|--------|
@@ -563,7 +464,7 @@ db:
 | Midjourney | Generic key | Low | N/A |
 | HuggingFace | `hf_*` | High | `GET /api/whoami` |
-### Tier 4 — Chinese/Regional
+### Tier 4 -- Chinese/Regional
 | Provider | Key Pattern | Confidence | Verify |
 |----------|-------------|------------|--------|
@@ -584,7 +485,7 @@ db:
 | Tencent Hunyuan | SecretId + SecretKey | Medium | `DescribeModels` |
 | SiliconFlow | `sf_*` | High | `GET /v1/models` |
-### Tier 5 — Infrastructure/Gateway
+### Tier 5 -- Infrastructure/Gateway
 | Provider | Key Pattern | Confidence | Verify |
 |----------|-------------|------------|--------|
@@ -600,7 +501,7 @@ db:
 | Aether | Generic key | Low | `GET /v1/models` |
 | Not Diamond | Generic key | Low | `GET /v1/models` |
-### Tier 6 — Emerging/Niche
+### Tier 6 -- Emerging/Niche
 | Provider | Key Pattern | Confidence | Verify |
 |----------|-------------|------------|--------|
@@ -620,7 +521,7 @@ db:
 | Neon AI | Generic key | Low | N/A |
 | Lamini | Generic key | Low | `GET /v1/models` |
-### Tier 7 — Code & Dev Tools
+### Tier 7 -- Code & Dev Tools
 | Provider | Key Pattern | Confidence | Verify |
 |----------|-------------|------------|--------|
@@ -635,7 +536,7 @@ db:
 | IBM watsonx.ai | `ibm_*` | Medium | IAM token endpoint |
 | Oracle AI | Generic key | Low | N/A |
-### Tier 8 — Self-Hosted/Open Infra
+### Tier 8 -- Self-Hosted/Open Infra
 | Provider | Key Pattern | Confidence | Verify |
 |----------|-------------|------------|--------|
@@ -650,7 +551,7 @@ db:
 | Triton Inference Server | N/A | N/A | `GET /v2/health/ready` |
 | Jan AI | N/A (local) | N/A | `GET /v1/models` |
-### Tier 9 — Enterprise/Legacy
+### Tier 9 -- Enterprise/Legacy
 | Provider | Key Pattern | Confidence | Verify |
 |----------|-------------|------------|--------|
@@ -677,26 +578,15 @@ db:
     +--------v--+     +------v-----+  +-----v------+
     |   Input    |    |   Recon    |  |   Import   |
     |  Adapters  |    |   Engine   |  |  Adapters  |
-    | - file     |    | (80+ src)  |  | - trufflehog|
-    | - git      |    | - IoT (6)  |  | - gitleaks |
-    | - stdin    |    | - Code(16) |  | - generic  |
-    | - url      |    | - Search(5)|  +-----+------+
-    | - clipboard|    | - Paste(8+)|        |
-    +--------+---+    | - Pkg (8)  |        |
-             |        | - Cloud(7) |        |
-             |        | - CI/CD(5) |        |
-             |        | - Archive2 |        |
-             |        | - Forum(7) |        |
-             |        | - Collab(4)|        |
-             |        | - JS/FE(5) |        |
-             |        | - Logs (3) |        |
-             |        | - Intel(3) |        |
-             |        | - Mobile(1)|        |
-             |        | - DNS (2)  |        |
-             |        | - API (3)  |        |
-             |        +------+-----+        |
-             |               |              |
-             +-------+-------+--------------+
+    | - file     |    | (18 live)  |  | - trufflehog|
+    | - dir      |    | - Code(10) |  | - gitleaks |
+    | - git      |    | - Search(5)|  +-----+------+
+    | - stdin    |    | - Paste(3) |        |
+    | - url      |    +------+-----+        |
+    | - clipboard|           |              |
+    +--------+---+           |              |
+             |               |              |
+             +-------+------+--------------+
                      |
              +-------v--------+
              | Scanner Engine |
@@ -707,133 +597,30 @@ db:
         +------------+-------------+
         |            |             |
   +-----v----+  +----v-----+  +----v-------+
-  | Output   |  | Notify   |  | Web        |
-  | - table  |  | - telegram| | Dashboard  |
-  | - json   |  | - webhook|  | - htmx     |
-  | - sarif  |  | - slack  |  | - REST API |
-  | - csv    |  +----------+  | - SQLite   |
-  +----------+                +------------+
+  | Output   |  | Dork     |  | Key        |
+  | - table  |  | Engine   |  | Management |
+  | - json   |  | - 150    |  | - list     |
+  | - sarif  |  |   dorks  |  | - show     |
+  | - csv    |  | - 8 src  |  | - export   |
+  +----------+  +----------+  +------------+
   +------------------------------------------+
   | Provider Registry (108+ YAML providers)  |
-  | Dork Registry (50+ YAML dorks)           |
+  | Dork Registry (150 YAML dorks)           |
   +------------------------------------------+
 ```
 ### Key Design Decisions
-- **YAML Providers** — Adding a new provider = adding a YAML file. No recompile needed for pattern-only changes (when using external provider dir). Built-in providers are embedded at compile time.
-- **Keyword Pre-filtering** — Before running regex, files are scanned for keywords. This provides ~10x speedup on large codebases.
-- **Worker Pool** — Parallel scanning with configurable worker count. Default: CPU count.
-- **Delta-based Git Scanning** — Only scans changes between commits, not entire trees.
-- **SQLite Storage** — All scan results persisted with AES-256 encryption.
+- **YAML Providers** -- Adding a new provider means adding a YAML file. No recompile is needed for pattern-only changes when using an external provider directory; built-in providers are embedded at compile time.
+- **Keyword Pre-filtering** -- Before running regex, files are scanned for provider keywords via Aho-Corasick, and only keyword hits go on to regex matching. This provides a ~10x speedup on large codebases.
+- **Worker Pool** -- Parallel scanning on an `ants` goroutine pool with a configurable worker count. Default: CPU count.
+- **Delta-based Git Scanning** -- Only scans changes between commits, not entire trees.
+- **SQLite Storage** -- All scan results persisted with AES-256 encryption.
--- -## Security & Ethics - -### Built-in Protections -- Key values **masked by default** in terminal (first 8 + last 4 chars) — use `--unmask` for full keys -- **Full keys always available** via: `--unmask`, `--output=json`, `keyhunter keys show`, web dashboard, Telegram bot -- Database is **AES-256 encrypted** (full keys stored encrypted) -- API tokens stored **encrypted** in config -- No key values written to logs during `--verify` -- Web dashboard supports **basic auth / token auth** - -### Rate Limiting -| Source | Rate Limit | -|--------|-----------| -| GitHub API (auth) | 30 req/min | -| GitHub API (unauth) | 10 req/min | -| Shodan | Per API plan | -| Censys | 250 queries/day (free) | -| ZoomEye | 10,000 results/month (free) | -| FOFA | 100 results/query (free) | -| Netlas | 50 queries/day (free) | -| Google Custom Search | 100/day free, 10K/day paid | -| Bing Search | 1,000/month (free) | -| Stack Overflow | 300/day (no key), 10K/day (key) | -| HN Algolia | 10,000 req/hour | -| VirusTotal | 4 req/min (free) | -| IntelX | 10 searches/day (free) | -| GrayHatWarfare | Per plan | -| Wayback Machine | ~15 req/min | -| Paste sites | 1 req/2sec | -| npm/PyPI | Generous, be respectful | -| Trello | 100 req/10sec | -| Docker Hub | 100 pulls/6hr (unauth) | - -### Stealth & Ethics Flags -```bash ---stealth # User-agent rotation, increased request spacing ---respect-robots # Respect robots.txt (default: on) -``` - ---- - -## Use Cases - -### Red Team / Pentest -```bash -# Full multi-source recon against a target org -keyhunter recon github --query="targetcorp OPENAI_API_KEY" -keyhunter recon gitlab --query="targetcorp api_key" -keyhunter recon shodan --dork='http.html:"targetcorp" "sk-"' -keyhunter recon censys --query='services.http.response.body:"targetcorp" AND "api_key"' -keyhunter recon zoomeye --query='site:targetcorp.com +"api_key"' -keyhunter recon elasticsearch --shodan # Find exposed ES with leaked keys -keyhunter recon jenkins --shodan # Exposed Jenkins with 
build logs -keyhunter recon dotenv --domain-list=targetcorp-subdomains.txt # .env exposure -keyhunter recon wayback --domain=targetcorp.com # Historical leaks -keyhunter recon sourcemaps --domain=app.targetcorp.com # JS source maps -keyhunter recon crtsh --domain=targetcorp.com # Discover API subdomains -keyhunter recon full --providers=openai,anthropic # Everything at once -``` - -### DevSecOps / CI Pipeline -```bash -# Pre-commit hook -keyhunter hook install - -# GitHub Actions step -- name: KeyHunter Scan - run: | - keyhunter scan path . --output=sarif > keyhunter.sarif - # Upload to GitHub Security tab -``` - -### Bug Bounty -```bash -# Comprehensive target recon -keyhunter recon github --org=targetcorp --dork=auto --verify -keyhunter recon gist --query="targetcorp" -keyhunter recon paste --sources=all --query="targetcorp" -keyhunter recon postman --query="targetcorp" -keyhunter recon trello --query="targetcorp api key" -keyhunter recon notion --query="targetcorp API_KEY" -keyhunter recon confluence --shodan -keyhunter recon npm --query="targetcorp" # Check their published packages -keyhunter recon pypi --query="targetcorp" -keyhunter recon docker --query="targetcorp" --layers # Docker image layer scan -keyhunter recon apk --query="targetcorp" # Mobile app decompile -keyhunter recon swagger --domain=api.targetcorp.com -``` - -### Monitoring / Alerting -```bash -# Continuous monitoring with Telegram alerts -keyhunter schedule add \ - --name="monitor-github" \ - --cron="*/30 * * * *" \ - --command="recon github --dork=auto --providers=openai" \ - --notify=telegram - -keyhunter serve --telegram -``` - ---- - -## Dork Examples (150+ Built-in) +## Dork Examples (150 Built-in) ### GitHub ``` @@ -846,21 +633,11 @@ filename:docker-compose "API_KEY" "api_key" extension:ipynb filename:.toml "api_key" "sk-" filename:terraform.tfvars "api_key" -"kind: Secret" "data:" filename:*.yaml # K8s secrets -filename:.npmrc "_authToken" # npm tokens -filename:requirements.txt 
"openai" path:.env # Python projects -``` - -### GitLab -``` -"OPENAI_API_KEY" filename:.env -"sk-ant-" filename:*.py -"api_key" filename:settings.json ``` ### Google Dorking ``` -"sk-proj-" -github.com -stackoverflow.com # Outside known code sites +"sk-proj-" -github.com -stackoverflow.com "sk-ant-api03-" filetype:env "OPENAI_API_KEY" filetype:yml "ANTHROPIC_API_KEY" filetype:json @@ -868,67 +645,78 @@ inurl:.env "API_KEY" intitle:"index of" .env site:pastebin.com "sk-proj-" site:replit.com "OPENAI_API_KEY" -site:codesandbox.io "sk-ant-" -site:notion.so "API_KEY" -site:trello.com "openai" -site:docs.google.com "sk-proj-" -site:medium.com "ANTHROPIC_API_KEY" -site:dev.to "sk-proj-" -site:huggingface.co "OPENAI_API_KEY" -site:kaggle.com "api_key" "sk-" -intitle:"Swagger UI" "api_key" -inurl:graphql "authorization" "Bearer sk-" -filetype:tfstate "api_key" # Terraform state -filetype:ipynb "sk-proj-" # Jupyter notebooks ``` -### Shodan +### Shodan (for future IoT recon sources) ``` http.html:"openai" "api_key" port:8080 http.title:"LiteLLM" port:4000 http.html:"ollama" port:11434 http.title:"Kubernetes Dashboard" -"X-Jenkins" "200 OK" -http.title:"Kibana" port:5601 -http.title:"Grafana" -http.title:"Swagger UI" -http.title:"Gitea" port:3000 -http.html:"PrivateBin" -http.title:"MinIO Browser" -http.title:"Sentry" -http.title:"Confluence" -port:6443 "kube-apiserver" -http.html:"langchain" port:8000 ``` -### Censys -``` -services.http.response.body:"openai" and services.http.response.body:"sk-" -services.http.response.body:"langchain" and services.port:8000 -services.http.response.body:"OPENAI_API_KEY" -services.http.response.body:"sk-ant-api03" +--- + +## Use Cases + +### Red Team / Pentest +```bash +# Multi-source recon against a target org +keyhunter recon full --sources=github,gitlab,gist,pastebin + +# Scan a cloned repository +keyhunter scan ./target-repo/ --verify + +# Scan git history for rotated keys +keyhunter scan --git ./target-repo/ ``` -### ZoomEye -``` 
-app:"Elasticsearch" +"api_key" -app:"Jenkins" +openai -app:"Grafana" +anthropic -app:"Gitea" +### DevSecOps / CI Pipeline +```bash +# Pre-commit hook +keyhunter hook install + +# GitHub Actions step +- name: KeyHunter Scan + run: keyhunter scan . --output=sarif > keyhunter.sarif ``` -### FOFA -``` -body="sk-proj-" -body="OPENAI_API_KEY" -body="sk-ant-api03" -title="LiteLLM" -title="Swagger UI" && body="api_key" -title="Kibana" && body="authorization" +### Bug Bounty +```bash +# Search code hosting platforms for leaked keys +keyhunter recon full --sources=github,gitlab,bitbucket,gist,codeberg +keyhunter recon full --sources=huggingface,kaggle,replit,codesandbox + +# Search engine dorking +keyhunter recon full --sources=google,bing,duckduckgo,brave + +# Paste site monitoring +keyhunter recon full --sources=pastebin,pastesites,gistpaste ``` --- +## Security & Ethics + +### Built-in Protections +- Key values **masked by default** in terminal (first 8 + last 4 chars) -- use `--unmask` for full keys +- **Full keys always available** via: `--unmask`, `--output=json`, `keyhunter keys show` +- Database is **AES-256 encrypted** (full keys stored encrypted) +- API tokens stored **encrypted** in config +- No key values written to logs during `--verify` + +### Rate Limiting (Recon Sources) +| Source | Rate Limit | +|--------|-----------| +| GitHub API (auth) | 30 req/min | +| GitHub API (unauth) | 10 req/min | +| Google Custom Search | 100/day free, 10K/day paid | +| Bing Search | 1,000/month (free) | +| Brave Search | Per API plan | +| Paste sites | 1 req/2sec | + +--- + ## Contributing ### Adding a New Provider @@ -980,17 +768,28 @@ metadata: ## Roadmap -- [ ] Core scanning engine (file, git, stdin) -- [ ] 108 provider YAML definitions -- [ ] Active verification for all providers -- [ ] CLI with Cobra (scan, verify, import, recon, serve) -- [ ] TruffleHog & Gitleaks import adapters -- [ ] OSINT/Recon engine (Shodan, Censys, GitHub, GitLab, Paste, S3) -- [ ] Built-in dork 
engine with 50+ dorks -- [ ] Web dashboard (htmx + Tailwind + SQLite) +- [x] Core scanning engine (file, dir, git, stdin, url, clipboard) +- [x] 108 provider YAML definitions (Tier 1-9) +- [x] Active verification (YAML-driven HTTPVerifier) +- [x] Output formats: table, JSON, CSV, SARIF 2.1.0 +- [x] CLI with Cobra (scan, providers, config, keys, import, hook, dorks, recon, legal) +- [x] TruffleHog & Gitleaks import adapters +- [x] Key management (list, show, export, copy, delete, verify) +- [x] Git pre-commit hook (install/uninstall) +- [x] Dork engine with 150 built-in dorks across 8 sources +- [x] OSINT recon framework with 18 live sources +- [ ] IoT scanners (Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge) +- [ ] Cloud storage scanning (S3, GCS, Azure, DigitalOcean) +- [ ] Package registries (npm, PyPI, RubyGems, crates.io, Maven, NuGet) +- [ ] Container & IaC scanning (Docker Hub, Terraform, Helm, Ansible) +- [ ] CI/CD log scanning (GitHub Actions, Travis, CircleCI, Jenkins, GitLab CI) +- [ ] Web archives (Wayback Machine, CommonCrawl) +- [ ] Frontend leak detection (source maps, webpack, .env exposure) +- [ ] Forums & collaboration tools (Stack Overflow, Reddit, Notion, Trello) +- [ ] Threat intel (VirusTotal, Intelligence X, URLhaus) - [ ] Telegram bot with auto-notifications - [ ] Scheduled scanning (cron-based) -- [ ] Pre-commit hook & CI/CD integration (SARIF) +- [ ] Web dashboard (htmx + Tailwind + SQLite) - [ ] Docker image - [ ] Homebrew formula