salvacybersec c8f7592b73 feat(17-02): add gocron dependency, subscribers and scheduled_jobs tables with CRUD

- Add gocron/v2 v2.19.1 as direct dependency
- Append subscribers and scheduled_jobs CREATE TABLE to schema.sql
- Implement full subscriber CRUD (Add/Remove/List/IsSubscribed)
- Implement full scheduled job CRUD (Save/List/Get/Delete/UpdateLastRun/SetEnabled)

2026-04-06 17:25:43 +03:00

.claude/worktrees

chore: add .claude/ to gitignore

2026-04-06 16:37:54 +03:00

.planning

docs(phase-16): complete threat intel, mobile, DNS, API marketplaces

2026-04-06 16:48:35 +03:00

cmd

feat(phase-16): wire all 9 Phase 16 sources + VT/IX/ST API keys

2026-04-06 16:48:35 +03:00

docs

merge: phase 14-03 frontend leaks

2026-04-06 13:21:39 +03:00

dorks

feat(08-04): add 10 FOFA + 10 GitLab + 5 Bing dorks

2026-04-06 00:21:41 +03:00

pkg

feat(17-02): add gocron dependency, subscribers and scheduled_jobs tables with CRUD

2026-04-06 17:25:43 +03:00

providers

feat(05-04): extend Tier 1 provider verify specs

2026-04-05 15:46:30 +03:00

testdata

test(07-03): SARIF GitHub code scanning validation

2026-04-05 23:55:38 +03:00

.gitignore

chore: add .claude/ to gitignore

2026-04-06 16:37:54 +03:00

CLAUDE.md

docs: create roadmap (18 phases)

2026-04-04 19:12:41 +03:00

go.mod

feat(17-02): add gocron dependency, subscribers and scheduled_jobs tables with CRUD

2026-04-06 17:25:43 +03:00

go.sum

feat(17-02): add gocron dependency, subscribers and scheduled_jobs tables with CRUD

2026-04-06 17:25:43 +03:00

LEGAL.md

feat(05-02): add LEGAL.md, embed it, and wire keyhunter legal command

2026-04-05 15:46:11 +03:00

main.go

feat(01-01): create main.go, test scaffolding, and testdata fixtures

2026-04-05 00:04:42 +03:00

README.md

docs: update README to reflect current implementation state (phases 1-11)

2026-04-06 12:20:42 +03:00

RESEARCH_REPORT.md

merge: phase 14-03 frontend leaks

2026-04-06 13:21:39 +03:00

tools.go

chore(01-01): initialize Go module with Phase 1 dependencies

2026-04-05 00:04:06 +03:00

README.md

KeyHunter

The most comprehensive API key scanner for LLM/AI providers. Detect, validate, and monitor leaked API keys across 108+ providers.

Why KeyHunter?

Existing tools such as TruffleHog (~3 LLM detectors) and Gitleaks (~5 LLM rules) were built for general secret scanning. AI-related credential leaks grew 81% year-over-year in 2025, yet no other tool covers more than ~15 LLM providers.

KeyHunter fills that gap with 108+ provider-specific detectors, active key validation, OSINT/recon capabilities, and a growing set of internet sources for leak discovery.

How It Compares

Feature	KeyHunter	TruffleHog	Gitleaks	detect-secrets
LLM Providers	108+	~3	~5	~1
Active Verification	108+ endpoints	~20 types	No	No
OSINT/Recon Sources	18 live (80+ planned)	No	No	No
External Tool Import	TruffleHog + Gitleaks	-	-	-
Dork Engine	150 built-in YAML dorks	No	No	No
Pre-commit Hook	Built-in	Yes	Yes	Yes
SARIF Output	Yes	Yes	Yes	No
Provider YAML Plugin	Community-extensible	Go code only	TOML rules	Python plugins
Web Dashboard	Coming soon	No	No	No
Telegram Bot	Coming soon	No	No	No
Scheduled Scanning	Coming soon	No	No	No

Features

Implemented

Core Scanning Engine

3-stage pipeline -- AC pre-filter, regex match, entropy scoring
ants worker pool for parallel scanning with configurable worker count
108 provider YAML definitions (Tier 1-9), dual-located with go:embed

Input Sources

File scanning -- single file analysis
Directory scanning -- recursive traversal with glob exclusions and mmap
Git history scanning -- full commit history analysis
stdin/pipe support -- echo "sk-proj-..." | keyhunter scan stdin
URL fetching -- scan any remote URL content
Clipboard scanning -- instant clipboard content analysis

Active Verification

YAML-driven HTTPVerifier -- lightweight API calls to verify if detected keys are active
Permission and scope extraction (org, rate limits, model access)
Consent prompt and LEGAL.md for legal safety
Configurable via --verify flag (off by default)

Output Formats

Table -- colored terminal output with key masking (default)
JSON -- full key values for programmatic consumption
CSV -- spreadsheet-compatible export
SARIF 2.1.0 -- CI/CD integration (GitHub Code Scanning, etc.)
Exit codes: 0 (clean), 1 (findings), 2 (error)

Key Management

keyhunter keys list -- list all discovered keys (masked by default)
keyhunter keys show <id> -- full key details
keyhunter keys export -- export in JSON/CSV format
keyhunter keys copy <id> -- copy key to clipboard
keyhunter keys delete <id> -- remove a key from the database
keyhunter keys verify <id> -- verify a specific key

External Tool Import

TruffleHog v3 JSON import with LLM-specific enrichment
Gitleaks JSON and CSV import
Deduplication across imports via (provider, masked_key, source) hashing
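
One way the (provider, masked_key, source) deduplication could work is sketched below. The `finding` type, the choice of SHA-256, and the NUL separator are assumptions made for illustration; the README does not specify the hash function or encoding.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// finding is a hypothetical, simplified imported result.
type finding struct {
	Provider  string
	MaskedKey string
	Source    string
}

// dedupKey hashes the (provider, masked_key, source) tuple. The NUL separator
// keeps tuples like ("ab", "c") and ("a", "bc") from producing the same input.
func dedupKey(f finding) string {
	sum := sha256.Sum256([]byte(f.Provider + "\x00" + f.MaskedKey + "\x00" + f.Source))
	return hex.EncodeToString(sum[:])
}

// dedupe keeps the first occurrence of each tuple and preserves input order.
func dedupe(in []finding) []finding {
	seen := make(map[string]struct{}, len(in))
	var out []finding
	for _, f := range in {
		k := dedupKey(f)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

func main() {
	imported := []finding{
		{"openai", "sk-proj-...x234", "src/config.py:42"}, // e.g. from TruffleHog
		{"openai", "sk-proj-...x234", "src/config.py:42"}, // same finding from Gitleaks
		{"openai", "sk-proj-...x234", "deploy/.env:3"},    // same key, different source
	}
	for _, f := range dedupe(imported) {
		fmt.Println(f.Provider, f.MaskedKey, f.Source)
	}
}
```

Because the source is part of the tuple, the same key found in two different files is kept as two findings, which is usually what a remediation workflow needs.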

Git Pre-commit Hook

keyhunter hook install -- embedded shell script, blocks leaks before commit
keyhunter hook uninstall -- clean removal
Backup of existing hooks with --force

Dork Engine

150 built-in YAML dorks across 8 source types (GitHub, GitLab, Google, Shodan, Censys, ZoomEye, FOFA, Bing)
GitHub live executor with authenticated API
CLI management: keyhunter dorks list, keyhunter dorks list --source=github, keyhunter dorks add, keyhunter dorks run, keyhunter dorks export

OSINT / Recon Engine (18 Sources Live)

The recon framework provides a ReconSource interface with per-source rate limiting, stealth mode, robots.txt compliance, parallel sweep, and result deduplication.

Code Hosting & Snippets (live)

GitHub -- code search with automated dorks
GitLab -- code search
Bitbucket -- code search
GitHub Gist -- public gist search
Codeberg -- alternative Git platform search
HuggingFace -- Spaces, repos, model configs (high-yield for LLM keys)
Replit -- public repl search
CodeSandbox -- sandbox search
StackBlitz Sandboxes -- sandbox search
Kaggle -- notebooks and datasets with API keys

Search Engine Dorking (live)

Google -- Custom Search API / SerpAPI
Bing -- Azure Cognitive Services search
DuckDuckGo -- HTML scraping fallback
Yandex -- XML API search
Brave -- Brave Search API

Paste Sites (live)

Pastebin -- scraping API
GistPaste -- paste search
PasteSites -- multi-paste aggregator

recon full -- parallel sweep across all 18 live sources with deduplication and unified reporting.

CLI Commands

Command	Status
`keyhunter scan`	Implemented
`keyhunter providers list/info/stats`	Implemented
`keyhunter config init/set/get`	Implemented
`keyhunter keys list/show/export/copy/delete/verify`	Implemented
`keyhunter import`	Implemented
`keyhunter hook install/uninstall`	Implemented
`keyhunter dorks list/add/run/export`	Implemented
`keyhunter recon full/list`	Implemented
`keyhunter legal`	Implemented
`keyhunter verify`	Stub
`keyhunter serve`	Stub
`keyhunter schedule`	Stub

Coming Soon

The following features are on the roadmap but not yet implemented:

Phase 12 -- IoT Scanners & Cloud Storage

Shodan -- exposed LLM proxies, dashboards, API endpoints
Censys -- HTTP body search for leaked credentials
ZoomEye -- IoT scanner
FOFA -- Asian infrastructure scanning
Netlas -- HTTP response body search
BinaryEdge -- internet-wide scan data
AWS S3 / GCS / Azure Blob / DigitalOcean Spaces -- bucket enumeration and scanning

Phase 13 -- Package Registries, Containers & IaC

npm / PyPI / RubyGems / crates.io / Maven / NuGet -- package source scanning
Docker Hub -- image layer scanning
Terraform / Helm Charts / Ansible -- IaC scanning

Phase 14 -- CI/CD Logs, Web Archives & Frontend Leaks

GitHub Actions / Travis CI / CircleCI / Jenkins / GitLab CI -- public build log scanning
Wayback Machine / CommonCrawl -- historical web archive scanning
JS Source Maps / Webpack bundles / exposed .env -- frontend leak detection

Phase 15 -- Forums & Collaboration

Stack Overflow / Reddit / Hacker News / dev.to / Medium -- forum scanning
Notion / Confluence / Trello -- collaboration tool scanning
Elasticsearch / Grafana / Sentry -- exposed log aggregators
Telegram groups / Discord -- public channel scanning

Phase 16 -- Threat Intel, Mobile, DNS & API Marketplaces

VirusTotal / Intelligence X / URLhaus -- threat intelligence
APK analysis -- mobile app decompilation
crt.sh / subdomain probing -- DNS/subdomain discovery
Postman / SwaggerHub -- API marketplace scanning

Phase 17 -- Telegram Bot & Scheduler

Telegram Bot -- scan triggers, key alerts, recon results
Scheduled scanning -- cron-based recurring scans with auto-notify

Phase 18 -- Web Dashboard

Web Dashboard -- htmx + Tailwind, SQLite-backed, real-time scan viewer

Quick Start

Install

# From source
go install github.com/salvacybersec/keyhunter@latest

# Binary release (when available)
curl -sSL https://github.com/salvacybersec/keyhunter/releases/latest/download/keyhunter_linux_amd64.tar.gz | tar -xz
sudo mv keyhunter /usr/local/bin/

Basic Usage

# Scan a directory
keyhunter scan ./my-project/

# Scan with active verification
keyhunter scan ./my-project/ --verify

# Scan git history
keyhunter scan --git .

# Scan from pipe
cat secrets.txt | keyhunter scan stdin

# Scan only specific providers
keyhunter scan . --providers=openai,anthropic,deepseek

# JSON output
keyhunter scan . --output=json > results.json

# SARIF output for CI/CD
keyhunter scan . --output=sarif > keyhunter.sarif

# CSV output
keyhunter scan . --output=csv > results.csv

OSINT / Recon

# Full sweep across all 18 live sources
keyhunter recon full

# Sweep specific sources only
keyhunter recon full --sources=github,gitlab,gist

# List available recon sources
keyhunter recon list

# Code hosting sources
keyhunter recon full --sources=github
keyhunter recon full --sources=gitlab
keyhunter recon full --sources=bitbucket
keyhunter recon full --sources=gist
keyhunter recon full --sources=codeberg
keyhunter recon full --sources=huggingface
keyhunter recon full --sources=replit
keyhunter recon full --sources=codesandbox
keyhunter recon full --sources=sandboxes
keyhunter recon full --sources=kaggle

# Search engine dorking
keyhunter recon full --sources=google
keyhunter recon full --sources=bing
keyhunter recon full --sources=duckduckgo
keyhunter recon full --sources=yandex
keyhunter recon full --sources=brave

# Paste sites
keyhunter recon full --sources=pastebin
keyhunter recon full --sources=gistpaste
keyhunter recon full --sources=pastesites

Dork Management

keyhunter dorks list                          # All dorks across all sources
keyhunter dorks list --source=github          # GitHub dorks only
keyhunter dorks list --source=google          # Google dorks only
keyhunter dorks add github 'filename:.env "GROQ_API_KEY"'
keyhunter dorks run google --category=frontier
keyhunter dorks export

Key Management

Keys are masked by default in terminal output (shoulder surfing protection). Ways to access full key values:

# Show full keys in scan output
keyhunter scan . --unmask

# JSON export always includes full keys
keyhunter scan . --output=json > results.json

# Key management commands
keyhunter keys list                   # Masked list
keyhunter keys list --unmask          # Full key list
keyhunter keys show <id>              # Single key full details (always unmasked)
keyhunter keys copy <id>              # Copy key to clipboard
keyhunter keys export --format=json   # Export all keys with full values
keyhunter keys verify <id>            # Verify key + show full details
keyhunter keys delete <id>            # Remove key from database

Example keyhunter keys show output:

 ID:          a3f7b2c1
 Provider:    OpenAI
 Pattern:     OpenAI Project Key
 Key:         sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234
 Confidence:  HIGH
 Source:      src/config.py:42
 Found:       2026-04-04 14:32:01
 Scan ID:     scan_001
 Status:      ACTIVE (verified 2026-04-04 14:32:05)
 Org:         my-org
 Rate Limit:  500 req/min
 Revoke URL:  https://platform.openai.com/api-keys

Import External Tools

# Run TruffleHog, then enrich with KeyHunter
trufflehog git . --json > trufflehog.json
keyhunter import --format=trufflehog trufflehog.json

# Run Gitleaks, then enrich
gitleaks detect -f json -r gitleaks.json
keyhunter import --format=gitleaks gitleaks.json

# Gitleaks CSV
gitleaks detect -f csv -r gitleaks.csv
keyhunter import --format=gitleaks-csv gitleaks.csv

CI/CD Integration

KeyHunter ships with a git pre-commit hook that blocks leaks before they land in history, a GitHub Actions integration that uploads SARIF findings directly into the repository's Code Scanning tab, and an import command that consolidates TruffleHog and Gitleaks output into one normalized database.

# Install pre-commit hook (scans staged files only)
keyhunter hook install

# GitHub Actions (SARIF output for Code Scanning upload)
keyhunter scan . --output sarif > keyhunter.sarif

# Import findings from other scanners
keyhunter import --format=trufflehog trufflehog.json
keyhunter import --format=gitleaks   gitleaks.json

# Exit codes: 0 = clean, 1 = keys found, 2 = error
keyhunter scan .
case $? in
  0) echo "Clean" ;;
  1) echo "Keys found!" ;;
  *) echo "Scan error" ;;
esac

See docs/CI-CD.md for the full guide, including a copy-paste GitHub Actions workflow and the pre-commit hook install/uninstall lifecycle.

Configuration

# Initialize config
keyhunter config init
# Creates ~/.keyhunter.yaml

# Set API tokens for recon sources (currently supported)
keyhunter config set recon.github.token "YOUR_GITHUB_TOKEN"
keyhunter config set recon.gitlab.token "YOUR_GITLAB_TOKEN"
keyhunter config set recon.bitbucket.token "YOUR_BITBUCKET_TOKEN"
keyhunter config set recon.huggingface.token "YOUR_HF_TOKEN"
keyhunter config set recon.kaggle.token "YOUR_KAGGLE_TOKEN"
keyhunter config set recon.google.apikey "YOUR_GOOGLE_API_KEY"
keyhunter config set recon.google.cx "YOUR_GOOGLE_CX_ID"
keyhunter config set recon.bing.apikey "YOUR_BING_API_KEY"
keyhunter config set recon.brave.apikey "YOUR_BRAVE_API_KEY"
keyhunter config set recon.yandex.apikey "YOUR_YANDEX_API_KEY"
keyhunter config set recon.yandex.user "YOUR_YANDEX_USER"

# View current config
keyhunter config get recon.github.token

Config File (`~/.keyhunter.yaml`)

scan:
  workers: 8
  verify_timeout: 10s
  default_output: table

recon:
  stealth: false
  respect_robots: true
  github:
    token: ""
  gitlab:
    token: ""
  bitbucket:
    token: ""
  huggingface:
    token: ""
  kaggle:
    token: ""
  google:
    apikey: ""
    cx: ""
  bing:
    apikey: ""
  brave:
    apikey: ""
  yandex:
    apikey: ""
    user: ""

Stealth & Ethics Flags

--stealth           # User-agent rotation, increased request spacing
--respect-robots    # Respect robots.txt (default: on)

Supported Providers (108)

Tier 1 -- Frontier

Provider	Key Pattern	Confidence	Verify
OpenAI	`sk-proj-`, `sk-svcacct-`	High	`GET /v1/models`
Anthropic	`sk-ant-api03-*`	High	`GET /v1/models`
Google AI (Gemini)	`AIza*`	High	`GET /v1/models`
Google Vertex AI	OAuth token	Medium	`GET /v1/models`
AWS Bedrock	`AKIA*`	High	`GetFoundationModel`
Azure OpenAI	32-char hex	Medium	`GET /openai/deployments`
Meta AI	`meta-llama-*`	Medium	`GET /v1/models`
xAI (Grok)	`xai-*`	High	`GET /v1/models`
Cohere	`co-*`	High	`GET /v1/models`
Mistral AI	32-char generic	Low	`GET /v1/models`
Inflection AI	Generic UUID	Low	`GET /api/models`
AI21 Labs	Generic key	Low	`GET /v1/models`

Tier 2 -- Inference Platforms

Provider	Key Pattern	Confidence	Verify
Together AI	Generic key	Low	`GET /v1/models`
Fireworks AI	`fw_*`	High	`GET /v1/models`
Groq	`gsk_*`	High	`GET /openai/v1/models`
Replicate	`r8_*`	High	`GET /v1/predictions`
Anyscale	Generic key	Low	`GET /v1/models`
DeepInfra	Generic key	Low	`GET /v1/models`
Lepton AI	`lpt_*`	High	`GET /v1/models`
Modal	Generic token	Low	`GET /api/apps`
Baseten	Generic key	Low	`GET /v1/models`
Cerebrium	Generic key	Low	`GET /v1/models`
NovitaAI	Generic key	Low	`GET /v1/models`
Sambanova	Generic key	Low	`GET /v1/models`
OctoAI	Generic key	Low	`GET /v1/models`
Friendli AI	Generic key	Low	`GET /v1/models`

Tier 3 -- Specialized/Vertical

Provider	Key Pattern	Confidence	Verify
Perplexity	`pplx-*`	High	`GET /chat/completions`
You.com	Generic key	Low	`GET /v1/search`
Voyage AI	`voy-*`	High	`GET /v1/models`
Jina AI	`jina_*`	High	`GET /v1/models`
Unstructured	Generic key	Low	`GET /general/v0/general`
AssemblyAI	Generic key	Low	`GET /v2/transcript`
Deepgram	Generic key	Low	`GET /v1/projects`
ElevenLabs	`el_*`	High	`GET /v1/user`
Stability AI	`sk-*`	Medium	`GET /v1/engines/list`
Runway ML	Generic key	Low	`GET /v1/models`
Midjourney	Generic key	Low	N/A
HuggingFace	`hf_*`	High	`GET /api/whoami`

Tier 4 -- Chinese/Regional

Provider	Key Pattern	Confidence	Verify
DeepSeek	`sk-*`	Medium	`GET /v1/models`
Baichuan	Generic key	Low	`GET /v1/models`
Zhipu AI (GLM)	Generic key	Low	`POST /api/paas/v4/chat`
Moonshot AI (Kimi)	`sk-*`	Medium	`GET /v1/models`
Yi (01.AI)	Generic key	Low	`GET /v1/models`
Qwen (Alibaba)	`sk-*`	Medium	`GET /v1/models`
Baidu (ERNIE)	API Key + Secret	Medium	Token endpoint
ByteDance (Doubao)	Generic key	Low	`GET /v1/models`
SenseTime	Generic key	Low	`GET /v1/models`
iFlytek (Spark)	API Key + Secret	Medium	WebSocket handshake
MiniMax	Generic key	Low	`GET /v1/models`
Stepfun	Generic key	Low	`GET /v1/models`
360 AI	Generic key	Low	`GET /v1/models`
Kuaishou (Kling)	Generic key	Low	`GET /v1/models`
Tencent Hunyuan	SecretId + SecretKey	Medium	`DescribeModels`
SiliconFlow	`sf_*`	High	`GET /v1/models`

Tier 5 -- Infrastructure/Gateway

Provider	Key Pattern	Confidence	Verify
Cloudflare AI	Cloudflare API token	Medium	`GET /ai/models`
Vercel AI	`vercel_*`	High	`GET /v1/models`
LiteLLM	Generic key	Low	`GET /v1/models`
Portkey	Generic key	Low	`GET /v1/models`
Helicone	`sk-helicone-*`	High	`GET /v1/models`
OpenRouter	`sk-or-*`	High	`GET /api/v1/models`
Martian	Generic key	Low	`GET /v1/models`
AI Gateway (Kong)	Generic key	Low	Health endpoint
BricksAI	Generic key	Low	`GET /v1/models`
Aether	Generic key	Low	`GET /v1/models`
Not Diamond	Generic key	Low	`GET /v1/models`

Tier 6 -- Emerging/Niche

Provider	Key Pattern	Confidence	Verify
Reka AI	Generic key	Low	`GET /v1/models`
Aleph Alpha	Generic key	Low	`GET /models`
Writer	Generic key	Low	`GET /v1/models`
Jasper AI	Generic key	Low	N/A
Typeface	Generic key	Low	N/A
Comet ML	Generic key	Low	`GET /api/rest/v2`
Weights & Biases	Generic key	Low	`GET /api/v1/viewer`
LangSmith	`ls__*`	High	`GET /api/v1/info`
Pinecone	Generic key	Low	`GET /databases`
Weaviate	Generic key	Low	`GET /v1/meta`
Qdrant	Generic key	Low	`GET /collections`
Chroma	Generic key	Low	`GET /api/v1/heartbeat`
Milvus	Generic key	Low	`GET /v1/vector/collections`
Neon AI	Generic key	Low	N/A
Lamini	Generic key	Low	`GET /v1/models`

Tier 7 -- Code & Dev Tools

Provider	Key Pattern	Confidence	Verify
GitHub Copilot	`ghu_`, `ghp_`	High	`GET /user`
Cursor	Generic key	Low	N/A
Tabnine	Generic key	Low	N/A
Codeium/Windsurf	Generic key	Low	N/A
Sourcegraph Cody	`sgp_*`	High	`GET /.api/current-user`
Amazon CodeWhisperer	`AKIA*`	High	STS GetCallerIdentity
Replit AI	Generic key	Low	N/A
Codestral (Mistral)	Generic key	Low	`GET /v1/models`
IBM watsonx.ai	`ibm_*`	Medium	IAM token endpoint
Oracle AI	Generic key	Low	N/A

Tier 8 -- Self-Hosted/Open Infra

Provider	Key Pattern	Confidence	Verify
Ollama	N/A (local)	N/A	`GET /api/tags`
vLLM	Generic key	Low	`GET /v1/models`
LocalAI	Generic key	Low	`GET /v1/models`
LM Studio	N/A (local)	N/A	`GET /v1/models`
llama.cpp	N/A (local)	N/A	`GET /health`
GPT4All	N/A (local)	N/A	N/A
text-generation-webui	Generic key	Low	`GET /v1/models`
TensorRT-LLM	N/A	N/A	Health endpoint
Triton Inference Server	N/A	N/A	`GET /v2/health/ready`
Jan AI	N/A (local)	N/A	`GET /v1/models`

Tier 9 -- Enterprise/Legacy

Provider	Key Pattern	Confidence	Verify
Salesforce Einstein	Generic token	Low	REST API
ServiceNow AI	Generic token	Low	REST API
SAP AI Core	OAuth token	Low	Token endpoint
Palantir AIP	Generic token	Low	REST API
Databricks (DBRX)	`dapi*`	High	`GET /api/2.0/clusters`
Snowflake Cortex	JWT token	Medium	SQL endpoint
Oracle Generative AI	Generic key	Low	REST API
HPE GreenLake AI	Generic token	Low	REST API

Architecture

                    +------------------+
                    |   CLI (Cobra)    |
                    +--------+---------+
                             |
              +--------------+--------------+
              |              |              |
     +--------v--+   +------v-----+  +-----v------+
     | Input      |   | Recon      |  | Import     |
     | Adapters   |   | Engine     |  | Adapters   |
     | - file     |   | (18 live)  |  | - trufflehog|
     | - dir      |   | - Code(10) |  | - gitleaks |
     | - git      |   | - Search(5)|  +-----+------+
     | - stdin    |   | - Paste(3) |        |
     | - url      |   +------+-----+        |
     | - clipboard|          |              |
     +--------+---+          |              |
              |              |              |
              +-------+------+--------------+
                      |
              +-------v--------+
              | Scanner Engine |
              | - matcher.go   |
              | - verifier.go  |
              +-------+--------+
                      |
         +------------+-------------+
         |            |             |
   +-----v----+ +----v-----+ +----v-------+
   | Output   | | Dork     | | Key        |
   | - table  | | Engine   | | Management |
   | - json   | | - 150    | | - list     |
   | - sarif  | |   dorks  | | - show     |
   | - csv    | | - 8 src  | | - export   |
   +----------+ +----------+ +------------+

   +------------------------------------------+
   | Provider Registry (108+ YAML providers)  |
   | Dork Registry (150 YAML dorks)           |
   +------------------------------------------+

Key Design Decisions

YAML Providers -- Adding a new provider means adding a YAML file. Pattern-only changes need no recompile when an external provider directory is used; built-in providers are embedded at compile time.
Keyword Pre-filtering -- Before running regex, files are scanned for keywords via Aho-Corasick. This provides ~10x speedup on large codebases.
Worker Pool -- Parallel scanning with configurable worker count via ants. Default: CPU count.
Delta-based Git Scanning -- Only scans changes between commits, not entire trees.
SQLite Storage -- All scan results persisted with AES-256 encryption.
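
The worker-pool decision above can be illustrated with a small sketch. It uses plain goroutines and channels from the standard library rather than the ants pool the project uses, and `scanAll` and its `scanOne` callback are hypothetical names; the default of one worker per CPU follows the text.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// scanAll fans paths out to a fixed number of workers and sums their results.
// A non-positive worker count falls back to the CPU count.
func scanAll(paths []string, workers int, scanOne func(string) int) int {
	if workers <= 0 {
		workers = runtime.NumCPU()
	}
	jobs := make(chan string)
	results := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				results <- scanOne(p)
			}
		}()
	}

	// Feed jobs, then close the channel so workers exit their range loops.
	go func() {
		for _, p := range paths {
			jobs <- p
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for n := range results {
		total += n
	}
	return total
}

func main() {
	paths := []string{"main.go", "config.yaml", ".env"}
	// The callback pretends every file contains exactly one finding.
	findings := scanAll(paths, 0, func(path string) int { return 1 })
	fmt.Println("findings:", findings)
}
```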

Dork Examples (150 Built-in)

GitHub

filename:.env "OPENAI_API_KEY"
filename:.env "ANTHROPIC_API_KEY"
filename:config.yaml "api_key" "sk-"
"sk-proj-" language:python
"sk-ant-api03" language:javascript
filename:docker-compose "API_KEY"
"api_key" extension:ipynb
filename:.toml "api_key" "sk-"
filename:terraform.tfvars "api_key"

Google Dorking

"sk-proj-" -github.com -stackoverflow.com
"sk-ant-api03-" filetype:env
"OPENAI_API_KEY" filetype:yml
"ANTHROPIC_API_KEY" filetype:json
inurl:.env "API_KEY"
intitle:"index of" .env
site:pastebin.com "sk-proj-"
site:replit.com "OPENAI_API_KEY"

Shodan (for future IoT recon sources)

http.html:"openai" "api_key" port:8080
http.title:"LiteLLM" port:4000
http.html:"ollama" port:11434
http.title:"Kubernetes Dashboard"

Use Cases

Red Team / Pentest

# Multi-source recon against a target org
keyhunter recon full --sources=github,gitlab,gist,pastebin

# Scan a cloned repository
keyhunter scan ./target-repo/ --verify

# Scan git history for rotated keys
keyhunter scan --git ./target-repo/

DevSecOps / CI Pipeline

# Pre-commit hook
keyhunter hook install

# GitHub Actions step
- name: KeyHunter Scan
  run: keyhunter scan . --output=sarif > keyhunter.sarif

Bug Bounty

# Search code hosting platforms for leaked keys
keyhunter recon full --sources=github,gitlab,bitbucket,gist,codeberg
keyhunter recon full --sources=huggingface,kaggle,replit,codesandbox

# Search engine dorking
keyhunter recon full --sources=google,bing,duckduckgo,brave

# Paste site monitoring
keyhunter recon full --sources=pastebin,pastesites,gistpaste

Security & Ethics

Built-in Protections

Key values are masked by default in terminal output (only the first 8 and last 4 characters are shown)
Full keys remain available via --unmask, --output=json, and keyhunter keys show
Database is AES-256 encrypted (full keys stored encrypted)
API tokens stored encrypted in config
No key values written to logs during --verify
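
The masking rule above (first 8 plus last 4 characters) could be implemented along these lines. The `maskKey` name, the `...` separator, and the handling of very short keys are assumptions; the README states only which characters stay visible.

```go
package main

import (
	"fmt"
	"strings"
)

// maskKey keeps the first 8 and last 4 characters of a key and hides the rest.
// Keys too short to mask safely are replaced entirely with asterisks.
func maskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return strings.Repeat("*", len(key))
	}
	return key[:head] + "..." + key[len(key)-tail:]
}

func main() {
	fmt.Println(maskKey("sk-proj-abc123def456ghi789jkl012mno345pqr678stu901vwx234"))
	fmt.Println(maskKey("short"))
}
```

Masking short keys completely avoids the case where the visible head and tail together would reveal the whole value.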

Rate Limiting (Recon Sources)

Source	Rate Limit
GitHub API (auth)	30 req/min
GitHub API (unauth)	10 req/min
Google Custom Search	100/day free, 10K/day paid
Bing Search	1,000/month (free)
Brave Search	Per API plan
Paste sites	1 req/2sec

Contributing

Adding a New Provider

Create providers/your-provider.yaml:

id: your-provider
name: Your Provider
category: emerging
website: https://api.yourprovider.com
confidence: medium

patterns:
  - id: your-provider-key
    name: "Your Provider API Key"
    regex: '\byp_[A-Za-z0-9]{32}\b'
    confidence: high
    description: "Your Provider API key with yp_ prefix"

keywords:
  - "yp_"
  - "YOUR_PROVIDER_API_KEY"

verify:
  enabled: true
  method: GET
  url: "https://api.yourprovider.com/v1/models"
  headers:
    Authorization: "Bearer {{key}}"
  success_codes: [200]
  failure_codes: [401, 403]

metadata:
  docs: "https://docs.yourprovider.com"
  key_url: "https://dashboard.yourprovider.com/keys"
  env_vars: ["YOUR_PROVIDER_API_KEY"]

Run tests: go test ./pkg/provider/...
Submit a PR

Adding a New Dork

Edit dorks/<source>.yaml and add your dork entry
Submit a PR

Roadmap

Core scanning engine (file, dir, git, stdin, url, clipboard)
108 provider YAML definitions (Tier 1-9)
Active verification (YAML-driven HTTPVerifier)
Output formats: table, JSON, CSV, SARIF 2.1.0
CLI with Cobra (scan, providers, config, keys, import, hook, dorks, recon, legal)
TruffleHog & Gitleaks import adapters
Key management (list, show, export, copy, delete, verify)
Git pre-commit hook (install/uninstall)
Dork engine with 150 built-in dorks across 8 sources
OSINT recon framework with 18 live sources
IoT scanners (Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge)
Cloud storage scanning (S3, GCS, Azure, DigitalOcean)
Package registries (npm, PyPI, RubyGems, crates.io, Maven, NuGet)
Container & IaC scanning (Docker Hub, Terraform, Helm, Ansible)
CI/CD log scanning (GitHub Actions, Travis, CircleCI, Jenkins, GitLab CI)
Web archives (Wayback Machine, CommonCrawl)
Frontend leak detection (source maps, webpack, .env exposure)
Forums & collaboration tools (Stack Overflow, Reddit, Notion, Trello)
Threat intel (VirusTotal, Intelligence X, URLhaus)
Telegram bot with auto-notifications
Scheduled scanning (cron-based)
Web dashboard (htmx + Tailwind + SQLite)
Docker image
Homebrew formula

Disclaimer

KeyHunter is designed for authorized security testing, defensive security, bug bounty programs, and educational purposes only. Always ensure you have proper authorization before scanning any target. Unauthorized access to computer systems is illegal.

License

MIT License - see LICENSE for details.
