Roadmap: KeyHunter
Overview
KeyHunter is built in dependency order. The provider registry and storage schema come first because every other subsystem depends on them. Next come the scanning engine pipeline, the full provider library, input sources and verification, and output and key management. The first competitive differentiators follow (dork engine, import adapters, CI/CD integration), then the OSINT/recon engine (infrastructure architecture before individual sources), then automation and notification (Telegram bot, scheduler), and finally the web dashboard, which aggregates all subsystems. Each phase delivers a complete, verifiable capability before the next phase begins.
Phases
Phase Numbering:
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
Decimal phases appear between their surrounding integers in numeric order.
- Phase 1: Foundation - Provider registry schema, storage layer with AES-256, and CLI skeleton that everything else depends on (completed 2026-04-05)
- Phase 2: Tier 1-2 Providers - Frontier and inference platform provider YAML definitions (26 providers, highest-value targets) (completed 2026-04-05)
- Phase 3: Tier 3-9 Providers - Remaining 82 provider definitions completing 108+ provider coverage (completed 2026-04-05)
- Phase 4: Input Sources - All input modes: file/dir, full git history, stdin, URL, clipboard
- Phase 5: Verification Engine - Opt-in active key verification with consent prompt and legal documentation
- Phase 6: Output, Reporting & Key Management - All output formats and complete key management CLI
- Phase 7: Import Adapters & CI/CD Integration - TruffleHog/Gitleaks import + pre-commit hooks + SARIF to GitHub Security
- Phase 8: Dork Engine - YAML-based dork definitions with 150+ built-in dorks and management commands
- Phase 9: OSINT Infrastructure - Per-source rate limiter architecture and recon engine framework before any sources
- Phase 10: OSINT Code Hosting - GitHub, GitLab, Bitbucket, HuggingFace and 6 more code hosting sources (completed 2026-04-05)
- Phase 11: OSINT Search & Paste - Search engine dorking and paste site aggregation (completed 2026-04-06)
- Phase 12: OSINT IoT & Cloud Storage - Shodan/Censys/ZoomEye/FOFA and S3/GCS/Azure cloud storage scanning (completed 2026-04-06)
- Phase 13: OSINT Package Registries & Container/IaC - npm/PyPI/crates.io and Docker Hub/K8s/Terraform scanning (completed 2026-04-06)
- Phase 14: OSINT CI/CD Logs, Web Archives & Frontend Leaks - Build logs, Wayback Machine, and JS bundle/env scanning (completed 2026-04-06)
- Phase 15: OSINT Forums, Collaboration & Log Aggregators - StackOverflow/Reddit/HN, Notion/Trello, Elasticsearch/Grafana/Sentry (completed 2026-04-06)
- Phase 16: OSINT Threat Intel, Mobile, DNS & API Marketplaces - VirusTotal/IntelX, APK scanning, crt.sh, Postman/SwaggerHub (completed 2026-04-06)
- Phase 17: Telegram Bot & Scheduled Scanning - Remote control bot and cron-based recurring scans with auto-notify
- Phase 18: Web Dashboard - Embedded htmx + Tailwind dashboard aggregating all subsystems with SSE live updates
Phase Details
Phase 1: Foundation
Goal: The provider registry schema, encrypted storage layer, and CLI skeleton exist and function correctly, so all downstream subsystems have stable interfaces to build against
Depends on: Nothing (first phase)
Requirements: CORE-01, CORE-02, CORE-03, CORE-04, CORE-05, CORE-06, CORE-07, STOR-01, STOR-02, STOR-03, CLI-01, CLI-02, CLI-03, CLI-04, CLI-05, PROV-10
Success Criteria (what must be TRUE):
- `keyhunter scan ./somefile` runs the three-stage pipeline (Aho-Corasick pre-filter → regex → entropy) and returns findings with provider names
- Findings are persisted to a SQLite database with the key value stored AES-256 encrypted; the plaintext key never appears in the database file
- `keyhunter config init` creates `~/.keyhunter.yaml` and `keyhunter config set <key> <value>` persists values
- `keyhunter providers list` and `keyhunter providers info <name>` return provider metadata from YAML definitions
- Provider YAML schema includes `format_version` and `last_verified` fields validated at load time
Plans: 5 plans
Plans:
- 01-01-PLAN.md — Go module init, dependency installation, test scaffolding and testdata fixtures
- 01-02-PLAN.md — Provider registry: YAML schema, embed loader, Aho-Corasick automaton, Registry struct
- 01-03-PLAN.md — Storage layer: AES-256-GCM encryption, Argon2id key derivation, SQLite + Finding CRUD
- 01-04-PLAN.md — Scan engine pipeline: keyword pre-filter, regex+entropy detector, FileSource, ants worker pool
- 01-05-PLAN.md — CLI wiring: scan, providers list/info/stats, config init/set/get, output table
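The three-stage pipeline named in the success criteria can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `provider` struct, the `sk-demo-` pattern, and the 3.0 bits/char threshold are hypothetical, and a plain `strings.Contains` loop stands in for the Aho-Corasick automaton.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

// provider is a hypothetical, trimmed-down provider definition.
type provider struct {
	Name       string
	Keywords   []string       // cheap substrings checked before any regex runs
	Pattern    *regexp.Regexp // precise shape of the key
	MinEntropy float64        // minimum Shannon entropy in bits per character
}

// shannonEntropy returns the Shannon entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	counts := map[rune]int{}
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// containsAny is a stand-in for the Aho-Corasick pre-filter.
func containsAny(s string, subs []string) bool {
	for _, sub := range subs {
		if strings.Contains(s, sub) {
			return true
		}
	}
	return false
}

// detect applies the three stages in order: keyword pre-filter, regex, entropy.
func detect(content string, providers []provider) map[string][]string {
	found := map[string][]string{}
	for _, p := range providers {
		if !containsAny(content, p.Keywords) {
			continue // stage 1: no keyword, so the regex never runs
		}
		for _, m := range p.Pattern.FindAllString(content, -1) { // stage 2
			if shannonEntropy(m) >= p.MinEntropy { // stage 3
				found[p.Name] = append(found[p.Name], m)
			}
		}
	}
	return found
}

// demoProviders returns one made-up provider for illustration.
func demoProviders() []provider {
	return []provider{{
		Name:       "example",
		Keywords:   []string{"sk-demo-"},
		Pattern:    regexp.MustCompile(`sk-demo-[A-Za-z0-9]{16}`),
		MinEntropy: 3.0,
	}}
}

func main() {
	content := "token = sk-demo-a8Kd93LmQz71XpTb\nplaceholder = sk-demo-aaaaaaaaaaaaaaaa\n"
	for name, keys := range detect(content, demoProviders()) {
		fmt.Println(name, keys)
	}
}
```

Both strings match the regex, but only the first clears the entropy threshold, so the repeated-character placeholder is dropped.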
Phase 2: Tier 1-2 Providers
Goal: The 26 highest-value LLM provider YAML definitions exist with accurate regex patterns, keyword lists, confidence levels, and verify endpoints, covering OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI and all major inference platforms
Depends on: Phase 1
Requirements: PROV-01, PROV-02
Success Criteria (what must be TRUE):
- `keyhunter scan` correctly identifies keys from all 12 Tier 1 providers (OpenAI sk-proj-, Anthropic sk-ant-api03-, Google AIza, etc.) with correct provider names
- `keyhunter scan` correctly identifies keys from all 14 Tier 2 inference platform providers (Groq gsk_, Replicate r8_, Fireworks fw_, Perplexity pplx-, etc.)
- Each provider YAML includes a `keywords` list that enables Aho-Corasick pre-filtering to skip files with no matching context
- `keyhunter providers stats` shows 26 providers loaded with pattern and keyword counts
Plans: 5 plans
Plans:
- 02-01-PLAN.md — Tier 1 high-confidence prefixed providers (OpenAI upgrade, Anthropic upgrade, Google AI, Vertex AI, AWS Bedrock, xAI)
- 02-02-PLAN.md — Tier 1 keyword-anchored providers (Azure OpenAI, Meta AI, Cohere, Mistral, Inflection, AI21)
- 02-03-PLAN.md — Tier 2 inference platforms batch 1 (Groq, Replicate, Anyscale, Together, Fireworks, Baseten, DeepInfra)
- 02-04-PLAN.md — Tier 2 inference platforms batch 2 (Lepton, Modal, Cerebrium, Novita, SambaNova, OctoAI, Friendli)
- 02-05-PLAN.md — Registry guardrail test: assert 12 Tier 1 + 14 Tier 2 + regex compilation
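A registry guardrail of the kind described in 02-05 might look like the sketch below. The `providerDef` struct and its field names are assumptions made for illustration (the real schema lives in the YAML definitions), and the sample patterns are simplified.

```go
package main

import (
	"fmt"
	"regexp"
)

// providerDef is a hypothetical in-memory form of one provider YAML file.
type providerDef struct {
	Name     string
	Tier     int
	Patterns []string
	Keywords []string
}

// checkRegistry verifies unique names, non-empty keyword lists, compilable
// regex patterns, and the expected number of providers per tier.
func checkRegistry(defs []providerDef, wantPerTier map[int]int) error {
	seen := map[string]bool{}
	perTier := map[int]int{}
	for _, d := range defs {
		if seen[d.Name] {
			return fmt.Errorf("duplicate provider %q", d.Name)
		}
		seen[d.Name] = true
		if len(d.Keywords) == 0 {
			return fmt.Errorf("provider %q has no keywords", d.Name)
		}
		for _, p := range d.Patterns {
			if _, err := regexp.Compile(p); err != nil {
				return fmt.Errorf("provider %q: bad pattern %q: %w", d.Name, p, err)
			}
		}
		perTier[d.Tier]++
	}
	for tier, want := range wantPerTier {
		if perTier[tier] != want {
			return fmt.Errorf("tier %d: got %d providers, want %d", tier, perTier[tier], want)
		}
	}
	return nil
}

// demoDefs returns two illustrative definitions with simplified patterns.
func demoDefs() []providerDef {
	return []providerDef{
		{Name: "openai", Tier: 1, Patterns: []string{`sk-proj-[A-Za-z0-9_-]{20,}`}, Keywords: []string{"sk-proj-"}},
		{Name: "groq", Tier: 2, Patterns: []string{`gsk_[A-Za-z0-9]{20,}`}, Keywords: []string{"gsk_"}},
	}
}

func main() {
	fmt.Println(checkRegistry(demoDefs(), map[int]int{1: 1, 2: 1}))
}
```

In the real test the expected counts would be 12 for Tier 1 and 14 for Tier 2, as stated in the plan.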
Phase 3: Tier 3-9 Providers
Goal: All 108+ LLM provider definitions exist: specialized models, Chinese/regional providers, infrastructure gateways, emerging tools, code assistants, self-hosted runtimes, and enterprise platforms
Depends on: Phase 2
Requirements: PROV-03, PROV-04, PROV-05, PROV-06, PROV-07, PROV-08, PROV-09
Success Criteria (what must be TRUE):
- `keyhunter providers stats` shows 108+ total providers across all tiers
- Chinese/regional provider keys (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, etc.) are detected using keyword-based matching since they use generic key formats
- Self-hosted provider definitions (Ollama, vLLM, LocalAI, etc.) include patterns for API key authentication when applicable
- `keyhunter providers list --tier=enterprise` returns Salesforce, ServiceNow, SAP, Palantir, Databricks, Snowflake, Oracle, HPE providers
Plans: 8 plans
Plans:
- 03-01-PLAN.md — Tier 4 Chinese/regional providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360 AI, Kuaishou)
- 03-02-PLAN.md — Tier 3 Specialized (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney)
- 03-03-PLAN.md — Tier 5 Infrastructure/Gateway (OpenRouter, LiteLLM, Cloudflare AI, Vercel AI, Portkey, Helicone, Martian, Kong, BricksAI, Aether, Not Diamond)
- 03-04-PLAN.md — Tier 7 Code/Dev Tools (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
- 03-05-PLAN.md — Tier 8 Self-Hosted runtimes (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan)
- 03-06-PLAN.md — Tier 9 Enterprise (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
- 03-07-PLAN.md — Tier 6 Emerging/Niche (Reka, Aleph Alpha, Lamini, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon)
- 03-08-PLAN.md — Tier 3-9 guardrail test: lock 108 total providers, per-tier counts, and name sets
Phase 4: Input Sources
Goal: Users can point KeyHunter at any content source (local files, git history across all branches, piped content, remote URLs, and the clipboard), and all are scanned through the same detection pipeline
Depends on: Phase 2
Requirements: INPUT-01, INPUT-02, INPUT-03, INPUT-04, INPUT-05, INPUT-06
Success Criteria (what must be TRUE):
- `keyhunter scan ./myrepo` recursively scans all files with glob exclusion patterns (e.g., `--exclude="*.min.js"`) and mmap is used for files above a configurable size threshold
- `keyhunter scan --git ./myrepo` scans full git history including all branches, tags, and stash entries; `--since=2024-01-01` limits to commits after that date
- `cat secrets.txt | keyhunter scan stdin` detects keys from piped input
- `keyhunter scan --url https://example.com/config.js` fetches and scans the remote URL content
- `keyhunter scan --clipboard` scans the current clipboard content
Plans: 5 plans
Plans:
- 04-01-PLAN.md — Wave 0: add go-git/v5, atotto/clipboard, golang.org/x/exp/mmap dependencies
- 04-02-PLAN.md — DirSource: recursive walk, glob exclusion, binary skip, mmap for large files (INPUT-01, CORE-07)
- 04-03-PLAN.md — GitSource: full-history scan across branches/tags with blob dedup and --since (INPUT-02)
- 04-04-PLAN.md — StdinSource, URLSource, ClipboardSource (INPUT-03, INPUT-04, INPUT-05)
- 04-05-PLAN.md — cmd/scan.go source-selection dispatch wiring all new sources (INPUT-06)
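As a rough illustration of the recursive walk with glob exclusion, the sketch below matches each pattern against the file's base name using the standard library. Whether `--exclude` matches base names or full relative paths is not stated in this roadmap, so that choice is an assumption, as are the function names.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

// excluded reports whether the base name of path matches any glob pattern.
// Malformed patterns are treated as non-matching in this sketch.
func excluded(path string, patterns []string) bool {
	base := filepath.Base(path)
	for _, pat := range patterns {
		if ok, err := filepath.Match(pat, base); err == nil && ok {
			return true
		}
	}
	return false
}

// collectFiles walks root recursively and returns every regular path
// that is not excluded.
func collectFiles(root string, patterns []string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || excluded(path, patterns) {
			return nil
		}
		files = append(files, path)
		return nil
	})
	return files, err
}

func main() {
	patterns := []string{"*.min.js", "*.lock"}
	for _, p := range []string{"dist/app.min.js", "src/app.js", "yarn.lock"} {
		fmt.Println(p, excluded(p, patterns))
	}
}
```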
Phase 5: Verification Engine
Goal: Users can opt into active key verification with a consent prompt, legal documentation, and per-provider API calls that confirm whether a found key is live and return metadata about the key's access level
Depends on: Phase 2
Requirements: VRFY-01, VRFY-02, VRFY-03, VRFY-04, VRFY-05, VRFY-06
Success Criteria (what must be TRUE):
- `keyhunter scan --verify` triggers a one-time consent prompt on first use with clear legal language; user must type "yes" to proceed
- Each provider YAML's verify endpoint, method, headers, and success/failure codes are used for verification, with no hardcoded verification logic
- `keyhunter scan --verify` extracts and displays org name, rate limit tier, and available permissions when the provider API returns them
- `--verify-timeout=30s` changes the per-key verification timeout from the default 10s
- A `LEGAL.md` file shipping with the binary documents the legal implications of using `--verify`
Plans: 5 plans
Plans:
- 05-01-PLAN.md — Wave 0: extend VerifySpec schema, Finding struct, storage schema; add gjson dep
- 05-02-PLAN.md — LEGAL.md + pkg/legal embed + consent prompt + keyhunter legal command
- 05-03-PLAN.md — pkg/verify HTTPVerifier: template sub, gjson metadata extraction, ants pool
- 05-04-PLAN.md — Update 12 Tier 1 provider YAMLs with extended verify specs + guardrail test
- 05-05-PLAN.md — cmd/scan.go --verify wiring + --verify-timeout/workers flags + output verify column
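The "no hardcoded verification logic" criterion suggests a verifier driven entirely by data. A minimal sketch under stated assumptions: the `verifySpec` fields, the `{{KEY}}` placeholder syntax, the live/dead/unknown labels, and the example URL are all hypothetical, and metadata extraction is omitted.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strings"
	"time"
)

// verifySpec is a hypothetical in-memory form of a provider's verify block.
type verifySpec struct {
	Method       string
	URL          string
	Headers      map[string]string // values may contain the placeholder {{KEY}}
	SuccessCodes []int
	FailureCodes []int
}

// renderHeaders substitutes the candidate key into each header template.
func renderHeaders(spec verifySpec, key string) map[string]string {
	out := make(map[string]string, len(spec.Headers))
	for name, tmpl := range spec.Headers {
		out[name] = strings.ReplaceAll(tmpl, "{{KEY}}", key)
	}
	return out
}

// classify maps an HTTP status to a verdict using only the spec's code lists.
func classify(spec verifySpec, status int) string {
	for _, c := range spec.SuccessCodes {
		if c == status {
			return "live"
		}
	}
	for _, c := range spec.FailureCodes {
		if c == status {
			return "dead"
		}
	}
	return "unknown"
}

// verify performs one request bounded by the per-key timeout.
func verify(ctx context.Context, client *http.Client, spec verifySpec, key string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, spec.Method, spec.URL, nil)
	if err != nil {
		return "", err
	}
	for name, value := range renderHeaders(spec, key) {
		req.Header.Set(name, value)
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return classify(spec, resp.StatusCode), nil
}

// demoSpec returns a made-up spec pointing at a non-existent host.
func demoSpec() verifySpec {
	return verifySpec{
		Method:       http.MethodGet,
		URL:          "https://api.example.invalid/v1/me",
		Headers:      map[string]string{"Authorization": "Bearer {{KEY}}"},
		SuccessCodes: []int{200},
		FailureCodes: []int{401, 403},
	}
}

func main() {
	spec := demoSpec()
	fmt.Println(renderHeaders(spec, "sk-demo-123")["Authorization"]) // Bearer sk-demo-123
	fmt.Println(classify(spec, 200), classify(spec, 401), classify(spec, 429)) // live dead unknown
}
```

A status outside both lists (such as 429) is reported as unknown rather than guessed, which keeps rate-limit responses from being read as dead keys.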
Phase 6: Output, Reporting & Key Management
Goal: Users can consume scan results in any format they need and perform full lifecycle management of stored keys: listing, inspecting, exporting, copying, and deleting
Depends on: Phase 5
Requirements: OUT-01, OUT-02, OUT-03, OUT-04, OUT-05, OUT-06, KEYS-01, KEYS-02, KEYS-03, KEYS-04, KEYS-05, KEYS-06
Success Criteria (what must be TRUE):
- Default table output shows colored, masked keys (first 8 + last 4 chars); `--unmask` reveals full key values; `--output=json|sarif|csv` switches format
- Exit code is 0 when no keys found, 1 when keys are found, 2 on scan error, confirming CI/CD compatibility
- `keyhunter keys list` shows all stored keys masked; `keyhunter keys show <id>` shows full unmasked detail
- `keyhunter keys export --format=json` produces a JSON file with full key values; `--format=csv` produces a CSV
- `keyhunter keys copy <id>` copies the full key to clipboard; `keyhunter keys delete <id>` removes the key from the database
Plans: 6 plans
Plans:
- 06-01-PLAN.md — Wave 0: Formatter interface, colors.go (TTY/NO_COLOR), refactor TableFormatter
- 06-02-PLAN.md — JSONFormatter + CSVFormatter (full Finding fields, Unmask option)
- 06-03-PLAN.md — SARIF 2.1.0 formatter with custom structs (rule dedup, level mapping)
- 06-04-PLAN.md — pkg/storage/queries.go: Filters, ListFindingsFiltered, GetFinding, DeleteFinding
- 06-05-PLAN.md — cmd/keys.go command tree: list/show/export/copy/delete/verify (KEYS-01..06)
- 06-06-PLAN.md — scan --output registry dispatch + exit codes 0/1/2 (OUT-05, OUT-06)
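The masking rule (first 8 + last 4 characters) and the 0/1/2 exit-code contract are small enough to sketch directly. The asterisk fill and the decision to fully mask keys of 12 characters or fewer are assumptions; the roadmap does not specify either.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errScan is a placeholder error used to demonstrate the scan-error path.
var errScan = errors.New("scan failed")

// maskKey keeps the first 8 and last 4 characters and hides the rest.
// Keys too short to mask meaningfully are hidden entirely (an assumption).
func maskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return strings.Repeat("*", len(key))
	}
	return key[:head] + strings.Repeat("*", len(key)-head-tail) + key[len(key)-tail:]
}

// exitCode maps a scan outcome to the process exit code:
// 0 when no keys are found, 1 when keys are found, 2 on scan error.
func exitCode(findings int, err error) int {
	switch {
	case err != nil:
		return 2
	case findings > 0:
		return 1
	default:
		return 0
	}
}

func main() {
	fmt.Println(maskKey("sk-proj-abcdefgh1234")) // sk-proj-********1234
	fmt.Println(exitCode(0, nil), exitCode(3, nil), exitCode(0, errScan)) // 0 1 2
}
```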
Phase 7: Import Adapters & CI/CD Integration
Goal: Users can import findings from TruffleHog and Gitleaks into KeyHunter's database, and use KeyHunter in pre-commit hooks and CI/CD pipelines with SARIF output uploadable to GitHub Security
Depends on: Phase 6
Requirements: IMP-01, IMP-02, IMP-03, CICD-01, CICD-02
Success Criteria (what must be TRUE):
- `keyhunter import --format=trufflehog results.json` parses TruffleHog v3 JSON output and normalizes findings into the KeyHunter database
- `keyhunter import --format=gitleaks results.json` and `--format=csv` both import and deduplicate against existing findings
- `keyhunter hook install` installs a git pre-commit hook; running `git commit` on a file with a known API key blocks the commit and prints findings
- `keyhunter scan --output=sarif` produces a valid SARIF 2.1.0 file that GitHub Code Scanning accepts without errors
Plans: 6 plans
Plans:
- 07-01-PLAN.md — pkg/importer Importer interface + TruffleHog v3 JSON parser + fixtures (IMP-01)
- 07-02-PLAN.md — Gitleaks JSON + CSV parsers (IMP-02)
- 07-03-PLAN.md — Dedup helper + SARIF GitHub Code Scanning validation test (IMP-03, CICD-02)
- 07-04-PLAN.md — cmd/import.go wiring format dispatch, dedup, DB persistence (IMP-01/02/03)
- 07-05-PLAN.md — cmd/hook.go install/uninstall with embedded pre-commit script (CICD-01)
- 07-06-PLAN.md — docs/CI-CD.md + README CI/CD section with GitHub Actions workflow (CICD-01, CICD-02)
Phase 8: Dork Engine
Goal: Users can run, manage, and extend a library of 150+ built-in YAML dorks across GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, and Bing, using the same extensibility pattern as provider definitions
Depends on: Phase 7
Requirements: DORK-01, DORK-02, DORK-03, DORK-04
Success Criteria (what must be TRUE):
- `keyhunter dorks list` shows 150+ built-in dorks with source engine and category columns
- `keyhunter dorks run --source=github --category=frontier` executes all Tier 1 frontier provider dorks against GitHub code search
- `keyhunter dorks add --source=google --query='site:pastebin.com "sk-ant-api03-"'` persists a custom dork that appears in subsequent `dorks list` output
- `keyhunter dorks export --format=json` exports all dorks including custom additions
Plans: 7 plans
Plans:
- 08-01-PLAN.md — Dork schema, go:embed loader, registry, executor interface, custom_dorks storage table
- 08-02-PLAN.md — 50 GitHub dork YAML definitions across 5 categories
- 08-03-PLAN.md — 30 Google + 20 Shodan dork YAML definitions
- 08-04-PLAN.md — 15 Censys + 10 ZoomEye + 10 FOFA + 10 GitLab + 5 Bing dork YAML definitions
- 08-05-PLAN.md — Live GitHub Code Search executor (net/http, Retry-After, limit cap)
- 08-06-PLAN.md — cmd/dorks.go Cobra tree: list/run/add/export/info/delete
- 08-07-PLAN.md — Dork count guardrail test (>=150 total, per-source minimums, ID uniqueness)
Phase 9: OSINT Infrastructure
Goal: The recon engine's ReconSource interface, per-source rate limiter architecture, stealth mode, and parallel sweep orchestrator exist and are validated — all individual source modules build on this foundation
Depends on: Phase 8
Requirements: RECON-INFRA-05, RECON-INFRA-06, RECON-INFRA-07, RECON-INFRA-08
Success Criteria (what must be TRUE):
- Every recon source module holds its own `rate.Limiter` instance (no centralized rate limiter), and the `ReconSource` interface enforces a `RateLimit() rate.Limit` method
- `keyhunter recon full --stealth` applies user-agent rotation and jitter delays to all sources; log output shows "source exhausted" events rather than silently returning empty results
- `keyhunter recon full --respect-robots` (default on) respects robots.txt for web-scraping sources before making any requests
- `keyhunter recon full` fans out to all enabled sources in parallel and deduplicates findings before persisting to the database
Plans: 6 plans
Plans:
- 09-01-PLAN.md — ReconSource interface + Engine skeleton + ExampleSource stub
- 09-02-PLAN.md — LimiterRegistry per-source rate.Limiter + jitter
- 09-03-PLAN.md — Stealth UA pool + cross-source dedup
- 09-04-PLAN.md — robots.txt parser with 1h per-host cache
- 09-05-PLAN.md — cmd/recon.go CLI tree (full, list)
- 09-06-PLAN.md — Integration test + phase summary
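The parallel fan-out with cross-source deduplication could be sketched as below. This is a simplified stand-in: the `reconSource` interface here has only `Name` and `Sweep` (the real `ReconSource` also exposes `RateLimit()`, which is left out to stay within the standard library), and the dedup key of provider plus key value is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// finding is a hypothetical, reduced finding record.
type finding struct{ Provider, Key, Source string }

// reconSource is a simplified stand-in for the ReconSource interface.
type reconSource interface {
	Name() string
	Sweep(ctx context.Context) ([]finding, error)
}

// staticSource returns canned findings; it exists only for the demo.
type staticSource struct {
	name     string
	findings []finding
}

func (s staticSource) Name() string { return s.name }
func (s staticSource) Sweep(ctx context.Context) ([]finding, error) {
	return s.findings, ctx.Err()
}

// sweepAll runs every source in its own goroutine, merges the results,
// and drops findings whose provider and key were already seen.
func sweepAll(ctx context.Context, sources []reconSource) ([]finding, map[string]error) {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		out  []finding
		seen = map[string]bool{}
		errs = map[string]error{}
	)
	for _, s := range sources {
		wg.Add(1)
		go func(s reconSource) {
			defer wg.Done()
			found, err := s.Sweep(ctx)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs[s.Name()] = err // recorded per source, never swallowed
			}
			for _, f := range found {
				id := f.Provider + "\x00" + f.Key
				if !seen[id] {
					seen[id] = true
					f.Source = s.Name()
					out = append(out, f)
				}
			}
		}(s)
	}
	wg.Wait()
	return out, errs
}

// demoSweep runs two sources that share one finding.
func demoSweep() []finding {
	shared := finding{Provider: "openai", Key: "sk-demo-1"}
	a := staticSource{"github", []finding{shared}}
	b := staticSource{"gitlab", []finding{shared, {Provider: "groq", Key: "gsk_demo_2"}}}
	out, _ := sweepAll(context.Background(), []reconSource{a, b})
	return out
}

func main() {
	fmt.Println(len(demoSweep()), "unique findings")
}
```

Because goroutine scheduling is not deterministic, which source gets credited for a shared finding can vary between runs; only the count of unique findings is stable.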
Phase 10: OSINT Code Hosting
Goal: Users can scan 10 code hosting platforms (GitHub, GitLab, Bitbucket, GitHub Gist, Codeberg/Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, and miscellaneous code sandbox sites) for leaked LLM API keys
Depends on: Phase 9
Requirements: RECON-CODE-01, RECON-CODE-02, RECON-CODE-03, RECON-CODE-04, RECON-CODE-05, RECON-CODE-06, RECON-CODE-07, RECON-CODE-08, RECON-CODE-09, RECON-CODE-10
Success Criteria (what must be TRUE):
- `keyhunter recon --sources=github,gitlab` executes provider keyword dorks against GitHub and GitLab code search APIs and feeds results into the detection pipeline
- `keyhunter recon --sources=huggingface` scans HuggingFace Spaces and model repos for exposed keys
- `keyhunter recon --sources=gist,bitbucket,codeberg` scans public gists, Bitbucket repos, and Codeberg/Gitea instances
- `keyhunter recon --sources=replit,codesandbox,kaggle` scans public repls, sandboxes, and notebooks
- All code hosting source findings are stored in the database with source attribution and deduplication
Plans: 9 plans
Plans:
- 10-01-PLAN.md — Shared HTTP client + provider-query generator + RegisterAll skeleton
- 10-02-PLAN.md — GitHubSource (RECON-CODE-01)
- 10-03-PLAN.md — GitLabSource (RECON-CODE-02)
- 10-04-PLAN.md — BitbucketSource + GistSource (RECON-CODE-03, RECON-CODE-04)
- 10-05-PLAN.md — CodebergSource/Gitea (RECON-CODE-05)
- 10-06-PLAN.md — HuggingFaceSource (RECON-CODE-08)
- 10-07-PLAN.md — Replit + CodeSandbox + Sandboxes scrapers (RECON-CODE-06, RECON-CODE-07, RECON-CODE-10)
- 10-08-PLAN.md — KaggleSource (RECON-CODE-09)
- 10-09-PLAN.md — RegisterAll wiring + CLI integration + end-to-end test
Phase 11: OSINT Search & Paste
Goal: Users can run automated search engine dorking against Google, Bing, DuckDuckGo, Yandex, and Brave, and scan 15+ paste site aggregations for leaked API keys
Depends on: Phase 9
Requirements: RECON-DORK-01, RECON-DORK-02, RECON-DORK-03, RECON-PASTE-01
Success Criteria (what must be TRUE):
- `keyhunter recon --sources=google` runs built-in dorks via Google Custom Search API or SerpAPI and returns results with the dork query that triggered each finding
- `keyhunter recon --sources=bing` executes dorks via Azure Cognitive Services and `--sources=duckduckgo,yandex,brave` via their respective integrations
- `keyhunter recon --sources=paste` queries Pastebin API and scrapes 15+ additional paste sites, feeding raw content through the detection pipeline
Plans: 3 plans
Plans:
- 11-01-PLAN.md — GoogleDorkSource + BingDorkSource + DuckDuckGoSource + YandexSource + BraveSource (RECON-DORK-01, RECON-DORK-02, RECON-DORK-03)
- 11-02-PLAN.md — PastebinSource + GistPasteSource + PasteSitesSource multi-paste aggregator (RECON-PASTE-01)
- 11-03-PLAN.md — RegisterAll wiring + cmd/recon.go credentials + integration test (all Phase 11 reqs)
Phase 12: OSINT IoT & Cloud Storage
Goal: Users can discover exposed LLM endpoints via IoT scanners (Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge) and scan publicly accessible cloud storage buckets (S3, GCS, Azure Blob, MinIO, GrayHatWarfare) for leaked keys
Depends on: Phase 9
Requirements: RECON-IOT-01, RECON-IOT-02, RECON-IOT-03, RECON-IOT-04, RECON-IOT-05, RECON-IOT-06, RECON-CLOUD-01, RECON-CLOUD-02, RECON-CLOUD-03, RECON-CLOUD-04
Success Criteria (what must be TRUE):
- `keyhunter recon --sources=shodan` searches Shodan for exposed vLLM, Ollama, and LiteLLM proxy endpoints using the user's API key
- `keyhunter recon --sources=censys,zoomeye,fofa,netlas,binaryedge` each execute IoT searches with appropriate query formats per platform
- `keyhunter recon --sources=s3` enumerates publicly accessible S3 buckets and scans readable objects for API key patterns
- `keyhunter recon --sources=gcs,azureblob,spaces` scans GCS, Azure Blob, and DigitalOcean Spaces; `--sources=minio` discovers MinIO instances via Shodan integration
- `keyhunter recon --sources=grayhatwarfare` queries the GrayHatWarfare bucket search engine for matching bucket names
Plans: 4 plans
Plans:
- 12-01-PLAN.md — ShodanSource + CensysSource + ZoomEyeSource (RECON-IOT-01, RECON-IOT-02, RECON-IOT-03)
- 12-02-PLAN.md — FOFASource + NetlasSource + BinaryEdgeSource (RECON-IOT-04, RECON-IOT-05, RECON-IOT-06)
- 12-03-PLAN.md — S3Scanner + GCSScanner + AzureBlobScanner + DOSpacesScanner (RECON-CLOUD-01, RECON-CLOUD-02, RECON-CLOUD-03, RECON-CLOUD-04)
- 12-04-PLAN.md — RegisterAll wiring + cmd/recon.go credentials + integration test (all Phase 12 reqs)
Phase 13: OSINT Package Registries & Container/IaC
Goal: Users can scan npm, PyPI, and 6 other package registries for packages containing leaked keys, and scan Docker Hub image layers, Kubernetes configs, Terraform state files, Helm charts, and Ansible Galaxy for secrets in infrastructure code
Depends on: Phase 9
Requirements: RECON-PKG-01, RECON-PKG-02, RECON-PKG-03, RECON-INFRA-01, RECON-INFRA-02, RECON-INFRA-03, RECON-INFRA-04
Success Criteria (what must be TRUE):
- `keyhunter recon --sources=npm` downloads and extracts package tarballs for recently published packages matching LLM-related keywords and scans their contents
- `keyhunter recon --sources=pypi,rubygems,crates,maven,nuget,packagist,goproxy` scans respective registries using the same download-extract-scan pattern
- `keyhunter recon --sources=dockerhub` extracts and scans image layers and build args from public Docker Hub images
- `keyhunter recon --sources=k8s` discovers publicly exposed Kubernetes dashboards and scans publicly readable Secret/ConfigMap objects
- `keyhunter recon --sources=terraform,helm,ansible` scans Terraform registry modules, Helm chart repositories, and Ansible Galaxy roles
Plans: 4 plans
Plans:
- 13-01-PLAN.md — NpmSource + PyPISource + CratesIOSource + RubyGemsSource (RECON-PKG-01, RECON-PKG-02)
- 13-02-PLAN.md — MavenSource + NuGetSource + GoProxySource + PackagistSource (RECON-PKG-02, RECON-PKG-03)
- 13-03-PLAN.md — DockerHubSource + KubernetesSource + TerraformSource + HelmSource (RECON-INFRA-01..04)
- 13-04-PLAN.md — RegisterAll wiring + integration test (all Phase 13 reqs)
Phase 14: OSINT CI/CD Logs, Web Archives & Frontend Leaks
Goal: Users can scan public CI/CD build logs, historical web snapshots from the Wayback Machine and CommonCrawl, and frontend JavaScript artifacts (source maps, webpack bundles, exposed .env files) for leaked API keys
Depends on: Phase 9
Requirements: RECON-CI-01, RECON-CI-02, RECON-CI-03, RECON-CI-04, RECON-ARCH-01, RECON-ARCH-02, RECON-JS-01, RECON-JS-02, RECON-JS-03, RECON-JS-04, RECON-JS-05
Success Criteria (what must be TRUE):
- `keyhunter recon --sources=github-actions` scans public GitHub Actions workflow run logs for leaked keys in build output
- `keyhunter recon --sources=travis,circleci,jenkins,gitlab-ci` scans public build logs from each CI platform
- `keyhunter recon --sources=wayback` queries the CDX API for historical snapshots of target domains and scans retrieved content
- `keyhunter recon --sources=commoncrawl` searches CommonCrawl indexes for pages matching LLM provider keywords and scans WARC records
- `keyhunter recon --sources=sourcemaps,webpack,dotenv,swagger,deploypreview` each extract and scan the relevant JS artifacts and configuration files
Plans: 4 plans
Plans:
- 14-01-PLAN.md — CI/CD log sources: GitHubActions, TravisCI, CircleCI, Jenkins, GitLabCI
- 14-02-PLAN.md — Web archive sources: Wayback Machine, CommonCrawl
- 14-03-PLAN.md — Frontend leak sources: SourceMap, Webpack, EnvLeak, Swagger, DeployPreview
- 14-04-PLAN.md — RegisterAll wiring + integration test (all Phase 14 reqs) (completed 2026-04-06)
Phase 15: OSINT Forums, Collaboration & Log Aggregators
Goal: Users can search developer forums, public collaboration tool pages, and exposed monitoring dashboards for leaked API keys, covering Stack Overflow, Reddit, HackerNews, dev.to, Telegram channels, Discord, Notion, Confluence, Trello, Google Docs, Elasticsearch, Grafana, and Sentry
Depends on: Phase 9
Requirements: RECON-FORUM-01, RECON-FORUM-02, RECON-FORUM-03, RECON-FORUM-04, RECON-FORUM-05, RECON-FORUM-06, RECON-COLLAB-01, RECON-COLLAB-02, RECON-COLLAB-03, RECON-COLLAB-04, RECON-LOG-01, RECON-LOG-02, RECON-LOG-03
Success Criteria (what must be TRUE):
- `keyhunter recon --sources=stackoverflow,reddit,hackernews` queries each platform's API/Algolia search for LLM provider keywords and scans result content
- `keyhunter recon --sources=devto,medium,telegram,discord` scans publicly accessible posts, articles, and indexed channel content
- `keyhunter recon --sources=notion,confluence,trello,googledocs` scans publicly accessible pages via dorking and direct API access where available
- `keyhunter recon --sources=elasticsearch,grafana,sentry` discovers exposed instances and scans accessible log data and dashboards
Plans: 4 plans
Plans:
- 15-01-PLAN.md — StackOverflow, Reddit, HackerNews, Discord, Slack, DevTo forum sources (RECON-FORUM-01..06)
- 15-02-PLAN.md — Trello, Notion, Confluence, GoogleDocs collaboration sources (RECON-COLLAB-01..04)
- 15-03-PLAN.md — Elasticsearch, Grafana, Sentry, Kibana, Splunk log aggregator sources (RECON-LOG-01..03)
- 15-04-PLAN.md — RegisterAll wiring + integration test (all Phase 15 reqs)
Phase 16: OSINT Threat Intel, Mobile, DNS & API Marketplaces
Goal: Users can search threat intelligence platforms, scan decompiled Android APKs, perform DNS/subdomain discovery for config endpoint probing, and scan Postman/SwaggerHub API collections for leaked LLM keys
Depends on: Phase 9
Requirements: RECON-INTEL-01, RECON-INTEL-02, RECON-INTEL-03, RECON-MOBILE-01, RECON-DNS-01, RECON-DNS-02, RECON-API-01, RECON-API-02
Success Criteria (what must be TRUE):
- `keyhunter recon --sources=virustotal,intelx,urlhaus` queries each threat intelligence platform for files and URLs containing LLM provider keywords
- `keyhunter recon --sources=apk --target=com.example.app` downloads, decompiles (via apktool/jadx), and scans APK content for API keys
- `keyhunter recon --sources=crtsh --target=example.com` discovers subdomains via Certificate Transparency logs and probes each for `.env`, `/api/config`, and `/actuator/env` endpoints
- `keyhunter recon --sources=postman,swaggerhub` scans public Postman collections and SwaggerHub API definitions for hardcoded keys in request examples
Plans: 4 plans
Plans:
- 16-01-PLAN.md — VirusTotal, IntelligenceX, URLhaus threat intelligence sources (RECON-INTEL-01, RECON-INTEL-02, RECON-INTEL-03)
- 16-02-PLAN.md — APKMirror, crt.sh, SecurityTrails mobile and DNS sources (RECON-MOBILE-01, RECON-DNS-01, RECON-DNS-02)
- 16-03-PLAN.md — Postman, SwaggerHub, RapidAPI marketplace sources (RECON-API-01, RECON-API-02)
- 16-04-PLAN.md — RegisterAll wiring + cmd/recon.go credentials + integration test (all Phase 16 reqs)
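The crt.sh step in the success criteria can be sketched as a pure parsing function: crt.sh's JSON output is an array of records whose `name_value` field holds one or more newline-separated hostnames. The function names, the wildcard handling, and the use of `https://` for probe URLs are assumptions; the HTTP fetch and the probing itself are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// crtEntry models the single field this sketch needs from a crt.sh record.
type crtEntry struct {
	NameValue string `json:"name_value"`
}

// subdomains extracts unique, lower-cased hostnames under domain from a
// crt.sh JSON response body. Wildcard prefixes ("*.") are stripped.
func subdomains(body []byte, domain string) ([]string, error) {
	var entries []crtEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	for _, e := range entries {
		for _, name := range strings.Split(e.NameValue, "\n") {
			name = strings.TrimPrefix(strings.ToLower(strings.TrimSpace(name)), "*.")
			if name == domain || strings.HasSuffix(name, "."+domain) {
				seen[name] = true
			}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

// probeURLs lists the config endpoints to check on one host.
func probeURLs(host string) []string {
	paths := []string{"/.env", "/api/config", "/actuator/env"}
	urls := make([]string, 0, len(paths))
	for _, p := range paths {
		urls = append(urls, "https://"+host+p)
	}
	return urls
}

func main() {
	body := []byte(`[{"name_value":"api.example.com\n*.example.com"},{"name_value":"API.example.com"},{"name_value":"other.test"}]`)
	names, err := subdomains(body, "example.com")
	fmt.Println(names, err) // [api.example.com example.com] <nil>
	fmt.Println(probeURLs(names[0]))
}
```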
Phase 17: Telegram Bot & Scheduled Scanning
Goal: Users can control KeyHunter remotely via a Telegram bot with scan, verify, recon, status, and subscription commands, and set up cron-based recurring scans that auto-notify on new findings
Depends on: Phase 16
Requirements: TELE-01, TELE-02, TELE-03, TELE-04, TELE-05, TELE-06, TELE-07, SCHED-01, SCHED-02, SCHED-03
Success Criteria (what must be TRUE):
- `keyhunter serve --telegram` starts the bot; `/scan ./myrepo` in a private Telegram chat triggers a scan and returns findings (masked keys only, never unmasked)
- `/verify <key-id>`, `/recon --sources=github`, `/status`, `/stats`, `/providers`, and `/help` all respond correctly in private chat
- `/subscribe` enables auto-notifications; new key findings from any scan trigger an immediate Telegram message to all subscribed users
- `/key <id>` sends full key detail to the requesting user's private chat only
- `keyhunter schedule add --cron="0 */6 * * *" --scan=./myrepo` adds a recurring scan; `keyhunter schedule list` shows it; the job persists across restarts and sends Telegram notifications on new findings
Plans: TBD
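To make the example schedule concrete, the sketch below evaluates a five-field cron expression against a timestamp and lists the hours at which `0 */6 * * *` fires. It is a deliberately tiny subset (only `*`, `*/n`, and single numbers) written for illustration; the scheduler library KeyHunter will actually use is not stated in this roadmap, and all function names here are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// fieldMatches reports whether value v satisfies one cron field.
// min is the smallest legal value for the field, so steps start from it.
func fieldMatches(field string, v, min int) (bool, error) {
	switch {
	case field == "*":
		return true, nil
	case strings.HasPrefix(field, "*/"):
		step, err := strconv.Atoi(field[2:])
		if err != nil || step <= 0 {
			return false, fmt.Errorf("bad step in %q", field)
		}
		return (v-min)%step == 0, nil
	default:
		n, err := strconv.Atoi(field)
		if err != nil {
			return false, fmt.Errorf("unsupported field %q", field)
		}
		return v == n, nil
	}
}

// due reports whether a "minute hour day-of-month month day-of-week"
// expression matches t. Lists, ranges, and names are not supported.
func due(expr string, t time.Time) (bool, error) {
	fields := strings.Fields(expr)
	if len(fields) != 5 {
		return false, fmt.Errorf("want 5 fields, got %d", len(fields))
	}
	values := [5]int{t.Minute(), t.Hour(), t.Day(), int(t.Month()), int(t.Weekday())}
	mins := [5]int{0, 0, 1, 1, 0}
	for i, f := range fields {
		ok, err := fieldMatches(f, values[i], mins[i])
		if err != nil || !ok {
			return false, err
		}
	}
	return true, nil
}

// firingHours returns the hours of one day at which expr fires on minute 0.
func firingHours(expr string) []int {
	var hours []int
	for h := 0; h < 24; h++ {
		t := time.Date(2026, 4, 6, h, 0, 0, 0, time.UTC)
		if ok, _ := due(expr, t); ok {
			hours = append(hours, h)
		}
	}
	return hours
}

func main() {
	fmt.Println(firingHours("0 */6 * * *")) // [0 6 12 18]
}
```

So the example schedule runs four scans a day, at 00:00, 06:00, 12:00, and 18:00.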
Phase 18: Web Dashboard
Goal: Users can manage and interact with all KeyHunter capabilities through an embedded web dashboard (viewing scans, managing keys, launching recon, browsing providers, managing dorks, and configuring settings), with live scan progress via SSE
Depends on: Phase 17
Requirements: WEB-01, WEB-02, WEB-03, WEB-04, WEB-05, WEB-06, WEB-07, WEB-08, WEB-09, WEB-10, WEB-11
Success Criteria (what must be TRUE):
- `keyhunter serve` starts an embedded HTTP server with the full dashboard accessible in a browser; the binary contains all HTML, CSS, and assets via go:embed
- The dashboard overview page shows total keys found, scan count, and active providers as summary statistics
- The keys page lists all findings with masked values and a "Reveal Key" toggle that shows the full key on demand
- The recon page allows launching a recon sweep with source selection and shows live progress via Server-Sent Events
- The REST API at `/api/v1/*` accepts and returns JSON for all dashboard actions; optional basic auth or token auth is configurable via the settings page
Plans: TBD
UI hint: yes
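Live progress over Server-Sent Events needs little more than a handler that writes `data:` lines and flushes after each one. The following is a minimal sketch using only `net/http`; the handler name, the `progress` event name, and the `/api/v1/recon/events` path are hypothetical, and messages are assumed to contain no newlines.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// progressHandler streams each message from events as one SSE event and
// returns when the channel is closed or the client disconnects.
func progressHandler(events <-chan string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return
			case msg, open := <-events:
				if !open {
					return
				}
				// A blank line terminates each event.
				fmt.Fprintf(w, "event: progress\ndata: %s\n\n", msg)
				flusher.Flush()
			}
		}
	}
}

// streamToString drives the handler with an in-memory recorder so the
// wire format can be inspected without opening a port.
func streamToString(msgs ...string) string {
	events := make(chan string, len(msgs))
	for _, m := range msgs {
		events <- m
	}
	close(events)
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/v1/recon/events", nil)
	progressHandler(events)(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(streamToString("github: 10/50", "github: 50/50"))
}
```

On the browser side, an htmx SSE extension or a plain `EventSource` could subscribe to the same endpoint and listen for the `progress` event.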
Progress
Execution Order: Phases execute in numeric order: 1 → 2 → 3 → ... → 18
| Phase | Plans Complete | Status | Completed |
|---|---|---|---|
| 1. Foundation | 0/5 | Planning complete | - |
| 2. Tier 1-2 Providers | 0/5 | Not started | - |
| 3. Tier 3-9 Providers | 0/8 | Not started | - |
| 4. Input Sources | 0/5 | Not started | - |
| 5. Verification Engine | 0/5 | Not started | - |
| 6. Output, Reporting & Key Management | 0/6 | Not started | - |
| 7. Import Adapters & CI/CD Integration | 0/6 | Not started | - |
| 8. Dork Engine | 0/7 | Not started | - |
| 9. OSINT Infrastructure | 2/6 | In Progress | - |
| 10. OSINT Code Hosting | 9/9 | Complete | 2026-04-06 |
| 11. OSINT Search & Paste | 3/3 | Complete | 2026-04-06 |
| 12. OSINT IoT & Cloud Storage | 4/4 | Complete | 2026-04-06 |
| 13. OSINT Package Registries & Container/IaC | 4/4 | Complete | 2026-04-06 |
| 14. OSINT CI/CD Logs, Web Archives & Frontend Leaks | 4/4 | Complete | 2026-04-06 |
| 15. OSINT Forums, Collaboration & Log Aggregators | 4/4 | Complete | 2026-04-06 |
| 16. OSINT Threat Intel, Mobile, DNS & API Marketplaces | 4/4 | Complete | 2026-04-06 |
| 17. Telegram Bot & Scheduled Scanning | 0/? | Not started | - |
| 18. Web Dashboard | 0/? | Not started | - |