keyhunter/.planning/ROADMAP.md
2026-04-06 00:48:42 +03:00

Roadmap: KeyHunter

Overview

KeyHunter is built in dependency order. The provider registry and storage schema come first because every other subsystem depends on them. The scanning engine pipeline follows, then the full provider library, input sources and verification, and output and key management. Next come the first competitive differentiators (dork engine, import adapters, CI/CD integration) and the OSINT/recon engine, with its infrastructure architecture built before individual sources. Automation and notification (Telegram bot, scheduler) follow, and the web dashboard, which aggregates all subsystems, comes last. Each phase delivers a complete, verifiable capability before the next phase begins.

Phases

Phase Numbering:

  • Integer phases (1, 2, 3): Planned milestone work
  • Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

  • Phase 1: Foundation - Provider registry schema, storage layer with AES-256, and CLI skeleton that everything else depends on (completed 2026-04-05)
  • Phase 2: Tier 1-2 Providers - Frontier and inference platform provider YAML definitions (26 providers, highest-value targets) (completed 2026-04-05)
  • Phase 3: Tier 3-9 Providers - Remaining 82 provider definitions completing 108+ provider coverage (completed 2026-04-05)
  • Phase 4: Input Sources - All input modes: file/dir, full git history, stdin, URL, clipboard
  • Phase 5: Verification Engine - Opt-in active key verification with consent prompt and legal documentation
  • Phase 6: Output, Reporting & Key Management - All output formats and complete key management CLI
  • Phase 7: Import Adapters & CI/CD Integration - TruffleHog/Gitleaks import + pre-commit hooks + SARIF to GitHub Security
  • Phase 8: Dork Engine - YAML-based dork definitions with 150+ built-in dorks and management commands
  • Phase 9: OSINT Infrastructure - Per-source rate limiter architecture and recon engine framework before any sources
  • Phase 10: OSINT Code Hosting - GitHub, GitLab, Bitbucket, HuggingFace and 6 more code hosting sources
  • Phase 11: OSINT Search & Paste - Search engine dorking and paste site aggregation
  • Phase 12: OSINT IoT & Cloud Storage - Shodan/Censys/ZoomEye/FOFA and S3/GCS/Azure cloud storage scanning
  • Phase 13: OSINT Package Registries & Container/IaC - npm/PyPI/crates.io and Docker Hub/K8s/Terraform scanning
  • Phase 14: OSINT CI/CD Logs, Web Archives & Frontend Leaks - Build logs, Wayback Machine, and JS bundle/env scanning
  • Phase 15: OSINT Forums, Collaboration & Log Aggregators - StackOverflow/Reddit/HN, Notion/Trello, Elasticsearch/Grafana/Sentry
  • Phase 16: OSINT Threat Intel, Mobile, DNS & API Marketplaces - VirusTotal/IntelX, APK scanning, crt.sh, Postman/SwaggerHub
  • Phase 17: Telegram Bot & Scheduled Scanning - Remote control bot and cron-based recurring scans with auto-notify
  • Phase 18: Web Dashboard - Embedded htmx + Tailwind dashboard aggregating all subsystems with SSE live updates

Phase Details

Phase 1: Foundation

Goal: The provider registry schema, encrypted storage layer, and CLI skeleton exist and function correctly; all downstream subsystems have stable interfaces to build against
Depends on: Nothing (first phase)
Requirements: CORE-01, CORE-02, CORE-03, CORE-04, CORE-05, CORE-06, CORE-07, STOR-01, STOR-02, STOR-03, CLI-01, CLI-02, CLI-03, CLI-04, CLI-05, PROV-10
Success Criteria (what must be TRUE):

  1. keyhunter scan ./somefile runs the three-stage pipeline (Aho-Corasick pre-filter → regex → entropy) and returns findings with provider names
  2. Findings are persisted to a SQLite database with the key value stored AES-256 encrypted — plaintext key never appears in the database file
  3. keyhunter config init creates ~/.keyhunter.yaml and keyhunter config set <key> <value> persists values
  4. keyhunter providers list and keyhunter providers info <name> return provider metadata from YAML definitions
  5. Provider YAML schema includes format_version and last_verified fields validated at load time

Plans: 5 plans

Plans:

  • 01-01-PLAN.md — Go module init, dependency installation, test scaffolding and testdata fixtures
  • 01-02-PLAN.md — Provider registry: YAML schema, embed loader, Aho-Corasick automaton, Registry struct
  • 01-03-PLAN.md — Storage layer: AES-256-GCM encryption, Argon2id key derivation, SQLite + Finding CRUD
  • 01-04-PLAN.md — Scan engine pipeline: keyword pre-filter, regex+entropy detector, FileSource, ants worker pool
  • 01-05-PLAN.md — CLI wiring: scan, providers list/info/stats, config init/set/get, output table

Phase 2: Tier 1-2 Providers

Goal: The 26 highest-value LLM provider YAML definitions exist with accurate regex patterns, keyword lists, confidence levels, and verify endpoints, covering OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI and all major inference platforms
Depends on: Phase 1
Requirements: PROV-01, PROV-02
Success Criteria (what must be TRUE):

  1. keyhunter scan correctly identifies keys from all 12 Tier 1 providers (OpenAI sk-proj-, Anthropic sk-ant-api03-, Google AIza, etc.) with correct provider names
  2. keyhunter scan correctly identifies keys from all 14 Tier 2 inference platform providers (Groq gsk_, Replicate r8_, Fireworks fw_, Perplexity pplx-, etc.)
  3. Each provider YAML includes a keywords list that enables Aho-Corasick pre-filtering to skip files with no matching context
  4. keyhunter providers stats shows 26 providers loaded with pattern and keyword counts

Plans: 5 plans

Plans:

  • 02-01-PLAN.md — Tier 1 high-confidence prefixed providers (OpenAI upgrade, Anthropic upgrade, Google AI, Vertex AI, AWS Bedrock, xAI)
  • 02-02-PLAN.md — Tier 1 keyword-anchored providers (Azure OpenAI, Meta AI, Cohere, Mistral, Inflection, AI21)
  • 02-03-PLAN.md — Tier 2 inference platforms batch 1 (Groq, Replicate, Anyscale, Together, Fireworks, Baseten, DeepInfra)
  • 02-04-PLAN.md — Tier 2 inference platforms batch 2 (Lepton, Modal, Cerebrium, Novita, SambaNova, OctoAI, Friendli)
  • 02-05-PLAN.md — Registry guardrail test: assert 12 Tier 1 + 14 Tier 2 + regex compilation
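
A registry guardrail of the kind described in 02-05 might look something like the sketch below. The `Provider` struct, the function name, and the sample patterns are hypothetical; the real check would run against the embedded YAML definitions and expect 12 Tier 1 and 14 Tier 2 providers.

```go
package main

import (
	"fmt"
	"regexp"
)

// Provider is a hypothetical in-memory form of one provider YAML file.
type Provider struct {
	Name     string
	Tier     int
	Keywords []string
	Patterns []string // regex sources; illustrative, not real provider formats
}

// checkRegistry verifies per-tier counts, that every provider has keywords
// for pre-filtering, and that every pattern compiles.
func checkRegistry(ps []Provider, wantPerTier map[int]int) error {
	got := map[int]int{}
	for _, p := range ps {
		got[p.Tier]++
		if len(p.Keywords) == 0 {
			return fmt.Errorf("%s: no keywords for pre-filtering", p.Name)
		}
		for _, src := range p.Patterns {
			if _, err := regexp.Compile(src); err != nil {
				return fmt.Errorf("%s: bad pattern %q: %w", p.Name, src, err)
			}
		}
	}
	for tier, want := range wantPerTier {
		if got[tier] != want {
			return fmt.Errorf("tier %d: got %d providers, want %d", tier, got[tier], want)
		}
	}
	return nil
}

func main() {
	ps := []Provider{
		{Name: "alpha", Tier: 1, Keywords: []string{"alpha"}, Patterns: []string{`al_[A-Za-z0-9]{20,}`}},
		{Name: "beta", Tier: 2, Keywords: []string{"beta"}, Patterns: []string{`bt_[a-f0-9]{32}`}},
	}
	// The roadmap's guardrail would pass map[int]int{1: 12, 2: 14} here.
	fmt.Println(checkRegistry(ps, map[int]int{1: 1, 2: 1}))
}
```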

Phase 3: Tier 3-9 Providers

Goal: All 108+ LLM provider definitions exist: specialized models, Chinese/regional providers, infrastructure gateways, emerging tools, code assistants, self-hosted runtimes, and enterprise platforms
Depends on: Phase 2
Requirements: PROV-03, PROV-04, PROV-05, PROV-06, PROV-07, PROV-08, PROV-09
Success Criteria (what must be TRUE):

  1. keyhunter providers stats shows 108+ total providers across all tiers
  2. Chinese/regional provider keys (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, etc.) are detected using keyword-based matching since they use generic key formats
  3. Self-hosted provider definitions (Ollama, vLLM, LocalAI, etc.) include patterns for API key authentication when applicable
  4. keyhunter providers list --tier=enterprise returns Salesforce, ServiceNow, SAP, Palantir, Databricks, Snowflake, Oracle, HPE providers

Plans: 8 plans

Plans:

  • 03-01-PLAN.md — Tier 4 Chinese/regional providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360 AI, Kuaishou)
  • 03-02-PLAN.md — Tier 3 Specialized (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney)
  • 03-03-PLAN.md — Tier 5 Infrastructure/Gateway (OpenRouter, LiteLLM, Cloudflare AI, Vercel AI, Portkey, Helicone, Martian, Kong, BricksAI, Aether, Not Diamond)
  • 03-04-PLAN.md — Tier 7 Code/Dev Tools (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
  • 03-05-PLAN.md — Tier 8 Self-Hosted runtimes (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan)
  • 03-06-PLAN.md — Tier 9 Enterprise (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
  • 03-07-PLAN.md — Tier 6 Emerging/Niche (Reka, Aleph Alpha, Lamini, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon)
  • 03-08-PLAN.md — Tier 3-9 guardrail test: lock 108 total providers, per-tier counts, and name sets

Phase 4: Input Sources

Goal: Users can point KeyHunter at any content source (local files, git history across all branches, piped content, remote URLs, and the clipboard), and all are scanned through the same detection pipeline
Depends on: Phase 2
Requirements: INPUT-01, INPUT-02, INPUT-03, INPUT-04, INPUT-05, INPUT-06
Success Criteria (what must be TRUE):

  1. keyhunter scan ./myrepo recursively scans all files with glob exclusion patterns (e.g., --exclude="*.min.js") and mmap is used for files above a configurable size threshold
  2. keyhunter scan --git ./myrepo scans full git history including all branches, tags, and stash entries; --since=2024-01-01 limits to commits after that date
  3. cat secrets.txt | keyhunter scan stdin detects keys from piped input
  4. keyhunter scan --url https://example.com/config.js fetches and scans the remote URL content
  5. keyhunter scan --clipboard scans the current clipboard content

Plans: 5 plans

Plans:

  • 04-01-PLAN.md — Wave 0: add go-git/v5, atotto/clipboard, golang.org/x/exp/mmap dependencies
  • 04-02-PLAN.md — DirSource: recursive walk, glob exclusion, binary skip, mmap for large files (INPUT-01, CORE-07)
  • 04-03-PLAN.md — GitSource: full-history scan across branches/tags with blob dedup and --since (INPUT-02)
  • 04-04-PLAN.md — StdinSource, URLSource, ClipboardSource (INPUT-03, INPUT-04, INPUT-05)
  • 04-05-PLAN.md — cmd/scan.go source-selection dispatch wiring all new sources (INPUT-06)

Phase 5: Verification Engine

Goal: Users can opt into active key verification with a consent prompt, legal documentation, and per-provider API calls that confirm whether a found key is live and return metadata about the key's access level
Depends on: Phase 2
Requirements: VRFY-01, VRFY-02, VRFY-03, VRFY-04, VRFY-05, VRFY-06
Success Criteria (what must be TRUE):

  1. keyhunter scan --verify triggers a one-time consent prompt on first use with clear legal language; user must type "yes" to proceed
  2. Each provider YAML's verify endpoint, method, headers, and success/failure codes are used for verification — no hardcoded verification logic
  3. keyhunter scan --verify extracts and displays org name, rate limit tier, and available permissions when the provider API returns them
  4. --verify-timeout=30s changes the per-key verification timeout from the default 10s
  5. A LEGAL.md file shipping with the binary documents the legal implications of using --verify

Plans: 5 plans

Plans:

  • 05-01-PLAN.md — Wave 0: extend VerifySpec schema, Finding struct, storage schema; add gjson dep
  • 05-02-PLAN.md — LEGAL.md + pkg/legal embed + consent prompt + keyhunter legal command
  • 05-03-PLAN.md — pkg/verify HTTPVerifier: template sub, gjson metadata extraction, ants pool
  • 05-04-PLAN.md — Update 12 Tier 1 provider YAMLs with extended verify specs + guardrail test
  • 05-05-PLAN.md — cmd/scan.go --verify wiring + --verify-timeout/workers flags + output verify column

Phase 6: Output, Reporting & Key Management

Goal: Users can consume scan results in any format they need and perform full lifecycle management of stored keys: listing, inspecting, exporting, copying, and deleting
Depends on: Phase 5
Requirements: OUT-01, OUT-02, OUT-03, OUT-04, OUT-05, OUT-06, KEYS-01, KEYS-02, KEYS-03, KEYS-04, KEYS-05, KEYS-06
Success Criteria (what must be TRUE):

  1. Default table output shows colored, masked keys (first 8 + last 4 chars); --unmask reveals full key values; --output=json|sarif|csv switches format
  2. Exit code is 0 when no keys found, 1 when keys are found, 2 on scan error — confirming CI/CD compatibility
  3. keyhunter keys list shows all stored keys masked; keyhunter keys show <id> shows full unmasked detail
  4. keyhunter keys export --format=json produces a JSON file with full key values; --format=csv produces a CSV
  5. keyhunter keys copy <id> copies the full key to clipboard; keyhunter keys delete <id> removes the key from the database

Plans: 6 plans

Plans:

  • 06-01-PLAN.md — Wave 0: Formatter interface, colors.go (TTY/NO_COLOR), refactor TableFormatter
  • 06-02-PLAN.md — JSONFormatter + CSVFormatter (full Finding fields, Unmask option)
  • 06-03-PLAN.md — SARIF 2.1.0 formatter with custom structs (rule dedup, level mapping)
  • 06-04-PLAN.md — pkg/storage/queries.go: Filters, ListFindingsFiltered, GetFinding, DeleteFinding
  • 06-05-PLAN.md — cmd/keys.go command tree: list/show/export/copy/delete/verify (KEYS-01..06)
  • 06-06-PLAN.md — scan --output registry dispatch + exit codes 0/1/2 (OUT-05, OUT-06)

Phase 7: Import Adapters & CI/CD Integration

Goal: Users can import findings from TruffleHog and Gitleaks into KeyHunter's database, and use KeyHunter in pre-commit hooks and CI/CD pipelines with SARIF output uploadable to GitHub Security
Depends on: Phase 6
Requirements: IMP-01, IMP-02, IMP-03, CICD-01, CICD-02
Success Criteria (what must be TRUE):

  1. keyhunter import --format=trufflehog results.json parses TruffleHog v3 JSON output and normalizes findings into the KeyHunter database
  2. keyhunter import --format=gitleaks results.json and --format=csv both import and deduplicate against existing findings
  3. keyhunter hook install installs a git pre-commit hook; running git commit on a file with a known API key blocks the commit and prints findings
  4. keyhunter scan --output=sarif produces a valid SARIF 2.1.0 file that GitHub Code Scanning accepts without errors

Plans: 6 plans

Plans:

  • 07-01-PLAN.md — pkg/importer Importer interface + TruffleHog v3 JSON parser + fixtures (IMP-01)
  • 07-02-PLAN.md — Gitleaks JSON + CSV parsers (IMP-02)
  • 07-03-PLAN.md — Dedup helper + SARIF GitHub Code Scanning validation test (IMP-03, CICD-02)
  • 07-04-PLAN.md — cmd/import.go wiring format dispatch, dedup, DB persistence (IMP-01/02/03)
  • 07-05-PLAN.md — cmd/hook.go install/uninstall with embedded pre-commit script (CICD-01)
  • 07-06-PLAN.md — docs/CI-CD.md + README CI/CD section with GitHub Actions workflow (CICD-01, CICD-02)
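
Deduplicating imported findings against existing ones needs a stable identity for each finding. The sketch below assumes that identity is a SHA-256 fingerprint over the provider name and key value; the `Finding` fields and that choice of fingerprint are illustrative, not taken from the project.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a hypothetical minimal finding record.
type Finding struct {
	Provider string
	Key      string
	Source   string // where it was seen; deliberately not part of the identity
}

// fingerprint hashes provider and key so the same key reported by two tools,
// or found in two places, collapses to one identity.
func fingerprint(f Finding) string {
	sum := sha256.Sum256([]byte(f.Provider + "\x00" + f.Key))
	return hex.EncodeToString(sum[:])
}

// dedup returns the imported findings not already present in existing,
// also removing duplicates within the import itself.
func dedup(existing, imported []Finding) []Finding {
	seen := make(map[string]struct{}, len(existing))
	for _, f := range existing {
		seen[fingerprint(f)] = struct{}{}
	}
	var fresh []Finding
	for _, f := range imported {
		fp := fingerprint(f)
		if _, dup := seen[fp]; dup {
			continue
		}
		seen[fp] = struct{}{}
		fresh = append(fresh, f)
	}
	return fresh
}

func main() {
	existing := []Finding{{Provider: "example", Key: "k1", Source: "scan"}}
	imported := []Finding{
		{Provider: "example", Key: "k1", Source: "trufflehog"},
		{Provider: "example", Key: "k2", Source: "gitleaks"},
		{Provider: "example", Key: "k2", Source: "csv"},
	}
	fmt.Println(len(dedup(existing, imported)))
}
```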

Phase 8: Dork Engine

Goal: Users can run, manage, and extend a library of 150+ built-in YAML dorks across GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, and Bing, using the same extensibility pattern as provider definitions
Depends on: Phase 7
Requirements: DORK-01, DORK-02, DORK-03, DORK-04
Success Criteria (what must be TRUE):

  1. keyhunter dorks list shows 150+ built-in dorks with source engine and category columns
  2. keyhunter dorks run --source=github --category=frontier executes all Tier 1 frontier provider dorks against GitHub code search
  3. keyhunter dorks add --source=google --query='site:pastebin.com "sk-ant-api03-"' persists a custom dork that appears in subsequent dorks list output
  4. keyhunter dorks export --format=json exports all dorks including custom additions

Plans: 7 plans

Plans:

  • 08-01-PLAN.md — Dork schema, go:embed loader, registry, executor interface, custom_dorks storage table
  • 08-02-PLAN.md — 50 GitHub dork YAML definitions across 5 categories
  • 08-03-PLAN.md — 30 Google + 20 Shodan dork YAML definitions
  • 08-04-PLAN.md — 15 Censys + 10 ZoomEye + 10 FOFA + 10 GitLab + 5 Bing dork YAML definitions
  • 08-05-PLAN.md — Live GitHub Code Search executor (net/http, Retry-After, limit cap)
  • 08-06-PLAN.md — cmd/dorks.go Cobra tree: list/run/add/export/info/delete
  • 08-07-PLAN.md — Dork count guardrail test (>=150 total, per-source minimums, ID uniqueness)
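
The per-source counts in plans 08-02 through 08-04 add up to exactly the 150-dork floor (50 + 30 + 20 + 15 + 10 + 10 + 10 + 5). A count guardrail in the spirit of 08-07 could be sketched as below; the function name and the lowercase source keys are hypothetical, and the ID-uniqueness check is left out.

```go
package main

import "fmt"

// plannedCounts copies the per-source dork counts from plans 08-02 to 08-04.
var plannedCounts = map[string]int{
	"github": 50, "google": 30, "shodan": 20, "censys": 15,
	"zoomeye": 10, "fofa": 10, "gitlab": 10, "bing": 5,
}

// checkDorkCounts fails if any source is below its minimum or if the
// overall total is below minTotal.
func checkDorkCounts(loaded, minimums map[string]int, minTotal int) error {
	total := 0
	for _, n := range loaded {
		total += n
	}
	for source, floor := range minimums {
		if loaded[source] < floor {
			return fmt.Errorf("%s: %d dorks loaded, need at least %d", source, loaded[source], floor)
		}
	}
	if total < minTotal {
		return fmt.Errorf("%d dorks loaded, need at least %d", total, minTotal)
	}
	return nil
}

func main() {
	// Here the planned counts serve as both the loaded set and the minimums.
	fmt.Println(checkDorkCounts(plannedCounts, plannedCounts, 150))
}
```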

Phase 9: OSINT Infrastructure

Goal: The recon engine's ReconSource interface, per-source rate limiter architecture, stealth mode, and parallel sweep orchestrator exist and are validated; all individual source modules build on this foundation
Depends on: Phase 8
Requirements: RECON-INFRA-05, RECON-INFRA-06, RECON-INFRA-07, RECON-INFRA-08
Success Criteria (what must be TRUE):

  1. Every recon source module holds its own rate.Limiter instance — no centralized rate limiter — and the ReconSource interface enforces a RateLimit() rate.Limit method
  2. keyhunter recon full --stealth applies user-agent rotation and jitter delays to all sources; log output shows "source exhausted" events rather than silently returning empty results
  3. keyhunter recon full --respect-robots (default on) respects robots.txt for web-scraping sources before making any requests
  4. keyhunter recon full fans out to all enabled sources in parallel and deduplicates findings before persisting to the database

Plans: 6 plans

Plans:

  • 09-01-PLAN.md — ReconSource interface + Engine skeleton + ExampleSource stub
  • 09-02-PLAN.md — LimiterRegistry per-source rate.Limiter + jitter
  • 09-03-PLAN.md — Stealth UA pool + cross-source dedup
  • 09-04-PLAN.md — robots.txt parser with 1h per-host cache
  • 09-05-PLAN.md — cmd/recon.go CLI tree (full, list)
  • 09-06-PLAN.md — Integration test + phase summary

Phase 10: OSINT Code Hosting

Goal: Users can scan 10 code hosting platforms (GitHub, GitLab, Bitbucket, GitHub Gist, Codeberg/Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, and miscellaneous code sandbox sites) for leaked LLM API keys
Depends on: Phase 9
Requirements: RECON-CODE-01, RECON-CODE-02, RECON-CODE-03, RECON-CODE-04, RECON-CODE-05, RECON-CODE-06, RECON-CODE-07, RECON-CODE-08, RECON-CODE-09, RECON-CODE-10
Success Criteria (what must be TRUE):

  1. keyhunter recon --sources=github,gitlab executes provider keyword dorks against GitHub and GitLab code search APIs and feeds results into the detection pipeline
  2. keyhunter recon --sources=huggingface scans HuggingFace Spaces and model repos for exposed keys
  3. keyhunter recon --sources=gist,bitbucket,codeberg scans public gists, Bitbucket repos, and Codeberg/Gitea instances
  4. keyhunter recon --sources=replit,codesandbox,kaggle scans public repls, sandboxes, and notebooks
  5. All code hosting source findings are stored in the database with source attribution and deduplication

Plans: TBD

Phase 11: OSINT Search & Paste

Goal: Users can run automated search engine dorking against Google, Bing, DuckDuckGo, Yandex, and Brave, and scan 15+ paste sites for leaked API keys
Depends on: Phase 9
Requirements: RECON-DORK-01, RECON-DORK-02, RECON-DORK-03, RECON-PASTE-01
Success Criteria (what must be TRUE):

  1. keyhunter recon --sources=google runs built-in dorks via Google Custom Search API or SerpAPI and returns results with the dork query that triggered each finding
  2. keyhunter recon --sources=bing executes dorks via Azure Cognitive Services and --sources=duckduckgo,yandex,brave via their respective integrations
  3. keyhunter recon --sources=paste queries Pastebin API and scrapes 15+ additional paste sites, feeding raw content through the detection pipeline

Plans: TBD

Phase 12: OSINT IoT & Cloud Storage

Goal: Users can discover exposed LLM endpoints via IoT scanners (Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge) and scan publicly accessible cloud storage buckets (S3, GCS, Azure Blob, MinIO, GrayHatWarfare) for leaked keys
Depends on: Phase 9
Requirements: RECON-IOT-01, RECON-IOT-02, RECON-IOT-03, RECON-IOT-04, RECON-IOT-05, RECON-IOT-06, RECON-CLOUD-01, RECON-CLOUD-02, RECON-CLOUD-03, RECON-CLOUD-04
Success Criteria (what must be TRUE):

  1. keyhunter recon --sources=shodan searches Shodan for exposed vLLM, Ollama, and LiteLLM proxy endpoints using the user's API key
  2. keyhunter recon --sources=censys,zoomeye,fofa,netlas,binaryedge each execute IoT searches with appropriate query formats per platform
  3. keyhunter recon --sources=s3 enumerates publicly accessible S3 buckets and scans readable objects for API key patterns
  4. keyhunter recon --sources=gcs,azureblob,spaces scans GCS, Azure Blob, and DigitalOcean Spaces; --sources=minio discovers MinIO instances via Shodan integration
  5. keyhunter recon --sources=grayhatwarfare queries the GrayHatWarfare bucket search engine for matching bucket names

Plans: TBD

Phase 13: OSINT Package Registries & Container/IaC

Goal: Users can scan npm, PyPI, and 6 other package registries for packages containing leaked keys, and scan Docker Hub image layers, Kubernetes configs, Terraform state files, Helm charts, and Ansible Galaxy for secrets in infrastructure code
Depends on: Phase 9
Requirements: RECON-PKG-01, RECON-PKG-02, RECON-PKG-03, RECON-INFRA-01, RECON-INFRA-02, RECON-INFRA-03, RECON-INFRA-04
Success Criteria (what must be TRUE):

  1. keyhunter recon --sources=npm downloads and extracts package tarballs for recently published packages matching LLM-related keywords and scans their contents
  2. keyhunter recon --sources=pypi,rubygems,crates,maven,nuget,packagist,goproxy scans respective registries using the same download-extract-scan pattern
  3. keyhunter recon --sources=dockerhub extracts and scans image layers and build args from public Docker Hub images
  4. keyhunter recon --sources=k8s discovers publicly exposed Kubernetes dashboards and scans publicly readable Secret/ConfigMap objects
  5. keyhunter recon --sources=terraform,helm,ansible scans Terraform registry modules, Helm chart repositories, and Ansible Galaxy roles

Plans: TBD

Phase 14: OSINT CI/CD Logs, Web Archives & Frontend Leaks

Goal: Users can scan public CI/CD build logs, historical web snapshots from the Wayback Machine and CommonCrawl, and frontend JavaScript artifacts (source maps, webpack bundles, exposed .env files) for leaked API keys
Depends on: Phase 9
Requirements: RECON-CI-01, RECON-CI-02, RECON-CI-03, RECON-CI-04, RECON-ARCH-01, RECON-ARCH-02, RECON-JS-01, RECON-JS-02, RECON-JS-03, RECON-JS-04, RECON-JS-05
Success Criteria (what must be TRUE):

  1. keyhunter recon --sources=github-actions scans public GitHub Actions workflow run logs for leaked keys in build output
  2. keyhunter recon --sources=travis,circleci,jenkins,gitlab-ci scans public build logs from each CI platform
  3. keyhunter recon --sources=wayback queries the CDX API for historical snapshots of target domains and scans retrieved content
  4. keyhunter recon --sources=commoncrawl searches CommonCrawl indexes for pages matching LLM provider keywords and scans WARC records
  5. keyhunter recon --sources=sourcemaps,webpack,dotenv,swagger,deploypreview each extract and scan the relevant JS artifacts and configuration files

Plans: TBD

Phase 15: OSINT Forums, Collaboration & Log Aggregators

Goal: Users can search developer forums, public collaboration tool pages, and exposed monitoring dashboards for leaked API keys, covering Stack Overflow, Reddit, HackerNews, dev.to, Telegram channels, Discord, Notion, Confluence, Trello, Google Docs, Elasticsearch, Grafana, and Sentry
Depends on: Phase 9
Requirements: RECON-FORUM-01, RECON-FORUM-02, RECON-FORUM-03, RECON-FORUM-04, RECON-FORUM-05, RECON-FORUM-06, RECON-COLLAB-01, RECON-COLLAB-02, RECON-COLLAB-03, RECON-COLLAB-04, RECON-LOG-01, RECON-LOG-02, RECON-LOG-03
Success Criteria (what must be TRUE):

  1. keyhunter recon --sources=stackoverflow,reddit,hackernews queries each platform's API/Algolia search for LLM provider keywords and scans result content
  2. keyhunter recon --sources=devto,medium,telegram,discord scans publicly accessible posts, articles, and indexed channel content
  3. keyhunter recon --sources=notion,confluence,trello,googledocs scans publicly accessible pages via dorking and direct API access where available
  4. keyhunter recon --sources=elasticsearch,grafana,sentry discovers exposed instances and scans accessible log data and dashboards

Plans: TBD

Phase 16: OSINT Threat Intel, Mobile, DNS & API Marketplaces

Goal: Users can search threat intelligence platforms, scan decompiled Android APKs, perform DNS/subdomain discovery for config endpoint probing, and scan Postman/SwaggerHub API collections for leaked LLM keys
Depends on: Phase 9
Requirements: RECON-INTEL-01, RECON-INTEL-02, RECON-INTEL-03, RECON-MOBILE-01, RECON-DNS-01, RECON-DNS-02, RECON-API-01, RECON-API-02
Success Criteria (what must be TRUE):

  1. keyhunter recon --sources=virustotal,intelx,urlhaus queries each threat intelligence platform for files and URLs containing LLM provider keywords
  2. keyhunter recon --sources=apk --target=com.example.app downloads, decompiles (via apktool/jadx), and scans APK content for API keys
  3. keyhunter recon --sources=crtsh --target=example.com discovers subdomains via Certificate Transparency logs and probes each for .env, /api/config, and /actuator/env endpoints
  4. keyhunter recon --sources=postman,swaggerhub scans public Postman collections and SwaggerHub API definitions for hardcoded keys in request examples

Plans: TBD

Phase 17: Telegram Bot & Scheduled Scanning

Goal: Users can control KeyHunter remotely via a Telegram bot with scan, verify, recon, status, and subscription commands, and set up cron-based recurring scans that auto-notify on new findings
Depends on: Phase 16
Requirements: TELE-01, TELE-02, TELE-03, TELE-04, TELE-05, TELE-06, TELE-07, SCHED-01, SCHED-02, SCHED-03
Success Criteria (what must be TRUE):

  1. keyhunter serve --telegram starts the bot; /scan ./myrepo in a private Telegram chat triggers a scan and returns findings (masked keys only, never unmasked)
  2. /verify <key-id>, /recon --sources=github, /status, /stats, /providers, and /help all respond correctly in private chat
  3. /subscribe enables auto-notifications; new key findings from any scan trigger an immediate Telegram message to all subscribed users
  4. /key <id> sends full key detail to the requesting user's private chat only
  5. keyhunter schedule add --cron="0 */6 * * *" --scan=./myrepo adds a recurring scan; keyhunter schedule list shows it; the job persists across restarts and sends Telegram notifications on new findings

Plans: TBD

Phase 18: Web Dashboard

Goal: Users can manage and interact with all KeyHunter capabilities through an embedded web dashboard (viewing scans, managing keys, launching recon, browsing providers, managing dorks, and configuring settings) with live scan progress via SSE
Depends on: Phase 17
Requirements: WEB-01, WEB-02, WEB-03, WEB-04, WEB-05, WEB-06, WEB-07, WEB-08, WEB-09, WEB-10, WEB-11
Success Criteria (what must be TRUE):

  1. keyhunter serve starts an embedded HTTP server with the full dashboard accessible in a browser; the binary contains all HTML, CSS, and assets via go:embed
  2. The dashboard overview page shows total keys found, scan count, and active providers as summary statistics
  3. The keys page lists all findings with masked values and a "Reveal Key" toggle that shows the full key on demand
  4. The recon page allows launching a recon sweep with source selection and shows live progress via Server-Sent Events
  5. The REST API at /api/v1/* accepts and returns JSON for all dashboard actions; optional basic auth or token auth is configurable via settings page

Plans: TBD
UI hint: yes

Progress

Execution Order: Phases execute in numeric order: 1 → 2 → 3 → ... → 18

| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Foundation | 0/5 | Planning complete | - |
| 2. Tier 1-2 Providers | 0/? | Not started | - |
| 3. Tier 3-9 Providers | 0/? | Not started | - |
| 4. Input Sources | 0/? | Not started | - |
| 5. Verification Engine | 0/? | Not started | - |
| 6. Output, Reporting & Key Management | 0/? | Not started | - |
| 7. Import Adapters & CI/CD Integration | 0/? | Not started | - |
| 8. Dork Engine | 0/? | Not started | - |
| 9. OSINT Infrastructure | 2/6 | In Progress | - |
| 10. OSINT Code Hosting | 0/? | Not started | - |
| 11. OSINT Search & Paste | 0/? | Not started | - |
| 12. OSINT IoT & Cloud Storage | 0/? | Not started | - |
| 13. OSINT Package Registries & Container/IaC | 0/? | Not started | - |
| 14. OSINT CI/CD Logs, Web Archives & Frontend Leaks | 0/? | Not started | - |
| 15. OSINT Forums, Collaboration & Log Aggregators | 0/? | Not started | - |
| 16. OSINT Threat Intel, Mobile, DNS & API Marketplaces | 0/? | Not started | - |
| 17. Telegram Bot & Scheduled Scanning | 0/? | Not started | - |
| 18. Web Dashboard | 0/? | Not started | - |