salvacybersec 0e87618e32 docs(phase-16): complete threat intel, mobile, DNS, API marketplaces

2026-04-06 16:48:35 +03:00

Roadmap: KeyHunter

Overview

KeyHunter is built in dependency order. The provider registry and storage schema come first because every other subsystem depends on them. Next come the scanning engine pipeline, the full provider library, input sources and verification, and output and key management. The first competitive differentiators follow (dork engine, import adapters, CI/CD integration), then the OSINT/recon engine (infrastructure architecture before individual sources), then automation and notification (Telegram bot, scheduler), and finally the web dashboard, which aggregates all subsystems. Each phase delivers a complete, verifiable capability before the next phase begins.

Phases

Phase Numbering:

Integer phases (1, 2, 3): Planned milestone work
Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

Phase 1: Foundation - Provider registry schema, storage layer with AES-256, and CLI skeleton that everything else depends on (completed 2026-04-05)
Phase 2: Tier 1-2 Providers - Frontier and inference platform provider YAML definitions (26 providers, highest-value targets) (completed 2026-04-05)
Phase 3: Tier 3-9 Providers - Remaining 82 provider definitions completing 108+ provider coverage (completed 2026-04-05)
Phase 4: Input Sources - All input modes: file/dir, full git history, stdin, URL, clipboard
Phase 5: Verification Engine - Opt-in active key verification with consent prompt and legal documentation
Phase 6: Output, Reporting & Key Management - All output formats and complete key management CLI
Phase 7: Import Adapters & CI/CD Integration - TruffleHog/Gitleaks import + pre-commit hooks + SARIF to GitHub Security
Phase 8: Dork Engine - YAML-based dork definitions with 150+ built-in dorks and management commands
Phase 9: OSINT Infrastructure - Per-source rate limiter architecture and recon engine framework before any sources
Phase 10: OSINT Code Hosting - GitHub, GitLab, Bitbucket, HuggingFace and 6 more code hosting sources (completed 2026-04-05)
Phase 11: OSINT Search & Paste - Search engine dorking and paste site aggregation (completed 2026-04-06)
Phase 12: OSINT IoT & Cloud Storage - Shodan/Censys/ZoomEye/FOFA and S3/GCS/Azure cloud storage scanning (completed 2026-04-06)
Phase 13: OSINT Package Registries & Container/IaC - npm/PyPI/crates.io and Docker Hub/K8s/Terraform scanning (completed 2026-04-06)
Phase 14: OSINT CI/CD Logs, Web Archives & Frontend Leaks - Build logs, Wayback Machine, and JS bundle/env scanning (completed 2026-04-06)
Phase 15: OSINT Forums, Collaboration & Log Aggregators - StackOverflow/Reddit/HN, Notion/Trello, Elasticsearch/Grafana/Sentry (completed 2026-04-06)
Phase 16: OSINT Threat Intel, Mobile, DNS & API Marketplaces - VirusTotal/IntelX, APK scanning, crt.sh, Postman/SwaggerHub (completed 2026-04-06)
Phase 17: Telegram Bot & Scheduled Scanning - Remote control bot and cron-based recurring scans with auto-notify
Phase 18: Web Dashboard - Embedded htmx + Tailwind dashboard aggregating all subsystems with SSE live updates

Phase Details

Phase 1: Foundation

Goal: The provider registry schema, encrypted storage layer, and CLI skeleton exist and function correctly, so all downstream subsystems have stable interfaces to build against
Depends on: Nothing (first phase)
Requirements: CORE-01, CORE-02, CORE-03, CORE-04, CORE-05, CORE-06, CORE-07, STOR-01, STOR-02, STOR-03, CLI-01, CLI-02, CLI-03, CLI-04, CLI-05, PROV-10
Success Criteria (what must be TRUE):

keyhunter scan ./somefile runs the three-stage pipeline (Aho-Corasick pre-filter → regex → entropy) and returns findings with provider names
Findings are persisted to a SQLite database with the key value stored AES-256 encrypted — plaintext key never appears in the database file
keyhunter config init creates ~/.keyhunter.yaml and keyhunter config set <key> <value> persists values
keyhunter providers list and keyhunter providers info <name> return provider metadata from YAML definitions
Provider YAML schema includes format_version and last_verified fields validated at load time

Plans: 5 plans

Plans:

01-01-PLAN.md — Go module init, dependency installation, test scaffolding and testdata fixtures
01-02-PLAN.md — Provider registry: YAML schema, embed loader, Aho-Corasick automaton, Registry struct
01-03-PLAN.md — Storage layer: AES-256-GCM encryption, Argon2id key derivation, SQLite + Finding CRUD
01-04-PLAN.md — Scan engine pipeline: keyword pre-filter, regex+entropy detector, FileSource, ants worker pool
01-05-PLAN.md — CLI wiring: scan, providers list/info/stats, config init/set/get, output table
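
The three-stage pipeline named in the success criteria can be sketched as follows. This is a minimal illustration, not KeyHunter's implementation: the `provider` struct, the `sk-test-` prefix, and the entropy threshold are all hypothetical, and a plain `strings.Contains` loop stands in for the Aho-Corasick automaton to keep the sketch dependency-free.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

// provider is a hypothetical, simplified stand-in for one YAML provider definition.
type provider struct {
	name       string
	keywords   []string       // stage 1: cheap context check
	pattern    *regexp.Regexp // stage 2: key format
	minEntropy float64        // stage 3: bits per byte required to report
}

// shannonEntropy returns the Shannon entropy of s in bits per byte.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c > 0 {
			p := float64(c) / n
			h -= p * math.Log2(p)
		}
	}
	return h
}

// scan returns the provider name for every candidate that survives all three stages.
func scan(content string, providers []provider) []string {
	var found []string
	for _, p := range providers {
		// Stage 1: skip the provider if none of its keywords occur in the content.
		hit := false
		for _, kw := range p.keywords {
			if strings.Contains(content, kw) {
				hit = true
				break
			}
		}
		if !hit {
			continue
		}
		// Stage 2: regex match. Stage 3: drop low-entropy (placeholder-like) matches.
		for _, m := range p.pattern.FindAllString(content, -1) {
			if shannonEntropy(m) >= p.minEntropy {
				found = append(found, p.name)
			}
		}
	}
	return found
}

// demoProviders returns one made-up provider; the prefix is not a real key format.
func demoProviders() []provider {
	return []provider{{
		name:       "example",
		keywords:   []string{"sk-test-"},
		pattern:    regexp.MustCompile(`sk-test-[A-Za-z0-9]{20}`),
		minEntropy: 3.0,
	}}
}

func main() {
	fmt.Println(scan("key=sk-test-aB3dE5gH7jK9mN1pQ3sT", demoProviders())) // random-looking key
	fmt.Println(scan("key=sk-test-aaaaaaaaaaaaaaaaaaaa", demoProviders())) // placeholder-like key
}
```

Ordering the stages from cheapest to most expensive means most files are rejected before any regex runs.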

Phase 2: Tier 1-2 Providers

Goal: The 26 highest-value LLM provider YAML definitions exist with accurate regex patterns, keyword lists, confidence levels, and verify endpoints, covering OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI and all major inference platforms
Depends on: Phase 1
Requirements: PROV-01, PROV-02
Success Criteria (what must be TRUE):

keyhunter scan correctly identifies keys from all 12 Tier 1 providers (OpenAI sk-proj-, Anthropic sk-ant-api03-, Google AIza, etc.) with correct provider names
keyhunter scan correctly identifies keys from all 14 Tier 2 inference platform providers (Groq gsk_, Replicate r8_, Fireworks fw_, Perplexity pplx-, etc.)
Each provider YAML includes a keywords list that enables Aho-Corasick pre-filtering to skip files with no matching context
keyhunter providers stats shows 26 providers loaded with pattern and keyword counts

Plans: 5 plans

Plans:

02-01-PLAN.md — Tier 1 high-confidence prefixed providers (OpenAI upgrade, Anthropic upgrade, Google AI, Vertex AI, AWS Bedrock, xAI)
02-02-PLAN.md — Tier 1 keyword-anchored providers (Azure OpenAI, Meta AI, Cohere, Mistral, Inflection, AI21)
02-03-PLAN.md — Tier 2 inference platforms batch 1 (Groq, Replicate, Anyscale, Together, Fireworks, Baseten, DeepInfra)
02-04-PLAN.md — Tier 2 inference platforms batch 2 (Lepton, Modal, Cerebrium, Novita, SambaNova, OctoAI, Friendli)
02-05-PLAN.md — Registry guardrail test: assert 12 Tier 1 + 14 Tier 2 + regex compilation

Phase 3: Tier 3-9 Providers

Goal: All 108+ LLM provider definitions exist: specialized models, Chinese/regional providers, infrastructure gateways, emerging tools, code assistants, self-hosted runtimes, and enterprise platforms
Depends on: Phase 2
Requirements: PROV-03, PROV-04, PROV-05, PROV-06, PROV-07, PROV-08, PROV-09
Success Criteria (what must be TRUE):

keyhunter providers stats shows 108+ total providers across all tiers
Chinese/regional provider keys (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, etc.) are detected using keyword-based matching since they use generic key formats
Self-hosted provider definitions (Ollama, vLLM, LocalAI, etc.) include patterns for API key authentication when applicable
keyhunter providers list --tier=enterprise returns Salesforce, ServiceNow, SAP, Palantir, Databricks, Snowflake, Oracle, HPE providers

Plans: 8 plans

Plans:

03-01-PLAN.md — Tier 4 Chinese/regional providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360 AI, Kuaishou)
03-02-PLAN.md — Tier 3 Specialized (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney)
03-03-PLAN.md — Tier 5 Infrastructure/Gateway (OpenRouter, LiteLLM, Cloudflare AI, Vercel AI, Portkey, Helicone, Martian, Kong, BricksAI, Aether, Not Diamond)
03-04-PLAN.md — Tier 7 Code/Dev Tools (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
03-05-PLAN.md — Tier 8 Self-Hosted runtimes (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan)
03-06-PLAN.md — Tier 9 Enterprise (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
03-07-PLAN.md — Tier 6 Emerging/Niche (Reka, Aleph Alpha, Lamini, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon)
03-08-PLAN.md — Tier 3-9 guardrail test: lock 108 total providers, per-tier counts, and name sets

Phase 4: Input Sources

Goal: Users can point KeyHunter at any content source (local files, git history across all branches, piped content, remote URLs, and the clipboard), and all are scanned through the same detection pipeline
Depends on: Phase 2
Requirements: INPUT-01, INPUT-02, INPUT-03, INPUT-04, INPUT-05, INPUT-06
Success Criteria (what must be TRUE):

keyhunter scan ./myrepo recursively scans all files with glob exclusion patterns (e.g., --exclude="*.min.js") and mmap is used for files above a configurable size threshold
keyhunter scan --git ./myrepo scans full git history including all branches, tags, and stash entries; --since=2024-01-01 limits to commits after that date
cat secrets.txt | keyhunter scan stdin detects keys from piped input
keyhunter scan --url https://example.com/config.js fetches and scans the remote URL content
keyhunter scan --clipboard scans the current clipboard content

Plans: 5 plans

Plans:

04-01-PLAN.md — Wave 0: add go-git/v5, atotto/clipboard, golang.org/x/exp/mmap dependencies
04-02-PLAN.md — DirSource: recursive walk, glob exclusion, binary skip, mmap for large files (INPUT-01, CORE-07)
04-03-PLAN.md — GitSource: full-history scan across branches/tags with blob dedup and --since (INPUT-02)
04-04-PLAN.md — StdinSource, URLSource, ClipboardSource (INPUT-03, INPUT-04, INPUT-05)
04-05-PLAN.md — cmd/scan.go source-selection dispatch wiring all new sources (INPUT-06)
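
The glob exclusion and binary skip described for DirSource could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, patterns are matched against the file's base name only, and the NUL-byte heuristic is one common way to detect binary content, not necessarily the one KeyHunter uses.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
)

// excluded reports whether the base name of path matches any exclusion glob.
func excluded(path string, patterns []string) bool {
	base := filepath.Base(path)
	for _, pat := range patterns {
		// A malformed pattern returns an error and is treated as "no match" here.
		if ok, err := filepath.Match(pat, base); err == nil && ok {
			return true
		}
	}
	return false
}

// looksBinary treats content with a NUL byte in its first 512 bytes as binary.
func looksBinary(content []byte) bool {
	head := content
	if len(head) > 512 {
		head = head[:512]
	}
	return bytes.IndexByte(head, 0) >= 0
}

func main() {
	patterns := []string{"*.min.js"}
	fmt.Println(excluded("web/app.min.js", patterns))
	fmt.Println(excluded("web/app.js", patterns))
	fmt.Println(looksBinary([]byte{'P', 'K', 0x03, 0x04, 0x00}))
}
```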

Phase 5: Verification Engine

Goal: Users can opt into active key verification with a consent prompt, legal documentation, and per-provider API calls that confirm whether a found key is live and return metadata about the key's access level
Depends on: Phase 2
Requirements: VRFY-01, VRFY-02, VRFY-03, VRFY-04, VRFY-05, VRFY-06
Success Criteria (what must be TRUE):

keyhunter scan --verify triggers a one-time consent prompt on first use with clear legal language; user must type "yes" to proceed
Each provider YAML's verify endpoint, method, headers, and success/failure codes are used for verification — no hardcoded verification logic
keyhunter scan --verify extracts and displays org name, rate limit tier, and available permissions when the provider API returns them
--verify-timeout=30s changes the per-key verification timeout from the default 10s
A LEGAL.md file shipping with the binary documents the legal implications of using --verify

Plans: 5 plans

Plans:

05-01-PLAN.md — Wave 0: extend VerifySpec schema, Finding struct, storage schema; add gjson dep
05-02-PLAN.md — LEGAL.md + pkg/legal embed + consent prompt + keyhunter legal command
05-03-PLAN.md — pkg/verify HTTPVerifier: template sub, gjson metadata extraction, ants pool
05-04-PLAN.md — Update 12 Tier 1 provider YAMLs with extended verify specs + guardrail test
05-05-PLAN.md — cmd/scan.go --verify wiring + --verify-timeout/workers flags + output verify column
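
The requirement that verification be driven entirely by the provider definition, with no hardcoded logic, can be illustrated with a small data-driven verifier. Everything here is an assumption for illustration: the `verifySpec` field names, the `{{KEY}}` placeholder syntax, and the status labels. A local `httptest` server stands in for a real provider API so that no external request is made.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// defaultTimeout mirrors the 10s per-key default mentioned in the success criteria.
const defaultTimeout = 10 * time.Second

// verifySpec is a hypothetical mirror of a provider YAML verify section.
type verifySpec struct {
	Method       string
	URL          string
	Headers      map[string]string // values may contain the assumed {{KEY}} placeholder
	SuccessCodes []int
	FailureCodes []int
}

func contains(codes []int, code int) bool {
	for _, c := range codes {
		if c == code {
			return true
		}
	}
	return false
}

// verify sends the request described by spec and classifies the response code.
func verify(spec verifySpec, key string, timeout time.Duration) (string, error) {
	req, err := http.NewRequest(spec.Method, spec.URL, nil)
	if err != nil {
		return "", err
	}
	for name, tmpl := range spec.Headers {
		req.Header.Set(name, strings.ReplaceAll(tmpl, "{{KEY}}", key))
	}
	client := &http.Client{Timeout: timeout}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	switch {
	case contains(spec.SuccessCodes, resp.StatusCode):
		return "live", nil
	case contains(spec.FailureCodes, resp.StatusCode):
		return "dead", nil
	default:
		return "unknown", nil
	}
}

// newFakeProvider starts a local test server that accepts only "good-key".
func newFakeProvider() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "Bearer good-key" {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusUnauthorized)
	}))
}

func main() {
	srv := newFakeProvider()
	defer srv.Close()
	spec := verifySpec{
		Method:       "GET",
		URL:          srv.URL,
		Headers:      map[string]string{"Authorization": "Bearer {{KEY}}"},
		SuccessCodes: []int{200},
		FailureCodes: []int{401, 403},
	}
	for _, key := range []string{"good-key", "bad-key"} {
		status, err := verify(spec, key, defaultTimeout)
		fmt.Println(key, status, err)
	}
}
```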

Phase 6: Output, Reporting & Key Management

Goal: Users can consume scan results in any format they need and perform full lifecycle management of stored keys: listing, inspecting, exporting, copying, and deleting
Depends on: Phase 5
Requirements: OUT-01, OUT-02, OUT-03, OUT-04, OUT-05, OUT-06, KEYS-01, KEYS-02, KEYS-03, KEYS-04, KEYS-05, KEYS-06
Success Criteria (what must be TRUE):

Default table output shows colored, masked keys (first 8 + last 4 chars); --unmask reveals full key values; --output=json|sarif|csv switches format
Exit code is 0 when no keys found, 1 when keys are found, 2 on scan error — confirming CI/CD compatibility
keyhunter keys list shows all stored keys masked; keyhunter keys show <id> shows full unmasked detail
keyhunter keys export --format=json produces a JSON file with full key values; --format=csv produces a CSV
keyhunter keys copy <id> copies the full key to clipboard; keyhunter keys delete <id> removes the key from the database

Plans: 6 plans

Plans:

06-01-PLAN.md — Wave 0: Formatter interface, colors.go (TTY/NO_COLOR), refactor TableFormatter
06-02-PLAN.md — JSONFormatter + CSVFormatter (full Finding fields, Unmask option)
06-03-PLAN.md — SARIF 2.1.0 formatter with custom structs (rule dedup, level mapping)
06-04-PLAN.md — pkg/storage/queries.go: Filters, ListFindingsFiltered, GetFinding, DeleteFinding
06-05-PLAN.md — cmd/keys.go command tree: list/show/export/copy/delete/verify (KEYS-01..06)
06-06-PLAN.md — scan --output registry dispatch + exit codes 0/1/2 (OUT-05, OUT-06)
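
The masking rule (first 8 + last 4 characters) and the 0/1/2 exit-code contract are simple enough to sketch directly. The function names, the `...` separator, and the choice to fully mask keys of 12 characters or fewer are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maskKey keeps the first 8 and last 4 characters and hides the rest.
// Keys too short to mask safely are replaced entirely (an assumption of this sketch).
func maskKey(key string) string {
	const head, tail = 8, 4
	if len(key) <= head+tail {
		return strings.Repeat("*", len(key))
	}
	return key[:head] + "..." + key[len(key)-tail:]
}

// exitCode maps a scan outcome to the exit codes in the success criteria:
// 0 = no keys found, 1 = keys found, 2 = scan error.
func exitCode(findings int, scanErr error) int {
	if scanErr != nil {
		return 2
	}
	if findings > 0 {
		return 1
	}
	return 0
}

func main() {
	fmt.Println(maskKey("sk-proj-abcdefghijklmnop1234"))
	fmt.Println(exitCode(0, nil), exitCode(3, nil), exitCode(0, errors.New("read failed")))
}
```

Checking the error before the finding count means a partially failed scan reports 2 even if it found keys, which is the conservative choice for CI pipelines.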

Phase 7: Import Adapters & CI/CD Integration

Goal: Users can import findings from TruffleHog and Gitleaks into KeyHunter's database, and use KeyHunter in pre-commit hooks and CI/CD pipelines with SARIF output uploadable to GitHub Security
Depends on: Phase 6
Requirements: IMP-01, IMP-02, IMP-03, CICD-01, CICD-02
Success Criteria (what must be TRUE):

keyhunter import --format=trufflehog results.json parses TruffleHog v3 JSON output and normalizes findings into the KeyHunter database
keyhunter import --format=gitleaks results.json and --format=csv both import and deduplicate against existing findings
keyhunter hook install installs a git pre-commit hook; running git commit on a file with a known API key blocks the commit and prints findings
keyhunter scan --output=sarif produces a valid SARIF 2.1.0 file that GitHub Code Scanning accepts without errors

Plans: 6 plans

Plans:

07-01-PLAN.md — pkg/importer Importer interface + TruffleHog v3 JSON parser + fixtures (IMP-01)
07-02-PLAN.md — Gitleaks JSON + CSV parsers (IMP-02)
07-03-PLAN.md — Dedup helper + SARIF GitHub Code Scanning validation test (IMP-03, CICD-02)
07-04-PLAN.md — cmd/import.go wiring format dispatch, dedup, DB persistence (IMP-01/02/03)
07-05-PLAN.md — cmd/hook.go install/uninstall with embedded pre-commit script (CICD-01)
07-06-PLAN.md — docs/CI-CD.md + README CI/CD section with GitHub Actions workflow (CICD-01, CICD-02)
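
An import adapter of the kind described here parses the external tool's output, normalizes each record, and deduplicates. The sketch below assumes TruffleHog v3's newline-delimited JSON output and reads only the `DetectorName`, `Raw`, and `Verified` fields; the `finding` type and the dedup key (provider plus raw key) are hypothetical choices for this example.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// truffleHogRecord holds the subset of TruffleHog v3 JSON fields used here.
type truffleHogRecord struct {
	DetectorName string `json:"DetectorName"`
	Raw          string `json:"Raw"`
	Verified     bool   `json:"Verified"`
}

// finding is a hypothetical normalized record.
type finding struct {
	Provider string
	Key      string
	Verified bool
}

// parseTruffleHog reads one JSON object per line and drops duplicate provider/key pairs.
func parseTruffleHog(data []byte) ([]finding, error) {
	var out []finding
	seen := map[string]bool{}
	sc := bufio.NewScanner(bytes.NewReader(data))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long lines
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var rec truffleHogRecord
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			return nil, fmt.Errorf("parse record: %w", err)
		}
		provider := strings.ToLower(rec.DetectorName)
		id := provider + "\x00" + rec.Raw
		if seen[id] {
			continue
		}
		seen[id] = true
		out = append(out, finding{Provider: provider, Key: rec.Raw, Verified: rec.Verified})
	}
	return out, sc.Err()
}

func main() {
	input := `{"DetectorName":"OpenAI","Raw":"sk-aaa","Verified":true}` + "\n" +
		`{"DetectorName":"OpenAI","Raw":"sk-aaa","Verified":true}` + "\n" +
		`{"DetectorName":"Anthropic","Raw":"sk-bbb","Verified":false}` + "\n"
	findings, err := parseTruffleHog([]byte(input))
	fmt.Println(len(findings), err)
}
```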

Phase 8: Dork Engine

Goal: Users can run, manage, and extend a library of 150+ built-in YAML dorks across GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, and Bing, using the same extensibility pattern as provider definitions
Depends on: Phase 7
Requirements: DORK-01, DORK-02, DORK-03, DORK-04
Success Criteria (what must be TRUE):

keyhunter dorks list shows 150+ built-in dorks with source engine and category columns
keyhunter dorks run --source=github --category=frontier executes all Tier 1 frontier provider dorks against GitHub code search
keyhunter dorks add --source=google --query='site:pastebin.com "sk-ant-api03-"' persists a custom dork that appears in subsequent dorks list output
keyhunter dorks export --format=json exports all dorks including custom additions

Plans: 7 plans

Plans:

08-01-PLAN.md — Dork schema, go:embed loader, registry, executor interface, custom_dorks storage table
08-02-PLAN.md — 50 GitHub dork YAML definitions across 5 categories
08-03-PLAN.md — 30 Google + 20 Shodan dork YAML definitions
08-04-PLAN.md — 15 Censys + 10 ZoomEye + 10 FOFA + 10 GitLab + 5 Bing dork YAML definitions
08-05-PLAN.md — Live GitHub Code Search executor (net/http, Retry-After, limit cap)
08-06-PLAN.md — cmd/dorks.go Cobra tree: list/run/add/export/info/delete
08-07-PLAN.md — Dork count guardrail test (>=150 total, per-source minimums, ID uniqueness)
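
Plan 08-05 mentions honoring `Retry-After` in the live GitHub Code Search executor. A minimal sketch of interpreting that header follows; the function name and fallback behavior are assumptions. The header may carry either a number of seconds or an HTTP date, and both forms are handled.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryDelay converts a Retry-After header value into a wait duration.
// It returns fallback when the header is absent or unparseable.
func retryDelay(value string, fallback time.Duration) time.Duration {
	if value == "" {
		return fallback
	}
	// Form 1: delay in whole seconds.
	if secs, err := strconv.Atoi(value); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	// Form 2: an HTTP date; wait until that moment, or not at all if it has passed.
	if when, err := http.ParseTime(value); err == nil {
		if d := time.Until(when); d > 0 {
			return d
		}
		return 0
	}
	return fallback
}

func main() {
	fmt.Println(retryDelay("30", time.Minute))
	fmt.Println(retryDelay("", time.Minute))
	fmt.Println(retryDelay("Mon, 02 Jan 2006 15:04:05 GMT", time.Minute))
}
```

A caller would typically pass `resp.Header.Get("Retry-After")` after receiving a 403 or 429 response and sleep for the returned duration before retrying.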

Phase 9: OSINT Infrastructure

Goal: The recon engine's ReconSource interface, per-source rate limiter architecture, stealth mode, and parallel sweep orchestrator exist and are validated; all individual source modules build on this foundation
Depends on: Phase 8
Requirements: RECON-INFRA-05, RECON-INFRA-06, RECON-INFRA-07, RECON-INFRA-08
Success Criteria (what must be TRUE):

Every recon source module holds its own rate.Limiter instance — no centralized rate limiter — and the ReconSource interface enforces a RateLimit() rate.Limit method
keyhunter recon full --stealth applies user-agent rotation and jitter delays to all sources; log output shows "source exhausted" events rather than silently returning empty results
keyhunter recon full --respect-robots (default on) respects robots.txt for web-scraping sources before making any requests
keyhunter recon full fans out to all enabled sources in parallel and deduplicates findings before persisting to the database

Plans: 6 plans

Plans:

09-01-PLAN.md — ReconSource interface + Engine skeleton + ExampleSource stub
09-02-PLAN.md — LimiterRegistry per-source rate.Limiter + jitter
09-03-PLAN.md — Stealth UA pool + cross-source dedup
09-04-PLAN.md — robots.txt parser with 1h per-host cache
09-05-PLAN.md — cmd/recon.go CLI tree (full, list)
09-06-PLAN.md — Integration test + phase summary
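
The parallel fan-out with cross-source deduplication can be sketched with goroutines and a fingerprint set. This is an illustration only: the `reconSource` interface here is a reduced, hypothetical version (it omits the `RateLimit()` method the roadmap requires, to stay within the standard library), and fingerprinting on provider plus key is an assumed dedup rule.

```go
package main

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sort"
	"sync"
)

type finding struct{ Provider, Key, Source string }

// reconSource is a reduced, hypothetical version of the ReconSource interface.
type reconSource interface {
	Name() string
	Sweep(ctx context.Context) ([]finding, error)
}

// staticSource returns canned results; it exists only for this demo.
type staticSource struct {
	name     string
	findings []finding
	err      error
}

func (s staticSource) Name() string { return s.name }
func (s staticSource) Sweep(ctx context.Context) ([]finding, error) {
	return s.findings, s.err
}

// fingerprint identifies a finding independently of where it was seen.
func fingerprint(f finding) string {
	sum := sha256.Sum256([]byte(f.Provider + "\x00" + f.Key))
	return hex.EncodeToString(sum[:])
}

// sweepAll runs every source concurrently, then deduplicates the combined results.
// It also returns the names of sources that failed, so failures are never silent.
func sweepAll(ctx context.Context, sources []reconSource) (unique []finding, failed []string) {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		all []finding
	)
	for _, src := range sources {
		wg.Add(1)
		go func(s reconSource) {
			defer wg.Done()
			found, err := s.Sweep(ctx)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed = append(failed, s.Name())
				return
			}
			all = append(all, found...)
		}(src)
	}
	wg.Wait()
	// Sort first so the surviving duplicate is deterministic.
	sort.Slice(all, func(i, j int) bool {
		if all[i].Key != all[j].Key {
			return all[i].Key < all[j].Key
		}
		return all[i].Source < all[j].Source
	})
	seen := map[string]bool{}
	for _, f := range all {
		if fp := fingerprint(f); !seen[fp] {
			seen[fp] = true
			unique = append(unique, f)
		}
	}
	sort.Strings(failed)
	return unique, failed
}

// runDemo sweeps three canned sources, one of which fails.
func runDemo() ([]finding, []string) {
	sources := []reconSource{
		staticSource{name: "github", findings: []finding{{"openai", "sk-aaa", "github"}, {"anthropic", "sk-bbb", "github"}}},
		staticSource{name: "gitlab", findings: []finding{{"openai", "sk-aaa", "gitlab"}}},
		staticSource{name: "broken", err: errors.New("source exhausted")},
	}
	return sweepAll(context.Background(), sources)
}

func main() {
	unique, failed := runDemo()
	fmt.Println(len(unique), failed)
}
```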

Phase 10: OSINT Code Hosting

Goal: Users can scan 10 code hosting platforms (GitHub, GitLab, Bitbucket, GitHub Gist, Codeberg/Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, and miscellaneous code sandbox sites) for leaked LLM API keys
Depends on: Phase 9
Requirements: RECON-CODE-01, RECON-CODE-02, RECON-CODE-03, RECON-CODE-04, RECON-CODE-05, RECON-CODE-06, RECON-CODE-07, RECON-CODE-08, RECON-CODE-09, RECON-CODE-10
Success Criteria (what must be TRUE):

keyhunter recon --sources=github,gitlab executes provider keyword dorks against GitHub and GitLab code search APIs and feeds results into the detection pipeline
keyhunter recon --sources=huggingface scans HuggingFace Spaces and model repos for exposed keys
keyhunter recon --sources=gist,bitbucket,codeberg scans public gists, Bitbucket repos, and Codeberg/Gitea instances
keyhunter recon --sources=replit,codesandbox,kaggle scans public repls, sandboxes, and notebooks
All code hosting source findings are stored in the database with source attribution and deduplication

Plans: 9 plans

Plans:

10-01-PLAN.md — Shared HTTP client + provider-query generator + RegisterAll skeleton
10-02-PLAN.md — GitHubSource (RECON-CODE-01)
10-03-PLAN.md — GitLabSource (RECON-CODE-02)
10-04-PLAN.md — BitbucketSource + GistSource (RECON-CODE-03, RECON-CODE-04)
10-05-PLAN.md — CodebergSource/Gitea (RECON-CODE-05)
10-06-PLAN.md — HuggingFaceSource (RECON-CODE-08)
10-07-PLAN.md — Replit + CodeSandbox + Sandboxes scrapers (RECON-CODE-06, RECON-CODE-07, RECON-CODE-10)
10-08-PLAN.md — KaggleSource (RECON-CODE-09)
10-09-PLAN.md — RegisterAll wiring + CLI integration + end-to-end test

Phase 11: OSINT Search & Paste

Goal: Users can run automated search engine dorking against Google, Bing, DuckDuckGo, Yandex, and Brave, and scan 15+ paste site aggregations for leaked API keys
Depends on: Phase 9
Requirements: RECON-DORK-01, RECON-DORK-02, RECON-DORK-03, RECON-PASTE-01
Success Criteria (what must be TRUE):

keyhunter recon --sources=google runs built-in dorks via Google Custom Search API or SerpAPI and returns results with the dork query that triggered each finding
keyhunter recon --sources=bing executes dorks via Azure Cognitive Services and --sources=duckduckgo,yandex,brave via their respective integrations
keyhunter recon --sources=paste queries Pastebin API and scrapes 15+ additional paste sites, feeding raw content through the detection pipeline

Plans: 3 plans

Plans:

11-01-PLAN.md — GoogleDorkSource + BingDorkSource + DuckDuckGoSource + YandexSource + BraveSource (RECON-DORK-01, RECON-DORK-02, RECON-DORK-03)
11-02-PLAN.md — PastebinSource + GistPasteSource + PasteSitesSource multi-paste aggregator (RECON-PASTE-01)
11-03-PLAN.md — RegisterAll wiring + cmd/recon.go credentials + integration test (all Phase 11 reqs)

Phase 12: OSINT IoT & Cloud Storage

Goal: Users can discover exposed LLM endpoints via IoT scanners (Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge) and scan publicly accessible cloud storage buckets (S3, GCS, Azure Blob, MinIO, GrayHatWarfare) for leaked keys
Depends on: Phase 9
Requirements: RECON-IOT-01, RECON-IOT-02, RECON-IOT-03, RECON-IOT-04, RECON-IOT-05, RECON-IOT-06, RECON-CLOUD-01, RECON-CLOUD-02, RECON-CLOUD-03, RECON-CLOUD-04
Success Criteria (what must be TRUE):

keyhunter recon --sources=shodan searches Shodan for exposed vLLM, Ollama, and LiteLLM proxy endpoints using the user's API key
keyhunter recon --sources=censys,zoomeye,fofa,netlas,binaryedge each execute IoT searches with appropriate query formats per platform
keyhunter recon --sources=s3 enumerates publicly accessible S3 buckets and scans readable objects for API key patterns
keyhunter recon --sources=gcs,azureblob,spaces scans GCS, Azure Blob, and DigitalOcean Spaces; --sources=minio discovers MinIO instances via Shodan integration
keyhunter recon --sources=grayhatwarfare queries the GrayHatWarfare bucket search engine for matching bucket names

Plans: 4 plans

Plans:

12-01-PLAN.md — ShodanSource + CensysSource + ZoomEyeSource (RECON-IOT-01, RECON-IOT-02, RECON-IOT-03)
12-02-PLAN.md — FOFASource + NetlasSource + BinaryEdgeSource (RECON-IOT-04, RECON-IOT-05, RECON-IOT-06)
12-03-PLAN.md — S3Scanner + GCSScanner + AzureBlobScanner + DOSpacesScanner (RECON-CLOUD-01, RECON-CLOUD-02, RECON-CLOUD-03, RECON-CLOUD-04)
12-04-PLAN.md — RegisterAll wiring + cmd/recon.go credentials + integration test (all Phase 12 reqs)

Phase 13: OSINT Package Registries & Container/IaC

Goal: Users can scan npm, PyPI, and 6 other package registries for packages containing leaked keys, and scan Docker Hub image layers, Kubernetes configs, Terraform state files, Helm charts, and Ansible Galaxy for secrets in infrastructure code
Depends on: Phase 9
Requirements: RECON-PKG-01, RECON-PKG-02, RECON-PKG-03, RECON-INFRA-01, RECON-INFRA-02, RECON-INFRA-03, RECON-INFRA-04
Success Criteria (what must be TRUE):

keyhunter recon --sources=npm downloads and extracts package tarballs for recently published packages matching LLM-related keywords and scans their contents
keyhunter recon --sources=pypi,rubygems,crates,maven,nuget,packagist,goproxy scans respective registries using the same download-extract-scan pattern
keyhunter recon --sources=dockerhub extracts and scans image layers and build args from public Docker Hub images
keyhunter recon --sources=k8s discovers publicly exposed Kubernetes dashboards and scans publicly readable Secret/ConfigMap objects
keyhunter recon --sources=terraform,helm,ansible scans Terraform registry modules, Helm chart repositories, and Ansible Galaxy roles

Plans: 4 plans

Plans:

13-01-PLAN.md — NpmSource + PyPISource + CratesIOSource + RubyGemsSource (RECON-PKG-01, RECON-PKG-02)
13-02-PLAN.md — MavenSource + NuGetSource + GoProxySource + PackagistSource (RECON-PKG-02, RECON-PKG-03)
13-03-PLAN.md — DockerHubSource + KubernetesSource + TerraformSource + HelmSource (RECON-INFRA-01..04)
13-04-PLAN.md — RegisterAll wiring + integration test (all Phase 13 reqs)

Phase 14: OSINT CI/CD Logs, Web Archives & Frontend Leaks

Goal: Users can scan public CI/CD build logs, historical web snapshots from the Wayback Machine and CommonCrawl, and frontend JavaScript artifacts (source maps, webpack bundles, exposed .env files) for leaked API keys
Depends on: Phase 9
Requirements: RECON-CI-01, RECON-CI-02, RECON-CI-03, RECON-CI-04, RECON-ARCH-01, RECON-ARCH-02, RECON-JS-01, RECON-JS-02, RECON-JS-03, RECON-JS-04, RECON-JS-05
Success Criteria (what must be TRUE):

keyhunter recon --sources=github-actions scans public GitHub Actions workflow run logs for leaked keys in build output
keyhunter recon --sources=travis,circleci,jenkins,gitlab-ci scans public build logs from each CI platform
keyhunter recon --sources=wayback queries the CDX API for historical snapshots of target domains and scans retrieved content
keyhunter recon --sources=commoncrawl searches CommonCrawl indexes for pages matching LLM provider keywords and scans WARC records
keyhunter recon --sources=sourcemaps,webpack,dotenv,swagger,deploypreview each extract and scan the relevant JS artifacts and configuration files

Plans: 4 plans

Plans:

14-01-PLAN.md — CI/CD log sources: GitHubActions, TravisCI, CircleCI, Jenkins, GitLabCI
14-02-PLAN.md — Web archive sources: Wayback Machine, CommonCrawl
14-03-PLAN.md — Frontend leak sources: SourceMap, Webpack, EnvLeak, Swagger, DeployPreview
14-04-PLAN.md — RegisterAll wiring + integration test (all Phase 14 reqs) (completed 2026-04-06)
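
The Wayback source's "query the CDX API, then scan retrieved content" flow starts with building a CDX query and turning the response rows into snapshot URLs. The sketch below assumes the public CDX endpoint at web.archive.org with its `output=json` format, where the first row is a header. The function names, the chosen filters, and the use of the `id_` modifier to request unrewritten content are choices made for this example, not confirmed details of KeyHunter.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// cdxQueryURL builds a CDX search for all captured URLs under domain.
func cdxQueryURL(domain string, limit int) string {
	q := url.Values{}
	q.Set("url", domain+"/*")         // every path under the domain
	q.Set("output", "json")           // JSON array of rows
	q.Set("fl", "timestamp,original") // only the fields needed below
	q.Set("filter", "statuscode:200") // skip redirects and errors
	q.Set("collapse", "digest")       // skip captures with identical content
	q.Set("limit", fmt.Sprint(limit))
	return "https://web.archive.org/cdx/search/cdx?" + q.Encode()
}

// snapshotURLs converts a CDX JSON response into fetchable snapshot URLs.
func snapshotURLs(body []byte) ([]string, error) {
	var rows [][]string
	if err := json.Unmarshal(body, &rows); err != nil {
		return nil, err
	}
	var urls []string
	for i, row := range rows {
		if i == 0 || len(row) < 2 {
			continue // row 0 is the header: ["timestamp","original"]
		}
		urls = append(urls, "https://web.archive.org/web/"+row[0]+"id_/"+row[1])
	}
	return urls, nil
}

func main() {
	fmt.Println(cdxQueryURL("example.com", 50))
	sample := `[["timestamp","original"],["20240101000000","http://example.com/config.js"]]`
	urls, err := snapshotURLs([]byte(sample))
	fmt.Println(urls, err)
}
```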

Phase 15: OSINT Forums, Collaboration & Log Aggregators

Goal: Users can search developer forums, public collaboration tool pages, and exposed monitoring dashboards for leaked API keys, covering Stack Overflow, Reddit, HackerNews, dev.to, Telegram channels, Discord, Notion, Confluence, Trello, Google Docs, Elasticsearch, Grafana, and Sentry
Depends on: Phase 9
Requirements: RECON-FORUM-01, RECON-FORUM-02, RECON-FORUM-03, RECON-FORUM-04, RECON-FORUM-05, RECON-FORUM-06, RECON-COLLAB-01, RECON-COLLAB-02, RECON-COLLAB-03, RECON-COLLAB-04, RECON-LOG-01, RECON-LOG-02, RECON-LOG-03
Success Criteria (what must be TRUE):

keyhunter recon --sources=stackoverflow,reddit,hackernews queries each platform's API/Algolia search for LLM provider keywords and scans result content
keyhunter recon --sources=devto,medium,telegram,discord scans publicly accessible posts, articles, and indexed channel content
keyhunter recon --sources=notion,confluence,trello,googledocs scans publicly accessible pages via dorking and direct API access where available
keyhunter recon --sources=elasticsearch,grafana,sentry discovers exposed instances and scans accessible log data and dashboards

Plans: 4 plans

Plans:

15-01-PLAN.md — StackOverflow, Reddit, HackerNews, Discord, Slack, DevTo forum sources (RECON-FORUM-01..06)
15-02-PLAN.md — Trello, Notion, Confluence, GoogleDocs collaboration sources (RECON-COLLAB-01..04)
15-03-PLAN.md — Elasticsearch, Grafana, Sentry, Kibana, Splunk log aggregator sources (RECON-LOG-01..03)
15-04-PLAN.md — RegisterAll wiring + integration test (all Phase 15 reqs)

Phase 16: OSINT Threat Intel, Mobile, DNS & API Marketplaces

Goal: Users can search threat intelligence platforms, scan decompiled Android APKs, perform DNS/subdomain discovery for config endpoint probing, and scan Postman/SwaggerHub API collections for leaked LLM keys
Depends on: Phase 9
Requirements: RECON-INTEL-01, RECON-INTEL-02, RECON-INTEL-03, RECON-MOBILE-01, RECON-DNS-01, RECON-DNS-02, RECON-API-01, RECON-API-02
Success Criteria (what must be TRUE):

keyhunter recon --sources=virustotal,intelx,urlhaus queries each threat intelligence platform for files and URLs containing LLM provider keywords
keyhunter recon --sources=apk --target=com.example.app downloads, decompiles (via apktool/jadx), and scans APK content for API keys
keyhunter recon --sources=crtsh --target=example.com discovers subdomains via Certificate Transparency logs and probes each for .env, /api/config, and /actuator/env endpoints
keyhunter recon --sources=postman,swaggerhub scans public Postman collections and SwaggerHub API definitions for hardcoded keys in request examples

Plans: 4 plans

Plans:

16-01-PLAN.md — VirusTotal, IntelligenceX, URLhaus threat intelligence sources (RECON-INTEL-01, RECON-INTEL-02, RECON-INTEL-03)
16-02-PLAN.md — APKMirror, crt.sh, SecurityTrails mobile and DNS sources (RECON-MOBILE-01, RECON-DNS-01, RECON-DNS-02)
16-03-PLAN.md — Postman, SwaggerHub, RapidAPI marketplace sources (RECON-API-01, RECON-API-02)
16-04-PLAN.md — RegisterAll wiring + cmd/recon.go credentials + integration test (all Phase 16 reqs)
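
The crt.sh flow (discover subdomains from Certificate Transparency, then probe config endpoints) can be sketched in two pure functions. This assumes crt.sh's JSON output, in which each entry has a `name_value` field holding one or more newline-separated names. The function names are hypothetical; the probe paths are the three listed in the success criteria. No network requests are made here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// crtEntry holds the one crt.sh JSON field this sketch relies on.
type crtEntry struct {
	NameValue string `json:"name_value"`
}

// subdomains extracts the unique host names under domain from a crt.sh JSON response.
func subdomains(body []byte, domain string) ([]string, error) {
	var entries []crtEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, err
	}
	domain = strings.ToLower(domain)
	seen := map[string]bool{}
	for _, e := range entries {
		for _, name := range strings.Split(e.NameValue, "\n") {
			name = strings.ToLower(strings.TrimSpace(name))
			name = strings.TrimPrefix(name, "*.") // wildcard certs cover the parent name
			if name == domain || strings.HasSuffix(name, "."+domain) {
				seen[name] = true
			}
		}
	}
	hosts := make([]string, 0, len(seen))
	for h := range seen {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts)
	return hosts, nil
}

// probeURLs expands each host into the config endpoints to check.
func probeURLs(hosts []string) []string {
	paths := []string{"/.env", "/api/config", "/actuator/env"}
	var urls []string
	for _, h := range hosts {
		for _, p := range paths {
			urls = append(urls, "https://"+h+p)
		}
	}
	return urls
}

func main() {
	sample := `[{"name_value":"api.example.com\n*.example.com"},{"name_value":"API.example.com"},{"name_value":"evil.com"}]`
	hosts, err := subdomains([]byte(sample), "example.com")
	fmt.Println(hosts, err)
	fmt.Println(len(probeURLs(hosts)))
}
```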

Phase 17: Telegram Bot & Scheduled Scanning

Goal: Users can control KeyHunter remotely via a Telegram bot with scan, verify, recon, status, and subscription commands, and set up cron-based recurring scans that auto-notify on new findings
Depends on: Phase 16
Requirements: TELE-01, TELE-02, TELE-03, TELE-04, TELE-05, TELE-06, TELE-07, SCHED-01, SCHED-02, SCHED-03
Success Criteria (what must be TRUE):

keyhunter serve --telegram starts the bot; /scan ./myrepo in a private Telegram chat triggers a scan and returns findings (masked keys only, never unmasked)
/verify <key-id>, /recon --sources=github, /status, /stats, /providers, and /help all respond correctly in private chat
/subscribe enables auto-notifications; new key findings from any scan trigger an immediate Telegram message to all subscribed users
/key <id> sends full key detail to the requesting user's private chat only
keyhunter schedule add --cron="0 */6 * * *" --scan=./myrepo adds a recurring scan; keyhunter schedule list shows it; the job persists across restarts and sends Telegram notifications on new findings

Plans: TBD

Phase 18: Web Dashboard

Goal: Users can manage and interact with all KeyHunter capabilities through an embedded web dashboard (viewing scans, managing keys, launching recon, browsing providers, managing dorks, and configuring settings), with live scan progress via SSE
Depends on: Phase 17
Requirements: WEB-01, WEB-02, WEB-03, WEB-04, WEB-05, WEB-06, WEB-07, WEB-08, WEB-09, WEB-10, WEB-11
Success Criteria (what must be TRUE):

keyhunter serve starts an embedded HTTP server with the full dashboard accessible in a browser; the binary contains all HTML, CSS, and assets via go:embed
The dashboard overview page shows total keys found, scan count, and active providers as summary statistics
The keys page lists all findings with masked values and a "Reveal Key" toggle that shows the full key on demand
The recon page allows launching a recon sweep with source selection and shows live progress via Server-Sent Events
The REST API at /api/v1/* accepts and returns JSON for all dashboard actions; optional basic auth or token auth is configurable via settings page

Plans: TBD
UI hint: yes
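
Since this phase is still unplanned, the following is only a sketch of how live recon progress over Server-Sent Events might be served from `net/http`. The `progress` payload, the `progress` event name, and the handler's channel-based design are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// progress is a hypothetical payload for one recon progress update.
type progress struct {
	Source string `json:"source"`
	Done   int    `json:"done"`
	Total  int    `json:"total"`
}

// formatSSE renders one event in the text/event-stream wire format.
func formatSSE(event string, payload interface{}) (string, error) {
	data, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return "event: " + event + "\ndata: " + string(data) + "\n\n", nil
}

// progressHandler streams updates until the channel closes or the client disconnects.
func progressHandler(updates <-chan progress) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return
			case p, open := <-updates:
				if !open {
					return
				}
				msg, err := formatSSE("progress", p)
				if err != nil {
					return
				}
				fmt.Fprint(w, msg)
				flusher.Flush() // push the event to the browser immediately
			}
		}
	}
}

func main() {
	msg, _ := formatSSE("progress", progress{Source: "github", Done: 3, Total: 10})
	fmt.Print(msg)
}
```

On the browser side, an `EventSource` (or the htmx SSE extension) would subscribe to whichever route this handler is mounted on.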

Progress

Execution Order: Phases execute in numeric order: 1 → 2 → 3 → ... → 18

Phase	Plans Complete	Status	Completed
1. Foundation	0/5	Planning complete	-
2. Tier 1-2 Providers	0/?	Not started	-
3. Tier 3-9 Providers	0/?	Not started	-
4. Input Sources	0/?	Not started	-
5. Verification Engine	0/?	Not started	-
6. Output, Reporting & Key Management	0/?	Not started	-
7. Import Adapters & CI/CD Integration	0/?	Not started	-
8. Dork Engine	0/?	Not started	-
9. OSINT Infrastructure	2/6	In Progress	-
10. OSINT Code Hosting	9/9	Complete	2026-04-06
11. OSINT Search & Paste	3/3	Complete	2026-04-06
12. OSINT IoT & Cloud Storage	4/4	Complete	2026-04-06
13. OSINT Package Registries & Container/IaC	4/4	Complete	2026-04-06
14. OSINT CI/CD Logs, Web Archives & Frontend Leaks	1/1	Complete	2026-04-06
15. OSINT Forums, Collaboration & Log Aggregators	2/4	Complete	2026-04-06
16. OSINT Threat Intel, Mobile, DNS & API Marketplaces	0/?	Complete	2026-04-06
17. Telegram Bot & Scheduled Scanning	0/?	Not started	-
18. Web Dashboard	0/?	Not started	-
