---
gsd_state_version: 1.0
milestone: v1.0
milestone_name: milestone
status: executing
stopped_at: Completed 18-02-PLAN.md
last_updated: "2026-04-06T15:07:44.687Z"
last_activity: 2026-04-06
progress:
  total_phases: 18
  completed_phases: 15
  total_plans: 90
  completed_plans: 88
  percent: 20
---

# Project State

## Project Reference

See: .planning/PROJECT.md (updated 2026-04-04)

**Core value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
**Current focus:** Phase 13 -- osint-package-registries

## Current Position

Phase: 18
Plan: Not started
Status: Ready to execute
Last activity: 2026-04-06

Progress: [██░░░░░░░░] 20%

## Performance Metrics

**Velocity:**

- Total plans completed: 0
- Average duration: -
- Total execution time: 0 hours

**By Phase:**

| Phase | Plans | Total | Avg/Plan |
|-------|-------|-------|----------|
| - | - | - | - |

**Recent Trend:**

- Last 5 plans: -
- Trend: -

*Updated after each plan completion*

| Plan | Duration | Tasks | Files |
|------|----------|-------|-------|
| Phase 01-foundation P02 | 9 | 2 tasks | 11 files |
| Phase 01-foundation P04 | 5min | 2 tasks | 12 files |
| Phase 01-foundation P05 | 4min | 2 tasks | 8 files |
| Phase 02-tier-1-2-providers P02 | 1m | 2 tasks | 12 files |
| Phase 02-tier-1-2-providers P03 | 3min | 2 tasks | 14 files |
| Phase 02-tier-1-2-providers P01 | 3min | 2 tasks | 12 files |
| Phase 02-tier-1-2-providers P04 | 1min | 2 tasks | 14 files |
| Phase 02-tier-1-2-providers P05 | 2min | 1 task | 1 file |
| Phase 03-tier-3-9-providers P04 | 3m | 2 tasks | 20 files |
| Phase 03-tier-3-9-providers P02 | 70 | 2 tasks | 22 files |
| Phase 03-tier-3-9-providers P06 | 3m | 2 tasks | 16 files |
| Phase 03-tier-3-9-providers P01 | 3m | 2 tasks | 32 files |
| Phase 03 P08 | 2min | 1 task | 1 file |
| Phase 04 P01 | 1m | 1 task | 2 files |
| Phase 04-input-sources P03 | 6m | 1 task | 2 files |
| Phase 04 P02 | 4min | 1 task | 3 files |
| Phase 04 P05 | 3min | 1 task | 2 files |
| Phase 05 P01 | 3m43s | 2 tasks | 10 files |
| Phase 05 P04 | 10m | 2 tasks | 25 files |
| Phase 05-verification-engine P02 | 7m | 2 tasks | 9 files |
| Phase 05-verification-engine P03 | 245s | 2 tasks | 4 files |
| Phase 05 P05 | 12min | 2 tasks | 5 files |
| Phase 06 P01 | 8m | 2 tasks | 7 files |
| Phase 06 P03 | ~6m | 1 task | 2 files |
| Phase 06-output-reporting P05 | 4min | 2 tasks | 3 files |
| Phase 06 P06 | 3min | 2 tasks | 3 files |
| Phase 08-dork-engine P01 | 15min | 2 tasks | 10 files |
| Phase 08-dork-engine P02 | 12min | 2 tasks | 11 files |
| Phase 08-dork-engine P03 | 10m | 2 tasks | 10 files |
| Phase 08-dork-engine P07 | 3m | 1 task | 1 file |
| Phase 09-osint-infrastructure P04 | 6min | 2 tasks | 4 files |
| Phase 09 P05 | 5m | 2 tasks | 2 files |
| Phase 09-osint-infrastructure P06 | 8min | 2 tasks | 2 files |
| Phase 10-osint-code-hosting P01 | 4m | 2 tasks | 7 files |
| Phase 10-osint-code-hosting P02 | 5min | 1 task | 2 files |
| Phase 10-osint-code-hosting P07 | 6 | 2 tasks | 6 files |
| Phase 10 P09 | 12min | 2 tasks | 5 files |
| Phase 11 P03 | 6min | 2 tasks | 4 files |
| Phase 11 P01 | 3min | 2 tasks | 11 files |
| Phase 12 P01 | 3min | 2 tasks | 6 files |
| Phase 12 P04 | 14min | 2 tasks | 4 files |
| Phase 13 P02 | 3min | 2 tasks | 8 files |
| Phase 13 P03 | 5min | 2 tasks | 11 files |
| Phase 13 P04 | 5min | 2 tasks | 3 files |
| Phase 14 P01 | 4min | 1 task | 14 files |
| Phase 15 P01 | 3min | 2 tasks | 13 files |
| Phase 15 P03 | 4min | 2 tasks | 11 files |
| Phase 16 P01 | 4min | 2 tasks | 6 files |
| Phase 17 P01 | 3min | 2 tasks | 4 files |
| Phase 17 P04 | 3min | 2 tasks | 4 files |
| Phase 18 P02 | 7min | 2 tasks | 7 files |

## Accumulated Context

### Decisions

Decisions are logged in PROJECT.md Key Decisions table.
Recent decisions affecting current work:

- Roadmap: CGO_ENABLED=0 throughout -- modernc.org/sqlite over mattn/go-sqlite3 (see PROJECT.md)
- Roadmap: Per-source rate limiter architecture (Phase 9) must precede all OSINT source modules (Phases 10-16)
- Roadmap: AES-256 encryption added in Phase 1, not post-hoc -- avoids migration complexity
- Roadmap: Verification (Phase 5) requires consent prompt + LEGAL.md -- not optional polish
- [Phase 01-foundation]: Provider YAML in dual locations: providers/ (user-visible) and pkg/providers/definitions/ (embed) -- Go embed cannot use '..' paths
- [Phase 01-foundation]: Aho-Corasick built with DFA=true at NewRegistry() for O(n) keyword pre-filtering across all providers
- [Phase 01-foundation]: pkg/types/chunk.go breaks engine<->sources circular import; ants pool with WaitGroup+Mutex for detector coordination
- [Phase 01-foundation]: Per-installation salt via settings table -- no hardcoded salt in production code
- [Phase 01-foundation]: Exit code semantics: 0=clean, 1=keys-found, 2=error for CI/CD integration
- [Phase 02-tier-1-2-providers]: AWS Bedrock verify URL left empty -- SigV4 signing deferred to Phase 5 verification engine
- [Phase 03-tier-3-9-providers]: Keyword-only detection for providers without documented key prefixes (You.com, Unstructured, Runway, Midjourney) to avoid false positives.
- [Phase 04]: Use 'go mod download' instead of 'go mod tidy' when bootstrapping dependencies ahead of their consumers
- [Phase 04-input-sources]: GitSource walks heads+tags+remotes+stash with per-OID blob dedup
- [Phase 04]: Introduced selectSource dispatcher with sourceFlags struct for testable CLI source routing
- [Phase 05]: Keep legacy VerifySpec ValidStatus/InvalidStatus alongside canonical SuccessCodes/FailureCodes; Effective*() helpers pick canonical-first with fallback
- [Phase 05]: Store Finding.VerifyMetadata as JSON TEXT column; legacy DBs migrated in-place via PRAGMA table_info + conditional ALTER TABLE in storage.Open()
- [Phase 05-verification-engine]: LEGAL.md dual-location mirror (root + pkg/legal/) required because go:embed cannot traverse parents -- mirrors Phase 1 providers pattern
- [Phase 05-verification-engine]: verify.consent setting: granted is sticky across runs; declined is not -- users who initially refuse can change mind without manual reset
- [Phase 05-verification-engine]: Plan 05-03: HTTPVerifier classifies via YAML VerifySpec only; no per-provider branches. VerifyAll uses ants pool with per-finding Result guarantee.
- [Phase 05]: Verification runs in batch mode after scan completes (collect -> verify -> persist) with Result->Finding back-assignment via provider+masked-key tuple
- [Phase 06]: Registry pattern for output formatters; TableFormatter strips ANSI when writer is not a TTY via zero-value lipgloss.Style
- [Phase 06]: SARIF 2.1.0 via hand-rolled structs (no library) per CLAUDE.md
- [Phase 06-output-reporting]: keys export rejects SARIF (scan-only); keys show always unmasked; keys verify updates findings inline via db.SQL().Exec
- [Phase 08-dork-engine]: pkg/dorks mirrors pkg/providers go:embed pattern; //go:embed definitions/* tolerates empty .gitkeep-only tree
- [Phase 08-dork-engine]: Runner + Executor interface separate from Registry so 08-05 GitHub executor registers without touching YAML loader
- [Phase 10-osint-code-hosting]: Client handles retry only; rate limiting is caller's responsibility via LimiterRegistry
- [Phase 10-osint-code-hosting]: github/gist use 'kw' in:file; all other sources use bare keyword
- [Phase 10-osint-code-hosting]: GitHubSource reuses shared sources.Client + LimiterRegistry; builds queries from providers.Registry via BuildQueries; missing token disables (not errors)
- [Phase 10]: RegisterAll registers all ten Phase 10 sources unconditionally; missing credentials flip Enabled()==false rather than hiding sources from the CLI catalog
- [Phase 11]: RegisterAll extended to 18 sources (10 Phase 10 + 8 Phase 11); paste sources use BaseURL prefix in integration test to avoid /search path collision
- [Phase 11]: Integration test uses injected test platforms for PasteSites (same pattern as SandboxesSource)
- [Phase 11]: All five search sources use dork query format to focus on paste/code hosting leak sites
- [Phase 12]: Shodan/Censys/ZoomEye use bare keyword queries; Censys POST+BasicAuth, Shodan key param, ZoomEye API-KEY header
- [Phase 12]: RegisterAll extended to 28 sources (18 Phase 10-11 + 10 Phase 12); cloud scanners credentialless, IoT scanners credential-gated
- [Phase 13]: GoProxy regex requires domain dot to filter non-module paths; NuGet projectUrl fallback to nuget.org canonical
- [Phase 13]: KubernetesSource uses Artifact Hub rather than Censys/Shodan dorking to avoid duplicating Phase 12 sources
- [Phase 13]: RegisterAll extended to 32 sources (28 Phase 10-12 + 4 Phase 13 container/IaC)
- [Phase 13]: RegisterAll extended to 40 sources (28 Phase 10-12 + 12 Phase 13); package registry sources credentialless, no new SourcesConfig fields
- [Phase 14]: RegisterAll extended to 45 sources (40 Phase 10-13 + 5 Phase 14 CI/CD); CircleCI gets dedicated CIRCLECI_TOKEN
- [Phase 15]: Discord/Slack use dorking approach (configurable search endpoint) since neither has public message search API
- [Phase 15]: Log aggregator sources are credentialless, targeting exposed instances
- [Phase 16]: VT uses x-apikey header per official API v3 spec
- [Phase 16]: IX uses three-step flow: POST search, GET results, GET file content
- [Phase 16]: URLhaus tag lookup with payload endpoint fallback
- [Phase 17]: telego v1.8.0 promoted from indirect to direct; context cancellation for graceful shutdown; rate limit 60s scan/verify/recon, 5s others
- [Phase 17]: Separated format from send for testable notifications without telego mock
- [Phase 18]: JSON wrapper structs (apiKey, apiProvider, apiDork) with explicit JSON tags since domain structs only have yaml tags
- [Phase 18]: API never exposes raw key values -- KeyValue always empty string in JSON responses
- [Phase 18]: Single SSEHub shared between scan and recon progress endpoints, events distinguished by Type prefix

### Pending Todos

None yet.

### Blockers/Concerns

- Phase 1: Argon2 vs PBKDF2 for database encryption key derivation -- needs decision before Storage Layer implementation
- Phase 1: Aho-Corasick library choice (cloudflare/ahocorasick vs bobrik/ahocorasick) -- verify which TruffleHog uses
- Phase 2+: Provider YAML patterns for 108 providers -- lesser-known providers need targeted research (Chinese LLMs, niche APIs)
- Phase 11: Google Custom Search API quota (100 queries/day free tier) vs direct scraping ToS trade-off -- product decision needed

## Session Continuity

Last session: 2026-04-06T15:07:44.683Z
Stopped at: Completed 18-02-PLAN.md
Resume file: None