---
phase: 10-osint-code-hosting
verified: 2026-04-06T08:37:18Z
status: passed
score: 5/5 must-haves verified
re_verification:
  previous_status: gaps_found
  previous_score: 3/5
  gaps_closed:
    - "`recon --sources=github,gitlab` executes dorks via APIs -- `--sources` StringSlice flag now declared on reconFullCmd (line 174) and filterEngineSources rebuilds a filtered engine via Engine.Get (lines 67-86)"
    - "All code hosting source findings are stored in the database with source attribution and deduplication -- persistReconFindings (lines 90-115) calls storage.SaveFinding per deduped finding, gated by `--no-persist` opt-out flag"
  gaps_remaining: []
  regressions: []
---

# Phase 10: OSINT Code Hosting Verification Report

**Phase Goal:** Users can scan 10 code hosting platforms for leaked LLM API keys
**Verified:** 2026-04-06T08:37:18Z
**Status:** passed
**Re-verification:** Yes -- after gap closure (previous: gaps_found 3/5)

## Goal Achievement

### Observable Truths (from ROADMAP Success Criteria)

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | `recon --sources=github,gitlab` executes dorks via APIs and feeds detection pipeline | VERIFIED | `--sources` StringSlice flag declared at cmd/recon.go:174. reconFullCmd (line 37-39) checks `reconSourcesFilter` and calls `filterEngineSources` which uses `Engine.Get(name)` (engine.go:37-42) to rebuild a filtered engine containing only named sources. GitHubSource and GitLabSource are substantive implementations (199 and 175 lines respectively) with real API calls. |
| 2 | `recon --sources=huggingface` scans HF Spaces and model repos | VERIFIED | HuggingFaceSource (huggingface.go, 181 lines) sweeps both `/api/spaces` and `/api/models`. Registered in register.go:56. `--sources=huggingface` would filter to this single source via filterEngineSources. Integration test asserts findings arrive from both endpoints. |
| 3 | `recon --sources=gist,bitbucket,codeberg` works | VERIFIED | GistSource (184 lines), BitbucketSource (174 lines), CodebergSource (167 lines) all implemented, registered (register.go:68-84), and exercised by integration test. `--sources` flag enables selecting any combination. |
| 4 | `recon --sources=replit,codesandbox,kaggle` works | VERIFIED | ReplitSource (141 lines), CodeSandboxSource (95 lines), KaggleSource (149 lines) all implemented, registered (register.go:86-97), and exercised by integration test. SandboxesSource (248 lines) also present for CodePen/JSFiddle/StackBlitz/Glitch/Observable. |
| 5 | Code hosting findings stored in DB with source attribution and dedup | VERIFIED | `persistReconFindings` (cmd/recon.go:90-115) iterates deduped findings and calls `storage.SaveFinding` (pkg/storage/findings.go:43) with correct field mapping including SourceType, ProviderName, KeyMasked. Called at line 56 gated by `!reconNoPersist`. Dedup via `recon.Dedup` at line 50. `openDBWithKey` (cmd/keys.go:410) provides DB handle with encryption key. |

**Score:** 5/5 truths VERIFIED

### Required Artifacts

All ten source files exist, are substantive, and are wired via RegisterAll (regression check -- unchanged from initial verification):

| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `pkg/recon/sources/github.go` | GitHubSource | VERIFIED | 199 lines, /search/code API |
| `pkg/recon/sources/gitlab.go` | GitLabSource | VERIFIED | 175 lines, /api/v4/search |
| `pkg/recon/sources/bitbucket.go` | BitbucketSource | VERIFIED | 174 lines, /2.0/workspaces search |
| `pkg/recon/sources/gist.go` | GistSource | VERIFIED | 184 lines, /gists/public enumeration |
| `pkg/recon/sources/codeberg.go` | CodebergSource | VERIFIED | 167 lines, /api/v1/repos/search |
| `pkg/recon/sources/huggingface.go` | HuggingFaceSource | VERIFIED | 181 lines, /api/spaces + /api/models |
| `pkg/recon/sources/replit.go` | ReplitSource | VERIFIED | 141 lines, HTML scraper |
| `pkg/recon/sources/codesandbox.go` | CodeSandboxSource | VERIFIED | 95 lines, HTML scraper |
| `pkg/recon/sources/sandboxes.go` | SandboxesSource | VERIFIED | 248 lines, multi-platform aggregator |
| `pkg/recon/sources/kaggle.go` | KaggleSource | VERIFIED | 149 lines, /api/v1/kernels/list |
| `pkg/recon/sources/register.go` | RegisterAll | VERIFIED | 10 engine.Register calls (lines 54-97) |
| `pkg/recon/sources/integration_test.go` | E2E SweepAll test | VERIFIED | 240 lines, httptest multiplexed server |
| `pkg/recon/engine.go` | Engine with Get() method | VERIFIED | Get(name) at lines 37-42, returns (ReconSource, bool) |
| `cmd/recon.go` | CLI with --sources flag + DB persistence | VERIFIED | --sources at line 174, filterEngineSources at lines 67-86, persistReconFindings at lines 90-115 |

### Key Link Verification

| From | To | Via | Status | Details |
|------|----|----|--------|---------|
| cmd/recon.go | pkg/recon/sources | sources.RegisterAll(e, cfg) | WIRED | Line 157 in buildReconEngine |
| register.go | all 10 sources | engine.Register(...) | WIRED | 10 Register calls (lines 54-97) |
| each source | httpclient.go | Client.Do(ctx, req) | WIRED | Shared retrying client in every source |
| each source | recon.LimiterRegistry | Limiters.Wait(...) | WIRED | Rate limiting in every Sweep loop |
| Sweep outputs | cmd/recon.go | out chan <- recon.Finding -> SweepAll -> Dedup | WIRED | reconFullCmd collects + dedups |
| cmd/recon.go | --sources filter | reconSourcesFilter -> filterEngineSources -> Engine.Get | WIRED | Flag at line 174, filter at lines 37-39, rebuild at lines 67-86 |
| cmd/recon.go findings | pkg/storage | persistReconFindings -> openDBWithKey -> db.SaveFinding | WIRED | Lines 55-59 call persistReconFindings, which calls storage.SaveFinding per finding (lines 97-112) |

### Data-Flow Trace (Level 4)

| Artifact | Data Variable | Source | Produces Real Data | Status |
|----------|---------------|--------|--------------------|--------|
| All 10 sources | Finding structs | API JSON / HTML scraping | Yes (integration test asserts non-empty findings per SourceType) | FLOWING |
| cmd/recon.go dedup | deduped slice | recon.Dedup(all) from SweepAll | Yes | FLOWING |
| cmd/recon.go persist | storage.Finding | persistReconFindings maps engine.Finding -> storage.Finding | Yes -- SaveFinding inserts with ProviderName, SourceType, KeyMasked, etc. | FLOWING |

### Behavioral Spot-Checks

| Behavior | Command | Result | Status |
|----------|---------|--------|--------|
| `go build ./...` succeeds | `go build ./...` | exit 0, clean | PASS |
| --sources flag declared | grep StringSliceVar cmd/recon.go | Found at line 174 | PASS |
| persistReconFindings calls SaveFinding | grep SaveFinding cmd/recon.go | Found at line 110 | PASS |
| Engine.Get method exists | grep "func.*Get" pkg/recon/engine.go | Found at line 37 | PASS |
| storage.Finding has all mapped fields | grep SourceType pkg/storage/findings.go | SourceType field present at line 20 | PASS |

### Requirements Coverage

| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| RECON-CODE-01 | 10-02 | GitHub code search | SATISFIED | github.go + test |
| RECON-CODE-02 | 10-03 | GitLab code search | SATISFIED | gitlab.go + test |
| RECON-CODE-03 | 10-04 | GitHub Gist search | SATISFIED | gist.go + test |
| RECON-CODE-04 | 10-04 | Bitbucket code search | SATISFIED | bitbucket.go + test |
| RECON-CODE-05 | 10-05 | Codeberg/Gitea search | SATISFIED | codeberg.go + test |
| RECON-CODE-06 | 10-07 | Replit scanning | SATISFIED | replit.go + test |
| RECON-CODE-07 | 10-07 | CodeSandbox scanning | SATISFIED | codesandbox.go + test |
| RECON-CODE-08 | 10-06 | HuggingFace scanning | SATISFIED | huggingface.go + test |
| RECON-CODE-09 | 10-08 | Kaggle scanning | SATISFIED | kaggle.go + test |
| RECON-CODE-10 | 10-07 | CodePen/JSFiddle/StackBlitz/Glitch/Observable | SATISFIED | sandboxes.go + test |

### Anti-Patterns Found

| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| cmd/recon.go | 84 | `_ = eng` unused parameter assignment | Info | Cosmetic; kept for API symmetry per comment |

No TODOs, FIXMEs, placeholders, or empty implementations found in any Phase 10 file.

### Human Verification Required

None. All gaps have been closed with programmatically verifiable changes.

### Gaps Summary

Both gaps from the initial verification have been closed:

1. **--sources flag:** `reconFullCmd` now declares a `--sources` StringSlice flag (line 174). When provided, `filterEngineSources` (lines 67-86) uses the new `Engine.Get(name)` method (engine.go:37-42) to rebuild a filtered engine containing only the requested sources. This satisfies SCs 1-4 which require `recon --sources=github,gitlab` syntax.

2. **Database persistence:** `persistReconFindings` (lines 90-115) maps deduped `engine.Finding` structs to `storage.Finding` structs and calls `db.SaveFinding` for each one. The function is invoked at line 56, gated by `!reconNoPersist` (opt-out via `--no-persist` flag). This satisfies SC5 which requires findings stored in DB with source attribution and dedup.

No regressions detected. All 10 source implementations, RegisterAll wiring, integration test, and previously-passing artifacts remain intact.

---

_Verified: 2026-04-06T08:37:18Z_
_Verifier: Claude (gsd-verifier)_
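
### Appendix: Illustrative Sketch of Source Filtering

The Gaps Summary describes `--sources` filtering as a lookup-and-rebuild step: each requested name is resolved through a `Get(name)` call that returns a source and a boolean, and the matches are registered on a fresh engine. The sketch below illustrates that pattern only. It is not the code in cmd/recon.go; the `Source`, `Engine`, `namedSource`, and `FilterSources` names are hypothetical stand-ins, and returning an error for unknown names is an assumed policy that the report does not specify.

```go
package main

import (
	"fmt"
	"strings"
)

// Source is a hypothetical stand-in for the project's ReconSource interface.
type Source interface {
	Name() string
}

// namedSource is a trivial Source used only for this illustration.
type namedSource string

func (n namedSource) Name() string { return string(n) }

// Engine is a hypothetical registry keyed by source name.
type Engine struct {
	sources map[string]Source
	order   []string // registration order, so listings are deterministic
}

func NewEngine() *Engine { return &Engine{sources: map[string]Source{}} }

// Register adds a source, replacing any earlier source with the same name.
func (e *Engine) Register(s Source) {
	if _, ok := e.sources[s.Name()]; !ok {
		e.order = append(e.order, s.Name())
	}
	e.sources[s.Name()] = s
}

// Get mirrors the (source, bool) lookup shape described in the report.
func (e *Engine) Get(name string) (Source, bool) {
	s, ok := e.sources[name]
	return s, ok
}

// List returns a copy of the registered names in registration order.
func (e *Engine) List() []string { return append([]string(nil), e.order...) }

// FilterSources builds a new engine holding only the requested sources.
// Unknown names produce an error instead of being silently dropped;
// that policy is an assumption made for this sketch.
func FilterSources(full *Engine, names []string) (*Engine, error) {
	filtered := NewEngine()
	var unknown []string
	for _, raw := range names {
		name := strings.ToLower(strings.TrimSpace(raw))
		if name == "" {
			continue // tolerate stray commas such as "github,,gitlab"
		}
		s, ok := full.Get(name)
		if !ok {
			unknown = append(unknown, name)
			continue
		}
		filtered.Register(s)
	}
	if len(unknown) > 0 {
		return nil, fmt.Errorf("unknown sources: %s", strings.Join(unknown, ", "))
	}
	return filtered, nil
}

func main() {
	full := NewEngine()
	for _, n := range []string{"github", "gitlab", "huggingface"} {
		full.Register(namedSource(n))
	}
	filtered, err := FilterSources(full, []string{"github", " GitLab "})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(filtered.List())
}
```

Rebuilding a second engine rather than mutating the original keeps the fully registered engine available for other commands, which is one plausible reason to prefer this shape.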