| phase | verified | status | score | re_verification |
|---|---|---|---|---|
| 10-osint-code-hosting | 2026-04-06T08:37:18Z | passed | 5/5 must-haves verified | |
Phase 10: OSINT Code Hosting Verification Report
Phase Goal: Users can scan 10 code hosting platforms for leaked LLM API keys
Verified: 2026-04-06T08:37:18Z
Status: passed
Re-verification: Yes -- after gap closure (previous: gaps_found 3/5)
Goal Achievement
Observable Truths (from ROADMAP Success Criteria)
| # | Truth | Status | Evidence |
|---|---|---|---|
| 1 | recon --sources=github,gitlab executes dorks via APIs and feeds detection pipeline | VERIFIED | --sources StringSlice flag declared at cmd/recon.go:174. reconFullCmd (lines 37-39) checks reconSourcesFilter and calls filterEngineSources, which uses Engine.Get(name) (engine.go:37-42) to rebuild a filtered engine containing only named sources. GitHubSource and GitLabSource are substantive implementations (199 and 175 lines respectively) with real API calls. |
| 2 | recon --sources=huggingface scans HF Spaces and model repos | VERIFIED | HuggingFaceSource (huggingface.go, 181 lines) sweeps both /api/spaces and /api/models. Registered in register.go:56. --sources=huggingface would filter to this single source via filterEngineSources. Integration test asserts findings arrive from both endpoints. |
| 3 | recon --sources=gist,bitbucket,codeberg works | VERIFIED | GistSource (184 lines), BitbucketSource (174 lines), CodebergSource (167 lines) all implemented, registered (register.go:68-84), and exercised by integration test. --sources flag enables selecting any combination. |
| 4 | recon --sources=replit,codesandbox,kaggle works | VERIFIED | ReplitSource (141 lines), CodeSandboxSource (95 lines), KaggleSource (149 lines) all implemented, registered (register.go:86-97), and exercised by integration test. SandboxesSource (248 lines) also present for CodePen/JSFiddle/StackBlitz/Glitch/Observable. |
| 5 | Code hosting findings stored in DB with source attribution and dedup | VERIFIED | persistReconFindings (cmd/recon.go:90-115) iterates deduped findings and calls storage.SaveFinding (pkg/storage/findings.go:43) with correct field mapping including SourceType, ProviderName, KeyMasked. Called at line 56 gated by !reconNoPersist. Dedup via recon.Dedup at line 50. openDBWithKey (cmd/keys.go:410) provides DB handle with encryption key. |
Score: 5/5 truths VERIFIED
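
The source filtering behind truths 1-4 can be illustrated with a minimal sketch. It is not the project's code: the `ReconSource` interface shape, `NewEngine`, `Names`, `stubSource`, the two-argument `filterEngineSources` signature, and the choice to return an error for unknown names are all assumptions made for illustration. Only the idea of looking each name up through `Get(name)` and registering it on a fresh engine comes from the evidence above.

```go
package main

import (
	"fmt"
	"sort"
)

// ReconSource is a hypothetical stand-in for the project's source interface.
type ReconSource interface {
	Name() string
}

// Engine is a simplified registry keyed by source name.
type Engine struct {
	sources map[string]ReconSource
}

// NewEngine returns an empty registry (hypothetical constructor).
func NewEngine() *Engine { return &Engine{sources: make(map[string]ReconSource)} }

// Register adds a source under its own name.
func (e *Engine) Register(s ReconSource) { e.sources[s.Name()] = s }

// Get returns the named source and whether it was registered.
func (e *Engine) Get(name string) (ReconSource, bool) {
	s, ok := e.sources[name]
	return s, ok
}

// Names lists registered source names in sorted order.
func (e *Engine) Names() []string {
	names := make([]string, 0, len(e.sources))
	for n := range e.sources {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// filterEngineSources builds a new engine holding only the requested sources.
// Rejecting unknown names is an assumption; the real function may behave differently.
func filterEngineSources(full *Engine, names []string) (*Engine, error) {
	filtered := NewEngine()
	for _, n := range names {
		s, ok := full.Get(n)
		if !ok {
			return nil, fmt.Errorf("unknown source %q", n)
		}
		filtered.Register(s)
	}
	return filtered, nil
}

// stubSource is a test double that only carries a name.
type stubSource string

func (s stubSource) Name() string { return string(s) }

func main() {
	full := NewEngine()
	for _, n := range []string{"github", "gitlab", "huggingface"} {
		full.Register(stubSource(n))
	}
	filtered, err := filterEngineSources(full, []string{"github", "gitlab"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(filtered.Names()) // prints [github gitlab]
}
```

Rebuilding a second engine, rather than mutating the full one, keeps the complete registry available for a later unfiltered run.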
Required Artifacts
All ten source files exist, are substantive, and are wired via RegisterAll (regression check -- unchanged from initial verification):
| Artifact | Expected | Status | Details |
|---|---|---|---|
| pkg/recon/sources/github.go | GitHubSource | VERIFIED | 199 lines, /search/code API |
| pkg/recon/sources/gitlab.go | GitLabSource | VERIFIED | 175 lines, /api/v4/search |
| pkg/recon/sources/bitbucket.go | BitbucketSource | VERIFIED | 174 lines, /2.0/workspaces search |
| pkg/recon/sources/gist.go | GistSource | VERIFIED | 184 lines, /gists/public enumeration |
| pkg/recon/sources/codeberg.go | CodebergSource | VERIFIED | 167 lines, /api/v1/repos/search |
| pkg/recon/sources/huggingface.go | HuggingFaceSource | VERIFIED | 181 lines, /api/spaces + /api/models |
| pkg/recon/sources/replit.go | ReplitSource | VERIFIED | 141 lines, HTML scraper |
| pkg/recon/sources/codesandbox.go | CodeSandboxSource | VERIFIED | 95 lines, HTML scraper |
| pkg/recon/sources/sandboxes.go | SandboxesSource | VERIFIED | 248 lines, multi-platform aggregator |
| pkg/recon/sources/kaggle.go | KaggleSource | VERIFIED | 149 lines, /api/v1/kernels/list |
| pkg/recon/sources/register.go | RegisterAll | VERIFIED | 10 engine.Register calls (lines 54-97) |
| pkg/recon/sources/integration_test.go | E2E SweepAll test | VERIFIED | 240 lines, httptest multiplexed server |
| pkg/recon/engine.go | Engine with Get() method | VERIFIED | Get(name) at lines 37-42, returns (ReconSource, bool) |
| cmd/recon.go | CLI with --sources flag + DB persistence | VERIFIED | --sources at line 174, filterEngineSources at lines 67-86, persistReconFindings at lines 90-115 |
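
As a rough illustration of the "httptest multiplexed server" approach noted for the integration test, the sketch below serves two of the listed endpoint paths from a single `httptest.Server`. The JSON payload shape (a bare array of strings), the helper names `newFakeHosts` and `countHits`, and the sample hit values are assumptions; the real test and the real platform APIs return richer responses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newFakeHosts starts one test server that answers for several platform
// endpoints, so every source under test can point at the same base URL.
func newFakeHosts() *httptest.Server {
	mux := http.NewServeMux()
	serve := func(path string, hits []string) {
		mux.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
			w.Header().Set("Content-Type", "application/json")
			_ = json.NewEncoder(w).Encode(hits)
		})
	}
	// Paths taken from the artifact table; payloads are invented fixtures.
	serve("/search/code", []string{"repo-a/config.py"})
	serve("/api/v4/search", []string{"group/proj/.env", "group/proj/app.js"})
	return httptest.NewServer(mux)
}

// countHits fetches a URL and returns how many entries the JSON array holds.
func countHits(url string) (int, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var hits []string
	if err := json.NewDecoder(resp.Body).Decode(&hits); err != nil {
		return 0, err
	}
	return len(hits), nil
}

func main() {
	srv := newFakeHosts()
	defer srv.Close()
	for _, path := range []string{"/search/code", "/api/v4/search"} {
		n, err := countHits(srv.URL + path)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s -> %d hit(s)\n", path, n)
	}
}
```

Routing every platform through one mux keeps the test hermetic: no source ever contacts a real host, and a missing route surfaces as a 404 instead of a silent pass.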
Key Link Verification
| From | To | Via | Status | Details |
|---|---|---|---|---|
| cmd/recon.go | pkg/recon/sources | sources.RegisterAll(e, cfg) | WIRED | Line 157 in buildReconEngine |
| register.go | all 10 sources | engine.Register(...) | WIRED | 10 Register calls (lines 54-97) |
| each source | httpclient.go | Client.Do(ctx, req) | WIRED | Shared retrying client in every source |
| each source | recon.LimiterRegistry | Limiters.Wait(...) | WIRED | Rate limiting in every Sweep loop |
| Sweep outputs | cmd/recon.go | out chan <- recon.Finding -> SweepAll -> Dedup | WIRED | reconFullCmd collects + dedups |
| cmd/recon.go | --sources filter | reconSourcesFilter -> filterEngineSources -> Engine.Get | WIRED | Flag at line 174, filter at lines 37-39, rebuild at lines 67-86 |
| cmd/recon.go findings | pkg/storage | persistReconFindings -> openDBWithKey -> db.SaveFinding | WIRED | Lines 55-59 call persistReconFindings, which calls storage.SaveFinding per finding (lines 97-112) |
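
The "Sweep outputs" link (sources write to a channel, SweepAll collects, Dedup collapses repeats) might look roughly like the following. This is a sketch under stated assumptions: the `Finding` fields beyond those named in this report, the `Source` interface shape, `staticSource`, `runDemo`, and the whole-struct dedup key are all hypothetical, and rate limiting and error reporting are left out.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Finding is a simplified, comparable stand-in for the project's finding type.
type Finding struct {
	SourceType   string
	ProviderName string
	KeyMasked    string
}

// Source is a hypothetical interface: each source pushes findings to out.
type Source interface {
	Sweep(ctx context.Context, out chan<- Finding) error
}

// SweepAll runs every source concurrently and collects everything they emit.
func SweepAll(ctx context.Context, sources []Source) []Finding {
	out := make(chan Finding)
	var wg sync.WaitGroup
	for _, s := range sources {
		wg.Add(1)
		go func(s Source) {
			defer wg.Done()
			_ = s.Sweep(ctx, out) // errors are ignored in this sketch
		}(s)
	}
	// Close the channel once all sources finish so the collector loop ends.
	go func() {
		wg.Wait()
		close(out)
	}()
	var all []Finding
	for f := range out {
		all = append(all, f)
	}
	return all
}

// Dedup keeps the first occurrence of each identical finding.
// Using the whole struct as the key is an assumption.
func Dedup(in []Finding) []Finding {
	seen := make(map[Finding]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		if _, dup := seen[f]; dup {
			continue
		}
		seen[f] = struct{}{}
		out = append(out, f)
	}
	return out
}

// staticSource emits a fixed list of findings, honouring cancellation.
type staticSource struct{ findings []Finding }

func (s staticSource) Sweep(ctx context.Context, out chan<- Finding) error {
	for _, f := range s.findings {
		select {
		case out <- f:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

// runDemo sweeps two sources that share one finding and reports the counts.
func runDemo() (total, unique int) {
	shared := Finding{"github", "openai", "sk-a...wxyz"}
	other := Finding{"gitlab", "anthropic", "sk-b...1234"}
	all := SweepAll(context.Background(), []Source{
		staticSource{[]Finding{shared}},
		staticSource{[]Finding{shared, other}},
	})
	return len(all), len(Dedup(all))
}

func main() {
	total, unique := runDemo()
	fmt.Println(total, unique) // prints 3 2
}
```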
Data-Flow Trace (Level 4)
| Artifact | Data Variable | Source | Produces Real Data | Status |
|---|---|---|---|---|
| All 10 sources | Finding structs | API JSON / HTML scraping | Yes (integration test asserts non-empty findings per SourceType) | FLOWING |
| cmd/recon.go dedup | deduped slice | recon.Dedup(all) from SweepAll | Yes | FLOWING |
| cmd/recon.go persist | storage.Finding | persistReconFindings maps engine.Finding -> storage.Finding | Yes -- SaveFinding inserts with ProviderName, SourceType, KeyMasked, etc. | FLOWING |
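
The persist row of the trace maps each deduped finding to a storage record and saves it. A minimal sketch of that mapping follows. The `saver` interface, `memStore`, `persistFindings`, and both struct definitions are hypothetical; only the field names SourceType, ProviderName, and KeyMasked and the opt-out gate come from this report, and the sketch assumes the key is already masked before it reaches this step.

```go
package main

import "fmt"

// ReconFinding stands in for the engine-side finding (assumed shape).
type ReconFinding struct {
	SourceType   string
	ProviderName string
	KeyMasked    string
}

// StoredFinding stands in for the storage-side record (assumed shape).
type StoredFinding struct {
	SourceType   string
	ProviderName string
	KeyMasked    string
}

// saver is a hypothetical abstraction over the database handle.
type saver interface {
	SaveFinding(f StoredFinding) error
}

// memStore is an in-memory saver used only for this illustration.
type memStore struct{ rows []StoredFinding }

func (m *memStore) SaveFinding(f StoredFinding) error {
	m.rows = append(m.rows, f)
	return nil
}

// persistFindings copies each finding into a storage record and saves it,
// returning how many records were written before any error.
func persistFindings(db saver, findings []ReconFinding) (int, error) {
	saved := 0
	for _, f := range findings {
		row := StoredFinding{
			SourceType:   f.SourceType,
			ProviderName: f.ProviderName,
			KeyMasked:    f.KeyMasked,
		}
		if err := db.SaveFinding(row); err != nil {
			return saved, fmt.Errorf("save finding from %s: %w", f.SourceType, err)
		}
		saved++
	}
	return saved, nil
}

func main() {
	noPersist := false // mirrors the opt-out gate described in the report
	store := &memStore{}
	deduped := []ReconFinding{
		{"github", "openai", "sk-a...wxyz"},
		{"huggingface", "anthropic", "sk-b...1234"},
	}
	if !noPersist {
		n, err := persistFindings(store, deduped)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println("saved", n, "findings") // prints: saved 2 findings
	}
}
```

Hiding the database behind a small interface is a design choice of the sketch; it lets the mapping be exercised without opening an encrypted database.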
Behavioral Spot-Checks
| Behavior | Command | Result | Status |
|---|---|---|---|
| go build ./... succeeds | go build ./... | exit 0, clean | PASS |
| --sources flag declared | grep StringSliceVar cmd/recon.go | Found at line 174 | PASS |
| persistReconFindings calls SaveFinding | grep SaveFinding cmd/recon.go | Found at line 110 | PASS |
| Engine.Get method exists | grep "func.*Get" pkg/recon/engine.go | Found at line 37 | PASS |
| storage.Finding has all mapped fields | grep SourceType pkg/storage/findings.go | SourceType field present at line 20 | PASS |
Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|---|---|---|---|---|
| RECON-CODE-01 | 10-02 | GitHub code search | SATISFIED | github.go + test |
| RECON-CODE-02 | 10-03 | GitLab code search | SATISFIED | gitlab.go + test |
| RECON-CODE-03 | 10-04 | GitHub Gist search | SATISFIED | gist.go + test |
| RECON-CODE-04 | 10-04 | Bitbucket code search | SATISFIED | bitbucket.go + test |
| RECON-CODE-05 | 10-05 | Codeberg/Gitea search | SATISFIED | codeberg.go + test |
| RECON-CODE-06 | 10-07 | Replit scanning | SATISFIED | replit.go + test |
| RECON-CODE-07 | 10-07 | CodeSandbox scanning | SATISFIED | codesandbox.go + test |
| RECON-CODE-08 | 10-06 | HuggingFace scanning | SATISFIED | huggingface.go + test |
| RECON-CODE-09 | 10-08 | Kaggle scanning | SATISFIED | kaggle.go + test |
| RECON-CODE-10 | 10-07 | CodePen/JSFiddle/StackBlitz/Glitch/Observable | SATISFIED | sandboxes.go + test |
Anti-Patterns Found
| File | Line | Pattern | Severity | Impact |
|---|---|---|---|---|
| cmd/recon.go | 84 | _ = eng unused parameter assignment | Info | Cosmetic; kept for API symmetry per comment |
No TODOs, FIXMEs, placeholders, or empty implementations found in any Phase 10 file.
Human Verification Required
None. All gaps have been closed with programmatically verifiable changes.
Gaps Summary
Both gaps from the initial verification have been closed:
- --sources flag: reconFullCmd now declares a --sources StringSlice flag (line 174). When provided, filterEngineSources (lines 67-86) uses the new Engine.Get(name) method (engine.go:37-42) to rebuild a filtered engine containing only the requested sources. This satisfies SCs 1-4, which require recon --sources=github,gitlab syntax.
- Database persistence: persistReconFindings (lines 90-115) maps deduped engine.Finding structs to storage.Finding structs and calls db.SaveFinding for each one. The function is invoked at line 56, gated by !reconNoPersist (opt-out via --no-persist flag). This satisfies SC5, which requires findings stored in DB with source attribution and dedup.
No regressions detected. All 10 source implementations, RegisterAll wiring, integration test, and previously-passing artifacts remain intact.
Verified: 2026-04-06T08:37:18Z
Verifier: Claude (gsd-verifier)