Compare commits
2 Commits
1acbedc03a...3aadeb2d1c
| Author | SHA1 | Date |
|---|---|---|
| | 3aadeb2d1c | |
| | 118decbb3e | |
@@ -336,7 +336,7 @@ Phases execute in numeric order: 1 → 2 → 3 → ... → 18
|
|||||||
| 7. Import Adapters & CI/CD Integration | 0/? | Not started | - |
|
| 7. Import Adapters & CI/CD Integration | 0/? | Not started | - |
|
||||||
| 8. Dork Engine | 0/? | Not started | - |
|
| 8. Dork Engine | 0/? | Not started | - |
|
||||||
| 9. OSINT Infrastructure | 2/6 | In Progress| |
|
| 9. OSINT Infrastructure | 2/6 | In Progress| |
|
||||||
| 10. OSINT Code Hosting | 9/9 | Complete | 2026-04-05 |
|
| 10. OSINT Code Hosting | 9/9 | Complete | 2026-04-06 |
|
||||||
| 11. OSINT Search & Paste | 0/? | Not started | - |
|
| 11. OSINT Search & Paste | 0/? | Not started | - |
|
||||||
| 12. OSINT IoT & Cloud Storage | 0/? | Not started | - |
|
| 12. OSINT IoT & Cloud Storage | 0/? | Not started | - |
|
||||||
| 13. OSINT Package Registries & Container/IaC | 0/? | Not started | - |
|
| 13. OSINT Package Registries & Container/IaC | 0/? | Not started | - |
|
||||||
|
|||||||
@@ -4,8 +4,8 @@ milestone: v1.0
|
|||||||
milestone_name: milestone
|
milestone_name: milestone
|
||||||
status: executing
|
status: executing
|
||||||
stopped_at: Completed 10-09-PLAN.md
|
stopped_at: Completed 10-09-PLAN.md
|
||||||
last_updated: "2026-04-05T22:28:27.416Z"
|
last_updated: "2026-04-06T08:38:31.363Z"
|
||||||
last_activity: 2026-04-05
|
last_activity: 2026-04-06
|
||||||
progress:
|
progress:
|
||||||
total_phases: 18
|
total_phases: 18
|
||||||
completed_phases: 10
|
completed_phases: 10
|
||||||
@@ -25,10 +25,10 @@ See: .planning/PROJECT.md (updated 2026-04-04)
|
|||||||
|
|
||||||
## Current Position
|
## Current Position
|
||||||
|
|
||||||
Phase: 10 (osint-code-hosting) — EXECUTING
|
Phase: 11
|
||||||
Plan: 4 of 9
|
Plan: Not started
|
||||||
Status: Ready to execute
|
Status: Ready to execute
|
||||||
Last activity: 2026-04-05
|
Last activity: 2026-04-06
|
||||||
|
|
||||||
Progress: [██░░░░░░░░] 20%
|
Progress: [██░░░░░░░░] 20%
|
||||||
|
|
||||||
|
|||||||
128
.planning/phases/10-osint-code-hosting/10-VERIFICATION.md
Normal file
@@ -0,0 +1,128 @@
|
|||||||
|
---
|
||||||
|
phase: 10-osint-code-hosting
|
||||||
|
verified: 2026-04-06T08:37:18Z
|
||||||
|
status: passed
|
||||||
|
score: 5/5 must-haves verified
|
||||||
|
re_verification:
|
||||||
|
previous_status: gaps_found
|
||||||
|
previous_score: 3/5
|
||||||
|
gaps_closed:
|
||||||
|
- "`recon --sources=github,gitlab` executes dorks via APIs — `--sources` StringSlice flag now declared on reconFullCmd (line 174) and filterEngineSources rebuilds a filtered engine via Engine.Get (lines 67-86)"
|
||||||
|
- "All code hosting source findings are stored in the database with source attribution and deduplication — persistReconFindings (lines 90-115) calls storage.SaveFinding per deduped finding, gated by `--no-persist` opt-out flag"
|
||||||
|
gaps_remaining: []
|
||||||
|
regressions: []
|
||||||
|
---
|
||||||
|
|
||||||
|
# Phase 10: OSINT Code Hosting Verification Report
|
||||||
|
|
||||||
|
**Phase Goal:** Users can scan 10 code hosting platforms for leaked LLM API keys
|
||||||
|
**Verified:** 2026-04-06T08:37:18Z
|
||||||
|
**Status:** passed
|
||||||
|
**Re-verification:** Yes -- after gap closure (previous: gaps_found 3/5)
|
||||||
|
|
||||||
|
## Goal Achievement
|
||||||
|
|
||||||
|
### Observable Truths (from ROADMAP Success Criteria)
|
||||||
|
|
||||||
|
| # | Truth | Status | Evidence |
|
||||||
|
|---|-------|--------|----------|
|
||||||
|
| 1 | `recon --sources=github,gitlab` executes dorks via APIs and feeds detection pipeline | VERIFIED | `--sources` StringSlice flag declared at cmd/recon.go:174. reconFullCmd (lines 37-39) checks `reconSourcesFilter` and calls `filterEngineSources`, which uses `Engine.Get(name)` (engine.go:37-42) to rebuild a filtered engine containing only the named sources. GitHubSource and GitLabSource are substantive implementations (199 and 175 lines respectively) with real API calls. |
|
||||||
|
| 2 | `recon --sources=huggingface` scans HF Spaces and model repos | VERIFIED | HuggingFaceSource (huggingface.go, 181 lines) sweeps both `/api/spaces` and `/api/models`. Registered in register.go:56. `--sources=huggingface` would filter to this single source via filterEngineSources. Integration test asserts findings arrive from both endpoints. |
|
||||||
|
| 3 | `recon --sources=gist,bitbucket,codeberg` works | VERIFIED | GistSource (184 lines), BitbucketSource (174 lines), CodebergSource (167 lines) all implemented, registered (register.go:68-84), and exercised by integration test. `--sources` flag enables selecting any combination. |
|
||||||
|
| 4 | `recon --sources=replit,codesandbox,kaggle` works | VERIFIED | ReplitSource (141 lines), CodeSandboxSource (95 lines), KaggleSource (149 lines) all implemented, registered (register.go:86-97), and exercised by integration test. SandboxesSource (248 lines) also present for CodePen/JSFiddle/StackBlitz/Glitch/Observable. |
|
||||||
|
| 5 | Code hosting findings stored in DB with source attribution and dedup | VERIFIED | `persistReconFindings` (cmd/recon.go:90-115) iterates deduped findings and calls `storage.SaveFinding` (pkg/storage/findings.go:43) with correct field mapping including SourceType, ProviderName, KeyMasked. Called at line 56 gated by `!reconNoPersist`. Dedup via `recon.Dedup` at line 50. `openDBWithKey` (cmd/keys.go:410) provides DB handle with encryption key. |
|
||||||
|
|
||||||
|
**Score:** 5/5 truths VERIFIED
|
||||||
|
|
||||||
|
### Required Artifacts
|
||||||
|
|
||||||
|
All ten source files exist, are substantive, and are wired via RegisterAll (regression check -- unchanged from initial verification):
|
||||||
|
|
||||||
|
| Artifact | Expected | Status | Details |
|
||||||
|
|----------|----------|--------|---------|
|
||||||
|
| `pkg/recon/sources/github.go` | GitHubSource | VERIFIED | 199 lines, /search/code API |
|
||||||
|
| `pkg/recon/sources/gitlab.go` | GitLabSource | VERIFIED | 175 lines, /api/v4/search |
|
||||||
|
| `pkg/recon/sources/bitbucket.go` | BitbucketSource | VERIFIED | 174 lines, /2.0/workspaces search |
|
||||||
|
| `pkg/recon/sources/gist.go` | GistSource | VERIFIED | 184 lines, /gists/public enumeration |
|
||||||
|
| `pkg/recon/sources/codeberg.go` | CodebergSource | VERIFIED | 167 lines, /api/v1/repos/search |
|
||||||
|
| `pkg/recon/sources/huggingface.go` | HuggingFaceSource | VERIFIED | 181 lines, /api/spaces + /api/models |
|
||||||
|
| `pkg/recon/sources/replit.go` | ReplitSource | VERIFIED | 141 lines, HTML scraper |
|
||||||
|
| `pkg/recon/sources/codesandbox.go` | CodeSandboxSource | VERIFIED | 95 lines, HTML scraper |
|
||||||
|
| `pkg/recon/sources/sandboxes.go` | SandboxesSource | VERIFIED | 248 lines, multi-platform aggregator |
|
||||||
|
| `pkg/recon/sources/kaggle.go` | KaggleSource | VERIFIED | 149 lines, /api/v1/kernels/list |
|
||||||
|
| `pkg/recon/sources/register.go` | RegisterAll | VERIFIED | 10 engine.Register calls (lines 54-97) |
|
||||||
|
| `pkg/recon/sources/integration_test.go` | E2E SweepAll test | VERIFIED | 240 lines, httptest multiplexed server |
|
||||||
|
| `pkg/recon/engine.go` | Engine with Get() method | VERIFIED | Get(name) at lines 37-42, returns (ReconSource, bool) |
|
||||||
|
| `cmd/recon.go` | CLI with --sources flag + DB persistence | VERIFIED | --sources at line 174, filterEngineSources at lines 67-86, persistReconFindings at lines 90-115 |
|
||||||
|
|
||||||
|
### Key Link Verification
|
||||||
|
|
||||||
|
| From | To | Via | Status | Details |
|
||||||
|
|------|----|----|--------|---------|
|
||||||
|
| cmd/recon.go | pkg/recon/sources | sources.RegisterAll(e, cfg) | WIRED | Line 157 in buildReconEngine |
|
||||||
|
| register.go | all 10 sources | engine.Register(...) | WIRED | 10 Register calls (lines 54-97) |
|
||||||
|
| each source | httpclient.go | Client.Do(ctx, req) | WIRED | Shared retrying client in every source |
|
||||||
|
| each source | recon.LimiterRegistry | Limiters.Wait(...) | WIRED | Rate limiting in every Sweep loop |
|
||||||
|
| Sweep outputs | cmd/recon.go | out chan <- recon.Finding -> SweepAll -> Dedup | WIRED | reconFullCmd collects + dedups |
|
||||||
|
| cmd/recon.go | --sources filter | reconSourcesFilter -> filterEngineSources -> Engine.Get | WIRED | Flag at line 174, filter at lines 37-39, rebuild at lines 67-86 |
|
||||||
|
| cmd/recon.go findings | pkg/storage | persistReconFindings -> openDBWithKey -> db.SaveFinding | WIRED | Lines 55-59 call persistReconFindings, which calls storage.SaveFinding per finding (lines 97-112) |
|
||||||
|
|
||||||
|
### Data-Flow Trace (Level 4)
|
||||||
|
|
||||||
|
| Artifact | Data Variable | Source | Produces Real Data | Status |
|
||||||
|
|----------|---------------|--------|--------------------|--------|
|
||||||
|
| All 10 sources | Finding structs | API JSON / HTML scraping | Yes (integration test asserts non-empty findings per SourceType) | FLOWING |
|
||||||
|
| cmd/recon.go dedup | deduped slice | recon.Dedup(all) from SweepAll | Yes | FLOWING |
|
||||||
|
| cmd/recon.go persist | storage.Finding | persistReconFindings maps engine.Finding -> storage.Finding | Yes -- SaveFinding inserts with ProviderName, SourceType, KeyMasked, etc. | FLOWING |
|
||||||
|
|
||||||
|
### Behavioral Spot-Checks
|
||||||
|
|
||||||
|
| Behavior | Command | Result | Status |
|
||||||
|
|----------|---------|--------|--------|
|
||||||
|
| `go build ./...` succeeds | `go build ./...` | exit 0, clean | PASS |
|
||||||
|
| --sources flag declared | grep StringSliceVar cmd/recon.go | Found at line 174 | PASS |
|
||||||
|
| persistReconFindings calls SaveFinding | grep SaveFinding cmd/recon.go | Found at line 110 | PASS |
|
||||||
|
| Engine.Get method exists | grep "func.*Get" pkg/recon/engine.go | Found at line 37 | PASS |
|
||||||
|
| storage.Finding has all mapped fields | grep SourceType pkg/storage/findings.go | SourceType field present at line 20 | PASS |
|
||||||
|
|
||||||
|
### Requirements Coverage
|
||||||
|
|
||||||
|
| Requirement | Source Plan | Description | Status | Evidence |
|
||||||
|
|-------------|-------------|-------------|--------|----------|
|
||||||
|
| RECON-CODE-01 | 10-02 | GitHub code search | SATISFIED | github.go + test |
|
||||||
|
| RECON-CODE-02 | 10-03 | GitLab code search | SATISFIED | gitlab.go + test |
|
||||||
|
| RECON-CODE-03 | 10-04 | GitHub Gist search | SATISFIED | gist.go + test |
|
||||||
|
| RECON-CODE-04 | 10-04 | Bitbucket code search | SATISFIED | bitbucket.go + test |
|
||||||
|
| RECON-CODE-05 | 10-05 | Codeberg/Gitea search | SATISFIED | codeberg.go + test |
|
||||||
|
| RECON-CODE-06 | 10-07 | Replit scanning | SATISFIED | replit.go + test |
|
||||||
|
| RECON-CODE-07 | 10-07 | CodeSandbox scanning | SATISFIED | codesandbox.go + test |
|
||||||
|
| RECON-CODE-08 | 10-06 | HuggingFace scanning | SATISFIED | huggingface.go + test |
|
||||||
|
| RECON-CODE-09 | 10-08 | Kaggle scanning | SATISFIED | kaggle.go + test |
|
||||||
|
| RECON-CODE-10 | 10-07 | CodePen/JSFiddle/StackBlitz/Glitch/Observable | SATISFIED | sandboxes.go + test |
|
||||||
|
|
||||||
|
### Anti-Patterns Found
|
||||||
|
|
||||||
|
| File | Line | Pattern | Severity | Impact |
|
||||||
|
|------|------|---------|----------|--------|
|
||||||
|
| cmd/recon.go | 84 | `_ = eng` unused parameter assignment | Info | Cosmetic; kept for API symmetry per comment |
|
||||||
|
|
||||||
|
No TODOs, FIXMEs, placeholders, or empty implementations found in any Phase 10 file.
|
||||||
|
|
||||||
|
### Human Verification Required
|
||||||
|
|
||||||
|
None. All gaps have been closed with programmatically verifiable changes.
|
||||||
|
|
||||||
|
### Gaps Summary
|
||||||
|
|
||||||
|
Both gaps from the initial verification have been closed:
|
||||||
|
|
||||||
|
1. **--sources flag:** `reconFullCmd` now declares a `--sources` StringSlice flag (line 174). When provided, `filterEngineSources` (lines 67-86) uses the new `Engine.Get(name)` method (engine.go:37-42) to rebuild a filtered engine containing only the requested sources. This satisfies SCs 1-4, which require the `recon --sources=github,gitlab` syntax.
|
||||||
|
|
||||||
|
2. **Database persistence:** `persistReconFindings` (lines 90-115) maps deduped `engine.Finding` structs to `storage.Finding` structs and calls `db.SaveFinding` for each one. The function is invoked at line 56, gated by `!reconNoPersist` (opt-out via the `--no-persist` flag). This satisfies SC5, which requires findings to be stored in the DB with source attribution and dedup.
|
||||||
|
|
||||||
|
No regressions detected. All 10 source implementations, RegisterAll wiring, integration test, and previously-passing artifacts remain intact.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
_Verified: 2026-04-06T08:37:18Z_
|
||||||
|
_Verifier: Claude (gsd-verifier)_
|
||||||
70
cmd/recon.go
@@ -4,10 +4,13 @@ import (
|
|||||||
"context"
|
"context"
|
||||||
"fmt"
|
"fmt"
|
||||||
"os"
|
"os"
|
||||||
|
"strings"
|
||||||
|
|
||||||
|
"github.com/salvacybersec/keyhunter/pkg/engine"
|
||||||
"github.com/salvacybersec/keyhunter/pkg/providers"
|
"github.com/salvacybersec/keyhunter/pkg/providers"
|
||||||
"github.com/salvacybersec/keyhunter/pkg/recon"
|
"github.com/salvacybersec/keyhunter/pkg/recon"
|
||||||
"github.com/salvacybersec/keyhunter/pkg/recon/sources"
|
"github.com/salvacybersec/keyhunter/pkg/recon/sources"
|
||||||
|
"github.com/salvacybersec/keyhunter/pkg/storage"
|
||||||
"github.com/spf13/cobra"
|
"github.com/spf13/cobra"
|
||||||
"github.com/spf13/viper"
|
"github.com/spf13/viper"
|
||||||
)
|
)
|
||||||
@@ -16,6 +19,8 @@ var (
|
|||||||
reconStealth bool
|
reconStealth bool
|
||||||
reconRespectRobots bool
|
reconRespectRobots bool
|
||||||
reconQuery string
|
reconQuery string
|
||||||
|
reconSourcesFilter []string
|
||||||
|
reconNoPersist bool
|
||||||
)
|
)
|
||||||
|
|
||||||
var reconCmd = &cobra.Command{
|
var reconCmd = &cobra.Command{
|
||||||
@@ -26,9 +31,12 @@ var reconCmd = &cobra.Command{
|
|||||||
|
|
||||||
var reconFullCmd = &cobra.Command{
|
var reconFullCmd = &cobra.Command{
|
||||||
Use: "full",
|
Use: "full",
|
||||||
Short: "Sweep all enabled sources in parallel and deduplicate findings",
|
Short: "Sweep enabled sources in parallel, deduplicate findings, and persist to DB",
|
||||||
RunE: func(cmd *cobra.Command, args []string) error {
|
RunE: func(cmd *cobra.Command, args []string) error {
|
||||||
eng := buildReconEngine()
|
eng := buildReconEngine()
|
||||||
|
if len(reconSourcesFilter) > 0 {
|
||||||
|
eng = filterEngineSources(eng, reconSourcesFilter)
|
||||||
|
}
|
||||||
cfg := recon.Config{
|
cfg := recon.Config{
|
||||||
Stealth: reconStealth,
|
Stealth: reconStealth,
|
||||||
RespectRobots: reconRespectRobots,
|
RespectRobots: reconRespectRobots,
|
||||||
@@ -44,10 +52,68 @@ var reconFullCmd = &cobra.Command{
|
|||||||
for _, f := range deduped {
|
for _, f := range deduped {
|
||||||
fmt.Printf(" [%s] %s %s %s\n", f.SourceType, f.ProviderName, f.KeyMasked, f.Source)
|
fmt.Printf(" [%s] %s %s %s\n", f.SourceType, f.ProviderName, f.KeyMasked, f.Source)
|
||||||
}
|
}
|
||||||
|
if !reconNoPersist && len(deduped) > 0 {
|
||||||
|
if err := persistReconFindings(deduped); err != nil {
|
||||||
|
fmt.Fprintf(os.Stderr, "recon: warning: failed to persist findings: %v\n", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
return nil
|
return nil
|
||||||
},
|
},
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// filterEngineSources rebuilds an Engine containing only the sources named in filter.
|
||||||
|
// Unknown names are silently skipped to avoid breaking on typos — the user sees the
|
||||||
|
// remaining count in the sweep summary.
|
||||||
|
func filterEngineSources(eng *recon.Engine, filter []string) *recon.Engine {
|
||||||
|
want := make(map[string]bool, len(filter))
|
||||||
|
for _, name := range filter {
|
||||||
|
want[strings.TrimSpace(name)] = true
|
||||||
|
}
|
||||||
|
filtered := recon.NewEngine()
|
||||||
|
// We can't introspect source structs out of the original engine, so rebuild
|
||||||
|
// fresh and re-register only what matches. This relies on buildReconEngine
|
||||||
|
// being idempotent and cheap.
|
||||||
|
fresh := buildReconEngine()
|
||||||
|
for _, name := range fresh.List() {
|
||||||
|
if want[name] {
|
||||||
|
if src, ok := fresh.Get(name); ok {
|
||||||
|
filtered.Register(src)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
_ = eng // parameter kept for API symmetry; filtered engine replaces it
|
||||||
|
return filtered
|
||||||
|
}
|
||||||
|
|
||||||
|
// persistReconFindings writes deduplicated findings to the SQLite findings table.
|
||||||
|
// Uses the same encryption key derivation as the scan command.
|
||||||
|
func persistReconFindings(findings []engine.Finding) error {
|
||||||
|
db, encKey, err := openDBWithKey()
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
defer db.Close()
|
||||||
|
for _, f := range findings {
|
||||||
|
sf := storage.Finding{
|
||||||
|
ProviderName: f.ProviderName,
|
||||||
|
KeyValue: f.KeyValue,
|
||||||
|
KeyMasked: f.KeyMasked,
|
||||||
|
Confidence: f.Confidence,
|
||||||
|
SourcePath: f.Source,
|
||||||
|
SourceType: f.SourceType,
|
||||||
|
LineNumber: f.LineNumber,
|
||||||
|
Verified: f.Verified,
|
||||||
|
VerifyStatus: f.VerifyStatus,
|
||||||
|
VerifyHTTPCode: f.VerifyHTTPCode,
|
||||||
|
VerifyMetadata: f.VerifyMetadata,
|
||||||
|
}
|
||||||
|
if _, err := db.SaveFinding(sf, encKey); err != nil {
|
||||||
|
return fmt.Errorf("save finding: %w", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
var reconListCmd = &cobra.Command{
|
var reconListCmd = &cobra.Command{
|
||||||
Use: "list",
|
Use: "list",
|
||||||
Short: "List registered recon sources",
|
Short: "List registered recon sources",
|
||||||
@@ -105,6 +171,8 @@ func init() {
|
|||||||
reconFullCmd.Flags().BoolVar(&reconStealth, "stealth", false, "enable UA rotation and jitter delays")
|
reconFullCmd.Flags().BoolVar(&reconStealth, "stealth", false, "enable UA rotation and jitter delays")
|
||||||
reconFullCmd.Flags().BoolVar(&reconRespectRobots, "respect-robots", true, "respect robots.txt for web-scraping sources")
|
reconFullCmd.Flags().BoolVar(&reconRespectRobots, "respect-robots", true, "respect robots.txt for web-scraping sources")
|
||||||
reconFullCmd.Flags().StringVar(&reconQuery, "query", "", "override query sent to each source")
|
reconFullCmd.Flags().StringVar(&reconQuery, "query", "", "override query sent to each source")
|
||||||
|
reconFullCmd.Flags().StringSliceVar(&reconSourcesFilter, "sources", nil, "comma-separated list of sources to run (e.g., github,gitlab)")
|
||||||
|
reconFullCmd.Flags().BoolVar(&reconNoPersist, "no-persist", false, "do not write findings to the database (print only)")
|
||||||
reconCmd.AddCommand(reconFullCmd)
|
reconCmd.AddCommand(reconFullCmd)
|
||||||
reconCmd.AddCommand(reconListCmd)
|
reconCmd.AddCommand(reconListCmd)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -33,6 +33,14 @@ func (e *Engine) Register(s ReconSource) {
|
|||||||
e.sources[s.Name()] = s
|
e.sources[s.Name()] = s
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Get returns a registered source by name and true, or nil and false.
|
||||||
|
func (e *Engine) Get(name string) (ReconSource, bool) {
|
||||||
|
e.mu.RLock()
|
||||||
|
defer e.mu.RUnlock()
|
||||||
|
s, ok := e.sources[name]
|
||||||
|
return s, ok
|
||||||
|
}
|
||||||
|
|
||||||
// List returns registered source names in sorted order.
|
// List returns registered source names in sorted order.
|
||||||
func (e *Engine) List() []string {
|
func (e *Engine) List() []string {
|
||||||
e.mu.RLock()
|
e.mu.RLock()
|
||||||
|
|||||||