docs(09): create phase plan
@@ -197,7 +197,13 @@ Plans:
2. `keyhunter recon full --stealth` applies user-agent rotation and jitter delays to all sources; log output shows "source exhausted" events rather than silently returning empty results
3. `keyhunter recon full --respect-robots` (default on) respects robots.txt for web-scraping sources before making any requests
4. `keyhunter recon full` fans out to all enabled sources in parallel and deduplicates findings before persisting to the database
-**Plans**: TBD
+**Plans**: 6 plans
+- [ ] 09-01-PLAN.md — ReconSource interface + Engine skeleton + ExampleSource stub
+- [ ] 09-02-PLAN.md — LimiterRegistry per-source rate.Limiter + jitter
+- [ ] 09-03-PLAN.md — Stealth UA pool + cross-source dedup
+- [ ] 09-04-PLAN.md — robots.txt parser with 1h per-host cache
+- [ ] 09-05-PLAN.md — cmd/recon.go CLI tree (full, list)
+- [ ] 09-06-PLAN.md — Integration test + phase summary

### Phase 10: OSINT Code Hosting
**Goal**: Users can scan 10 code hosting platforms — GitHub, GitLab, Bitbucket, GitHub Gist, Codeberg/Gitea, Replit, CodeSandbox, HuggingFace, Kaggle, and miscellaneous code sandbox sites — for leaked LLM API keys

.planning/phases/09-osint-infrastructure/09-01-PLAN.md
@@ -0,0 +1,304 @@
---
phase: 09-osint-infrastructure
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/source.go
  - pkg/recon/engine.go
  - pkg/recon/example.go
  - pkg/recon/engine_test.go
autonomous: true
requirements: [RECON-INFRA-08]
must_haves:
  truths:
    - "pkg/recon package compiles with a ReconSource interface"
    - "Engine.Register adds a source; Engine.List returns registered names"
    - "Engine.SweepAll fans out to all enabled sources via ants pool and returns aggregated Findings"
    - "ExampleSource implements ReconSource end-to-end and emits a deterministic fake Finding"
  artifacts:
    - path: "pkg/recon/source.go"
      provides: "ReconSource interface + Finding type alias + Config struct"
      contains: "type ReconSource interface"
    - path: "pkg/recon/engine.go"
      provides: "Engine with Register, List, SweepAll (parallel fanout via ants)"
      contains: "func (e *Engine) SweepAll"
    - path: "pkg/recon/example.go"
      provides: "ExampleSource stub that emits hardcoded findings"
      contains: "type ExampleSource"
    - path: "pkg/recon/engine_test.go"
      provides: "Tests for Register/List/SweepAll with ExampleSource"
      contains: "func TestSweepAll"
  key_links:
    - from: "pkg/recon/engine.go"
      to: "github.com/panjf2000/ants/v2"
      via: "parallel source fanout"
      pattern: "ants\\.NewPool"
    - from: "pkg/recon/engine.go"
      to: "pkg/engine.Finding"
      via: "aliased as recon.Finding for SourceType=\"recon:*\""
      pattern: "engine\\.Finding"
---

<objective>
Create the pkg/recon/ package foundation: ReconSource interface, Engine orchestrator with parallel fanout via ants pool, and an ExampleSource stub that proves the pipeline end-to-end. This is the contract that all later sources (Phases 10-16) will implement.

Purpose: Establish the interface + engine skeleton so subsequent Wave 1 plans (limiter, stealth, robots) can land in parallel without conflict, and Wave 2 can wire the CLI.
Output: pkg/recon/source.go, pkg/recon/engine.go, pkg/recon/example.go, pkg/recon/engine_test.go
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/09-osint-infrastructure/09-CONTEXT.md
@pkg/engine/engine.go
@pkg/engine/finding.go

<interfaces>
<!-- Key types the executor needs. -->

From pkg/engine/finding.go:
```go
type Finding struct {
	ProviderName   string
	KeyValue       string
	KeyMasked      string
	Confidence     string
	Source         string
	SourceType     string // existing: "file","git","stdin","url","clipboard". New: "recon:<name>"
	LineNumber     int
	Offset         int64
	DetectedAt     time.Time
	Verified       bool
	VerifyStatus   string
	VerifyHTTPCode int
	VerifyMetadata map[string]string
	VerifyError    string
}
```

ants pool pattern (pkg/engine/engine.go): `ants.NewPool(workers)`, `pool.Submit(func(){...})`, `pool.Release()`, coordinated via `sync.WaitGroup`.
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: Define ReconSource interface and Config</name>
<files>pkg/recon/source.go</files>
<behavior>
- ReconSource interface has methods: Name() string, RateLimit() rate.Limit, Burst() int, RespectsRobots() bool, Enabled(cfg Config) bool, Sweep(ctx, query, out chan<- Finding) error
- Finding is a type alias for pkg/engine.Finding so downstream code reuses the existing storage path
- Config struct carries Stealth bool, RespectRobots bool, EnabledSources []string, Query string
</behavior>
<action>
Create pkg/recon/source.go with package recon. Import golang.org/x/time/rate, context, and pkg/engine.

```go
package recon

import (
	"context"

	"github.com/salvacybersec/keyhunter/pkg/engine"
	"golang.org/x/time/rate"
)

// Finding is the recon package's alias for the canonical engine.Finding.
// Recon sources set SourceType = "recon:<source-name>".
type Finding = engine.Finding

// Config controls a recon sweep.
type Config struct {
	Stealth        bool
	RespectRobots  bool
	EnabledSources []string // empty = all
	Query          string
}

// ReconSource is implemented by every OSINT source module (Phases 10-16).
// Each source owns its own rate.Limiter constructed from RateLimit()/Burst().
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```

Per Config decisions in 09-CONTEXT.md. No external deps beyond golang.org/x/time/rate (already in go.mod) and pkg/engine.
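`Config.EnabledSources` is documented as "empty = all", but ExampleSource simply returns `true` from `Enabled`. A real source would presumably check that list. Below is one possible shape. It assumes a hypothetical `enabledByConfig` helper and uses a local copy of `Config` so the snippet runs on its own.

```go
package main

import "fmt"

// Config mirrors recon.Config (local copy so this sketch is self-contained).
type Config struct {
	Stealth        bool
	RespectRobots  bool
	EnabledSources []string // empty = all
	Query          string
}

// enabledByConfig is a hypothetical helper that a source's Enabled method
// could delegate to. An empty EnabledSources list enables every source;
// otherwise the source must be listed by name.
func enabledByConfig(name string, cfg Config) bool {
	if len(cfg.EnabledSources) == 0 {
		return true
	}
	for _, n := range cfg.EnabledSources {
		if n == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(enabledByConfig("github", Config{}))                                              // true
	fmt.Println(enabledByConfig("github", Config{EnabledSources: []string{"gitlab"}}))            // false
	fmt.Println(enabledByConfig("github", Config{EnabledSources: []string{"gitlab", "github"}})) // true
}
```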
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./pkg/recon/...</automated>
</verify>
<done>pkg/recon/source.go compiles; ReconSource interface exported; Finding aliased to engine.Finding.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Engine with Register/List/SweepAll + ExampleSource + tests</name>
<files>pkg/recon/engine.go, pkg/recon/example.go, pkg/recon/engine_test.go</files>
<behavior>
- Engine.Register(src ReconSource) adds to internal map keyed by Name()
- Engine.List() returns sorted source names
- Engine.SweepAll(ctx, cfg) runs every enabled source in parallel via ants pool, collects Findings from a shared channel, and returns []Finding. Dedup is NOT done here (Plan 09-03 owns dedup.go); SweepAll just aggregates.
- Each source's Sweep call is submitted to the ants pool as its own task; a sync.WaitGroup closes the out channel once all sources finish
- ExampleSource.Name()="example", RateLimit()=rate.Limit(10), Burst()=1, RespectsRobots()=false, Enabled always true, Sweep emits two deterministic Findings with SourceType="recon:example"
- TestSweepAll registers ExampleSource, runs SweepAll, asserts exactly 2 findings with SourceType="recon:example"
- TestRegisterList asserts List() returns ["example"] after registering
</behavior>
<action>
Create pkg/recon/engine.go:

```go
package recon

import (
	"context"
	"sort"
	"sync"

	"github.com/panjf2000/ants/v2"
)

// Engine holds the registered recon sources, keyed by name.
type Engine struct {
	mu      sync.RWMutex
	sources map[string]ReconSource
}

// NewEngine returns an Engine with no registered sources.
func NewEngine() *Engine {
	return &Engine{sources: make(map[string]ReconSource)}
}

// Register adds s, replacing any source already registered under s.Name().
func (e *Engine) Register(s ReconSource) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.sources[s.Name()] = s
}

// List returns the registered source names in sorted order.
func (e *Engine) List() []string {
	e.mu.RLock()
	defer e.mu.RUnlock()
	names := make([]string, 0, len(e.sources))
	for n := range e.sources {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// SweepAll fans out to every enabled source in parallel via ants pool and
// returns aggregated findings. Deduplication is performed by callers using
// pkg/recon.Dedup (plan 09-03).
func (e *Engine) SweepAll(ctx context.Context, cfg Config) ([]Finding, error) {
	e.mu.RLock()
	active := make([]ReconSource, 0, len(e.sources))
	for _, s := range e.sources {
		if s.Enabled(cfg) {
			active = append(active, s)
		}
	}
	e.mu.RUnlock()

	if len(active) == 0 {
		return nil, nil
	}

	pool, err := ants.NewPool(len(active))
	if err != nil {
		return nil, err
	}
	defer pool.Release()

	out := make(chan Finding, 256)
	var wg sync.WaitGroup
	for _, s := range active {
		s := s // per-iteration copy (required before Go 1.22)
		wg.Add(1)
		if err := pool.Submit(func() {
			defer wg.Done()
			// Per-source errors are ignored in this skeleton.
			_ = s.Sweep(ctx, cfg.Query, out)
		}); err != nil {
			// The task never ran, so balance the Add here; otherwise
			// wg.Wait would block forever and out would never be closed.
			wg.Done()
		}
	}
	go func() { wg.Wait(); close(out) }()

	var all []Finding
	for f := range out {
		all = append(all, f)
	}
	return all, nil
}
```

Create pkg/recon/example.go with an ExampleSource emitting two deterministic Findings (SourceType="recon:example", fake masked keys, distinct Source URLs) to prove the pipeline.

```go
package recon

import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

type ExampleSource struct{}

func (ExampleSource) Name() string { return "example" }
func (ExampleSource) RateLimit() rate.Limit { return rate.Limit(10) }
func (ExampleSource) Burst() int { return 1 }
func (ExampleSource) RespectsRobots() bool { return false }
func (ExampleSource) Enabled(_ Config) bool { return true }

func (ExampleSource) Sweep(ctx context.Context, query string, out chan<- Finding) error {
	fakes := []Finding{
		{ProviderName: "openai", KeyMasked: "sk-examp...AAAA", Source: "https://example.invalid/a", SourceType: "recon:example", DetectedAt: time.Now()},
		{ProviderName: "anthropic", KeyMasked: "sk-ant-e...BBBB", Source: "https://example.invalid/b", SourceType: "recon:example", DetectedAt: time.Now()},
	}
	for _, f := range fakes {
		select {
		case out <- f:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}
```

Create pkg/recon/engine_test.go with TestRegisterList and TestSweepAll using ExampleSource. Use testify require.

TDD: write the tests first, confirm they fail, then implement.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/ -run 'TestRegisterList|TestSweepAll' -count=1</automated>
</verify>
<done>Tests pass. Engine registers ExampleSource, SweepAll returns 2 findings with SourceType="recon:example".</done>
</task>

</tasks>

<verification>
- `go build ./pkg/recon/...` succeeds
- `go test ./pkg/recon/ -count=1` passes
- `go vet ./pkg/recon/...` clean
</verification>

<success_criteria>
- ReconSource interface exported
- Engine.Register/List/SweepAll implemented and tested
- ExampleSource proves end-to-end fanout
- No cycles with pkg/engine (recon imports engine, not vice versa)
</success_criteria>

<output>
After completion, create `.planning/phases/09-osint-infrastructure/09-01-SUMMARY.md`
</output>
</content>

.planning/phases/09-osint-infrastructure/09-02-PLAN.md
@@ -0,0 +1,147 @@
---
phase: 09-osint-infrastructure
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/limiter.go
  - pkg/recon/limiter_test.go
autonomous: true
requirements: [RECON-INFRA-05]
must_haves:
  truths:
    - "Each source has its own rate.Limiter — no central limiter"
    - "limiter.Wait blocks until a token is available, honoring ctx cancellation"
    - "Jitter delay (100ms-1s) is applied before each request when stealth is enabled"
    - "LimiterRegistry maps source names to limiters and returns existing limiters on repeat lookup"
  artifacts:
    - path: "pkg/recon/limiter.go"
      provides: "LimiterRegistry with For(name, rate, burst) + Wait with optional jitter"
      contains: "type LimiterRegistry"
    - path: "pkg/recon/limiter_test.go"
      provides: "Tests for per-source isolation, jitter range, ctx cancellation"
  key_links:
    - from: "pkg/recon/limiter.go"
      to: "golang.org/x/time/rate"
      via: "rate.NewLimiter per source"
      pattern: "rate\\.NewLimiter"
---

<objective>
Implement per-source rate limiter architecture: each source registers its own rate.Limiter keyed by name, and the engine calls Wait() before each request. Optional jitter (100ms-1s) when stealth mode is enabled.

Purpose: Satisfies RECON-INFRA-05 and guarantees the "every source holds its own limiter — no centralized limiter" success criterion from the roadmap.
Output: pkg/recon/limiter.go, pkg/recon/limiter_test.go
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/09-osint-infrastructure/09-CONTEXT.md
@go.mod
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: LimiterRegistry with per-source rate.Limiter and jitter</name>
<files>pkg/recon/limiter.go, pkg/recon/limiter_test.go</files>
<behavior>
- LimiterRegistry.For(name string, r rate.Limit, burst int) *rate.Limiter returns the existing limiter for name or creates a new one. Subsequent calls with the same name return the SAME pointer (idempotent).
- Wait(ctx, name, r, burst, stealth bool) error calls limiter.Wait(ctx), then if stealth==true sleeps a random duration between 100ms and 1s (respecting ctx).
- Per-source isolation: two different names produce two distinct *rate.Limiter instances.
- Ctx cancellation during Wait returns ctx.Err() promptly.
- Tests:
  - TestLimiterPerSourceIsolation: registry.For("a", 10, 1) != registry.For("b", 10, 1)
  - TestLimiterIdempotent: registry.For("a", 10, 1) == registry.For("a", 10, 1) (same pointer)
  - TestWaitRespectsContext: cancelled ctx returns error
  - TestJitterRange: with stealth=true, Wait duration is >= 100ms. Use a high rate (1000/s, burst 100) so only jitter contributes.
</behavior>
<action>
Create pkg/recon/limiter.go:

```go
package recon

import (
	"context"
	"math/rand"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// LimiterRegistry holds one *rate.Limiter per source name.
// RECON-INFRA-05: each source owns its own limiter — no centralization.
type LimiterRegistry struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
}

func NewLimiterRegistry() *LimiterRegistry {
	return &LimiterRegistry{limiters: make(map[string]*rate.Limiter)}
}

// For returns the limiter for name, creating it with (r, burst) on first call.
// Repeat calls with the same name return the same *rate.Limiter pointer;
// their r and burst arguments are ignored.
func (lr *LimiterRegistry) For(name string, r rate.Limit, burst int) *rate.Limiter {
	lr.mu.Lock()
	defer lr.mu.Unlock()
	if l, ok := lr.limiters[name]; ok {
		return l
	}
	l := rate.NewLimiter(r, burst)
	lr.limiters[name] = l
	return l
}

// Wait blocks until the source's token is available. If stealth is true,
// an additional random jitter between 100ms and 1s is applied to evade
// fingerprint detection (RECON-INFRA-06 partial — fully wired in 09-03).
func (lr *LimiterRegistry) Wait(ctx context.Context, name string, r rate.Limit, burst int, stealth bool) error {
	l := lr.For(name, r, burst)
	if err := l.Wait(ctx); err != nil {
		return err
	}
	if stealth {
		// 100ms + [0, 900)ms, i.e. jitter in [100ms, 999ms].
		jitter := time.Duration(100+rand.Intn(900)) * time.Millisecond
		t := time.NewTimer(jitter)
		defer t.Stop() // release the timer as soon as Wait returns
		select {
		case <-t.C:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}
```

Create pkg/recon/limiter_test.go with the four tests above. Use testify require. For TestJitterRange, call Wait with rate=1000, burst=100, stealth=true, measure elapsed, assert >= 90ms (10ms slack) and <= 1100ms.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/ -run 'TestLimiter|TestWait|TestJitter' -count=1</automated>
</verify>
<done>All limiter tests pass; per-source isolation verified; jitter bounded; ctx cancellation honored.</done>
</task>

</tasks>

<verification>
- `go test ./pkg/recon/ -run 'TestLimiter|TestWait|TestJitter' -count=1` passes
- `go vet ./pkg/recon/...` clean
</verification>

<success_criteria>
- LimiterRegistry exported with For and Wait
- Each source receives its own *rate.Limiter
- Stealth jitter range 100ms-1s enforced
</success_criteria>

<output>
After completion, create `.planning/phases/09-osint-infrastructure/09-02-SUMMARY.md`
</output>
</content>

.planning/phases/09-osint-infrastructure/09-03-PLAN.md
@@ -0,0 +1,186 @@
---
phase: 09-osint-infrastructure
plan: 03
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/stealth.go
  - pkg/recon/stealth_test.go
  - pkg/recon/dedup.go
  - pkg/recon/dedup_test.go
autonomous: true
requirements: [RECON-INFRA-06]
must_haves:
  truths:
    - "Stealth mode exposes a UA pool of 10 realistic browser user-agents (Chrome/Firefox/Safari across Linux/macOS/Windows)"
    - "RandomUserAgent returns a UA from the pool, distributed across calls"
    - "Dedup drops duplicate Findings keyed by SHA256(provider + masked_key + source)"
    - "Dedup preserves first-seen order and metadata"
  artifacts:
    - path: "pkg/recon/stealth.go"
      provides: "UA pool, RandomUserAgent, StealthHeaders helper"
      contains: "var userAgents"
    - path: "pkg/recon/dedup.go"
      provides: "Dedup([]Finding) []Finding keyed by sha256(provider|masked|source)"
      contains: "func Dedup"
  key_links:
    - from: "pkg/recon/dedup.go"
      to: "crypto/sha256"
      via: "finding hash key"
      pattern: "sha256\\.Sum256"
---

<objective>
Implement stealth mode helpers (UA rotation) and cross-source deduplication. Both are small and self-contained; dedup in particular keeps the parallel sweep orchestrator from producing noisy duplicate findings.

Purpose: Satisfies RECON-INFRA-06 (stealth UA rotation) and provides the dedup primitive that SweepAll callers use to satisfy RECON-INFRA-08's "deduplicates findings before persisting" criterion.
Output: pkg/recon/stealth.go, pkg/recon/dedup.go, and their tests
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/09-osint-infrastructure/09-CONTEXT.md
@pkg/engine/finding.go
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: Stealth UA pool + RandomUserAgent</name>
<files>pkg/recon/stealth.go, pkg/recon/stealth_test.go</files>
<behavior>
- userAgents is an unexported slice of exactly 10 realistic UA strings covering Chrome/Firefox/Safari on Linux/macOS/Windows
- RandomUserAgent() returns a random entry from the pool
- StealthHeaders() returns map[string]string{"User-Agent": RandomUserAgent(), "Accept-Language": "en-US,en;q=0.9"}
- Tests: TestUAPoolSize (== 10), TestRandomUserAgentInPool (returned value is in pool), TestStealthHeadersHasUA
</behavior>
<action>
Create pkg/recon/stealth.go with a package-level `userAgents` slice of 10 realistic UAs, one for each of:
- Chrome 120 Windows
- Chrome 120 macOS
- Chrome 120 Linux
- Firefox 121 Windows
- Firefox 121 macOS
- Firefox 121 Linux
- Safari 17 macOS
- Safari 17 iOS
- Edge 120 Windows
- Chrome Android

```go
package recon

import "math/rand"

var userAgents = []string{
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
	"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 14.2; rv:121.0) Gecko/20100101 Firefox/121.0",
	"Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
	"Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1",
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.2210.61",
	"Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
}

// RandomUserAgent returns a random browser user-agent from the pool.
// Used when Config.Stealth is true.
func RandomUserAgent() string {
	return userAgents[rand.Intn(len(userAgents))]
}

// StealthHeaders returns a minimal headers map with rotated UA and Accept-Language.
func StealthHeaders() map[string]string {
	return map[string]string{
		"User-Agent":      RandomUserAgent(),
		"Accept-Language": "en-US,en;q=0.9",
	}
}
```

Create pkg/recon/stealth_test.go with the three tests. TestRandomUserAgentInPool should loop 100 times and assert each result is present in the `userAgents` slice.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/ -run 'TestUAPool|TestRandomUserAgent|TestStealthHeaders' -count=1</automated>
</verify>
<done>Tests pass. Pool has exactly 10 UAs. Random selection always within pool.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Cross-source finding dedup</name>
<files>pkg/recon/dedup.go, pkg/recon/dedup_test.go</files>
<behavior>
- Dedup(in []Finding) []Finding drops duplicates keyed by sha256(ProviderName + "|" + KeyMasked + "|" + Source)
- First-seen wins: returned slice preserves the first occurrence's metadata (SourceType, DetectedAt, etc.)
- Order is preserved from the input (stable dedup)
- Nil/empty input returns nil
- Tests:
  - TestDedupEmpty: Dedup(nil) == nil
  - TestDedupNoDuplicates: 3 distinct findings -> 3 returned
  - TestDedupAllDuplicates: 3 identical findings -> 1 returned
  - TestDedupPreservesFirstSeen: two findings with same key, different DetectedAt — the first-seen timestamp wins
  - TestDedupDifferentSource: same provider/masked, different Source URLs -> both kept
</behavior>
<action>
Create pkg/recon/dedup.go:

```go
package recon

import (
	"crypto/sha256"
	"encoding/hex"
)

// Dedup removes duplicate findings using SHA256(provider|masked|source) as key.
// Stable: preserves input order and first-seen metadata.
func Dedup(in []Finding) []Finding {
	if len(in) == 0 {
		return nil
	}
	seen := make(map[string]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		h := sha256.Sum256([]byte(f.ProviderName + "|" + f.KeyMasked + "|" + f.Source))
		k := hex.EncodeToString(h[:])
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}
```

Create pkg/recon/dedup_test.go with the five tests. Use testify require.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/ -run TestDedup -count=1</automated>
</verify>
<done>All dedup tests pass. First-seen wins. Different Source URLs are kept separate.</done>
</task>

</tasks>

<verification>
- `go test ./pkg/recon/ -run 'TestUAPool|TestRandom|TestStealth|TestDedup' -count=1` passes
- `go vet ./pkg/recon/...` clean
</verification>

<success_criteria>
- Stealth UA pool (10 entries) exported via RandomUserAgent/StealthHeaders
- Dedup primitive removes duplicates stably by sha256(provider|masked|source)
</success_criteria>

<output>
After completion, create `.planning/phases/09-osint-infrastructure/09-03-SUMMARY.md`
</output>
</content>

.planning/phases/09-osint-infrastructure/09-04-PLAN.md
@@ -0,0 +1,196 @@
---
phase: 09-osint-infrastructure
plan: 04
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/robots.go
  - pkg/recon/robots_test.go
  - go.mod
  - go.sum
autonomous: true
requirements: [RECON-INFRA-07]
must_haves:
  truths:
    - "pkg/recon.RobotsCache parses and caches robots.txt per host for 1 hour"
    - "Allowed(host, path) returns true if robots.txt permits `keyhunter` UA on that path"
    - "Cache hit avoids a second HTTP fetch for the same host within TTL"
    - "Network errors degrade safely: default-allow (so a broken robots.txt fetch does not silently block sweeps)"
  artifacts:
    - path: "pkg/recon/robots.go"
      provides: "RobotsCache with Allowed(ctx, url) bool + 1h per-host TTL"
      contains: "type RobotsCache"
    - path: "pkg/recon/robots_test.go"
      provides: "Tests for parse/allowed/disallowed/cache-hit/network-fail"
  key_links:
    - from: "pkg/recon/robots.go"
      to: "github.com/temoto/robotstxt"
      via: "robotstxt.FromBytes"
      pattern: "robotstxt\\."
---

<objective>
|
||||||
|
Add robots.txt parser and per-host cache for web-scraping sources. Satisfies RECON-INFRA-07 ("`keyhunter recon full --respect-robots` respects robots.txt for web-scraping sources before making any requests"). Only sources with RespectsRobots()==true consult the cache.
|
||||||
|
|
||||||
|
Purpose: Foundation for every later web-scraping source (Phase 11 paste, Phase 15 forums, etc.). Adds github.com/temoto/robotstxt dependency.
|
||||||
|
Output: pkg/recon/robots.go, pkg/recon/robots_test.go, go.mod/go.sum updated
|
||||||
|
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/09-osint-infrastructure/09-CONTEXT.md
@go.mod
</context>

<tasks>

<task type="auto">
<name>Task 1: Add temoto/robotstxt dependency</name>
<files>go.mod, go.sum</files>
<action>
Run `go get github.com/temoto/robotstxt@latest` from the repo root. This adds the module to go.mod (marked `// indirect` until Task 2 imports it) and records its checksums in go.sum. Do NOT run `go mod tidy` yet: until Task 2 adds code that imports the package, tidy would treat the dependency as unused and drop it again. `go mod download` alone does not edit go.mod, so use `go get`.

Verify the dependency appears in the go.mod `require` block.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && grep -q "github.com/temoto/robotstxt" go.mod</automated>
</verify>
<done>go.mod contains github.com/temoto/robotstxt; go.sum populated.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: RobotsCache with 1h TTL and default-allow on error</name>
<files>pkg/recon/robots.go, pkg/recon/robots_test.go</files>
<behavior>
- RobotsCache.Allowed(ctx, rawURL) (bool, error): parse the URL to get its host, fetch `<scheme>://<host>/robots.txt` with RobotsCache.Client (tests inject one), and cache the parsed result for 1 hour per host
- The UA used for matching is "keyhunter"
- On fetch error, HTTP 4xx/5xx, or parse error: return true, nil (default-allow) so a broken robots endpoint does not silently disable a recon source
- The cache key is the host (not the full URL)
- A second call for the same host within the TTL does NOT trigger another HTTP request
- Tests use httptest.Server to serve robots.txt and inject a custom http.Client via the RobotsCache.Client field
- Tests:
  - TestRobotsAllowed: robots.txt is "User-agent: *\nDisallow:" and the path is /public -> Allowed returns true
  - TestRobotsDisallowed: robots.txt is "User-agent: *\nDisallow: /private" and the path is /private -> false
  - TestRobotsCacheHit: after the first call, a second call hits the cache (increment an atomic counter in the httptest handler and assert count == 1)
  - TestRobotsNetworkError: server returns 500 -> Allowed returns true (default-allow)
  - TestRobotsUAKeyhunter: robots.txt is "User-agent: keyhunter\nDisallow: /blocked" and the path is /blocked -> false
</behavior>
<action>
Create pkg/recon/robots.go:

```go
package recon

import (
	"context"
	"io"
	"net/http"
	"net/url"
	"sync"
	"time"

	"github.com/temoto/robotstxt"
)

const (
	robotsTTL = 1 * time.Hour
	robotsUA  = "keyhunter"
	// robotsMaxBytes caps how much of a robots.txt body is read; RFC 9309
	// requires parsers to handle at least 500 KiB.
	robotsMaxBytes = 500 << 10
)

type robotsEntry struct {
	data    *robotstxt.RobotsData
	fetched time.Time
}

// RobotsCache fetches and caches per-host robots.txt for 1 hour.
// Sources whose RespectsRobots() returns true should call Allowed before each request.
type RobotsCache struct {
	mu     sync.Mutex
	cache  map[string]robotsEntry
	Client *http.Client // nil -> http.DefaultClient
}

// NewRobotsCache returns an empty cache; the zero value is also usable.
func NewRobotsCache() *RobotsCache {
	return &RobotsCache{cache: make(map[string]robotsEntry)}
}

// Allowed reports whether `keyhunter` may fetch rawURL per the host's robots.txt.
// On fetch/parse error the function returns true (default-allow) to avoid silently
// disabling recon sources when a site has a broken robots endpoint. Failures are
// not cached, so the next call for that host retries the fetch.
func (rc *RobotsCache) Allowed(ctx context.Context, rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil || u.Host == "" {
		return true, nil // unparseable or host-less URL: default-allow
	}
	host := u.Host
	// RequestURI yields the escaped path plus query and defaults to "/", so a
	// bare "https://host" is still matched against root rules.
	target := u.RequestURI()

	rc.mu.Lock()
	entry, ok := rc.cache[host]
	rc.mu.Unlock()
	if ok && time.Since(entry.fetched) < robotsTTL {
		return entry.data.TestAgent(target, robotsUA), nil
	}

	client := rc.Client
	if client == nil {
		client = http.DefaultClient
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u.Scheme+"://"+host+"/robots.txt", nil)
	if err != nil {
		return true, nil
	}
	resp, err := client.Do(req)
	if err != nil {
		return true, nil // default-allow on network error
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 400 {
		return true, nil // default-allow on 4xx/5xx
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, robotsMaxBytes))
	if err != nil {
		return true, nil
	}
	data, err := robotstxt.FromBytes(body)
	if err != nil {
		return true, nil
	}

	rc.mu.Lock()
	if rc.cache == nil {
		rc.cache = make(map[string]robotsEntry) // lazily init for a zero-value RobotsCache
	}
	rc.cache[host] = robotsEntry{data: data, fetched: time.Now()}
	rc.mu.Unlock()
	return data.TestAgent(target, robotsUA), nil
}
```

Create pkg/recon/robots_test.go using httptest.NewServer. Inject the test server's client into RobotsCache.Client (use `server.Client()`). For TestRobotsCacheHit, use an `atomic.Int32` (Go 1.19+) incremented inside the handler.

Note on test URLs: since httptest.Server listens on a dynamic host:port, build rawURL from `server.URL + "/public"`. The cache key is that host:port; both calls share it, so the cache hit is testable.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/ -run TestRobots -count=1</automated>
</verify>
<done>All 5 robots tests pass. Cache hit verified via request counter. Default-allow on 500 verified.</done>
</task>

</tasks>

<verification>
- `go test ./pkg/recon/ -run TestRobots -count=1` passes
- `go build ./...` passes (robotstxt dep resolved)
- `go vet ./pkg/recon/...` clean
</verification>

<success_criteria>
- RobotsCache implemented with a 1h per-host TTL
- UA "keyhunter" used for rule matching
- Default-allow on network, HTTP 4xx/5xx, and parse errors
- github.com/temoto/robotstxt added to go.mod
</success_criteria>

<output>
After completion, create `.planning/phases/09-osint-infrastructure/09-04-SUMMARY.md`
</output>
</content>

.planning/phases/09-osint-infrastructure/09-05-PLAN.md
@@ -0,0 +1,185 @@
---
phase: 09-osint-infrastructure
plan: 05
type: execute
wave: 2
depends_on: ["09-01", "09-02", "09-03", "09-04"]
files_modified:
  - cmd/recon.go
  - cmd/stubs.go
  - cmd/root.go
autonomous: true
requirements: [RECON-INFRA-08]
must_haves:
  truths:
    - "`keyhunter recon full` runs Engine.SweepAll + Dedup and prints the masked findings, one per line"
    - "`keyhunter recon list` prints the registered source names one per line"
    - "--stealth, --respect-robots (default true), --query flags exist on `recon full`"
    - "ExampleSource is registered by buildReconEngine() so Phase 9 ships a demonstrable pipeline"
    - "The stub reconCmd in cmd/stubs.go is removed; cmd/recon.go owns the command tree"
  artifacts:
    - path: "cmd/recon.go"
      provides: "reconCmd with subcommands `full` and `list`, flag wiring, source registration"
      contains: "var reconCmd"
    - path: "cmd/stubs.go"
      provides: "reconCmd stub removed; other stubs unchanged"
  key_links:
    - from: "cmd/recon.go"
      to: "pkg/recon.Engine"
      via: "NewEngine + Register(ExampleSource{}) + SweepAll"
      pattern: "recon\\.NewEngine"
    - from: "cmd/recon.go"
      to: "pkg/recon.Dedup"
      via: "Dedup applied to SweepAll results before printing"
      pattern: "recon\\.Dedup"
---

<objective>
Wire the recon package into the Cobra CLI with `keyhunter recon full` and `keyhunter recon list`. Remove the stub reconCmd from cmd/stubs.go. Register ExampleSource in buildReconEngine() so `recon full` produces visible output end-to-end on a fresh clone.

Purpose: Satisfies RECON-INFRA-08 "Recon full command — parallel sweep across all sources with deduplication". Completes the phase's user-facing entrypoint.

Output: cmd/recon.go (new), cmd/stubs.go (stub removed), cmd/root.go (touched only if the `reconCmd` registration is missing)
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/09-osint-infrastructure/09-CONTEXT.md
@cmd/stubs.go
@.planning/phases/09-osint-infrastructure/09-01-SUMMARY.md
@.planning/phases/09-osint-infrastructure/09-03-SUMMARY.md
</context>

<tasks>

<task type="auto">
<name>Task 1: Remove reconCmd stub from cmd/stubs.go</name>
<files>cmd/stubs.go</files>
<action>
Delete the `var reconCmd = &cobra.Command{...}` block from cmd/stubs.go. Leave verifyCmd, serveCmd, and scheduleCmd untouched. The real reconCmd will be declared in cmd/recon.go (Task 2).

Verify that cmd/root.go still references `reconCmd`; it will resolve to the new declaration in cmd/recon.go (same package `cmd`). Until Task 2 lands, building the `cmd` package fails with an undefined `reconCmd`, which is expected.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && ! grep -q 'var reconCmd' cmd/stubs.go</automated>
</verify>
<done>cmd/stubs.go no longer declares reconCmd and keeps the other stubs; the package builds again once Task 2 declares reconCmd in cmd/recon.go.</done>
</task>

<task type="auto">
<name>Task 2: Create cmd/recon.go with full and list subcommands</name>
<files>cmd/recon.go</files>
<action>
Create cmd/recon.go declaring `var reconCmd` plus subcommands `reconFullCmd` and `reconListCmd`. Flag wiring:
- `--stealth` bool, default false
- `--respect-robots` bool, default true
- `--query` string, default "" (empty -> sources use their own default keywords)

```go
package cmd

import (
	"context"
	"fmt"

	"github.com/salvacybersec/keyhunter/pkg/recon"
	"github.com/spf13/cobra"
)

var (
	reconStealth       bool
	reconRespectRobots bool
	reconQuery         string
)

var reconCmd = &cobra.Command{
	Use:   "recon",
	Short: "Run OSINT recon across internet sources",
	Long:  "Run OSINT recon sweeps across registered sources. Phase 9 ships with an ExampleSource stub; real sources land in Phases 10-16.",
}

var reconFullCmd = &cobra.Command{
	Use:   "full",
	Short: "Sweep all enabled sources in parallel and deduplicate findings",
	RunE: func(cmd *cobra.Command, args []string) error {
		eng := buildReconEngine()
		cfg := recon.Config{
			Stealth:       reconStealth,
			RespectRobots: reconRespectRobots,
			Query:         reconQuery,
		}
		ctx := context.Background()
		all, err := eng.SweepAll(ctx, cfg)
		if err != nil {
			return fmt.Errorf("recon sweep: %w", err)
		}
		deduped := recon.Dedup(all)
		fmt.Printf("recon: swept %d sources, %d findings (%d after dedup)\n", len(eng.List()), len(all), len(deduped))
		for _, f := range deduped {
			fmt.Printf(" [%s] %s %s %s\n", f.SourceType, f.ProviderName, f.KeyMasked, f.Source)
		}
		return nil
	},
}

var reconListCmd = &cobra.Command{
	Use:   "list",
	Short: "List registered recon sources",
	RunE: func(cmd *cobra.Command, args []string) error {
		eng := buildReconEngine()
		for _, name := range eng.List() {
			fmt.Println(name)
		}
		return nil
	},
}

// buildReconEngine constructs the recon Engine with all sources registered.
// Phase 9 ships ExampleSource only; Phases 10-16 will add real sources here
// (or via a registration side-effect in their packages).
func buildReconEngine() *recon.Engine {
	e := recon.NewEngine()
	e.Register(recon.ExampleSource{})
	return e
}

func init() {
	reconFullCmd.Flags().BoolVar(&reconStealth, "stealth", false, "enable UA rotation and jitter delays")
	reconFullCmd.Flags().BoolVar(&reconRespectRobots, "respect-robots", true, "respect robots.txt for web-scraping sources")
	reconFullCmd.Flags().StringVar(&reconQuery, "query", "", "override query sent to each source")
	reconCmd.AddCommand(reconFullCmd)
	reconCmd.AddCommand(reconListCmd)
}
```

Do NOT modify cmd/root.go unless `rootCmd.AddCommand(reconCmd)` is missing. (It currently exists because the stub was registered there.)
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./... && go run . recon list | grep -q '^example$'</automated>
</verify>
<done>`keyhunter recon list` prints "example". `keyhunter recon full` prints the summary line "recon: swept 1 sources, 2 findings (2 after dedup)" followed by the 2 findings from ExampleSource.</done>
</task>

</tasks>

<verification>
- `go build ./...` succeeds
- `go run . recon list` prints `example`
- `go run . recon full` prints "recon: swept 1 sources, 2 findings (2 after dedup)" and two lines with [recon:example]
- `go run . recon full --stealth --query=test` runs without error
</verification>

<success_criteria>
- reconCmd owned by cmd/recon.go, stub removed from cmd/stubs.go
- `recon full` and `recon list` subcommands work end-to-end
- Dedup applied to the SweepAll result
- Flags --stealth, --respect-robots (default true), --query all parse
</success_criteria>

<output>
After completion, create `.planning/phases/09-osint-infrastructure/09-05-SUMMARY.md`
</output>
</content>

.planning/phases/09-osint-infrastructure/09-06-PLAN.md
@@ -0,0 +1,136 @@
---
phase: 09-osint-infrastructure
plan: 06
type: execute
wave: 3
depends_on: ["09-01", "09-02", "09-03", "09-04", "09-05"]
files_modified:
  - pkg/recon/integration_test.go
  - .planning/phases/09-osint-infrastructure/09-PHASE-SUMMARY.md
autonomous: true
requirements: [RECON-INFRA-05, RECON-INFRA-06, RECON-INFRA-07, RECON-INFRA-08]
must_haves:
  truths:
    - "Integration test exercises Engine + LimiterRegistry + Dedup together against a synthetic source that emits duplicates"
    - "Integration test verifies the --stealth path calls RandomUserAgent without errors"
    - "Integration test verifies RobotsCache.Allowed is invoked only when RespectsRobots()==true"
    - "Phase summary documents all 4 requirement IDs as complete"
  artifacts:
    - path: "pkg/recon/integration_test.go"
      provides: "End-to-end test: Engine + Limiter + Stealth + Robots + Dedup"
      contains: "func TestReconPipelineIntegration"
    - path: ".planning/phases/09-osint-infrastructure/09-PHASE-SUMMARY.md"
      provides: "Phase completion summary with requirement ID coverage and next-phase guidance"
  key_links:
    - from: "pkg/recon/integration_test.go"
      to: "pkg/recon.Engine + LimiterRegistry + RobotsCache + Dedup"
      via: "TestReconPipelineIntegration wires all four together"
      pattern: "TestReconPipelineIntegration"
---

<objective>
Phase 9 integration test + phase summary. Proves the recon infra components (Engine, LimiterRegistry, stealth helpers, RobotsCache, Dedup) compose correctly before Phases 10-16 start building sources on top, and documents completion for roadmap tracking.

Purpose: Final safety net for the phase. Catches cross-component bugs (e.g., limiter deadlock, dedup hash collision, robots TTL leak) that unit tests on individual files miss.

Output: pkg/recon/integration_test.go, .planning/phases/09-osint-infrastructure/09-PHASE-SUMMARY.md
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/09-osint-infrastructure/09-CONTEXT.md
@.planning/phases/09-osint-infrastructure/09-01-SUMMARY.md
@.planning/phases/09-osint-infrastructure/09-02-SUMMARY.md
@.planning/phases/09-osint-infrastructure/09-03-SUMMARY.md
@.planning/phases/09-osint-infrastructure/09-04-SUMMARY.md
@.planning/phases/09-osint-infrastructure/09-05-SUMMARY.md
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: End-to-end integration test</name>
<files>pkg/recon/integration_test.go</files>
<behavior>
- Define a local testSource struct (in the _test.go file) that:
  - Name() returns "test"
  - RateLimit() returns rate.Limit(100), Burst() returns 10
  - RespectsRobots() returns false
  - Enabled returns true
  - Sweep emits 5 Findings, 2 of which are exact duplicates of each other (same provider+masked+source), so dedup leaves 4
- TestReconPipelineIntegration:
  - Construct an Engine and Register testSource
  - Construct a LimiterRegistry and call Wait(ctx, "test", rate.Limit(100), 10, true) once to verify the jitter path does not panic
  - Call Engine.SweepAll(ctx, Config{Stealth: true})
  - Assert len(findings) == 5 (raw) and len(Dedup(findings)) == 4 (after dedup)
  - Assert every finding has a SourceType starting with "recon:"
- TestRobotsOnlyWhenRespectsRobots:
  - Create two sources: webSource (a testWebSource, RespectsRobots() true) and apiSource (a testSource, RespectsRobots() false)
  - Exercise the RobotsCache path only for webSource: invoke RobotsCache.Allowed manually before calling webSource.Sweep, and skip it for apiSource
  - This is a documentation-style test with minimal logic: assert `webSource.RespectsRobots() == true && apiSource.RespectsRobots() == false`, assert RobotsCache.Allowed works when called, and note that it is never called when RespectsRobots returns false (trivially satisfied by not invoking it)
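
The robots gate that TestRobotsOnlyWhenRespectsRobots documents can be illustrated in a few lines. This is a sketch only: `source`, `webSource`, `apiSource`, and `gate` are hypothetical, trimmed-down names, and the `allowed` callback stands in for `RobotsCache.Allowed` (which additionally takes a context and returns an error).

```go
package main

import "fmt"

// source is a hypothetical, trimmed-down view of ReconSource with only the
// methods needed to show the robots gate.
type source interface {
	Name() string
	RespectsRobots() bool
}

// webSource stands in for an HTML-scraping source that opts in to robots.txt.
type webSource struct{}

func (webSource) Name() string         { return "web" }
func (webSource) RespectsRobots() bool { return true }

// apiSource stands in for an API-backed source that skips robots.txt.
type apiSource struct{}

func (apiSource) Name() string         { return "api" }
func (apiSource) RespectsRobots() bool { return false }

// gate consults allowed only for sources that opt in via RespectsRobots;
// every other source is let through without a robots.txt lookup.
func gate(s source, rawURL string, allowed func(string) bool) bool {
	if !s.RespectsRobots() {
		return true
	}
	return allowed(rawURL)
}

func main() {
	calls := 0
	allowed := func(string) bool { calls++; return true } // counting shim
	for _, s := range []source{webSource{}, apiSource{}} {
		fmt.Printf("%s allowed=%v\n", s.Name(), gate(s, "https://example.com/foo", allowed))
	}
	fmt.Println("robots checks:", calls) // prints "robots checks: 1"
}
```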
</behavior>
<action>
Create pkg/recon/integration_test.go. Declare the testSource and testWebSource structs within the test file. Use `httptest.NewServer` for the robots portion, serving "User-agent: *\nAllow: /\n".

Declare the file as `package recon` (not `recon_test`) so the test can reach unexported identifiers as well as exported ones. Because a package cannot import itself, refer to package symbols unqualified (for example `Dedup(raw)`, not `recon.Dedup(raw)`).

Assertions via testify require:
- require.Equal(t, 5, len(raw))
- require.Equal(t, 4, len(Dedup(raw)))
- require.Equal(t, "recon:test", raw[0].SourceType)
- require.NoError(t, limiter.Wait(ctx, "test", rate.Limit(100), 10, true))
- require.True(t, webSource.RespectsRobots())
- require.False(t, apiSource.RespectsRobots())
- allowed, err := rc.Allowed(ctx, server.URL+"/foo"); require.NoError(t, err); require.True(t, allowed)

Each of RECON-INFRA-05/06/07/08 must be covered by at least one assertion in this integration test.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/ -run 'TestReconPipelineIntegration|TestRobotsOnlyWhenRespectsRobots' -count=1</automated>
</verify>
<done>Integration test passes. All 4 RECON-INFRA requirement IDs have at least one assertion covering them.</done>
</task>

<task type="auto">
<name>Task 2: Write 09-PHASE-SUMMARY.md</name>
<files>.planning/phases/09-osint-infrastructure/09-PHASE-SUMMARY.md</files>
<action>
Create the phase summary documenting:
- Requirements closed: RECON-INFRA-05, RECON-INFRA-06, RECON-INFRA-07, RECON-INFRA-08 (all 4)
- Key artifacts: pkg/recon/{source,engine,limiter,stealth,dedup,robots,example}.go + tests
- CLI surface: `keyhunter recon full`, `keyhunter recon list`
- Decisions adopted: per-source limiter (no centralization), default-allow on robots fetch failure, dedup by sha256(provider|masked|source), UA pool of 10
- New dependency: github.com/temoto/robotstxt
- Handoff to Phase 10: all real sources implement the ReconSource interface and are registered in `buildReconEngine()` in cmd/recon.go (ideally moving to package init() side-effect registration once Phase 10 establishes the pattern)
- Known gaps deferred: proxy/TOR (out of scope), per-source retry (each source handles its own retries), distributed rate limiting (out of scope)

Follow the standard SUMMARY.md template from @$HOME/.claude/get-shit-done/templates/summary.md.
</action>
<verify>
<automated>test -s /home/salva/Documents/apikey/.planning/phases/09-osint-infrastructure/09-PHASE-SUMMARY.md && grep -q "RECON-INFRA-05" /home/salva/Documents/apikey/.planning/phases/09-osint-infrastructure/09-PHASE-SUMMARY.md && grep -q "RECON-INFRA-08" /home/salva/Documents/apikey/.planning/phases/09-osint-infrastructure/09-PHASE-SUMMARY.md</automated>
</verify>
<done>09-PHASE-SUMMARY.md exists, non-empty, names all 4 requirement IDs.</done>
</task>

</tasks>

<verification>
- `go test ./pkg/recon/... -count=1` passes (all unit + integration)
- `go build ./...` passes
- `go vet ./...` clean
- 09-PHASE-SUMMARY.md exists with all 4 RECON-INFRA IDs
</verification>

<success_criteria>
- Integration test proves Engine + Limiter + Stealth + Robots + Dedup compose correctly
- Phase summary documents completion of all 4 requirement IDs
- Phase 10 can start immediately against a stable pkg/recon contract
</success_criteria>

<output>
After completion, create `.planning/phases/09-osint-infrastructure/09-06-SUMMARY.md`
</output>
</content>