Commit Graph

246 Commits

Author SHA1 Message Date
salvacybersec
10ae94115f Merge branch 'worktree-agent-a6700ee2' 2026-04-06 11:57:39 +03:00
salvacybersec
da0bf800f9 docs(11-02): complete paste site sources plan
- SUMMARY.md for PastebinSource, GistPasteSource, PasteSitesSource
2026-04-06 11:57:21 +03:00
salvacybersec
ed148d47e1 feat(11-02): add PasteSitesSource multi-paste aggregator
- Aggregates dpaste, paste.ee, rentry, hastebin into single source
- Follows SandboxesSource multi-platform pattern with per-platform error isolation
- Two-phase search+raw-fetch with keyword matching against provider registry
2026-04-06 11:55:44 +03:00
salvacybersec
3c500b5473 feat(11-02): add PastebinSource and GistPasteSource for paste site scanning
- PastebinSource: two-phase search+raw-fetch with keyword matching
- GistPasteSource: scrapes gist.github.com public search (no auth)
- Both implement recon.ReconSource with httptest-based tests
2026-04-06 11:53:00 +03:00
salvacybersec
f8b06055ef docs(11): create phase plan — 3 plans for search engine dorking + paste sites 2026-04-06 11:50:38 +03:00
salvacybersec
9ad9767109 docs(11-16): auto-generated OSINT phase contexts 2026-04-06 11:40:44 +03:00
salvacybersec
3aadeb2d1c docs(phase-10): complete phase execution 2026-04-06 11:38:31 +03:00
salvacybersec
118decbb3e fix(phase-10): add --sources filter flag and DB persistence to recon full
Closes 2 verification gaps:
1. --sources=github,gitlab flag filters registered sources before sweep
2. Findings persisted to SQLite via storage.SaveFinding after dedup

Also adds Engine.Get() method for source lookup by name.
2026-04-06 11:36:19 +03:00
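Note on 118decbb3e: the `--sources` filtering described above could look roughly like the sketch below. The function name `filterSources` is hypothetical, and treating an empty flag value as "keep every source" is an assumption; the commit message does not say how the real command handles that case.

```go
package main

import "strings"

// filterSources keeps only the registered source names selected by a
// comma-separated flag value such as "github,gitlab".
// Assumption: an empty flag value means "no filter" and keeps everything.
func filterSources(registered []string, flagValue string) []string {
	if strings.TrimSpace(flagValue) == "" {
		return registered
	}
	want := make(map[string]struct{})
	for _, name := range strings.Split(flagValue, ",") {
		if name = strings.TrimSpace(name); name != "" {
			want[name] = struct{}{}
		}
	}
	// Iterate over the registered list so registration order is preserved
	// and unknown names in the flag are silently ignored.
	kept := make([]string, 0, len(want))
	for _, name := range registered {
		if _, ok := want[name]; ok {
			kept = append(kept, name)
		}
	}
	return kept
}
```

Calling `filterSources([]string{"github", "gitlab", "kaggle"}, "github,gitlab")` keeps the first two names in registration order.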
salvacybersec
1acbedc03a docs(10-09): complete RegisterAll + integration test plan 2026-04-06 01:28:32 +03:00
salvacybersec
e00fb172ab feat(10-09): wire sources.RegisterAll into cmd/recon with viper+env credential lookup 2026-04-06 01:27:25 +03:00
salvacybersec
8528108613 test(10-09): add end-to-end SweepAll integration test across all ten sources 2026-04-06 01:26:13 +03:00
salvacybersec
fb3e57382e feat(10-09): wire all ten Phase 10 sources in RegisterAll 2026-04-06 01:24:22 +03:00
salvacybersec
4628ccfe90 test(10-09): add failing RegisterAll wiring tests 2026-04-06 01:23:26 +03:00
salvacybersec
a034eeb14c Merge branch 'worktree-agent-ad7ef8d3' 2026-04-06 01:20:33 +03:00
salvacybersec
a0b8f99a7f Merge branch 'worktree-agent-ac81d6ab' 2026-04-06 01:20:25 +03:00
salvacybersec
430ace9a9a Merge branch 'worktree-agent-a2637f83' 2026-04-06 01:20:25 +03:00
salvacybersec
91becd961f Merge branch 'worktree-agent-a7f84823' 2026-04-06 01:20:25 +03:00
salvacybersec
6928ca4e70 Merge branch 'worktree-agent-a2fe7ff3' 2026-04-06 01:20:25 +03:00
salvacybersec
12c402ab67 docs(10-07): complete sandbox/IDE scraping sources plan 2026-04-06 01:19:57 +03:00
salvacybersec
21d5551aa4 docs(10-04): complete Bitbucket + Gist sources plan 2026-04-06 01:18:53 +03:00
salvacybersec
3d3c57fff2 docs(10-05): complete CodebergSource plan 2026-04-06 01:18:46 +03:00
salvacybersec
ecebffd27d feat(10-07): add SandboxesSource aggregator (codepen/jsfiddle/stackblitz/glitch/observable)
- Single ReconSource umbrella iterating per-platform HTML or JSON search endpoints
- Per-platform failures logged and skipped (log-and-continue); ctx cancel aborts fast
- Sub-platform identifier encoded in Finding.KeyMasked as 'platform=<name>' (pragmatic slot)
- Gitpod intentionally omitted (no public search)
- 5 httptest-backed tests covering HTML+JSON extraction, platform-failure tolerance, ctx cancel
2026-04-06 01:18:15 +03:00
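Note on ecebffd27d: a minimal sketch of the log-and-continue behavior with fast abort on context cancellation. Only the `platform=<name>` tagging in `KeyMasked` comes from the commit message; the `Finding` shape, `platformFetcher`, `sweepPlatforms`, `stubFetch`, and `sweepDemo` are hypothetical names used for illustration.

```go
package main

import (
	"context"
	"errors"
	"log"
)

// Finding is a reduced, hypothetical stand-in for the project's type.
type Finding struct {
	KeyMasked string
	Source    string
}

// platformFetcher is a hypothetical per-platform search function.
type platformFetcher func(ctx context.Context, platform string) ([]Finding, error)

// sweepPlatforms queries each platform in turn. A failing platform is
// logged and skipped; a cancelled context stops the sweep immediately.
func sweepPlatforms(ctx context.Context, platforms []string, fetch platformFetcher) ([]Finding, error) {
	var out []Finding
	for _, p := range platforms {
		if err := ctx.Err(); err != nil {
			return out, err
		}
		found, err := fetch(ctx, p)
		if err != nil {
			if ctx.Err() != nil {
				return out, ctx.Err()
			}
			log.Printf("sandboxes: %s failed, skipping: %v", p, err)
			continue
		}
		for _, f := range found {
			f.KeyMasked = "platform=" + p // sub-platform tag, as in the commit
			out = append(out, f)
		}
	}
	return out, nil
}

// stubFetch simulates one broken platform and one result for the others.
func stubFetch(_ context.Context, platform string) ([]Finding, error) {
	if platform == "jsfiddle" {
		return nil, errors.New("simulated outage")
	}
	return []Finding{{Source: "https://example.invalid/" + platform}}, nil
}

// sweepDemo runs the sweep against the stub, optionally with a context
// that is already cancelled, and returns the number of findings.
func sweepDemo(cancelFirst bool) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	got, err := sweepPlatforms(ctx, []string{"codepen", "jsfiddle", "stackblitz"}, stubFetch)
	return len(got), err
}
```

With the stub, one of three platforms fails and the sweep still returns the other two findings; with a pre-cancelled context it returns no findings and the context error.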
salvacybersec
4fafc01052 feat(10-05): implement CodebergSource for Gitea REST API
- Add CodebergSource targeting /api/v1/repos/search (Codeberg + any Gitea)
- Public API by default; Authorization: token <t> when Token set
- Unauth rate limit 60/hour, authenticated ~1000/hour
- Emit Findings keyed to repo html_url with SourceType=recon:codeberg
- Keyword index maps BuildQueries output back to ProviderName
- httptest coverage: name/interface, rate limits (both modes),
  sweep decoding, header presence/absence, ctx cancellation
2026-04-06 01:17:25 +03:00
salvacybersec
3715a75be7 docs(10-02): complete GitHubSource plan 2026-04-06 01:17:21 +03:00
salvacybersec
0e16e8ea4c feat(10-04): add GistSource for public gist keyword recon
- GistSource implements recon.ReconSource (RECON-CODE-04)
- Lists /gists/public?per_page=100, fetches each file's raw content,
  scans against provider keyword set, emits one Finding per matching gist
- Disabled when GitHub token empty
- Rate: rate.Every(2s), burst 1 (30 req/min GitHub limit)
- 256KB read cap per file; skips gists without keyword matches
- httptest coverage: enable gating, sweep match, no-match, 401, ctx cancel
2026-04-06 01:17:07 +03:00
salvacybersec
62a347f476 feat(10-07): add Replit and CodeSandbox scraping sources
- ReplitSource scrapes /search HTML extracting /@user/repl anchors
- CodeSandboxSource scrapes /search HTML extracting /s/slug anchors
- Both use golang.org/x/net/html parser, 10 req/min rate, RespectsRobots=true
- 10 httptest-backed tests covering extraction, ctx cancel, rate/name assertions
2026-04-06 01:16:39 +03:00
salvacybersec
223c23e672 docs(10-03): complete GitLabSource plan summary 2026-04-06 01:16:34 +03:00
salvacybersec
cae714b488 docs(10-06): complete HuggingFaceSource plan 2026-04-06 01:16:27 +03:00
salvacybersec
792ac8d54b docs(10-08): complete KaggleSource plan 2026-04-06 01:16:24 +03:00
salvacybersec
ab636dc5e1 fix(10-02): stabilize GitHubSource provider-name test 2026-04-06 01:15:51 +03:00
salvacybersec
0137dc57b1 feat(10-03): add GitLabSource for /api/v4/search blobs
- Implements recon.ReconSource against GitLab Search API
- PRIVATE-TOKEN header auth; rate.Every(30ms) burst 5 (~2000/min)
- Disabled when token empty; Sweep returns nil without calls
- Emits Finding per blob with Source=/projects/<id>/-/blob/<ref>/<path>
- 401 wrapped as ErrUnauthorized; ctx cancellation honored
- httptest coverage: enabled gating, happy path, 401, ctx cancel, iface assert
2026-04-06 01:15:49 +03:00
salvacybersec
39001f208c feat(10-06): implement HuggingFaceSource scanning Spaces and Models
- queries /api/spaces and /api/models via Hub API
- token optional: slower rate when absent (10s vs 3.6s)
- emits Findings with SourceType=recon:huggingface and prefixed Source URLs
- compile-time assert implements recon.ReconSource
2026-04-06 01:15:49 +03:00
salvacybersec
45f8782464 test(10-06): add failing tests for HuggingFaceSource
- httptest server routes /api/spaces and /api/models
- assertions: enabled, both endpoints hit, URL prefixes, auth header, ctx cancel, rate-limit token mode
2026-04-06 01:15:43 +03:00
salvacybersec
d279abf449 feat(10-04): add BitbucketSource for code search recon
- BitbucketSource implements recon.ReconSource (RECON-CODE-03)
- Queries /2.0/workspaces/{ws}/search/code with Bearer auth
- Disabled when token OR workspace empty
- Rate: rate.Every(3.6s), burst 1 (Bitbucket 1000/hr limit)
- httptest coverage: enable gating, sweep, 401, ctx cancel
2026-04-06 01:15:42 +03:00
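Note on the rate figures in these Phase 10 commits: the intervals quoted alongside `rate.Every(...)` follow from dividing the quota window by the request budget. The helper names below are hypothetical and the sketch uses only the standard library; in the real sources the resulting interval would presumably be passed to `rate.Every` from `golang.org/x/time/rate`.

```go
package main

import "time"

// everyPerHour returns the spacing between requests for a budget of n
// requests per hour. n must be positive.
func everyPerHour(n int) time.Duration {
	return time.Hour / time.Duration(n)
}

// everyPerMinute returns the spacing between requests for a budget of n
// requests per minute. n must be positive.
func everyPerMinute(n int) time.Duration {
	return time.Minute / time.Duration(n)
}

// perMinute is the inverse: how many requests per minute a given
// spacing allows. interval must be positive.
func perMinute(interval time.Duration) int {
	return int(time.Minute / interval)
}
```

Under these definitions, 1000 requests per hour gives 3.6s (Bitbucket), 30 per minute gives 2s (Gist), 60 per minute gives 1s (Kaggle), and a 30ms spacing corresponds to 2000 requests per minute (GitLab).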
salvacybersec
243b7405cd feat(10-08): add KaggleSource with HTTP Basic auth
- KaggleSource queries /api/v1/kernels/list with SetBasicAuth(user, key)
- Disabled when either KaggleUser or KaggleKey is empty (no HTTP calls)
- Emits Findings tagged recon:kaggle with Source = <web>/code/<ref>
- 60/min rate limit via rate.Every(1s), burst 1
- httptest-driven tests cover enabled, auth header, missing creds,
  401 unauthorized, and ctx cancellation
- RECON-CODE-09
2026-04-06 01:15:23 +03:00
salvacybersec
fb6cb53975 feat(10-02): implement GitHubSource recon.ReconSource 2026-04-06 01:14:52 +03:00
salvacybersec
03deb603b3 test(10-02): add failing tests for GitHubSource 2026-04-06 01:12:56 +03:00
salvacybersec
9b1aaae28d docs(10-01): complete recon sources foundation plan 2026-04-06 01:10:57 +03:00
salvacybersec
9273f356e6 feat(10-01): add provider-driven query generator and RegisterAll skeleton
- BuildQueries(reg, source) dedups keywords and formats per-source syntax
- github/gist use 'keyword' in:file; others use bare keyword
- SourcesConfig placeholder struct for Wave 2 plans to depend on
- RegisterAll no-op stub (Plan 10-09 will fill)
2026-04-06 01:09:57 +03:00
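Note on 9273f356e6: a rough sketch of the query generation described above. The real `BuildQueries` takes a provider registry; this version takes a plain keyword slice instead, and the lowercase name `buildQueries` marks it as illustrative. Wrapping the keyword in double quotes for the `in:file` form is an assumption about the exact syntax.

```go
package main

import "fmt"

// buildQueries deduplicates keywords (keeping first-seen order) and
// formats each one for the given source. github and gist get the
// `"keyword" in:file` form; every other source gets the bare keyword.
func buildQueries(keywords []string, source string) []string {
	seen := make(map[string]struct{}, len(keywords))
	queries := make([]string, 0, len(keywords))
	for _, kw := range keywords {
		if _, dup := seen[kw]; dup {
			continue
		}
		seen[kw] = struct{}{}
		switch source {
		case "github", "gist":
			// %q adds the surrounding double quotes (assumed syntax).
			queries = append(queries, fmt.Sprintf("%q in:file", kw))
		default:
			queries = append(queries, kw)
		}
	}
	return queries
}
```

For example, the keywords `sk-ant-`, `sk-ant-`, `AKIA` produce two queries for any source, quoted with `in:file` for `github` and bare for `gitlab`.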
salvacybersec
75024e4701 feat(10-01): add shared retry HTTP client for recon sources
- Client.Do retries 429/403/5xx honoring Retry-After
- 401 returns ErrUnauthorized immediately (no retry)
- Context cancellation honored during retry sleeps
- Default UA keyhunter-recon/1.0, 30s timeout, 2 retries
2026-04-06 01:09:02 +03:00
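Note on 75024e4701: the retry rules listed above can be sketched as a few small helpers. `ErrUnauthorized` is named in the commit message, but its definition here, along with `classify`, `retryDelay`, and `sleepCtx`, is hypothetical. The sketch only handles a `Retry-After` value given in whole seconds, not the HTTP-date form.

```go
package main

import (
	"context"
	"errors"
	"net/http"
	"strconv"
	"time"
)

// ErrUnauthorized stands in for the sentinel named in the commit message.
var ErrUnauthorized = errors.New("recon: unauthorized")

// classify maps a status code to (retry, err): 401 fails immediately,
// 429, 403, and 5xx are retryable, anything else is final.
func classify(status int) (bool, error) {
	switch {
	case status == http.StatusUnauthorized:
		return false, ErrUnauthorized
	case status == http.StatusTooManyRequests, status == http.StatusForbidden:
		return true, nil
	case status >= 500 && status <= 599:
		return true, nil
	default:
		return false, nil
	}
}

// retryDelay parses a Retry-After header value in seconds and falls
// back to def when the value is missing, malformed, or negative.
func retryDelay(retryAfter string, def time.Duration) time.Duration {
	secs, err := strconv.Atoi(retryAfter)
	if err != nil || secs < 0 {
		return def
	}
	return time.Duration(secs) * time.Second
}

// sleepCtx waits for d but returns early with the context's error if
// the context is cancelled during the wait.
func sleepCtx(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-t.C:
		return nil
	}
}
```

A retry loop would call `classify` on each response, stop on a non-nil error or a non-retryable status, and otherwise call `sleepCtx(ctx, retryDelay(resp.Header.Get("Retry-After"), fallback))` before the next attempt, up to the configured retry count.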
salvacybersec
191bdee3bc docs(10-osint-code-hosting): create phase 10 plans (9 plans across 3 waves) 2026-04-06 01:07:15 +03:00
salvacybersec
cfe090a5c9 docs(10): OSINT code hosting context 2026-04-06 00:59:18 +03:00
salvacybersec
226274ca9e docs(phase-09): complete phase execution 2026-04-06 00:56:36 +03:00
salvacybersec
4b8599d959 docs(09-06): complete phase 9 OSINT infrastructure
- Add 09-06-SUMMARY.md (integration test + phase summary plan)
- Update STATE.md progress and metrics
- Update ROADMAP.md phase 09 status
- Mark RECON-INFRA-05/06/07/08 complete in REQUIREMENTS.md
2026-04-06 00:53:35 +03:00
salvacybersec
d29a7d30b2 docs(09-06): add phase 09 completion summary
Documents all 4 RECON-INFRA requirement IDs as complete, summarizes
decisions (per-source limiters, default-allow robots, SHA256 dedup,
UA pool of 10), lists handoff contract for Phases 10-16.
2026-04-06 00:52:20 +03:00
salvacybersec
a754ff7546 test(09-06): add recon pipeline integration test
- Exercises Engine + LimiterRegistry + Stealth + Dedup end-to-end
- testSource emits 5 findings with one duplicate pair (Dedup -> 4)
- TestRobotsOnlyWhenRespectsRobots asserts robots gating via httptest
- Covers RECON-INFRA-05/06/07/08
2026-04-06 00:51:08 +03:00
salvacybersec
0ff9edc6c1 docs(09-05): complete recon CLI command tree plan 2026-04-06 00:48:42 +03:00
salvacybersec
86a6bb864b feat(09-05): add recon full/list commands and remove stub
- cmd/recon.go owns reconCmd with full and list subcommands
- Wires pkg/recon.Engine.SweepAll + Dedup with ExampleSource registered
- Adds --stealth, --respect-robots (default true), --query flags
- Removes reconCmd stub from cmd/stubs.go
2026-04-06 00:47:32 +03:00
salvacybersec
c2137edc41 merge: plan 09-03 stealth+dedup 2026-04-06 00:45:13 +03:00
salvacybersec
1eb86ca308 docs(09-03): complete stealth UA pool and dedup plan
- Stealth UA pool (10 browsers) + RandomUserAgent/StealthHeaders
- Stable cross-source Dedup keyed by sha256(provider|masked|source)
- Mark RECON-INFRA-06 complete
2026-04-06 00:44:37 +03:00
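Note on 1eb86ca308: a minimal sketch of deduplication keyed by `sha256(provider|masked|source)`. The field names `ProviderName`, `KeyMasked`, and `Source` appear in other commit messages in this log, but this reduced `Finding` struct, the helper `dedupKey`, and the first-occurrence-wins ordering are assumptions rather than the project's actual code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// Finding is a reduced, hypothetical stand-in for the project's type.
type Finding struct {
	ProviderName string
	KeyMasked    string
	Source       string
}

// dedupKey hashes "provider|masked|source" and returns the hex digest.
func dedupKey(f Finding) string {
	sum := sha256.Sum256([]byte(f.ProviderName + "|" + f.KeyMasked + "|" + f.Source))
	return hex.EncodeToString(sum[:])
}

// Dedup drops findings whose key has already been seen, keeping the
// first occurrence and preserving input order.
func Dedup(in []Finding) []Finding {
	seen := make(map[string]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		k := dedupKey(f)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}
```

Five findings containing one duplicate pair reduce to four, which matches the case exercised by the integration test in a754ff7546.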