From 61a9d527ee67fb07db46fdfb5db2acb9023416e2 Mon Sep 17 00:00:00 2001 From: salvacybersec Date: Mon, 6 Apr 2026 11:55:46 +0300 Subject: [PATCH] docs(11-01): complete search engine dorking sources plan - SUMMARY.md for 5 search engine sources (Google, Bing, DuckDuckGo, Yandex, Brave) - STATE.md updated with position and decisions - Requirements RECON-DORK-01/02/03 marked complete --- .planning/REQUIREMENTS.md | 6 +- .planning/STATE.md | 16 +-- .../11-osint_search_paste/11-01-SUMMARY.md | 117 ++++++++++++++++++ 3 files changed, 129 insertions(+), 10 deletions(-) create mode 100644 .planning/phases/11-osint_search_paste/11-01-SUMMARY.md diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md index dedc4b7..a1c8a93 100644 --- a/.planning/REQUIREMENTS.md +++ b/.planning/REQUIREMENTS.md @@ -115,9 +115,9 @@ Requirements for initial release. Each maps to roadmap phases. ### OSINT/Recon — Search Engine Dorking -- [ ] **RECON-DORK-01**: Google dorking via Custom Search API / SerpAPI with 100+ built-in dorks -- [ ] **RECON-DORK-02**: Bing dorking via Azure Cognitive Services -- [ ] **RECON-DORK-03**: DuckDuckGo, Yandex, Brave search integration +- [x] **RECON-DORK-01**: Google dorking via Custom Search API / SerpAPI with 100+ built-in dorks +- [x] **RECON-DORK-02**: Bing dorking via Azure Cognitive Services +- [x] **RECON-DORK-03**: DuckDuckGo, Yandex, Brave search integration ### OSINT/Recon — Paste Sites diff --git a/.planning/STATE.md b/.planning/STATE.md index 532f202..bf2e45e 100644 --- a/.planning/STATE.md +++ b/.planning/STATE.md @@ -3,14 +3,14 @@ gsd_state_version: 1.0 milestone: v1.0 milestone_name: milestone status: executing -stopped_at: Completed 10-09-PLAN.md -last_updated: "2026-04-06T08:38:31.363Z" +stopped_at: Completed 11-01-PLAN.md +last_updated: "2026-04-06T08:55:35.271Z" last_activity: 2026-04-06 progress: total_phases: 18 - completed_phases: 10 - total_plans: 62 - completed_plans: 63 + completed_phases: 9 + total_plans: 57 + completed_plans: 64 percent: 20 --- @@ -89,6 +89,7 @@ Progress: [██░░░░░░░░] 20% | Phase 10-osint-code-hosting P02 | 5min | 1 tasks | 2 files | | Phase 10-osint-code-hosting P07 | 6 | 2 tasks | 6 files | | Phase 10 P09 | 12min | 2 tasks | 5 files | +| Phase 11 P01 | 3min | 2 tasks | 11 files | ## Accumulated Context @@ -126,6 +127,7 @@ Recent decisions affecting current work: - [Phase 10-osint-code-hosting]: github/gist use 'kw' in:file; all other sources use bare keyword - [Phase 10-osint-code-hosting]: GitHubSource reuses shared sources.Client + LimiterRegistry; builds queries from providers.Registry via BuildQueries; missing token disables (not errors) - [Phase 10]: RegisterAll registers all ten Phase 10 sources unconditionally; missing credentials flip Enabled()==false rather than hiding sources from the CLI catalog +- [Phase 11]: All five search sources use dork query format to focus on paste/code hosting leak sites ### Pending Todos @@ -140,6 +142,6 @@ None yet. ## Session Continuity -Last session: 2026-04-05T22:28:27.412Z -Stopped at: Completed 10-09-PLAN.md +Last session: 2026-04-06T08:55:35.267Z +Stopped at: Completed 11-01-PLAN.md Resume file: None diff --git a/.planning/phases/11-osint_search_paste/11-01-SUMMARY.md b/.planning/phases/11-osint_search_paste/11-01-SUMMARY.md new file mode 100644 index 0000000..ddb75c6 --- /dev/null +++ b/.planning/phases/11-osint_search_paste/11-01-SUMMARY.md @@ -0,0 +1,117 @@ +--- +phase: 11-osint-search-paste +plan: 01 +subsystem: recon +tags: [google-custom-search, bing-web-search, duckduckgo, yandex-xml, brave-search, dorking, osint] + +requires: + - phase: 10-osint-code-hosting + provides: "ReconSource interface, sources.Client, LimiterRegistry, BuildQueries/formatQuery" +provides: + - "GoogleDorkSource - Google Custom Search JSON API dorking" + - "BingDorkSource - Bing Web Search API v7 dorking" + - "DuckDuckGoSource - HTML scraping (credential-free)" + - "YandexSource - Yandex XML Search API dorking" + - "BraveSource - Brave Search API dorking" + - "formatQuery cases for all five search engines" +affects: [11-osint-search-paste, 11-03 RegisterAll wiring] + +tech-stack: + added: [encoding/xml for Yandex XML parsing] + patterns: [search-engine dork query format via formatQuery, XML API response parsing] + +key-files: + created: + - pkg/recon/sources/google.go + - pkg/recon/sources/google_test.go + - pkg/recon/sources/bing.go + - pkg/recon/sources/bing_test.go + - pkg/recon/sources/duckduckgo.go + - pkg/recon/sources/duckduckgo_test.go + - pkg/recon/sources/yandex.go + - pkg/recon/sources/yandex_test.go + - pkg/recon/sources/brave.go + - pkg/recon/sources/brave_test.go + modified: + - pkg/recon/sources/queries.go + +key-decisions: + - "All five search sources use dork query format: site:pastebin.com OR site:github.com \"keyword\" to focus on paste/code hosting leak sites" + - "DuckDuckGo is credential-free (HTML scraping) with RespectsRobots=true; other four require API keys" + - "Yandex uses encoding/xml for XML response parsing; all others use encoding/json" + - "extractGoogleKeyword reverse-parser shared by Bing/Yandex/Brave for keyword-to-provider mapping" + +patterns-established: + - "Search engine dork sources: same Sweep loop pattern as Phase 10 code hosting sources" + - "XML API sources: encoding/xml with nested struct unmarshaling (Yandex)" + +requirements-completed: [RECON-DORK-01, RECON-DORK-02, RECON-DORK-03] + +duration: 3min +completed: 2026-04-06 +--- + +# Phase 11 Plan 01: Search Engine Dorking Sources Summary + +**Five search engine dorking ReconSource implementations (Google, Bing, DuckDuckGo, Yandex, Brave) with dork-style queries targeting paste/code hosting sites** + +## Performance + +- **Duration:** 3 min +- **Started:** 2026-04-06T08:51:30Z +- **Completed:** 2026-04-06T08:54:52Z +- **Tasks:** 2 +- **Files modified:** 11 + +## Accomplishments +- GoogleDorkSource and BingDorkSource with JSON API integration and httptest-based tests +- DuckDuckGoSource with HTML scraping (credential-free, RespectsRobots=true) +- YandexSource with XML Search API and encoding/xml response parsing +- BraveSource with Brave Search API and X-Subscription-Token auth +- formatQuery updated with dork syntax for all five search engines + +## Task Commits + +Each task was committed atomically: + +1. **Task 1: GoogleDorkSource + BingDorkSource + formatQuery updates** - `7272e65` (feat) +2. **Task 2: DuckDuckGoSource + YandexSource + BraveSource** - `7707053` (feat) + +## Files Created/Modified +- `pkg/recon/sources/google.go` - Google Custom Search JSON API source (APIKey + CX required) +- `pkg/recon/sources/google_test.go` - Google source tests (enabled, sweep, cancel, unauth) +- `pkg/recon/sources/bing.go` - Bing Web Search API v7 source (Ocp-Apim-Subscription-Key) +- `pkg/recon/sources/bing_test.go` - Bing source tests +- `pkg/recon/sources/duckduckgo.go` - DuckDuckGo HTML scraper (no API key, always enabled) +- `pkg/recon/sources/duckduckgo_test.go` - DuckDuckGo tests including empty registry +- `pkg/recon/sources/yandex.go` - Yandex XML Search API (user + key required, XML parsing) +- `pkg/recon/sources/yandex_test.go` - Yandex tests +- `pkg/recon/sources/brave.go` - Brave Search API (X-Subscription-Token) +- `pkg/recon/sources/brave_test.go` - Brave tests +- `pkg/recon/sources/queries.go` - Added google/bing/duckduckgo/yandex/brave formatQuery cases + +## Decisions Made +- All five search sources use dork query format `site:pastebin.com OR site:github.com "keyword"` to focus results on leak-likely sites +- DuckDuckGo is the only credential-free source; uses HTML scraping with extractAnchorHrefs (shared with Replit) +- Yandex requires encoding/xml for its XML Search API response format +- extractGoogleKeyword reverse-parser reused across Bing/Yandex/Brave for keyword-to-provider name mapping + +## Deviations from Plan + +None - plan executed exactly as written. + +## Issues Encountered + +None. + +## User Setup Required + +None - no external service configuration required. + +## Next Phase Readiness +- All five search engine sources ready for RegisterAll wiring in Plan 11-03 +- Each source follows established ReconSource pattern for seamless engine integration + +--- +*Phase: 11-osint-search-paste* +*Completed: 2026-04-06*