---
phase: 11-osint-search-paste
plan: 01
subsystem: recon
tags: [google-custom-search, bing-web-search, duckduckgo, yandex-xml, brave-search, dorking, osint]

requires:
  - phase: 10-osint-code-hosting
    provides: "ReconSource interface, sources.Client, LimiterRegistry, BuildQueries/formatQuery"
provides:
  - "GoogleDorkSource - Google Custom Search JSON API dorking"
  - "BingDorkSource - Bing Web Search API v7 dorking"
  - "DuckDuckGoSource - HTML scraping (credential-free)"
  - "YandexSource - Yandex XML Search API dorking"
  - "BraveSource - Brave Search API dorking"
  - "formatQuery cases for all five search engines"
affects: [11-osint-search-paste, 11-03 RegisterAll wiring]

tech-stack:
  added: [encoding/xml for Yandex XML parsing]
  patterns: [search-engine dork query format via formatQuery, XML API response parsing]

key-files:
  created:
    - pkg/recon/sources/google.go
    - pkg/recon/sources/google_test.go
    - pkg/recon/sources/bing.go
    - pkg/recon/sources/bing_test.go
    - pkg/recon/sources/duckduckgo.go
    - pkg/recon/sources/duckduckgo_test.go
    - pkg/recon/sources/yandex.go
    - pkg/recon/sources/yandex_test.go
    - pkg/recon/sources/brave.go
    - pkg/recon/sources/brave_test.go
  modified:
    - pkg/recon/sources/queries.go

key-decisions:
  - "All five search sources use dork query format: site:pastebin.com OR site:github.com \"keyword\" to focus on paste/code hosting leak sites"
  - "DuckDuckGo is credential-free (HTML scraping) with RespectsRobots=true; other four require API keys"
  - "Yandex uses encoding/xml for XML response parsing; all others use encoding/json"
  - "extractGoogleKeyword reverse-parser shared by Bing/Yandex/Brave for keyword-to-provider mapping"

patterns-established:
  - "Search engine dork sources: same Sweep loop pattern as Phase 10 code hosting sources"
  - "XML API sources: encoding/xml with nested struct unmarshaling (Yandex)"

requirements-completed: [RECON-DORK-01, RECON-DORK-02, RECON-DORK-03]

duration: 3min
completed: 2026-04-06
---

# Phase 11 Plan 01: Search Engine Dorking Sources Summary

**Five search engine dorking ReconSource implementations (Google, Bing, DuckDuckGo, Yandex, Brave) with dork-style queries targeting paste/code hosting sites**

## Performance

- **Duration:** 3 min
- **Started:** 2026-04-06T08:51:30Z
- **Completed:** 2026-04-06T08:54:52Z
- **Tasks:** 2
- **Files modified:** 11

## Accomplishments

- GoogleDorkSource and BingDorkSource with JSON API integration and httptest-based tests
- DuckDuckGoSource with HTML scraping (credential-free, RespectsRobots=true)
- YandexSource with XML Search API and encoding/xml response parsing
- BraveSource with Brave Search API and X-Subscription-Token auth
- formatQuery updated with dork syntax for all five search engines

## Task Commits

Each task was committed atomically:

1. **Task 1: GoogleDorkSource + BingDorkSource + formatQuery updates** - `7272e65` (feat)
2. **Task 2: DuckDuckGoSource + YandexSource + BraveSource** - `7707053` (feat)

## Files Created/Modified

- `pkg/recon/sources/google.go` - Google Custom Search JSON API source (APIKey + CX required)
- `pkg/recon/sources/google_test.go` - Google source tests (enabled, sweep, cancel, unauth)
- `pkg/recon/sources/bing.go` - Bing Web Search API v7 source (Ocp-Apim-Subscription-Key)
- `pkg/recon/sources/bing_test.go` - Bing source tests
- `pkg/recon/sources/duckduckgo.go` - DuckDuckGo HTML scraper (no API key, always enabled)
- `pkg/recon/sources/duckduckgo_test.go` - DuckDuckGo tests including empty registry
- `pkg/recon/sources/yandex.go` - Yandex XML Search API (user + key required, XML parsing)
- `pkg/recon/sources/yandex_test.go` - Yandex tests
- `pkg/recon/sources/brave.go` - Brave Search API (X-Subscription-Token)
- `pkg/recon/sources/brave_test.go` - Brave tests
- `pkg/recon/sources/queries.go` - Added google/bing/duckduckgo/yandex/brave formatQuery cases

## Decisions Made

- All five search sources use dork query format `site:pastebin.com OR site:github.com "keyword"` to focus results on leak-likely sites
- DuckDuckGo is the only credential-free source; uses HTML scraping with extractAnchorHrefs (shared with Replit)
- Yandex requires encoding/xml for its XML Search API response format
- extractGoogleKeyword reverse-parser reused across Bing/Yandex/Brave for keyword-to-provider name mapping

## Deviations from Plan

None - plan executed exactly as written.

## Issues Encountered

None.

## User Setup Required

None - no external service configuration required.

## Next Phase Readiness

- All five search engine sources ready for RegisterAll wiring in Plan 11-03
- Each source follows established ReconSource pattern for seamless engine integration

---
*Phase: 11-osint-search-paste*
*Completed: 2026-04-06*
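
The dork query format recorded under the decisions above (`site:pastebin.com OR site:github.com "keyword"`) and the idea of reverse-parsing the keyword back out of it can be illustrated with a minimal Go sketch. The function names `buildDorkQuery` and `extractKeyword`, and the sample keyword `sk-proj-`, are hypothetical and chosen for illustration only; they are not the repository's `formatQuery` or `extractGoogleKeyword`, whose actual signatures and behavior are not shown in this summary.

```go
package main

import (
	"fmt"
	"strings"
)

// buildDorkQuery (hypothetical helper) joins the given hosts into
// "site:a OR site:b" filters and appends the keyword in double quotes.
func buildDorkQuery(sites []string, keyword string) string {
	filters := make([]string, 0, len(sites))
	for _, s := range sites {
		filters = append(filters, "site:"+s)
	}
	quoted := `"` + keyword + `"`
	if len(filters) == 0 {
		return quoted
	}
	return strings.Join(filters, " OR ") + " " + quoted
}

// extractKeyword (hypothetical helper) recovers the text inside the last
// pair of double quotes. If no quoted span exists, the query is returned
// unchanged. This only assumes what a reverse-parser might do.
func extractKeyword(query string) string {
	end := strings.LastIndex(query, `"`)
	if end <= 0 {
		return query
	}
	start := strings.LastIndex(query[:end], `"`)
	if start < 0 {
		return query
	}
	return query[start+1 : end]
}

func main() {
	q := buildDorkQuery([]string{"pastebin.com", "github.com"}, "sk-proj-")
	fmt.Println(q)                 // site:pastebin.com OR site:github.com "sk-proj-"
	fmt.Println(extractKeyword(q)) // sk-proj-
}
```

Keeping the keyword as the final quoted token is what makes the reverse step cheap: a source that only receives the formatted query string can still map a hit back to the keyword (and from there to a provider) without carrying extra state through the sweep loop.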