salvacybersec 61a9d527ee docs(11-01): complete search engine dorking sources plan
- SUMMARY.md for 5 search engine sources (Google, Bing, DuckDuckGo, Yandex, Brave)
- STATE.md updated with position and decisions
- Requirements RECON-DORK-01/02/03 marked complete
2026-04-06 11:55:46 +03:00

---
phase: 11-osint-search-paste
plan: 01
subsystem: recon
tags: [google-custom-search, bing-web-search, duckduckgo, yandex-xml, brave-search, dorking, osint]
requires:
  - phase: 10-osint-code-hosting
    provides: "ReconSource interface, sources.Client, LimiterRegistry, BuildQueries/formatQuery"
provides:
  - "GoogleDorkSource - Google Custom Search JSON API dorking"
  - "BingDorkSource - Bing Web Search API v7 dorking"
  - "DuckDuckGoSource - HTML scraping (credential-free)"
  - "YandexSource - Yandex XML Search API dorking"
  - "BraveSource - Brave Search API dorking"
  - "formatQuery cases for all five search engines"
affects: [11-osint-search-paste, 11-03 RegisterAll wiring]
tech-stack:
  added: [encoding/xml for Yandex XML parsing]
  patterns: [search-engine dork query format via formatQuery, XML API response parsing]
key-files:
  created:
    - pkg/recon/sources/google.go
    - pkg/recon/sources/google_test.go
    - pkg/recon/sources/bing.go
    - pkg/recon/sources/bing_test.go
    - pkg/recon/sources/duckduckgo.go
    - pkg/recon/sources/duckduckgo_test.go
    - pkg/recon/sources/yandex.go
    - pkg/recon/sources/yandex_test.go
    - pkg/recon/sources/brave.go
    - pkg/recon/sources/brave_test.go
  modified:
    - pkg/recon/sources/queries.go
key-decisions:
  - "All five search sources use dork query format: site:pastebin.com OR site:github.com \"keyword\" to focus on paste/code hosting leak sites"
  - "DuckDuckGo is credential-free (HTML scraping) with RespectsRobots=true; other four require API keys"
  - "Yandex uses encoding/xml for XML response parsing; all others use encoding/json"
  - "extractGoogleKeyword reverse-parser shared by Bing/Yandex/Brave for keyword-to-provider mapping"
patterns-established:
  - "Search engine dork sources: same Sweep loop pattern as Phase 10 code hosting sources"
  - "XML API sources: encoding/xml with nested struct unmarshaling (Yandex)"
requirements-completed: [RECON-DORK-01, RECON-DORK-02, RECON-DORK-03]
duration: 3min
completed: 2026-04-06
---
# Phase 11 Plan 01: Search Engine Dorking Sources Summary
**Five search engine dorking ReconSource implementations (Google, Bing, DuckDuckGo, Yandex, Brave) with dork-style queries targeting paste/code hosting sites**
## Performance
- **Duration:** 3 min
- **Started:** 2026-04-06T08:51:30Z
- **Completed:** 2026-04-06T08:54:52Z
- **Tasks:** 2
- **Files modified:** 11
## Accomplishments
- GoogleDorkSource and BingDorkSource with JSON API integration and httptest-based tests
- DuckDuckGoSource with HTML scraping (credential-free, RespectsRobots=true)
- YandexSource with XML Search API and encoding/xml response parsing
- BraveSource with Brave Search API and X-Subscription-Token auth
- formatQuery updated with dork syntax for all five search engines
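
The XML parsing mentioned for YandexSource could look roughly like the sketch below. It is not the code in `yandex.go`: the type `yandexResponse`, the helper `parseYandexURLs`, and the sample document are hypothetical, and the element layout (`yandexsearch > response > results > grouping > group > doc > url`) is a simplified assumption about the Yandex XML response.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// yandexResponse mirrors an ASSUMED, simplified layout of a Yandex XML
// search response. Only the fields needed to reach result URLs are mapped.
type yandexResponse struct {
	XMLName  xml.Name `xml:"yandexsearch"`
	Response struct {
		Results struct {
			Grouping struct {
				Groups []struct {
					Docs []struct {
						URL string `xml:"url"`
					} `xml:"doc"`
				} `xml:"group"`
			} `xml:"grouping"`
		} `xml:"results"`
	} `xml:"response"`
}

// parseYandexURLs (hypothetical helper) unmarshals the body into the nested
// structs and flattens every doc URL into one slice, in document order.
func parseYandexURLs(body []byte) ([]string, error) {
	var r yandexResponse
	if err := xml.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	var urls []string
	for _, g := range r.Response.Results.Grouping.Groups {
		for _, d := range g.Docs {
			urls = append(urls, d.URL)
		}
	}
	return urls, nil
}

// sampleYandexXML is made-up test data, not a captured API response.
const sampleYandexXML = `<yandexsearch version="1.0">
  <response>
    <results>
      <grouping>
        <group><doc><url>https://pastebin.com/abc123</url></doc></group>
        <group><doc><url>https://github.com/example/repo</url></doc></group>
      </grouping>
    </results>
  </response>
</yandexsearch>`

func main() {
	urls, err := parseYandexURLs([]byte(sampleYandexXML))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Nested anonymous structs keep the mapping next to the one function that uses it; the same traversal could also be written with path tags such as `xml:"response>results>grouping>group"`.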
## Task Commits
Each task was committed atomically:
1. **Task 1: GoogleDorkSource + BingDorkSource + formatQuery updates** - `7272e65` (feat)
2. **Task 2: DuckDuckGoSource + YandexSource + BraveSource** - `7707053` (feat)
## Files Created/Modified
- `pkg/recon/sources/google.go` - Google Custom Search JSON API source (APIKey + CX required)
- `pkg/recon/sources/google_test.go` - Google source tests (enabled, sweep, cancel, unauth)
- `pkg/recon/sources/bing.go` - Bing Web Search API v7 source (Ocp-Apim-Subscription-Key)
- `pkg/recon/sources/bing_test.go` - Bing source tests
- `pkg/recon/sources/duckduckgo.go` - DuckDuckGo HTML scraper (no API key, always enabled)
- `pkg/recon/sources/duckduckgo_test.go` - DuckDuckGo tests including empty registry
- `pkg/recon/sources/yandex.go` - Yandex XML Search API (user + key required, XML parsing)
- `pkg/recon/sources/yandex_test.go` - Yandex tests
- `pkg/recon/sources/brave.go` - Brave Search API (X-Subscription-Token)
- `pkg/recon/sources/brave_test.go` - Brave tests
- `pkg/recon/sources/queries.go` - Added google/bing/duckduckgo/yandex/brave formatQuery cases
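
The two header names in the list above (`Ocp-Apim-Subscription-Key` for Bing, `X-Subscription-Token` for Brave) can be exercised against a stub server, in the spirit of the httptest-based tests. The sketch below is an illustration only: `newSearchRequest`, `probe`, and the `seen` type are hypothetical names, and passing the query as a `q` parameter is an assumption rather than something this summary states.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Header names as given in the summary for the Bing and Brave sources.
const (
	bingAuthHeader  = "Ocp-Apim-Subscription-Key"
	braveAuthHeader = "X-Subscription-Token"
)

// newSearchRequest (hypothetical) builds a GET request that carries the
// query in a "q" parameter (an assumption) and the API key in authHeader.
func newSearchRequest(ctx context.Context, endpoint, authHeader, apiKey, query string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	q := req.URL.Query()
	q.Set("q", query)
	req.URL.RawQuery = q.Encode()
	req.Header.Set(authHeader, apiKey)
	req.Header.Set("Accept", "application/json")
	return req, nil
}

// seen records what the stub server observed on the incoming request.
type seen struct{ key, query string }

// probe sends one request to a local httptest server and reports the auth
// header value and query string that arrived there.
func probe(authHeader, apiKey, query string) (seen, error) {
	got := make(chan seen, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got <- seen{key: r.Header.Get(authHeader), query: r.URL.Query().Get("q")}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{}`)
	}))
	defer srv.Close()

	req, err := newSearchRequest(context.Background(), srv.URL, authHeader, apiKey, query)
	if err != nil {
		return seen{}, err
	}
	resp, err := srv.Client().Do(req)
	if err != nil {
		return seen{}, err
	}
	defer resp.Body.Close()
	return <-got, nil
}

func main() {
	for _, h := range []string{bingAuthHeader, braveAuthHeader} {
		s, err := probe(h, "test-key", `site:pastebin.com "sk-"`)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("%s=%s q=%s\n", h, s.key, s.query)
	}
}
```

Pointing the request at `srv.URL` keeps the check offline, so no real API key or network access is needed.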
## Decisions Made
- All five search sources use dork query format `site:pastebin.com OR site:github.com "keyword"` to focus results on leak-likely sites
- DuckDuckGo is the only credential-free source; it scrapes HTML results with extractAnchorHrefs (shared with Replit)
- Yandex requires encoding/xml for its XML Search API response format
- extractGoogleKeyword reverse-parser reused across Bing/Yandex/Brave for keyword-to-provider name mapping
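
As a minimal sketch of the first and last decisions above, the dork string can be built from a keyword and the keyword recovered again from the finished query. The functions `buildDorkQuery` and `extractKeyword` are hypothetical stand-ins written for this illustration; they are not the actual `formatQuery` or `extractGoogleKeyword` implementations, whose signatures this summary does not show.

```go
package main

import (
	"fmt"
	"strings"
)

// dorkSites holds the two sites named in the dork format above.
var dorkSites = []string{"pastebin.com", "github.com"}

// buildDorkQuery (hypothetical) renders
// site:pastebin.com OR site:github.com "keyword".
func buildDorkQuery(keyword string) string {
	parts := make([]string, 0, len(dorkSites))
	for _, s := range dorkSites {
		parts = append(parts, "site:"+s)
	}
	return strings.Join(parts, " OR ") + ` "` + keyword + `"`
}

// extractKeyword (hypothetical) reverses buildDorkQuery by returning the
// text between the first and last double quote. It reports false when the
// query holds no quoted section.
func extractKeyword(query string) (string, bool) {
	first := strings.Index(query, `"`)
	last := strings.LastIndex(query, `"`)
	if first < 0 || last <= first {
		return "", false
	}
	return query[first+1 : last], true
}

func main() {
	q := buildDorkQuery("sk-proj-")
	fmt.Println(q) // site:pastebin.com OR site:github.com "sk-proj-"

	kw, ok := extractKeyword(q)
	fmt.Println(kw, ok) // sk-proj- true
}
```

Recovering the keyword from the query lets a source map a hit back to the provider keyword that produced it, without threading extra state through the sweep loop.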
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None.
## User Setup Required
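None for this plan - no external service configuration required. At runtime, the Google, Bing, Yandex, and Brave sources still need the API credentials listed above; DuckDuckGo needs none.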
## Next Phase Readiness
- All five search engine sources ready for RegisterAll wiring in Plan 11-03
- Each source follows established ReconSource pattern for seamless engine integration
---
*Phase: 11-osint-search-paste*
*Completed: 2026-04-06*