- URLhausSource searches the abuse.ch URLhaus API for malicious URLs containing API keys
- Credentialless source (Enabled always true, no API key needed)
- Tag lookup with payload endpoint fallback
- ciLogKeyPattern used for content matching
- Tests with httptest mocks for happy path and empty results
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- VirusTotalSource searches VT Intelligence API for files containing API keys
- IntelligenceXSource searches IX archive with 3-step flow (search/results/read)
- Both credential-gated (Enabled returns false without API key)
- ciLogKeyPattern used for content matching
- Tests with httptest mocks for happy path and empty results
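The credential gating described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Source` interface, `gatedSource`, `openSource`, and `enabledNames` are hypothetical names, and the real ReconSource interface likely has more methods.

```go
package main

import "fmt"

// Source is a hypothetical, reduced stand-in for the project's source
// interface; only the two methods needed for this sketch are shown.
type Source interface {
	Name() string
	Enabled() bool
}

// gatedSource needs an API key, like the VirusTotal and IntelligenceX sources.
type gatedSource struct {
	name   string
	apiKey string
}

func (s gatedSource) Name() string { return s.name }

// Enabled reports false when no API key has been configured.
func (s gatedSource) Enabled() bool { return s.apiKey != "" }

// openSource needs no credentials, so it is always enabled.
type openSource struct{ name string }

func (s openSource) Name() string  { return s.name }
func (s openSource) Enabled() bool { return true }

// enabledNames returns the names of all sources that report Enabled() == true.
func enabledNames(sources []Source) []string {
	var out []string
	for _, s := range sources {
		if s.Enabled() {
			out = append(out, s.Name())
		}
	}
	return out
}

func main() {
	sources := []Source{
		gatedSource{name: "virustotal"}, // no key configured
		gatedSource{name: "intelx", apiKey: "example-key"},
		openSource{name: "urlhaus"},
	}
	fmt.Println(enabledNames(sources))
}
```

A sweep that checks `Enabled()` first can skip unconfigured sources without treating the missing key as an error.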
- GrafanaSource: search dashboards via /api/search, fetch detail via /api/dashboards/uid
- SentrySource: search issues via /api/0/issues, fetch events for key detection
- Register all 5 log aggregator sources in RegisterAll (67 sources total)
- Tests use httptest mocks for each API endpoint
- ElasticsearchSource: POST _search API with query_string, parse hits._source
- KibanaSource: GET saved_objects/_find API with kbn-xsrf header
- SplunkSource: GET search/jobs/export API with newline-delimited JSON parsing
- All sources use ciLogKeyPattern for key detection
- Tests use httptest mocks for each API endpoint
- DiscordSource uses dorking approach against configurable search endpoint
- SlackSource uses dorking against slack-archive indexers
- DevToSource searches dev.to API articles list + detail for body_markdown
- RegisterAll extended to include all 6 Phase 15 forum sources
- All credentialless, use ciLogKeyPattern for key detection
- StackOverflowSource searches SE API v2.3 search/excerpts endpoint
- RedditSource searches Reddit JSON API with custom User-Agent
- HackerNewsSource searches Algolia HN API for comments
- All credentialless, use ciLogKeyPattern for key detection
- Tests use httptest mock servers with API key patterns
- ConfluenceSource searches exposed instances via /rest/api/content/search CQL
- GoogleDocsSource uses dorking + /export?format=txt for plain-text scanning
- HTML tag stripping for Confluence storage format
- Both credentialless, tests with httptest mocks confirm findings
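A rough sketch of the HTML tag stripping step, assuming a simple regex-based approach (the commit does not say how ConfluenceSource actually does it). `stripTags` and `tagPattern` are hypothetical names. Note that this pattern would also drop the contents of a `<![CDATA[...]]>` section, so a real implementation may need to handle those separately.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// tagPattern matches anything that looks like a markup tag.
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// stripTags removes tags, decodes HTML entities, and collapses whitespace
// so that a key pattern can be matched against plain text.
func stripTags(s string) string {
	text := tagPattern.ReplaceAllString(s, " ")
	text = html.UnescapeString(text)
	return strings.Join(strings.Fields(text), " ")
}

func main() {
	storage := `<p>token: <code>abc123</code> &amp; more</p>`
	fmt.Println(stripTags(storage))
}
```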
- TrelloSource searches public Trello boards via /1/search API
- NotionSource uses dorking to discover and scrape public Notion pages
- Both credentialless, follow established Phase 10 pattern
- Tests with httptest mocks confirm Sweep emits findings
- Add CircleCIToken to SourcesConfig with env/viper lookup in cmd/recon.go
- Register 7 new sources: travisci, ghactions, circleci, jenkins, wayback, commoncrawl, jsbundle
- Update register_test.go expectations from 45 to 52 sources
- Add integration test handlers + registrations for all 12 Phase 14 sources
- Integration test now validates 52 sources end-to-end
- SwaggerSource probes OpenAPI doc endpoints for API keys in example/default fields
- DeployPreviewSource scans Vercel/Netlify preview URLs for __NEXT_DATA__ env leaks
- Both implement ReconSource, credentialless, with httptest-based tests
- GitHubActionsSource: searches GitHub code search for workflow files with provider keywords (token-gated)
- TravisCISource: queries Travis CI v3 API for public build logs (credentialless)
- CircleCISource: queries CircleCI v2 pipeline API for build pipelines (token-gated)
- JenkinsSource: queries open Jenkins /api/json for job build consoles (credentialless)
- GitLabCISource: queries GitLab projects API for CI-enabled projects (token-gated)
- RegisterAll extended to 45 sources (40 Phase 10-13 + 5 Phase 14)
- Integration test updated with fixtures for all 5 new sources
- cmd/recon.go wires CIRCLECI_TOKEN env var
- SourceMapSource probes .map files for original source containing API keys
- WebpackSource scans JS bundles for inlined NEXT_PUBLIC_/REACT_APP_/VITE_ env vars
- EnvLeakSource probes common .env paths for exposed environment files
- All three implement ReconSource, credentialless, with httptest-based tests
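The bundle scan for inlined NEXT_PUBLIC_/REACT_APP_/VITE_ variables could be approximated with a single regular expression, as in the sketch below. The pattern and the `findInlinedEnv` helper are assumptions for illustration; WebpackSource may match differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// inlinedEnv matches a prefixed variable name followed by a quoted value,
// for example NEXT_PUBLIC_API_KEY:"..." or VITE_TOKEN='...'.
var inlinedEnv = regexp.MustCompile(
	`\b((?:NEXT_PUBLIC_|REACT_APP_|VITE_)[A-Z0-9_]+)\s*[:=]\s*["']([^"']+)["']`)

// findInlinedEnv returns a name-to-value map of every match in a JS bundle.
func findInlinedEnv(js string) map[string]string {
	found := map[string]string{}
	for _, m := range inlinedEnv.FindAllStringSubmatch(js, -1) {
		found[m[1]] = m[2]
	}
	return found
}

func main() {
	bundle := `var e={NEXT_PUBLIC_API_KEY:"sk-live-1",REACT_APP_TOKEN:'tok2',OTHER:"x"}`
	for name, value := range findInlinedEnv(bundle) {
		fmt.Println(name, value)
	}
}
```

The extracted values would still need to pass the key pattern check before being reported as findings.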
- WaybackMachineSource queries CDX API for historical snapshots
- CommonCrawlSource queries CC Index API for matching pages
- Both credentialless, rate-limited at 1 req/5s, RespectsRobots=true
- RegisterAll extended to 42 sources (40 Phase 10-13 + 2 Phase 14)
- Full httptest-based test coverage for both sources
- Terraform searches registry.terraform.io v1 modules API with namespace/name/provider URLs
- Helm searches artifacthub.io for charts (kind=0) with repo/chart URL construction
- Both sources: context cancellation, nil registry, httptest-based tests
- MavenSource queries Maven Central Solr API for provider keyword matches
- NuGetSource queries NuGet gallery search API with projectUrl fallback
- Both sources: httptest fixtures, ctx cancellation, metadata tests
- AzureBlobScanner enumerates public Azure Blob containers with XML listing
- DOSpacesScanner enumerates public DO Spaces across 5 regions (S3-compatible XML)
- httptest-based tests for all four scanners: sweep, empty registry, ctx cancel, metadata
- All sources credentialless, compile-time interface assertions
- S3Scanner enumerates public AWS S3 buckets by provider keyword + suffix pattern
- GCSScanner enumerates public GCS buckets with JSON listing format
- Shared bucketNames() helper and isConfigFile() filter for config-pattern files
- Both credentialless (anonymous HTTP), always Enabled, BaseURL override for tests
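To illustrate the keyword-plus-suffix enumeration and the config-file filter, here is a minimal sketch. The names `candidateBucketNames` and `looksLikeConfig`, the suffix list, and the extension list are all hypothetical; they are modeled on the bucketNames() and isConfigFile() helpers named above, whose actual signatures and contents are not shown in this log.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// candidateBucketNames combines each provider keyword with a small set of
// assumed suffixes and returns the de-duplicated candidates in order.
func candidateBucketNames(keywords []string) []string {
	suffixes := []string{"", "-backup", "-config", "-dev", "-prod"} // assumed list
	seen := map[string]bool{}
	var names []string
	for _, kw := range keywords {
		kw = strings.ToLower(strings.TrimSpace(kw))
		if kw == "" {
			continue
		}
		for _, sfx := range suffixes {
			name := kw + sfx
			if !seen[name] {
				seen[name] = true
				names = append(names, name)
			}
		}
	}
	return names
}

// looksLikeConfig reports whether an object key names a config-style file.
func looksLikeConfig(key string) bool {
	name := strings.ToLower(path.Base(key))
	if strings.HasPrefix(name, ".env") {
		return true
	}
	switch path.Ext(name) {
	case ".json", ".yaml", ".yml", ".toml", ".ini", ".conf", ".cfg", ".env":
		return true
	}
	return false
}

func main() {
	fmt.Println(candidateBucketNames([]string{"OpenAI"}))
	fmt.Println(looksLikeConfig("deploy/.env.production"), looksLikeConfig("img/logo.png"))
}
```

Filtering by file name before fetching object bodies keeps the number of anonymous requests per bucket small.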
- FOFASource searches FOFA API with base64-encoded queries (email+key auth)
- NetlasSource searches Netlas API with X-API-Key header auth
- BinaryEdgeSource searches BinaryEdge API with X-Key header auth
- All three implement recon.ReconSource with shared Client retry/backoff
- ShodanSource searches /shodan/host/search with API key auth
- CensysSource POSTs to /v2/hosts/search with Basic Auth
- ZoomEyeSource searches /host/search with API-KEY header
- All use shared Client for retry/backoff, LimiterRegistry for rate limiting
- Extend httptest mux with fixtures for Google, Bing, DuckDuckGo, Yandex, Brave
- Add Pastebin (routed /pb/), GistPaste (/gp/), PasteSites (injected platform)
- Assert all 18 SourceTypes emit at least one finding via SweepAll
- DuckDuckGoSource scrapes HTML search (no API key, always enabled, RespectsRobots=true)
- YandexSource uses Yandex XML Search API (user+key required, XML response parsing)
- BraveSource uses Brave Search API (X-Subscription-Token header, JSON response)
- All three follow established error handling: 401 aborts, transient continues, ctx cancellation returns
- GoogleDorkSource uses Google Custom Search JSON API (APIKey+CX required)
- BingDorkSource uses Bing Web Search API v7 (Ocp-Apim-Subscription-Key header)
- formatQuery now handles google/bing/duckduckgo/yandex/brave dork syntax
- Both sources follow established pattern: retry via Client, rate limit via LimiterRegistry
- PastebinSource: two-phase search+raw-fetch with keyword matching
- GistPasteSource: scrapes gist.github.com public search (no auth)
- Both implement recon.ReconSource with httptest-based tests
Closes 2 verification gaps:
1. --sources=github,gitlab flag filters registered sources before sweep
2. Findings persisted to SQLite via storage.SaveFinding after dedup
Also adds Engine.Get() method for source lookup by name.
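The --sources filtering step can be illustrated with a small sketch. `filterSources` is a hypothetical helper that works on source names only; the real code presumably filters registered source values and may use Engine.Get() for the lookup.

```go
package main

import (
	"fmt"
	"strings"
)

// filterSources keeps only the registered names listed in the
// comma-separated flag value. An empty flag keeps every source.
func filterSources(registered []string, flag string) []string {
	if strings.TrimSpace(flag) == "" {
		return registered
	}
	wanted := map[string]bool{}
	for _, part := range strings.Split(flag, ",") {
		if name := strings.ToLower(strings.TrimSpace(part)); name != "" {
			wanted[name] = true
		}
	}
	var out []string
	for _, name := range registered {
		if wanted[name] {
			out = append(out, name) // preserves registration order
		}
	}
	return out
}

func main() {
	registered := []string{"github", "gitlab", "shodan"}
	fmt.Println(filterSources(registered, "github,gitlab"))
}
```

Unknown names in the flag are silently ignored in this sketch; a CLI may prefer to report them as errors.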