---
phase: 12-osint_iot_cloud_storage
plan: 03
type: execute
wave: 1
depends_on: []
files_modified:
- pkg/recon/sources/s3scanner.go
- pkg/recon/sources/s3scanner_test.go
- pkg/recon/sources/gcsscanner.go
- pkg/recon/sources/gcsscanner_test.go
- pkg/recon/sources/azureblob.go
- pkg/recon/sources/azureblob_test.go
- pkg/recon/sources/dospaces.go
- pkg/recon/sources/dospaces_test.go
autonomous: true
requirements: [RECON-CLOUD-01, RECON-CLOUD-02, RECON-CLOUD-03, RECON-CLOUD-04]
must_haves:
  truths:
    - "S3Scanner enumerates publicly accessible S3 buckets by name pattern and scans readable objects for API key exposure"
    - "GCSScanner scans publicly accessible Google Cloud Storage buckets"
    - "AzureBlobScanner scans publicly accessible Azure Blob containers"
    - "DOSpacesScanner scans publicly accessible DigitalOcean Spaces"
    - "Each cloud scanner is credentialless (uses anonymous HTTP to probe public buckets) and always Enabled"
  artifacts:
    - path: "pkg/recon/sources/s3scanner.go"
      provides: "S3Scanner implementing recon.ReconSource"
      exports: ["S3Scanner"]
    - path: "pkg/recon/sources/gcsscanner.go"
      provides: "GCSScanner implementing recon.ReconSource"
      exports: ["GCSScanner"]
    - path: "pkg/recon/sources/azureblob.go"
      provides: "AzureBlobScanner implementing recon.ReconSource"
      exports: ["AzureBlobScanner"]
    - path: "pkg/recon/sources/dospaces.go"
      provides: "DOSpacesScanner implementing recon.ReconSource"
      exports: ["DOSpacesScanner"]
  key_links:
    - from: "pkg/recon/sources/s3scanner.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "sources.Client for retry/backoff HTTP"
      pattern: "s\\.client\\.Do"
---
Implement four cloud storage scanner recon sources: S3Scanner, GCSScanner, AzureBlobScanner, and DOSpacesScanner.
Purpose: Enable discovery of API keys leaked in publicly accessible cloud storage buckets across AWS, GCP, Azure, and DigitalOcean.
Output: Four source files + tests following the established Phase 10 pattern.
Note on RECON-CLOUD-03 (MinIO via Shodan) and RECON-CLOUD-04 (GrayHatWarfare): both requirements are tracked by this plan. MinIO uses an S3-compatible API, but MinIO instances are discovered through the Shodan query "minio" in Plan 12-01's ShodanSource; S3Scanner itself only enumerates AWS S3 buckets. GrayHatWarfare is implemented as a dedicated scanner that queries the buckets.grayhatwarfare.com API.
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/bing.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/register.go
From pkg/recon/source.go:
```go
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```
From pkg/recon/sources/httpclient.go:
```go
type Client struct { HTTP *http.Client; MaxRetries int; UserAgent string }
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```
Task 1: Implement S3Scanner and GCSScanner
pkg/recon/sources/s3scanner.go, pkg/recon/sources/gcsscanner.go
**S3Scanner** (s3scanner.go) — RECON-CLOUD-01 + RECON-CLOUD-03:
- Struct: `S3Scanner` with fields `Registry *providers.Registry`, `Limiters *recon.LimiterRegistry`, `BaseURL string`, `client *Client`
- Compile-time assertion: `var _ recon.ReconSource = (*S3Scanner)(nil)`
- Name(): "s3"
- RateLimit(): rate.Every(500 * time.Millisecond) — S3 public reads are generous
- Burst(): 3
- RespectsRobots(): false (direct API calls)
- Enabled(): always true (credentialless — probes public buckets)
- Sweep(): Generates candidate bucket names (e.g., "openai-keys", "anthropic-config", "llm-keys") using a helper `bucketNames(registry)` that combines provider keywords with common suffixes such as "-keys", "-config", "-backup", "-data", "-secrets", and "-env". For each candidate bucket:
1. HEAD `https://{bucket}.s3.amazonaws.com/` — if 200/403, bucket exists
2. If 200 (public listing), GET the ListBucket XML, parse `<Key>` elements
3. For keys matching common config file patterns (.env, config.*, *.json, *.yaml, *.yml, *.toml, *.conf), emit a Finding with Source=`s3://{bucket}/{key}`, SourceType="recon:s3", Confidence="medium"
4. Do NOT download object contents (too heavy) — just flag the presence of suspicious files
- Use BaseURL override for tests (default: "https://%s.s3.amazonaws.com")
- Note: MinIO instances (RECON-CLOUD-03) are discovered via Shodan queries in Plan 12-01's ShodanSource using the query "minio" — this source focuses on AWS S3 bucket enumeration.
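Steps 2 and 3 of the Sweep flow above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned implementation: the type `listBucketResult`, the functions `isConfigKey` and `suspiciousKeys`, and the choice to match patterns against the key's base name with `path.Match` are all hypothetical names and decisions made for the example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"path"
	"strings"
)

// listBucketResult mirrors only the part of the S3 ListBucket XML that the
// scanner needs: the <Key> of every <Contents> entry. (Hypothetical type.)
type listBucketResult struct {
	Contents []struct {
		Key string `xml:"Key"`
	} `xml:"Contents"`
}

// configPatterns are the config file patterns named in the plan.
var configPatterns = []string{".env", "config.*", "*.json", "*.yaml", "*.yml", "*.toml", "*.conf"}

// isConfigKey reports whether the base name of an object key matches one of
// the patterns. Matching on the base name is an assumption of this sketch.
func isConfigKey(key string) bool {
	base := strings.ToLower(path.Base(key))
	for _, p := range configPatterns {
		if ok, _ := path.Match(p, base); ok {
			return true
		}
	}
	return false
}

// suspiciousKeys parses a ListBucket XML body and returns the keys that look
// like config files. Object contents are never downloaded.
func suspiciousKeys(body []byte) ([]string, error) {
	var res listBucketResult
	if err := xml.Unmarshal(body, &res); err != nil {
		return nil, err
	}
	var out []string
	for _, c := range res.Contents {
		if isConfigKey(c.Key) {
			out = append(out, c.Key)
		}
	}
	return out, nil
}

func main() {
	body := []byte(`<ListBucketResult>
  <Contents><Key>backup/.env</Key></Contents>
  <Contents><Key>logo.png</Key></Contents>
  <Contents><Key>app/config.yaml</Key></Contents>
</ListBucketResult>`)
	keys, err := suspiciousKeys(body)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, k := range keys {
		// Source format taken from the plan: s3://{bucket}/{key}
		fmt.Printf("s3://%s/%s\n", "testprov-keys", k)
	}
}
```

Because only key names are inspected, a match indicates a file worth reviewing rather than a confirmed leak, which fits the "medium" confidence the plan assigns.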
**GCSScanner** (gcsscanner.go) — RECON-CLOUD-02:
- Struct: `GCSScanner` with fields `Registry *providers.Registry`, `Limiters *recon.LimiterRegistry`, `BaseURL string`, `client *Client`
- Name(): "gcs"
- RateLimit(): rate.Every(500 * time.Millisecond)
- Burst(): 3
- RespectsRobots(): false
- Enabled(): always true (credentialless)
- Sweep(): Same bucket enumeration pattern as S3Scanner, using `https://storage.googleapis.com/{bucket}` for the HEAD probe. For listing, request the JSON API endpoint `https://storage.googleapis.com/storage/v1/b/{bucket}/o`, which returns `{"items":[{"name":"..."}]}` for publicly listable buckets (the XML API at `/{bucket}` returns an S3-style XML listing instead). Emit findings for config-pattern files with Source=`gs://{bucket}/{name}`, SourceType="recon:gcs".
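Parsing the GCS listing shape described above might look like the sketch below. The names `gcsListing` and `gcsSources` are hypothetical, and filtering by config-file pattern is left out to keep the example short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gcsListing captures only the fields the plan mentions:
// {"items":[{"name":"..."}]}. (Hypothetical type name.)
type gcsListing struct {
	Items []struct {
		Name string `json:"name"`
	} `json:"items"`
}

// gcsSources decodes a listing body and formats every object name as a
// gs://{bucket}/{name} source string. Pattern filtering is omitted here.
func gcsSources(bucket string, body []byte) ([]string, error) {
	var listing gcsListing
	if err := json.Unmarshal(body, &listing); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(listing.Items))
	for _, it := range listing.Items {
		out = append(out, fmt.Sprintf("gs://%s/%s", bucket, it.Name))
	}
	return out, nil
}

func main() {
	body := []byte(`{"items":[{"name":".env"},{"name":"conf/app.yaml"}]}`)
	sources, err := gcsSources("testprov-keys", body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, s := range sources {
		fmt.Println(s)
	}
}
```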
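S3Scanner and GCSScanner share a common `bucketNames` helper function. Define it in s3scanner.go; because both files belong to the same `sources` package, the unexported helper is visible to both without being exported.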
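A rough sketch of the name-generation logic behind `bucketNames` is shown here. Since the `providers.Registry` API is not reproduced in this plan, the example takes a plain keyword slice instead; the function name `bucketNamesFromKeywords`, the lowercasing, and the de-duplication are assumptions of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// bucketSuffixes are the common suffixes listed in the plan.
var bucketSuffixes = []string{"-keys", "-config", "-backup", "-data", "-secrets", "-env"}

// bucketNamesFromKeywords combines every keyword with every suffix.
// Hypothetical stand-in for bucketNames(registry): it takes keywords
// directly because the Registry API is not shown in the plan.
func bucketNamesFromKeywords(keywords []string) []string {
	seen := make(map[string]struct{})
	var names []string
	for _, kw := range keywords {
		kw = strings.ToLower(strings.TrimSpace(kw))
		if kw == "" {
			continue
		}
		for _, suffix := range bucketSuffixes {
			name := kw + suffix
			if _, dup := seen[name]; dup {
				continue // skip names already produced by another keyword
			}
			seen[name] = struct{}{}
			names = append(names, name)
		}
	}
	return names
}

func main() {
	for _, name := range bucketNamesFromKeywords([]string{"testprov"}) {
		fmt.Println(name)
	}
}
```

With no keywords the function returns no names, which is the behaviour the empty-registry test case relies on.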
cd /home/salva/Documents/apikey/.claude/worktrees/agent-a6700ee2 && go build ./pkg/recon/sources/
S3Scanner and GCSScanner compile and implement recon.ReconSource
Task 2: Implement AzureBlobScanner, DOSpacesScanner, and all cloud scanner tests
pkg/recon/sources/azureblob.go, pkg/recon/sources/dospaces.go, pkg/recon/sources/s3scanner_test.go, pkg/recon/sources/gcsscanner_test.go, pkg/recon/sources/azureblob_test.go, pkg/recon/sources/dospaces_test.go
**AzureBlobScanner** (azureblob.go) — RECON-CLOUD-02:
- Struct: `AzureBlobScanner` with fields `Registry *providers.Registry`, `Limiters *recon.LimiterRegistry`, `BaseURL string`, `client *Client`
- Name(): "azureblob"
- RateLimit(): rate.Every(500 * time.Millisecond)
- Burst(): 3
- RespectsRobots(): false
- Enabled(): always true (credentialless)
- Sweep(): Uses bucket enumeration pattern with Azure Blob URL format `https://{account}.blob.core.windows.net/{container}?restype=container&comp=list`. Generate account names from provider keywords with common suffixes. Parse XML `<Blob><Name>...</Name></Blob>`. Emit findings for config-pattern files with Source=`azure://{account}/{container}/{name}`, SourceType="recon:azureblob".
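The Azure listing parse could be sketched as below, assuming the List Blobs response nests blob names as `EnumerationResults > Blobs > Blob > Name`. The names `enumerationResults` and `azureSources` are hypothetical, and the container name is passed in because the plan does not say how containers are chosen.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// enumerationResults captures only the blob names from an Azure
// container listing. (Hypothetical type name.)
type enumerationResults struct {
	Blobs []struct {
		Name string `xml:"Name"`
	} `xml:"Blobs>Blob"`
}

// azureSources parses a listing body and formats each blob as
// azure://{account}/{container}/{name}. Pattern filtering is omitted.
func azureSources(account, container string, body []byte) ([]string, error) {
	var res enumerationResults
	if err := xml.Unmarshal(body, &res); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(res.Blobs))
	for _, b := range res.Blobs {
		out = append(out, fmt.Sprintf("azure://%s/%s/%s", account, container, b.Name))
	}
	return out, nil
}

func main() {
	body := []byte(`<EnumerationResults>
  <Blobs>
    <Blob><Name>.env</Name></Blob>
    <Blob><Name>settings/config.json</Name></Blob>
  </Blobs>
</EnumerationResults>`)
	// "testprovkeys" and "backup" are illustrative values only.
	sources, err := azureSources("testprovkeys", "backup", body)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range sources {
		fmt.Println(s)
	}
}
```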
**DOSpacesScanner** (dospaces.go) — RECON-CLOUD-02:
- Struct: `DOSpacesScanner` with fields `Registry *providers.Registry`, `Limiters *recon.LimiterRegistry`, `BaseURL string`, `client *Client`
- Name(): "spaces"
- RateLimit(): rate.Every(500 * time.Millisecond)
- Burst(): 3
- RespectsRobots(): false
- Enabled(): always true (credentialless)
- Sweep(): Uses bucket enumeration with DO Spaces URL format `https://{bucket}.{region}.digitaloceanspaces.com/`. Iterate regions: nyc3, sfo3, ams3, sgp1, fra1. Same XML ListBucket format as S3 (DO Spaces is S3-compatible). Emit findings with Source=`do://{bucket}/{key}`, SourceType="recon:spaces".
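The region iteration for DO Spaces could be sketched like this. The names `spacesRegions`, `spacesURLs`, and `spacesSource` are hypothetical; the URL and Source formats are the ones given in the plan.

```go
package main

import "fmt"

// spacesRegions are the regions the plan says to iterate.
var spacesRegions = []string{"nyc3", "sfo3", "ams3", "sgp1", "fra1"}

// spacesURLs returns one candidate listing URL per region for a bucket.
func spacesURLs(bucket string) []string {
	urls := make([]string, 0, len(spacesRegions))
	for _, region := range spacesRegions {
		urls = append(urls, fmt.Sprintf("https://%s.%s.digitaloceanspaces.com/", bucket, region))
	}
	return urls
}

// spacesSource formats a finding source as do://{bucket}/{key}.
// Note that the region is not part of the Source format in the plan.
func spacesSource(bucket, key string) string {
	return "do://" + bucket + "/" + key
}

func main() {
	for _, u := range spacesURLs("testprov-keys") {
		fmt.Println(u)
	}
	fmt.Println(spacesSource("testprov-keys", ".env"))
}
```

Each candidate name costs five requests here, one per region, which is worth keeping in mind against the 500 ms rate limit.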
**Tests** (all four test files):
Each test file follows the httptest pattern:
- Mock server returns appropriate XML/JSON for bucket listing
- Verify Sweep emits correct number of findings with correct SourceType and Source URL format
- Verify Enabled() returns true (credentialless sources)
- Test with empty registry (no keywords => no bucket names => no findings)
- Test context cancellation
Use a minimal providers.Registry with 1 test provider having keyword "testprov" so bucket names like "testprov-keys" are generated.
cd /home/salva/Documents/apikey/.claude/worktrees/agent-a6700ee2 && go test ./pkg/recon/sources/ -run "TestS3Scanner|TestGCSScanner|TestAzureBlob|TestDOSpaces" -v -count=1
All four cloud scanner sources compile and pass tests; each emits findings with correct source type and URL format
- `go build ./pkg/recon/sources/` compiles without errors
- `go test ./pkg/recon/sources/ -run "TestS3Scanner|TestGCSScanner|TestAzureBlob|TestDOSpaces" -v` all pass
- Each source file has compile-time assertion
Four cloud storage scanners (S3, GCS, Azure Blob, DO Spaces) implement recon.ReconSource with credentialless public bucket enumeration, use shared Client for HTTP, and pass unit tests.