---
phase: 10-osint-code-hosting
plan: 06
type: execute
wave: 2
depends_on: [10-01]
files_modified:
- pkg/recon/sources/huggingface.go
- pkg/recon/sources/huggingface_test.go
autonomous: true
requirements: [RECON-CODE-08]
must_haves:
  truths:
    - "HuggingFaceSource queries /api/spaces and /api/models search endpoints"
    - "Token is optional — anonymous requests allowed at lower rate limit"
    - "Findings have SourceType=\"recon:huggingface\" and Source = full HF URL"
  artifacts:
    - path: "pkg/recon/sources/huggingface.go"
      provides: "HuggingFaceSource implementing recon.ReconSource"
  key_links:
    - from: "pkg/recon/sources/huggingface.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "Client.Do"
      pattern: "client\\.Do"
---
<objective>
Implement HuggingFaceSource scanning both Spaces and model repos via the HF Hub API.
The token is optional; unauthenticated requests work but are subject to a stricter rate limit.
Purpose: RECON-CODE-08.
Output: pkg/recon/sources/huggingface.go + tests.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/10-osint-code-hosting/10-CONTEXT.md
@.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
<interfaces>
HuggingFace Hub API:
GET https://huggingface.co/api/spaces?search=<q>&limit=50
GET https://huggingface.co/api/models?search=<q>&limit=50
Response (either): array of { "id": "owner/name", "modelId"|"spaceId": "owner/name" }
Optional auth: Authorization: Bearer <hf-token>
URL derivation: Source = "https://huggingface.co/spaces/<id>" for Spaces, "https://huggingface.co/<id>" for models.
Rate: 1000/hour authenticated → rate.Every(3600*time.Millisecond), i.e. one request every 3.6 s; unauthenticated: rate.Every(10*time.Second). Burst 1.
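
The endpoint, URL-derivation, and rate rules above can be sketched with the standard library alone. This is a minimal illustration, not the planned implementation: `hfKind`, `searchURL`, `sourceURL`, and `requestInterval` are hypothetical names, and the returned interval is the value that would be handed to `rate.Every` from `golang.org/x/time/rate`.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// hfKind is a hypothetical enum for the two Hub endpoints listed above.
type hfKind int

const (
	kindSpace hfKind = iota
	kindModel
)

// searchURL builds the search endpoint URL for one keyword.
func searchURL(baseURL string, kind hfKind, query string) string {
	path := "/api/models"
	if kind == kindSpace {
		path = "/api/spaces"
	}
	return baseURL + path + "?search=" + url.QueryEscape(query) + "&limit=50"
}

// sourceURL derives the Source URL for a result id ("owner/name").
// Spaces live under /spaces/, models directly under the base URL.
func sourceURL(baseURL string, kind hfKind, id string) string {
	if kind == kindSpace {
		return baseURL + "/spaces/" + id
	}
	return baseURL + "/" + id
}

// requestInterval returns the minimum spacing between requests:
// 1000/hour (one per 3.6 s) with a token, one per 10 s without.
func requestInterval(token string) time.Duration {
	if token == "" {
		return 10 * time.Second
	}
	return 3600 * time.Millisecond
}

func main() {
	base := "https://huggingface.co"
	fmt.Println(searchURL(base, kindSpace, "sk-proj"))
	fmt.Println(sourceURL(base, kindModel, "owner/name"))
	fmt.Println(requestInterval(""), requestInterval("hf_example"))
}
```

Keeping the interval as a plain `time.Duration` lets Test F compare the two token modes without constructing a limiter.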
</interfaces>
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: HuggingFaceSource + tests</name>
<files>pkg/recon/sources/huggingface.go, pkg/recon/sources/huggingface_test.go</files>
<behavior>
- Test A: Enabled always true (token optional)
- Test B: Sweep hits both /api/spaces and /api/models endpoints for each query
- Test C: Decodes array of {id} and emits Findings with Source prefixed by "https://huggingface.co/spaces/" or "https://huggingface.co/" for models, SourceType="recon:huggingface"
- Test D: Authorization header present when token set, absent when empty
- Test E: Ctx cancellation respected
- Test F: RateLimit returns slower rate when token empty
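
As an illustration of Test D, the header rule can be isolated in a small request builder. This is a sketch under assumptions: `newSearchRequest` and `authHeaderFor` are hypothetical helpers, and the real source is expected to send requests through the shared `Client` in `httpclient.go` rather than build them this way.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// newSearchRequest builds a GET request bound to ctx and attaches the bearer
// token only when one is configured. The function name is hypothetical.
func newSearchRequest(ctx context.Context, rawURL, token string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

// authHeaderFor reports the Authorization header that a request built with
// the given token would carry, which keeps the assertion a one-liner.
func authHeaderFor(token string) string {
	req, err := newSearchRequest(context.Background(),
		"https://huggingface.co/api/models?search=x&limit=50", token)
	if err != nil {
		return ""
	}
	return req.Header.Get("Authorization")
}

func main() {
	fmt.Printf("anonymous: %q\n", authHeaderFor(""))
	fmt.Printf("with token: %q\n", authHeaderFor("hf_example"))
}
```

Binding the request to `ctx` with `http.NewRequestWithContext` is also what lets the transport abort an in-flight call when the context is cancelled, which is the behavior Test E checks.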
</behavior>
<action>
Create `pkg/recon/sources/huggingface.go`:
- Struct `HuggingFaceSource { Token, BaseURL string; Registry *providers.Registry; Limiters *recon.LimiterRegistry; client *Client }`
- Default BaseURL: `https://huggingface.co`
- Name "huggingface", RespectsRobots false, Burst 1
- RateLimit: token-dependent (see interfaces)
- Enabled always true
- Sweep: build keyword list, for each keyword iterate two endpoints
(`/api/spaces?search=<q>&limit=50`, `/api/models?search=<q>&limit=50`), emit
Findings. URL prefix differs per endpoint.
- Compile-time assertion that HuggingFaceSource implements recon.ReconSource
Create `pkg/recon/sources/huggingface_test.go` with an httptest server that routes
both paths. Assert the exact number of Findings (2 per keyword × number of keywords,
given one result per endpoint) and the URL prefixes.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestHuggingFace -v -timeout 30s</automated>
</verify>
<done>
HuggingFaceSource passes tests covering both endpoints, token modes, cancellation.
</done>
</task>
</tasks>
<verification>
- `go test ./pkg/recon/sources/ -run TestHuggingFace -v`
</verification>
<success_criteria>
RECON-CODE-08 satisfied.
</success_criteria>
<output>
After completion, create `.planning/phases/10-osint-code-hosting/10-06-SUMMARY.md`.
</output>