keyhunter/.planning/phases/04-input-sources/04-04-PLAN.md
2026-04-06 12:27:23 +03:00

---
phase: 04-input-sources
plan: 04
type: execute
wave: 1
depends_on:
  - 04-01
files_modified:
  - pkg/engine/sources/stdin.go
  - pkg/engine/sources/stdin_test.go
  - pkg/engine/sources/url.go
  - pkg/engine/sources/url_test.go
  - pkg/engine/sources/clipboard.go
  - pkg/engine/sources/clipboard_test.go
autonomous: true
requirements:
  - INPUT-03
  - INPUT-04
  - INPUT-05
must_haves:
  truths:
    - "StdinSource reads from an io.Reader and emits chunks with Source='stdin'"
    - "URLSource fetches an http/https URL with 30s timeout, 50MB cap, rejects file:// and other schemes, and emits chunks with Source='url:<url>'"
    - "URLSource rejects responses with non-text Content-Type unless allowlisted (text/*, application/json, application/javascript, application/xml)"
    - "ClipboardSource reads current clipboard via atotto/clipboard and emits chunks with Source='clipboard'"
    - "ClipboardSource returns a clear error if clipboard tooling is unavailable"
  artifacts:
    - path: pkg/engine/sources/stdin.go
      provides: "StdinSource"
      exports:
        - StdinSource
        - NewStdinSource
      min_lines: 40
    - path: pkg/engine/sources/url.go
      provides: "URLSource with HTTP fetch, timeout, size cap, content-type filter"
      exports:
        - URLSource
        - NewURLSource
      min_lines: 100
    - path: pkg/engine/sources/clipboard.go
      provides: "ClipboardSource wrapping atotto/clipboard"
      exports:
        - ClipboardSource
        - NewClipboardSource
      min_lines: 30
  key_links:
    - from: pkg/engine/sources/url.go
      to: net/http
      via: "http.Client with Timeout"
      pattern: "http.Client"
    - from: pkg/engine/sources/url.go
      to: io.LimitReader
      via: "MaxContentLength enforcement"
      pattern: "LimitReader"
    - from: pkg/engine/sources/clipboard.go
      to: github.com/atotto/clipboard
      via: "clipboard.ReadAll"
      pattern: "clipboard.ReadAll"
---
Implement three small Source adapters in a single plan, since each is under 150 lines and they share no state:

- `StdinSource` reads from an injectable `io.Reader` (defaults to `os.Stdin`): INPUT-03
- `URLSource` fetches a remote URL via stdlib `net/http` with a timeout, size cap, scheme whitelist, and content-type filter: INPUT-04
- `ClipboardSource` reads the current clipboard via `github.com/atotto/clipboard` and returns a clear error when clipboard tooling is unavailable: INPUT-05

Purpose: These three adapters complete the Phase 4 input surface area. Bundling them into one plan keeps wave-1 parallelism healthy (04-02 + 04-03 + 04-04 run simultaneously) while respecting the ~50% context budget, since each adapter is self-contained and small.

Output: Six files total (three sources + three test files).

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/phases/04-input-sources/04-CONTEXT.md
@pkg/engine/sources/source.go
@pkg/types/chunk.go

Source interface:

```go
type Source interface {
	Chunks(ctx context.Context, out chan<- types.Chunk) error
}
```

Shared helper (produced by plan 04-02 in pkg/engine/sources/dir.go):

func emitChunks(ctx context.Context, data []byte, source string, chunkSize int, out chan<- types.Chunk) error

atotto/clipboard API:

import "github.com/atotto/clipboard"
func ReadAll() (string, error)
var Unsupported bool  // set on platforms without clipboard tooling
Task 1: Implement StdinSource, URLSource, and ClipboardSource with full test coverage

Read first:

- pkg/engine/sources/source.go
- pkg/engine/sources/dir.go (for emitChunks signature from plan 04-02)
- pkg/types/chunk.go
- .planning/phases/04-input-sources/04-CONTEXT.md (Stdin, URL, Clipboard sections)

Files: pkg/engine/sources/stdin.go, pkg/engine/sources/stdin_test.go, pkg/engine/sources/url.go, pkg/engine/sources/url_test.go, pkg/engine/sources/clipboard.go, pkg/engine/sources/clipboard_test.go

StdinSource:

- Test 1: Feeding "API_KEY=xyz" through a bytes.Buffer emits one chunk with Source="stdin"
- Test 2: Empty input emits zero chunks without error
- Test 3: ctx cancellation returns ctx.Err()

URLSource:

- Test 4: Fetches content from httptest.Server, emits a chunk with Source="url:"
- Test 5: Server returning 50MB+1 body is rejected with a size error
- Test 6: Server returning Content-Type image/png is rejected
- Test 7: Scheme "file:///etc/passwd" is rejected without any request attempt
- Test 8: Server returning 500 returns a non-nil error containing "500"
- Test 9: HTTP 301 redirect is followed (max 5 hops)

ClipboardSource:

- Test 10: If clipboard.Unsupported is true, returns an error with "clipboard" in the message
- Test 11: Otherwise reads clipboard (may skip if empty on CI); use build tag or t.Skip guard

Create pkg/engine/sources/stdin.go:

package sources

import (
	"context"
	"io"
	"os"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

// StdinSource reads content from an io.Reader (defaults to os.Stdin) and
// emits overlapping chunks. Used when a user runs `keyhunter scan stdin`
// or `keyhunter scan -`.
type StdinSource struct {
	Reader    io.Reader
	ChunkSize int
}

// NewStdinSource returns a StdinSource bound to os.Stdin.
func NewStdinSource() *StdinSource {
	return &StdinSource{Reader: os.Stdin, ChunkSize: defaultChunkSize}
}

// NewStdinSourceFrom returns a StdinSource bound to the given reader
// (used primarily by tests).
func NewStdinSourceFrom(r io.Reader) *StdinSource {
	return &StdinSource{Reader: r, ChunkSize: defaultChunkSize}
}

// Chunks reads the entire input, then hands it to the shared chunk emitter.
func (s *StdinSource) Chunks(ctx context.Context, out chan<- types.Chunk) error {
	if s.Reader == nil {
		s.Reader = os.Stdin
	}
	data, err := io.ReadAll(s.Reader)
	if err != nil {
		return err
	}
	if len(data) == 0 {
		return nil
	}
	return emitChunks(ctx, data, "stdin", s.ChunkSize, out)
}

Create pkg/engine/sources/stdin_test.go:

package sources

import (
	"bytes"
	"context"
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

func TestStdinSource_Basic(t *testing.T) {
	src := NewStdinSourceFrom(bytes.NewBufferString("API_KEY=sk-test-xyz"))
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	out := make(chan types.Chunk, 8)
	errCh := make(chan error, 1)
	go func() { errCh <- src.Chunks(ctx, out); close(out) }()

	var got []types.Chunk
	for c := range out {
		got = append(got, c)
	}
	require.NoError(t, <-errCh)
	require.Len(t, got, 1)
	require.Equal(t, "stdin", got[0].Source)
	require.Equal(t, "API_KEY=sk-test-xyz", string(got[0].Data))
}

func TestStdinSource_Empty(t *testing.T) {
	src := NewStdinSourceFrom(bytes.NewBuffer(nil))
	out := make(chan types.Chunk, 1)
	err := src.Chunks(context.Background(), out)
	close(out)
	require.NoError(t, err)
	require.Len(t, out, 0)
}

func TestStdinSource_CtxCancel(t *testing.T) {
	// Large buffer so emitChunks iterates and can observe cancellation.
	data := make([]byte, 1<<20)
	src := NewStdinSourceFrom(bytes.NewReader(data))
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	out := make(chan types.Chunk) // unbuffered forces select on ctx
	err := src.Chunks(ctx, out)
	require.ErrorIs(t, err, context.Canceled)
}

Create pkg/engine/sources/url.go:

package sources

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

// MaxURLContentLength is the hard cap on URLSource response bodies.
const MaxURLContentLength int64 = 50 * 1024 * 1024 // 50 MB

// DefaultURLTimeout is the overall request timeout (connect + read + body).
const DefaultURLTimeout = 30 * time.Second

// allowedContentTypes is the whitelist of Content-Type prefixes URLSource
// will accept. Binary types (images, archives, executables) are rejected.
var allowedContentTypes = []string{
	"text/",
	"application/json",
	"application/javascript",
	"application/xml",
	"application/x-yaml",
	"application/yaml",
}

// URLSource fetches a remote resource over HTTP(S) and emits its body as chunks.
type URLSource struct {
	URL       string
	Client    *http.Client
	UserAgent string
	Insecure  bool // reserved: skip TLS verification (default false); not yet consulted by Chunks
	ChunkSize int
}

// NewURLSource creates a URLSource with sane defaults.
func NewURLSource(rawURL string) *URLSource {
	return &URLSource{
		URL:       rawURL,
		Client:    defaultHTTPClient(),
		UserAgent: "keyhunter/dev",
		ChunkSize: defaultChunkSize,
	}
}

func defaultHTTPClient() *http.Client {
	return &http.Client{
		Timeout: DefaultURLTimeout,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 5 {
				return errors.New("stopped after 5 redirects")
			}
			return nil
		},
	}
}

// Chunks validates the URL, issues a GET, and emits the response body as chunks.
func (u *URLSource) Chunks(ctx context.Context, out chan<- types.Chunk) error {
	parsed, err := url.Parse(u.URL)
	if err != nil {
		return fmt.Errorf("URLSource: parse %q: %w", u.URL, err)
	}
	if parsed.Scheme != "http" && parsed.Scheme != "https" {
		return fmt.Errorf("URLSource: unsupported scheme %q (only http/https)", parsed.Scheme)
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u.URL, nil)
	if err != nil {
		return fmt.Errorf("URLSource: new request: %w", err)
	}
	req.Header.Set("User-Agent", u.UserAgent)

	client := u.Client
	if client == nil {
		client = defaultHTTPClient()
	}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("URLSource: fetch: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("URLSource: non-2xx status %d from %s", resp.StatusCode, u.URL)
	}

	ct := resp.Header.Get("Content-Type")
	if !isAllowedContentType(ct) {
		return fmt.Errorf("URLSource: disallowed Content-Type %q", ct)
	}

	if resp.ContentLength > MaxURLContentLength {
		return fmt.Errorf("URLSource: Content-Length %d exceeds cap %d", resp.ContentLength, MaxURLContentLength)
	}

	// LimitReader cap + 1 to detect overflow even if ContentLength was missing/wrong.
	limited := io.LimitReader(resp.Body, MaxURLContentLength+1)
	data, err := io.ReadAll(limited)
	if err != nil {
		return fmt.Errorf("URLSource: read body: %w", err)
	}
	if int64(len(data)) > MaxURLContentLength {
		return fmt.Errorf("URLSource: body exceeds %d bytes", MaxURLContentLength)
	}
	if len(data) == 0 {
		return nil
	}

	source := "url:" + u.URL
	return emitChunks(ctx, data, source, u.ChunkSize, out)
}

func isAllowedContentType(ct string) bool {
	if ct == "" {
		return true // some servers omit; trust and scan
	}
	// Strip parameters like "; charset=utf-8".
	if idx := strings.Index(ct, ";"); idx >= 0 {
		ct = ct[:idx]
	}
	ct = strings.TrimSpace(strings.ToLower(ct))
	for _, prefix := range allowedContentTypes {
		if strings.HasPrefix(ct, prefix) {
			return true
		}
	}
	return false
}

Create pkg/engine/sources/url_test.go:

package sources

import (
	"context"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

func drainURL(t *testing.T, src Source) ([]types.Chunk, error) {
	t.Helper()
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	out := make(chan types.Chunk, 256)
	errCh := make(chan error, 1)
	go func() { errCh <- src.Chunks(ctx, out); close(out) }()
	var got []types.Chunk
	for c := range out {
		got = append(got, c)
	}
	return got, <-errCh
}

func TestURLSource_Fetches(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		_, _ = w.Write([]byte("API_KEY=sk-live-xyz"))
	}))
	defer srv.Close()

	chunks, err := drainURL(t, NewURLSource(srv.URL))
	require.NoError(t, err)
	require.Len(t, chunks, 1)
	require.Equal(t, "url:"+srv.URL, chunks[0].Source)
	require.Equal(t, "API_KEY=sk-live-xyz", string(chunks[0].Data))
}

func TestURLSource_RejectsBinaryContentType(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "image/png")
		_, _ = w.Write([]byte{0x89, 0x50, 0x4e, 0x47})
	}))
	defer srv.Close()

	_, err := drainURL(t, NewURLSource(srv.URL))
	require.Error(t, err)
	require.Contains(t, err.Error(), "Content-Type")
}

func TestURLSource_RejectsNonHTTPScheme(t *testing.T) {
	_, err := drainURL(t, NewURLSource("file:///etc/passwd"))
	require.Error(t, err)
	require.Contains(t, err.Error(), "unsupported scheme")
}

func TestURLSource_Rejects500(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "boom", http.StatusInternalServerError)
	}))
	defer srv.Close()

	_, err := drainURL(t, NewURLSource(srv.URL))
	require.Error(t, err)
	require.Contains(t, err.Error(), "500")
}

func TestURLSource_RejectsOversizeBody(t *testing.T) {
	// Serve a body just over the cap. MaxURLContentLength is a const, so this test allocates the full ~50 MB.
	big := strings.Repeat("a", int(MaxURLContentLength)+10)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		_, _ = w.Write([]byte(big))
	}))
	defer srv.Close()

	_, err := drainURL(t, NewURLSource(srv.URL))
	require.Error(t, err)
}

func TestURLSource_FollowsRedirect(t *testing.T) {
	target := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		_, _ = w.Write([]byte("redirected body"))
	}))
	defer target.Close()

	redirector := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, target.URL, http.StatusMovedPermanently)
	}))
	defer redirector.Close()

	chunks, err := drainURL(t, NewURLSource(redirector.URL))
	require.NoError(t, err)
	require.NotEmpty(t, chunks)
	require.Contains(t, string(chunks[0].Data), "redirected body")
}
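
`TestURLSource_RejectsBinaryContentType` covers only `image/png`. A table of header values makes the allowlist boundary easier to see. The sketch below is self-contained and swaps in a different matching technique from the plan's `isAllowedContentType`: it parses the header with `mime.ParseMediaType` and matches non-text types exactly, rather than by prefix. `allowedMediaType` and `contentTypeCases` are hypothetical names, and rejecting malformed headers is an assumption of this sketch.

```go
package main

import (
	"mime"
	"strings"
)

// exactTypes lists the non-text media types from the plan's allowlist.
var exactTypes = map[string]bool{
	"application/json":       true,
	"application/javascript": true,
	"application/xml":        true,
	"application/x-yaml":     true,
	"application/yaml":       true,
}

// allowedMediaType is a hypothetical, stricter variant of the plan's filter.
// An empty header is accepted, mirroring the plan; a header that fails to
// parse is rejected (an assumption of this sketch).
func allowedMediaType(header string) bool {
	if header == "" {
		return true
	}
	mt, _, err := mime.ParseMediaType(header) // lowercases and strips parameters
	if err != nil {
		return false
	}
	return strings.HasPrefix(mt, "text/") || exactTypes[mt]
}

// contentTypeCases is an illustrative table a test could range over.
var contentTypeCases = []struct {
	header string
	want   bool
}{
	{"text/plain; charset=utf-8", true},
	{"TEXT/HTML", true},
	{"application/json", true},
	{"", true},
	{"image/png", false},
	{"application/json-seq", false}, // a prefix match would accept this one
}
```

The last row shows the difference in design: with `strings.HasPrefix` against `application/json`, a type such as `application/json-seq` is accepted, whereas exact matching rejects it.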

Create pkg/engine/sources/clipboard.go:

package sources

import (
	"context"
	"errors"
	"fmt"

	"github.com/atotto/clipboard"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

// ClipboardSource reads the current OS clipboard contents and emits them
// as a single chunk stream with Source="clipboard". Requires xclip/xsel/
// wl-clipboard on Linux, pbpaste on macOS, or native API on Windows.
type ClipboardSource struct {
	// Reader overrides the clipboard reader; when nil the real clipboard is used.
	// Tests inject a func returning a fixture.
	Reader    func() (string, error)
	ChunkSize int
}

// NewClipboardSource returns a ClipboardSource bound to the real OS clipboard.
// Reader is left nil so that Chunks can check clipboard.Unsupported before
// falling back to clipboard.ReadAll.
func NewClipboardSource() *ClipboardSource {
	return &ClipboardSource{ChunkSize: defaultChunkSize}
}

// Chunks reads the clipboard and emits its contents.
func (c *ClipboardSource) Chunks(ctx context.Context, out chan<- types.Chunk) error {
	if clipboard.Unsupported && c.Reader == nil {
		return errors.New("ClipboardSource: clipboard tooling unavailable (install xclip/xsel/wl-clipboard on Linux)")
	}
	reader := c.Reader
	if reader == nil {
		reader = clipboard.ReadAll
	}
	text, err := reader()
	if err != nil {
		return fmt.Errorf("ClipboardSource: read clipboard: %w", err)
	}
	if text == "" {
		return nil
	}
	return emitChunks(ctx, []byte(text), "clipboard", c.ChunkSize, out)
}

Create pkg/engine/sources/clipboard_test.go:

package sources

import (
	"context"
	"errors"
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

func TestClipboardSource_FixtureReader(t *testing.T) {
	src := &ClipboardSource{
		Reader:    func() (string, error) { return "sk-live-xxxxxx", nil },
		ChunkSize: defaultChunkSize,
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	out := make(chan types.Chunk, 4)
	errCh := make(chan error, 1)
	go func() { errCh <- src.Chunks(ctx, out); close(out) }()

	var got []types.Chunk
	for c := range out {
		got = append(got, c)
	}
	require.NoError(t, <-errCh)
	require.Len(t, got, 1)
	require.Equal(t, "clipboard", got[0].Source)
	require.Equal(t, "sk-live-xxxxxx", string(got[0].Data))
}

func TestClipboardSource_ReaderError(t *testing.T) {
	src := &ClipboardSource{
		Reader: func() (string, error) { return "", errors.New("no xclip installed") },
	}
	out := make(chan types.Chunk, 1)
	err := src.Chunks(context.Background(), out)
	require.Error(t, err)
	require.Contains(t, err.Error(), "clipboard")
}

func TestClipboardSource_EmptyClipboard(t *testing.T) {
	src := &ClipboardSource{
		Reader: func() (string, error) { return "", nil },
	}
	out := make(chan types.Chunk, 1)
	err := src.Chunks(context.Background(), out)
	require.NoError(t, err)
	require.Len(t, out, 0)
}

Do NOT modify cmd/scan.go in this plan. Do NOT create pkg/engine/sources/dir.go, git.go, or touch file.go; those are owned by plans 04-02 and 04-03.

`go test ./pkg/engine/sources/... -run 'TestStdinSource|TestURLSource|TestClipboardSource' -race -count=1`

<acceptance_criteria>
- go build ./pkg/engine/sources/... exits 0
- go test ./pkg/engine/sources/... -run 'TestStdinSource|TestURLSource|TestClipboardSource' -race passes all subtests
- grep -n "http.Client" pkg/engine/sources/url.go hits
- grep -n "LimitReader" pkg/engine/sources/url.go hits
- grep -n "clipboard.ReadAll" pkg/engine/sources/clipboard.go hits
- grep -n "\"stdin\"" pkg/engine/sources/stdin.go hits (source label)
- grep -n "\"url:\" + u.URL\\|\"url:\"+u.URL" pkg/engine/sources/url.go hits
</acceptance_criteria>

StdinSource, URLSource, and ClipboardSource all implement Source, enforce their respective safety limits (stdin read-to-EOF, url scheme/size/content-type whitelist, clipboard tooling check), and their tests pass under -race.

- `go test ./pkg/engine/sources/... -race -count=1` passes including new tests
- `go vet ./pkg/engine/sources/...` clean
- All grep acceptance checks hit

<success_criteria> Three new source adapters exist, each self-contained, each with test coverage, and none conflicting with file ownership of plans 04-02 (dir/file) or 04-03 (git). </success_criteria>

After completion, create `.planning/phases/04-input-sources/04-04-SUMMARY.md` listing the six files created, test names with pass status, and any platform-specific notes about clipboard tests on the executor's CI environment.