Used local helper `emitGitChunks` instead of shared `emitChunks` because plan 04-02 (dir.go) has not landed yet; slated for consolidation once 04-02 ships.
Also walk `refs/remotes/*` in seed collection so cloned repos without local branches still get full coverage.
Skip symbolic references (HEAD) via a `ref.Type() == plumbing.HashReference` filter to avoid double-walking what branch refs already cover.
duration: ~6m
completed: 2026-04-05
tasks: 1
tests: 8
INPUT-02
Phase 4 Plan 3: GitSource Summary
GitSource uses go-git/v5 to walk every commit reachable from any branch, tag, remote-tracking ref, or the stash. It deduplicates blob scans by OID and emits chunks tagged `git:<short-sha>:<path>`, which lets KeyHunter surface leaked keys that exist only in history.
What Shipped
`pkg/engine/sources/git.go` (~195 lines):
- `GitSource` struct with `RepoPath`, `Since time.Time`, and `ChunkSize` fields.
- `NewGitSource(path)` factory that sets the default chunk size from the sources-package constant.
- `Chunks(ctx, out)` implements the `Source` interface:
  - Opens the repo with `git.PlainOpen`. An empty path or a missing directory returns an error.
  - `collectSeedCommits` enumerates all hash references under `refs/heads`, `refs/tags`, `refs/remotes`, and `refs/stash`. Annotated tags are resolved to their underlying commit.
  - For each seed, `repo.Log(&git.LogOptions{From: seed})` walks the ancestry. A `seenCommits` map prevents re-walking history shared across refs.
  - For each commit, the walk short-circuits on the `Since` cutoff (via `c.Author.When.Before`) and then calls `emitCommitBlobs`.
- `emitCommitBlobs` streams `tree.Files()` and skips three kinds of blob: OIDs already seen (`seenBlobs` map), blobs that go-git's `IsBinary` flags, and blobs with a null byte in their first 512 bytes. Text blobs are piped through `emitGitChunks` with the source string `git:<short-sha>:<path>`.
- `emitGitChunks` mirrors the overlap-chunking semantics of `file.go` (default size 4096, overlap 256). Historic blobs are therefore scanned with the same boundary guarantees as on-disk files.
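Here is a minimal sketch of overlap chunking. It assumes windows of `size` bytes, each starting `size - overlap` bytes after the previous one. `Chunk` and `overlapChunks` are hypothetical names, and the exact boundary rules in `file.go` may differ from this model.

```go
package main

// Chunk is a hypothetical stand-in for the engine's chunk type.
type Chunk struct {
	Source string // e.g. "git:<short-sha>:<path>"
	Offset int    // byte offset of this window within the blob
	Data   []byte // slice of the original blob (not copied)
}

// Defaults as stated for FileSource/GitSource.
const (
	defaultChunkSize = 4096
	chunkOverlap     = 256
)

// overlapChunks splits data into windows of at most size bytes. Each window
// after the first starts size-overlap bytes after the previous one, so
// consecutive windows share overlap bytes.
func overlapChunks(source string, data []byte, size, overlap int) []Chunk {
	if size <= 0 {
		size = defaultChunkSize
	}
	if overlap < 0 || overlap >= size {
		overlap = 0 // guard against a non-advancing step
	}
	step := size - overlap
	var out []Chunk
	for start := 0; start < len(data); start += step {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		out = append(out, Chunk{Source: source, Offset: start, Data: data[start:end]})
		if end == len(data) {
			break // the final window already reaches the end of the blob
		}
	}
	return out
}
```

Under this model, a secret that straddles a window boundary still lands whole in at least one window, as long as it is no longer than the overlap (256 with the defaults).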
Tests
- A 3-commit repo yields chunks whose sources all match `^git:[0-9a-f]{7}:.+$`.
- `TestGitSource_BlobDeduplication`: two files with identical content → one blob scanned, not two.
- `TestGitSource_ModifiedFileKeepsBothVersions`: editing `a.txt` preserves both the old and new blobs in the output.
- `TestGitSource_MultiBranch`: checks out a feature branch, adds a file, returns to the base branch, and adds another file → the unique blobs from both branches appear.
- `TestGitSource_TagReachesOldCommit`: a lightweight tag on an early commit keeps that commit reachable after HEAD moves on.
- `TestGitSource_SinceFilterExcludesAll`: `Since` = now + 1h emits zero chunks.
- `TestGitSource_SourceFormat`: the nested path `path/to/file.txt` round-trips in the source field.
- `TestGitSource_MissingRepo`: a non-existent path returns an error rather than panicking.
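One way to check the source-format tests is to parse the label back into its parts. `sourceRE` reuses the pattern from the 3-commit test. `parseGitSource` is a hypothetical helper and does not appear in the package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sourceRE is the pattern every GitSource chunk source is expected to match.
var sourceRE = regexp.MustCompile(`^git:[0-9a-f]{7}:.+$`)

// parseGitSource splits a git:<short-sha>:<path> label into the short SHA and path.
// SplitN with n=3 keeps any ':' that appears inside the path itself.
func parseGitSource(s string) (sha, path string, err error) {
	if !sourceRE.MatchString(s) {
		return "", "", fmt.Errorf("malformed git source %q", s)
	}
	parts := strings.SplitN(s, ":", 3)
	return parts[1], parts[2], nil
}
```

Limiting the split to three fields is the key design choice. Paths can legally contain colons, so a plain `strings.Split` would truncate them.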
Verification
go vet ./pkg/engine/sources/... # clean
go test ./pkg/engine/sources/... -run TestGitSource -race -count=1 -v # 8/8 PASS
go build ./pkg/engine/sources/... # clean
Grep acceptance checks from the plan (all hit):
- `git.PlainOpen` → git.go:47
- `seenBlobs` → git.go:62, 143, 146
- `fmt.Sprintf("git:%s:%s"` → git.go:172
- `g.Since` → git.go:81
Deviations from Plan
Auto-fixed Issues
1. [Rule 3 - Blocking] emitChunks helper not yet available
Found during: Task 1
Issue: Plan referenced emitChunks from pkg/engine/sources/dir.go, which is produced by plan 04-02 (not yet executed in this wave). Compilation would have failed.
Fix: Added a local emitGitChunks mirroring FileSource's overlap-chunk logic, plus a local gitBinarySniffSize constant. Documented as a temporary shim slated for consolidation when 04-02 lands.
2. [Rule 2 - Critical functionality] Walk remote-tracking refs as seeds
Found during: Task 1 (review of collectSeedCommits)
Issue: A freshly cloned repo often has zero refs/heads/* entries locally (only refs/remotes/origin/*). Restricting seeds to branches+tags+stash would produce an empty scan in that common case.
Fix: Also include name.IsRemote() refs in the seed set. Filter out symbolic refs (HEAD) via ref.Type() == plumbing.HashReference to avoid duplicate walks.
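The seed selection from this fix can be modeled with plain strings. `Ref`, `isSeedRef`, and `seedTargets` are hypothetical stand-ins. In go-git, the same checks would use `ref.Type() == plumbing.HashReference` together with the `IsBranch`, `IsTag`, and `IsRemote` methods on `plumbing.ReferenceName`. This sketch omits annotated-tag resolution.

```go
package main

import "strings"

// Ref is a hypothetical stand-in for go-git's *plumbing.Reference.
type Ref struct {
	Name     string // e.g. "refs/remotes/origin/main"
	Symbolic bool   // true where go-git would report a symbolic reference (e.g. HEAD)
	Target   string // hash the reference points at, for hash references
}

// isSeedRef keeps hash references under heads, tags, remotes, or the stash ref.
func isSeedRef(r Ref) bool {
	if r.Symbolic {
		return false // HEAD-style refs point at a branch ref that is already a seed
	}
	return strings.HasPrefix(r.Name, "refs/heads/") ||
		strings.HasPrefix(r.Name, "refs/tags/") ||
		strings.HasPrefix(r.Name, "refs/remotes/") ||
		r.Name == "refs/stash"
}

// seedTargets returns the target hash of every seed ref, in input order.
func seedTargets(refs []Ref) []string {
	var out []string
	for _, r := range refs {
		if isSeedRef(r) {
			out = append(out, r.Target)
		}
	}
	return out
}
```

Repeated targets, such as a tag and a remote branch on the same commit, are left in the list. The walk's `seenCommits` map makes the second seed cheap.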
Deferred
- Shared `emitChunks` consolidation: to be handled in 04-02 plus a follow-up cleanup.
- Parallel blob scanning via an ants pool (noted in 04-CONTEXT.md as a performance idea): deferred, because the current single-goroutine walk is already correct and respects context cancellation.
go-git Version
Resolved by plan 04-01: github.com/go-git/go-git/v5 v5.17.2 (promoted from indirect to direct in go.mod by this plan).
Commits
e48a7a4 — feat(04-03): implement GitSource with full-history traversal