Commit Graph

5 Commits

Author SHA1 Message Date
salvacybersec
850c3ff8e9 feat(04-04): add StdinSource, URLSource, and ClipboardSource
- StdinSource reads from an injectable io.Reader (INPUT-03)
- URLSource fetches http/https with 30s timeout, 50MB cap, scheme whitelist, and Content-Type filter (INPUT-04)
- ClipboardSource wraps atotto/clipboard with graceful fallback for missing tooling (INPUT-05)
- emitByteChunks local helper mirrors file.go windowing to stay independent of sibling wave-1 plans
- Tests cover happy path, cancellation, redirects, oversize bodies, binary content types, scheme rejection, and clipboard error paths
2026-04-05 15:18:23 +03:00
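The URLSource limits in the commit above (scheme whitelist, 30s timeout, 50MB cap) could be sketched roughly as follows. This is an illustrative sketch using only the standard library, not the code from this commit: the names `checkScheme`, `readCapped`, and `fetch` are hypothetical, the cap is assumed to mean 50 MiB, and the Content-Type filter is omitted because the commit message does not say which types it accepts.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

const (
	fetchTimeout = 30 * time.Second // 30s timeout from the commit message
	maxBodyBytes = 50 << 20         // assumption: "50MB" read as 50 MiB
)

var errTooLarge = errors.New("response body exceeds size cap")

// checkScheme rejects anything that is not http or https.
func checkScheme(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	return nil
}

// readCapped reads at most limit bytes and fails if the stream is longer.
// Reading limit+1 bytes distinguishes "exactly at the cap" from "over it".
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

// fetch applies the scheme check, the timeout, and the size cap in turn.
func fetch(ctx context.Context, raw string) ([]byte, error) {
	if err := checkScheme(raw); err != nil {
		return nil, err
	}
	ctx, cancel := context.WithTimeout(ctx, fetchTimeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, raw, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return readCapped(resp.Body, maxBodyBytes)
}

func main() {
	// The ftp URL is rejected by the scheme check before any network I/O.
	_, err := fetch(context.Background(), "ftp://example.com/file")
	fmt.Println(err)
}
```

Passing the caller's context through lets a cancelled scan abort an in-flight download, which matches the cancellation case listed in the tests.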
salvacybersec
6f834c9c06 feat(04-02): implement DirSource with recursive walk, glob exclusion, and mmap
- Add DirSource with filepath.WalkDir recursive traversal
- Default exclusions for .git, node_modules, vendor, *.min.js, *.map
- Binary file detection via NUL byte sniff (first 512 bytes)
- mmap reads for files >= 10MB via golang.org/x/exp/mmap
- Deterministic sorted emission order for reproducible tests
- Refactor FileSource to share emitChunks/isBinary helpers and mmap large files
2026-04-05 15:18:10 +03:00
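The DirSource behaviour listed in the commit above (recursive walk, default exclusions, NUL-byte sniff over the first 512 bytes, sorted output) might look something like this. It is a minimal sketch under assumptions, not the commit's code: `collect`, `isExcludedFile`, and the exclusion tables are hypothetical names, and the mmap path for files of 10MB or more is left out.

```go
package main

import (
	"bytes"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// Default exclusions taken from the commit message.
var excludedDirs = map[string]bool{".git": true, "node_modules": true, "vendor": true}
var excludedGlobs = []string{"*.min.js", "*.map"}

// isExcludedFile reports whether a base name matches an excluded glob.
func isExcludedFile(name string) bool {
	for _, g := range excludedGlobs {
		if ok, _ := filepath.Match(g, name); ok {
			return true
		}
	}
	return false
}

// isBinary looks for a NUL byte in the first 512 bytes of data.
func isBinary(data []byte) bool {
	if len(data) > 512 {
		data = data[:512]
	}
	return bytes.IndexByte(data, 0) >= 0
}

// collect walks root recursively and returns regular files in sorted order,
// pruning excluded directories and skipping excluded file names.
func collect(root string) ([]string, error) {
	var paths []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if p != root && excludedDirs[d.Name()] {
				return filepath.SkipDir
			}
			return nil
		}
		if !d.Type().IsRegular() || isExcludedFile(d.Name()) {
			return nil
		}
		paths = append(paths, p)
		return nil
	})
	sort.Strings(paths)
	return paths, err
}

func main() {
	paths, err := collect(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil || isBinary(data) {
			continue // unreadable or binary: skip
		}
		fmt.Println(p)
	}
}
```

Returning `filepath.SkipDir` prunes an excluded directory without descending into it, which is cheaper than filtering its files one by one.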
salvacybersec
e48a7a489e feat(04-03): implement GitSource with full-history traversal
- Walks every commit across branches, tags, remote-tracking refs, and stash
- Deduplicates blob scans by OID (seenBlobs map) so identical content
  across commits/files is scanned exactly once
- Emits chunks with source format git:<short-sha>:<path>
- Honors --since filter via GitSource.Since (commit author date)
- Resolves annotated tag objects down to their commit hash
- Skips binary blobs via go-git IsBinary plus null-byte sniff
- 8 subtests cover history walk, dedup, modified-file, multi-branch,
  tag reachability, since filter, source format, missing repo
2026-04-05 15:18:05 +03:00
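The blob deduplication and source labelling described in the commit above can be illustrated without go-git. In this sketch `blobRef`, `dedupe`, and `sourceLabel` are hypothetical, and the 7-character short SHA is an assumption, since the commit message does not state the length used.

```go
package main

import "fmt"

// blobRef is a hypothetical record for one file version seen during the walk.
type blobRef struct {
	Commit string // full commit hash
	Path   string // file path within the tree
	OID    string // blob object ID (content hash)
}

// sourceLabel builds the git:<short-sha>:<path> label.
// Assumption: the short SHA is the first 7 characters.
func sourceLabel(commit, path string) string {
	short := commit
	if len(short) > 7 {
		short = short[:7]
	}
	return "git:" + short + ":" + path
}

// dedupe returns one label per distinct blob OID, keeping the first
// occurrence, so identical content is scanned only once.
func dedupe(refs []blobRef) []string {
	seenBlobs := make(map[string]struct{})
	var labels []string
	for _, r := range refs {
		if _, ok := seenBlobs[r.OID]; ok {
			continue
		}
		seenBlobs[r.OID] = struct{}{}
		labels = append(labels, sourceLabel(r.Commit, r.Path))
	}
	return labels
}

func main() {
	refs := []blobRef{
		{Commit: "aaaaaaaaaaaa", Path: "config.yml", OID: "blob1"},
		{Commit: "bbbbbbbbbbbb", Path: "config.yml", OID: "blob1"}, // unchanged file
		{Commit: "bbbbbbbbbbbb", Path: "main.go", OID: "blob2"},
	}
	for _, l := range dedupe(refs) {
		fmt.Println(l)
	}
}
```

Keying on the blob OID rather than the path means a file that is renamed or copied without changes is still scanned only once.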
salvacybersec
ce6298f304 test(04-02): add failing tests for DirSource recursive walk and mmap
2026-04-05 15:16:48 +03:00
salvacybersec
cea2e371cc feat(01-04): implement three-stage scanning pipeline with ants worker pool
- pkg/engine/sources/source.go: Source interface using pkg/types.Chunk
- pkg/engine/sources/file.go: FileSource with overlapping chunk reads
- pkg/engine/filter.go: KeywordFilter using Aho-Corasick pre-filter
- pkg/engine/detector.go: Detect with regex matching + Shannon entropy check
- pkg/engine/engine.go: Engine.Scan orchestrating 3-stage pipeline with ants pool
- pkg/engine/scanner_test.go: filled test stubs with pipeline integration tests
- testdata/samples: fixed anthropic key lengths to match {93,} regex pattern
2026-04-05 12:21:17 +03:00
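Two ideas named in the commit above, overlapping chunk reads and the Shannon entropy check, are sketched below. Both functions are illustrative assumptions: the names `chunkOverlap` and `shannonEntropy`, the chunk sizes, and computing entropy over bytes are not taken from the repository.

```go
package main

import (
	"fmt"
	"math"
)

// chunkOverlap splits data into windows of at most size bytes, where each
// window starts size-overlap bytes after the previous one, so a secret that
// straddles a boundary still appears whole in some window.
func chunkOverlap(data []byte, size, overlap int) [][]byte {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil // invalid parameters
	}
	var chunks [][]byte
	step := size - overlap
	for start := 0; start < len(data); start += step {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
		if end == len(data) {
			break
		}
	}
	return chunks
}

// shannonEntropy returns the entropy of s in bits per byte:
// H = -sum(p_i * log2(p_i)) over the byte frequencies p_i.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	h := 0.0
	n := float64(len(s))
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	for _, c := range chunkOverlap([]byte("abcdefghij"), 4, 2) {
		fmt.Printf("%s ", c)
	}
	fmt.Println()
	fmt.Println(shannonEntropy("aaaa"), shannonEntropy("abcd"))
}
```

An entropy threshold on a regex match helps reject low-randomness strings such as placeholders; the threshold used by the detector is not given in the commit message.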