docs(07-02): complete Gitleaks importer plan

salvacybersec
2026-04-05 23:56:12 +03:00
parent 75becce3dd
commit 5ce2d4945e


@@ -0,0 +1,86 @@
---
phase: 07-import-cicd
plan: 02
subsystem: importer
tags: [importer, gitleaks, json, csv, normalization]
requires:
- pkg/engine/finding.go
- pkg/importer/importer.go
provides:
- GitleaksImporter (gitleaks JSON)
- GitleaksCSVImporter (gitleaks CSV)
- normalizeGitleaksRuleID
- buildGitleaksFinding
affects:
- pkg/importer/
tech-stack:
  added: []
  patterns:
    - "Header-indexed CSV parsing for column-order resilience"
    - "Shared finding-builder helper across JSON/CSV paths"
    - "Suffix-stripping rule-ID normalization"
key-files:
  created:
    - pkg/importer/gitleaks.go
    - pkg/importer/gitleaks_test.go
    - pkg/importer/testdata/gitleaks-sample.json
    - pkg/importer/testdata/gitleaks-sample.csv
  modified: []
decisions:
- "RuleID normalization strips suffixes (-api-key, -access-token, -secret-key, -secret, -token, -key) in that order; unknown patterns are lowercased only"
- "CSV reader resolves columns by header name (not position) for forward-compat with Gitleaks column-order drift"
- "StartLine parse errors in CSV are swallowed (LineNumber=0) — findings remain ingestible"
- "Confidence fixed to 'medium' because Gitleaks does not verify keys"
metrics:
duration: "~4m"
completed: 2026-04-05
tasks: 1
files: 4
tests: 8
requirements: [IMP-02]
---
# Phase 7 Plan 02: Gitleaks Importer Summary
One-liner: Gitleaks JSON and CSV output is ingested into normalized `engine.Finding` records, with provider-name normalization (e.g., `openai-api-key` -> `openai`) and header-indexed CSV parsing for column-order resilience.
## What was built
- `GitleaksImporter` (`Name() = "gitleaks"`) decodes a JSON array of Gitleaks finding records into `[]engine.Finding`.
- `GitleaksCSVImporter` (`Name() = "gitleaks-csv"`) reads CSV with a mandatory header row and resolves columns by name, so column-order drift does not break ingestion.
- `normalizeGitleaksRuleID` trims common Gitleaks suffixes (`-api-key`, `-access-token`, `-secret-key`, `-secret`, `-token`, `-key`) after lowercasing; unknown patterns pass through lowercased (e.g., `github-pat` -> `github-pat`).
- `buildGitleaksFinding` is a private helper shared by the JSON and CSV paths so the two stay in lockstep. It sets `SourceType="import:gitleaks"`, `Confidence="medium"`, `VerifyStatus="unverified"`, and `Verified=false`, and falls back from `File` to `SymlinkFile` when `File` is blank.
- Fixtures with 3 records each (OpenAI / AWS / generic) in matching JSON and CSV shapes.
## Tests
All 8 tests under `go test ./pkg/importer/... -run Gitleaks -v` pass:
- `TestGitleaksImporter_Name` — Name() assertions for both importers
- `TestGitleaksImporter_JSON` — 3-record fixture, provider names, line numbers, SourceType, Confidence, masked key
- `TestGitleaksImporter_CSV` — same assertions against CSV fixture
- `TestGitleaksImporter_NormalizeRuleID` — table: openai/aws/anthropic/generic/github-pat + case variants
- `TestGitleaksImporter_EmptyArray` — `[]` returns empty slice, nil error
- `TestGitleaksImporter_EmptyCSV` — header-only returns empty slice, nil error
- `TestGitleaksImporter_InvalidJSON` — returns wrapped error
- `TestGitleaksImporter_SymlinkFallback` — uses SymlinkFile when File is blank
The full package run, `go test ./pkg/importer/...`, also passes; the trufflehog and dedup tests from the parallel 07-01 plan continue to pass alongside.
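
The rule that `TestGitleaksImporter_NormalizeRuleID` covers can be sketched as a small table-driven check. This is a sketch of the suffix-stripping technique as the summary describes it, not the committed implementation. The function name `normalizeRuleID` is illustrative, stripping only the first matching suffix is an assumption, and apart from `openai-api-key` and `github-pat` the input rule IDs are assumed examples rather than the actual test table.

```go
package main

import (
	"fmt"
	"strings"
)

// Suffixes in the order the summary lists them. Longer suffixes come before
// their shorter tails, so "-api-key" is tried before "-key".
var ruleSuffixes = []string{
	"-api-key", "-access-token", "-secret-key", "-secret", "-token", "-key",
}

// normalizeRuleID lowercases the rule ID and strips the first matching suffix.
// IDs that match no suffix pass through lowercased.
func normalizeRuleID(id string) string {
	id = strings.ToLower(id)
	for _, suffix := range ruleSuffixes {
		if strings.HasSuffix(id, suffix) {
			return strings.TrimSuffix(id, suffix)
		}
	}
	return id
}

func main() {
	// Illustrative cases; only the first and last pairs come from the summary.
	cases := []struct{ in, want string }{
		{"openai-api-key", "openai"},
		{"OpenAI-API-Key", "openai"},
		{"aws-access-token", "aws"},
		{"anthropic-api-key", "anthropic"},
		{"generic-api-key", "generic"},
		{"github-pat", "github-pat"},
	}
	for _, c := range cases {
		got := normalizeRuleID(c.in)
		status := "ok"
		if got != c.want {
			status = "MISMATCH"
		}
		fmt.Printf("%-20s -> %-12s %s\n", c.in, got, status)
	}
}
```

Order matters here: if `-key` were checked first, `openai-api-key` would normalize to `openai-api` instead of `openai`.
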
## Commits
- `bd8eb9b` feat(07-02): add Gitleaks JSON + CSV importers
## Deviations from Plan
None - plan executed exactly as written. The Importer interface file (`pkg/importer/importer.go`) was already present from the parallel wave-1 Plan 07-01, so this executor did not need to create it (plan explicitly allowed for either case).
## Self-Check: PASSED
- FOUND: pkg/importer/gitleaks.go
- FOUND: pkg/importer/gitleaks_test.go
- FOUND: pkg/importer/testdata/gitleaks-sample.json
- FOUND: pkg/importer/testdata/gitleaks-sample.csv
- FOUND commit: bd8eb9b
- Tests: 8/8 passing
- Build: `go build ./pkg/importer/...` clean