# KeyHunter CI/CD Integration

KeyHunter is designed to plug into developer workflows at three points: as a local
**pre-commit hook** that blocks leaks before they land in git history, as a **GitHub
Actions** step that uploads SARIF findings into the repository's Code Scanning tab,
and as an **import sink** that consolidates results from other scanners (TruffleHog,
Gitleaks) into one normalized database. This guide shows how to wire up each one.

## Pre-commit Hook

The pre-commit hook scans only files that are staged for commit
(`git diff --cached --name-only --diff-filter=ACMR`), so it adds minimal latency to
local development while still catching fresh leaks before they leave the workstation.

### Install

```bash
keyhunter hook install
```

This writes an executable shell script to `.git/hooks/pre-commit` in the current
repository. The script shells out to the `keyhunter` binary on your `PATH` and exits
non-zero if any findings are produced — which aborts the commit.
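This guide does not reproduce the generated script. The Bash sketch below roughly illustrates the behaviour described above: it collects the staged files, hands them to `keyhunter`, and propagates a non-zero exit. It is not the script KeyHunter writes. It assumes `keyhunter scan` accepts several file paths in one call. `run_staged_scan` and the `KEYHUNTER_BIN` override are hypothetical names, added so the logic can be tested with a stub.

```bash
# Hypothetical sketch of a staged-files pre-commit check. This is NOT the script
# that `keyhunter hook install` writes.
# Assumptions: `keyhunter scan` accepts several file paths in one call, and
# KEYHUNTER_BIN is a made-up override used only to swap in a test stub.
run_staged_scan() {
  local bin="${KEYHUNTER_BIN:-keyhunter}"
  local -a files=()
  local f status

  # Staged Added/Copied/Modified/Renamed paths, NUL-separated so that names
  # containing spaces or newlines survive intact.
  while IFS= read -r -d '' f; do
    files+=("$f")
  done < <(git diff --cached --name-only --diff-filter=ACMR -z)

  # Nothing to scan (e.g. a deletion-only commit): let the commit through.
  if [ "${#files[@]}" -eq 0 ]; then
    return 0
  fi

  # A non-zero status from the scanner makes the hook fail, aborting the commit.
  status=0
  "$bin" scan "${files[@]}" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "pre-commit: scan exited $status; commit aborted (git commit --no-verify bypasses)" >&2
  fi
  return "$status"
}
```

Saved as `.git/hooks/pre-commit`, such a file would end with `run_staged_scan; exit $?`. One caveat of this design: it scans the working-tree copy of each staged file. That copy can differ from the staged content when a file was only partially staged with `git add -p`.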

The command refuses to overwrite an existing `pre-commit` hook unless you pass
`--force`:

```bash
keyhunter hook install --force
```

With `--force`, the existing hook is backed up to
`.git/hooks/pre-commit.bak.<unix-timestamp>` before the new one is written. You can
restore it manually at any time.

### Bypass a Single Commit

If you need to commit in an emergency and want to skip the hook for one commit only,
use git's built-in bypass flag:

```bash
git commit --no-verify -m "hotfix: bypass scan"
```

No KeyHunter state is changed — the next commit will be scanned normally.

### Uninstall

```bash
keyhunter hook uninstall
```

This removes `.git/hooks/pre-commit`. Any `.bak.<timestamp>` files created by
`--force` are left in place so you can restore previous hooks manually.
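Restoring a backup is an ordinary file copy. The sketch below picks the newest `pre-commit.bak.<unix-timestamp>` by numeric comparison and copies it back. A lexical sort would get this wrong: it ranks `pre-commit.bak.900` above `pre-commit.bak.1000`. The sketch assumes the hooks live in `.git/hooks`, and the function name is hypothetical.

```bash
# Sketch: restore the most recent pre-commit backup left by `hook install --force`.
# Assumes backups are named pre-commit.bak.<unix-timestamp> in the hooks directory.
restore_latest_hook_backup() {
  local hooks_dir="${1:-.git/hooks}"
  local latest="" newest_ts=-1 f ts

  for f in "$hooks_dir"/pre-commit.bak.*; do
    [ -e "$f" ] || continue              # the glob matched nothing
    ts="${f##*.bak.}"                    # text after the last ".bak."
    case "$ts" in
      ''|*[!0-9]*) continue ;;           # skip non-numeric suffixes
    esac
    if [ "$ts" -gt "$newest_ts" ]; then  # numeric, not lexical, comparison
      newest_ts="$ts"
      latest="$f"
    fi
  done

  if [ -z "$latest" ]; then
    echo "no pre-commit backup found in $hooks_dir" >&2
    return 1
  fi

  cp -p "$latest" "$hooks_dir/pre-commit"   # -p keeps the executable bit
  echo "restored $latest"
}
```

Run it after `keyhunter hook uninstall`. If your hooks live elsewhere (for example via `core.hooksPath`), pass that directory as the first argument. The function overwrites any `pre-commit` file that is currently present.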

## GitHub Actions (SARIF upload to Code Scanning)

KeyHunter emits SARIF 2.1.0 output (`--output sarif`), which GitHub's Code Scanning
service ingests directly. Once uploaded, findings appear in the repository's
**Security** tab, can be triaged, and can block pull requests via branch protection
rules.

Save the following as `.github/workflows/keyhunter.yml`:

```yaml
name: KeyHunter
on:
  push:
    branches: [main]
  pull_request:
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install KeyHunter
        run: |
          curl -fsSL https://github.com/salvacybersec/keyhunter/releases/latest/download/keyhunter_linux_amd64.tar.gz | tar -xz
          sudo mv keyhunter /usr/local/bin/
      - name: Scan repository
        run: keyhunter scan . --output sarif > keyhunter.sarif
        continue-on-error: true
      - name: Upload SARIF to GitHub Code Scanning
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: keyhunter.sarif
          category: keyhunter
```

### Why `continue-on-error: true`?

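KeyHunter exits with code `1` when it finds keys (see **Exit Codes** below). Without
`continue-on-error: true`, GitHub Actions would mark the scan step as failed and skip
every later step, including the upload, so the findings would never reach the Security
tab. If the scan step may "fail" without failing the job, the SARIF file still gets
uploaded. You then get the annotated findings in Code Scanning **and** the option to
enforce blocking through branch protection rules on the Code Scanning check itself.

Be aware that `continue-on-error` applies to every non-zero exit, including `2`
(runtime error). In that case check the scan step's log. The upload step will usually
fail as well, because the redirected `keyhunter.sarif` will be empty or incomplete.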

### Why `security-events: write`?

GitHub requires the `security-events: write` permission for
`github/codeql-action/upload-sarif@v3` to post results into Code Scanning. As soon as
a job declares a `permissions` block, every scope it does not list is set to `none`.
`security-events: write` therefore has to appear next to `contents: read`, which
`actions/checkout` needs to clone the repository.

Separately, `fetch-depth: 0` makes `actions/checkout` fetch the full history instead of
its default single-commit shallow clone, so findings can be attributed to specific
commits.
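You may prefer to fail fast rather than scan a truncated history, for example if someone later removes `fetch-depth: 0`. A guard step can check for a shallow clone before KeyHunter runs. This sketch relies on `git rev-parse --is-shallow-repository` (available since Git 2.15), and the function name is hypothetical.

```bash
# Sketch: fail early if the checkout is shallow (e.g. fetch-depth was not 0),
# since attributing findings to specific commits needs the full history.
require_full_history() {
  local shallow
  shallow=$(git rev-parse --is-shallow-repository) || return 2
  if [ "$shallow" = "true" ]; then
    echo "shallow clone detected; set fetch-depth: 0 on actions/checkout" >&2
    return 1
  fi
}
```

In the workflow it would run as a `run:` step between the checkout and the scan.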

## Importing External Scanner Output

KeyHunter's `import` subcommand consolidates findings from other scanners into its
SQLite database. This is useful when you run multiple tools in CI and want a single
place to triage, verify, and track leaks over time.

### TruffleHog (JSON)

```bash
trufflehog filesystem . --json > trufflehog.json
keyhunter import --format=trufflehog trufflehog.json
```

### Gitleaks (JSON)

```bash
gitleaks detect -f json -r gitleaks.json
keyhunter import --format=gitleaks gitleaks.json
```

### Gitleaks (CSV)

```bash
gitleaks detect -f csv -r gitleaks.csv
keyhunter import --format=gitleaks-csv gitleaks.csv
```
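In CI these commands usually run in one step, and their exit codes need care. Gitleaks exits `1` by default when it finds leaks. TruffleHog exits `0` unless run with `--fail`. `keyhunter import` reports findings with `1` (see **Exit Codes**). The following sketch treats `1` as "findings, keep going" and stops on any other failure. It assumes all three binaries are on `PATH`, and the helper names and report file names are illustrative.

```bash
# Sketch: run TruffleHog and Gitleaks, then import both reports into KeyHunter.
# Helper names and report file names are illustrative, not part of any tool.

# Run a command, treating exit code 1 ("findings present") as success while
# letting any other failure (e.g. KeyHunter's 2 = runtime error) propagate.
run_allowing_findings() {
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ] && [ "$rc" -ne 1 ]; then
    echo "error: '$*' exited with $rc" >&2
    return "$rc"
  fi
  return 0
}

consolidate_findings() {
  # TruffleHog exits 0 even when it finds secrets (unless --fail is given),
  # so a non-zero code here is treated as a real error.
  trufflehog filesystem . --json > trufflehog.json || return
  # Gitleaks exits 1 by default when leaks are found.
  run_allowing_findings gitleaks detect -f json -r gitleaks.json || return
  run_allowing_findings keyhunter import --format=trufflehog trufflehog.json || return
  run_allowing_findings keyhunter import --format=gitleaks gitleaks.json
}
```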

### Idempotency

Every imported finding is hashed on `(provider, masked_key, source)` before insert.
Re-running the same import — for example, because a later CI step needs to consume
the results — will not create duplicate rows. The command prints a summary of the
form `Imported N findings (M new, K duplicates)` so you can confirm dedup worked.
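Because the summary has a fixed shape, a later CI step can check that a repeated import really added nothing. The sketch below assumes the line is printed in the form above, and it also accepts singular "finding"/"duplicate" in case the wording varies. The function name is hypothetical. This guide does not say whether the summary goes to stdout or stderr, so capture both.

```bash
# Sketch: print M from a KeyHunter import summary of the form
#   Imported N findings (M new, K duplicates)
# Returns 1 if no such line is found. Singular "finding"/"duplicate" is accepted too.
new_findings_count() {
  local m
  m=$(printf '%s\n' "$1" |
    sed -n 's/.*Imported [0-9][0-9]* findings* (\([0-9][0-9]*\) new, [0-9][0-9]* duplicates*).*/\1/p' |
    tail -n 1)
  [ -n "$m" ] || return 1
  printf '%s\n' "$m"
}
```

For example, capture the second run with `out=$(keyhunter import --format=gitleaks gitleaks.json 2>&1) || [ $? -eq 1 ]`. Then require `[ "$(new_findings_count "$out")" = 0 ]`.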

## Exit Codes

Both `keyhunter scan` and `keyhunter import` follow the same exit-code convention so
they compose predictably with CI runners and shell scripts:

| Code | Meaning                                    | Typical CI usage                              |
| ---- | ------------------------------------------ | --------------------------------------------- |
| `0`  | Clean run — no findings                    | Step passes, pipeline continues               |
| `1`  | Findings present                           | Mark the job as failed / upload SARIF anyway  |
| `2`  | Runtime error (bad flags, I/O, parse fail) | Pipeline should stop; investigate logs        |

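For GitHub Actions SARIF uploads, use `continue-on-error: true` on the scan step so
that exit code `1` still lets the upload step run. For simple gating (e.g. a Makefile
target), note that `keyhunter scan . && echo clean || echo findings` only separates
success from failure: exit codes `1` and `2` both reach `echo findings`. To tell all
three states apart, inspect the exit status itself, for example with a `case` on `$?`.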
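Here is a minimal sketch of such a three-way check for a CI shell step or Makefile recipe. It relies only on the exit-code table above, and the function name is hypothetical. It returns KeyHunter's own exit status, so the caller can still fail the job.

```bash
# Sketch: map KeyHunter's documented exit codes (0 clean, 1 findings, 2 error)
# to distinct messages. Any other code is reported as unexpected.
keyhunter_gate() {
  local rc=0
  keyhunter scan "${1:-.}" || rc=$?
  case "$rc" in
    0) echo "clean" ;;
    1) echo "findings present" >&2 ;;
    2) echo "keyhunter runtime error; check the logs" >&2 ;;
    *) echo "unexpected exit code $rc" >&2 ;;
  esac
  return "$rc"
}
```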