strix/strix/skills/custom/source_aware_sast.md
2026-03-31 16:44:48 -04:00

name: source-aware-sast
description: Practical source-aware SAST and AST playbook for semgrep, ast-grep, gitleaks, and trivy fs

Source-Aware SAST Playbook

Use this skill for source-heavy analysis where static and structural signals should guide dynamic testing.

Fast Start

Run tools from repo root and store outputs in a dedicated artifact directory:

mkdir -p /workspace/.strix-source-aware

Before scanning, check shared wiki memory:

1) list_notes(category="wiki")
2) get_note(note_id=...) for `wiki:overview` first, then `wiki:security`
3) Reuse matching repo wiki notes if present
4) create_note(category="wiki") only if missing (with tags `wiki:overview` / `wiki:security`)

After every major source-analysis batch, update wiki:security with update_note so other agents can reuse your latest map.

Run this baseline once per repository before deep narrowing:

ART=/workspace/.strix-source-aware
mkdir -p "$ART"

semgrep scan --config p/default --config p/golang --config p/secrets \
  --metrics=off --json --output "$ART/semgrep.json" .
# Build deterministic AST targets from semgrep scope (no hardcoded path guessing)
python3 - <<'PY'
import json
from pathlib import Path

art = Path("/workspace/.strix-source-aware")
semgrep_json = art / "semgrep.json"
targets_file = art / "sg-targets.txt"

try:
    data = json.loads(semgrep_json.read_text(encoding="utf-8"))
except Exception:
    targets_file.write_text("", encoding="utf-8")
    raise

scanned = data.get("paths", {}).get("scanned") or []
if not scanned:
    scanned = sorted(
        {
            r.get("path")
            for r in data.get("results", [])
            if isinstance(r, dict) and isinstance(r.get("path"), str) and r.get("path")
        }
    )

bounded = scanned[:4000]
targets_file.write_text("".join(f"{p}\n" for p in bounded), encoding="utf-8")
print(f"sg-targets: {len(bounded)}")
PY
xargs -r -n 200 sg run --pattern '$F($$$ARGS)' --json=stream < "$ART/sg-targets.txt" \
  > "$ART/ast-grep.json" 2> "$ART/ast-grep.log" || true
gitleaks detect --source . --report-format json --report-path "$ART/gitleaks.json" || true
trufflehog filesystem --no-update --json --no-verification . > "$ART/trufflehog.json" || true
# Keep trivy focused on vuln/misconfig (secrets already covered above) and increase timeout for large repos
trivy fs --scanners vuln,misconfig --timeout 30m --offline-scan \
  --format json --output "$ART/trivy-fs.json" . || true

If one tool is skipped or fails, record that in wiki:security along with the reason.
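
One way to decide what to record is to check each artifact after the baseline finishes. The sketch below is a hypothetical helper, not part of the skill: the function name `tool_status` and the status labels are assumptions, and the file names simply mirror the baseline commands above.

```python
import json
from pathlib import Path

# Artifact names mirror the baseline commands; adjust if yours differ.
ARTIFACTS = {
    "semgrep": "semgrep.json",
    "ast-grep": "ast-grep.json",
    "gitleaks": "gitleaks.json",
    "trufflehog": "trufflehog.json",
    "trivy": "trivy-fs.json",
}

# Tools whose output is one JSON object per line rather than a single document.
LINE_DELIMITED = {"ast-grep", "trufflehog"}


def tool_status(art_dir):
    """Return {tool: 'ok' | 'missing' | 'empty' | 'invalid-json'} for each artifact."""
    status = {}
    for tool, name in ARTIFACTS.items():
        path = Path(art_dir) / name
        if not path.is_file():
            status[tool] = "missing"
            continue
        text = path.read_text(encoding="utf-8", errors="replace").strip()
        if not text:
            status[tool] = "empty"
            continue
        try:
            if tool in LINE_DELIMITED:
                # Validate every non-blank line as its own JSON document.
                for line in text.splitlines():
                    if line.strip():
                        json.loads(line)
            else:
                json.loads(text)
            status[tool] = "ok"
        except json.JSONDecodeError:
            status[tool] = "invalid-json"
    return status
```

Calling `tool_status("/workspace/.strix-source-aware")` gives a small map to paste into wiki:security. An "empty" result for a line-delimited tool may simply mean no findings, so check the tool's log or exit status before recording it as a failure.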

Semgrep First Pass

Use Semgrep as the default static triage pass:

# Preferred deterministic profile set (works with --metrics=off)
semgrep scan --config p/default --config p/golang --config p/secrets \
  --metrics=off --json --output /workspace/.strix-source-aware/semgrep.json .

# --config auto needs metrics enabled to fetch rules, so do not combine it with --metrics=off
semgrep scan --config auto --json --output /workspace/.strix-source-aware/semgrep-auto.json .

If diff scope is active, restrict to changed files first, then expand only when needed.
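
A minimal sketch of the changed-files-first approach, assuming the diff scope can be expressed as a git base ref. The helper names and the base ref are hypothetical; the semgrep flags are the ones used in the profile above.

```python
import subprocess


def changed_files(base_ref, repo="."):
    """List files added, copied, modified, or renamed relative to base_ref."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", "--diff-filter=ACMR", base_ref],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def semgrep_diff_command(files, output):
    """Build a semgrep invocation limited to the given files; None if nothing changed."""
    if not files:
        return None
    return [
        "semgrep", "scan",
        "--config", "p/default", "--config", "p/golang", "--config", "p/secrets",
        "--metrics=off", "--json", "--output", output,
        *files,  # explicit targets replace the repo-wide "."
    ]
```

For example, `semgrep_diff_command(changed_files("origin/main"), "/workspace/.strix-source-aware/semgrep-diff.json")` yields an argument list for `subprocess.run`. Both the ref `origin/main` and the output file name are placeholders. If the changed-file pass looks clean but the change touches shared helpers, widen the target list to the importing packages before falling back to a full scan.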

AST-Grep Structural Mapping

Use sg for structure-aware code hunting:

# Ruleless structural pass over deterministic target list (no sgconfig.yml required)
xargs -r -n 200 sg run --pattern '$F($$$ARGS)' --json=stream \
  < /workspace/.strix-source-aware/sg-targets.txt \
  > /workspace/.strix-source-aware/ast-grep.json 2> /workspace/.strix-source-aware/ast-grep.log || true

Target high-value patterns such as:

  • missing auth checks near route handlers
  • dynamic command/query construction
  • unsafe deserialization or template execution paths
  • file and path operations influenced by user input

Tree-Sitter Assisted Repo Mapping

Use tree-sitter CLI for syntax-aware parsing when grep-level mapping is noisy:

tree-sitter parse -q <file>

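Use outputs to improve route/symbol/sink maps for subsequent targeted scans. Note that -q suppresses the syntax tree and reports only parse problems; drop -q when you need the tree itself.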

Secret and Supply Chain Coverage

Detect hardcoded credentials:

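# gitleaks exits non-zero when it finds leaks, so keep the pipeline going with || true
gitleaks detect --source . --report-format json --report-path /workspace/.strix-source-aware/gitleaks.json || true
trufflehog filesystem --no-update --json --no-verification . > /workspace/.strix-source-aware/trufflehog.json || true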
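
The two secret scanners overlap, so it can help to merge their reports by file before triage. This is a sketch under assumptions: the field names (`File` and `RuleID` for gitleaks, `DetectorName` and `SourceMetadata.Data.Filesystem.file` for trufflehog) reflect the report shapes as commonly emitted and should be verified against the installed versions. The function name is hypothetical.

```python
import json
import posixpath
from collections import defaultdict


def secrets_by_file(gitleaks_text, trufflehog_text):
    """Map file path -> sorted 'tool:rule' labels, without retaining secret values."""
    hits = defaultdict(set)

    # gitleaks writes a single JSON array of findings.
    for finding in json.loads(gitleaks_text or "[]") or []:
        path = posixpath.normpath(finding.get("File", "?"))
        hits[path].add(f"gitleaks:{finding.get('RuleID', '?')}")

    # trufflehog writes one JSON object per line.
    for line in trufflehog_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if not isinstance(record, dict):
            continue
        data = (record.get("SourceMetadata") or {}).get("Data") or {}
        fs = data.get("Filesystem") or {}
        path = posixpath.normpath(fs.get("file", "?"))
        hits[path].add(f"trufflehog:{record.get('DetectorName', '?')}")

    return {path: sorted(labels) for path, labels in hits.items()}
```

Files flagged by both tools are reasonable first candidates for manual review. The helper deliberately drops the matched secret text so the summary can be copied into shared notes.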

Run repository-wide dependency and config checks:

trivy fs --scanners vuln,misconfig --timeout 30m --offline-scan \
  --format json --output /workspace/.strix-source-aware/trivy-fs.json . || true
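
To turn the trivy report into something that fits in a wiki note, a severity tally is often enough. The sketch below assumes the usual report layout (`Results[]` entries carrying `Vulnerabilities` and `Misconfigurations` lists, each item with a `Severity` field); the function name is hypothetical.

```python
from collections import Counter


def trivy_severity_counts(report):
    """Count vulnerabilities and misconfigurations by severity in a parsed trivy report."""
    counts = {"vuln": Counter(), "misconfig": Counter()}
    for result in report.get("Results") or []:
        # Either list may be absent or null when a target has no findings.
        for vuln in result.get("Vulnerabilities") or []:
            counts["vuln"][vuln.get("Severity", "UNKNOWN")] += 1
        for mis in result.get("Misconfigurations") or []:
            counts["misconfig"][mis.get("Severity", "UNKNOWN")] += 1
    return counts
```

Load the report with `json.loads(Path(".../trivy-fs.json").read_text())` and pass the resulting dict in; the counters can be formatted directly into the Static Findings Summary section.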

Converting Static Signals Into Exploits

  1. Rank candidates by impact and exploitability.
  2. Trace source-to-sink flow for top candidates.
  3. Build dynamic PoCs that reproduce the suspected issue.
  4. Report only after dynamic validation succeeds.
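
Step 1 can be sketched as a simple scoring pass over the semgrep report. The weights below are arbitrary assumptions, and the `impact` and `likelihood` metadata fields are only present on some registry rules, so missing values fall back to the lowest weight. Treat the ranking as a starting order for manual tracing, not a verdict.

```python
SEVERITY_WEIGHT = {"ERROR": 3, "WARNING": 2, "INFO": 1}
LEVEL_WEIGHT = {"HIGH": 3, "MEDIUM": 2, "LOW": 1}


def score(result):
    """Multiply severity, impact, and likelihood weights; unknown values count as 1."""
    extra = result.get("extra") or {}
    meta = extra.get("metadata") or {}
    severity = SEVERITY_WEIGHT.get(str(extra.get("severity", "")).upper(), 1)
    impact = LEVEL_WEIGHT.get(str(meta.get("impact", "")).upper(), 1)
    likelihood = LEVEL_WEIGHT.get(str(meta.get("likelihood", "")).upper(), 1)
    return severity * impact * likelihood


def rank_candidates(semgrep_report, limit=10):
    """Return the top findings as (score, check_id, path, line) tuples, highest first."""
    results = [r for r in semgrep_report.get("results", []) if isinstance(r, dict)]
    ranked = sorted(results, key=score, reverse=True)
    return [
        (score(r), r.get("check_id"), r.get("path"), (r.get("start") or {}).get("line"))
        for r in ranked[:limit]
    ]
```

A multiplicative score pushes findings that are weak on any one axis toward the bottom, which matches the intent of ranking by impact and exploitability together.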

Wiki Update Template

Keep wiki:overview and wiki:security per repository. Update these sections in wiki:security:

## Architecture
## Entrypoints
## AuthN/AuthZ
## High-Risk Sinks
## Static Findings Summary
## Dynamic Validation Follow-Ups

Before agent_finish, make one final update_note call to capture:

  • scanner artifacts and paths
  • top validated/invalidated hypotheses
  • concrete dynamic follow-up tasks
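
A small renderer can keep the wiki:security body consistent with the template across updates. This is a hypothetical helper, not part of the skill: the section titles come from the template above, while the function name, the bullet layout, and the TODO placeholder are assumptions.

```python
SECTIONS = [
    "Architecture",
    "Entrypoints",
    "AuthN/AuthZ",
    "High-Risk Sinks",
    "Static Findings Summary",
    "Dynamic Validation Follow-Ups",
]


def render_security_note(content):
    """Render a wiki:security body; sections without content get a TODO placeholder."""
    lines = []
    for title in SECTIONS:
        lines.append(f"## {title}")
        items = content.get(title) or ["TODO"]
        lines.extend(f"- {item}" for item in items)
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"
```

The returned string is meant to be passed as the note body in the final update_note call. Scanner artifact paths fit under Static Findings Summary, and open tasks under Dynamic Validation Follow-Ups.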

Anti-Patterns

  • Do not treat scanner output as final truth.
  • Do not spend full cycles on low-signal pattern matches.
  • Do not report source-only findings without validation evidence.
  • Do not create duplicate wiki:overview or wiki:security notes for the same repository.