5.3 KiB
name, description
| name | description |
|---|---|
| source-aware-sast | Practical source-aware SAST and AST playbook for semgrep, ast-grep, gitleaks, and trivy fs |
Source-Aware SAST Playbook
Use this skill for source-heavy analysis where static and structural signals should guide dynamic testing.
Fast Start
Run tools from repo root and store outputs in a dedicated artifact directory:
mkdir -p /workspace/.strix-source-aware
Before scanning, check shared wiki memory:
1) list_notes(category="wiki")
2) get_note(note_id=...) for the selected repo wiki before analysis
3) Reuse matching repo wiki note if present
4) create_note(category="wiki") only if missing
After every major source-analysis batch, update the same repo wiki note with update_note so other agents can reuse your latest map.
Baseline Coverage Bundle (Recommended)
Run this baseline once per repository before deep narrowing:
ART=/workspace/.strix-source-aware
mkdir -p "$ART"
semgrep scan --config p/default --config p/golang --config p/secrets \
--metrics=off --json --output "$ART/semgrep.json" .
# Build deterministic AST targets from semgrep scope (no hardcoded path guessing)
python3 - <<'PY'
import json
from pathlib import Path
art = Path("/workspace/.strix-source-aware")
semgrep_json = art / "semgrep.json"
targets_file = art / "sg-targets.txt"
try:
data = json.loads(semgrep_json.read_text(encoding="utf-8"))
except Exception:
targets_file.write_text("", encoding="utf-8")
raise
scanned = data.get("paths", {}).get("scanned") or []
if not scanned:
scanned = sorted(
{
r.get("path")
for r in data.get("results", [])
if isinstance(r, dict) and isinstance(r.get("path"), str) and r.get("path")
}
)
bounded = scanned[:4000]
targets_file.write_text("".join(f"{p}\n" for p in bounded), encoding="utf-8")
print(f"sg-targets: {len(bounded)}")
PY
xargs -r -n 200 sg run --pattern '$F($$$ARGS)' --json=stream < "$ART/sg-targets.txt" \
> "$ART/ast-grep.json" 2> "$ART/ast-grep.log" || true
gitleaks detect --source . --report-format json --report-path "$ART/gitleaks.json" || true
trufflehog filesystem --no-update --json --no-verification . > "$ART/trufflehog.json" || true
# Keep trivy focused on vuln/misconfig (secrets already covered above) and increase timeout for large repos
trivy fs --scanners vuln,misconfig --timeout 30m --offline-scan \
--format json --output "$ART/trivy-fs.json" . || true
If one tool is skipped or fails, record that in the shared wiki note along with the reason.
Semgrep First Pass
Use Semgrep as the default static triage pass:
# Preferred deterministic profile set (works with --metrics=off)
semgrep scan --config p/default --config p/golang --config p/secrets \
--metrics=off --json --output /workspace/.strix-source-aware/semgrep.json .
# If you choose auto config, do not combine it with --metrics=off
semgrep scan --config auto --json --output /workspace/.strix-source-aware/semgrep-auto.json .
If diff scope is active, restrict to changed files first, then expand only when needed.
AST-Grep Structural Mapping
Use sg for structure-aware code hunting:
# Ruleless structural pass over deterministic target list (no sgconfig.yml required)
xargs -r -n 200 sg run --pattern '$F($$$ARGS)' --json=stream \
< /workspace/.strix-source-aware/sg-targets.txt \
> /workspace/.strix-source-aware/ast-grep.json 2> /workspace/.strix-source-aware/ast-grep.log || true
Target high-value patterns such as:
- missing auth checks near route handlers
- dynamic command/query construction
- unsafe deserialization or template execution paths
- file and path operations influenced by user input
Tree-Sitter Assisted Repo Mapping
Use tree-sitter CLI for syntax-aware parsing when grep-level mapping is noisy:
tree-sitter parse -q <file>
Use outputs to improve route/symbol/sink maps for subsequent targeted scans.
Secret and Supply Chain Coverage
Detect hardcoded credentials:
gitleaks detect --source . --report-format json --report-path /workspace/.strix-source-aware/gitleaks.json
trufflehog filesystem --json . > /workspace/.strix-source-aware/trufflehog.json
Run repository-wide dependency and config checks:
trivy fs --scanners vuln,misconfig --timeout 30m --offline-scan \
--format json --output /workspace/.strix-source-aware/trivy-fs.json . || true
Converting Static Signals Into Exploits
- Rank candidates by impact and exploitability.
- Trace source-to-sink flow for top candidates.
- Build dynamic PoCs that reproduce the suspected issue.
- Report only after dynamic validation succeeds.
Wiki Update Template
Keep one wiki note per repository and update these sections:
## Architecture
## Entrypoints
## AuthN/AuthZ
## High-Risk Sinks
## Static Findings Summary
## Dynamic Validation Follow-Ups
Before agent_finish, make one final update_note call to capture:
- scanner artifacts and paths
- top validated/invalidated hypotheses
- concrete dynamic follow-up tasks
Anti-Patterns
- Do not treat scanner output as final truth.
- Do not spend full cycles on low-signal pattern matches.
- Do not report source-only findings without validation evidence.
- Do not create multiple wiki notes for the same repository when one already exists.