personas/personas/_shared/feynman-skills/paper-code-audit/SKILL.md
salvacybersec 3126dadd19 chore: CLAUDE.md + build.py refresh + feynman-skills import
- CLAUDE.md: updated project guidance
- build.py: install flow tweaks (post install_opencode fix)
- personas/_shared/feynman-skills/: 20 Feynman skills imported from ~/Documents/opencode-skills-parked/, sibling _platform-mapping.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 01:35:13 +03:00


name: paper-code-audit
description: Compare a paper's claims against its public codebase. Use when the user asks to audit a paper, check code-claim consistency, verify reproducibility of a specific paper, or find mismatches between a paper and its implementation.

Paper-Code Audit

Compare a paper's claimed methods, defaults, metrics, and data handling against the actual code. Surface mismatches, omissions, and reproduction risks.

Derive a slug from the audit target (lowercase, hyphens, ≤5 words).
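
A minimal sketch of the slug rule, assuming that punctuation counts as a word separator and that targets longer than five words are truncated to the first five. The function name `derive_slug` and the truncation choice are illustrative; the rule above only requires lowercase, hyphens, and at most five words.

```python
import re


def derive_slug(target: str, max_words: int = 5) -> str:
    """Return a lowercase, hyphen-separated slug of at most max_words words."""
    # Any run of characters other than a-z or 0-9 is treated as a separator.
    words = re.findall(r"[a-z0-9]+", target.lower())
    # Assumption: keep the leading words when the target is too long.
    return "-".join(words[:max_words])


print(derive_slug("Attention Is All You Need (v5)"))  # attention-is-all-you-need
```

If the leading words are not distinctive enough, pick the slug words by hand; the sketch only enforces the format.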

Subagent mapping

See ../_platform-mapping.md. Dispatch researcher for evidence gathering, verifier for citation/URL verification.

Workflow

1. Plan

Write outputs/.plans/<slug>.md with:

  • Paper — title, arXiv id or DOI, and the specific version being audited
  • Repo — canonical URL (be explicit about fork vs upstream)
  • Claims to check — numbered list of specific claims (e.g. "claim 3: final layer uses GELU activation")
  • Verification approach — per-claim, how will you check it (grep source, run a specific script, diff configs)

Summarize the plan briefly, then continue immediately unless the user asked for a plan review.
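
One possible way to produce the plan file, sketched under assumptions: the helper names `render_plan` and `write_plan` and the exact bullet layout are hypothetical. Only the four fields and the outputs/.plans/<slug>.md path come from the step above.

```python
from pathlib import Path


def render_plan(paper: str, repo: str, claims: list[tuple[str, str]]) -> str:
    """Render the plan body; claims is a list of (claim, verification approach)."""
    lines = [f"- Paper: {paper}", f"- Repo: {repo}", "- Claims to check:"]
    for number, (claim, approach) in enumerate(claims, start=1):
        lines.append(f"  {number}. {claim}")
        lines.append(f"     - Verification approach: {approach}")
    return "\n".join(lines) + "\n"


def write_plan(slug: str, paper: str, repo: str,
               claims: list[tuple[str, str]], root: str = "outputs") -> Path:
    """Write the rendered plan to <root>/.plans/<slug>.md and return the path."""
    path = Path(root) / ".plans" / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_plan(paper, repo, claims), encoding="utf-8")
    return path
```

Keeping rendering separate from writing lets the lead agent show the plan summary to the user before anything touches disk.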

2. Gather evidence

Non-trivial audits: dispatch a researcher subagent to pull implementation details from paper sections and linked code. Use the alpha-research skill's alpha code command (or equivalent repo browsing) to read source files.

Small audits (single claim, ≤3 files): the lead agent gathers directly.

For each claim, record:

  • The paper section or figure where the claim is made
  • The code location (file:line or function name) where it should be implemented
  • What the code actually does
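
The per-claim record can be held in a small structure like the one below. This is a sketch: the class name `Evidence`, its field names, and the example values (section number, line number, observed behavior) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    claim: str
    paper_location: str           # section or figure where the claim is made
    code_location: Optional[str]  # file:line or function name; None if not found
    code_behavior: str            # what the code actually does

    def as_bullets(self) -> str:
        """Render the three recorded items as a bullet list."""
        location = self.code_location or "not found in the public repo"
        return "\n".join([
            f"- Paper location: {self.paper_location}",
            f"- Code location: {location}",
            f"- Code does: {self.code_behavior}",
        ])


# Hypothetical example values, not taken from any real audit.
example = Evidence(
    claim="final layer uses GELU activation",
    paper_location="Section 3.2",
    code_location="src/models/transformer.py:88",
    code_behavior="final layer applies ReLU",
)
print(example.as_bullets())
```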

3. Compare

Organize findings under these buckets:

| Bucket | Meaning |
|---|---|
| MATCH | Code matches the claim faithfully |
| MISMATCH | Code contradicts the paper |
| OMITTED | Claim is in paper but code doesn't implement it |
| UNDOCUMENTED | Code does something material that isn't in the paper |
| AMBIGUOUS | Paper's description is too vague to verify against code |
| MISSING CODE | The referenced module/experiment is not in the public repo |
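
The buckets map naturally onto an enumeration, which also makes the per-bucket counts in the audit's Summary section mechanical. A sketch, with `Verdict` and `summary_line` as illustrative names:

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    MATCH = "MATCH"
    MISMATCH = "MISMATCH"
    OMITTED = "OMITTED"
    UNDOCUMENTED = "UNDOCUMENTED"
    AMBIGUOUS = "AMBIGUOUS"
    MISSING_CODE = "MISSING CODE"


def summary_line(verdicts: list[Verdict]) -> str:
    """Count findings per bucket, listing every bucket even when its count is 0."""
    counts = Counter(verdicts)
    return " | ".join(f"{v.value}: {counts[v]}" for v in Verdict)


print(summary_line([Verdict.MATCH, Verdict.MATCH, Verdict.MISMATCH]))
```

Listing zero-count buckets keeps the summary line the same shape across audits, so two reports can be compared at a glance.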

4. Cite

For non-trivial audits, dispatch verifier against the draft to verify every URL (paper links, repo links, commit hashes) and add inline citations where missing.

5. Deliver

Save exactly one audit artifact to outputs/<slug>-audit.md:

# Audit: <paper title>

**Paper:** <link> (<version/date>)
**Repo:** <link> (<commit hash used for audit>)
**Date:** YYYY-MM-DD

## Summary
- Claims checked: <N>
- MATCH: <n> | MISMATCH: <n> | OMITTED: <n> | UNDOCUMENTED: <n> | AMBIGUOUS: <n> | MISSING CODE: <n>

## Findings

### <claim 1>
- **Paper says:** <quote or summary> (<section>)
- **Code does:** <what you found> (<file:line>)
- **Verdict:** MATCH / MISMATCH / OMITTED / …
- **Impact on reproducibility:** <brief>

### <claim 2>
...

## Reproduction risks
- <risks ordered by severity>

## Sources
- <paper URL>
- <repo URL at audit commit>

End with a Sources section containing paper and repository URLs pinned to the version audited (commit hash, not main).
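
A quick pre-delivery check that a repository URL is pinned to a commit rather than a branch could look like this. It is a sketch that assumes GitHub-style URLs, where the ref follows /tree/, /blob/, or /commit/. The function name `pinned_commit` is hypothetical, and a branch whose name happens to be 7 to 40 hex characters would be misread as a hash.

```python
import re
from typing import Optional

# Assumption: the ref sits right after /tree/, /blob/ or /commit/ and a commit
# hash is 7 to 40 lowercase hex characters.
_COMMIT_REF = re.compile(r"/(?:tree|blob|commit)/([0-9a-f]{7,40})(?=[/?#]|$)")


def pinned_commit(url: str) -> Optional[str]:
    """Return the commit hash a URL is pinned to, or None if it is not pinned."""
    match = _COMMIT_REF.search(url)
    return match.group(1) if match else None


for url in (
    "https://github.com/example/repo/tree/main",
    "https://github.com/example/repo/blob/0123abc4567/src/model.py",
):
    print(url, "->", pinned_commit(url))
```

A None result is a prompt to replace the link with its commit-pinned form before delivery.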

What NOT to do

  • Don't run the code unless the user explicitly asked for an execution audit. Reading is often enough to find the mismatch.
  • Don't generalize from src/models/transformer.py to "the method" without checking that the experiment scripts actually call it.
  • Don't grade papers. The audit reports what is and isn't in the code; it doesn't pass judgement on the research.
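
For the second point, a first-pass check that an experiment script references a module at all can be done by reading the script, without running it. The sketch below assumes a Python repo with absolute imports; `imports_module` is an illustrative name. An import is necessary but not sufficient evidence that the code path is exercised, and the check will miss relative imports, dynamic imports, and config-driven registries.

```python
import ast


def imports_module(source: str, module: str) -> bool:
    """True if source has an absolute import of module or of a name inside it."""

    def hits(name: str) -> bool:
        return name == module or name.startswith(module + ".")

    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(hits(alias.name) for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Covers both "from pkg.mod import X" and "from pkg import mod".
            if hits(node.module):
                return True
            if any(hits(f"{node.module}.{alias.name}") for alias in node.names):
                return True
    return False


script = "from src.models import transformer\nmodel = transformer.build()\n"
print(imports_module(script, "src.models.transformer"))
```

When the check fails for every experiment script, treat the module as a candidate for UNDOCUMENTED or unused code and confirm by reading the scripts.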