Initial Feynman research agent scaffold

2026-03-20 11:05:58 -07:00
commit 1fe1ce04a5
25 changed files with 5079 additions and 0 deletions
--- a/skills/research/experiment-design/SKILL.md
+++ b/skills/research/experiment-design/SKILL.md
@@ -0,0 +1,48 @@
+---
+name: experiment-design
+description: Use this when the task is to turn a vague research idea into a testable experiment, define metrics, choose baselines, or plan ablations.
+---
+
+# Experiment Design
+
+## When To Use
+
+Use this skill when the user has:
+- a hypothesis to test
+- a method to evaluate
+- an unclear benchmark plan
+- a need for baselines, ablations, or metrics
+
+## Procedure
+
+1. Restate the research question as a falsifiable claim.
+2. Define:
+   - independent variables
+   - dependent variables
+   - success metrics
+   - baselines
+   - constraints
+3. Search for prior work with `alpha_search` before fixing the setup, so you do not reinvent an obviously flawed design.
+4. Use `alpha_get_paper` and `alpha_ask_paper` on the strongest references.
+5. Prefer the smallest experiment that can meaningfully reduce uncertainty.
+6. List confounders and failure modes up front.
+7. If implementation is requested, create the scripts, configs, and logging plan.
+8. Write the plan to disk before running expensive work.
+
+## Pitfalls
+
+- Avoid experiments with no baseline.
+- Avoid metrics that do not connect to the claim.
+- Avoid ablations that change multiple variables at once.
+- Avoid broad plans that cannot be executed in the current environment.
+
+## Deliverable
+
+Produce:
+- hypothesis
+- setup
+- baselines
+- metrics
+- ablations
+- risks
+- next action
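Note on `experiment-design/SKILL.md`: the deliverable fields and the "write the plan to disk" step could be sketched as below. This is only an illustration; `ExperimentPlan`, `problems`, `write_plan`, and the JSON layout are hypothetical names and choices that are not part of this commit, and the check covers just two of the listed pitfalls.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class ExperimentPlan:
    """Hypothetical container mirroring the skill's deliverable fields."""
    hypothesis: str
    setup: str
    baselines: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)
    ablations: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    next_action: str = ""

    def problems(self) -> list[str]:
        """Return the listed pitfalls that this plan still violates."""
        found = []
        if not self.baselines:
            found.append("no baseline")
        if not self.metrics:
            found.append("no metric tied to the claim")
        return found


def write_plan(plan: ExperimentPlan, path: Path) -> None:
    """Persist the plan as JSON; refuse if it has obvious gaps."""
    issues = plan.problems()
    if issues:
        raise ValueError("plan not ready: " + ", ".join(issues))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(plan), indent=2))
```

Calling `write_plan(plan, Path("outputs/plan.json"))` before launching any run would leave a record of what was intended, which makes later gaps between plan and result easier to explain. The output path here is an assumption as well.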
--- a/skills/research/literature-review/SKILL.md
+++ b/skills/research/literature-review/SKILL.md
@@ -0,0 +1,52 @@
+---
+name: literature-review
+description: Use this when the task is to survey prior work, compare papers, synthesize a field, or build a reading list grounded in primary sources.
+---
+
+# Literature Review
+
+## When To Use
+
+Use this skill when the user wants:
+- a research overview
+- a paper shortlist
+- a comparison of methods
+- a synthesis of consensus and disagreement
+- a source-backed brief on a topic
+
+## Procedure
+
+1. Search broadly first with `alpha_search`.
+2. Pick the strongest candidates by direct relevance, recency, citations, and venue quality.
+3. Inspect the top papers with `alpha_get_paper` before making concrete claims.
+4. Use `alpha_ask_paper` for missing methodological or experimental details.
+5. Build a compact evidence table:
+   - title
+   - year
+   - authors
+   - venue
+   - claim or contribution
+   - important caveats
+6. Distinguish:
+   - what multiple sources agree on
+   - where methods or findings differ
+   - what remains unresolved
+7. If the user wants a durable artifact, write a markdown brief to disk.
+8. If you discover an important gotcha about a paper, save it with `alpha_annotate_paper`.
+
+## Pitfalls
+
+- Do not summarize a field from titles alone.
+- Do not flatten disagreements into fake consensus.
+- Do not treat recent preprints as established facts without saying so.
+- Do not cite secondary commentary when a primary source is available.
+
+## Output Shape
+
+Prefer this structure:
+- question
+- strongest papers
+- major findings
+- disagreements or caveats
+- open questions
+- recommended next reading or experiments
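Note on `literature-review/SKILL.md`: the compact evidence table from step 5 could be rendered as a Markdown table along these lines. The `EvidenceRow` and `to_markdown` names are hypothetical, and the rows would come from whatever the paper tools return; nothing here reflects their actual output format.

```python
from dataclasses import dataclass


@dataclass
class EvidenceRow:
    """One paper in the evidence table; fields follow the skill's list."""
    title: str
    year: int
    authors: str
    venue: str
    claim: str
    caveats: str


COLUMNS = ["title", "year", "authors", "venue", "claim", "caveats"]


def to_markdown(rows: list[EvidenceRow]) -> str:
    """Render rows as a Markdown table, escaping pipes inside cells."""
    header = "| " + " | ".join(COLUMNS) + " |"
    rule = "| " + " | ".join("---" for _ in COLUMNS) + " |"
    body = []
    for row in rows:
        cells = [str(getattr(row, c)).replace("|", "\\|") for c in COLUMNS]
        body.append("| " + " | ".join(cells) + " |")
    return "\n".join([header, rule, *body])
```

Keeping `caveats` as a required field is a deliberate choice in this sketch: it forces an explicit entry, even if that entry is "none found", instead of silently omitting limitations.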
--- a/skills/research/paper-code-audit/SKILL.md
+++ b/skills/research/paper-code-audit/SKILL.md
@@ -0,0 +1,50 @@
+---
+name: paper-code-audit
+description: Use this when the task is to compare a paper against its repository, verify whether claims are implemented, or assess reproducibility risk.
+---
+
+# Paper Code Audit
+
+## When To Use
+
+Use this skill for:
+- paper-versus-code verification
+- implementation gap analysis
+- reproducibility audits
+- checking whether public code matches reported results
+
+## Procedure
+
+1. Locate the paper with `alpha_search`.
+2. Load the paper with `alpha_get_paper`.
+3. Extract implementation-relevant details using `alpha_ask_paper`:
+   - datasets
+   - preprocessing
+   - model architecture
+   - hyperparameters
+   - evaluation protocol
+4. If the paper links a repository, inspect it using `alpha_read_code`.
+5. Compare the paper's claims against what the code actually does:
+   - are all components present
+   - do defaults match the paper
+   - are metrics/eval scripts exposed
+   - are hidden assumptions required
+6. Record concrete mismatches (file, setting, paper value, code value), not impressions.
+7. Save the audit in `outputs/`.
+8. If you find a durable gotcha, save it with `alpha_annotate_paper`.
+
+## Pitfalls
+
+- Do not infer repository behavior without opening the relevant files.
+- Do not assume README claims reflect the actual defaults.
+- Do not mark something as missing if it exists under another name without checking.
+
+## Deliverable
+
+Include:
+- paper summary
+- repository coverage
+- confirmed matches
+- mismatches or omissions
+- reproducibility risks
+- recommended next actions
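Note on `paper-code-audit/SKILL.md`: the "do defaults match the paper" check could be sketched as a plain dictionary comparison. The function name `diff_defaults` and the idea of hand-collecting both dictionaries are assumptions for illustration; the values would have to be read from the paper and from the repository files directly.

```python
def diff_defaults(paper: dict, code: dict) -> list[str]:
    """Compare settings stated in the paper with defaults found in code.

    Keys are matched by exact name only, so a "not found" finding still
    needs a manual check for the same setting under another name.
    """
    findings = []
    for key in sorted(paper.keys() | code.keys()):
        if key not in code:
            findings.append(
                f"{key}: stated in paper ({paper[key]!r}), not found in code under this name"
            )
        elif key not in paper:
            findings.append(
                f"{key}: code default {code[key]!r}, not stated in paper"
            )
        elif paper[key] != code[key]:
            findings.append(
                f"{key}: paper says {paper[key]!r}, code default is {code[key]!r}"
            )
    return findings
```

For example, `diff_defaults({"lr": 0.001}, {"lr": 0.01})` yields a single finding naming both values, which is the kind of concrete mismatch the audit asks for.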
--- a/skills/research/paper-writing/SKILL.md
+++ b/skills/research/paper-writing/SKILL.md
@@ -0,0 +1,45 @@
+---
+name: paper-writing
+description: Use this when the task is to turn research notes, experiments, or a literature review into a polished paper-style writeup with Markdown and LaTeX.
+---
+
+# Paper Writing
+
+## When To Use
+
+Use this skill for:
+- research reports that should read like a paper
+- internal memos with equations or formal structure
+- polished writeups of experiments or literature reviews
+- converting rough notes into a coherent draft
+
+## Procedure
+
+1. Make sure the underlying claims are already grounded in sources, experiments, or explicit caveats.
+2. Build the draft around a proper research structure:
+   - title
+   - abstract
+   - introduction or problem statement
+   - related work
+   - approach, synthesis, or methodology
+   - evidence, experiments, or case studies
+   - limitations
+   - conclusion
+3. Use Markdown by default.
+4. Use LaTeX only where equations or notation genuinely improve clarity.
+5. Keep claims falsifiable and scoped.
+6. Save polished drafts to `papers/`.
+
+## Pitfalls
+
+- Do not use LaTeX for decoration.
+- Do not make a draft look more certain than the evidence supports.
+- Do not hide missing citations or weak evidence; flag them.
+
+## Deliverable
+
+A readable paper-style draft with:
+- explicit structure
+- traceable claims
+- equations only where useful
+- limitations stated plainly
--- a/skills/research/reading-list/SKILL.md
+++ b/skills/research/reading-list/SKILL.md
@@ -0,0 +1,49 @@
+---
+name: reading-list
+description: Use this when the user wants a curated reading sequence, paper shortlist, or tiered set of papers for learning or project onboarding.
+---
+
+# Reading List
+
+## When To Use
+
+Use this skill for:
+- getting up to speed on a topic
+- onboarding into a research area
+- choosing which papers to read first
+- constructing a project-specific reading order
+
+## Procedure
+
+1. Start with `alpha_search` in `all` mode.
+2. Inspect the strongest candidates with `alpha_get_paper`.
+3. Use `alpha_ask_paper` for fit questions like:
+   - what problem does this really solve
+   - what assumptions does it rely on
+   - what prior work does it build on
+4. Classify papers into roles:
+   - foundational
+   - key recent advances
+   - evaluation or benchmark references
+   - critiques or limitations
+   - likely replication targets
+5. Order the list intentionally:
+   - start with orientation
+   - move to strongest methods
+   - finish with edges, critiques, or adjacent work
+6. Write the final list as a durable markdown artifact in `outputs/`.
+
+## Pitfalls
+
+- Do not sort purely by citations.
+- Do not over-index on recency when fundamentals matter.
+- Do not include papers you have not inspected at all.
+
+## Deliverable
+
+For each paper include:
+- title
+- year
+- why it matters
+- when to read it in the sequence
+- one caveat or limitation
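Note on `reading-list/SKILL.md`: one possible way to turn the role classification into a reading order is a stable sort by role. The mapping from roles to positions below is an assumption (the skill only says to start with orientation and finish with edges and critiques), and `order_reading_list` is a hypothetical helper.

```python
# Assumed reading order; the skill names the roles but not their positions.
ROLE_ORDER = [
    "foundational",
    "key recent advances",
    "evaluation or benchmark references",
    "critiques or limitations",
    "likely replication targets",
]


def order_reading_list(papers: list[dict]) -> list[dict]:
    """Sort papers by role, keeping the given order within each role."""
    rank = {role: i for i, role in enumerate(ROLE_ORDER)}
    unknown = [p["title"] for p in papers if p["role"] not in rank]
    if unknown:
        raise ValueError(f"unclassified papers: {unknown}")
    # sorted() is stable, so a hand-picked order inside a role survives.
    return sorted(papers, key=lambda p: rank[p["role"]])
```

Because the sort is stable, the within-role order can still be chosen by hand (for instance, easiest paper first) rather than by citation count or recency.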
--- a/skills/research/replication/SKILL.md
+++ b/skills/research/replication/SKILL.md
@@ -0,0 +1,52 @@
+---
+name: replication
+description: Use this when the task is to reproduce a paper result, benchmark a claim, rebuild an experiment, or evaluate whether a published result holds in practice.
+---
+
+# Replication
+
+## When To Use
+
+Use this skill for:
+- paper reproduction
+- benchmark recreation
+- ablation reruns
+- claim verification through code and experiments
+
+## Procedure
+
+1. Identify the canonical source paper and inspect it with `alpha_get_paper`.
+2. Extract the exact target:
+   - task
+   - dataset
+   - model or method
+   - metrics
+   - hardware or runtime assumptions
+3. Use `alpha_ask_paper` to pull out the exact details missing from the report.
+4. If the paper has a public repository, inspect it with `alpha_read_code`.
+5. Search the local workspace for existing code, notebooks, configs, and datasets.
+6. Write down the missing pieces explicitly before running anything.
+7. If the environment is sufficient, implement the minimal runnable reproduction path.
+8. Run the experiment with built-in file and shell tools.
+9. Save:
+   - commands used
+   - configs
+   - raw outputs
+   - summarized results
+10. Compare observed results with the paper and explain gaps.
+11. If the replication surfaces a practical gotcha, attach it to the paper with `alpha_annotate_paper`.
+
+## Pitfalls
+
+- Do not claim replication succeeded if key conditions were missing.
+- Do not compare different metrics as if they were equivalent.
+- Do not ignore dataset or preprocessing mismatch.
+- Do not hide failed runs; record them and explain them.
+
+## Verification
+
+A good replication outcome includes:
+- the exact command path
+- the data or config used
+- the observed metrics
+- a clear statement of match, partial match, or mismatch
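Note on `replication/SKILL.md`: the final "match, partial match, or mismatch" statement could be derived from reported and observed metrics roughly as follows. The `verdict` function and the 2% relative tolerance are assumptions chosen for illustration; an appropriate tolerance depends on the metric and on run-to-run variance.

```python
import math


def verdict(
    reported: dict[str, float],
    observed: dict[str, float],
    rel_tol: float = 0.02,  # assumed tolerance, not taken from the skill
) -> str:
    """Classify a replication by how many reported metrics were reproduced.

    Metrics are compared by name only, so a differently named metric is
    never treated as equivalent to a reported one.
    """
    if not reported:
        raise ValueError("no reported metrics to compare against")
    hits = 0
    for name, target in reported.items():
        value = observed.get(name)
        if value is not None and math.isclose(value, target, rel_tol=rel_tol):
            hits += 1
    if hits == len(reported):
        return "match"
    if hits > 0:
        return "partial match"
    return "mismatch"
```

A label from this sketch is only as good as its inputs: if the dataset or preprocessing differed from the paper, the result should still be reported as a partial match or mismatch regardless of the numbers.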