Overhaul Feynman harness: streamline agents, prompts, and extensions

Remove legacy chains, skills, and config modules. Add citation agent,
SYSTEM.md, modular research-tools extension, and web-access layer.
Add ralph-wiggum to Pi package stack for long-running loops.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Advait Paliwal
2026-03-23 14:59:30 -07:00
parent d23e679331
commit 406d50b3ff
60 changed files with 2994 additions and 3191 deletions

View File

@@ -1,17 +0,0 @@
---
description: Design the smallest convincing ablation set for an AI research project.
---
Design an ablation plan for: $@
Requirements:
- Identify the exact claims the paper is making.
- For each claim, determine what ablation or control is necessary to support it.
- Prefer the `verifier` subagent when the claim structure is complicated.
- Distinguish:
- must-have ablations
- nice-to-have ablations
- unnecessary experiments
- Call out where benchmark norms imply mandatory controls.
- Optimize for the minimum convincing set, not experiment sprawl.
- If the user wants a durable artifact, save exactly one plan to `outputs/` as markdown.
- End with a `Sources` section containing direct URLs for any external sources used.

View File

@@ -4,11 +4,8 @@ description: Compare a paper's claims against its public codebase and identify m
Audit the paper and codebase for: $@
Requirements:
- Prefer the `researcher` subagent for evidence gathering and the `verifier` subagent for the mismatch pass when the audit is non-trivial.
- Identify the canonical paper first with `alpha_search` and `alpha_get_paper`.
- Extract implementation-sensitive claims with `alpha_ask_paper`.
- If a public repo exists, inspect it with `alpha_read_code`.
- Compare claimed methods, defaults, metrics, and data handling against the repository.
- Use the `researcher` subagent for evidence gathering and the `citation` subagent to verify sources and add inline citations when the audit is non-trivial.
- Compare claimed methods, defaults, metrics, and data handling against the actual code.
- Call out missing code, mismatches, ambiguous defaults, and reproduction risks.
- End with a `Sources` section containing paper and repository URLs.
- Save exactly one audit artifact to `outputs/` as markdown.

View File

@@ -1,19 +1,32 @@
---
description: Turn a research idea into a paper-oriented end-to-end run with literature, hypotheses, experiments when possible, and a draft artifact.
description: Autonomous experiment loop — try ideas, measure results, keep what works, discard what doesn't, repeat.
---
Run an autoresearch workflow for: $@
Start an autoresearch optimization loop for: $@
Requirements:
- Prefer the project `auto` chain or the `planner` + `researcher` + `verifier` + `writer` subagents when the task is broad enough to benefit from decomposition.
- If the run is likely to take a while, or the user wants it detached, launch the subagent workflow in background with `clarify: false, async: true` and report how to inspect status.
- Start by clarifying the research objective, scope, and target contribution.
- Search for the strongest relevant primary sources first.
- If the topic is current, product-oriented, market-facing, or asks about latest developments, start with `web_search` and `fetch_content`.
- Use `alpha_search` for academic background or paper-centric parts of the topic, but do not rely on it alone for current topics.
- Build a compact evidence table before committing to a paper narrative.
- If experiments are feasible in the current environment, design and run the smallest experiment that materially reduces uncertainty.
- If experiments are not feasible, produce a paper-style draft that is explicit about missing validation and limitations.
- Produce one final durable markdown artifact for the user-facing result.
- If the result is a paper-style draft, save it to `papers/`; otherwise save it to `outputs/`.
- Do not create extra user-facing intermediate markdown files unless the user explicitly asks for them.
- End with a `Sources` section containing direct URLs for every source used.
This command uses pi-autoresearch. Enter autoresearch mode and begin the autonomous experiment loop.
## Behavior
- If `autoresearch.md` and `autoresearch.jsonl` already exist in the project, resume the existing session with the user's input as additional context.
- Otherwise, gather the optimization target from the user:
- What to optimize (test speed, bundle size, training loss, build time, etc.)
- The benchmark command to run
- The metric name, unit, and direction (lower/higher is better)
- Files in scope for changes
- Then initialize the session: create `autoresearch.md`, `autoresearch.sh`, run the baseline, and start looping.
## Loop
Each iteration: edit → commit → `run_experiment` → `log_experiment` → keep or revert → repeat. Do not stop unless interrupted or `maxIterations` is reached.
## Key tools
- `init_experiment` — one-time session config (name, metric, unit, direction)
- `run_experiment` — run the benchmark command, capture output and wall-clock time
- `log_experiment` — record result, auto-commit, update dashboard
## Subcommands
- `/autoresearch <text>` — start or resume the loop
- `/autoresearch off` — stop the loop, keep data
- `/autoresearch clear` — delete all state and start fresh
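One loop iteration might look like the following tool-call sketch. Only the tool names and the `name`/`metric`/`unit`/`direction` config fields come from the text above; every other field name here is an illustrative assumption, not the tools' confirmed schema:

```
// One-time session config (fields beyond name/metric/unit/direction are hypothetical)
init_experiment { name: "bundle-size", metric: "size", unit: "kB", direction: "lower" }

// Each iteration, after an edit and a commit:
run_experiment { command: "./autoresearch.sh" }            // assumed field: the benchmark command
log_experiment { value: 412.3, note: "tree-shook lodash" } // assumed fields: result and label
// keep the change if the metric moved in the right direction, otherwise revert
```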

View File

@@ -4,17 +4,8 @@ description: Compare multiple sources on a topic and produce a source-grounded m
Compare sources for: $@
Requirements:
- Use the `researcher` subagent to gather source material when the comparison set is broad, and the `verifier` subagent to pressure-test the resulting matrix when needed.
- Identify the strongest relevant primary sources first.
- For current or market-facing topics, use `web_search` and `fetch_content` to gather up-to-date primary sources before comparing them.
- For academic claims, use `alpha_search` and inspect the strongest papers directly.
- Inspect the top sources directly before comparing them.
- Build a comparison matrix covering:
- source
- key claim
- evidence type
- caveats
- confidence
- Use the `researcher` subagent to gather source material when the comparison set is broad, and the `citation` subagent to verify sources and add inline citations to the final matrix.
- Build a comparison matrix covering: source, key claim, evidence type, caveats, confidence.
- Distinguish agreement, disagreement, and uncertainty clearly.
- Save exactly one comparison to `outputs/` as markdown.
- End with a `Sources` section containing direct URLs for every source used.
- If the user wants a durable artifact, save exactly one comparison to `outputs/` as markdown.

View File

@@ -1,34 +1,107 @@
---
description: Run a thorough, source-heavy investigation on a topic and produce a durable research brief with explicit evidence and source links.
description: Run a thorough, source-heavy investigation on a topic and produce a durable research brief with inline citations.
---
Run a deep research workflow for: $@
Requirements:
- Treat `/deepresearch` as one coherent Feynman workflow from the user's perspective. Do not expose internal orchestration primitives unless the user explicitly asks.
- Start as the lead researcher. First make a compact plan: what must be answered, what evidence types are needed, and which sub-questions are worth splitting out.
- Stay single-agent by default for narrow topics. Only use `subagent` when the task is broad enough that separate context windows materially improve breadth or speed.
- If you use subagents, launch them as one worker batch around clearly disjoint sub-questions. Wait for the batch to finish, synthesize the results, and only then decide whether a second batch is needed.
- Prefer breadth-first worker batches for deep research: different market segments, different source types, different time periods, different technical angles, or different competing explanations.
- Use `researcher` workers for evidence gathering, `verifier` workers for adversarial claim-checking, and `writer` only if you already have solid evidence and need help polishing the final artifact.
- Do not make the workflow chain-shaped by default. Hidden worker batches are optional implementation details, not the user-facing model.
- If the user wants it to run unattended, or the sweep will clearly take a while, prefer background execution with `subagent` using `clarify: false, async: true`, then report how to inspect status.
- If the topic is current, product-oriented, market-facing, regulatory, or asks about latest developments, start with `web_search` and `fetch_content`.
- If the topic has an academic literature component, use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for the strongest papers.
- Do not rely on a single source type when the topic spans both current reality and academic background.
- Build a compact evidence table before synthesizing conclusions.
- After synthesis, run a final verification/citation pass. For the strongest claims, independently confirm support and remove anything unsupported, fabricated, or stale.
- Distinguish clearly between established facts, plausible inferences, disagreements, and unresolved questions.
- Produce exactly one durable markdown artifact in `outputs/`.
- The final artifact should read like one deep research memo, not like stitched-together worker transcripts.
- Do not leave extra user-facing intermediate markdown files behind unless the user explicitly asks for them.
- End with a `Sources` section containing direct URLs for every source used.
You are the Lead Researcher. You plan, delegate, evaluate, loop, write, and cite. Internal orchestration is invisible to the user unless they ask.
Default execution shape:
1. Clarify the actual research objective if needed.
2. Make a short plan and identify the key sub-questions.
3. Decide single-agent versus worker-batch execution.
4. Gather evidence across the needed source types.
5. Synthesize findings and identify remaining gaps.
6. If needed, run one more worker batch for unresolved gaps.
7. Perform a verification/citation pass.
8. Write the final brief with a strict `Sources` section.
## 1. Plan
Analyze the research question using extended thinking. Develop a research strategy:
- Key questions that must be answered
- Evidence types needed (papers, web, code, data, docs)
- Sub-questions disjoint enough to parallelize
- Source types and time periods that matter
Save the plan immediately with `memory_remember` (type: `fact`, key: `deepresearch.plan`). Context windows get truncated on long runs — the plan must survive.
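A minimal sketch of that save call, using the `type` and `key` values given above (the `value` field name is an assumption):

```
memory_remember {
  type: "fact",
  key: "deepresearch.plan",
  value: "Key questions: ... | Evidence types: ... | Disjoint sub-questions: ..."
}
```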
## 2. Scale decision
| Query type | Execution |
|---|---|
| Single fact or narrow question | Search directly yourself, no subagents, 3-10 tool calls |
| Direct comparison (2-3 items) | 2 parallel `researcher` subagents |
| Broad survey or multi-faceted topic | 3-4 parallel `researcher` subagents |
| Complex multi-domain research | 4-6 parallel `researcher` subagents |
Never spawn subagents for work you can do in 5 tool calls.
## 3. Spawn researchers
Launch parallel `researcher` subagents via `subagent`. Each gets a structured brief with:
- **Objective:** what to find
- **Output format:** numbered sources, evidence table, inline source references
- **Tool guidance:** which search tools to prioritize
- **Task boundaries:** what NOT to cover (another researcher handles that)
Assign each researcher a clearly disjoint dimension — different source types, geographic scopes, time periods, or technical angles. Never duplicate coverage.
```
{
tasks: [
{ agent: "researcher", task: "...", output: "research-web.md" },
{ agent: "researcher", task: "...", output: "research-papers.md" }
],
concurrency: 4,
failFast: false
}
```
Researchers write full outputs to files and pass references back — do not have them return full content into your context.
## 4. Evaluate and loop
After researchers return, read their output files and critically assess:
- Which plan questions remain unanswered?
- Which answers rest on only one source?
- Are there contradictions needing resolution?
- Is any key angle missing entirely?
If gaps are significant, spawn another targeted batch of researchers. No fixed cap on rounds — iterate until evidence is sufficient or sources are exhausted. Update the stored plan with `memory_remember` as it evolves.
Most topics need 1-2 rounds. Stop when additional rounds would not materially change conclusions.
## 5. Write the report
Once evidence is sufficient, YOU write the full research brief directly. Do not delegate writing to another agent. Read the research files, synthesize the findings, and produce a complete document:
```markdown
# Title
## Executive Summary
2-3 paragraph overview of key findings.
## Section 1: ...
Detailed findings organized by theme or question.
## Section N: ...
## Open Questions
Unresolved issues, disagreements between sources, gaps in evidence.
```
Save this draft to a temp file (e.g., `draft.md` in the chain artifacts directory or a temporary path).
## 6. Cite
Spawn the `citation` agent to post-process YOUR draft. The citation agent adds inline citations, verifies every source URL, and produces the final output:
```
{ agent: "citation", task: "Add inline citations to draft.md using the research files as source material. Verify every URL.", output: "brief.md" }
```
The citation agent does not rewrite the report — it only anchors claims to sources and builds the numbered Sources section.
## 7. Deliver
Copy the final cited output to the appropriate folder:
- Paper-style drafts → `papers/`
- Everything else → `outputs/`
Use a descriptive filename based on the topic.
## Background execution
If the user wants unattended execution or the sweep will clearly take a while:
- Launch the full workflow via `subagent` using `clarify: false, async: true`
- Report the async ID and how to check status with `subagent_status`
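A sketch of the detached launch, reusing the `subagent` call shape shown earlier (placing `clarify` and `async` at the top level of the call is an assumption):

```
{
  tasks: [
    { agent: "researcher", task: "<full deepresearch brief for the topic>", output: "brief.md" }
  ],
  clarify: false,
  async: true
}
```

Report the async ID this returns so the user can poll it with `subagent_status`.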

View File

@@ -4,18 +4,8 @@ description: Turn research findings into a polished paper-style draft with equat
Write a paper-style draft for: $@
Requirements:
- Prefer the `writer` subagent when the draft should be produced from already-collected notes, and use `verifier` first if the evidence still looks shaky.
- Ground every claim in inspected sources, experiments, or explicit inference.
- Use clean Markdown structure with LaTeX where equations materially help.
- Include at minimum:
- title
- abstract
- problem statement
- related work
- method or synthesis
- evidence or experiments
- limitations
- conclusion
- If citations are available, include citation placeholders or references clearly enough to convert later.
- Add a `Sources` appendix with direct URLs for all primary references used while drafting.
- Use the `writer` subagent when the draft should be produced from already-collected notes, then use the `citation` subagent to add inline citations and verify sources.
- Include at minimum: title, abstract, problem statement, related work, method or synthesis, evidence or experiments, limitations, conclusion.
- Use clean Markdown with LaTeX where equations materially help.
- Save exactly one draft to `papers/` as markdown.
- End with a `Sources` appendix with direct URLs for all primary references.

View File

@@ -5,12 +5,7 @@ Investigate the following topic as a literature review: $@
Requirements:
- Use the `researcher` subagent when the sweep is wide enough to benefit from delegated paper triage before synthesis.
- If the topic is academic or paper-centric, use `alpha_search` first.
- If the topic is current, product-oriented, market-facing, or asks about latest developments, use `web_search` and `fetch_content` first, then use `alpha_search` only for academic background.
- Use `alpha_get_paper` on the most relevant papers before making strong claims.
- Use `alpha_ask_paper` for targeted follow-up questions when the report is not enough.
- Prefer primary sources and note when something appears to be a preprint or secondary summary.
- Separate consensus, disagreements, and open questions.
- When useful, propose concrete next experiments or follow-up reading.
- End with a `Sources` section containing direct URLs for every paper or source used.
- If the user wants an artifact, write exactly one review to disk as markdown.
- Save exactly one literature review to `outputs/` as markdown.
- End with a `Sources` section containing direct URLs for every source used.

View File

@@ -1,14 +0,0 @@
---
description: Produce a general research memo grounded in explicit sources and direct links.
---
Write a research memo about: $@
Requirements:
- Use the `researcher` and `writer` subagents when decomposition will improve quality or reduce context pressure.
- Start by finding the strongest relevant sources.
- If the topic is current, market-facing, product-oriented, regulatory, or asks about latest developments, use `web_search` and `fetch_content` first.
- Use `alpha_search` for academic background where relevant, but do not rely on it alone for current topics.
- Read or inspect the top sources directly before making strong claims.
- Distinguish facts, interpretations, and open questions.
- End with a `Sources` section containing direct URLs for every source used.
- If the user wants a durable artifact, save exactly one memo to `outputs/` as markdown.

View File

@@ -1,15 +0,0 @@
---
description: Build a prioritized reading list on a research topic with rationale for each paper.
---
Create a research reading list for: $@
Requirements:
- Use the `researcher` subagent when a wider literature sweep would help before curating the final list.
- If the topic is academic, use `alpha_search` with `all` mode.
- If the topic is current, product-oriented, or asks for the latest landscape, use `web_search` and `fetch_content` first, then add `alpha_search` for academic background when relevant.
- Inspect the strongest papers or primary sources directly before recommending them.
- Use `alpha_ask_paper` when a paper's fit is unclear.
- Group papers by role when useful: foundational, strongest recent work, methods, benchmarks, critiques, replication targets.
- For each paper, explain why it is on the list.
- Include direct URLs for each recommended source.
- Save exactly one final reading list to `outputs/` as markdown.

View File

@@ -1,18 +0,0 @@
---
description: Turn reviewer comments into a structured rebuttal and revision plan for an AI research paper.
---
Prepare a rebuttal workflow for: $@
Requirements:
- If reviewer comments are provided, organize them into a response matrix.
- If reviewer comments are not yet provided, infer the likely strongest objections from the current draft and review them before drafting responses.
- Prefer the `reviewer` subagent or the project `review` chain when fresh critical review is still needed.
- For each issue, produce:
- reviewer concern
- whether it is valid
- evidence available now
- paper changes needed
- rebuttal language
- Do not overclaim fixes that have not been implemented.
- Save exactly one rebuttal matrix to `outputs/` as markdown.
- End with a `Sources` section containing direct URLs for all inspected external sources.

View File

@@ -1,19 +0,0 @@
---
description: Build a related-work map and justify why an AI research project needs to exist.
---
Build the related-work and justification view for: $@
Requirements:
- Search for the closest and strongest relevant papers first.
- Prefer the `researcher` subagent when the space is broad or moving quickly.
- Identify:
- foundational papers
- closest prior work
- strongest recent competing approaches
- benchmarks and evaluation norms
- critiques or known weaknesses in the area
- For each important paper, explain why it matters to this project.
- Be explicit about what real gap remains after considering the strongest prior work.
- If the project is not differentiated enough, say so clearly.
- If the user wants a durable result, save exactly one artifact to `outputs/` as markdown.
- End with a `Sources` section containing direct URLs.

View File

@@ -4,11 +4,7 @@ description: Plan or execute a replication workflow for a paper, claim, or bench
Design a replication plan for: $@
Requirements:
- Use the `subagent` tool for decomposition when the replication needs separate planning, evidence extraction, and execution passes.
- Identify the canonical paper or source material first.
- Use `alpha_get_paper` for the target paper.
- Use `alpha_ask_paper` to extract the exact implementation or evaluation details you still need.
- If the paper links code, inspect it with `alpha_read_code`.
- Use the `researcher` subagent to extract implementation details from the target paper and any linked code.
- Determine what code, datasets, metrics, and environment are needed.
- If enough information is available locally, implement and run the replication steps.
- Save notes, scripts, and results to disk in a reproducible layout.

View File

@@ -4,21 +4,8 @@ description: Simulate an AI research peer review with likely objections, severit
Review this AI research artifact: $@
Requirements:
- Prefer the project `review` chain or the `researcher` + `verifier` + `reviewer` subagents when the artifact is large or the review needs to inspect paper, code, and experiments together.
- Inspect the strongest relevant sources directly before making strong review claims.
- If the artifact is a paper or draft, evaluate:
- novelty and related-work positioning
- clarity of claims
- baseline fairness
- evaluation design
- missing ablations
- reproducibility details
- whether conclusions outrun the evidence
- If code or experiment artifacts exist, compare them against the claimed method and evaluation.
- Produce:
- short verdict
- likely reviewer objections
- severity for each issue
- revision plan in priority order
- Spawn a `researcher` subagent to gather evidence on the artifact — inspect the paper, code, cited work, and any linked experimental artifacts. Save to `research.md`.
- Spawn a `reviewer` subagent with `research.md` to produce the final peer review with inline annotations.
- For small or simple artifacts where evidence gathering is overkill, run the `reviewer` subagent directly instead.
- Save exactly one review artifact to `outputs/` as markdown.
- End with a `Sources` section containing direct URLs for every inspected external source.

View File

@@ -4,11 +4,8 @@ description: Set up a recurring or deferred research watch on a topic, company,
Create a research watch for: $@
Requirements:
- Start with a baseline sweep of the topic using the strongest relevant sources.
- If the watch is about current events, products, markets, regulations, or releases, use `web_search` and `fetch_content` first.
- If the watch has a literature component, add `alpha_search` and inspect the strongest papers directly.
- Start with a baseline sweep of the topic.
- Summarize what should be monitored, what signals matter, and what counts as a meaningful change.
- Use `schedule_prompt` to create the recurring or delayed follow-up instead of merely promising to check later.
- If the user wants detached execution for the initial sweep, use `subagent` in background mode and report how to inspect status.
- Save exactly one durable baseline artifact to `outputs/`.
- Save exactly one baseline artifact to `outputs/`.
- End with a `Sources` section containing direct URLs for every source used.
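A `schedule_prompt` call for the recurring follow-up might look like this sketch (the text above fixes only the tool name, so both field names here are assumptions):

```
schedule_prompt {
  prompt: "Re-run the research watch for <topic>: compare against the baseline in outputs/ and report any meaningful changes with sources.",
  schedule: "weekly"   // assumed field; could equally be a cron spec or a delay
}
```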