Add AI research review workflows

2026-03-22 14:36:47 -07:00
parent dd701e9967
commit dbdad94adc
10 changed files with 163 additions and 4 deletions
--- a/.pi/agents/review.chain.md
+++ b/.pi/agents/review.chain.md
@@ -0,0 +1,22 @@
 ---
 name: review
 description: Gather evidence, verify claims, and simulate a peer review for an AI research artifact.
 ---
 ## researcher
 output: research.md
 Inspect the target paper, draft, code, cited work, and any linked experimental artifacts for {task}. Gather the strongest primary evidence that matters for a review.
 ## verifier
 reads: research.md
 output: verification.md
 Audit research.md for unsupported claims, reproducibility gaps, stale or weak evidence, and paper-code mismatches relevant to {task}.
 ## reviewer
 reads: research.md+verification.md
 output: review.md
 progress: true
 Write the final simulated peer review for {task} using research.md and verification.md. Include likely reviewer objections, severity, and a concrete revision plan.
--- a/.pi/agents/reviewer.md
+++ b/.pi/agents/reviewer.md
@@ -0,0 +1,33 @@
 ---
 name: reviewer
 description: Simulate a tough but constructive AI research peer reviewer.
 thinking: high
 output: review.md
 defaultProgress: true
 ---
 You are Feynman's AI research reviewer.
 Your job is to act like a skeptical but fair peer reviewer for AI/ML systems work.
 Operating rules:
 - Evaluate novelty, clarity, empirical rigor, reproducibility, and likely reviewer pushback.
 - Do not praise vaguely. Every positive claim should be tied to specific evidence.
 - Look for:
  - missing or weak baselines
  - missing ablations
  - evaluation mismatches
  - unclear claims of novelty
  - weak related-work positioning
  - insufficient statistical evidence
  - benchmark leakage or contamination risks
  - under-specified implementation details
  - claims that outrun the experiments
 - Produce reviewer-style output with severity and concrete fixes.
 - Distinguish between fatal issues, strong concerns, and polish issues.
 - Preserve uncertainty. If the draft might pass depending on venue norms, say so explicitly.
 - End with a `Sources` section containing direct URLs for anything additionally inspected during review.
 Default output expectations:
 - Save the main artifact to `review.md`.
 - Optimize for reviewer realism and actionable criticism.
--- a/README.md
+++ b/README.md
@@ -63,6 +63,10 @@ Inside the REPL:
 - `/new` starts a new persisted session
 - `/exit` quits
 - `/lit <topic>` expands the literature-review prompt template
 - `/related <topic>` builds the related-work and justification view
 - `/review <artifact>` simulates a peer review for an AI research artifact
 - `/ablate <artifact>` designs the minimum convincing ablation set
 - `/rebuttal <artifact>` drafts a rebuttal and revision matrix
 - `/replicate <paper or claim>` expands the replication prompt template
 - `/reading <topic>` expands the reading-list prompt template
 - `/memo <topic>` expands the general research memo prompt template
@@ -109,8 +113,10 @@ Feynman also ships bundled research subagents in `.pi/agents/`:
 - `researcher` for evidence gathering
 - `verifier` for claim and source checking
 - `reviewer` for peer-review style criticism
 - `writer` for polished memo and draft writing
 - `deep` chain for gather → verify → synthesize
 - `review` chain for gather → verify → peer review
 - `auto` chain for plan → gather → verify → draft
 Feynman uses `@companion-ai/alpha-hub` directly in-process rather than shelling out to the CLI.
--- a/extensions/research-tools.ts
+++ b/extensions/research-tools.ts
@@ -562,7 +562,16 @@ function buildProjectAgentsTemplate(): string {
 This file is read automatically at startup. It is the durable project memory for Feynman.
 ## Project Overview
- State the research question, target artifact, and key datasets here.
+- State the research question, target artifact, target venue, and key datasets or benchmarks here.
 ## AI Research Context
 - Problem statement:
 - Core hypothesis:
 - Closest prior work:
 - Required baselines:
 - Required ablations:
 - Primary metrics:
 - Datasets / benchmarks:
 ## Ground Rules
 - Do not modify raw data in \`Data/Raw/\` or equivalent raw-data folders.
@@ -575,6 +584,11 @@ This file is read automatically at startup. It is the durable project memory for
 ## Session Logging
 - Use \`/log\` at the end of meaningful sessions to write a durable session note into \`notes/session-logs/\`.
 ## Review Readiness
 - Known reviewer concerns:
 - Missing experiments:
 - Missing writing or framing work:
 `;
 }
@@ -613,9 +627,9 @@ export default function researchTools(pi: ExtensionAPI): void {
 				const recentActivity = getRecentActivitySummary(ctx);
 				const shortcuts = [
 					["/lit", "survey papers on a topic"],
-					["/deepresearch", "run a source-heavy research pass"],
+					["/review", "simulate a peer review"],
 					["/draft", "draft a paper-style writeup"],
-					["/jobs", "inspect active background work"],
+					["/deepresearch", "run a source-heavy research pass"],
 				];
 				const lines: string[] = [];
--- a/prompts/ablate.md
+++ b/prompts/ablate.md
@@ -0,0 +1,17 @@
 ---
 description: Design the smallest convincing ablation set for an AI research project.
 ---
 Design an ablation plan for: $@
 Requirements:
 - Identify the exact claims the paper is making.
 - For each claim, determine what ablation or control is necessary to support it.
 - Prefer the `verifier` subagent when the claim structure is complicated.
 - Distinguish:
  - must-have ablations
  - nice-to-have ablations
  - unnecessary experiments
 - Call out where benchmark norms imply mandatory controls.
 - Optimize for the minimum convincing set, not experiment sprawl.
 - Save the plan to `outputs/` as markdown if the user wants a durable artifact.
 - End with a `Sources` section containing direct URLs for any external sources used.
--- a/prompts/rebuttal.md
+++ b/prompts/rebuttal.md
@@ -0,0 +1,18 @@
 ---
 description: Turn reviewer comments into a structured rebuttal and revision plan for an AI research paper.
 ---
 Prepare a rebuttal workflow for: $@
 Requirements:
 - If reviewer comments are provided, organize them into a response matrix.
 - If reviewer comments are not yet provided, infer the likely strongest objections from the current draft and review them before drafting responses.
 - Prefer the `reviewer` subagent or the project `review` chain when fresh critical review is still needed.
 - For each issue, produce:
  - reviewer concern
  - whether it is valid
  - evidence available now
  - paper changes needed
  - rebuttal language
 - Do not overclaim fixes that have not been implemented.
 - Save the rebuttal matrix to `outputs/` as markdown.
 - End with a `Sources` section containing direct URLs for all inspected external sources.
--- a/prompts/related.md
+++ b/prompts/related.md
@@ -0,0 +1,19 @@
 ---
 description: Build a related-work map and justify why an AI research project needs to exist.
 ---
 Build the related-work and justification view for: $@
 Requirements:
 - Search for the closest and strongest relevant papers first.
 - Prefer the `researcher` subagent when the space is broad or moving quickly.
 - Identify:
  - foundational papers
  - closest prior work
  - strongest recent competing approaches
  - benchmarks and evaluation norms
  - critiques or known weaknesses in the area
 - For each important paper, explain why it matters to this project.
 - Be explicit about what real gap remains after considering the strongest prior work.
 - If the project is not differentiated enough, say so clearly.
 - Save the artifact to `outputs/` as markdown if the user wants a durable result.
 - End with a `Sources` section containing direct URLs.
--- a/prompts/review.md
+++ b/prompts/review.md
@@ -0,0 +1,24 @@
 ---
 description: Simulate an AI research peer review with likely objections, severity, and a concrete revision plan.
 ---
 Review this AI research artifact: $@
 Requirements:
 - Prefer the project `review` chain or the `researcher` + `verifier` + `reviewer` subagents when the artifact is large or the review needs to inspect paper, code, and experiments together.
 - Inspect the strongest relevant sources directly before making strong review claims.
 - If the artifact is a paper or draft, evaluate:
  - novelty and related-work positioning
  - clarity of claims
  - baseline fairness
  - evaluation design
  - missing ablations
  - reproducibility details
  - whether conclusions outrun the evidence
 - If code or experiment artifacts exist, compare them against the claimed method and evaluation.
 - Produce:
  - short verdict
  - likely reviewer objections
  - severity for each issue
  - revision plan in priority order
 - Save the review to `outputs/` as markdown.
 - End with a `Sources` section containing direct URLs for every inspected external source.
--- a/src/feynman-prompt.ts
+++ b/src/feynman-prompt.ts
@@ -16,8 +16,9 @@ Operating rules:
 - Never answer a latest/current question from arXiv or alpha-backed paper search alone.
 - For AI model or product claims, prefer official docs/vendor pages plus recent web sources over old papers.
 - Use the installed Pi research packages for broader web/PDF access, document parsing, citation workflows, background processes, memory, session recall, and delegated subtasks when they reduce friction.
- Feynman ships project subagents for research work. Prefer the \`researcher\`, \`verifier\`, and \`writer\` subagents for larger research tasks, and use the project \`deep\` or \`auto\` chains when a multi-step delegated workflow clearly fits.
+- Feynman ships project subagents for research work. Prefer the \`researcher\`, \`verifier\`, \`reviewer\`, and \`writer\` subagents for larger research tasks, and use the project \`deep\`, \`review\`, or \`auto\` chains when a multi-step delegated workflow clearly fits.
 - Use subagents when decomposition meaningfully reduces context pressure or lets you parallelize evidence gathering. For detached long-running work, prefer background subagent execution with \`clarify: false, async: true\`.
 - For AI research artifacts, default to pressure-testing the work before polishing it. Use review-style workflows to check novelty positioning, evaluation design, baseline fairness, ablations, reproducibility, and likely reviewer objections.
 - Use the visualization packages when a chart, diagram, or interactive widget would materially improve understanding. Prefer charts for quantitative comparisons, Mermaid for simple process/architecture diagrams, and interactive HTML widgets for exploratory visual explanations.
 - Persistent memory is package-backed. Use \`memory_search\` to recall prior preferences and lessons, \`memory_remember\` to store explicit durable facts, and \`memory_lessons\` when prior corrections matter.
 - If the user says "remember", states a stable preference, or asks for something to be the default in future sessions, call \`memory_remember\`. Do not just say you will remember it.
@@ -33,6 +34,7 @@ Operating rules:
 - When citing papers from alpha-backed tools, prefer direct arXiv or alphaXiv links and include the arXiv ID.
 - After writing a polished artifact, use \`preview_file\` when the user wants to review it in a browser or PDF viewer.
 - Default toward delivering a concrete artifact when the task naturally calls for one: reading list, memo, audit, experiment log, or draft.
 - Strong default AI-research artifacts include: related-work map, peer-review simulation, ablation plan, reproducibility audit, and rebuttal matrix.
 - Default artifact locations:
  - outputs/ for reviews, reading lists, and summaries
  - experiments/ for runnable experiment code and result logs
--- a/src/index.ts
+++ b/src/index.ts
@@ -212,6 +212,10 @@ function printHelp(): void {
 	  /new                      Start a fresh persisted session
 	  /exit                     Quit the REPL
 	  /lit <topic>              Expand the literature review prompt template
 	  /related <topic>          Map related work and justify the research gap
 	  /review <artifact>        Simulate a peer review for an AI research artifact
 	  /ablate <artifact>        Design the minimum convincing ablation set
 	  /rebuttal <artifact>      Draft a rebuttal and revision matrix
 	  /replicate <paper>        Expand the replication prompt template
 	  /reading <topic>          Expand the reading list prompt template
 	  /memo <topic>             Expand the general research memo prompt template