Add AI research review workflows

Advait Paliwal
2026-03-22 14:36:47 -07:00
parent dd701e9967
commit dbdad94adc
10 changed files with 163 additions and 4 deletions


@@ -0,0 +1,22 @@
---
name: review
description: Gather evidence, verify claims, and simulate a peer review for an AI research artifact.
---
## researcher
output: research.md
Inspect the target paper, draft, code, cited work, and any linked experimental artifacts for {task}. Gather the strongest primary evidence that matters for a review.
## verifier
reads: research.md
output: verification.md
Audit research.md for unsupported claims, reproducibility gaps, stale or weak evidence, and paper-code mismatches relevant to {task}.
## reviewer
reads: research.md+verification.md
output: review.md
progress: true
Write the final simulated peer review for {task} using research.md and verification.md. Include likely reviewer objections, severity, and a concrete revision plan.
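Below is a minimal sketch of how this three-stage handoff could run, assuming a hypothetical `runAgent` helper; the real chain runner ships with Pi and is not part of this commit. Each stage writes its `output` file, and later stages read the files listed under `reads:`.

```ts
// Hypothetical sketch of the review chain's file handoff.
// `runAgent` is an assumed helper, not a real Pi API.
declare function runAgent(
  name: string,
  opts: { task: string; reads: string[]; output: string }
): Promise<void>;

async function runReviewChain(task: string): Promise<void> {
  // researcher: gathers primary evidence, writes research.md
  await runAgent("researcher", { task, reads: [], output: "research.md" });
  // verifier: audits research.md for unsupported claims and gaps
  await runAgent("verifier", {
    task,
    reads: ["research.md"],
    output: "verification.md",
  });
  // reviewer: synthesizes the final simulated peer review
  await runAgent("reviewer", {
    task,
    reads: ["research.md", "verification.md"],
    output: "review.md",
  });
}
```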

.pi/agents/reviewer.md Normal file

@@ -0,0 +1,33 @@
---
name: reviewer
description: Simulate a tough but constructive AI research peer reviewer.
thinking: high
output: review.md
defaultProgress: true
---
You are Feynman's AI research reviewer.
Your job is to act like a skeptical but fair peer reviewer for AI/ML systems work.
Operating rules:
- Evaluate novelty, clarity, empirical rigor, reproducibility, and likely reviewer pushback.
- Do not praise vaguely. Every positive claim should be tied to specific evidence.
- Look for:
- missing or weak baselines
- missing ablations
- evaluation mismatches
- unclear claims of novelty
- weak related-work positioning
- insufficient statistical evidence
- benchmark leakage or contamination risks
- under-specified implementation details
- claims that outrun the experiments
- Produce reviewer-style output with severity and concrete fixes.
- Distinguish between fatal issues, strong concerns, and polish issues.
- Preserve uncertainty. If the draft might pass depending on venue norms, say so explicitly.
- End with a `Sources` section containing direct URLs for anything additionally inspected during review.
Default output expectations:
- Save the main artifact to `review.md`.
- Optimize for reviewer realism and actionable criticism.


@@ -63,6 +63,10 @@ Inside the REPL:
- `/new` starts a new persisted session
- `/exit` quits
- `/lit <topic>` expands the literature-review prompt template
+- `/related <topic>` builds the related-work and justification view
+- `/review <artifact>` simulates a peer review for an AI research artifact
+- `/ablate <artifact>` designs the minimum convincing ablation set
+- `/rebuttal <artifact>` drafts a rebuttal and revision matrix
- `/replicate <paper or claim>` expands the replication prompt template
- `/reading <topic>` expands the reading-list prompt template
- `/memo <topic>` expands the general research memo prompt template
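Each of these commands expands a template from `prompts/`, substituting the user's argument for the `$@` placeholder used in the prompt files below. A minimal sketch of that expansion, with the helper name assumed for illustration:

```ts
// Hypothetical sketch of slash-command expansion. The `$@` placeholder
// comes from the bundled prompt templates; `expandCommand` is assumed.
import { readFileSync } from "node:fs";

function expandCommand(name: string, args: string): string {
  // e.g. expandCommand("review", "drafts/paper.md") loads prompts/review.md
  // and replaces the "$@" placeholder with "drafts/paper.md"
  return readFileSync(`prompts/${name}.md`, "utf8").replace("$@", args);
}
```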
@@ -109,8 +113,10 @@ Feynman also ships bundled research subagents in `.pi/agents/`:
- `researcher` for evidence gathering
- `verifier` for claim and source checking
+- `reviewer` for peer-review style criticism
- `writer` for polished memo and draft writing
- `deep` chain for gather → verify → synthesize
+- `review` chain for gather → verify → peer review
- `auto` chain for plan → gather → verify → draft
Feynman uses `@companion-ai/alpha-hub` directly in-process rather than shelling out to the CLI.


@@ -562,7 +562,16 @@ function buildProjectAgentsTemplate(): string {
This file is read automatically at startup. It is the durable project memory for Feynman.
## Project Overview
-- State the research question, target artifact, and key datasets here.
+- State the research question, target artifact, target venue, and key datasets or benchmarks here.
+## AI Research Context
+- Problem statement:
+- Core hypothesis:
+- Closest prior work:
+- Required baselines:
+- Required ablations:
+- Primary metrics:
+- Datasets / benchmarks:
## Ground Rules
- Do not modify raw data in \`Data/Raw/\` or equivalent raw-data folders.
@@ -575,6 +584,11 @@ This file is read automatically at startup. It is the durable project memory for
## Session Logging
- Use \`/log\` at the end of meaningful sessions to write a durable session note into \`notes/session-logs/\`.
+## Review Readiness
+- Known reviewer concerns:
+- Missing experiments:
+- Missing writing or framing work:
`;
}
@@ -613,9 +627,9 @@ export default function researchTools(pi: ExtensionAPI): void {
const recentActivity = getRecentActivitySummary(ctx);
const shortcuts = [
["/lit", "survey papers on a topic"],
["/deepresearch", "run a source-heavy research pass"],
["/review", "simulate a peer review"],
["/draft", "draft a paper-style writeup"],
["/jobs", "inspect active background work"],
["/deepresearch", "run a source-heavy research pass"],
];
const lines: string[] = [];

prompts/ablate.md Normal file

@@ -0,0 +1,17 @@
---
description: Design the smallest convincing ablation set for an AI research project.
---
Design an ablation plan for: $@
Requirements:
- Identify the exact claims the paper is making.
- For each claim, determine what ablation or control is necessary to support it.
- Prefer the `verifier` subagent when the claim structure is complicated.
- Distinguish:
- must-have ablations
- nice-to-have ablations
- unnecessary experiments
- Call out where benchmark norms imply mandatory controls.
- Optimize for the minimum convincing set, not experiment sprawl.
- Save the plan to `outputs/` as markdown if the user wants a durable artifact.
- End with a `Sources` section containing direct URLs for any external sources used.

prompts/rebuttal.md Normal file

@@ -0,0 +1,18 @@
---
description: Turn reviewer comments into a structured rebuttal and revision plan for an AI research paper.
---
Prepare a rebuttal workflow for: $@
Requirements:
- If reviewer comments are provided, organize them into a response matrix.
- If reviewer comments are not yet provided, infer the likely strongest objections from the current draft and review them before drafting responses.
- Prefer the `reviewer` subagent or the project `review` chain when fresh critical review is still needed.
- For each issue, produce:
- reviewer concern
- whether it is valid
- evidence available now
- paper changes needed
- rebuttal language
- Do not overclaim fixes that have not been implemented.
- Save the rebuttal matrix to `outputs/` as markdown.
- End with a `Sources` section containing direct URLs for all inspected external sources.
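For illustration only, one row of the response matrix might look like this (all contents hypothetical):

| Reviewer concern | Valid? | Evidence available now | Paper changes needed | Rebuttal language |
| --- | --- | --- | --- | --- |
| No comparison against the strongest recent baseline | Yes | Partial runs on one dataset | Add the baseline to the main results table | "We agree; we have added the baseline and report results in Table 2." |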

prompts/related.md Normal file

@@ -0,0 +1,19 @@
---
description: Build a related-work map and justify why an AI research project needs to exist.
---
Build the related-work and justification view for: $@
Requirements:
- Search for the closest and strongest relevant papers first.
- Prefer the `researcher` subagent when the space is broad or moving quickly.
- Identify:
- foundational papers
- closest prior work
- strongest recent competing approaches
- benchmarks and evaluation norms
- critiques or known weaknesses in the area
- For each important paper, explain why it matters to this project.
- Be explicit about what real gap remains after considering the strongest prior work.
- If the project is not differentiated enough, say so clearly.
- Save the artifact to `outputs/` as markdown if the user wants a durable result.
- End with a `Sources` section containing direct URLs.

prompts/review.md Normal file

@@ -0,0 +1,24 @@
---
description: Simulate an AI research peer review with likely objections, severity, and a concrete revision plan.
---
Review this AI research artifact: $@
Requirements:
- Prefer the project `review` chain or the `researcher` + `verifier` + `reviewer` subagents when the artifact is large or the review needs to inspect paper, code, and experiments together.
- Inspect the strongest relevant sources directly before making strong review claims.
- If the artifact is a paper or draft, evaluate:
- novelty and related-work positioning
- clarity of claims
- baseline fairness
- evaluation design
- missing ablations
- reproducibility details
- whether conclusions outrun the evidence
- If code or experiment artifacts exist, compare them against the claimed method and evaluation.
- Produce:
- short verdict
- likely reviewer objections
- severity for each issue
- revision plan in priority order
- Save the review to `outputs/` as markdown.
- End with a `Sources` section containing direct URLs for every inspected external source.
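A plausible skeleton for the saved review, with section names and contents illustrative rather than mandated by the template:

```markdown
## Verdict
Borderline: the method is promising, but the current evaluation cannot
support the headline claim.

## Likely reviewer objections
1. [Fatal] The main baseline is tuned far less carefully than the method.
2. [Strong concern] No ablation isolates the contribution of the new module.
3. [Polish] Notation in Section 3 is inconsistent with Section 4.

## Revision plan (priority order)
1. Re-tune the baseline and re-run the main comparison.
2. Add the module ablation.
3. Unify notation.

## Sources
- <direct URLs for every inspected external source>
```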


@@ -16,8 +16,9 @@ Operating rules:
- Never answer a latest/current question from arXiv or alpha-backed paper search alone.
- For AI model or product claims, prefer official docs/vendor pages plus recent web sources over old papers.
- Use the installed Pi research packages for broader web/PDF access, document parsing, citation workflows, background processes, memory, session recall, and delegated subtasks when they reduce friction.
-- Feynman ships project subagents for research work. Prefer the \`researcher\`, \`verifier\`, and \`writer\` subagents for larger research tasks, and use the project \`deep\` or \`auto\` chains when a multi-step delegated workflow clearly fits.
+- Feynman ships project subagents for research work. Prefer the \`researcher\`, \`verifier\`, \`reviewer\`, and \`writer\` subagents for larger research tasks, and use the project \`deep\`, \`review\`, or \`auto\` chains when a multi-step delegated workflow clearly fits.
- Use subagents when decomposition meaningfully reduces context pressure or lets you parallelize evidence gathering. For detached long-running work, prefer background subagent execution with \`clarify: false, async: true\`.
+- For AI research artifacts, default to pressure-testing the work before polishing it. Use review-style workflows to check novelty positioning, evaluation design, baseline fairness, ablations, reproducibility, and likely reviewer objections.
- Use the visualization packages when a chart, diagram, or interactive widget would materially improve understanding. Prefer charts for quantitative comparisons, Mermaid for simple process/architecture diagrams, and interactive HTML widgets for exploratory visual explanations.
- Persistent memory is package-backed. Use \`memory_search\` to recall prior preferences and lessons, \`memory_remember\` to store explicit durable facts, and \`memory_lessons\` when prior corrections matter.
- If the user says "remember", states a stable preference, or asks for something to be the default in future sessions, call \`memory_remember\`. Do not just say you will remember it.
@@ -33,6 +34,7 @@ Operating rules:
- When citing papers from alpha-backed tools, prefer direct arXiv or alphaXiv links and include the arXiv ID.
- After writing a polished artifact, use \`preview_file\` when the user wants to review it in a browser or PDF viewer.
- Default toward delivering a concrete artifact when the task naturally calls for one: reading list, memo, audit, experiment log, or draft.
+- Strong default AI-research artifacts include: related-work map, peer-review simulation, ablation plan, reproducibility audit, and rebuttal matrix.
- Default artifact locations:
- outputs/ for reviews, reading lists, and summaries
- experiments/ for runnable experiment code and result logs
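A minimal sketch of how the `clarify: false, async: true` options above might look in a delegated call; only those two flags come from the rules here, while `pi.subagent` and the remaining fields are assumptions for illustration:

```ts
// Hypothetical subagent dispatch. Only the `clarify` and `async` flags come
// from the operating rules above; the method name and other fields are assumed.
declare const pi: {
  subagent(name: string, opts: Record<string, unknown>): Promise<unknown>;
};

async function delegateEvidenceGathering(): Promise<unknown> {
  return pi.subagent("researcher", {
    task: "gather primary evidence for the related-work map",
    clarify: false, // do not pause to ask clarifying questions
    async: true,    // detach and run as background work
  });
}
```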


@@ -212,6 +212,10 @@ function printHelp(): void {
/new Start a fresh persisted session
/exit Quit the REPL
/lit <topic> Expand the literature review prompt template
+/related <topic> Map related work and justify the research gap
+/review <artifact> Simulate a peer review for an AI research artifact
+/ablate <artifact> Design the minimum convincing ablation set
+/rebuttal <artifact> Draft a rebuttal and revision matrix
/replicate <paper> Expand the replication prompt template
/reading <topic> Expand the reading list prompt template
/memo <topic> Expand the general research memo prompt template