Add plan-confirm steps to all workflows, cite alphaXiv and Agent Computer, add visuals to writer
- Every workflow prompt now shows a plan and asks the user to confirm before executing
- /autoresearch asks for execution environment (local, branch, venv, cloud) and confirms before looping
- Writer agent and key prompts now generate charts (pi-charts) and diagrams (Mermaid) when data calls for it
- Cite alphaXiv and Agent Computer in README and website homepage
- Clear terminal screen before launching Pi TUI
- Remove Alpha Hub GitHub link in favor of alphaxiv.org

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -7,6 +7,7 @@ topLevelCli: true
 Audit the paper and codebase for: $@
 
 Requirements:
+- Before starting, outline the audit plan: which paper, which repo, which claims to check. Present the plan to the user and confirm before proceeding.
 - Use the `researcher` subagent for evidence gathering and the `verifier` subagent to verify sources and add inline citations when the audit is non-trivial.
 - Compare claimed methods, defaults, metrics, and data handling against the actual code.
 - Call out missing code, mismatches, ambiguous defaults, and reproduction risks.

@@ -6,19 +6,46 @@ topLevelCli: true
 ---
 
 Start an autoresearch optimization loop for: $@
 
-This command uses pi-autoresearch. Enter autoresearch mode and begin the autonomous experiment loop.
+This command uses pi-autoresearch.
 
-## Behavior
+## Step 1: Gather
 
-- If `autoresearch.md` and `autoresearch.jsonl` already exist in the project, resume the existing session with the user's input as additional context.
-- Otherwise, gather the optimization target from the user:
-  - What to optimize (test speed, bundle size, training loss, build time, etc.)
-  - The benchmark command to run
-  - The metric name, unit, and direction (lower/higher is better)
-  - Files in scope for changes
-- Then initialize the session: create `autoresearch.md`, `autoresearch.sh`, run the baseline, and start looping.
+If `autoresearch.md` and `autoresearch.jsonl` already exist, ask the user if they want to resume or start fresh.
 
-## Loop
+Otherwise, collect the following from the user before doing anything else:
+- What to optimize (test speed, bundle size, training loss, build time, etc.)
+- The benchmark command to run
+- The metric name, unit, and direction (lower/higher is better)
+- Files in scope for changes
+- Maximum number of iterations (default: 20)
+
+## Step 2: Environment
+
+Ask the user where to run:
+- **Local** — run in the current working directory
+- **New git branch** — create a branch so main stays clean
+- **Virtual environment** — create an isolated venv/conda env first
+- **Cloud** — delegate to a remote Agent Computer machine via `/delegate`
+
+Do not proceed without a clear answer.
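For the branch and venv options, the setup step can be sketched as below. This is a minimal illustration, not part of the workflow spec: the scratch directory, branch name `autoresearch/run-001`, and venv path `.venv` are all placeholder names.

```shell
#!/bin/sh
set -e
# Work in a scratch repo so this demo has no side effects.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"  # local identity so the commit works anywhere
git config user.name "demo"
git commit -q --allow-empty -m "baseline"
git checkout -q -b autoresearch/run-001   # "new git branch": keep main clean
python3 -m venv .venv                     # "virtual environment": isolate experiments
. .venv/bin/activate
git branch --show-current                 # prints: autoresearch/run-001
```

In a real session these commands would run in the project working tree rather than a scratch directory.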
+
+## Step 3: Confirm
+
+Present the full plan to the user before starting:
+
+```
+Optimization target: [metric] ([direction])
+Benchmark command: [command]
+Files in scope: [files]
+Environment: [chosen environment]
+Max iterations: [N]
+```
+
+Ask the user to confirm. Do not start the loop without explicit approval.
+
+## Step 4: Run
+
+Initialize the session: create `autoresearch.md`, `autoresearch.sh`, run the baseline, and start looping.
 
 Each iteration: edit → commit → `run_experiment` → `log_experiment` → keep or revert → repeat. Do not stop unless interrupted or `maxIterations` is reached.
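The generated `autoresearch.sh` is project-specific; the sketch below only illustrates the contract such a script must satisfy (run the benchmark once, print the metric as a single number on stdout). The `sleep 1` workload and the wall-clock metric are placeholder assumptions, not the real generated script.

```shell
#!/bin/sh
# Hypothetical autoresearch.sh: run the benchmark once and print the
# metric as one number on stdout so iterations can be compared.
set -e
start=$(date +%s)
sleep 1                     # placeholder for the real benchmark command
end=$(date +%s)
elapsed=$((end - start))
echo "$elapsed"             # metric: wall-clock seconds (lower is better)
```

The loop would call this script each iteration and keep or revert the edit based on whether the printed number improved.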
@@ -7,8 +7,10 @@ topLevelCli: true
 Compare sources for: $@
 
 Requirements:
+- Before starting, outline the comparison plan: which sources to compare, which dimensions to evaluate, expected output structure. Present the plan to the user and confirm before proceeding.
 - Use the `researcher` subagent to gather source material when the comparison set is broad, and the `verifier` subagent to verify sources and add inline citations to the final matrix.
 - Build a comparison matrix covering: source, key claim, evidence type, caveats, confidence.
+- Generate charts with `pi-charts` when the comparison involves quantitative metrics. Use Mermaid for method or architecture comparisons.
 - Distinguish agreement, disagreement, and uncertainty clearly.
 - Save exactly one comparison to `outputs/` as markdown.
 - End with a `Sources` section containing direct URLs for every source used.

@@ -40,6 +40,8 @@ Write the plan to `outputs/.plans/deepresearch-plan.md` as a self-contained arti
 
 Also save the plan with `memory_remember` (type: `fact`, key: `deepresearch.plan`) so it survives context truncation.
 
+Present the plan to the user and ask them to confirm before proceeding. If the user wants changes, revise the plan first.
+
 ## 2. Scale decision
 
 | Query type | Execution |

@@ -107,6 +109,8 @@ Detailed findings organized by theme or question.
 Unresolved issues, disagreements between sources, gaps in evidence.
 ```
 
+When the research includes quantitative data (benchmarks, performance comparisons, trends), generate charts using `pi-charts`. Use Mermaid diagrams for architectures and processes. Every visual must have a caption and reference the underlying data.
+
 Save this draft to a temp file (e.g., `draft.md` in the chain artifacts dir or a temp path).
 
 ## 6. Cite

@@ -7,8 +7,10 @@ topLevelCli: true
 Write a paper-style draft for: $@
 
 Requirements:
+- Before writing, outline the draft structure: proposed title, sections, key claims to make, and source material to draw from. Present the outline to the user and confirm before proceeding.
 - Use the `writer` subagent when the draft should be produced from already-collected notes, then use the `verifier` subagent to add inline citations and verify sources.
 - Include at minimum: title, abstract, problem statement, related work, method or synthesis, evidence or experiments, limitations, conclusion.
 - Use clean Markdown with LaTeX where equations materially help.
+- Generate charts with `pi-charts` for quantitative data, benchmarks, and comparisons. Use Mermaid for architectures and pipelines. Every figure needs a caption.
 - Save exactly one draft to `papers/` as markdown.
 - End with a `Sources` appendix with direct URLs for all primary references.
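To make the Mermaid requirement concrete, a draft figure might look like the sketch below. The pipeline shown is assumed from this workflow's own steps (writer, then verifier); it is illustrative, not the output of any tool.

```mermaid
flowchart LR
    notes[Collected notes] --> writer[writer subagent]
    writer --> draft[Draft in papers/]
    draft --> verifier[verifier subagent]
    verifier --> final[Cited and verified draft]
```

A figure like this would still carry a caption line beneath it, per the requirement above.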
@@ -8,8 +8,9 @@ Investigate the following topic as a literature review: $@
 
 ## Workflow
 
-1. **Gather** — Use the `researcher` subagent when the sweep is wide enough to benefit from delegated paper triage before synthesis. For narrow topics, search directly.
-2. **Synthesize** — Separate consensus, disagreements, and open questions. When useful, propose concrete next experiments or follow-up reading.
-3. **Cite** — Spawn the `verifier` agent to add inline citations and verify every source URL in the draft.
-4. **Verify** — Spawn the `reviewer` agent to check the cited draft for unsupported claims, logical gaps, and single-source critical findings. Fix FATAL issues before delivering. Note MAJOR issues in Open Questions.
-5. **Deliver** — Save exactly one literature review to `outputs/` as markdown. Write a provenance record alongside it as `<filename>.provenance.md` listing: date, sources consulted vs. accepted vs. rejected, verification status, and intermediate research files used.
+1. **Plan** — Outline the scope: key questions, source types to search (papers, web, repos), time period, and expected sections. Present the plan to the user and confirm before proceeding.
+2. **Gather** — Use the `researcher` subagent when the sweep is wide enough to benefit from delegated paper triage before synthesis. For narrow topics, search directly.
+3. **Synthesize** — Separate consensus, disagreements, and open questions. When useful, propose concrete next experiments or follow-up reading. Generate charts with `pi-charts` for quantitative comparisons across papers and Mermaid diagrams for taxonomies or method pipelines.
+4. **Cite** — Spawn the `verifier` agent to add inline citations and verify every source URL in the draft.
+5. **Verify** — Spawn the `reviewer` agent to check the cited draft for unsupported claims, logical gaps, and single-source critical findings. Fix FATAL issues before delivering. Note MAJOR issues in Open Questions.
+6. **Deliver** — Save exactly one literature review to `outputs/` as markdown. Write a provenance record alongside it as `<filename>.provenance.md` listing: date, sources consulted vs. accepted vs. rejected, verification status, and intermediate research files used.

@@ -7,6 +7,7 @@ topLevelCli: true
 Review this AI research artifact: $@
 
 Requirements:
+- Before starting, outline what will be reviewed and the review criteria (novelty, empirical rigor, baselines, reproducibility, etc.). Present the plan to the user and confirm before proceeding.
 - Spawn a `researcher` subagent to gather evidence on the artifact — inspect the paper, code, cited work, and any linked experimental artifacts. Save to `research.md`.
 - Spawn a `reviewer` subagent with `research.md` to produce the final peer review with inline annotations.
 - For small or simple artifacts where evidence gathering is overkill, run the `reviewer` subagent directly instead.

@@ -7,8 +7,8 @@ topLevelCli: true
 Create a research watch for: $@
 
 Requirements:
+- Before starting, outline the watch plan: what to monitor, what signals matter, what counts as a meaningful change, and the check frequency. Present the plan to the user and confirm before proceeding.
 - Start with a baseline sweep of the topic.
-- Summarize what should be monitored, what signals matter, and what counts as a meaningful change.
 - Use `schedule_prompt` to create the recurring or delayed follow-up instead of merely promising to check later.
 - Save exactly one baseline artifact to `outputs/`.
 - End with a `Sources` section containing direct URLs for every source used.