Improve Feynman packaging and research prompts

2026-03-24 09:57:25 -07:00
parent 6ff4dde341
commit 0f62901ab0
17 changed files with 253 additions and 36 deletions
--- a/prompts/audit.md
+++ b/prompts/audit.md
@@ -6,10 +6,12 @@ topLevelCli: true
 ---
 Audit the paper and codebase for: $@

+Derive a short slug from the audit target (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.
+
 Requirements:
-- Before starting, outline the audit plan: which paper, which repo, which claims to check. Present the plan to the user and confirm before proceeding.
+- Before starting, outline the audit plan: which paper, which repo, which claims to check. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user and confirm before proceeding.
 - Use the `researcher` subagent for evidence gathering and the `verifier` subagent to verify sources and add inline citations when the audit is non-trivial.
 - Compare claimed methods, defaults, metrics, and data handling against the actual code.
 - Call out missing code, mismatches, ambiguous defaults, and reproduction risks.
-- Save exactly one audit artifact to `outputs/` as markdown.
+- Save exactly one audit artifact to `outputs/<slug>-audit.md`.
 - End with a `Sources` section containing paper and repository URLs.
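Each prompt states the slug rule only in prose. As a rough illustration of what "lowercase, hyphens, no filler words, ≤5 words" could mean mechanically, here is a minimal TypeScript sketch. The filler-word list, the `deriveSlug` name, and the `"untitled"` fallback are assumptions, not part of the prompts; the prompts leave word choice to the model, which may also reorder or singularize words (the deepresearch prompt's own example is "cloud-sandbox-pricing").

```typescript
// Hypothetical filler-word list: the prompts do not define which words count as filler.
const FILLER_WORDS: ReadonlySet<string> = new Set([
  "a", "an", "the", "and", "or", "of", "for", "to", "in", "on",
  "with", "about", "how", "what", "is", "are",
]);

const MAX_SLUG_WORDS = 5;

// Derive a short slug: lowercase, hyphen-separated, filler words dropped, at most 5 words.
function deriveSlug(topic: string): string {
  const words = topic
    .toLowerCase()
    .normalize("NFKD")               // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks so "é" becomes "e"
    .split(/[^a-z0-9]+/)             // any run of non-alphanumerics separates words
    .filter((w) => w.length > 0 && !FILLER_WORDS.has(w));
  // Hypothetical fallback for topics made entirely of filler words.
  return words.length > 0 ? words.slice(0, MAX_SLUG_WORDS).join("-") : "untitled";
}
```

With this sketch, `deriveSlug("The pricing of cloud sandboxes for AI agents in 2026")` returns `pricing-cloud-sandboxes-ai-agents`: filler words are removed before the five-word cap is applied, so the cap counts only content words.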
--- a/prompts/compare.md
+++ b/prompts/compare.md
@@ -6,11 +6,13 @@ topLevelCli: true
 ---
 Compare sources for: $@

+Derive a short slug from the comparison topic (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.
+
 Requirements:
-- Before starting, outline the comparison plan: which sources to compare, which dimensions to evaluate, expected output structure. Present the plan to the user and confirm before proceeding.
+- Before starting, outline the comparison plan: which sources to compare, which dimensions to evaluate, expected output structure. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user and confirm before proceeding.
 - Use the `researcher` subagent to gather source material when the comparison set is broad, and the `verifier` subagent to verify sources and add inline citations to the final matrix.
 - Build a comparison matrix covering: source, key claim, evidence type, caveats, confidence.
 - Generate charts with `pi-charts` when the comparison involves quantitative metrics. Use Mermaid for method or architecture comparisons.
 - Distinguish agreement, disagreement, and uncertainty clearly.
-- Save exactly one comparison to `outputs/` as markdown.
+- Save exactly one comparison to `outputs/<slug>-comparison.md`.
 - End with a `Sources` section containing direct URLs for every source used.
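The compare prompt fixes the matrix columns (source, key claim, evidence type, caveats, confidence) but not their shape. A minimal TypeScript sketch of one possible row type and a Markdown renderer follows; the type names and the three-level confidence scale are assumptions for illustration only.

```typescript
// Hypothetical confidence scale: the prompt names a "confidence" column but not its values.
type Confidence = "high" | "medium" | "low";

// Field names mirror the matrix columns listed in prompts/compare.md.
interface ComparisonRow {
  source: string;
  keyClaim: string;
  evidenceType: string;
  caveats: string;
  confidence: Confidence;
}

// Render rows as a Markdown table; pipes inside cells are escaped so they do not split columns.
function renderMatrix(rows: ComparisonRow[]): string {
  const esc = (cell: string): string => cell.replace(/\|/g, "\\|");
  const header = "| Source | Key claim | Evidence type | Caveats | Confidence |";
  const divider = "| --- | --- | --- | --- | --- |";
  const body = rows.map((r) => {
    const cells = [r.source, r.keyClaim, r.evidenceType, r.caveats, r.confidence];
    return `| ${cells.map(esc).join(" | ")} |`;
  });
  return [header, divider, ...body].join("\n");
}
```

Keeping confidence as a closed union rather than free text makes it harder for a matrix row to drift into vague labels, which fits the prompt's requirement to separate agreement, disagreement, and uncertainty clearly.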
--- a/prompts/deepresearch.md
+++ b/prompts/deepresearch.md
@@ -17,7 +17,7 @@ Analyze the research question using extended thinking. Develop a research strate
 - Source types and time periods that matter
 - Acceptance criteria: what evidence would make the answer "sufficient"

-Write the plan to `outputs/.plans/deepresearch-plan.md` as a self-contained artifact:
+Derive a short slug from the topic (lowercase, hyphens, no filler words, ≤5 words — e.g. "cloud-sandbox-pricing" not "deepresearch-plan"). Write the plan to `outputs/.plans/<slug>.md` as a self-contained artifact. Use this same slug for all artifacts in this run.

 ```markdown
 # Research Plan: [topic]
@@ -38,7 +38,7 @@ Write the plan to `outputs/.plans/deepresearch-plan.md` as a self-contained arti
 (Updated as the workflow progresses)
 ```

-Also save the plan with `memory_remember` (type: `fact`, key: `deepresearch.plan`) so it survives context truncation.
+Also save the plan with `memory_remember` (type: `fact`, key: `deepresearch.<slug>.plan`) so it survives context truncation.

 Present the plan to the user and ask them to confirm before proceeding. If the user wants changes, revise the plan first.

@@ -66,8 +66,8 @@ Assign each researcher a clearly disjoint dimension — different source types,
 ```
 {
  tasks: [
-    { agent: "researcher", task: "...", output: "research-web.md" },
-    { agent: "researcher", task: "...", output: "research-papers.md" }
+    { agent: "researcher", task: "...", output: "<slug>-research-web.md" },
+    { agent: "researcher", task: "...", output: "<slug>-research-papers.md" }
  ],
  concurrency: 4,
  failFast: false
@@ -86,7 +86,7 @@ After researchers return, read their output files and critically assess:

 If gaps are significant, spawn another targeted batch of researchers. No fixed cap on rounds — iterate until evidence is sufficient or sources are exhausted.

-Update the plan artifact (`outputs/.plans/deepresearch-plan.md`) decision log after each round.
+Update the plan artifact (`outputs/.plans/<slug>.md`) decision log after each round.

 Most topics need 1-2 rounds. Stop when additional rounds would not materially change conclusions.

@@ -111,14 +111,14 @@ Unresolved issues, disagreements between sources, gaps in evidence.

 When the research includes quantitative data (benchmarks, performance comparisons, trends), generate charts using `pi-charts`. Use Mermaid diagrams for architectures and processes. Every visual must have a caption and reference the underlying data.

-Save this draft to a temp file (e.g., `draft.md` in the chain artifacts dir or a temp path).
+Save this draft to `outputs/.drafts/<slug>-draft.md`.

 ## 6. Cite

 Spawn the `verifier` agent to post-process YOUR draft. The verifier agent adds inline citations, verifies every source URL, and produces the final output:

 ```
-{ agent: "verifier", task: "Add inline citations to draft.md using the research files as source material. Verify every URL.", output: "brief.md" }
+{ agent: "verifier", task: "Add inline citations to <slug>-draft.md using the research files as source material. Verify every URL.", output: "<slug>-brief.md" }
 ```

 The verifier agent does not rewrite the report — it only anchors claims to sources and builds the numbered Sources section.
@@ -132,7 +132,7 @@ Spawn the `reviewer` agent against the cited draft. The reviewer checks for:
 - Overstated confidence relative to evidence quality

 ```
-{ agent: "reviewer", task: "Verify brief.md — flag any claims that lack sufficient source backing, identify logical gaps, and check that confidence levels match evidence strength. This is a verification pass, not a peer review.", output: "verification.md" }
+{ agent: "reviewer", task: "Verify <slug>-brief.md — flag any claims that lack sufficient source backing, identify logical gaps, and check that confidence levels match evidence strength. This is a verification pass, not a peer review.", output: "<slug>-verification.md" }
 ```

 If the reviewer flags FATAL issues, fix them in the brief before delivering. MAJOR issues get noted in the Open Questions section. MINOR issues are accepted.
@@ -143,9 +143,9 @@ Copy the final cited and verified output to the appropriate folder:
 - Paper-style drafts → `papers/`
 - Everything else → `outputs/`

-Use a descriptive filename based on the topic.
+Save the final output as `<slug>.md` (in `outputs/` or `papers/` per the rule above).

-Write a provenance record alongside the main artifact as `<filename>.provenance.md`:
+Write a provenance record alongside it as `<slug>.provenance.md`:

 ```markdown
 # Provenance: [topic]
@@ -156,8 +156,8 @@ Write a provenance record alongside the main artifact as `<filename>.provenance.
 - **Sources accepted:** [sources that survived citation verification]
 - **Sources rejected:** [dead links, unverifiable, or removed]
 - **Verification:** [PASS / PASS WITH NOTES — summary of reviewer findings]
-- **Plan:** outputs/.plans/deepresearch-plan.md
-- **Research files:** [list of intermediate research-*.md files]
+- **Plan:** outputs/.plans/<slug>.md
+- **Research files:** [list of intermediate <slug>-research-*.md files]
 ```

 ## Background execution
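After this change every deepresearch artifact is keyed on the same slug. The TypeScript sketch below gathers the resulting names in one place. The interface and function names are hypothetical; the path patterns are taken from the prompt. Subagent outputs (research files, brief, verification) are bare filenames because the prompt does not say which directory the subagent tool writes them to.

```typescript
// Hypothetical container for the names prompts/deepresearch.md derives from one slug.
interface DeepResearchArtifacts {
  plan: string;
  memoryKey: string;
  researchFiles: string[];
  draft: string;
  brief: string;
  verification: string;
  final: string;
  provenance: string;
}

function deepResearchArtifacts(
  slug: string,
  researchDimensions: string[], // one entry per researcher, e.g. ["web", "papers"]
  isPaperStyle: boolean,        // paper-style drafts go to papers/, everything else to outputs/
): DeepResearchArtifacts {
  const finalDir = isPaperStyle ? "papers" : "outputs";
  return {
    plan: `outputs/.plans/${slug}.md`,
    memoryKey: `deepresearch.${slug}.plan`,
    researchFiles: researchDimensions.map((d) => `${slug}-research-${d}.md`),
    draft: `outputs/.drafts/${slug}-draft.md`,
    brief: `${slug}-brief.md`,
    verification: `${slug}-verification.md`,
    final: `${finalDir}/${slug}.md`,
    provenance: `${finalDir}/${slug}.provenance.md`, // written alongside the final artifact
  };
}
```

Because the memory key now embeds the slug, two deepresearch runs in the same workspace no longer overwrite each other's `deepresearch.plan` entry, which the old fixed key would have done.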
--- a/prompts/delegate.md
+++ b/prompts/delegate.md
@@ -17,5 +17,5 @@ Delegate the following task to a remote Agent Computer machine: $@
   - What artifact to produce when done (summary file)
   - Any tools or data sources to use
 6. **Monitor** — Use `computer agent watch <machine> --session <session_id>` to stream progress. Report status to the user at meaningful milestones.
-7. **Retrieve results** — When the remote agent finishes, pull the summary back with `computer agent prompt <machine> "cat /workspace/outputs/summary.md" --session <session_id>`. Present results to the user.
+7. **Retrieve results** — When the remote agent finishes, pull the results back with `computer agent prompt <machine> "cat /workspace/outputs/<slug>.md" --session <session_id>` (derive the slug from the task topic). Present results to the user.
 8. **Clean up** — Close the session with `computer agent close <machine> --session <session_id>` unless the user wants to continue.
--- a/prompts/draft.md
+++ b/prompts/draft.md
@@ -6,11 +6,13 @@ topLevelCli: true
 ---
 Write a paper-style draft for: $@

+Derive a short slug from the topic (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.
+
 Requirements:
-- Before writing, outline the draft structure: proposed title, sections, key claims to make, and source material to draw from. Present the outline to the user and confirm before proceeding.
+- Before writing, outline the draft structure: proposed title, sections, key claims to make, and source material to draw from. Write the outline to `outputs/.plans/<slug>.md`. Present the outline to the user and confirm before proceeding.
 - Use the `writer` subagent when the draft should be produced from already-collected notes, then use the `verifier` subagent to add inline citations and verify sources.
 - Include at minimum: title, abstract, problem statement, related work, method or synthesis, evidence or experiments, limitations, conclusion.
 - Use clean Markdown with LaTeX where equations materially help.
 - Generate charts with `pi-charts` for quantitative data, benchmarks, and comparisons. Use Mermaid for architectures and pipelines. Every figure needs a caption.
-- Save exactly one draft to `papers/` as markdown.
+- Save exactly one draft to `papers/<slug>.md`.
 - End with a `Sources` appendix with direct URLs for all primary references.
--- a/prompts/lit.md
+++ b/prompts/lit.md
@@ -6,11 +6,13 @@ topLevelCli: true
 ---
 Investigate the following topic as a literature review: $@

+Derive a short slug from the topic (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.
+
 ## Workflow

-1. **Plan** — Outline the scope: key questions, source types to search (papers, web, repos), time period, and expected sections. Present the plan to the user and confirm before proceeding.
-2. **Gather** — Use the `researcher` subagent when the sweep is wide enough to benefit from delegated paper triage before synthesis. For narrow topics, search directly.
-2. **Synthesize** — Separate consensus, disagreements, and open questions. When useful, propose concrete next experiments or follow-up reading. Generate charts with `pi-charts` for quantitative comparisons across papers and Mermaid diagrams for taxonomies or method pipelines.
+1. **Plan** — Outline the scope: key questions, source types to search (papers, web, repos), time period, and expected sections. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user and confirm before proceeding.
+2. **Gather** — Use the `researcher` subagent when the sweep is wide enough to benefit from delegated paper triage before synthesis. For narrow topics, search directly. Researcher outputs go to `<slug>-research-*.md`.
+3. **Synthesize** — Separate consensus, disagreements, and open questions. When useful, propose concrete next experiments or follow-up reading. Generate charts with `pi-charts` for quantitative comparisons across papers and Mermaid diagrams for taxonomies or method pipelines.
 4. **Cite** — Spawn the `verifier` agent to add inline citations and verify every source URL in the draft.
 5. **Verify** — Spawn the `reviewer` agent to check the cited draft for unsupported claims, logical gaps, and single-source critical findings. Fix FATAL issues before delivering. Note MAJOR issues in Open Questions.
-6. **Deliver** — Save exactly one literature review to `outputs/` as markdown. Write a provenance record alongside it as `<filename>.provenance.md` listing: date, sources consulted vs. accepted vs. rejected, verification status, and intermediate research files used.
+6. **Deliver** — Save the final literature review to `outputs/<slug>.md`. Write a provenance record alongside it as `outputs/<slug>.provenance.md` listing: date, sources consulted vs. accepted vs. rejected, verification status, and intermediate research files used.
--- a/prompts/review.md
+++ b/prompts/review.md
@@ -6,10 +6,12 @@ topLevelCli: true
 ---
 Review this AI research artifact: $@

+Derive a short slug from the artifact name (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.
+
 Requirements:
 - Before starting, outline what will be reviewed and the review criteria (novelty, empirical rigor, baselines, reproducibility, etc.). Present the plan to the user and confirm before proceeding.
-- Spawn a `researcher` subagent to gather evidence on the artifact — inspect the paper, code, cited work, and any linked experimental artifacts. Save to `research.md`.
-- Spawn a `reviewer` subagent with `research.md` to produce the final peer review with inline annotations.
+- Spawn a `researcher` subagent to gather evidence on the artifact — inspect the paper, code, cited work, and any linked experimental artifacts. Save to `<slug>-research.md`.
+- Spawn a `reviewer` subagent with `<slug>-research.md` to produce the final peer review with inline annotations.
 - For small or simple artifacts where evidence gathering is overkill, run the `reviewer` subagent directly instead.
-- Save exactly one review artifact to `outputs/` as markdown.
+- Save exactly one review artifact to `outputs/<slug>-review.md`.
 - End with a `Sources` section containing direct URLs for every inspected external source.
--- a/prompts/watch.md
+++ b/prompts/watch.md
@@ -6,9 +6,11 @@ topLevelCli: true
 ---
 Create a research watch for: $@

+Derive a short slug from the watch topic (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.
+
 Requirements:
-- Before starting, outline the watch plan: what to monitor, what signals matter, what counts as a meaningful change, and the check frequency. Present the plan to the user and confirm before proceeding.
+- Before starting, outline the watch plan: what to monitor, what signals matter, what counts as a meaningful change, and the check frequency. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user and confirm before proceeding.
 - Start with a baseline sweep of the topic.
 - Use `schedule_prompt` to create the recurring or delayed follow-up instead of merely promising to check later.
-- Save exactly one baseline artifact to `outputs/`.
+- Save exactly one baseline artifact to `outputs/<slug>-baseline.md`.
 - End with a `Sources` section containing direct URLs for every source used.
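Across the single-artifact prompts, the commit settles on one final location per command. A compact TypeScript summary follows, with hypothetical names; the paths are read off the added lines above. In this diff, `review` is the only one of these commands whose planning step is not told to write `outputs/.plans/<slug>.md`, and the sketch reflects that rather than papering over it.

```typescript
type Command = "audit" | "compare" | "draft" | "lit" | "review" | "watch";

// Final artifact location per command, as set by the added lines in this commit.
const FINAL_ARTIFACT: Record<Command, (slug: string) => string> = {
  audit: (s) => `outputs/${s}-audit.md`,
  compare: (s) => `outputs/${s}-comparison.md`,
  draft: (s) => `papers/${s}.md`,
  lit: (s) => `outputs/${s}.md`,
  review: (s) => `outputs/${s}-review.md`,
  watch: (s) => `outputs/${s}-baseline.md`,
};

// Commands whose prompts now write the plan to outputs/.plans/<slug>.md.
const WRITES_PLAN: ReadonlySet<Command> = new Set<Command>(["audit", "compare", "draft", "lit", "watch"]);

function planPath(cmd: Command, slug: string): string | null {
  return WRITES_PLAN.has(cmd) ? `outputs/.plans/${slug}.md` : null;
}
```

Using a `Record` keyed on the `Command` union means the compiler rejects the map if a command is added to the union without a final-artifact entry.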