Polish Feynman harness and stabilize Pi web runtime

2026-03-22 20:20:26 -07:00
parent 7f0def3a4c
commit 46810f97b7
47 changed files with 3178 additions and 869 deletions
--- a/prompts/ablate.md
+++ b/prompts/ablate.md
@@ -13,5 +13,5 @@ Requirements:
  - unnecessary experiments
 - Call out where benchmark norms imply mandatory controls.
 - Optimize for the minimum convincing set, not experiment sprawl.
- Save the plan to `outputs/` as markdown if the user wants a durable artifact.
+- If the user wants a durable artifact, save exactly one plan to `outputs/` as markdown.
 - End with a `Sources` section containing direct URLs for any external sources used.
--- a/prompts/audit.md
+++ b/prompts/audit.md
@@ -11,4 +11,4 @@ Requirements:
 - Compare claimed methods, defaults, metrics, and data handling against the repository.
 - Call out missing code, mismatches, ambiguous defaults, and reproduction risks.
 - End with a `Sources` section containing paper and repository URLs.
- Save the audit to `outputs/` as markdown.
+- Save exactly one audit artifact to `outputs/` as markdown.
--- a/prompts/autoresearch.md
+++ b/prompts/autoresearch.md
@@ -13,6 +13,7 @@ Requirements:
 - Build a compact evidence table before committing to a paper narrative.
 - If experiments are feasible in the current environment, design and run the smallest experiment that materially reduces uncertainty.
 - If experiments are not feasible, produce a paper-style draft that is explicit about missing validation and limitations.
- Save intermediate planning or synthesis artifacts to `notes/` or `outputs/`.
- Save the final paper-style draft to `papers/`.
+- Produce one final durable markdown artifact for the user-facing result.
+- If the result is a paper-style draft, save it to `papers/`; otherwise save it to `outputs/`.
+- Do not create extra user-facing intermediate markdown files unless the user explicitly asks for them.
 - End with a `Sources` section containing direct URLs for every source used.
--- a/prompts/compare.md
+++ b/prompts/compare.md
@@ -17,4 +17,4 @@ Requirements:
  - confidence
 - Distinguish agreement, disagreement, and uncertainty clearly.
 - End with a `Sources` section containing direct URLs for every source used.
- Save the comparison to `outputs/` as markdown if the user wants a durable artifact.
+- If the user wants a durable artifact, save exactly one comparison to `outputs/` as markdown.
--- a/prompts/deepresearch.md
+++ b/prompts/deepresearch.md
@@ -4,12 +4,31 @@ description: Run a thorough, source-heavy investigation on a topic and produce a
 Run a deep research workflow for: $@

 Requirements:
- If the task is broad, multi-source, or obviously long-running, prefer delegating through the `subagent` tool. Use the project `researcher`, `verifier`, and `writer` agents, or the project `deep` chain when that decomposition fits.
+- Treat `/deepresearch` as one coherent Feynman workflow from the user's perspective. Do not expose internal orchestration primitives unless the user explicitly asks.
+- Start as the lead researcher. First make a compact plan: what must be answered, what evidence types are needed, and which sub-questions are worth splitting out.
+- Stay single-agent by default for narrow topics. Only use `subagent` when the task is broad enough that separate context windows materially improve breadth or speed.
+- If you use subagents, launch them as one worker batch around clearly disjoint sub-questions. Wait for the batch to finish, synthesize the results, and only then decide whether a second batch is needed.
+- Prefer breadth-first worker batches for deep research: different market segments, different source types, different time periods, different technical angles, or different competing explanations.
+- Use `researcher` workers for evidence gathering, `verifier` workers for adversarial claim-checking, and `writer` only if you already have solid evidence and need help polishing the final artifact.
+- Do not make the workflow chain-shaped by default. Hidden worker batches are optional implementation details, not the user-facing model.
 - If the user wants it to run unattended, or the sweep will clearly take a while, prefer background execution with `subagent` using `clarify: false, async: true`, then report how to inspect status.
 - If the topic is current, product-oriented, market-facing, regulatory, or asks about latest developments, start with `web_search` and `fetch_content`.
 - If the topic has an academic literature component, use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for the strongest papers.
 - Do not rely on a single source type when the topic spans both current reality and academic background.
 - Build a compact evidence table before synthesizing conclusions.
+- After synthesis, run a final verification/citation pass. For the strongest claims, independently confirm support and remove anything unsupported, fabricated, or stale.
 - Distinguish clearly between established facts, plausible inferences, disagreements, and unresolved questions.
- Produce a durable markdown artifact in `outputs/`.
+- Produce exactly one durable markdown artifact in `outputs/`.
+- The final artifact should read like one deep research memo, not like stitched-together worker transcripts.
+- Do not leave extra user-facing intermediate markdown files behind unless the user explicitly asks for them.
 - End with a `Sources` section containing direct URLs for every source used.
+
+Default execution shape:
+1. Clarify the actual research objective if needed.
+2. Make a short plan and identify the key sub-questions.
+3. Decide single-agent versus worker-batch execution.
+4. Gather evidence across the needed source types.
+5. Synthesize findings and identify remaining gaps.
+6. If needed, run one more worker batch for unresolved gaps.
+7. Perform a verification/citation pass.
+8. Write the final brief with a strict `Sources` section.
--- a/prompts/draft.md
+++ b/prompts/draft.md
@@ -18,4 +18,4 @@ Requirements:
  - conclusion
 - If citations are available, include citation placeholders or references clearly enough to convert later.
 - Add a `Sources` appendix with direct URLs for all primary references used while drafting.
- Save the draft to `papers/` as markdown.
+- Save exactly one draft to `papers/` as markdown.
--- a/prompts/lit.md
+++ b/prompts/lit.md
@@ -13,4 +13,4 @@ Requirements:
 - Separate consensus, disagreements, and open questions.
 - When useful, propose concrete next experiments or follow-up reading.
 - End with a `Sources` section containing direct URLs for every paper or source used.
- If the user wants an artifact, write the review to disk as markdown.
+- If the user wants an artifact, write exactly one review to disk as markdown.
--- a/prompts/memo.md
+++ b/prompts/memo.md
@@ -11,4 +11,4 @@ Requirements:
 - Read or inspect the top sources directly before making strong claims.
 - Distinguish facts, interpretations, and open questions.
 - End with a `Sources` section containing direct URLs for every source used.
- Save the memo to `outputs/` as markdown if the user wants a durable artifact.
+- If the user wants a durable artifact, save exactly one memo to `outputs/` as markdown.
--- a/prompts/reading.md
+++ b/prompts/reading.md
@@ -12,4 +12,4 @@ Requirements:
 - Group papers by role when useful: foundational, strongest recent work, methods, benchmarks, critiques, replication targets.
 - For each paper, explain why it is on the list.
 - Include direct URLs for each recommended source.
- Save the final reading list to `outputs/` as markdown.
+- Save exactly one final reading list to `outputs/` as markdown.
--- a/prompts/rebuttal.md
+++ b/prompts/rebuttal.md
@@ -14,5 +14,5 @@ Requirements:
  - paper changes needed
  - rebuttal language
 - Do not overclaim fixes that have not been implemented.
- Save the rebuttal matrix to `outputs/` as markdown.
+- Save exactly one rebuttal matrix to `outputs/` as markdown.
 - End with a `Sources` section containing direct URLs for all inspected external sources.
--- a/prompts/related.md
+++ b/prompts/related.md
@@ -15,5 +15,5 @@ Requirements:
 - For each important paper, explain why it matters to this project.
 - Be explicit about what real gap remains after considering the strongest prior work.
 - If the project is not differentiated enough, say so clearly.
- Save the artifact to `outputs/` as markdown if the user wants a durable result.
+- If the user wants a durable result, save exactly one artifact to `outputs/` as markdown.
 - End with a `Sources` section containing direct URLs.
--- a/prompts/review.md
+++ b/prompts/review.md
@@ -20,5 +20,5 @@ Requirements:
  - likely reviewer objections
  - severity for each issue
  - revision plan in priority order
- Save the review to `outputs/` as markdown.
+- Save exactly one review artifact to `outputs/` as markdown.
 - End with a `Sources` section containing direct URLs for every inspected external source.
--- a/prompts/watch.md
+++ b/prompts/watch.md
@@ -10,5 +10,5 @@ Requirements:
 - Summarize what should be monitored, what signals matter, and what counts as a meaningful change.
 - Use `schedule_prompt` to create the recurring or delayed follow-up instead of merely promising to check later.
 - If the user wants detached execution for the initial sweep, use `subagent` in background mode and report how to inspect status.
- Save a durable baseline artifact to `outputs/`.
+- Save exactly one durable baseline artifact to `outputs/`.
 - End with a `Sources` section containing direct URLs for every source used.
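
Reviewer note: the "Default execution shape" added to `prompts/deepresearch.md` can be read as a small control loop. The sketch below is a hypothetical illustration only, not code from this commit. `Finding`, `run_workflow`, `batch_threshold`, and the injected `gather`, `verify`, and `find_gaps` callables are all assumed names that stand in for the `researcher` and `verifier` workers. It shows the single-agent versus worker-batch decision, at most one follow-up batch for gaps, a verification pass, and a single memo with one `Sources` section.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    """One claim plus the URL that supports it (hypothetical shape)."""
    claim: str
    source_url: str
    verified: bool = False


def run_workflow(
    sub_questions: List[str],
    gather: Callable[[str], List[Finding]],
    verify: Callable[[Finding], bool],
    find_gaps: Callable[[List[Finding]], List[str]],
    batch_threshold: int = 3,  # assumed cutoff; the prompt leaves this to judgment
) -> Dict[str, object]:
    # Step 3: stay single-agent for narrow topics, batch only when broad.
    mode = "worker-batch" if len(sub_questions) >= batch_threshold else "single-agent"

    # Step 4: gather evidence once per disjoint sub-question.
    findings = [f for q in sub_questions for f in gather(q)]
    batches = 1

    # Steps 5-6: after synthesis, run at most one more batch for open gaps.
    gaps = find_gaps(findings)
    if gaps:
        findings += [f for q in gaps for f in gather(q)]
        batches += 1

    # Step 7: verification pass; unsupported claims are dropped.
    kept = [f for f in findings if verify(f)]
    for f in kept:
        f.verified = True

    # Step 8: exactly one memo, ending with a single Sources section.
    urls = list(dict.fromkeys(f.source_url for f in kept))  # de-duplicate, keep order
    lines = ["# Research memo", ""]
    lines += [f"- {f.claim}" for f in kept]
    lines += ["", "## Sources"]
    lines += [f"- {u}" for u in urls]
    return {"mode": mode, "batches": batches, "memo": "\n".join(lines)}
```

In this sketch the follow-up batch is capped at one, which matches step 6 of the list. The prompt text itself only says a second batch is decided after synthesis, so a real harness might choose differently.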