fix: unblock unattended research workflows

2026-04-12 13:15:45 -07:00
parent aa96b5ee14
commit 4f6574f233
13 changed files with 117 additions and 12 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,6 +15,42 @@ Use this file to track chronology, not release notes. Keep entries short, factua
 - Blockers: ...
 - Next: ...

+### 2026-04-12 00:00 local — capital-france
+
+- Objective: Run an unattended deep-research workflow for the question "What is the capital of France?"
+- Changed: Created plan artifact at `outputs/.plans/capital-france.md`; scoped the workflow as a narrow fact-verification run with direct lead-agent evidence gathering instead of researcher subagents.
+- Verified: Read existing `CHANGELOG.md` and recalled prior saved plan memory for `capital-france` before finalizing the new run plan.
+- Failed / learned: None yet.
+- Blockers: Need at least two current independent authoritative sources and a quick ambiguity check before drafting.
+- Next: Collect current official/public sources, resolve any legal nuance, then draft and verify the brief.
+
+### 2026-04-12 00:20 local — capital-france
+
+- Objective: Complete evidence gathering and ambiguity check for the capital-of-France workflow.
+- Changed: Wrote `notes/capital-france-research-web.md` and `notes/capital-france-legal-context.md`; identified Insee (2024) and a Sénat report as the two main corroborating sources.
+- Verified: Cross-read current public French sources that explicitly describe Paris as the capital/capital city of France; found no current contradiction.
+- Failed / learned: The Presidency homepage was useful contextual support but not explicit enough to carry the core claim alone.
+- Blockers: Need citation pass and final review pass before promotion.
+- Next: Draft the brief, then run verifier and reviewer passes.
+
+### 2026-04-12 00:35 local — capital-france
+
+- Objective: Move from gathered evidence to a citable draft.
+- Changed: Wrote `outputs/.drafts/capital-france-draft.md` and updated the plan ledger to mark drafting complete.
+- Verified: Kept the core claim narrowly scoped to what the Insee and Sénat sources explicitly support; treated the Élysée page as contextual only.
+- Failed / learned: None.
+- Blockers: Need verifier URL/citation pass and reviewer verification pass before final promotion.
+- Next: Run verifier on the draft, then review and promote the final brief.
+
+### 2026-04-12 10:05 local — capital-france
+
+- Objective: Run the citation-verification pass on the capital-of-France draft and promote a final cited brief.
+- Changed: Verified the three draft source URLs were live (HTTP 200 at check time), added numbered inline citations, downgraded unsupported phrasing around the Élysée/context and broad ambiguity claims, and wrote `outputs/capital-france-brief.md`.
+- Verified: Confirmed Insee explicitly says Paris is the capital of France; confirmed the Sénat report describes Paris’s capital status and the presence of national institutions; confirmed the Élysée homepage is contextual only and not explicit enough to carry the core claim.
+- Failed / learned: The draft wording about the Presidency being seated in Paris was not directly supported by the cited homepage, so it was removed rather than carried forward.
+- Blockers: Reviewer pass still pending if the workflow requires an adversarial final check.
+- Next: If needed, run a final reviewer pass; otherwise use `outputs/capital-france-brief.md` as the canonical brief.
+
 ### 2026-03-25 00:00 local — scaling-laws

 - Objective: Set up a deep research workflow for scaling laws.
@@ -186,3 +222,21 @@ Use this file to track chronology, not release notes. Keep entries short, factua
 - Failed / learned: The website build still emits duplicate-id warnings for a handful of docs pages, but it completes successfully; those warnings predate this pass and were not introduced by the model-command edits.
 - Blockers: The Bedrock path is verified with the current shell's AWS credential chain, not with a fresh machine lacking AWS config; broader upstream Pi behavior around IMDS/default-profile autodiscovery without the sentinel is still outside this repo.
 - Next: Commit and push the combined Pi/model/docs maintenance branch, then decide whether to tackle the deeper search/deepresearch hang issues separately or leave them for focused repro work.
+
+### 2026-04-12 13:35 PDT — workflow-unattended-and-search-curator-fix-pass
+
+- Objective: Fix the remaining workflow deadlocks instead of leaving `deepresearch` and terminal web search half-functional after the maintenance push.
+- Changed: Updated the built-in research workflow prompts (`deepresearch`, `lit`, `review`, `audit`, `compare`, `draft`, `watch`) so they present the plan and continue automatically rather than blocking for approval; extended the `pi-web-access` runtime patch so Feynman rewrites its default workflow from browser-based `summary-review` to `none`; added explicit `workflow: "none"` persistence in `src/search/commands.ts` and `src/pi/web-access.ts`, plus surfaced the workflow in doctor/status-style output.
+- Verified: Reproduced the original `deepresearch` failure mode in print mode, where the run created `outputs/.plans/capital-france.md` and then stopped waiting for user confirmation; after the prompt changes, reran `deepresearch "What is the capital of France?"` and confirmed it progressed beyond planning and produced `outputs/.drafts/capital-france-draft.md`; inspected `pi-web-access@0.10.6` and confirmed the exact `waiting for summary approval...` string and `summary-review` default live in that package; added regression tests for the new `pi-web-access` patch and workflow-none status handling; reran `npm test`, `npm run typecheck`, and `npm run build`; smoke-tested `feynman search set exa exa_test_key` under a throwaway `FEYNMAN_HOME` and confirmed it writes `"workflow": "none"` to `web-search.json`.
+- Failed / learned: The long-running deepresearch session still spends substantial time in later reasoning/writing steps for even a narrow query, but the plan-confirmation deadlock itself is resolved; the remaining slowness is model/workflow behavior, not the original stop-after-plan bug.
+- Blockers: I did not install and execute the full optional `pi-session-search` package locally, so the terminal `summary approval` fix is validated by source inspection plus the Feynman patch path and config persistence rather than a local end-to-end package install.
+- Next: Commit and push the workflow/search fix pass, then close or answer the remaining deepresearch/search issues with the specific root causes and shipped fixes.
+
+### 2026-04-12 13:20 PDT — capital-france (citation verification brief)
+
+- Objective: Verify citations in the capital-of-France draft and produce a cited verifier brief.
+- Changed: Read `outputs/.drafts/capital-france-draft.md`, `notes/capital-france-research-web.md`, and `notes/capital-france-legal-context.md`; fetched the three draft URLs directly; wrote `notes/capital-france-brief.md` with inline numbered citations and a numbered direct-URL sources list.
+- Verified: Confirmed the Insee, Sénat, and Élysée URLs were reachable on 2026-04-12; confirmed Insee and Sénat support the core claim that Paris is the capital of France; marked the Élysée homepage as contextual-only support.
+- Failed / learned: The Élysée homepage does not explicitly state the core claim, so it should not be used as sole evidence for capital status.
+- Blockers: None for the verifier brief; any stronger legal memo would still need a more direct constitutional/statutory basis if that specific question is asked.
+- Next: Promote the brief into the final output or downgrade/remove any claim that leans on the Élysée URL alone.
--- a/prompts/audit.md
+++ b/prompts/audit.md
@@ -9,7 +9,7 @@ Audit the paper and codebase for: $@
 Derive a short slug from the audit target (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.

 Requirements:
- Before starting, outline the audit plan: which paper, which repo, which claims to check. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user and confirm before proceeding.
+- Before starting, outline the audit plan: which paper, which repo, which claims to check. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user, then continue automatically. Do not block the workflow waiting for confirmation.
 - Use the `researcher` subagent for evidence gathering and the `verifier` subagent to verify sources and add inline citations when the audit is non-trivial.
 - Compare claimed methods, defaults, metrics, and data handling against the actual code.
 - Call out missing code, mismatches, ambiguous defaults, and reproduction risks.
--- a/prompts/compare.md
+++ b/prompts/compare.md
@@ -9,7 +9,7 @@ Compare sources for: $@
 Derive a short slug from the comparison topic (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.

 Requirements:
- Before starting, outline the comparison plan: which sources to compare, which dimensions to evaluate, expected output structure. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user and confirm before proceeding.
+- Before starting, outline the comparison plan: which sources to compare, which dimensions to evaluate, expected output structure. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user, then continue automatically. Do not block the workflow waiting for confirmation.
 - Use the `researcher` subagent to gather source material when the comparison set is broad, and the `verifier` subagent to verify sources and add inline citations to the final matrix.
 - Build a comparison matrix covering: source, key claim, evidence type, caveats, confidence.
 - Generate charts with `pi-charts` when the comparison involves quantitative metrics. Use Mermaid for method or architecture comparisons.
--- a/prompts/deepresearch.md
+++ b/prompts/deepresearch.md
@@ -51,7 +51,7 @@ If `CHANGELOG.md` exists, read the most recent relevant entries before finalizin

 Also save the plan with `memory_remember` (type: `fact`, key: `deepresearch.<slug>.plan`) so it survives context truncation.

-Present the plan to the user and ask them to confirm before proceeding. If the user wants changes, revise the plan first.
+Present the plan to the user, then continue automatically. Do not block the workflow waiting for approval. If the user actively asks for changes, revise the plan first before proceeding.

 ## 2. Scale decision

--- a/prompts/draft.md
+++ b/prompts/draft.md
@@ -9,7 +9,7 @@ Write a paper-style draft for: $@
 Derive a short slug from the topic (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.

 Requirements:
- Before writing, outline the draft structure: proposed title, sections, key claims to make, source material to draw from, and a verification log for the critical claims, figures, and calculations. Write the outline to `outputs/.plans/<slug>.md`. Present the outline to the user and confirm before proceeding.
+- Before writing, outline the draft structure: proposed title, sections, key claims to make, source material to draw from, and a verification log for the critical claims, figures, and calculations. Write the outline to `outputs/.plans/<slug>.md`. Present the outline to the user, then continue automatically. Do not block the workflow waiting for confirmation.
 - Use the `writer` subagent when the draft should be produced from already-collected notes, then use the `verifier` subagent to add inline citations and verify sources.
 - Include at minimum: title, abstract, problem statement, related work, method or synthesis, evidence or experiments, limitations, conclusion.
 - Use clean Markdown with LaTeX where equations materially help.
--- a/prompts/lit.md
+++ b/prompts/lit.md
@@ -10,7 +10,7 @@ Derive a short slug from the topic (lowercase, hyphens, no filler words, ≤5 wo

 ## Workflow

-1. **Plan** — Outline the scope: key questions, source types to search (papers, web, repos), time period, expected sections, and a small task ledger plus verification log. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user and confirm before proceeding.
+1. **Plan** — Outline the scope: key questions, source types to search (papers, web, repos), time period, expected sections, and a small task ledger plus verification log. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user, then continue automatically. Do not block the workflow waiting for confirmation.
 2. **Gather** — Use the `researcher` subagent when the sweep is wide enough to benefit from delegated paper triage before synthesis. For narrow topics, search directly. Researcher outputs go to `<slug>-research-*.md`. Do not silently skip assigned questions; mark them `done`, `blocked`, or `superseded`.
 3. **Synthesize** — Separate consensus, disagreements, and open questions. When useful, propose concrete next experiments or follow-up reading. Generate charts with `pi-charts` for quantitative comparisons across papers and Mermaid diagrams for taxonomies or method pipelines. Before finishing the draft, sweep every strong claim against the verification log and downgrade anything that is inferred or single-source critical.
 4. **Cite** — Spawn the `verifier` agent to add inline citations and verify every source URL in the draft.
--- a/prompts/review.md
+++ b/prompts/review.md
@@ -9,7 +9,7 @@ Review this AI research artifact: $@
 Derive a short slug from the artifact name (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.

 Requirements:
- Before starting, outline what will be reviewed, the review criteria (novelty, empirical rigor, baselines, reproducibility, etc.), and any verification-specific checks needed for claims, figures, and reported metrics. Present the plan to the user and confirm before proceeding.
+- Before starting, outline what will be reviewed, the review criteria (novelty, empirical rigor, baselines, reproducibility, etc.), and any verification-specific checks needed for claims, figures, and reported metrics. Present the plan to the user, then continue automatically. Do not block the workflow waiting for confirmation.
 - Spawn a `researcher` subagent to gather evidence on the artifact — inspect the paper, code, cited work, and any linked experimental artifacts. Save to `<slug>-research.md`.
 - Spawn a `reviewer` subagent with `<slug>-research.md` to produce the final peer review with inline annotations.
 - For small or simple artifacts where evidence gathering is overkill, run the `reviewer` subagent directly instead.
--- a/prompts/watch.md
+++ b/prompts/watch.md
@@ -9,7 +9,7 @@ Create a research watch for: $@
 Derive a short slug from the watch topic (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files in this run.

 Requirements:
- Before starting, outline the watch plan: what to monitor, what signals matter, what counts as a meaningful change, and the check frequency. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user and confirm before proceeding.
+- Before starting, outline the watch plan: what to monitor, what signals matter, what counts as a meaningful change, and the check frequency. Write the plan to `outputs/.plans/<slug>.md`. Present the plan to the user, then continue automatically. Do not block the workflow waiting for confirmation.
 - Start with a baseline sweep of the topic.
 - Use `schedule_prompt` to create the recurring or delayed follow-up instead of merely promising to check later.
 - Save exactly one baseline artifact to `outputs/<slug>-baseline.md`.
--- a/scripts/lib/pi-web-access-patch.mjs
+++ b/scripts/lib/pi-web-access-patch.mjs
@@ -16,14 +16,28 @@ const PATCHED_CONFIG_EXPR =

 export function patchPiWebAccessSource(relativePath, source) {
 	let patched = source;
+	let changed = false;

-	if (patched.includes(PATCHED_CONFIG_EXPR)) {
-		return patched;
+	if (!patched.includes(PATCHED_CONFIG_EXPR)) {
+		patched = patched.split(LEGACY_CONFIG_EXPR).join(PATCHED_CONFIG_EXPR);
+		changed = patched !== source;
 	}

-	patched = patched.split(LEGACY_CONFIG_EXPR).join(PATCHED_CONFIG_EXPR);
+	if (relativePath === "index.ts") {
+		if (patched.includes('return "summary-review";')) {
+			patched = patched.replace('return "summary-review";', 'return "none";');
+			changed = true;
+		}
+		if (patched.includes('summary-review = open curator with auto summary draft (default)')) {
+			patched = patched.replace(
+				'summary-review = open curator with auto summary draft (default)',
+				'summary-review = open curator with auto summary draft',
+			);
+			changed = true;
+		}
+	}

-	if (relativePath === "index.ts" && patched !== source) {
+	if (relativePath === "index.ts" && changed) {
 		patched = patched.replace('import { join } from "node:path";', 'import { dirname, join } from "node:path";');
 		patched = patched.replace('const dir = join(homedir(), ".pi");', "const dir = dirname(WEB_SEARCH_CONFIG_PATH);");
 	}
--- a/src/pi/web-access.ts
+++ b/src/pi/web-access.ts
@@ -3,11 +3,13 @@ import { dirname, resolve } from "node:path";
 import { getFeynmanHome } from "../config/paths.js";

 export type PiWebSearchProvider = "auto" | "perplexity" | "exa" | "gemini";
+export type PiWebSearchWorkflow = "none" | "summary-review";

 export type PiWebAccessConfig = Record<string, unknown> & {
 	route?: PiWebSearchProvider;
 	provider?: PiWebSearchProvider;
 	searchProvider?: PiWebSearchProvider;
+	workflow?: PiWebSearchWorkflow;
 	perplexityApiKey?: string;
 	exaApiKey?: string;
 	geminiApiKey?: string;
@@ -18,6 +20,7 @@ export type PiWebAccessStatus = {
 	configPath: string;
 	searchProvider: PiWebSearchProvider;
 	requestProvider: PiWebSearchProvider;
+	workflow: PiWebSearchWorkflow;
 	perplexityConfigured: boolean;
 	exaConfigured: boolean;
 	geminiApiConfigured: boolean;
@@ -35,6 +38,10 @@ function normalizeProvider(value: unknown): PiWebSearchProvider | undefined {
 	return value === "auto" || value === "perplexity" || value === "exa" || value === "gemini" ? value : undefined;
 }

+function normalizeWorkflow(value: unknown): PiWebSearchWorkflow | undefined {
+	return value === "none" || value === "summary-review" ? value : undefined;
+}
+
 function normalizeNonEmptyString(value: unknown): string | undefined {
 	return typeof value === "string" && value.trim().length > 0 ? value.trim() : undefined;
 }
@@ -102,6 +109,7 @@ export function getPiWebAccessStatus(
 	const searchProvider =
 		normalizeProvider(config.searchProvider) ?? normalizeProvider(config.route) ?? normalizeProvider(config.provider) ?? "auto";
 	const requestProvider = normalizeProvider(config.provider) ?? normalizeProvider(config.route) ?? searchProvider;
+	const workflow = normalizeWorkflow(config.workflow) ?? "none";
 	const perplexityConfigured = Boolean(normalizeNonEmptyString(config.perplexityApiKey));
 	const exaConfigured = Boolean(normalizeNonEmptyString(config.exaApiKey));
 	const geminiApiConfigured = Boolean(normalizeNonEmptyString(config.geminiApiKey));
@@ -112,6 +120,7 @@ export function getPiWebAccessStatus(
 		configPath,
 		searchProvider,
 		requestProvider,
+		workflow,
 		perplexityConfigured,
 		exaConfigured,
 		geminiApiConfigured,
@@ -128,6 +137,7 @@ export function formatPiWebAccessDoctorLines(
 		"web access: pi-web-access",
 		`  search route: ${status.routeLabel}`,
 		`  request route: ${status.requestProvider}`,
+		`  search workflow: ${status.workflow}`,
 		`  perplexity api: ${status.perplexityConfigured ? "configured" : "not configured"}`,
 		`  exa api: ${status.exaConfigured ? "configured" : "not configured"}`,
 		`  gemini api: ${status.geminiApiConfigured ? "configured" : "not configured"}`,
--- a/src/search/commands.ts
+++ b/src/search/commands.ts
@@ -18,6 +18,7 @@ export function printSearchStatus(): void {
 	printInfo("Managed by: pi-web-access");
 	printInfo(`Search route: ${status.routeLabel}`);
 	printInfo(`Request route: ${status.requestProvider}`);
+	printInfo(`Search workflow: ${status.workflow}`);
 	printInfo(`Perplexity API configured: ${status.perplexityConfigured ? "yes" : "no"}`);
 	printInfo(`Exa API configured: ${status.exaConfigured ? "yes" : "no"}`);
 	printInfo(`Gemini API configured: ${status.geminiApiConfigured ? "yes" : "no"}`);
@@ -36,6 +37,7 @@ export function setSearchProvider(provider: PiWebSearchProvider, apiKey?: string
 	const updates: Partial<Record<keyof PiWebAccessConfig, unknown>> = {
 		provider,
 		searchProvider: provider,
+		workflow: "none",
 		route: undefined,
 	};
 	const apiKeyField = PROVIDER_API_KEY_FIELDS[provider];
@@ -50,7 +52,7 @@ export function setSearchProvider(provider: PiWebSearchProvider, apiKey?: string
 }

 export function clearSearchConfig(): void {
-	savePiWebAccessConfig({ provider: undefined, searchProvider: undefined, route: undefined });
+	savePiWebAccessConfig({ provider: undefined, searchProvider: undefined, route: undefined, workflow: "none" });

 	const status = getPiWebAccessStatus();
 	console.log(`Web search provider reset to ${status.routeLabel}.`);
--- a/tests/pi-web-access-patch.test.ts
+++ b/tests/pi-web-access-patch.test.ts
@@ -33,6 +33,27 @@ test("patchPiWebAccessSource updates index.ts directory handling", () => {
 	assert.match(patched, /const dir = dirname\(WEB_SEARCH_CONFIG_PATH\);/);
 });

+test("patchPiWebAccessSource defaults workflow to none for index.ts", () => {
+	const input = [
+		'function resolveWorkflow(input: unknown, hasUI: boolean): WebSearchWorkflow {',
+		'\tif (!hasUI) return "none";',
+		'\tif (typeof input === "string" && input.trim().toLowerCase() === "none") return "none";',
+		'\treturn "summary-review";',
+		'}',
+		'workflow: Type.Optional(',
+		'\tStringEnum(["none", "summary-review"], {',
+		'\t\tdescription: "Search workflow mode: none = no curator, summary-review = open curator with auto summary draft (default)",',
+		'\t}),',
+		'),',
+		"",
+	].join("\n");
+
+	const patched = patchPiWebAccessSource("index.ts", input);
+
+	assert.match(patched, /return "none";/);
+	assert.doesNotMatch(patched, /summary-review = open curator with auto summary draft \(default\)/);
+});
+
 test("patchPiWebAccessSource is idempotent", () => {
 	const input = [
 		'import { join } from "node:path";',
--- a/tests/pi-web-access.test.ts
+++ b/tests/pi-web-access.test.ts
@@ -62,6 +62,7 @@ test("getPiWebAccessStatus reads Pi web-access config directly", () => {
 	const status = getPiWebAccessStatus(loadPiWebAccessConfig(configPath), configPath);
 	assert.equal(status.routeLabel, "Exa");
 	assert.equal(status.requestProvider, "exa");
+	assert.equal(status.workflow, "none");
 	assert.equal(status.exaConfigured, true);
 	assert.equal(status.geminiApiConfigured, true);
 	assert.equal(status.perplexityConfigured, false);
@@ -86,6 +87,7 @@ test("getPiWebAccessStatus reads Gemini routes directly", () => {
 	const status = getPiWebAccessStatus(loadPiWebAccessConfig(configPath), configPath);
 	assert.equal(status.routeLabel, "Gemini");
 	assert.equal(status.requestProvider, "gemini");
+	assert.equal(status.workflow, "none");
 	assert.equal(status.exaConfigured, false);
 	assert.equal(status.geminiApiConfigured, true);
 	assert.equal(status.perplexityConfigured, false);
@@ -100,6 +102,7 @@ test("getPiWebAccessStatus supports the legacy route key", () => {

 	assert.equal(status.routeLabel, "Perplexity");
 	assert.equal(status.requestProvider, "perplexity");
+	assert.equal(status.workflow, "none");
 	assert.equal(status.perplexityConfigured, true);
 });

@@ -112,5 +115,6 @@ test("formatPiWebAccessDoctorLines reports Pi-managed web access", () => {
 	);

 	assert.equal(lines[0], "web access: pi-web-access");
+	assert.ok(lines.some((line) => line.includes("search workflow: none")));
 	assert.ok(lines.some((line) => line.includes("/tmp/pi-web-search.json")));
 });