fix: require final research artifacts before exit

2026-04-12 13:17:55 -07:00
parent 4f6574f233
commit 5d10285372
3 changed files with 37 additions and 1 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -42,6 +42,15 @@ Use this file to track chronology, not release notes. Keep entries short, factua
 - Blockers: Need verifier URL/citation pass and reviewer verification pass before final promotion.
 - Next: Run verifier on the draft, then review and promote the final brief.

+### 2026-04-12 00:50 local — capital-france
+
+- Objective: Complete citation, verification, and final promotion for the capital-of-France workflow.
+- Changed: Produced `outputs/capital-france-brief.md`, ran verification into `notes/capital-france-verification.md`, promoted the final brief to `outputs/capital-france.md`, and wrote `outputs/capital-france.provenance.md`.
+- Verified: Reviewer found no FATAL or MAJOR issues. Core claim remains backed by two independent French public-institution sources, with Insee as the primary explicit source and the Sénat report as corroboration.
+- Failed / learned: The runtime did not expose a named `verifier` subagent, so I used an available worker in a verifier-equivalent role and recorded that deviation in the plan.
+- Blockers: None.
+- Next: If needed, extend the brief with deeper legal-historical sourcing, but the narrow factual question is sufficiently answered.
+
 ### 2026-04-12 10:05 local — capital-france

 - Objective: Run the citation-verification pass on the capital-of-France draft and promote a final cited brief.
@@ -51,6 +60,15 @@ Use this file to track chronology, not release notes. Keep entries short, factua
 - Blockers: Reviewer pass still pending if the workflow requires an adversarial final check.
 - Next: If needed, run a final reviewer pass; otherwise use `outputs/capital-france-brief.md` as the canonical brief.

+### 2026-04-12 10:20 local — capital-france
+
+- Objective: Close the workflow with final review, final artifact promotion, and provenance.
+- Changed: Ran a reviewer pass recorded in `notes/capital-france-verification.md`; promoted the cited brief into `outputs/capital-france.md`; wrote `outputs/capital-france.provenance.md`; updated the run plan to mark all tasks complete.
+- Verified: Reviewer verdict was PASS WITH MINOR REVISIONS only; those minor wording fixes were applied before delivery.
+- Failed / learned: The runtime did not expose a project-named `verifier` agent, so the citation pass used an available worker agent as a verifier-equivalent step.
+- Blockers: None.
+- Next: Optional only — produce a legal memorandum on the basis of Paris's capital status if requested.
+
 ### 2026-03-25 00:00 local — scaling-laws

 - Objective: Set up a deep research workflow for scaling laws.
@@ -232,6 +250,15 @@ Use this file to track chronology, not release notes. Keep entries short, factua
 - Blockers: I did not install and execute the full optional `pi-session-search` package locally, so the terminal `summary approval` fix is validated by source inspection plus the Feynman patch path and config persistence rather than a local end-to-end package install.
 - Next: Commit and push the workflow/search fix pass, then close or answer the remaining deepresearch/search issues with the specific root causes and shipped fixes.

+### 2026-04-12 14:05 PDT — final-artifact-hardening-pass
+
+- Objective: Reduce the chance of unattended research workflows stopping at intermediate artifacts like `<slug>-brief.md` without promoting the final deliverable and provenance sidecar.
+- Changed: Tightened `prompts/deepresearch.md` so the agent must verify on disk that the plan, draft, cited brief, promoted final output, and provenance sidecar all exist before stopping; tightened `prompts/lit.md` so it explicitly checks for the final output plus provenance sidecar instead of stopping at an intermediate cited draft.
+- Verified: Cross-read the current deepresearch/lit deliver steps after the earlier unattended-run reproductions and confirmed the missing enforcement point was the final on-disk artifact check, not the naming convention itself.
+- Failed / learned: This is still prompt-level enforcement rather than a deterministic post-processing hook, so it improves completion reliability but does not provide the same guarantees as a dedicated artifact-finalization wrapper.
+- Blockers: I did not rerun a full broad deepresearch workflow end-to-end after this prompt-only hardening because those runs are materially longer and more expensive than the narrow reproductions already used to isolate the earlier deadlocks.
+- Next: Commit and push the prompt hardening, then, if needed, add a deterministic wrapper around final artifact promotion instead of relying only on prompt adherence.
+
 ### 2026-04-12 13:20 PDT — capital-france (citation verification brief)

 - Objective: Verify citations in the capital-of-France draft and produce a cited verifier brief.
--- a/prompts/deepresearch.md
+++ b/prompts/deepresearch.md
@@ -182,6 +182,15 @@ Write a provenance record alongside it as `<slug>.provenance.md`:
 - **Research files:** [list of intermediate <slug>-research-*.md files]
 ```

+Before you stop, verify on disk that all of these exist:
+- `outputs/.plans/<slug>.md`
+- `outputs/.drafts/<slug>-draft.md`
+- `<slug>-brief.md` intermediate cited brief
+- `outputs/<slug>.md` or `papers/<slug>.md` final promoted deliverable
+- `outputs/<slug>.provenance.md` or `papers/<slug>.provenance.md` provenance sidecar
+
+Do not stop at `<slug>-brief.md` alone. If the cited brief exists but the promoted final output or provenance sidecar does not, create them before responding.
+
 ## Background execution

 If the user wants unattended execution or the sweep will clearly take a while:
--- a/prompts/lit.md
+++ b/prompts/lit.md
@@ -15,4 +15,4 @@ Derive a short slug from the topic (lowercase, hyphens, no filler words, ≤5 wo
 3. **Synthesize** — Separate consensus, disagreements, and open questions. When useful, propose concrete next experiments or follow-up reading. Generate charts with `pi-charts` for quantitative comparisons across papers and Mermaid diagrams for taxonomies or method pipelines. Before finishing the draft, sweep every strong claim against the verification log and downgrade anything that is inferred or single-source critical.
 4. **Cite** — Spawn the `verifier` agent to add inline citations and verify every source URL in the draft.
 5. **Verify** — Spawn the `reviewer` agent to check the cited draft for unsupported claims, logical gaps, zombie sections, and single-source critical findings. Fix FATAL issues before delivering. Note MAJOR issues in Open Questions. If FATAL issues were found, run one more verification pass after the fixes.
-6. **Deliver** — Save the final literature review to `outputs/<slug>.md`. Write a provenance record alongside it as `outputs/<slug>.provenance.md` listing: date, sources consulted vs. accepted vs. rejected, verification status, and intermediate research files used.
+6. **Deliver** — Save the final literature review to `outputs/<slug>.md`. Write a provenance record alongside it as `outputs/<slug>.provenance.md` listing: date, sources consulted vs. accepted vs. rejected, verification status, and intermediate research files used. Before you stop, verify on disk that both files exist; do not stop at an intermediate cited draft alone.