Deduplicate fabricated-results guardrails

This commit is contained in:
Advait Paliwal
2026-04-15 22:53:38 -07:00
parent 501364da45
commit 043e241464
4 changed files with 13 additions and 7 deletions

View File

@@ -17,7 +17,7 @@ You receive a draft document and the research files it was built from. Your job
4. **Remove unsourced claims** — if a factual claim in the draft cannot be traced to any source in the research files, either find a source for it or remove it. Do not leave unsourced factual claims.
5. **Verify meaning, not just topic overlap.** A citation is valid only if the source actually supports the specific number, quote, or conclusion attached to it.
6. **Refuse fake certainty.** Do not use words like `verified`, `confirmed`, or `reproduced` unless the draft already contains or the research files provide the underlying evidence.
7. **Never invent or keep fabricated results.** If any image, figure, chart, table, benchmark, score, dataset, sample size, ablation, or experimental result lacks explicit provenance, remove it or replace it with a clearly labeled TODO. Never keep a made-up result because it “looks plausible.”
7. **Enforce the system prompt's provenance rule.** Unsupported results, figures, charts, tables, benchmarks, and quantitative claims must be removed or converted to TODOs.
## Citation rules
@@ -41,7 +41,7 @@ For code-backed or quantitative claims:
- Treat captions such as “illustrative,” “simulated,” “representative,” or “example” as insufficient unless the user explicitly requested synthetic/example data. Otherwise remove the visual and mark the missing experiment.
- Do not preserve polished summaries that outrun the raw evidence.
## Fabrication audit
## Result provenance audit
Before saving the final document, scan for:
- numeric scores or percentages,

View File

@@ -15,7 +15,7 @@ You are Feynman's writing subagent.
3. **Be explicit about gaps.** If the research files have unresolved questions or conflicting evidence, surface them — do not paper over them.
4. **Do not promote draft text into fact.** If a result is tentative, inferred, or awaiting verification, label it that way in the prose.
5. **No aesthetic laundering.** Do not make plots, tables, or summaries look cleaner than the underlying evidence justifies.
6. **Never fabricate results.** Do not invent experimental scores, datasets, sample sizes, ablations, benchmark tables, charts, image captions, or figures. If evidence is missing, write `No results are available yet` or `TODO: run experiment` rather than producing plausible-looking data.
6. **Follow the system prompt's provenance rule.** Missing results become gaps or TODOs, never plausible-looking data.
## Output structure
@@ -50,7 +50,7 @@ Unresolved issues, disagreements between sources, gaps in evidence.
- Do NOT add inline citations — the verifier agent handles that as a separate post-processing step.
- Do NOT add a Sources section — the verifier agent builds that.
- Before finishing, do a claim sweep: every strong factual statement in the draft should have an obvious source home in the research files.
- Before finishing, do a fake-result sweep: remove or replace any numeric result, figure, chart, benchmark, table, or image that lacks explicit provenance.
- Before finishing, do a result-provenance sweep for numeric results, figures, charts, benchmarks, tables, and images.
## Output contract
- Save the main artifact to the specified output path (default: `draft.md`).

View File

@@ -39,14 +39,20 @@ test("research writing prompts forbid fabricated results and unproven figures",
for (const [label, content] of [
["system prompt", systemPrompt],
["writer prompt", writerPrompt],
["verifier prompt", verifierPrompt],
] as const) {
assert.match(content, /Never (invent|fabricate)/i, `${label} must explicitly forbid invented or fabricated results`);
assert.match(content, /(figure|chart|image|table)/i, `${label} must cover visual/table provenance`);
assert.match(content, /(provenance|source|artifact|script|raw)/i, `${label} must require traceable support`);
}
for (const [label, content] of [
["writer prompt", writerPrompt],
["verifier prompt", verifierPrompt],
["draft prompt", draftPrompt],
] as const) {
assert.match(content, /system prompt.*provenance rule/i, `${label} must point back to the system provenance rule`);
}
assert.match(draftPrompt, /system prompt's provenance rules/i);
assert.match(draftPrompt, /placeholder or proposed experimental plan/i);
assert.match(draftPrompt, /source-backed quantitative data/i);

View File

@@ -35,7 +35,7 @@ When working from existing session context (after a deep research or literature
The writer pays attention to academic conventions: claims are attributed to their sources with inline citations, methodology sections describe procedures precisely, and limitations are discussed honestly. The draft includes placeholder sections for any content the writer cannot generate from available sources, clearly marking what needs human input.
The draft workflow must not invent experimental results, scores, figures, images, tables, or benchmark data. When no source material or raw artifact supports a result, Feynman should leave a clearly labeled placeholder such as `No experimental results are available yet` or `TODO: run experiment` instead of producing plausible-looking data.
Drafts follow Feynman's system-wide provenance rules: unsupported results, figures, images, tables, or benchmark data should become clearly labeled gaps or TODOs, not plausible-looking claims.
## Output format