Add system-wide guardrails against fabricated results

This commit is contained in:
Advait Paliwal
2026-04-15 22:45:04 -07:00
parent 9bc59dad53
commit fe24224965
10 changed files with 14 additions and 10 deletions

View File

@@ -24,6 +24,8 @@ Operating rules:
- Do not force chain-shaped orchestration onto the user. Multi-agent decomposition is an internal tactic, not the primary UX. - Do not force chain-shaped orchestration onto the user. Multi-agent decomposition is an internal tactic, not the primary UX.
- For AI research artifacts, default to pressure-testing the work before polishing it. Use review-style workflows to check novelty positioning, evaluation design, baseline fairness, ablations, reproducibility, and likely reviewer objections. - For AI research artifacts, default to pressure-testing the work before polishing it. Use review-style workflows to check novelty positioning, evaluation design, baseline fairness, ablations, reproducibility, and likely reviewer objections.
- Do not say `verified`, `confirmed`, `checked`, or `reproduced` unless you actually performed the check and can point to the supporting source, artifact, or command output. - Do not say `verified`, `confirmed`, `checked`, or `reproduced` unless you actually performed the check and can point to the supporting source, artifact, or command output.
- Never invent or fabricate experimental results, scores, datasets, sample sizes, ablations, benchmark tables, figures, images, charts, or quantitative comparisons. If the user asks for a paper, report, draft, figure, or result and the underlying data is missing, write a clearly labeled placeholder such as `No experimental results are available yet` or `TODO: run experiment`.
- Every quantitative result, figure, table, chart, image, or benchmark claim must trace to at least one explicit source URL, research note, raw artifact path, or script/command output. If provenance is missing, omit the claim or mark it as a planned measurement instead of presenting it as fact.
- When a task involves calculations, code, or quantitative outputs, define the minimal test or oracle set before implementation and record the results of those checks before delivery. - When a task involves calculations, code, or quantitative outputs, define the minimal test or oracle set before implementation and record the results of those checks before delivery.
- If a plot, number, or conclusion looks cleaner than expected, assume it may be wrong until it survives explicit checks. Never smooth curves, drop inconvenient variations, or tune presentation-only outputs without stating that choice. - If a plot, number, or conclusion looks cleaner than expected, assume it may be wrong until it survives explicit checks. Never smooth curves, drop inconvenient variations, or tune presentation-only outputs without stating that choice.
- When a verification pass finds one issue, continue searching for others. Do not stop after the first error unless the whole branch is blocked. - When a verification pass finds one issue, continue searching for others. Do not stop after the first error unless the whole branch is blocked.

View File

@@ -25,7 +25,7 @@ curl -fsSL https://feynman.is/install | bash
irm https://feynman.is/install.ps1 | iex irm https://feynman.is/install.ps1 | iex
``` ```
The one-line installer fetches the latest tagged release. To pin a version, pass it explicitly, for example `curl -fsSL https://feynman.is/install | bash -s -- 0.2.18`. The one-line installer fetches the latest tagged release. To pin a version, pass it explicitly, for example `curl -fsSL https://feynman.is/install | bash -s -- 0.2.19`.
The installer downloads a standalone native bundle with its own Node.js runtime. The installer downloads a standalone native bundle with its own Node.js runtime.

4
package-lock.json generated
View File

@@ -1,12 +1,12 @@
{ {
"name": "@companion-ai/feynman", "name": "@companion-ai/feynman",
"version": "0.2.18", "version": "0.2.19",
"lockfileVersion": 3, "lockfileVersion": 3,
"requires": true, "requires": true,
"packages": { "packages": {
"": { "": {
"name": "@companion-ai/feynman", "name": "@companion-ai/feynman",
"version": "0.2.18", "version": "0.2.19",
"hasInstallScript": true, "hasInstallScript": true,
"license": "MIT", "license": "MIT",
"dependencies": { "dependencies": {

View File

@@ -1,6 +1,6 @@
{ {
"name": "@companion-ai/feynman", "name": "@companion-ai/feynman",
"version": "0.2.18", "version": "0.2.19",
"description": "Research-first CLI agent built on Pi and alphaXiv", "description": "Research-first CLI agent built on Pi and alphaXiv",
"license": "MIT", "license": "MIT",
"type": "module", "type": "module",

View File

@@ -110,7 +110,7 @@ This usually means the release exists, but not all platform bundles were uploade
Workarounds: Workarounds:
- try again after the release finishes publishing - try again after the release finishes publishing
- pass the latest published version explicitly, e.g.: - pass the latest published version explicitly, e.g.:
& ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.18 & ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.19
"@ "@
} }

View File

@@ -261,7 +261,7 @@ This usually means the release exists, but not all platform bundles were uploade
Workarounds: Workarounds:
- try again after the release finishes publishing - try again after the release finishes publishing
- pass the latest published version explicitly, e.g.: - pass the latest published version explicitly, e.g.:
curl -fsSL https://feynman.is/install | bash -s -- 0.2.18 curl -fsSL https://feynman.is/install | bash -s -- 0.2.19
EOF EOF
exit 1 exit 1
fi fi

View File

@@ -33,10 +33,12 @@ test("bundled prompts and skills do not contain blocked promotional product cont
test("draft workflow explicitly forbids fabricated results and unproven figures", () => { test("draft workflow explicitly forbids fabricated results and unproven figures", () => {
const draftPrompt = readFileSync(join(repoRoot, "prompts", "draft.md"), "utf8"); const draftPrompt = readFileSync(join(repoRoot, "prompts", "draft.md"), "utf8");
const systemPrompt = readFileSync(join(repoRoot, ".feynman", "SYSTEM.md"), "utf8");
const writerPrompt = readFileSync(join(repoRoot, ".feynman", "agents", "writer.md"), "utf8"); const writerPrompt = readFileSync(join(repoRoot, ".feynman", "agents", "writer.md"), "utf8");
const verifierPrompt = readFileSync(join(repoRoot, ".feynman", "agents", "verifier.md"), "utf8"); const verifierPrompt = readFileSync(join(repoRoot, ".feynman", "agents", "verifier.md"), "utf8");
for (const [label, content] of [ for (const [label, content] of [
["system prompt", systemPrompt],
["draft prompt", draftPrompt], ["draft prompt", draftPrompt],
["writer prompt", writerPrompt], ["writer prompt", writerPrompt],
["verifier prompt", verifierPrompt], ["verifier prompt", verifierPrompt],

View File

@@ -261,7 +261,7 @@ This usually means the release exists, but not all platform bundles were uploade
Workarounds: Workarounds:
- try again after the release finishes publishing - try again after the release finishes publishing
- pass the latest published version explicitly, e.g.: - pass the latest published version explicitly, e.g.:
curl -fsSL https://feynman.is/install | bash -s -- 0.2.18 curl -fsSL https://feynman.is/install | bash -s -- 0.2.19
EOF EOF
exit 1 exit 1
fi fi

View File

@@ -110,7 +110,7 @@ This usually means the release exists, but not all platform bundles were uploade
Workarounds: Workarounds:
- try again after the release finishes publishing - try again after the release finishes publishing
- pass the latest published version explicitly, e.g.: - pass the latest published version explicitly, e.g.:
& ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.18 & ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.19
"@ "@
} }

View File

@@ -117,13 +117,13 @@ These installers download the bundled `skills/` and `prompts/` trees plus the re
The one-line installer already targets the latest tagged release. To pin an exact version, pass it explicitly: The one-line installer already targets the latest tagged release. To pin an exact version, pass it explicitly:
```bash ```bash
curl -fsSL https://feynman.is/install | bash -s -- 0.2.18 curl -fsSL https://feynman.is/install | bash -s -- 0.2.19
``` ```
On Windows: On Windows:
```powershell ```powershell
& ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.18 & ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.19
``` ```
## Post-install setup ## Post-install setup