Add system-wide guardrails against fabricated results
This commit is contained in:
@@ -24,6 +24,8 @@ Operating rules:
|
|||||||
- Do not force chain-shaped orchestration onto the user. Multi-agent decomposition is an internal tactic, not the primary UX.
|
- Do not force chain-shaped orchestration onto the user. Multi-agent decomposition is an internal tactic, not the primary UX.
|
||||||
- For AI research artifacts, default to pressure-testing the work before polishing it. Use review-style workflows to check novelty positioning, evaluation design, baseline fairness, ablations, reproducibility, and likely reviewer objections.
|
- For AI research artifacts, default to pressure-testing the work before polishing it. Use review-style workflows to check novelty positioning, evaluation design, baseline fairness, ablations, reproducibility, and likely reviewer objections.
|
||||||
- Do not say `verified`, `confirmed`, `checked`, or `reproduced` unless you actually performed the check and can point to the supporting source, artifact, or command output.
|
- Do not say `verified`, `confirmed`, `checked`, or `reproduced` unless you actually performed the check and can point to the supporting source, artifact, or command output.
|
||||||
|
- Never invent or fabricate experimental results, scores, datasets, sample sizes, ablations, benchmark tables, figures, images, charts, or quantitative comparisons. If the user asks for a paper, report, draft, figure, or result and the underlying data is missing, write a clearly labeled placeholder such as `No experimental results are available yet` or `TODO: run experiment`.
|
||||||
|
- Every quantitative result, figure, table, chart, image, or benchmark claim must trace to at least one explicit source URL, research note, raw artifact path, or script/command output. If provenance is missing, omit the claim or mark it as a planned measurement instead of presenting it as fact.
|
||||||
- When a task involves calculations, code, or quantitative outputs, define the minimal test or oracle set before implementation and record the results of those checks before delivery.
|
- When a task involves calculations, code, or quantitative outputs, define the minimal test or oracle set before implementation and record the results of those checks before delivery.
|
||||||
- If a plot, number, or conclusion looks cleaner than expected, assume it may be wrong until it survives explicit checks. Never smooth curves, drop inconvenient variations, or tune presentation-only outputs without stating that choice.
|
- If a plot, number, or conclusion looks cleaner than expected, assume it may be wrong until it survives explicit checks. Never smooth curves, drop inconvenient variations, or tune presentation-only outputs without stating that choice.
|
||||||
- When a verification pass finds one issue, continue searching for others. Do not stop after the first error unless the whole branch is blocked.
|
- When a verification pass finds one issue, continue searching for others. Do not stop after the first error unless the whole branch is blocked.
|
||||||
|
|||||||
@@ -25,7 +25,7 @@ curl -fsSL https://feynman.is/install | bash
|
|||||||
irm https://feynman.is/install.ps1 | iex
|
irm https://feynman.is/install.ps1 | iex
|
||||||
```
|
```
|
||||||
|
|
||||||
The one-line installer fetches the latest tagged release. To pin a version, pass it explicitly, for example `curl -fsSL https://feynman.is/install | bash -s -- 0.2.18`.
|
The one-line installer fetches the latest tagged release. To pin a version, pass it explicitly, for example `curl -fsSL https://feynman.is/install | bash -s -- 0.2.19`.
|
||||||
|
|
||||||
The installer downloads a standalone native bundle with its own Node.js runtime.
|
The installer downloads a standalone native bundle with its own Node.js runtime.
|
||||||
|
|
||||||
|
|||||||
4
package-lock.json
generated
4
package-lock.json
generated
@@ -1,12 +1,12 @@
|
|||||||
{
|
{
|
||||||
"name": "@companion-ai/feynman",
|
"name": "@companion-ai/feynman",
|
||||||
"version": "0.2.18",
|
"version": "0.2.19",
|
||||||
"lockfileVersion": 3,
|
"lockfileVersion": 3,
|
||||||
"requires": true,
|
"requires": true,
|
||||||
"packages": {
|
"packages": {
|
||||||
"": {
|
"": {
|
||||||
"name": "@companion-ai/feynman",
|
"name": "@companion-ai/feynman",
|
||||||
"version": "0.2.18",
|
"version": "0.2.19",
|
||||||
"hasInstallScript": true,
|
"hasInstallScript": true,
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
{
|
{
|
||||||
"name": "@companion-ai/feynman",
|
"name": "@companion-ai/feynman",
|
||||||
"version": "0.2.18",
|
"version": "0.2.19",
|
||||||
"description": "Research-first CLI agent built on Pi and alphaXiv",
|
"description": "Research-first CLI agent built on Pi and alphaXiv",
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"type": "module",
|
"type": "module",
|
||||||
|
|||||||
@@ -110,7 +110,7 @@ This usually means the release exists, but not all platform bundles were uploade
|
|||||||
Workarounds:
|
Workarounds:
|
||||||
- try again after the release finishes publishing
|
- try again after the release finishes publishing
|
||||||
- pass the latest published version explicitly, e.g.:
|
- pass the latest published version explicitly, e.g.:
|
||||||
& ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.18
|
& ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.19
|
||||||
"@
|
"@
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -261,7 +261,7 @@ This usually means the release exists, but not all platform bundles were uploade
|
|||||||
Workarounds:
|
Workarounds:
|
||||||
- try again after the release finishes publishing
|
- try again after the release finishes publishing
|
||||||
- pass the latest published version explicitly, e.g.:
|
- pass the latest published version explicitly, e.g.:
|
||||||
curl -fsSL https://feynman.is/install | bash -s -- 0.2.18
|
curl -fsSL https://feynman.is/install | bash -s -- 0.2.19
|
||||||
EOF
|
EOF
|
||||||
exit 1
|
exit 1
|
||||||
fi
|
fi
|
||||||
|
|||||||
@@ -33,10 +33,12 @@ test("bundled prompts and skills do not contain blocked promotional product cont
|
|||||||
|
|
||||||
test("draft workflow explicitly forbids fabricated results and unproven figures", () => {
|
test("draft workflow explicitly forbids fabricated results and unproven figures", () => {
|
||||||
const draftPrompt = readFileSync(join(repoRoot, "prompts", "draft.md"), "utf8");
|
const draftPrompt = readFileSync(join(repoRoot, "prompts", "draft.md"), "utf8");
|
||||||
|
const systemPrompt = readFileSync(join(repoRoot, ".feynman", "SYSTEM.md"), "utf8");
|
||||||
const writerPrompt = readFileSync(join(repoRoot, ".feynman", "agents", "writer.md"), "utf8");
|
const writerPrompt = readFileSync(join(repoRoot, ".feynman", "agents", "writer.md"), "utf8");
|
||||||
const verifierPrompt = readFileSync(join(repoRoot, ".feynman", "agents", "verifier.md"), "utf8");
|
const verifierPrompt = readFileSync(join(repoRoot, ".feynman", "agents", "verifier.md"), "utf8");
|
||||||
|
|
||||||
for (const [label, content] of [
|
for (const [label, content] of [
|
||||||
|
["system prompt", systemPrompt],
|
||||||
["draft prompt", draftPrompt],
|
["draft prompt", draftPrompt],
|
||||||
["writer prompt", writerPrompt],
|
["writer prompt", writerPrompt],
|
||||||
["verifier prompt", verifierPrompt],
|
["verifier prompt", verifierPrompt],
|
||||||
|
|||||||
@@ -261,7 +261,7 @@ This usually means the release exists, but not all platform bundles were uploade
|
|||||||
Workarounds:
|
Workarounds:
|
||||||
- try again after the release finishes publishing
|
- try again after the release finishes publishing
|
||||||
- pass the latest published version explicitly, e.g.:
|
- pass the latest published version explicitly, e.g.:
|
||||||
curl -fsSL https://feynman.is/install | bash -s -- 0.2.18
|
curl -fsSL https://feynman.is/install | bash -s -- 0.2.19
|
||||||
EOF
|
EOF
|
||||||
exit 1
|
exit 1
|
||||||
fi
|
fi
|
||||||
|
|||||||
@@ -110,7 +110,7 @@ This usually means the release exists, but not all platform bundles were uploade
|
|||||||
Workarounds:
|
Workarounds:
|
||||||
- try again after the release finishes publishing
|
- try again after the release finishes publishing
|
||||||
- pass the latest published version explicitly, e.g.:
|
- pass the latest published version explicitly, e.g.:
|
||||||
& ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.18
|
& ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.19
|
||||||
"@
|
"@
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -117,13 +117,13 @@ These installers download the bundled `skills/` and `prompts/` trees plus the re
|
|||||||
The one-line installer already targets the latest tagged release. To pin an exact version, pass it explicitly:
|
The one-line installer already targets the latest tagged release. To pin an exact version, pass it explicitly:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
curl -fsSL https://feynman.is/install | bash -s -- 0.2.18
|
curl -fsSL https://feynman.is/install | bash -s -- 0.2.19
|
||||||
```
|
```
|
||||||
|
|
||||||
On Windows:
|
On Windows:
|
||||||
|
|
||||||
```powershell
|
```powershell
|
||||||
& ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.18
|
& ([scriptblock]::Create((irm https://feynman.is/install.ps1))) -Version 0.2.19
|
||||||
```
|
```
|
||||||
|
|
||||||
## Post-install setup
|
## Post-install setup
|
||||||
|
|||||||
Reference in New Issue
Block a user