Rename .pi to .feynman, rename citation agent to verifier, add website, skills, and docs

- Rename project config dir from .pi/ to .feynman/ (Pi supports this via piConfig.configDir)
- Rename citation agent to verifier across all prompts, agents, skills, and docs
- Add website with homepage and 24 doc pages (Astro + Tailwind)
- Add skills for all workflows (deep-research, lit, review, audit, replicate, compare, draft, autoresearch, watch, jobs, session-log, agentcomputer)
- Add Pi-native prompt frontmatter (args, section, topLevelCli) and read at runtime
- Remove sync-docs generation layer — docs are standalone
- Remove metadata/prompts.mjs and metadata/packages.mjs — not needed at runtime
- Rewrite README and homepage copy
- Add environment selection to /replicate before executing
- Add prompts/delegate.md and AGENTS.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Advait Paliwal
Date: 2026-03-23 17:35:35 -07:00
Parent: 406d50b3ff
Commit: f5570b4e5a
98 changed files with 9886 additions and 298 deletions

---
title: Code Audit
description: Compare paper claims against public codebases
section: Workflows
order: 4
---
## Usage
```
/audit <item>
```
## What it does
Compares claims made in a paper against its public codebase. Surfaces mismatches, missing experiments, and reproducibility risks.
## What it checks
- Do the reported hyperparameters match the code?
- Are all claimed experiments present in the repository?
- Does the training loop match the described methodology?
- Are there undocumented preprocessing steps?
- Do evaluation metrics match the paper's claims?
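The hyperparameter check above can be sketched as a simple per-key diff. This is a hypothetical illustration, not Feynman's actual implementation; the values and the `diff_hparams` helper are made up:

```python
# Hypothetical example: reported hyperparameters from a paper vs. those
# found in the public codebase. All values are invented for illustration.
paper_hparams = {"lr": 3e-4, "batch_size": 256, "warmup_steps": 1000}
code_hparams = {"lr": 1e-4, "batch_size": 256}

def diff_hparams(paper, code):
    """Return a per-key audit finding: match, mismatch, or missing from code."""
    findings = {}
    for key, claimed in paper.items():
        if key not in code:
            findings[key] = "missing from code"
        elif code[key] != claimed:
            findings[key] = f"mismatch: paper={claimed}, code={code[key]}"
        else:
            findings[key] = "match"
    return findings

for key, status in diff_hparams(paper_hparams, code_hparams).items():
    print(f"{key}: {status}")
```

Each finding maps directly onto a line of the claim-by-claim verification in the audit report.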
## Example
```
/audit 2401.12345
```
## Output
An audit report with:
- Claim-by-claim verification
- Identified mismatches
- Missing components
- Reproducibility risk assessment

---
title: Autoresearch
description: Autonomous experiment optimization loop
section: Workflows
order: 8
---
## Usage
```
/autoresearch <idea>
```
## What it does
Runs an autonomous experiment loop:
1. **Edit** — Modify code or configuration
2. **Commit** — Save the change
3. **Benchmark** — Run evaluation
4. **Evaluate** — Compare against baseline
5. **Keep or revert** — Persist improvements, roll back regressions
6. **Repeat** — Continue until the target is hit
## Tracking
Metrics are tracked in:
- `autoresearch.md` — Human-readable progress log
- `autoresearch.jsonl` — Machine-readable metrics over time
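The edit/benchmark/keep-or-revert loop and its JSONL log can be sketched as follows. This is a minimal toy, not the real agent: `run_benchmark` is a stand-in scoring function and the lr-doubling "edit" is invented for illustration:

```python
import json
import random

def run_benchmark(config):
    # Stand-in for a real evaluation run; peaks at lr = 3e-4.
    return 1.0 / (1.0 + abs(config["lr"] - 3e-4))

def autoresearch_loop(config, target, max_iters=5, log_path="autoresearch.jsonl"):
    """Edit -> benchmark -> evaluate -> keep or revert, until target or budget."""
    best = run_benchmark(config)
    history = []
    for step in range(max_iters):
        candidate = dict(config, lr=config["lr"] * random.choice([0.5, 2.0]))  # edit
        score = run_benchmark(candidate)                                       # benchmark
        keep = score > best                                                    # evaluate
        if keep:
            config, best = candidate, score                                    # keep (else revert)
        history.append({"step": step, "score": score, "kept": keep})
        if best >= target:
            break
    with open(log_path, "w") as f:  # machine-readable metrics over time
        for row in history:
            f.write(json.dumps(row) + "\n")
    return config, best

random.seed(0)
cfg, best = autoresearch_loop({"lr": 1e-3}, target=1.0)
```

Because a candidate is only kept when it beats the current best, the tracked score never regresses.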
## Controls
```
/autoresearch <idea> # start or resume
/autoresearch off # stop, keep data
/autoresearch clear # delete all state, start fresh
```
## Example
```
/autoresearch optimize the learning rate schedule for better convergence
```

---
title: Source Comparison
description: Compare multiple sources with agreement/disagreement matrix
section: Workflows
order: 6
---
## Usage
```
/compare <topic>
```
## What it does
Compares multiple sources on a topic. Builds an agreement/disagreement matrix showing where sources align and where they conflict.
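The agreement/disagreement matrix can be pictured as a claim-by-position table. This sketch is hypothetical: the sources, claims, and the `agreement_matrix` helper are invented to show the shape of the data, not how Feynman builds it internally:

```python
# Invented stances extracted per source on sub-questions of a topic.
stances = {
    "Paper A": {"RLHF needed": "yes", "scaling helps": "yes"},
    "Paper B": {"RLHF needed": "no",  "scaling helps": "yes"},
    "Blog C":  {"RLHF needed": "yes", "scaling helps": "unclear"},
}

def agreement_matrix(stances):
    """For each claim, bucket sources by position to surface alignment and conflict."""
    matrix = {}
    for source, claims in stances.items():
        for claim, position in claims.items():
            matrix.setdefault(claim, {}).setdefault(position, []).append(source)
    return matrix

for claim, positions in agreement_matrix(stances).items():
    status = "consensus" if len(positions) == 1 else "disagreement"
    print(f"{claim}: {status} {positions}")
```

A claim with a single position bucket is consensus; multiple buckets mark a disagreement worth synthesizing.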
## Example
```
/compare approaches to constitutional AI training
```
## Output
- Source-by-source breakdown
- Agreement/disagreement matrix
- Synthesis of key differences
- Assessment of which positions have stronger evidence

---
title: Deep Research
description: Thorough source-heavy investigation with parallel agents
section: Workflows
order: 1
---
## Usage
```
/deepresearch <topic>
```
## What it does
Deep research runs a thorough, source-heavy investigation. It plans the research scope, delegates to parallel researcher agents, synthesizes findings, and adds inline citations.
The workflow follows these steps:
1. **Plan** — Clarify the research question and identify search strategy
2. **Delegate** — Spawn parallel researcher agents to gather evidence from different source types (papers, web, repos)
3. **Synthesize** — Merge findings, resolve contradictions, identify gaps
4. **Cite** — Add inline citations and verify all source URLs
5. **Deliver** — Write a durable research brief to `outputs/`
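The delegate-then-synthesize fan-out can be sketched with `asyncio`. This is an illustrative shape only: the `researcher` coroutine is a stand-in for a real agent call, and the three source types mirror the list above:

```python
import asyncio

async def researcher(source_type, topic):
    # Stand-in for a real researcher agent querying one source type.
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"source": source_type, "findings": f"notes on {topic} from {source_type}"}

async def deep_research(topic):
    # Fan out one researcher per source type in parallel, then merge results.
    tasks = [researcher(s, topic) for s in ("papers", "web", "repos")]
    results = await asyncio.gather(*tasks)
    return {r["source"]: r["findings"] for r in results}

brief = asyncio.run(deep_research("scaling laws"))
```

Running the researchers concurrently is what lets the plan step delegate broadly without the total time growing with the number of source types.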
## Example
```
/deepresearch transformer scaling laws and their implications for compute-optimal training
```
## Output
Produces a structured research brief with:
- Executive summary
- Key findings organized by theme
- Evidence tables with source links
- Open questions and suggested next steps
- Numbered sources section with direct URLs

---
title: Draft Writing
description: Paper-style draft generation from research findings
section: Workflows
order: 7
---
## Usage
```
/draft <topic>
```
## What it does
Produces a paper-style draft with structured sections. Writes to `papers/`.
## Structure
The generated draft includes:
- Title
- Abstract
- Introduction / Background
- Method or Approach
- Evidence and Analysis
- Limitations
- Conclusion
- Sources
## Example
```
/draft survey of differentiable physics simulators
```
The writer agent works only from supplied evidence — it never fabricates content. If evidence is insufficient, it explicitly notes the gaps.

---
title: Literature Review
description: Map consensus, disagreements, and open questions
section: Workflows
order: 2
---
## Usage
```
/lit <topic>
```
## What it does
Runs a structured literature review that searches across academic papers and web sources. Explicitly separates consensus findings from disagreements and open questions.
## Example
```
/lit multimodal reasoning benchmarks for large language models
```
## Output
A structured review covering:
- **Consensus** — What the field agrees on
- **Disagreements** — Where sources conflict
- **Open questions** — What remains unresolved
- **Sources** — Direct links to all referenced papers and articles

---
title: Replication
description: Plan replications of papers and claims
section: Workflows
order: 5
---
## Usage
```
/replicate <paper or claim>
```
## What it does
Extracts key implementation details from a paper, identifies what's needed to replicate the results, and asks where to run before executing anything.
Before running code, Feynman asks you to choose an execution environment:
- **Local** — run in the current working directory
- **Virtual environment** — create an isolated venv/conda env first
- **Cloud** — delegate to a remote Agent Computer machine
- **Plan only** — produce the replication plan without executing
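The environment choices above can be pictured as a simple dispatch. This is a hypothetical sketch using only the standard library's `venv` module; it is not Feynman's actual mechanism, and the cloud branch is deliberately left abstract:

```python
import venv
from pathlib import Path

def prepare_environment(choice, workdir="replication"):
    """Illustrative dispatch over the four environment choices (hypothetical)."""
    if choice == "local":
        return Path.cwd()  # run directly in the current working directory
    if choice == "venv":
        env_dir = Path(workdir) / ".venv"
        venv.create(env_dir, with_pip=True)  # isolated interpreter with pip
        return env_dir
    if choice == "cloud":
        # Delegation to a remote Agent Computer machine is out of scope here.
        raise NotImplementedError("remote execution not shown in this sketch")
    if choice == "plan-only":
        return None  # produce the replication plan, execute nothing
    raise ValueError(f"unsupported environment: {choice}")
```

Asking for this choice up front keeps the plan-only path a pure read, while the venv path guarantees the replication never pollutes the host environment.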
## Example
```
/replicate "chain-of-thought prompting improves math reasoning"
```
## Output
A replication plan covering:
- Key claims to verify
- Required resources (compute, data, models)
- Implementation details extracted from the paper
- Potential pitfalls and underspecified details
- Step-by-step replication procedure
- Success criteria
If an execution environment is selected, also produces runnable scripts and captured results.

---
title: Peer Review
description: Simulated peer review with severity-graded feedback
section: Workflows
order: 3
---
## Usage
```
/review <artifact>
```
## What it does
Simulates a tough-but-fair peer review for AI research artifacts. Evaluates novelty, empirical rigor, baselines, ablations, and reproducibility.
The reviewer agent identifies:
- Weak baselines
- Missing ablations
- Evaluation mismatches
- Benchmark leakage
- Under-specified implementation details
## Severity levels
Feedback is graded by severity:
- **FATAL** — Fundamental issues that invalidate the claims
- **MAJOR** — Significant problems that need addressing
- **MINOR** — Small improvements or clarifications
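One way to picture severity-graded triage is an ordered enum with a verdict rule. The findings below are invented, and the "any FATAL means reject" rule is an assumption for illustration, not the reviewer agent's actual policy:

```python
from enum import IntEnum

class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    FATAL = 3

# Hypothetical findings a reviewer agent might emit.
findings = [
    {"issue": "no ablation over context length", "severity": Severity.MAJOR},
    {"issue": "typo in Table 2 caption", "severity": Severity.MINOR},
    {"issue": "test set overlaps training data", "severity": Severity.FATAL},
]

def triage(findings):
    """Order findings most-severe first; any FATAL finding forces a reject."""
    ordered = sorted(findings, key=lambda f: f["severity"], reverse=True)
    verdict = "reject" if any(f["severity"] == Severity.FATAL for f in ordered) else "revise"
    return ordered, verdict

ordered, verdict = triage(findings)
```

Sorting by severity puts claim-invalidating problems at the top of the weaknesses section, where the verdict hinges on them.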
## Example
```
/review outputs/scaling-laws-brief.md
```
## Output
Structured review with:
- Summary of the work
- Strengths
- Weaknesses (severity-graded)
- Questions for the authors
- Verdict (accept / revise / reject)
- Revision plan

---
title: Watch
description: Recurring research monitoring
section: Workflows
order: 9
---
## Usage
```
/watch <topic>
```
## What it does
Schedules a recurring research watch. Sets a baseline of current knowledge and defines what constitutes a meaningful change worth reporting.
## Example
```
/watch new papers on test-time compute scaling
```
## How it works
1. Feynman establishes a baseline by surveying current sources
2. Defines change signals (new papers, updated results, new repos)
3. Schedules periodic checks via `pi-schedule-prompt`
4. Reports only when meaningful changes are detected
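The report-only-on-change behavior in step 4 reduces to comparing each check against the stored baseline. This is a minimal sketch with invented arXiv IDs; the actual scheduling is handled by `pi-schedule-prompt` and is not shown:

```python
def detect_changes(baseline_ids, latest_ids):
    """Report only items that are new relative to the stored baseline."""
    return sorted(set(latest_ids) - set(baseline_ids))

# Invented paper IDs standing in for the baseline survey and a later check.
baseline = {"2401.11111", "2402.22222"}
latest = {"2401.11111", "2402.22222", "2403.33333"}

print(detect_changes(baseline, latest))  # → ['2403.33333']
```

An empty difference means nothing meaningful changed, so the periodic check stays silent.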