Rename .pi to .feynman, rename citation agent to verifier, add website, skills, and docs

- Rename project config dir from .pi/ to .feynman/ (Pi supports this via piConfig.configDir)
- Rename citation agent to verifier across all prompts, agents, skills, and docs
- Add website with homepage and 24 doc pages (Astro + Tailwind)
- Add skills for all workflows (deep-research, lit, review, audit, replicate, compare, draft, autoresearch, watch, jobs, session-log, agentcomputer)
- Add Pi-native prompt frontmatter (args, section, topLevelCli) and read at runtime
- Remove sync-docs generation layer — docs are standalone
- Remove metadata/prompts.mjs and metadata/packages.mjs — not needed at runtime
- Rewrite README and homepage copy
- Add environment selection to /replicate before executing
- Add prompts/delegate.md and AGENTS.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Advait Paliwal
Date: 2026-03-23 17:35:35 -07:00
Parent: 406d50b3ff
Commit: f5570b4e5a
98 changed files with 9886 additions and 298 deletions

---
title: Code Audit
description: Compare paper claims against public codebases
section: Workflows
order: 4
---
## Usage
```
/audit <item>
```
## What it does
Compares claims made in a paper against its public codebase. Surfaces mismatches, missing experiments, and reproducibility risks.
## What it checks
- Do the reported hyperparameters match the code?
- Are all claimed experiments present in the repository?
- Does the training loop match the described methodology?
- Are there undocumented preprocessing steps?
- Do evaluation metrics match the paper's claims?
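The hyperparameter check above can be sketched as a simple per-key diff. This is a hypothetical illustration, not Feynman's actual implementation; the values and the `diff_hparams` helper are made up:

```python
# Hypothetical example: reported hyperparameters from a paper vs. those
# found in the public codebase. All values are invented for illustration.
paper_hparams = {"lr": 3e-4, "batch_size": 256, "warmup_steps": 1000}
code_hparams = {"lr": 1e-4, "batch_size": 256}

def diff_hparams(paper, code):
    """Return a per-key audit finding: match, mismatch, or missing from code."""
    findings = {}
    for key, claimed in paper.items():
        if key not in code:
            findings[key] = "missing from code"
        elif code[key] != claimed:
            findings[key] = f"mismatch: paper={claimed}, code={code[key]}"
        else:
            findings[key] = "match"
    return findings

for key, status in diff_hparams(paper_hparams, code_hparams).items():
    print(f"{key}: {status}")
```

Each finding maps directly onto a line of the claim-by-claim verification in the audit report.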
## Example
```
/audit 2401.12345
```
## Output
An audit report with:
- Claim-by-claim verification
- Identified mismatches
- Missing components
- Reproducibility risk assessment

---
title: Autoresearch
description: Autonomous experiment optimization loop
section: Workflows
order: 8
---
## Usage
```
/autoresearch <idea>
```
## What it does
Runs an autonomous experiment loop:
1. **Edit** — Modify code or configuration
2. **Commit** — Save the change
3. **Benchmark** — Run evaluation
4. **Evaluate** — Compare against baseline
5. **Keep or revert** — Persist improvements, roll back regressions
6. **Repeat** — Continue until the target is hit
## Tracking
Metrics are tracked in:
- `autoresearch.md` — Human-readable progress log
- `autoresearch.jsonl` — Machine-readable metrics over time
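The edit/benchmark/keep-or-revert loop and its JSONL log can be sketched as follows. This is a minimal toy, not the real agent: `run_benchmark` is a stand-in scoring function and the lr-doubling "edit" is invented for illustration:

```python
import json
import random

def run_benchmark(config):
    # Stand-in for a real evaluation run; peaks at lr = 3e-4.
    return 1.0 / (1.0 + abs(config["lr"] - 3e-4))

def autoresearch_loop(config, target, max_iters=5, log_path="autoresearch.jsonl"):
    """Edit -> benchmark -> evaluate -> keep or revert, until target or budget."""
    best = run_benchmark(config)
    history = []
    for step in range(max_iters):
        candidate = dict(config, lr=config["lr"] * random.choice([0.5, 2.0]))  # edit
        score = run_benchmark(candidate)                                       # benchmark
        keep = score > best                                                    # evaluate
        if keep:
            config, best = candidate, score                                    # keep (else revert)
        history.append({"step": step, "score": score, "kept": keep})
        if best >= target:
            break
    with open(log_path, "w") as f:  # machine-readable metrics over time
        for row in history:
            f.write(json.dumps(row) + "\n")
    return config, best

random.seed(0)
cfg, best = autoresearch_loop({"lr": 1e-3}, target=1.0)
```

Because a candidate is only kept when it beats the current best, the tracked score never regresses.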
## Controls
```
/autoresearch <idea> # start or resume
/autoresearch off # stop, keep data
/autoresearch clear # delete all state, start fresh
```
## Example
```
/autoresearch optimize the learning rate schedule for better convergence
```

---
title: Source Comparison
description: Compare multiple sources with agreement/disagreement matrix
section: Workflows
order: 6
---
## Usage
```
/compare <topic>
```
## What it does
Compares multiple sources on a topic. Builds an agreement/disagreement matrix showing where sources align and where they conflict.
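The agreement/disagreement matrix can be pictured as a claim-by-position table. This sketch is hypothetical: the sources, claims, and the `agreement_matrix` helper are invented to show the shape of the data, not how Feynman builds it internally:

```python
# Invented stances extracted per source on sub-questions of a topic.
stances = {
    "Paper A": {"RLHF needed": "yes", "scaling helps": "yes"},
    "Paper B": {"RLHF needed": "no",  "scaling helps": "yes"},
    "Blog C":  {"RLHF needed": "yes", "scaling helps": "unclear"},
}

def agreement_matrix(stances):
    """For each claim, bucket sources by position to surface alignment and conflict."""
    matrix = {}
    for source, claims in stances.items():
        for claim, position in claims.items():
            matrix.setdefault(claim, {}).setdefault(position, []).append(source)
    return matrix

for claim, positions in agreement_matrix(stances).items():
    status = "consensus" if len(positions) == 1 else "disagreement"
    print(f"{claim}: {status} {positions}")
```

A claim with a single position bucket is consensus; multiple buckets mark a disagreement worth synthesizing.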
## Example
```
/compare approaches to constitutional AI training
```
## Output
- Source-by-source breakdown
- Agreement/disagreement matrix
- Synthesis of key differences
- Assessment of which positions have stronger evidence

---
title: Deep Research
description: Thorough source-heavy investigation with parallel agents
section: Workflows
order: 1
---
## Usage
```
/deepresearch <topic>
```
## What it does
Deep research runs a thorough, source-heavy investigation. It plans the research scope, delegates to parallel researcher agents, synthesizes findings, and adds inline citations.
The workflow follows these steps:
1. **Plan** — Clarify the research question and identify search strategy
2. **Delegate** — Spawn parallel researcher agents to gather evidence from different source types (papers, web, repos)
3. **Synthesize** — Merge findings, resolve contradictions, identify gaps
4. **Cite** — Add inline citations and verify all source URLs
5. **Deliver** — Write a durable research brief to `outputs/`
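The delegate-then-synthesize fan-out can be sketched with `asyncio`. This is an illustrative shape only: the `researcher` coroutine is a stand-in for a real agent call, and the three source types mirror the list above:

```python
import asyncio

async def researcher(source_type, topic):
    # Stand-in for a real researcher agent querying one source type.
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"source": source_type, "findings": f"notes on {topic} from {source_type}"}

async def deep_research(topic):
    # Fan out one researcher per source type in parallel, then merge results.
    tasks = [researcher(s, topic) for s in ("papers", "web", "repos")]
    results = await asyncio.gather(*tasks)
    return {r["source"]: r["findings"] for r in results}

brief = asyncio.run(deep_research("scaling laws"))
```

Running the researchers concurrently is what lets the plan step delegate broadly without the total time growing with the number of source types.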
## Example
```
/deepresearch transformer scaling laws and their implications for compute-optimal training
```
## Output
Produces a structured research brief with:
- Executive summary
- Key findings organized by theme
- Evidence tables with source links
- Open questions and suggested next steps
- Numbered sources section with direct URLs

---
title: Draft Writing
description: Paper-style draft generation from research findings
section: Workflows
order: 7
---
## Usage
```
/draft <topic>
```
## What it does
Produces a paper-style draft with structured sections. Writes to `papers/`.
## Structure
The generated draft includes:
- Title
- Abstract
- Introduction / Background
- Method or Approach
- Evidence and Analysis
- Limitations
- Conclusion
- Sources
## Example
```
/draft survey of differentiable physics simulators
```
The writer agent works only from supplied evidence — it never fabricates content. If evidence is insufficient, it explicitly notes the gaps.

---
title: Literature Review
description: Map consensus, disagreements, and open questions
section: Workflows
order: 2
---
## Usage
```
/lit <topic>
```
## What it does
Runs a structured literature review that searches across academic papers and web sources. Explicitly separates consensus findings from disagreements and open questions.
## Example
```
/lit multimodal reasoning benchmarks for large language models
```
## Output
A structured review covering:
- **Consensus** — What the field agrees on
- **Disagreements** — Where sources conflict
- **Open questions** — What remains unresolved
- **Sources** — Direct links to all referenced papers and articles

---
title: Replication
description: Plan replications of papers and claims
section: Workflows
order: 5
---
## Usage
```
/replicate <paper or claim>
```
## What it does
Extracts key implementation details from a paper, identifies what's needed to replicate the results, and asks where to run before executing anything.
Before running code, Feynman asks you to choose an execution environment:
- **Local** — run in the current working directory
- **Virtual environment** — create an isolated venv/conda env first
- **Cloud** — delegate to a remote Agent Computer machine
- **Plan only** — produce the replication plan without executing
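The environment choices above can be pictured as a simple dispatch. This is a hypothetical sketch using only the standard library's `venv` module; it is not Feynman's actual mechanism, and the cloud branch is deliberately left abstract:

```python
import venv
from pathlib import Path

def prepare_environment(choice, workdir="replication"):
    """Illustrative dispatch over the four environment choices (hypothetical)."""
    if choice == "local":
        return Path.cwd()  # run directly in the current working directory
    if choice == "venv":
        env_dir = Path(workdir) / ".venv"
        venv.create(env_dir, with_pip=True)  # isolated interpreter with pip
        return env_dir
    if choice == "cloud":
        # Delegation to a remote Agent Computer machine is out of scope here.
        raise NotImplementedError("remote execution not shown in this sketch")
    if choice == "plan-only":
        return None  # produce the replication plan, execute nothing
    raise ValueError(f"unsupported environment: {choice}")
```

Asking for this choice up front keeps the plan-only path a pure read, while the venv path guarantees the replication never pollutes the host environment.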
## Example
```
/replicate "chain-of-thought prompting improves math reasoning"
```
## Output
A replication plan covering:
- Key claims to verify
- Required resources (compute, data, models)
- Implementation details extracted from the paper
- Potential pitfalls and underspecified details
- Step-by-step replication procedure
- Success criteria
If an execution environment is selected, also produces runnable scripts and captured results.

---
title: Peer Review
description: Simulated peer review with severity-graded feedback
section: Workflows
order: 3
---
## Usage
```
/review <artifact>
```
## What it does
Simulates a tough-but-fair peer review for AI research artifacts. Evaluates novelty, empirical rigor, baselines, ablations, and reproducibility.
The reviewer agent identifies:
- Weak baselines
- Missing ablations
- Evaluation mismatches
- Benchmark leakage
- Under-specified implementation details
## Severity levels
Feedback is graded by severity:
- **FATAL** — Fundamental issues that invalidate the claims
- **MAJOR** — Significant problems that need addressing
- **MINOR** — Small improvements or clarifications
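One way to picture severity-graded triage is an ordered enum with a verdict rule. The findings below are invented, and the "any FATAL means reject" rule is an assumption for illustration, not the reviewer agent's actual policy:

```python
from enum import IntEnum

class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    FATAL = 3

# Hypothetical findings a reviewer agent might emit.
findings = [
    {"issue": "no ablation over context length", "severity": Severity.MAJOR},
    {"issue": "typo in Table 2 caption", "severity": Severity.MINOR},
    {"issue": "test set overlaps training data", "severity": Severity.FATAL},
]

def triage(findings):
    """Order findings most-severe first; any FATAL finding forces a reject."""
    ordered = sorted(findings, key=lambda f: f["severity"], reverse=True)
    verdict = "reject" if any(f["severity"] == Severity.FATAL for f in ordered) else "revise"
    return ordered, verdict

ordered, verdict = triage(findings)
```

Sorting by severity puts claim-invalidating problems at the top of the weaknesses section, where the verdict hinges on them.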
## Example
```
/review outputs/scaling-laws-brief.md
```
## Output
Structured review with:
- Summary of the work
- Strengths
- Weaknesses (severity-graded)
- Questions for the authors
- Verdict (accept / revise / reject)
- Revision plan

---
title: Watch
description: Recurring research monitoring
section: Workflows
order: 9
---
## Usage
```
/watch <topic>
```
## What it does
Schedules a recurring research watch. Sets a baseline of current knowledge and defines what constitutes a meaningful change worth reporting.
## Example
```
/watch new papers on test-time compute scaling
```
## How it works
1. Feynman establishes a baseline by surveying current sources
2. Defines change signals (new papers, updated results, new repos)
3. Schedules periodic checks via `pi-schedule-prompt`
4. Reports only when meaningful changes are detected
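The report-only-on-change behavior in step 4 reduces to comparing each check against the stored baseline. This is a minimal sketch with invented arXiv IDs; the actual scheduling is handled by `pi-schedule-prompt` and is not shown:

```python
def detect_changes(baseline_ids, latest_ids):
    """Report only items that are new relative to the stored baseline."""
    return sorted(set(latest_ids) - set(baseline_ids))

# Invented paper IDs standing in for the baseline survey and a later check.
baseline = {"2401.11111", "2402.22222"}
latest = {"2401.11111", "2402.22222", "2403.33333"}

print(detect_changes(baseline, latest))  # → ['2403.33333']
```

An empty difference means nothing meaningful changed, so the periodic check stays silent.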