feynman/replicate.md at 85e0c4d8c4cfb955c50b91bcb4e179c5ec745467


Advait Paliwal 7024a86024 Replace Pi tool registrations with skills and CLI integration

- Remove all manually registered Pi tools (alpha_search, alpha_get_paper,
  alpha_ask_paper, alpha_annotate_paper, alpha_list_annotations,
  alpha_read_code, session_search, preview_file) and their wrappers
  (alpha.ts, preview.ts, session-search.ts, alpha-tools.test.ts)
- Add Pi skill files for alpha-research, session-search, preview,
  modal-compute, and runpod-compute in skills/
- Sync skills to ~/.feynman/agent/skills/ on startup via syncBundledAssets
- Add node_modules/.bin to Pi subprocess PATH so alpha CLI is accessible
- Add /outputs extension command to browse research artifacts via dialog
- Add Modal and RunPod as execution environments in /replicate and
  /autoresearch prompts
- Remove redundant /alpha-login /alpha-logout /alpha-status REPL commands
  (feynman alpha CLI still works)
- Update README, researcher agent, metadata, and website docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-25 00:38:45 -07:00

2.3 KiB



description	args	section	topLevelCli
Plan or execute a replication workflow for a paper, claim, or benchmark.	<paper>	Research Workflows	true

Design a replication plan for: $@

Workflow

1. Extract: Use the researcher subagent to pull implementation details from the target paper and any linked code. If `CHANGELOG.md` exists, read the most recent relevant entries before planning or resuming.
2. Plan: Determine what code, datasets, metrics, and environment are needed. Be explicit about what is verified, what is inferred, what is still missing, and which checks or test oracles will be used to decide whether the replication succeeded.
3. Environment: Before running anything, ask the user where to execute:
   - Local: run in the current working directory
   - Virtual environment: create an isolated venv/conda env first
   - Docker: run experiment code inside an isolated Docker container
   - Modal: run on Modal's serverless GPU infrastructure. Write a Modal-decorated Python script and execute with `modal run <script.py>`. Best for burst GPU jobs that don't need persistent state. Requires the `modal` CLI (`pip install modal && modal setup`).
   - RunPod: provision a GPU pod on RunPod and SSH in for execution. Use `runpodctl` to create pods, transfer files, and manage lifecycle. Best for long-running experiments or when you need SSH access and persistent storage. Requires the `runpodctl` CLI and `RUNPOD_API_KEY`.
   - Plan only: produce the replication plan without executing
4. Execute: If the user chose an execution environment, implement and run the replication steps there. Save notes, scripts, raw outputs, and results to disk in a reproducible layout. Do not call the outcome replicated unless the planned checks actually passed.
5. Log: For multi-step or resumable replication work, append concise entries to `CHANGELOG.md` after meaningful progress, failed attempts, major verification outcomes, and before stopping. Record the active objective, what changed, what was checked, and the next step.
6. Report: End with a Sources section containing paper and repository URLs.
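
As an illustration of the Log step, the following is a minimal sketch of a helper that appends one entry to `CHANGELOG.md`. The four fields mirror what the step asks to record (active objective, what changed, what was checked, next step). The names `LogEntry`, `format_entry`, and `append_entry`, and the timestamped heading format, are assumptions made for this example and are not part of Feynman.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class LogEntry:
    """Hypothetical record of one changelog entry."""
    objective: str  # the active objective
    changed: str    # what changed since the previous entry
    checked: str    # what was verified, and with what outcome
    next_step: str  # where to pick up when the work resumes


def format_entry(entry: LogEntry, when: datetime) -> str:
    """Render an entry as a short block (the layout is an assumption)."""
    stamp = when.strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"## {stamp}\n"
        f"- Objective: {entry.objective}\n"
        f"- Changed: {entry.changed}\n"
        f"- Checked: {entry.checked}\n"
        f"- Next: {entry.next_step}\n"
    )


def append_entry(path: Path, entry: LogEntry,
                 when: Optional[datetime] = None) -> None:
    """Append to the log; earlier entries are never rewritten."""
    when = when or datetime.now(timezone.utc)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(format_entry(entry, when) + "\n")
```

Opening the file in append mode keeps failed attempts in the history, so a resumed session can read the latest entries and see what was already tried.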

Do not install packages, run training, or execute experiments without confirming the execution environment first.
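
The confirmation rule above can be sketched as a small gate. This is an illustrative sketch only: the `ExecEnv` enum, `parse_choice`, and `may_execute` are hypothetical names, and the option strings are taken from the environment list in the workflow rather than from any Feynman API.

```python
from enum import Enum
from typing import Optional


class ExecEnv(Enum):
    """Execution choices offered in the Environment step."""
    LOCAL = "local"
    VENV = "virtual environment"
    DOCKER = "docker"
    MODAL = "modal"
    RUNPOD = "runpod"
    PLAN_ONLY = "plan only"


def parse_choice(answer: str) -> Optional[ExecEnv]:
    """Map the user's reply to an environment, or None if unrecognized."""
    normalized = answer.strip().lower()
    for env in ExecEnv:
        if normalized == env.value:
            return env
    return None


def may_execute(confirmed: Optional[ExecEnv]) -> bool:
    """Allow installs, training, or experiments only after the user has
    confirmed an environment that actually executes something."""
    return confirmed is not None and confirmed is not ExecEnv.PLAN_ONLY
```

An unrecognized or missing answer yields `None`, which `may_execute` treats the same as no confirmation, so the default is to do nothing until the user picks an environment.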
