feynman/prompts/autoresearch.md
Advait Paliwal 7024a86024 Replace Pi tool registrations with skills and CLI integration
- Remove all manually registered Pi tools (alpha_search, alpha_get_paper,
  alpha_ask_paper, alpha_annotate_paper, alpha_list_annotations,
  alpha_read_code, session_search, preview_file) and their wrappers
  (alpha.ts, preview.ts, session-search.ts, alpha-tools.test.ts)
- Add Pi skill files for alpha-research, session-search, preview,
  modal-compute, and runpod-compute in skills/
- Sync skills to ~/.feynman/agent/skills/ on startup via syncBundledAssets
- Add node_modules/.bin to Pi subprocess PATH so alpha CLI is accessible
- Add /outputs extension command to browse research artifacts via dialog
- Add Modal and RunPod as execution environments in /replicate and
  /autoresearch prompts
- Remove redundant /alpha-login /alpha-logout /alpha-status REPL commands
  (feynman alpha CLI still works)
- Update README, researcher agent, metadata, and website docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 00:38:45 -07:00


---
description: Autonomous experiment loop — try ideas, measure results, keep what works, discard what doesn't, repeat.
args: <idea>
section: Research Workflows
topLevelCli: true
---
Start an autoresearch optimization loop for: $@
This command uses pi-autoresearch.
## Step 1: Gather
If `autoresearch.md` and `autoresearch.jsonl` already exist, ask the user if they want to resume or start fresh.
If `CHANGELOG.md` exists, read the most recent relevant entries before resuming.
Otherwise, collect the following from the user before doing anything else:
- What to optimize (test speed, bundle size, training loss, build time, etc.)
- The benchmark command to run
- The metric name, unit, and direction (lower/higher is better)
- Files in scope for changes
- Maximum number of iterations (default: 20)
## Step 2: Environment
Ask the user where to run:
- **Local** — run in the current working directory
- **New git branch** — create a branch so main stays clean
- **Virtual environment** — create an isolated venv/conda env first
- **Docker** — run experiment code inside an isolated Docker container
- **Modal** — run on Modal's serverless GPU infrastructure. Write Modal-decorated scripts and execute with `modal run`. Best for GPU-heavy benchmarks with no persistent state between iterations. Requires `modal` CLI.
- **RunPod** — provision a GPU pod via `runpodctl` and run iterations there over SSH. Best for experiments needing persistent state, large datasets, or SSH access between iterations. Requires `runpodctl` CLI.
Do not proceed without a clear answer.
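A minimal sketch of the Modal path, assuming a hypothetical `bench.py` wrapper: the app name, image contents, GPU type, and `train.py` benchmark command are placeholders, not names from this repo.
```python
import modal

# Placeholder names throughout — adapt the app name, image, GPU type,
# and benchmark command to the actual experiment from Step 1.
app = modal.App("autoresearch-bench")
image = modal.Image.debian_slim().pip_install("torch")

@app.function(gpu="A10G", image=image, timeout=600)
def run_benchmark() -> float:
    """Run one benchmark iteration remotely; return wall-clock seconds."""
    import subprocess
    import time

    start = time.monotonic()
    # Assumes the project code has been baked into `image`; `train.py`
    # stands in for the user's benchmark command.
    subprocess.run(["python", "train.py"], check=True)
    return time.monotonic() - start

@app.local_entrypoint()
def main() -> None:
    # `modal run bench.py` runs main() locally and run_benchmark() on Modal.
    print(f"wall_clock_seconds={run_benchmark.remote():.2f}")
```
This matches the "no persistent state between iterations" note above: each `modal run` starts from a fresh container, so anything worth keeping must be returned or written to a volume.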
## Step 3: Confirm
Present the full plan to the user before starting:
```
Optimization target: [metric] ([direction])
Benchmark command: [command]
Files in scope: [files]
Environment: [chosen environment]
Max iterations: [N]
```
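For example, a filled-in plan might read (values illustrative only):
```
Optimization target: test_duration_seconds (lower is better)
Benchmark command: pytest -q tests/
Files in scope: src/**/*.py
Environment: New git branch
Max iterations: 20
```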
Ask the user to confirm. Do not start the loop without explicit approval.
## Step 4: Run
Initialize the session: create `autoresearch.md`, `autoresearch.sh`, run the baseline, and start looping.
Each iteration: edit → commit → `run_experiment` → `log_experiment` → keep or revert → repeat. Do not stop unless interrupted or `maxIterations` is reached.
After the baseline and after meaningful iteration milestones, append a concise entry to `CHANGELOG.md` summarizing what changed, what metric result was observed, what failed, and the next step.
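A `CHANGELOG.md` entry covering those four points might look like (contents and numbers are hypothetical):
```
## Iteration 7
Changed: cached the tokenizer between runs
Result: test_duration_seconds 41.2 → 33.5 (kept)
Failed: parallel test workers (flaky, reverted)
Next: profile the slowest fixture
```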
## Key tools
- `init_experiment` — one-time session config (name, metric, unit, direction)
- `run_experiment` — run the benchmark command, capture output and wall-clock time
- `log_experiment` — record result, auto-commit, update dashboard
## Subcommands
- `/autoresearch <text>` — start or resume the loop
- `/autoresearch off` — stop the loop, keep data
- `/autoresearch clear` — delete all state and start fresh