Files
feynman/prompts/autoresearch.md
Advait Paliwal 7024a86024 Replace Pi tool registrations with skills and CLI integration
- Remove all manually registered Pi tools (alpha_search, alpha_get_paper,
  alpha_ask_paper, alpha_annotate_paper, alpha_list_annotations,
  alpha_read_code, session_search, preview_file) and their wrappers
  (alpha.ts, preview.ts, session-search.ts, alpha-tools.test.ts)
- Add Pi skill files for alpha-research, session-search, preview,
  modal-compute, and runpod-compute in skills/
- Sync skills to ~/.feynman/agent/skills/ on startup via syncBundledAssets
- Add node_modules/.bin to Pi subprocess PATH so alpha CLI is accessible
- Add /outputs extension command to browse research artifacts via dialog
- Add Modal and RunPod as execution environments in /replicate and
  /autoresearch prompts
- Remove redundant /alpha-login /alpha-logout /alpha-status REPL commands
  (feynman alpha CLI still works)
- Update README, researcher agent, metadata, and website docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 00:38:45 -07:00


| description | args | section | topLevelCli |
| --- | --- | --- | --- |
| Autonomous experiment loop — try ideas, measure results, keep what works, discard what doesn't, repeat. | `<idea>` | Research Workflows | true |

Start an autoresearch optimization loop for: $@

This command uses pi-autoresearch.

Step 1: Gather

If autoresearch.md and autoresearch.jsonl already exist, ask the user if they want to resume or start fresh. If CHANGELOG.md exists, read the most recent relevant entries before resuming.

Otherwise, collect the following from the user before doing anything else:

  • What to optimize (test speed, bundle size, training loss, build time, etc.)
  • The benchmark command to run
  • The metric name, unit, and direction (lower/higher is better)
  • Files in scope for changes
  • Maximum number of iterations (default: 20)
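
The answers gathered above amount to a small session config. A minimal sketch, assuming a dict-shaped config (the field names here are illustrative, not the tool's actual schema):

```python
# Illustrative session config assembled from the user's answers in Step 1.
# Field names are hypothetical -- the real schema belongs to init_experiment.
session = {
    "name": "bundle-size",
    "metric": "bundle_size",
    "unit": "KiB",
    "direction": "lower",              # lower is better
    "benchmark_cmd": "npm run build && du -sk dist",
    "files_in_scope": ["src/**/*.ts"],
    "max_iterations": 20,              # the documented default
}

assert session["direction"] in ("lower", "higher")
```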

Step 2: Environment

Ask the user where to run:

  • Local — run in the current working directory
  • New git branch — create a branch so main stays clean
  • Virtual environment — create an isolated venv/conda env first
  • Docker — run experiment code inside an isolated Docker container
  • Modal — run on Modal's serverless GPU infrastructure. Write Modal-decorated scripts and execute with modal run. Best for GPU-heavy benchmarks with no persistent state between iterations. Requires modal CLI.
  • RunPod — provision a GPU pod via runpodctl and run iterations there over SSH. Best for experiments needing persistent state, large datasets, or SSH access between iterations. Requires runpodctl CLI.

Do not proceed without a clear answer.

Step 3: Confirm

Present the full plan to the user before starting:

Optimization target: [metric] ([direction])
Benchmark command:   [command]
Files in scope:      [files]
Environment:         [chosen environment]
Max iterations:      [N]

Ask the user to confirm. Do not start the loop without explicit approval.

Step 4: Run

Initialize the session: create autoresearch.md, autoresearch.sh, run the baseline, and start looping.

Each iteration: edit → commit → run_experiment → log_experiment → keep or revert → repeat. Do not stop unless interrupted or maxIterations is reached. After the baseline and after meaningful iteration milestones, append a concise entry to CHANGELOG.md summarizing what changed, what metric result was observed, what failed, and the next step.
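
The keep-or-revert discipline can be sketched as a minimal loop. The `run` and `revert` hooks below are stand-ins for the edit/benchmark step and the git rollback; nothing here is the tool's actual implementation.

```python
# Minimal sketch of the iteration loop: an idea is kept only if it improves
# the metric; otherwise the change is reverted. run() stands in for
# edit + run_experiment + log_experiment; revert() stands in for git revert.
def optimize(run, revert, baseline: float, max_iterations: int = 20,
             lower_is_better: bool = True) -> float:
    best = baseline
    for i in range(max_iterations):
        result = run(i)                # try one idea and benchmark it
        improved = result < best if lower_is_better else result > best
        if improved:
            best = result              # keep: the commit stays
        else:
            revert(i)                  # discard: roll the edit back
    return best
```

Note that the loop never terminates early on a bad iteration; it only reverts and moves on, matching the "do not stop unless interrupted" rule above.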

Key tools

  • init_experiment — one-time session config (name, metric, unit, direction)
  • run_experiment — run the benchmark command, capture output and wall-clock time
  • log_experiment — record result, auto-commit, update dashboard
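
A run_experiment-style helper might look like the following: a sketch of the behavior described above (run the command, capture output, measure wall-clock time), not the tool's actual code, and the return shape is illustrative.

```python
import subprocess
import time

# Sketch of run_experiment's described behavior: run the benchmark command,
# capture stdout/stderr, and record wall-clock time. Return shape is
# illustrative, not the tool's real schema.
def run_benchmark(cmd: str) -> dict:
    start = time.monotonic()
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "returncode": proc.returncode,
        "wall_clock_s": elapsed,
    }
```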

Subcommands

  • /autoresearch <text> — start or resume the loop
  • /autoresearch off — stop the loop, keep data
  • /autoresearch clear — delete all state and start fresh