50 lines
1.4 KiB
Markdown
50 lines
1.4 KiB
Markdown
---
|
|
name: experiment-design
|
|
description: Use this when the task is to turn a vague research idea into a testable experiment, define metrics, choose baselines, or plan ablations.
|
|
---
|
|
|
|
# Experiment Design
|
|
|
|
## When To Use
|
|
|
|
Use this skill when the user has:
|
|
- a hypothesis to test
|
|
- a method to evaluate
|
|
- an unclear benchmark plan
|
|
- a need for baselines, ablations, or metrics
|
|
|
|
## Procedure
|
|
|
|
1. Restate the research question as a falsifiable claim.
|
|
2. Define:
|
|
- independent variables
|
|
- dependent variables
|
|
- success metrics
|
|
- baselines
|
|
- constraints
|
|
3. Search for prior work first.
|
|
4. If the setup is tied to current products, APIs, model offerings, pricing, or market behavior, use `web_search` and `fetch_content` first.
|
|
5. Use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for academic baselines and prior experiments.
|
|
6. Prefer the smallest experiment that can meaningfully reduce uncertainty.
|
|
7. List confounders and failure modes up front.
|
|
8. If implementation is requested, create the scripts, configs, and logging plan.
|
|
9. Write the plan to disk before running expensive work.
|
|
|
|
## Pitfalls
|
|
|
|
- Avoid experiments with no baseline.
|
|
- Avoid metrics that do not connect to the claim.
|
|
- Avoid ablations that change multiple variables at once.
|
|
- Avoid broad plans that cannot be executed with the current environment.
|
|
|
|
## Deliverable
|
|
|
|
Produce:
|
|
- hypothesis
|
|
- setup
|
|
- baselines
|
|
- metrics
|
|
- ablations
|
|
- risks
|
|
- next action
|