| name | description |
|---|---|
| experiment-design | Use this when the task is to turn a vague research idea into a testable experiment, define metrics, choose baselines, or plan ablations. |
Experiment Design
When To Use
Use this skill when the user has:
- a hypothesis to test
- a method to evaluate
- an unclear benchmark plan
- a need for baselines, ablations, or metrics
Procedure
- Restate the research question as a falsifiable claim.
- Define:
  - independent variables
  - dependent variables
  - success metrics
  - baselines
  - constraints
- Search for prior work first.
  - If the setup is tied to current products, APIs, model offerings, pricing, or market behavior, use `web_search` and `fetch_content` first.
  - Use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for academic baselines and prior experiments.
- Prefer the smallest experiment that can meaningfully reduce uncertainty.
- List confounders and failure modes up front.
- If implementation is requested, create the scripts, configs, and logging plan.
- Write the plan to disk before running expensive work.
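One way to make the "define" and "write the plan to disk" steps concrete is a small plan record that refuses to be saved while required fields are empty. This is a minimal sketch, not a prescribed format: `ExperimentPlan`, `missing_fields`, and `write_plan` are hypothetical names, and JSON is an assumed storage choice.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ExperimentPlan:
    """Hypothetical container for the items the procedure asks you to define."""

    claim: str  # the research question restated as a falsifiable claim
    independent_variables: list[str]
    dependent_variables: list[str]
    success_metrics: list[str]
    baselines: list[str]
    constraints: list[str] = field(default_factory=list)
    confounders: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        required = (
            "claim",
            "independent_variables",
            "dependent_variables",
            "success_metrics",
            "baselines",
        )
        return [name for name in required if not getattr(self, name)]


def write_plan(plan: ExperimentPlan, path: Path) -> Path:
    """Persist the plan as JSON, refusing to write an incomplete plan."""
    missing = plan.missing_fields()
    if missing:
        raise ValueError(f"plan is incomplete: {missing}")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(plan), indent=2))
    return path
```

Calling `write_plan(plan, Path("plans/experiment_plan.json"))` (the path is an assumption) before launching any expensive run leaves a record that can later be compared against what was actually executed.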
Pitfalls
- Avoid experiments with no baseline.
- Avoid metrics that do not connect to the claim.
- Avoid ablations that change multiple variables at once.
- Avoid broad plans that cannot be executed with the current environment.
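The multiple-variables pitfall can be checked mechanically when ablations are expressed as config dictionaries. The sketch below assumes flat configs; `changed_keys` and `check_ablations` are hypothetical helpers, and the config keys in the usage note are illustrative only.

```python
_MISSING = object()  # sentinel so an absent key differs from an explicit None


def changed_keys(baseline: dict, variant: dict) -> list[str]:
    """Return the sorted keys whose values differ between two flat configs."""
    keys = set(baseline) | set(variant)
    return sorted(
        k for k in keys
        if baseline.get(k, _MISSING) != variant.get(k, _MISSING)
    )


def check_ablations(baseline: dict, ablations: dict[str, dict]) -> dict[str, list[str]]:
    """Return the ablations that do not differ from the baseline in exactly one key."""
    problems = {}
    for name, config in ablations.items():
        diff = changed_keys(baseline, config)
        if len(diff) != 1:
            problems[name] = diff
    return problems
```

For example, with a baseline of `{"lr": 0.1, "dropout": 0.5}`, a variant that changes only `dropout` passes, while one that changes both `lr` and `dropout` is reported together with the keys it touched.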
Deliverable
Produce:
- hypothesis
- setup
- baselines
- metrics
- ablations
- risks
- next action
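A completeness check on the deliverable might look like the following sketch. The section names come from the list above; `render_deliverable` and the plain `name: text` output layout are assumptions, not a required format.

```python
SECTIONS = (
    "hypothesis",
    "setup",
    "baselines",
    "metrics",
    "ablations",
    "risks",
    "next action",
)


def render_deliverable(content: dict[str, str]) -> str:
    """Render the seven sections in order, failing if any is missing or blank."""
    missing = [name for name in SECTIONS if not content.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return "\n".join(f"{name}: {content[name].strip()}" for name in SECTIONS)
```

Failing loudly on a blank section keeps a plan with no stated risks or no next action from being handed over as finished.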