Overhaul Feynman harness: streamline agents, prompts, and extensions

Remove legacy chains, skills, and config modules. Add citation agent,
SYSTEM.md, modular research-tools extension, and web-access layer.
Add ralph-wiggum to Pi package stack for long-running loops.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Advait Paliwal
2026-03-23 14:59:30 -07:00
parent d23e679331
commit 406d50b3ff
60 changed files with 2994 additions and 3191 deletions

@@ -1,56 +0,0 @@
---
name: autoresearch
description: Use this when the user wants an end-to-end idea-to-paper run, from problem framing through literature review, experiments where feasible, and a paper-style draft.
---
# AutoResearch
## When To Use
Use this skill when the user wants:
- an idea turned into a paper-style draft
- a full research workflow, not just a memo or reading list
- autonomous progress from topic framing to deliverable
## Procedure
1. Restate the idea as a concrete research question and identify the likely contribution type:
- empirical result
- synthesis or review
- method proposal
- benchmark or audit
2. Search for relevant primary sources first.
3. If the topic is current, product-oriented, or market-facing, or the user asks about the latest developments, start with `web_search` and `fetch_content`.
4. Use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for academic background or paper-centric parts of the topic.
5. Build a compact evidence table in `notes/` or `outputs/` before deciding on the paper narrative.
6. Decide whether experiments are feasible in the current environment:
- if yes, design and run the smallest experiment that materially reduces uncertainty
- if no, continue with a literature-grounded or theory-grounded draft and state the limitation clearly
7. Produce at least two artifacts:
- an intermediate artifact (research memo, evidence table, or experiment log)
- a final paper-style draft in `papers/`
8. Structure the final draft with:
- title
- abstract
- introduction
- related work
- method or synthesis
- evidence or experiments
- limitations
- conclusion
9. End with a `Sources` section containing direct URLs for every source used.
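The section list in steps 8 and 9 can be scaffolded up front so no required section is silently dropped. A minimal sketch; the helper name and `TODO` placeholder are my own, not part of the harness:

```python
# Draft skeleton matching the structure required in step 8, plus the
# Sources section required by step 9.
DRAFT_SECTIONS = [
    "Abstract", "Introduction", "Related Work", "Method or Synthesis",
    "Evidence or Experiments", "Limitations", "Conclusion", "Sources",
]

def scaffold_draft(title: str) -> str:
    """Return a markdown skeleton with every required section present."""
    lines = [f"# {title}", ""]
    for section in DRAFT_SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)
```

Writing the skeleton into `papers/` first, then filling sections in, also leaves a useful intermediate artifact if the run is interrupted.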
## Pitfalls
- Do not jump straight to drafting before checking the literature.
- Do not treat a current topic as if papers alone are enough.
- Do not fake experiments when the environment cannot support them.
- Do not present speculative contributions as established results.
- Do not omit limitations or missing validation.
## Deliverable
A complete idea-to-paper run should leave behind:
- one intermediate artifact in `notes/` or `outputs/`
- one final paper-style draft in `papers/`
- a source list with direct URLs

@@ -1,39 +0,0 @@
---
name: context-recall
description: Use this when the user asks what was done before, refers to earlier sessions, wants prior artifacts, or expects Feynman to remember past work.
---
# Context Recall
## When To Use
Use this skill when the user:
- asks what was done previously
- refers to an earlier paper, memo, or artifact
- expects cross-session continuity
- asks what has already been tried or written
## Procedure
1. Read durable memory first with `memory_search` or `memory_lessons`.
2. Search prior sessions with `session_search`.
3. If needed, inspect the current workspace for artifacts in `outputs/`, `notes/`, `experiments/`, and `papers/`.
4. Distinguish clearly between:
- durable remembered facts
- session transcript recall
- currently present files on disk
5. If you find a stable correction or preference that should persist, save it with `memory_remember`.
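Step 4's three-way distinction can be enforced mechanically by tagging every recalled claim with its provenance. A minimal Python sketch; the function and field names are hypothetical, not harness APIs:

```python
# Tag each recalled claim with where its evidence came from, so durable
# memory, session transcripts, and files on disk are never conflated.
VALID_PROVENANCE = {"durable_memory", "session_transcript", "workspace_file"}

def recall_report(claims: list[dict]) -> str:
    """Render recalled claims, each with an explicit evidence source."""
    lines = []
    for c in claims:
        if c["provenance"] not in VALID_PROVENANCE:
            raise ValueError(f"unknown provenance: {c['provenance']!r}")
        lines.append(f"- {c['claim']} [from: {c['provenance']}]")
    return "\n".join(lines)
```

Rejecting unknown provenance values, rather than defaulting, matches the pitfall about never claiming memory without checking.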
## Pitfalls
- Do not claim to remember something without checking memory or session history.
- Do not confuse durable memory with transient task progress.
- Do not summarize prior work from vague impressions; recover evidence first.
## Deliverable
Include:
- what was previously done
- where the evidence came from
- which artifacts or files exist now
- any gaps or uncertainty

@@ -1,54 +0,0 @@
---
name: deep-research
description: Use this when the user wants a broad, thorough investigation with strong sourcing, explicit evidence tables, and a durable research brief.
---
# Deep Research
## When To Use
Use this skill when the user wants:
- a thorough investigation rather than a quick memo
- a broad landscape analysis
- careful source comparison across multiple source types
- a durable research brief with explicit evidence
## Procedure
1. Clarify the exact scope and what decision or question the research should support.
2. Choose the right retrieval mix:
- use `web_search` and `fetch_content` first for current, product, market, regulatory, or latest topics
- use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for academic background or paper-centric claims
- use both when the topic spans current reality and academic literature
3. Gather enough high-quality sources before synthesizing.
4. Build an evidence table covering:
- source
- claim
- evidence type
- caveats
- relevance
5. Synthesize:
- strongest findings
- disagreements
- open questions
- what would change the conclusion
6. Save a durable markdown brief to `outputs/`.
7. End with a `Sources` section containing direct URLs for every source used.
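The evidence table in step 4 can be rendered directly as markdown for the brief in `outputs/`. A small sketch, assuming one dict per source; the column set is the one listed above:

```python
# Render the step-4 evidence table as a markdown table, one row per source.
COLUMNS = ["source", "claim", "evidence type", "caveats", "relevance"]

def evidence_table(rows: list[dict]) -> str:
    header = "| " + " | ".join(COLUMNS) + " |"
    rule = "|" + "|".join([" --- "] * len(COLUMNS)) + "|"
    body = ["| " + " | ".join(str(row.get(col, "")) for col in COLUMNS) + " |"
            for row in rows]
    return "\n".join([header, rule] + body)
```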
## Pitfalls
- Do not answer a current topic from papers alone.
- Do not answer an academic topic from search snippets alone.
- Do not collapse disagreement into fake consensus.
- Do not omit the evidence table on broad or high-stakes topics.
## Deliverable
Include:
- scope
- evidence table
- key findings
- disagreements or caveats
- open questions
- recommendation or next step
- sources

@@ -1,49 +0,0 @@
---
name: experiment-design
description: Use this when the task is to turn a vague research idea into a testable experiment, define metrics, choose baselines, or plan ablations.
---
# Experiment Design
## When To Use
Use this skill when the user has:
- a hypothesis to test
- a method to evaluate
- an unclear benchmark plan
- a need for baselines, ablations, or metrics
## Procedure
1. Restate the research question as a falsifiable claim.
2. Define:
- independent variables
- dependent variables
- success metrics
- baselines
- constraints
3. Search for prior work first.
4. If the setup is tied to current products, APIs, model offerings, pricing, or market behavior, use `web_search` and `fetch_content` first.
5. Use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for academic baselines and prior experiments.
6. Prefer the smallest experiment that can meaningfully reduce uncertainty.
7. List confounders and failure modes up front.
8. If implementation is requested, create the scripts, configs, and logging plan.
9. Write the plan to disk before running expensive work.
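The plan from step 2 can be captured as a structured object and checked against the pitfalls before anything runs. A sketch with hypothetical names; the fields mirror the list in step 2:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentPlan:
    """Plan written to disk (step 9) before any expensive run."""
    hypothesis: str
    independent_vars: list[str]
    dependent_vars: list[str]
    metrics: list[str]
    baselines: list[str]
    confounders: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Flag the failure modes listed under Pitfalls."""
        found = []
        if not self.baselines:
            found.append("no baseline")
        if not self.metrics:
            found.append("no metric connected to the claim")
        return found
```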
## Pitfalls
- Avoid experiments with no baseline.
- Avoid metrics that do not connect to the claim.
- Avoid ablations that change multiple variables at once.
- Avoid broad plans that cannot be executed with the current environment.
## Deliverable
Produce:
- hypothesis
- setup
- baselines
- metrics
- ablations
- risks
- next action

@@ -1,57 +0,0 @@
---
name: literature-review
description: Use this when the task is to survey prior work, compare papers, synthesize a field, or build a reading list grounded in primary sources.
---
# Literature Review
## When To Use
Use this skill when the user wants:
- a research overview
- a paper shortlist
- a comparison of methods
- a synthesis of consensus and disagreement
- a source-backed brief on a topic
## Procedure
1. Search broadly first.
2. If the topic is primarily academic or paper-centric, start with `alpha_search`.
3. If the topic includes current products, companies, markets, software, or "latest/current" framing, start with `web_search` and `fetch_content`, then use `alpha_search` only for academic background.
4. Pick the strongest candidates by direct relevance, recency, citations, venue quality, and source quality.
5. Inspect the top papers with `alpha_get_paper` before making concrete claims.
6. Use `alpha_ask_paper` for missing methodological or experimental details.
7. Build a compact evidence table:
- title
- year
- authors
- venue
- claim or contribution
- important caveats
8. Distinguish:
- what multiple sources agree on
- where methods or findings differ
- what remains unresolved
9. If the user wants a durable artifact, write a markdown brief to disk.
10. If you discover an important gotcha about a paper, save it with `alpha_annotate_paper`.
11. End with a `Sources` section that lists direct URLs, not just titles.
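Step 4's selection criteria can be made explicit as a scoring pass over candidates. A sketch only; the weights and the 1000-citation cap are illustrative assumptions, not part of this skill:

```python
# Illustrative scoring for step 4: direct relevance dominates, with
# recency, citations, and venue quality as secondary signals.
def score(paper: dict, current_year: int = 2026) -> float:
    recency = max(0.0, 1.0 - (current_year - paper["year"]) / 10)
    citations = min(paper["citations"], 1000) / 1000  # cap runaway counts
    return 2.0 * paper["relevance"] + recency + citations + paper["venue_quality"]

def shortlist(papers: list[dict], k: int = 5) -> list[dict]:
    return sorted(papers, key=score, reverse=True)[:k]
```

Weighting relevance above citations encodes the pitfall about not sorting purely by citation count.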
## Pitfalls
- Do not summarize a field from titles alone.
- Do not flatten disagreements into fake consensus.
- Do not treat recent preprints as established facts without saying so.
- Do not cite secondary commentary when a primary source is available.
- Do not treat a current product or market topic as if it were a paper-only topic.
## Output Shape
Prefer this structure:
- question
- strongest papers
- major findings
- disagreements or caveats
- open questions
- recommended next reading or experiments
- sources

@@ -1,52 +0,0 @@
---
name: paper-code-audit
description: Use this when the task is to compare a paper against its repository, verify whether claims are implemented, or assess reproducibility risk.
---
# Paper Code Audit
## When To Use
Use this skill for:
- paper-versus-code verification
- implementation gap analysis
- reproducibility audits
- checking whether public code matches reported results
## Procedure
1. Locate the paper with `alpha_search`.
2. Load the paper with `alpha_get_paper`.
3. Extract implementation-relevant details using `alpha_ask_paper`:
- datasets
- preprocessing
- model architecture
- hyperparameters
- evaluation protocol
4. If the paper links a repository, inspect it using `alpha_read_code`.
5. Compare paper claims against code realities:
- are all components present
- do defaults match the paper
- are metrics/eval scripts exposed
- are hidden assumptions required
6. Record concrete mismatches, not vibes.
7. Save the audit in `outputs/`.
8. If you find a durable gotcha, save it with `alpha_annotate_paper`.
9. End with a `Sources` section for the paper and repository.
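Step 5 reduces to a set comparison once the paper's claimed components and the repository's actual components have been extracted and normalized. A minimal sketch with hypothetical names:

```python
# Step 5 as a set comparison: components the paper claims versus what
# the repository actually exposes (names normalized beforehand, per the
# pitfall about components existing under another name).
def audit(paper_components: set[str], repo_components: set[str]) -> dict[str, list[str]]:
    return {
        "confirmed": sorted(paper_components & repo_components),
        "missing_from_repo": sorted(paper_components - repo_components),
        "repo_only": sorted(repo_components - paper_components),
    }
```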
## Pitfalls
- Do not infer repository behavior without opening the relevant files.
- Do not assume README claims reflect the actual defaults.
- Do not mark something as missing if it exists under another name without checking.
## Deliverable
Include:
- paper summary
- repository coverage
- confirmed matches
- mismatches or omissions
- reproducibility risks
- recommended next actions
- sources

@@ -1,46 +0,0 @@
---
name: paper-writing
description: Use this when the task is to turn research notes, experiments, or a literature review into a polished paper-style writeup with Markdown and LaTeX.
---
# Paper Writing
## When To Use
Use this skill for:
- research reports that should read like a paper
- internal memos with equations or formal structure
- polished writeups of experiments or literature reviews
- converting rough notes into a coherent draft
## Procedure
1. Make sure the underlying claims are already grounded in sources, experiments, or explicit caveats.
2. Build the draft around a proper research structure:
- title
- abstract
- introduction or problem statement
- related work
- approach, synthesis, or methodology
- evidence, experiments, or case studies
- limitations
- conclusion
3. Use Markdown by default.
4. Use LaTeX only where equations or notation genuinely improve clarity.
5. Keep claims falsifiable and scoped.
6. Save polished drafts to `papers/`.
7. Add a `Sources` appendix with direct URLs to all inspected references.
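Before saving to `papers/`, the draft can be linted for the structure required in step 2. A sketch checking only the unambiguous headings (several step-2 sections allow alternative names, so this subset is an illustrative choice):

```python
# Pre-save lint: fail fast if a required section from step 2 is missing.
REQUIRED_HEADINGS = ["## Abstract", "## Related Work", "## Limitations",
                     "## Conclusion", "## Sources"]

def missing_sections(draft: str) -> list[str]:
    return [h for h in REQUIRED_HEADINGS if h not in draft]
```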
## Pitfalls
- Do not use LaTeX for decoration.
- Do not make a draft look more certain than the evidence supports.
- Do not hide missing citations or weak evidence; flag them.
## Deliverable
A readable paper-style draft with:
- explicit structure
- traceable claims
- equations only where useful
- limitations stated plainly

@@ -1,53 +0,0 @@
---
name: reading-list
description: Use this when the user wants a curated reading sequence, paper shortlist, or tiered set of papers for learning or project onboarding.
---
# Reading List
## When To Use
Use this skill for:
- getting up to speed on a topic
- onboarding into a research area
- choosing which papers to read first
- constructing a project-specific reading order
## Procedure
1. Start with source discovery that matches the topic.
2. For academic topics, use `alpha_search` in `all` mode.
3. For current, product-oriented, or market-facing topics, use `web_search` and `fetch_content` first, then use `alpha_search` for background literature if needed.
4. Inspect the strongest candidates directly before recommending them.
5. Use `alpha_ask_paper` for fit questions like:
- what problem does this really solve
- what assumptions does it rely on
- what prior work does it build on
6. Classify papers or sources into roles:
- foundational
- key recent advances
- evaluation or benchmark references
- critiques or limitations
- likely replication targets
7. Order the list intentionally:
- start with orientation
- move to strongest methods
- finish with edges, critiques, or adjacent work
8. Write the final list as a durable markdown artifact in `outputs/`.
9. For every source, include a direct URL.
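The roles from step 6 and the ordering from step 7 can be combined into a deterministic sort. A sketch; the role slugs are hypothetical labels for the categories above:

```python
# Order sources by their step-6 role, following the step-7 sequence:
# orientation first, strongest methods next, critiques and edges last.
ROLE_ORDER = ["foundational", "key_recent_advance", "evaluation_reference",
              "critique", "replication_target"]

def reading_order(sources: list[dict]) -> list[str]:
    ranked = sorted(sources, key=lambda s: ROLE_ORDER.index(s["role"]))
    return [s["title"] for s in ranked]
```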
## Pitfalls
- Do not sort purely by citations.
- Do not over-index on recency when fundamentals matter.
- Do not include papers you have not inspected at all.
- Do not force everything into papers when the user actually needs current docs, products, or market sources.
## Deliverable
For each paper include:
- title
- year
- why it matters
- when to read it in the sequence
- one caveat or limitation

@@ -1,52 +0,0 @@
---
name: replication
description: Use this when the task is to reproduce a paper result, benchmark a claim, rebuild an experiment, or evaluate whether a published result holds in practice.
---
# Replication
## When To Use
Use this skill for:
- paper reproduction
- benchmark recreation
- ablation reruns
- claim verification through code and experiments
## Procedure
1. Identify the canonical source paper and inspect it with `alpha_get_paper`.
2. Extract the exact target:
- task
- dataset
- model or method
- metrics
- hardware or runtime assumptions
3. Use `alpha_ask_paper` to pull out the exact details missing from the report.
4. If the paper has a public repository, inspect it with `alpha_read_code`.
5. Search the local workspace for existing code, notebooks, configs, and datasets.
6. Write down the missing pieces explicitly before running anything.
7. If the environment is sufficient, implement the minimal runnable reproduction path.
8. Run the experiment with built-in file and shell tools.
9. Save:
- commands used
- configs
- raw outputs
- summarized results
10. Compare observed results with the paper and explain gaps.
11. If the paper had a practical gotcha, attach it with `alpha_annotate_paper`.
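Step 10's comparison can be made explicit by checking each reported metric against the observed value under a tolerance. A sketch; the 2% relative tolerance is an illustrative default, not a standard from this skill:

```python
# Classify the outcome per the Verification section below: match,
# partial match, or mismatch against the paper's reported metrics.
def classify(reported: dict[str, float], observed: dict[str, float],
             rel_tol: float = 0.02) -> str:
    checks = [m in observed and abs(observed[m] - v) <= rel_tol * abs(v)
              for m, v in reported.items()]
    if all(checks):
        return "match"
    if any(checks):
        return "partial match"
    return "mismatch"
```

Requiring the same metric names on both sides also enforces the pitfall about never comparing different metrics as equivalent.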
## Pitfalls
- Do not claim replication succeeded if key conditions were missing.
- Do not compare different metrics as if they were equivalent.
- Do not ignore dataset or preprocessing mismatch.
- Do not hide failed runs; record them and explain them.
## Verification
A good replication outcome includes:
- the exact command path
- the data or config used
- the observed metrics
- a clear statement of match, partial match, or mismatch

@@ -1,45 +0,0 @@
---
name: research-memo
description: Use this when the user wants a source-grounded memo, briefing, landscape summary, or background note that is broader than a single paper.
---
# Research Memo
## When To Use
Use this skill for:
- background research
- topic briefings
- market or field overviews
- synthesis across multiple sources
- internal memos that need traceable evidence
## Procedure
1. Find relevant sources first.
2. If the topic is current, product-oriented, or market-facing, or the user asks about the latest developments, use `web_search` and `fetch_content` first.
3. If there is an academic literature component, use `alpha_search` and inspect the strongest papers directly.
4. Inspect the strongest sources directly before synthesizing.
5. Separate:
- established facts
- plausible inferences
- unresolved questions
6. Write a memo with clear sections and a concise narrative.
7. End with a `Sources` section containing direct links.
8. Save the memo to `outputs/` when the user wants a durable artifact.
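Step 5's separation can be carried all the way into the memo by bucketing claims by epistemic status. A minimal sketch with hypothetical labels:

```python
# Keep step 5's three epistemic buckets separate throughout the memo.
def memo_sections(claims: list[tuple[str, str]]) -> dict[str, list[str]]:
    buckets = {"fact": [], "inference": [], "question": []}
    for text, status in claims:
        buckets[status].append(text)  # KeyError on an unknown status is deliberate
    return buckets
```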
## Pitfalls
- Do not summarize from search snippets alone.
- Do not omit the source list.
- Do not present inference as fact.
- Do not rely on paper search alone for latest/current topics.
## Deliverable
Include:
- topic
- key findings
- disagreements or caveats
- open questions
- sources

@@ -1,44 +0,0 @@
---
name: source-comparison
description: Use this when the task is to compare multiple papers, reports, or sources and produce a grounded matrix of agreements, disagreements, and confidence.
---
# Source Comparison
## When To Use
Use this skill for:
- comparing papers on the same topic
- reconciling conflicting claims
- assessing multiple sources before making a recommendation
- producing evidence matrices
## Procedure
1. Find and inspect the strongest relevant sources first.
2. For each source, extract:
- main claim
- evidence type
- caveats
- what would falsify or weaken the claim
3. Build a comparison table or matrix.
4. Separate:
- points of agreement
- points of disagreement
- unresolved questions
5. End with a `Sources` section containing direct URLs.
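Step 4's split falls out of a stance matrix: each opened source records a position on each shared claim. A sketch; the stance labels are hypothetical:

```python
# Step 4 over a stance matrix: claim -> {source: stance}. A claim with
# one source is unresolved; one shared stance is agreement; otherwise
# it is a disagreement to report explicitly.
def compare(stances: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    out = {"agreement": [], "disagreement": [], "unresolved": []}
    for claim, by_source in stances.items():
        if len(by_source) < 2:
            out["unresolved"].append(claim)
        elif len(set(by_source.values())) == 1:
            out["agreement"].append(claim)
        else:
            out["disagreement"].append(claim)
    return out
```

Keeping single-source claims out of the agreement bucket enforces the pitfall about not blurring disagreement into consensus.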
## Pitfalls
- Do not compare sources you have not actually opened.
- Do not blur disagreement into consensus.
- Do not omit source links.
## Deliverable
Include:
- matrix
- agreement summary
- disagreement summary
- confidence assessment
- sources