Overhaul Feynman harness: streamline agents, prompts, and extensions

Remove legacy chains, skills, and config modules. Add citation agent,
SYSTEM.md, modular research-tools extension, and web-access layer.
Add ralph-wiggum to Pi package stack for long-running loops.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Advait Paliwal
2026-03-23 14:59:30 -07:00
parent d23e679331
commit 406d50b3ff
60 changed files with 2994 additions and 3191 deletions

@@ -1,56 +0,0 @@
---
name: autoresearch
description: Use this when the user wants an end-to-end idea-to-paper run, from problem framing through literature review, experiments where feasible, and a paper-style draft.
---
# AutoResearch
## When To Use
Use this skill when the user wants:
- an idea turned into a paper-style draft
- a full research workflow, not just a memo or reading list
- autonomous progress from topic framing to deliverable
## Procedure
1. Restate the idea as a concrete research question and identify the likely contribution type:
- empirical result
- synthesis or review
- method proposal
- benchmark or audit
2. Search for relevant primary sources first.
3. If the topic is current, product-oriented, or market-facing, or the user asks about the latest developments, start with `web_search` and `fetch_content`.
4. Use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for academic background or paper-centric parts of the topic.
5. Build a compact evidence table in `notes/` or `outputs/` before deciding on the paper narrative.
6. Decide whether experiments are feasible in the current environment:
- if yes, design and run the smallest experiment that materially reduces uncertainty
- if no, continue with a literature-grounded or theory-grounded draft and state the limitation clearly
7. Produce at least two artifacts:
- an intermediate artifact (research memo, evidence table, or experiment log)
- a final paper-style draft in `papers/`
8. Structure the final draft with:
- title
- abstract
- introduction
- related work
- method or synthesis
- evidence or experiments
- limitations
- conclusion
9. End with a `Sources` section containing direct URLs for every source used.
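The section list in steps 8 and 9 can be scaffolded up front so no required section is silently dropped. A minimal sketch; the helper name and `TODO` placeholder are my own, not part of the harness:

```python
# Draft skeleton matching the structure required in step 8, plus the
# Sources section required by step 9.
DRAFT_SECTIONS = [
    "Abstract", "Introduction", "Related Work", "Method or Synthesis",
    "Evidence or Experiments", "Limitations", "Conclusion", "Sources",
]

def scaffold_draft(title: str) -> str:
    """Return a markdown skeleton with every required section present."""
    lines = [f"# {title}", ""]
    for section in DRAFT_SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)
```

Writing the skeleton into `papers/` first, then filling sections in, also leaves a useful intermediate artifact if the run is interrupted.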
## Pitfalls
- Do not jump straight to drafting before checking the literature.
- Do not treat a current topic as if papers alone are enough.
- Do not fake experiments when the environment cannot support them.
- Do not present speculative contributions as established results.
- Do not omit limitations or missing validation.
## Deliverable
A complete idea-to-paper run should leave behind:
- one intermediate artifact in `notes/` or `outputs/`
- one final paper-style draft in `papers/`
- a source list with direct URLs

@@ -1,39 +0,0 @@
---
name: context-recall
description: Use this when the user asks what was done before, refers to earlier sessions, wants prior artifacts, or expects Feynman to remember past work.
---
# Context Recall
## When To Use
Use this skill when the user:
- asks what was done previously
- refers to an earlier paper, memo, or artifact
- expects cross-session continuity
- asks what has already been tried or written
## Procedure
1. Read durable memory first with `memory_search` or `memory_lessons`.
2. Search prior sessions with `session_search`.
3. If needed, inspect the current workspace for artifacts in `outputs/`, `notes/`, `experiments/`, and `papers/`.
4. Distinguish clearly between:
- durable remembered facts
- session transcript recall
- currently present files on disk
5. If you find a stable correction or preference that should persist, save it with `memory_remember`.
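Step 4's three-way distinction can be enforced mechanically by tagging every recalled claim with its provenance. A minimal Python sketch; the function and field names are hypothetical, not harness APIs:

```python
# Tag each recalled claim with where its evidence came from, so durable
# memory, session transcripts, and files on disk are never conflated.
VALID_PROVENANCE = {"durable_memory", "session_transcript", "workspace_file"}

def recall_report(claims: list[dict]) -> str:
    """Render recalled claims, each with an explicit evidence source."""
    lines = []
    for c in claims:
        if c["provenance"] not in VALID_PROVENANCE:
            raise ValueError(f"unknown provenance: {c['provenance']!r}")
        lines.append(f"- {c['claim']} [from: {c['provenance']}]")
    return "\n".join(lines)
```

Rejecting unknown provenance values, rather than defaulting, matches the pitfall about never claiming memory without checking.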
## Pitfalls
- Do not claim to remember something without checking memory or session history.
- Do not confuse durable memory with transient task progress.
- Do not summarize prior work from vague impressions; recover evidence first.
## Deliverable
Include:
- what was previously done
- where the evidence came from
- which artifacts or files exist now
- any gaps or uncertainty

@@ -1,54 +0,0 @@
---
name: deep-research
description: Use this when the user wants a broad, thorough investigation with strong sourcing, explicit evidence tables, and a durable research brief.
---
# Deep Research
## When To Use
Use this skill when the user wants:
- a thorough investigation rather than a quick memo
- a broad landscape analysis
- careful source comparison across multiple source types
- a durable research brief with explicit evidence
## Procedure
1. Clarify the exact scope and what decision or question the research should support.
2. Choose the right retrieval mix:
- use `web_search` and `fetch_content` first for current, product, market, regulatory, or latest topics
- use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for academic background or paper-centric claims
- use both when the topic spans current reality and academic literature
3. Gather enough high-quality sources before synthesizing.
4. Build an evidence table covering:
- source
- claim
- evidence type
- caveats
- relevance
5. Synthesize:
- strongest findings
- disagreements
- open questions
- what would change the conclusion
6. Save a durable markdown brief to `outputs/`.
7. End with a `Sources` section containing direct URLs for every source used.
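The evidence table in step 4 can be rendered directly as markdown for the brief in `outputs/`. A small sketch, assuming one dict per source; the column set is the one listed above:

```python
# Render the step-4 evidence table as a markdown table, one row per source.
COLUMNS = ["source", "claim", "evidence type", "caveats", "relevance"]

def evidence_table(rows: list[dict]) -> str:
    header = "| " + " | ".join(COLUMNS) + " |"
    rule = "|" + "|".join([" --- "] * len(COLUMNS)) + "|"
    body = ["| " + " | ".join(str(row.get(col, "")) for col in COLUMNS) + " |"
            for row in rows]
    return "\n".join([header, rule] + body)
```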
## Pitfalls
- Do not answer a current topic from papers alone.
- Do not answer an academic topic from search snippets alone.
- Do not collapse disagreement into fake consensus.
- Do not omit the evidence table on broad or high-stakes topics.
## Deliverable
Include:
- scope
- evidence table
- key findings
- disagreements or caveats
- open questions
- recommendation or next step
- sources

@@ -1,49 +0,0 @@
---
name: experiment-design
description: Use this when the task is to turn a vague research idea into a testable experiment, define metrics, choose baselines, or plan ablations.
---
# Experiment Design
## When To Use
Use this skill when the user has:
- a hypothesis to test
- a method to evaluate
- an unclear benchmark plan
- a need for baselines, ablations, or metrics
## Procedure
1. Restate the research question as a falsifiable claim.
2. Define:
- independent variables
- dependent variables
- success metrics
- baselines
- constraints
3. Search for prior work first.
4. If the setup is tied to current products, APIs, model offerings, pricing, or market behavior, use `web_search` and `fetch_content` first.
5. Use `alpha_search`, `alpha_get_paper`, and `alpha_ask_paper` for academic baselines and prior experiments.
6. Prefer the smallest experiment that can meaningfully reduce uncertainty.
7. List confounders and failure modes up front.
8. If implementation is requested, create the scripts, configs, and logging plan.
9. Write the plan to disk before running expensive work.
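The plan from step 2 can be captured as a structured object and checked against the pitfalls before anything runs. A sketch with hypothetical names; the fields mirror the list in step 2:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentPlan:
    """Plan written to disk (step 9) before any expensive run."""
    hypothesis: str
    independent_vars: list[str]
    dependent_vars: list[str]
    metrics: list[str]
    baselines: list[str]
    confounders: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Flag the failure modes listed under Pitfalls."""
        found = []
        if not self.baselines:
            found.append("no baseline")
        if not self.metrics:
            found.append("no metric connected to the claim")
        return found
```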
## Pitfalls
- Avoid experiments with no baseline.
- Avoid metrics that do not connect to the claim.
- Avoid ablations that change multiple variables at once.
- Avoid broad plans that cannot be executed with the current environment.
## Deliverable
Produce:
- hypothesis
- setup
- baselines
- metrics
- ablations
- risks
- next action

@@ -1,57 +0,0 @@
---
name: literature-review
description: Use this when the task is to survey prior work, compare papers, synthesize a field, or build a reading list grounded in primary sources.
---
# Literature Review
## When To Use
Use this skill when the user wants:
- a research overview
- a paper shortlist
- a comparison of methods
- a synthesis of consensus and disagreement
- a source-backed brief on a topic
## Procedure
1. Search broadly first.
2. If the topic is primarily academic or paper-centric, start with `alpha_search`.
3. If the topic includes current products, companies, markets, software, or "latest/current" framing, start with `web_search` and `fetch_content`, then use `alpha_search` only for academic background.
4. Pick the strongest candidates by direct relevance, recency, citations, venue quality, and source quality.
5. Inspect the top papers with `alpha_get_paper` before making concrete claims.
6. Use `alpha_ask_paper` for missing methodological or experimental details.
7. Build a compact evidence table:
- title
- year
- authors
- venue
- claim or contribution
- important caveats
8. Distinguish:
- what multiple sources agree on
- where methods or findings differ
- what remains unresolved
9. If the user wants a durable artifact, write a markdown brief to disk.
10. If you discover an important gotcha about a paper, save it with `alpha_annotate_paper`.
11. End with a `Sources` section that lists direct URLs, not just titles.
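Step 4's selection criteria can be made explicit as a scoring pass over candidates. A sketch only; the weights and the 1000-citation cap are illustrative assumptions, not part of this skill:

```python
# Illustrative scoring for step 4: direct relevance dominates, with
# recency, citations, and venue quality as secondary signals.
def score(paper: dict, current_year: int = 2026) -> float:
    recency = max(0.0, 1.0 - (current_year - paper["year"]) / 10)
    citations = min(paper["citations"], 1000) / 1000  # cap runaway counts
    return 2.0 * paper["relevance"] + recency + citations + paper["venue_quality"]

def shortlist(papers: list[dict], k: int = 5) -> list[dict]:
    return sorted(papers, key=score, reverse=True)[:k]
```

Weighting relevance above citations encodes the pitfall about not sorting purely by citation count.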
## Pitfalls
- Do not summarize a field from titles alone.
- Do not flatten disagreements into fake consensus.
- Do not treat recent preprints as established facts without saying so.
- Do not cite secondary commentary when a primary source is available.
- Do not treat a current product or market topic as if it were a paper-only topic.
## Output Shape
Prefer this structure:
- question
- strongest papers
- major findings
- disagreements or caveats
- open questions
- recommended next reading or experiments
- sources

@@ -1,52 +0,0 @@
---
name: paper-code-audit
description: Use this when the task is to compare a paper against its repository, verify whether claims are implemented, or assess reproducibility risk.
---
# Paper Code Audit
## When To Use
Use this skill for:
- paper-versus-code verification
- implementation gap analysis
- reproducibility audits
- checking whether public code matches reported results
## Procedure
1. Locate the paper with `alpha_search`.
2. Load the paper with `alpha_get_paper`.
3. Extract implementation-relevant details using `alpha_ask_paper`:
- datasets
- preprocessing
- model architecture
- hyperparameters
- evaluation protocol
4. If the paper links a repository, inspect it using `alpha_read_code`.
5. Compare paper claims against code realities:
- are all components present
- do defaults match the paper
- are metrics/eval scripts exposed
- are hidden assumptions required
6. Record concrete mismatches, not vibes.
7. Save the audit in `outputs/`.
8. If you find a durable gotcha, save it with `alpha_annotate_paper`.
9. End with a `Sources` section for the paper and repository.
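Step 5 reduces to a set comparison once the paper's claimed components and the repository's actual components have been extracted and normalized. A minimal sketch with hypothetical names:

```python
# Step 5 as a set comparison: components the paper claims versus what
# the repository actually exposes (names normalized beforehand, per the
# pitfall about components existing under another name).
def audit(paper_components: set[str], repo_components: set[str]) -> dict[str, list[str]]:
    return {
        "confirmed": sorted(paper_components & repo_components),
        "missing_from_repo": sorted(paper_components - repo_components),
        "repo_only": sorted(repo_components - paper_components),
    }
```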
## Pitfalls
- Do not infer repository behavior without opening the relevant files.
- Do not assume README claims reflect the actual defaults.
- Do not mark something as missing if it exists under another name without checking.
## Deliverable
Include:
- paper summary
- repository coverage
- confirmed matches
- mismatches or omissions
- reproducibility risks
- recommended next actions
- sources

@@ -1,46 +0,0 @@
---
name: paper-writing
description: Use this when the task is to turn research notes, experiments, or a literature review into a polished paper-style writeup with Markdown and LaTeX.
---
# Paper Writing
## When To Use
Use this skill for:
- research reports that should read like a paper
- internal memos with equations or formal structure
- polished writeups of experiments or literature reviews
- converting rough notes into a coherent draft
## Procedure
1. Make sure the underlying claims are already grounded in sources, experiments, or explicit caveats.
2. Build the draft around a proper research structure:
- title
- abstract
- introduction or problem statement
- related work
- approach, synthesis, or methodology
- evidence, experiments, or case studies
- limitations
- conclusion
3. Use Markdown by default.
4. Use LaTeX only where equations or notation genuinely improve clarity.
5. Keep claims falsifiable and scoped.
6. Save polished drafts to `papers/`.
7. Add a `Sources` appendix with direct URLs to all inspected references.
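Before saving to `papers/`, the draft can be linted for the structure required in step 2. A sketch checking only the unambiguous headings (several step-2 sections allow alternative names, so this subset is an illustrative choice):

```python
# Pre-save lint: fail fast if a required section from step 2 is missing.
REQUIRED_HEADINGS = ["## Abstract", "## Related Work", "## Limitations",
                     "## Conclusion", "## Sources"]

def missing_sections(draft: str) -> list[str]:
    return [h for h in REQUIRED_HEADINGS if h not in draft]
```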
## Pitfalls
- Do not use LaTeX for decoration.
- Do not make a draft look more certain than the evidence supports.
- Do not hide missing citations or weak evidence; flag them.
## Deliverable
A readable paper-style draft with:
- explicit structure
- traceable claims
- equations only where useful
- limitations stated plainly

@@ -1,53 +0,0 @@
---
name: reading-list
description: Use this when the user wants a curated reading sequence, paper shortlist, or tiered set of papers for learning or project onboarding.
---
# Reading List
## When To Use
Use this skill for:
- getting up to speed on a topic
- onboarding into a research area
- choosing which papers to read first
- constructing a project-specific reading order
## Procedure
1. Start with source discovery that matches the topic.
2. For academic topics, use `alpha_search` in `all` mode.
3. For current, product-oriented, or market-facing topics, use `web_search` and `fetch_content` first, then use `alpha_search` for background literature if needed.
4. Inspect the strongest candidates directly before recommending them.
5. Use `alpha_ask_paper` for fit questions like:
- what problem does this really solve
- what assumptions does it rely on
- what prior work does it build on
6. Classify papers or sources into roles:
- foundational
- key recent advances
- evaluation or benchmark references
- critiques or limitations
- likely replication targets
7. Order the list intentionally:
- start with orientation
- move to strongest methods
- finish with edges, critiques, or adjacent work
8. Write the final list as a durable markdown artifact in `outputs/`.
9. For every source, include a direct URL.
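The roles from step 6 and the ordering from step 7 can be combined into a deterministic sort. A sketch; the role slugs are hypothetical labels for the categories above:

```python
# Order sources by their step-6 role, following the step-7 sequence:
# orientation first, strongest methods next, critiques and edges last.
ROLE_ORDER = ["foundational", "key_recent_advance", "evaluation_reference",
              "critique", "replication_target"]

def reading_order(sources: list[dict]) -> list[str]:
    ranked = sorted(sources, key=lambda s: ROLE_ORDER.index(s["role"]))
    return [s["title"] for s in ranked]
```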
## Pitfalls
- Do not sort purely by citations.
- Do not over-index on recency when fundamentals matter.
- Do not include papers you have not inspected at all.
- Do not force everything into papers when the user actually needs current docs, products, or market sources.
## Deliverable
For each paper include:
- title
- year
- why it matters
- when to read it in the sequence
- one caveat or limitation

@@ -1,52 +0,0 @@
---
name: replication
description: Use this when the task is to reproduce a paper result, benchmark a claim, rebuild an experiment, or evaluate whether a published result holds in practice.
---
# Replication
## When To Use
Use this skill for:
- paper reproduction
- benchmark recreation
- ablation reruns
- claim verification through code and experiments
## Procedure
1. Identify the canonical source paper and inspect it with `alpha_get_paper`.
2. Extract the exact target:
- task
- dataset
- model or method
- metrics
- hardware or runtime assumptions
3. Use `alpha_ask_paper` to pull out the exact details missing from the report.
4. If the paper has a public repository, inspect it with `alpha_read_code`.
5. Search the local workspace for existing code, notebooks, configs, and datasets.
6. Write down the missing pieces explicitly before running anything.
7. If the environment is sufficient, implement the minimal runnable reproduction path.
8. Run the experiment with built-in file and shell tools.
9. Save:
- commands used
- configs
- raw outputs
- summarized results
10. Compare observed results with the paper and explain gaps.
11. If the paper had a practical gotcha, attach it with `alpha_annotate_paper`.
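Step 10's comparison can be made explicit by checking each reported metric against the observed value under a tolerance. A sketch; the 2% relative tolerance is an illustrative default, not a standard from this skill:

```python
# Classify the outcome per the Verification section below: match,
# partial match, or mismatch against the paper's reported metrics.
def classify(reported: dict[str, float], observed: dict[str, float],
             rel_tol: float = 0.02) -> str:
    checks = [m in observed and abs(observed[m] - v) <= rel_tol * abs(v)
              for m, v in reported.items()]
    if all(checks):
        return "match"
    if any(checks):
        return "partial match"
    return "mismatch"
```

Requiring the same metric names on both sides also enforces the pitfall about never comparing different metrics as equivalent.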
## Pitfalls
- Do not claim replication succeeded if key conditions were missing.
- Do not compare different metrics as if they were equivalent.
- Do not ignore dataset or preprocessing mismatch.
- Do not hide failed runs; record them and explain them.
## Verification
A good replication outcome includes:
- the exact command path
- the data or config used
- the observed metrics
- a clear statement of match, partial match, or mismatch

@@ -1,45 +0,0 @@
---
name: research-memo
description: Use this when the user wants a source-grounded memo, briefing, landscape summary, or background note that is broader than a single paper.
---
# Research Memo
## When To Use
Use this skill for:
- background research
- topic briefings
- market or field overviews
- synthesis across multiple sources
- internal memos that need traceable evidence
## Procedure
1. Find relevant sources first.
2. If the topic is current, product-oriented, or market-facing, or the user asks about the latest developments, use `web_search` and `fetch_content` first.
3. If there is an academic literature component, use `alpha_search` and inspect the strongest papers directly.
4. Inspect the strongest sources directly before synthesizing.
5. Separate:
- established facts
- plausible inferences
- unresolved questions
6. Write a memo with clear sections and a concise narrative.
7. End with a `Sources` section containing direct links.
8. Save the memo to `outputs/` when the user wants a durable artifact.
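Step 5's separation can be carried all the way into the memo by bucketing claims by epistemic status. A minimal sketch with hypothetical labels:

```python
# Keep step 5's three epistemic buckets separate throughout the memo.
def memo_sections(claims: list[tuple[str, str]]) -> dict[str, list[str]]:
    buckets = {"fact": [], "inference": [], "question": []}
    for text, status in claims:
        buckets[status].append(text)  # KeyError on an unknown status is deliberate
    return buckets
```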
## Pitfalls
- Do not summarize from search snippets alone.
- Do not omit the source list.
- Do not present inference as fact.
- Do not rely on paper search alone for latest/current topics.
## Deliverable
Include:
- topic
- key findings
- disagreements or caveats
- open questions
- sources

@@ -1,44 +0,0 @@
---
name: source-comparison
description: Use this when the task is to compare multiple papers, reports, or sources and produce a grounded matrix of agreements, disagreements, and confidence.
---
# Source Comparison
## When To Use
Use this skill for:
- comparing papers on the same topic
- reconciling conflicting claims
- assessing multiple sources before making a recommendation
- producing evidence matrices
## Procedure
1. Find and inspect the strongest relevant sources first.
2. For each source, extract:
- main claim
- evidence type
- caveats
- what would falsify or weaken the claim
3. Build a comparison table or matrix.
4. Separate:
- points of agreement
- points of disagreement
- unresolved questions
5. End with a `Sources` section containing direct URLs.
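Step 4's split falls out of a stance matrix: each opened source records a position on each shared claim. A sketch; the stance labels are hypothetical:

```python
# Step 4 over a stance matrix: claim -> {source: stance}. A claim with
# one source is unresolved; one shared stance is agreement; otherwise
# it is a disagreement to report explicitly.
def compare(stances: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    out = {"agreement": [], "disagreement": [], "unresolved": []}
    for claim, by_source in stances.items():
        if len(by_source) < 2:
            out["unresolved"].append(claim)
        elif len(set(by_source.values())) == 1:
            out["agreement"].append(claim)
        else:
            out["disagreement"].append(claim)
    return out
```

Keeping single-source claims out of the agreement bucket enforces the pitfall about not blurring disagreement into consensus.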
## Pitfalls
- Do not compare sources you have not actually opened.
- Do not blur disagreement into consensus.
- Do not omit source links.
## Deliverable
Include:
- matrix
- agreement summary
- disagreement summary
- confidence assessment
- sources