chore: CLAUDE.md + build.py refresh + feynman-skills import

- CLAUDE.md: updated project guidance
- build.py: install flow tweaks (post install_opencode fix)
- personas/_shared/feynman-skills/: 20 Feynman skills imported from ~/Documents/opencode-skills-parked/, sibling _platform-mapping.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
salvacybersec
2026-04-19 01:35:13 +03:00
parent 0b308ed8be
commit 3126dadd19
26 changed files with 2204 additions and 13 deletions

View File

@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## What This Is
A platform-agnostic system prompt library for LLM agents. 29 personas across 10 domains, 111 variants, 59,712 words. Includes 796 shared skills, 58 brand design systems, 23 company agents, 168 AD/red team attack docs (InternalAllTheThings), and auto-install to 7 platforms (Claude, Antigravity, Gemini, OpenClaw, OpenCode, Paperclip, raw).
A platform-agnostic system prompt library for LLM agents. 29 personas across 10 domains, 111 variants, 59,712 words. Includes 796 shared skills + 20 Feynman research-workflow skills, 58 brand design systems, 23 company agents, 168 AD/red team attack docs (InternalAllTheThings), and auto-install to 7 platforms (Claude, Antigravity, Gemini, OpenClaw, OpenCode, Paperclip, raw).
## Build
@@ -37,6 +37,7 @@ Optional: `cp config.example.yaml config.yaml` for dynamic variable injection. B
- `skills/` — 42 shared skills from OpenClaw/kali-claw (SKILL.md + references per skill)
- `paperclip-skills/` — 52 skills from paperclip-docs (ceo-advisor, coding-agent, security-review, etc.)
- `community-skills/` — 703 skills from skills.sh marketplace (shadcn, vercel, olla, expo, etc.) (shadcn, vercel, marketing, expo, obsidian, impeccable, browser-use, stitch, firecrawl, github, neon, azure, etc.)
- `feynman-skills/` — 20 research-workflow skills adapted from Feynman (deep-research, literature-review, paper-code-audit, peer-review, paper-writing, replication, source-comparison, summarize, alpha-research, eli5, autoresearch, docker/modal/runpod compute, session-log, session-search, jobs, watch, preview, contributing). Cross-platform: Claude Code + OpenCode. Subagent refs (`researcher`/`reviewer`/`writer`/`verifier`) mapped to host `Task`/`task` tool. Mapped to Scholar/Forge/Oracle personas.
- `design-md/` — 58 brand DESIGN.md files (Stripe, Claude, Linear, Apple, Vercel, etc.)
- `ui-ux-pro-max/` — BM25 search engine + 14 CSV data files (67 styles, 161 products, 57 fonts)
- `paperclip-agents/` — 23 company agents (Odin/CEO, Thor/CTO, Freya/CMO, Frigg/COO + 19 team members)
@@ -57,10 +58,11 @@ Optional: `cp config.example.yaml config.yaml` for dynamic variable injection. B
```bash
python3 build.py --install claude # 111 slash commands → ~/.claude/commands/
python3 build.py --install claude-skills # shared skills → ~/.claude/skills/ (default: skills,paperclip-skills,feynman-skills)
python3 build.py --install antigravity # personas → ~/.config/antigravity/personas/
python3 build.py --install gemini # Gems → generated/_gems/
python3 build.py --install openclaw # IDENTITY.md + 29 personas → generated/_openclaw/
python3 build.py --install opencode # 29 agents + 1530 skills → ~/.config/opencode/{agents,skills}/
python3 build.py --install opencode # 29 agents + skills → ~/.config/opencode/{agents,skills}/
python3 build.py --install paperclip # 52 agents + 73 skills → generated/_paperclip/
python3 build.py --install all # all platforms at once
```

133
build.py
View File

@@ -258,9 +258,13 @@ def build_persona(
# Inject mapped skills for this persona
if skills_index:
mapped_skills = []
for skill_name, skill_info in skills_index.get("skills", {}).items():
if persona_name in skill_info.get("personas", []):
mapped_skills.append(skill_name)
for bucket in ("skills", "feynman_skills"):
for skill_name, skill_info in skills_index.get(bucket, {}).items():
if not isinstance(skill_info, dict):
continue
if persona_name in skill_info.get("personas", []):
if skill_name not in mapped_skills:
mapped_skills.append(skill_name)
# Also check config-based custom mapping
skill_map = skills_index.get("_skill_persona_map", {})
for skill_name, persona_list in skill_map.items():
@@ -466,11 +470,39 @@ def infer_personas_from_skill_metadata(skill_name: str, metadata: dict) -> list:
"ot": ["centurion", "bastion", "sentinel"],
"scada": ["centurion", "bastion", "sentinel"],
"ics": ["centurion", "bastion", "sentinel"],
# Research / academic workflows (Feynman skills)
"research": ["scholar", "oracle"],
"paper": ["scholar", "oracle"],
"arxiv": ["scholar", "oracle"],
"replication": ["scholar", "forge"],
"peer review": ["scholar"],
"literature review": ["scholar"],
"experiment": ["forge", "scholar"],
"citation": ["scholar", "oracle"],
}
for keyword, mapped_personas in keyword_map.items():
if keyword in blob:
personas.update(mapped_personas)
# Feynman research-workflow skills map to scholar (primary) + forge/oracle.
FEYNMAN_SKILLS = {
"alpha-research", "autoresearch", "contributing", "deep-research",
"docker", "eli5", "jobs", "literature-review", "modal-compute",
"paper-code-audit", "paper-writing", "peer-review", "preview",
"replication", "runpod-compute", "session-log", "session-search",
"source-comparison", "summarize", "watch",
}
if name in FEYNMAN_SKILLS:
if name in {"deep-research", "literature-review", "source-comparison",
"paper-code-audit", "peer-review", "paper-writing",
"replication", "alpha-research", "eli5", "summarize"}:
personas.update(["scholar", "oracle"])
if name in {"autoresearch", "docker", "modal-compute", "runpod-compute",
"replication"}:
personas.add("forge")
if name in {"watch", "session-search", "jobs"}:
personas.add("oracle")
# Conservative fallback for unmapped cybersecurity skills
if not personas and "cyber" in domain:
personas.update(["bastion"])
@@ -497,12 +529,17 @@ def search_skills(shared_dir: Path, query: str):
query_terms = query.lower().split()
results = []
for skills_subdir in ["skills", "paperclip-skills", "community-skills"]:
for skills_subdir in [
"skills",
"paperclip-skills",
"community-skills",
"feynman-skills",
]:
skills_path = shared_dir / skills_subdir
if not skills_path.exists():
continue
for skill_dir in sorted(skills_path.iterdir()):
if not skill_dir.is_dir():
if not skill_dir.is_dir() or skill_dir.name.startswith("_"):
continue
skill_md = skill_dir / "SKILL.md"
if not skill_md.exists():
@@ -623,12 +660,13 @@ def run_tests(personas_dir: Path, target: str = None):
def build_skills_index(shared_dir: Path, config: dict = None) -> dict:
"""Index all shared skills from _shared/{skills,paperclip-skills,community-skills}/."""
"""Index all shared skills from _shared/{skills,paperclip-skills,community-skills,feynman-skills}/."""
skill_map = load_skill_persona_map(config or {})
index = {
"skills": {},
"paperclip_skills": {},
"community_skills": {},
"feynman_skills": {},
"design_brands": [],
"ui_ux_styles": 0,
"_skill_persona_map": skill_map,
@@ -692,6 +730,47 @@ def build_skills_index(shared_dir: Path, config: dict = None) -> dict:
if skill_md.exists():
index["community_skills"][skill_dir.name] = True
# Index feynman-skills (research workflows adapted from Feynman).
# Use the same persona-aware indexing as shared skills so mapped skills
# flow into Scholar / Forge / Oracle persona JSON outputs.
fskills_dir = shared_dir / "feynman-skills"
if fskills_dir.exists():
for skill_dir in sorted(fskills_dir.iterdir()):
if not skill_dir.is_dir() or skill_dir.name.startswith("_"):
continue
skill_md = skill_dir / "SKILL.md"
if not skill_md.exists():
continue
skill_meta = parse_skill_frontmatter(skill_md)
inferred_personas = infer_personas_from_skill_metadata(
skill_dir.name, skill_meta
)
configured_personas = skill_map.get(skill_dir.name, [])
merged_personas = sorted(
set(configured_personas).union(inferred_personas)
)
content = skill_md.read_text(encoding="utf-8")
first_line = ""
for line in content.split("\n"):
line = line.strip()
if line and not line.startswith(
("---", "#", "name:", "description:")
):
first_line = line[:120]
break
index["feynman_skills"][skill_dir.name] = {
"personas": merged_personas,
"summary": first_line,
"domain": str(skill_meta.get("domain", "")),
"subdomain": str(skill_meta.get("subdomain", "")),
"tags": skill_meta.get("tags", []),
"mapped_by": {
"explicit": configured_personas,
"inferred": inferred_personas,
},
"has_references": (skill_dir / "references").is_dir(),
}
# Index design brands
design_dir = shared_dir / "design-md"
if design_dir.exists():
@@ -880,6 +959,7 @@ def build_catalog(
f" Skills: {len(si.get('skills', {}))} shared + "
f"{len(si.get('paperclip_skills', {}))} paperclip + "
f"{len(si.get('community_skills', {}))} community + "
f"{len(si.get('feynman_skills', {}))} feynman + "
f"{len(si.get('design_brands', []))} design brands + "
f"{si.get('ui_ux_styles', 0)} UI/UX data files"
)
@@ -1193,6 +1273,18 @@ def install_claude_skills(
per_source[source] = count
# Feynman-skill SKILL.md files reference `../_platform-mapping.md`. Emit the
# sibling at ~/.claude/skills/_platform-mapping.md so relative refs resolve.
if "feynman-skills" in sources:
pmap_src = shared_dir / "feynman-skills" / "_platform-mapping.md"
if pmap_src.exists():
if dry_run:
print(f" [dry-run] would emit {skills_dir}/_platform-mapping.md")
else:
(skills_dir / "_platform-mapping.md").write_text(
pmap_src.read_text(encoding="utf-8"), encoding="utf-8"
)
mode = "[dry-run] " if dry_run else ""
print(f" {mode}Claude skills — per source: "
+ ", ".join(f"{k}={v}" for k, v in per_source.items()))
@@ -1299,6 +1391,11 @@ def _classify_skill_topic(name: str, fm: dict) -> str:
return "security-general"
NAME_PATTERNS = [
# Feynman research-workflow skills — keep before generic patterns so they
# win over broader matches. All map into buckets already in the default.
("ai-llm-dev", r"^(alpha-research|autoresearch|deep-research|literature-review|paper-code-audit|paper-writing|peer-review|replication|source-comparison|summarize|eli5)$"),
("cloud-infra", r"^(modal-compute|runpod-compute)$"),
("ops-sysadmin", r"^(session-log|session-search|preview|watch|jobs|contributing)$"),
("coding-frontend", r"^(react|nextjs|next-|angular|vue-|svelte|tailwind|shadcn|vercel|expo|remotion|frontend|ui-ux|accessibility|canvas-|stitch|framer)"),
("coding-backend", r"^(python|java-|csharp|dotnet|aspnet|kotlin|swift|rust-|golang|go-|ruby-|php-|nodejs|node-|bash-|cli-|bazel|async-|architecting-|aspire-)"),
("coding-tools", r"^(commit|changelog|debug-|refactor|test-driven|tdd|bdd|git-|github-|gitlab-|bats|copilot|codeql|code-review|linting|formatting|add-|adr-|agent-browser|mcp-)"),
@@ -1556,12 +1653,17 @@ def install_opencode(
_shutil.rmtree(existing)
if shared_dir:
for skills_subdir in ["skills", "paperclip-skills", "community-skills"]:
for skills_subdir in [
"skills",
"paperclip-skills",
"community-skills",
"feynman-skills",
]:
src_root = shared_dir / skills_subdir
if not src_root.exists():
continue
for skill_dir in src_root.iterdir():
if not skill_dir.is_dir():
if not skill_dir.is_dir() or skill_dir.name.startswith("_"):
continue
src_skill = skill_dir / "SKILL.md"
if not src_skill.exists():
@@ -1599,6 +1701,15 @@ def install_opencode(
)
skill_count += 1
# Feynman-skill SKILL.md files reference `../_platform-mapping.md`. Emit the
# sibling so the relative reference resolves inside ~/.config/opencode/skills/.
if shared_dir:
pmap_src = shared_dir / "feynman-skills" / "_platform-mapping.md"
if pmap_src.exists():
(skills_dir / "_platform-mapping.md").write_text(
pmap_src.read_text(encoding="utf-8"), encoding="utf-8"
)
print(
f" OpenCode: {agent_count} agents installed to {agents_dir}"
)
@@ -1932,10 +2043,10 @@ def main():
# --- claude-skills filters --------------------------------------------
parser.add_argument(
"--skill-sources",
default="skills,paperclip-skills",
default="skills,paperclip-skills,feynman-skills",
help="Comma-separated list of _shared/<dir> sources for claude-skills "
"(available: skills,paperclip-skills,community-skills). "
"Default: skills,paperclip-skills",
"(available: skills,paperclip-skills,community-skills,feynman-skills). "
"Default: skills,paperclip-skills,feynman-skills",
)
parser.add_argument(
"--skill-subdomains",

View File

@@ -0,0 +1,79 @@
# Agents
`AGENTS.md` is the repo-level contract for agents working in this repository.
Pi subagent behavior does **not** live here. The source of truth for bundled Pi subagents is `.feynman/agents/*.md`, which the runtime syncs into the Pi agent directory. If you need to change how `researcher`, `reviewer`, `writer`, or `verifier` behave, edit the corresponding file in `.feynman/agents/` instead of duplicating those prompts here.
## Pi subagents
Feynman ships four bundled research subagents:
- `researcher`
- `reviewer`
- `writer`
- `verifier`
They are defined in `.feynman/agents/` and invoked via the Pi `subagent` tool.
## What belongs here
Keep this file focused on cross-agent repo conventions:
- output locations and file naming expectations
- workspace-level continuity expectations for long-running work
- provenance and verification requirements
- handoff rules between the lead agent and subagents
Do **not** restate per-agent prompt text here unless there is a repo-wide constraint that applies to all agents.
## Output conventions
- Research outputs go in `outputs/`.
- Paper-style drafts go in `papers/`.
- Session logs go in `notes/`.
- The workspace-level lab notebook lives at `CHANGELOG.md`.
- Plan artifacts for long-running workflows go in `outputs/.plans/`.
- Intermediate research artifacts are written to disk by subagents and read by the lead agent. They are not returned inline unless the user explicitly asks for them.
- Long-running workflows should treat the plan artifact as an externalized working memory, not a static outline. Keep task status and verification state there as the run evolves.
- Long-running or resumable workflows should also treat `CHANGELOG.md` as the chronological lab notebook: what changed, what failed, what was verified, and what should happen next.
- Do not create or update `CHANGELOG.md` for trivial one-shot tasks.
## File naming
Every workflow that produces artifacts must derive a short **slug** from the topic (lowercase, hyphens, no filler words, ≤5 words — e.g. `cloud-sandbox-pricing`). All files in a single run use that slug as a prefix:
- Plan: `outputs/.plans/<slug>.md`
- Intermediate research: `<slug>-research-web.md`, `<slug>-research-papers.md`, etc.
- Draft: `outputs/.drafts/<slug>-draft.md`
- Cited brief: `<slug>-brief.md`
- Verification: `<slug>-verification.md`
- Final output: `outputs/<slug>.md` or `papers/<slug>.md`
- Provenance: `<slug>.provenance.md` (next to the final output)
Never use generic names like `research.md`, `draft.md`, `brief.md`, or `summary.md`. Concurrent runs must not collide.
## Workspace changelog
- `CHANGELOG.md` is a lab notebook, not release notes.
- Read `CHANGELOG.md` before resuming substantial work when it exists.
- Append concise entries after meaningful progress, failed approaches, major verification results, or new blockers.
- Each entry should identify the active slug or objective and end with the next recommended step.
- Mark verification state honestly with labels such as `verified`, `unverified`, `blocked`, or `inferred` only when they match the underlying evidence.
## Provenance and verification
- Every output from `/deepresearch` and `/lit` must include a `.provenance.md` sidecar.
- Provenance sidecars should record source accounting and verification status.
- Source verification and citation cleanup belong in the `verifier` stage, not in ad hoc edits after delivery.
- Verification passes should happen before delivery when the workflow calls for them.
- If a workflow uses the words `verified`, `confirmed`, or `checked`, the underlying artifact should record what was actually checked and how.
- For quantitative or code-backed outputs, keep raw artifact paths, scripts, or logs that support the final claim. Do not rely on polished summaries alone.
- Never smooth over missing checks. Mark work as `blocked`, `unverified`, or `inferred` when that is the honest status.
## Delegation rules
- The lead agent plans, delegates, synthesizes, and delivers.
- Use subagents when the work is meaningfully decomposable; do not spawn them for trivial work.
- Prefer file-based handoffs over dumping large intermediate results back into parent context.
- The lead agent is responsible for reconciling task completion. Subagents may not silently skip assigned tasks; skipped or merged tasks must be recorded in the plan artifact.
- For critical claims, require at least one adversarial verification pass after synthesis. Fix fatal issues before delivery or surface them explicitly.

View File

@@ -0,0 +1,115 @@
# Contributing to Feynman
Feynman is a research-first CLI built on Pi and alphaXiv. This guide is for humans and agents contributing code, prompts, skills, docs, installers, or workflow behavior to the repository.
## Quick Links
- GitHub: https://github.com/getcompanion-ai/feynman
- Docs: https://feynman.is/docs
- Repo agent contract: [AGENTS.md](AGENTS.md)
- Issues: https://github.com/getcompanion-ai/feynman/issues
## What Goes Where
- CLI/runtime code: `src/`
- Bundled prompt templates: `prompts/`
- Bundled Pi skills: `skills/`
- Bundled Pi subagent prompts: `.feynman/agents/`
- Docs site: `website/`
- Build/release scripts: `scripts/`
- Generated research artifacts: `outputs/`, `papers/`, `notes/`
If you need to change how bundled subagents behave, edit `.feynman/agents/*.md`. Do not duplicate that behavior in `AGENTS.md`.
## Before You Open a PR
1. Start from the latest `main`.
2. Use Node.js `22.x` for local development. The supported runtime range is Node.js `20.19.0` through `24.x`; `.nvmrc` pins the preferred local version while `package.json`, `website/package.json`, and the runtime version guard define the broader supported range.
3. Install dependencies from the repo root:
```bash
nvm use || nvm install
npm install
```
4. Run the required checks before asking for review:
```bash
npm test
npm run typecheck
npm run build
```
5. If you changed the docs site, also validate the website:
```bash
cd website
npm install
npm run build
```
6. Keep the PR focused. Do not mix unrelated cleanup with the real change.
7. Add or update tests when behavior changes.
8. Update docs, prompts, or skills when the user-facing workflow changes.
## Contribution Rules
- Bugs, docs fixes, installer fixes, and focused workflow improvements are good PRs.
- Large feature changes should start with an issue or a concrete implementation discussion before code lands.
- Avoid refactor-only PRs unless they are necessary to unblock a real fix or requested by a maintainer.
- Do not silently change release behavior, installer behavior, or runtime defaults without documenting the reason in the PR.
- Use American English in docs, comments, prompts, UI copy, and examples.
- Do not add bundled prompts, skills, or docs whose primary purpose is to market, endorse, or funnel users toward a third-party product or service. Product integrations must be justified by user-facing utility and written in neutral language.
## Repo-Specific Checks
### Prompt and skill changes
- New workflows usually live in `prompts/*.md`.
- New reusable capabilities usually live in `skills/<name>/SKILL.md`.
- Keep skill files concise. Put detailed operational rules in the prompt or in focused reference files only when needed.
- If a new workflow should be invokable from the CLI, make sure its prompt frontmatter includes the correct metadata and that the command works through the normal prompt discovery path.
### Agent and artifact conventions
- `AGENTS.md` is the repo-level contract for workspace conventions, handoffs, provenance, and output naming.
- Long-running research flows should write plan artifacts to `outputs/.plans/` and use `CHANGELOG.md` as a lab notebook when the work is substantial.
- Do not update `CHANGELOG.md` for trivial one-shot changes.
### Release and versioning discipline
- The curl installer and release docs point users at tagged releases, not arbitrary commits on `main`.
- If you ship user-visible fixes after a tag, do not leave the repo in a state where `main` and the latest release advertise the same version string while containing different behavior.
- When changing release-sensitive behavior, check the version story across:
- `.nvmrc`
- `package.json`
- `website/package.json`
- `scripts/check-node-version.mjs`
- install docs in `README.md` and `website/src/content/docs/getting-started/installation.md`
## AI-Assisted Contributions
AI-assisted PRs are fine. The contributor is still responsible for the diff.
- Understand the code you are submitting.
- Run the local checks yourself instead of assuming generated code is correct.
- Include enough context in the PR description for a reviewer to understand the change quickly.
- If an agent updated prompts or skills, verify the instructions match the actual repo behavior.
## Review Expectations
- Explain what changed and why.
- Call out tradeoffs, follow-up work, and anything intentionally not handled.
- Include screenshots for UI changes.
- Resolve review comments you addressed before requesting review again.
## Good First Areas
Useful contributions usually land in one of these areas:
- installation and upgrade reliability
- research workflow quality
- model/provider setup ergonomics
- docs clarity
- preview and export stability
- packaging and release hygiene

View File

@@ -0,0 +1,52 @@
# Feynman Skills (Platform-Agnostic Port)
Adapted from the [Feynman](https://feynman.is) research skills pack (v0.2.34) for use with **Claude Code**, **OpenCode**, and any Anthropic-spec skill runtime.
## Source
- Upstream: `~/.codex/skills/feynman/` (Feynman CLI installer)
- Upstream license/model: research workflows produced by the Feynman project
- This copy is re-adapted to remove Feynman-runtime coupling so skills work standalone.
## What changed from upstream
| Upstream element | This port |
|---|---|
| `/deepresearch`, `/lit`, `/review`, `/draft`, `/audit`, `/replicate`, `/compare`, `/watch`, `/log`, `/jobs` slash commands | Inline procedure inside `SKILL.md` or in `references/<name>.md`. Invoke the skill by name, not by slash command. |
| `researcher` / `reviewer` / `writer` / `verifier` bundled subagents | Mapped to **Claude Code `Task` tool** (`subagent_type: scholar / oracle / general-purpose`) and **OpenCode `task` tool** (with a `scholar` / `oracle` / `forge` agent). |
| `pi-autoresearch`, `pi-schedule-prompt`, `pi-charts`, `pi-processes` | Replaced with platform equivalents (Claude `ScheduleWakeup` / cron, Mermaid, bash) or marked optional. |
| `~/.feynman/sessions/` transcripts | Generalized to per-platform session paths (documented in `session-search/`). |
| `/preview` command | Bash fallbacks (`xdg-open`, `open`, `pandoc`). |
| `../prompts/<name>.md` sibling references | Inlined into each SKILL.md, or moved to `references/<name>.md` inside the skill so the skill is portable. |
## Output conventions (carried over from upstream `AGENTS.md`)
- Research outputs → `outputs/`
- Paper-style drafts → `papers/`
- Session logs → `notes/`
- Workspace lab notebook → `CHANGELOG.md`
- Plan artifacts → `outputs/.plans/`
- Intermediate research → `<slug>-research-*.md` on disk, not returned inline
- Slug rule: every workflow derives a short hyphenated slug (`≤5 words`) and prefixes all artifacts with it — concurrent runs must not collide
## Platform-tool mapping reference
See `_platform-mapping.md` — the canonical mapping used by every SKILL.md in this directory.
## Skill list (19)
Research workflows:
- `deep-research`, `literature-review`, `source-comparison`, `paper-code-audit`,
- `peer-review`, `paper-writing`, `replication`, `autoresearch`
Paper utilities:
- `alpha-research`, `eli5`
Compute environments:
- `docker`, `modal-compute`, `runpod-compute`
Session / project:
- `session-log`, `session-search`, `jobs`, `watch`, `preview`
Self-referential:
- `contributing` (for contributing to the upstream Feynman repo)

View File

@@ -0,0 +1,96 @@
# Platform Mapping Reference
Every Feynman-skill SKILL.md in this directory refers to the abstractions defined here. Both runtimes implement the same conceptual operations under different tool names.
## Subagent delegation
Feynman ships four bundled research roles: `researcher`, `reviewer`, `writer`, `verifier`. Outside of Feynman, dispatch them via the host platform's generic delegation primitive.
| Role | Claude Code | OpenCode |
|---|---|---|
| `researcher` (evidence gathering, source hunting) | `Task` tool with `subagent_type: scholar` (if installed from personas repo) or `general-purpose` | `task` tool invoking `scholar` agent (if installed) or `general` |
| `reviewer` (adversarial review of a cited draft) | `Task` tool with `subagent_type: general-purpose`, prompt: *"review this artifact for FATAL/MAJOR/MINOR issues — no rewrites, only findings"* | `task` tool invoking a `reviewer`-prompted general agent |
| `writer` (synthesize notes into a polished draft) | `Task` tool with `subagent_type: forge` (personas) or `general-purpose` | `task` tool invoking `forge` or `general` |
| `verifier` (URL / citation verification) | `Task` tool with `subagent_type: oracle` (personas) or `general-purpose` + `WebFetch` permission | `task` tool invoking `oracle` or `general` with `webfetch` allowed |
The lead agent (the skill caller) plans, dispatches, synthesizes, and delivers. Subagents write artifacts to disk and the lead reads them. Never dump large intermediate results back into parent context.
## Scheduling recurring work
| Need | Claude Code | OpenCode |
|---|---|---|
| Wake up later in same session | `ScheduleWakeup` | (no direct equivalent — use cron) |
| Cron-style recurring agent runs | `CronCreate` / `CronList` / `CronDelete` | system `cron` invoking `opencode run` |
| Interactive loop with self-pacing | `/loop` slash command | Manual re-invocation |
## Charts and diagrams
`pi-charts` is a Feynman-runtime helper. Outside Feynman:
- **Quantitative comparisons**: output Mermaid bar/line/pie charts, or write CSV and ask the user to render. Do not invent charts.
- **Architecture / pipeline diagrams**: Mermaid `graph TD` / `flowchart`.
- **Every figure** needs a provenance-bearing caption naming the source.
## Preview / render
`/preview` is Feynman-specific. Fallbacks:
```bash
# macOS
open <file.md> # opens in default app
open <file.pdf>
# Linux
xdg-open <file.md>
xdg-open <file.pdf>
# PDF export (cross-platform)
pandoc <file.md> -o <file.pdf>
```
## Session history search
Feynman stores transcripts at `~/.feynman/sessions/*.jsonl`. Other runtimes:
| Platform | Session store |
|---|---|
| Claude Code | `~/.claude/projects/<hash-of-cwd>/*.jsonl` |
| OpenCode | `~/.local/share/opencode/session/` |
Search with `grep -ril "<query>" <store>` or the platform's session-search command.
## Paper search
`alpha` CLI (alphaXiv-backed) is platform-agnostic — install via `pip install alpha-hub` or the upstream installer. When `alpha` is unavailable, fall back to:
- `WebSearch` + arXiv abstract page fetch (Claude Code)
- `webfetch` + arXiv (OpenCode)
- Semantic Scholar API / OpenAlex API for programmatic paper search
## Cross-session persistence (plans, memory)
Feynman has a `memory_remember` tool that lets a workflow stash a plan or artifact under a stable key (e.g. `deepresearch.<slug>.plan`) so a later session can recover it. Outside Feynman:
| Platform | Equivalent |
|---|---|
| Claude Code | `auto-memory` system (`~/.claude/projects/<hash>/memory/`) — write `{{slug}}-plan.md` and add a one-liner to `MEMORY.md` |
| OpenCode | No first-party equivalent; use filesystem (`outputs/.plans/<slug>.md` is already the canonical location) |
| Any runtime | Filesystem is the lowest-common-denominator — `outputs/.plans/<slug>.md` survives session boundaries |
Rule: always write the plan to disk first (`outputs/.plans/<slug>.md`). If platform-native memory exists, also mirror a pointer there. Never rely on memory alone.
## Background process inspection
Feynman `process` tool → outside Feynman:
```bash
# running processes
ps auxf
pgrep -fa <pattern>
# cron / systemd-timers
crontab -l
systemctl --user list-timers
```
Claude Code also has `Monitor` for streaming background-command output.

View File

@@ -0,0 +1,53 @@
---
name: alpha-research
description: Search, read, and query research papers via the `alpha` CLI (alphaXiv-backed). Use when the user asks about academic papers, wants to find research on a topic, needs to read a specific paper, ask questions about a paper, inspect a paper's code repository, or manage paper annotations.
allowed-tools: Bash(alpha:*)
---
# Alpha Research CLI
Use the `alpha` CLI via bash for all paper research operations. Platform-agnostic — works in Claude Code, OpenCode, or any shell.
## Install
```bash
pip install alpha-hub
alpha login # authenticate with alphaXiv
alpha status # verify auth
```
If `alpha` is unavailable, fall back to `WebSearch` (Claude Code) / `webfetch` (OpenCode) against `arxiv.org` or `semanticscholar.org`.
## Commands
| Command | Description |
|---------|-------------|
| `alpha search "<query>"` | Search papers. Prefer `--mode semantic` by default; use `--mode keyword` only for exact-term lookup and `--mode agentic` for broader retrieval. |
| `alpha get <arxiv-id-or-url>` | Fetch paper content and any local annotation |
| `alpha get --full-text <arxiv-id>` | Get raw full text instead of AI report |
| `alpha ask <arxiv-id> "<question>"` | Ask a question about a paper's PDF |
| `alpha code <github-url> [path]` | Read files from a paper's GitHub repo. Use `/` for overview |
| `alpha annotate <paper-id> "<note>"` | Save a persistent annotation on a paper |
| `alpha annotate --clear <paper-id>` | Remove an annotation |
| `alpha annotate --list` | List all annotations |
## Examples
```bash
alpha search "transformer scaling laws"
alpha search --mode agentic "efficient attention mechanisms for long context"
alpha get 2106.09685
alpha ask 2106.09685 "What optimizer did they use?"
alpha code https://github.com/karpathy/nanoGPT src/model.py
alpha annotate 2106.09685 "Key paper on LoRA — revisit for adapter comparison"
```
## When to use
- Academic paper search, reading, Q&A → `alpha`
- Current topics (products, releases, docs) → web search tools
- Mixed topics → combine both
## PDF fetch warning
`alpha get --full-text` can crash on malformed PDFs. Prefer metadata / abstracts / HTML for routine work; only pull full text when the user asks for a deep read.

View File

@@ -0,0 +1,95 @@
---
name: autoresearch
description: Autonomous experiment loop that tries ideas, measures results, keeps what works, and discards what doesn't. Use when the user asks to optimize a metric, run an experiment loop, improve performance iteratively, or automate benchmarking.
allowed-tools: Bash(git:*), Bash(docker:*), Bash(modal:*), Bash(runpodctl:*)
---
# Autoresearch — Autonomous Optimization Loop
Run an iterative optimize-measure-commit loop against a user-chosen metric. The loop edits code, runs a benchmark, keeps commits that improve the metric, and reverts the rest.
> **Upstream note.** Feynman ships `pi-autoresearch` with the `init_experiment` / `run_experiment` / `log_experiment` tools. This skill documents the same loop as a generic procedure so it works in Claude Code and OpenCode without those tools. When `pi-autoresearch` is present, delegate to it; otherwise implement the loop with git + bash.
## Step 1 — Gather requirements
If `autoresearch.md` and `autoresearch.jsonl` already exist in the workspace, ask the user whether to **resume** or **start fresh**. If `CHANGELOG.md` exists, read the most recent relevant entries before resuming.
Otherwise, collect from the user before doing anything else:
- **What to optimize** — test speed, bundle size, training loss, build time, etc.
- **Benchmark command** — the exact shell command that produces the metric
- **Metric** — name, unit, and direction (lower-is-better or higher-is-better)
- **Files in scope** — which files the loop is allowed to modify
- **Max iterations** — default 20
## Step 2 — Pick an environment
Ask the user where to run iterations:
- **Local** — current working directory
- **New git branch** — create a branch so `main` stays clean
- **Virtual environment** — isolated venv/conda first
- **Docker** — run iterations inside a container (see `docker` skill)
- **Modal** — serverless GPU; stateless burst (see `modal-compute` skill)
- **RunPod** — persistent GPU pod with SSH (see `runpod-compute` skill)
Do not proceed without a clear answer.
## Step 3 — Confirm
Before starting the loop, present the full plan:
```
Optimization target: <metric> (<direction>)
Benchmark command: <command>
Files in scope: <files>
Environment: <chosen environment>
Max iterations: <N>
```
Wait for explicit approval. No silent starts.
## Step 4 — Run the loop
Initialize session files:
- `autoresearch.md` — human-readable running log (one section per iteration)
- `autoresearch.sh` — the benchmark command, committed so it's reproducible
- `autoresearch.jsonl` — one JSON record per iteration: `{iter, diff_ref, metric_value, kept, duration_s, notes}`
Run the **baseline** once and record the metric. This is iteration 0.
Then loop until `max_iterations` or user interruption:
1. **Propose a change** — pick one hypothesis (fewer deps, tighter loop, different algo, different config) based on what you've learned from prior iterations. State it in one sentence before editing.
2. **Edit** the files in scope.
3. **Commit** with a descriptive message (`autoresearch: iter N — <hypothesis>`).
4. **Run the benchmark** (`bash autoresearch.sh`), capture output and wall-clock time.
5. **Decide**:
- If the metric improved (per direction): **keep** the commit.
- Otherwise: `git revert` the commit or `git reset --hard HEAD~1` if still uncommitted-on-branch.
6. **Log** the iteration to `autoresearch.md` + `autoresearch.jsonl`.
7. After meaningful milestones, append a concise entry to `CHANGELOG.md` summarizing what changed, the metric movement, and the next hypothesis.
## Step 5 — When to stop
- `max_iterations` reached
- The metric plateaus for 3+ iterations
- The user interrupts
- You run out of clearly-motivated hypotheses (don't flail)
## Step 6 — Final report
When the loop stops, write a short summary: starting metric, ending metric, which hypotheses helped, which didn't, and what the next direction would be. Save to `outputs/<slug>-autoresearch-summary.md`.
## Subcommands (Feynman parity)
- `autoresearch <text>` — start or resume the loop
- `autoresearch off` — stop the loop, keep data
- `autoresearch clear` — delete all state and start fresh
## Key invariants
- **No silent rewrites.** Every iteration's metric movement must be traceable to a commit.
- **No invented results.** If the benchmark fails, log the failure as iteration data; don't pretend it succeeded.
- **No config drift.** The benchmark command must be stable across iterations — if it needs to change, that's a new session.

View File

@@ -0,0 +1,45 @@
---
name: contributing
description: Contribute changes to the upstream Feynman repository. Use when the task is to add features, fix bugs, update prompts or skills, change install or release behavior, improve docs, or prepare a focused PR against the Feynman project itself.
---
# Contributing to Upstream Feynman
> **Scope note.** This skill applies only when you are actively working inside a clone of the upstream Feynman repository (https://feynman.is). If you are using these skills in another runtime (Claude Code, OpenCode, your own research workspace), skip this skill — the contributing targets below don't exist here.
Read `CONTRIBUTING.md` and `AGENTS.md` at the Feynman repo root before making changes. Those two files are the source of truth; this skill is the short index.
## When this applies
- CLI or runtime changes in `src/`
- prompt changes in `prompts/`
- bundled skill changes in `skills/`
- subagent behavior changes in `.feynman/agents/`
- install, packaging, or release changes in `scripts/`, `README.md`, or website docs
## Minimum local checks before claiming a change is done
```bash
npm test
npm run typecheck
npm run build
```
If the docs site changed, also validate `website/`.
## Release-sensitive changes
When changing release-sensitive behavior, verify that these stay aligned:
- `.nvmrc`
- package `engines`
- runtime guards (version checks in CLI entry points)
- install docs (`README.md`, install script)
Changes that touch any of these should go in one atomic PR with a CHANGELOG entry.
## PR discipline
- One topic per PR. Don't bundle a prompt fix with a runtime refactor.
- Tests for any new public behavior.
- Update `prompts/` and the corresponding `skills/<name>/SKILL.md` together — a prompt change without a skill pointer update leaves callers out of sync.

View File

@@ -0,0 +1,208 @@
---
name: deep-research
description: Run a thorough, source-heavy investigation on any topic. Use when the user asks for deep research, a comprehensive analysis, an in-depth report, or a multi-source investigation. Produces a cited research brief with provenance tracking.
---
# Deep Research
Execute a source-heavy investigation on a topic and produce a durable, cited brief with a provenance sidecar. This is an **execution skill**, not an explainer — your first actions should be tool calls that create directories and write the plan artifact.
## Do not
- Do not answer by describing the protocol.
- Do not restate or summarize these instructions in chat.
- Do not stop after planning — continue immediately through gathering and drafting.
- Do not ask the user for confirmation unless they explicitly requested plan review.
- Do not end with chat-only output. Every run leaves artifacts on disk.
## Subagent mapping
See `../_platform-mapping.md`. Roles used: `researcher` (evidence gathering), `verifier` (URL + citation verification), `reviewer` (adversarial review). Dispatch via `Task` tool (Claude Code) or `task` tool (OpenCode).
## Required artifacts
Derive a short slug from the topic (lowercase, hyphenated, no filler words, ≤5 words).
Every run must leave these files on disk:
- `outputs/.plans/<slug>.md`
- `outputs/.drafts/<slug>-draft.md`
- `outputs/.drafts/<slug>-cited.md`
- `outputs/<slug>.md` (or `papers/<slug>.md` for paper-style briefs)
- `outputs/<slug>.provenance.md` (or `papers/<slug>.provenance.md`)
If any capability fails, continue in **degraded mode** and still write a blocked/partial final output and provenance sidecar. Set `Verification: BLOCKED` when verification could not complete. Never end with only an explanation in chat.
## Step 1 — Plan
Create `outputs/.plans/<slug>.md` immediately. Required sections:
- **Key questions** — what the brief must answer
- **Evidence needed** — for each question, what kind of source satisfies it
- **Scale decision** — direct-search OR subagent-delegated (see Step 2)
- **Task ledger** — one row per sub-question; status `pending | done | blocked | superseded`
- **Verification log** — critical claims that will need citation verification
- **Decision log** — key calls made during the run
Make the scale decision before assigning owners. For a narrow "what is X" explainer, the plan must use lead-owned direct search only — do not allocate researcher subagents in the task ledger.
After writing the plan, continue immediately. Do not pause for approval.
**Optional cross-session persistence.** If the runtime has a memory primitive (Claude Code `auto-memory`, a `memory_remember` tool, or equivalent), also mirror the plan there under key `deepresearch.<slug>.plan`. If no such primitive exists, continue — the disk file is the canonical copy. See `../_platform-mapping.md`.
## Step 2 — Scale
**Use direct search for:**
- Single fact or narrow question, including "what is X" explainers
- Work you can answer with 310 tool calls
For "what is X" explainer topics, **do not spawn researcher subagents** unless the user explicitly asks for comprehensive coverage, current landscape, benchmarks, or production deployment. Don't inflate a simple explainer into a multi-agent survey.
**Use subagents only when decomposition clearly helps:**
- Direct comparison of 23 items: 2 `researcher` subagents
- Broad survey or multi-faceted topic: 34 `researcher` subagents
- Complex multi-domain research: 46 `researcher` subagents
## Step 3 — Gather evidence
**PDF warning.** Avoid crash-prone PDF parsing. Do not fetch `.pdf` URLs unless the user explicitly asked for PDF extraction. Prefer paper metadata, abstracts, HTML pages, official docs, and web snippets. If only a PDF exists, cite its URL from search metadata and mark full-text parsing as blocked.
**If direct search was chosen:**
- Skip subagent spawning entirely.
- Search and fetch sources yourself using `WebSearch` / `WebFetch` (Claude Code) or equivalents.
- Use **at least 3 distinct queries**, covering definition/history, mechanism/formula, and current usage/comparison (when relevant).
- Record the exact search terms used in `<slug>-research-direct.md`.
- Write notes to `<slug>-research-direct.md`.
- Continue to synthesis.
**If subagents were chosen:**
- Write a per-researcher brief first: `outputs/.plans/<slug>-T1.md`, `outputs/.plans/<slug>-T2.md`, etc.
- Keep the subagent dispatch payload small and valid — no multi-paragraph instructions inside the JSON.
- Always set `failFast: false` if your runtime exposes it.
- Do not name exact tool commands in subagent tasks unless those tool names are visible in the current tool set.
- Prefer broad guidance like "use paper search and web search"; if a PDF parser or paper fetch fails, the researcher must continue from metadata, abstracts, and web sources and mark PDF parsing as blocked.
Example Claude Code dispatch shape (conceptual — adapt to the tool's actual schema):
```
Task(
subagent_type="scholar",
description="research-web",
prompt="Read outputs/.plans/<slug>-T1.md and write <slug>-research-web.md"
)
Task(
subagent_type="scholar",
description="research-papers",
prompt="Read outputs/.plans/<slug>-T2.md and write <slug>-research-papers.md"
)
```
Dispatch independent researchers in parallel (single message, multiple tool-use blocks).
After evidence gathering, update the plan ledger and verification log. If research failed, record exactly what failed and proceed with a blocked/partial draft.
## Step 4 — Draft
**Write the report yourself. Do not delegate synthesis.**
Save to `outputs/.drafts/<slug>-draft.md`.
Include:
- Executive summary
- Findings organized by question/theme
- Evidence-backed caveats and disagreements
- Open questions
- No invented sources, results, figures, benchmarks, images, charts, or tables
**Pre-citation sweep of the draft:**
- Every critical claim, number, figure, table, or benchmark must map to a source URL, research note, raw artifact path, or command/script output.
- Remove or downgrade unsupported claims.
- Mark inferences explicitly as inferences.
## Step 5 — Cite
**If direct search / no researcher subagents was chosen:**
- Do citation yourself.
- Verify reachable HTML/doc URLs with `WebFetch` or equivalent.
- Copy or rewrite the draft to `outputs/.drafts/<slug>-cited.md` with inline citations and a Sources section.
- Do not spawn a `verifier` subagent for direct-search runs.
**If researcher subagents were used:**
Run the `verifier` subagent after the draft exists. This is mandatory and must complete before any reviewer runs. Do not run verifier and reviewer in parallel.
Task shape (conceptual):
```
Task(
subagent_type="oracle", # or general-purpose with WebFetch
description="verify-citations",
prompt="Add inline citations to outputs/.drafts/<slug>-draft.md using the research files as source material. Verify every URL is reachable. Write the complete cited brief to outputs/.drafts/<slug>-cited.md."
)
```
After the verifier returns, confirm on disk that `outputs/.drafts/<slug>-cited.md` exists. If the verifier wrote elsewhere, find and move the cited file into place.
## Step 6 — Review
**If direct search / no researcher subagents was chosen:**
- Review the cited draft yourself.
- Write `<slug>-verification.md` with FATAL / MAJOR / MINOR findings and the checks performed.
- Fix FATAL issues before delivery.
- Do not spawn a `reviewer` subagent for direct-search runs.
**If researcher subagents were used:**
Only after `outputs/.drafts/<slug>-cited.md` exists, run the `reviewer` subagent against it.
Task shape (conceptual):
```
Task(
subagent_type="general-purpose",
description="review-cited-draft",
prompt="Verify outputs/.drafts/<slug>-cited.md. Flag unsupported claims, logical gaps, single-source critical claims, and overstated confidence. This is a verification pass, not a peer review. Write to <slug>-verification.md."
)
```
If the reviewer flags FATAL issues, fix them before delivery and run one more review pass. Note MAJOR issues in Open Questions. Accept MINOR issues.
**Applying reviewer fixes:** small localized edits for 13 simple corrections. For section rewrites, table rewrites, or more than 3 substantive fixes, read the cited draft and write a corrected full file to `outputs/.drafts/<slug>-revised.md` — do not issue one giant multi-replacement edit.
The final candidate is `outputs/.drafts/<slug>-revised.md` if it exists; otherwise `outputs/.drafts/<slug>-cited.md`.
## Step 7 — Deliver
Copy the final candidate to:
- `papers/<slug>.md` for paper-style drafts
- `outputs/<slug>.md` for everything else
Write the provenance sidecar next to it:
```markdown
# Provenance: <topic>
- **Date:** YYYY-MM-DD
- **Rounds:** <number of research rounds>
- **Sources consulted:** <count and/or list>
- **Sources accepted:** <count and/or list>
- **Sources rejected:** <dead, unverifiable, or removed>
- **Verification:** PASS | PASS WITH NOTES | BLOCKED
- **Plan:** outputs/.plans/<slug>.md
- **Research files:** <list>
```
Before responding, verify on disk that all required artifacts exist. If verification could not be completed, set `Verification: BLOCKED` or `PASS WITH NOTES` and list the missing checks.
## Final response
Keep it brief: link the final file, the provenance file, and any blocked checks. Do not restate the report.

View File

@@ -0,0 +1,85 @@
---
name: docker
description: Execute research code inside isolated Docker containers for safe replication, experiments, and benchmarks. Use when the user selects Docker as the execution environment or asks to run code safely, in isolation, or in a sandbox.
allowed-tools: Bash(docker:*)
---
# Docker Sandbox
Run research code inside Docker containers while the host stays clean. The container gets the project files, runs the commands, and results sync back. Works identically in Claude Code and OpenCode.
## When to use
- User selects "Docker Sandbox" as the execution environment in `replication` or `autoresearch`
- Running untrusted code from a paper's repository
- Experiments that install packages or modify system state
- Any time the user asks to run something "safely" or "isolated"
## How it works
1. Build or pull an appropriate base image for the research code
2. Mount the project directory into the container
3. Run experiment commands inside the container
4. Results write back to the mounted directory
## Running commands in a container
For Python research code (most common):
```bash
docker run --rm -v "$(pwd)":/workspace -w /workspace python:3.11 bash -c "
pip install -r requirements.txt &&
python train.py
"
```
For projects with a Dockerfile:
```bash
docker build -t feynman-experiment .
docker run --rm -v "$(pwd)/results":/workspace/results feynman-experiment
```
For GPU workloads (requires NVIDIA Container Toolkit):
```bash
docker run --rm --gpus all -v "$(pwd)":/workspace -w /workspace pytorch/pytorch:latest bash -c "
pip install -r requirements.txt &&
python train.py
"
```
## Choosing the base image
| Research type | Base image |
| --- | --- |
| Python ML/DL | `pytorch/pytorch:latest` or `tensorflow/tensorflow:latest-gpu` |
| Python general | `python:3.11` |
| Node.js | `node:20` |
| R / statistics | `rocker/r-ver:4` |
| Julia | `julia:1.10` |
| Multi-language | `ubuntu:24.04` with manual installs |
## Persistent containers
For iterative experiments (like `autoresearch`), create a named container instead of `--rm`. Choose a descriptive name based on the experiment:
```bash
docker create --name <name> -v "$(pwd)":/workspace -w /workspace python:3.11 tail -f /dev/null
docker start <name>
docker exec <name> bash -c "pip install -r requirements.txt"
docker exec <name> bash -c "python train.py"
```
This preserves installed packages across iterations. Clean up with:
```bash
docker stop <name> && docker rm <name>
```
## Notes
- The mounted workspace syncs results back to the host automatically
- Containers are network-enabled by default — add `--network none` for full isolation
- For GPU access, Docker must be configured with the NVIDIA Container Toolkit
- Check availability: `command -v docker`

View File

@@ -0,0 +1,34 @@
---
name: eli5
description: Explain research, papers, or technical ideas in plain English with minimal jargon, concrete analogies, and clear takeaways. Use when the user says "ELI5 this", asks for a simple explanation of a paper or research result, wants jargon removed, or asks what something technically dense actually means.
---
# ELI5 — Explain Like I'm Five
Use the `alpha-research` skill first when the user names a specific paper, arXiv id, DOI, or paper URL.
If the user gives only a topic, identify 13 representative papers and anchor the explanation around the clearest or most important one.
## Output structure
- **One-Sentence Summary** — the idea in one sentence, no jargon
- **Big Idea** — the insight that matters, in plain language
- **How It Works** — mechanism, step by step, with one good analogy
- **Why It Matters** — concrete consequence for the reader
- **What To Be Skeptical Of** — limitations the paper itself flags, and common misreadings
- **If You Remember 3 Things** — three sentences, each ≤15 words
## Guidelines
- Use short sentences and concrete words.
- Define jargon immediately or remove it.
- Prefer one good analogy over several weak ones.
- Separate what the paper actually shows from speculation or interpretation.
- Keep the explanation inline in the conversation unless the user explicitly asks to save it as an artifact.
- Do not invent results, benchmarks, or history. If you are unsure, say so instead of smoothing it over.
## When to save to disk
Only when the user asks. Otherwise inline is fine — ELI5 is a reading aid, not an artifact.
If saving: `outputs/<slug>-eli5.md` where `<slug>` is a short hyphenated version of the paper/topic name.

View File

@@ -0,0 +1,79 @@
---
name: jobs
description: Inspect active background research work including running processes, scheduled follow-ups, and pending tasks. Use when the user asks what's running, checks on background work, or wants to see scheduled jobs.
allowed-tools: Bash(ps:*), Bash(pgrep:*), Bash(crontab:*), Bash(systemctl:*)
---
# Jobs
Inspect active background work — running processes, scheduled follow-ups, and managed subagent tasks. This is an operational status skill, not a workflow launcher.
## What to inspect
Summarize the following categories. Skip any that are empty — don't pad the output.
### 1. Active foreground/background processes
```bash
# anything the user started recently in this shell
jobs
ps -o pid,etime,cmd --user "$(whoami)" | head -30
pgrep -fa "python|node|modal|runpodctl|docker" || true
```
### 2. Scheduled / recurring work
Claude Code:
- `CronList` — if the `schedule` skill is active, it lists registered triggers.
- Any `ScheduleWakeup` calls pending in the current session.
OpenCode:
- System `cron` invoking `opencode run`:
```bash
crontab -l 2>/dev/null | grep -i opencode || true
```
- systemd user timers:
```bash
systemctl --user list-timers --all 2>/dev/null || true
```
### 3. Running containers / remote pods
```bash
command -v docker && docker ps
command -v runpodctl && runpodctl get pod
command -v modal && modal app list
```
### 4. Managed subagent tasks (Claude Code only)
Use `TaskList` to see any `Task`-tool subagents still in flight.
## Summary format
```markdown
# Active work
## Processes
- PID 12345 — `python train.py` — running 00:23:11
## Scheduled
- cron: `0 9 * * * opencode run "/watch attention papers"` — next fire tomorrow 09:00
- ScheduleWakeup pending: fire in 12 min — "checking long bun build"
## Remote compute
- Modal: app `experiment` running, 1 A100
- RunPod: pod `xxx` stopped, volume retained
## Failures needing attention
- (none)
## Next command
- To inspect the long-running training: `tail -f ~/logs/train.log`
```
## What NOT to do
- Don't kill processes without confirming with the user.
- Don't return massive `ps` dumps — filter to relevant processes.
- If nothing is running, say "nothing active" — don't invent jobs.

View File

@@ -0,0 +1,108 @@
---
name: literature-review
description: Run a literature review using paper search and primary-source synthesis. Use when the user asks for a lit review, paper survey, state of the art, or academic landscape summary on a research topic.
---
# Literature Review
Produce a grounded, source-cited survey of the state of the art on a topic. Output is a Markdown artifact in `outputs/` with a `.provenance.md` sidecar.
Derive a short slug from the topic (lowercase, hyphens, ≤5 words). All files in this run use the slug as a prefix.
## Subagent mapping
See `../_platform-mapping.md`. In short:
| Upstream role | Claude Code | OpenCode |
|---|---|---|
| `researcher` | `Task` tool, `subagent_type: scholar` or `general-purpose` | `task` tool, `scholar` or `general` agent |
| `verifier` | `Task` tool, `subagent_type: oracle` or `general-purpose` (with WebFetch) | `task` tool, `oracle` or `general` with `webfetch: allow` |
| `reviewer` | `Task` tool, `subagent_type: general-purpose` with review-only prompt | `task` tool, general agent with review-only prompt |
## Workflow
### 1. Plan
Write `outputs/.plans/<slug>.md`. Include:
- Key questions the review must answer
- Source types to search (arXiv papers, web, repos, conference proceedings)
- Time period (e.g. "last 3 years" or explicit year range)
- Expected section structure
- Task ledger (one row per sub-question; status = `pending` / `done` / `blocked` / `superseded`)
- Verification log (critical claims that will need citation verification)
Summarize the plan to the user in 23 sentences. Continue immediately unless the user asks for plan review.
### 2. Gather
**Narrow topic (23 obvious angles):** search directly yourself — no subagent needed. Use the `alpha-research` skill for paper search.
**Wide topic (broad survey):** dispatch 24 `researcher` subagents in parallel. Give each a brief written to `outputs/.plans/<slug>-T<N>.md` describing exactly what that subagent should cover. Each researcher writes its notes to `<slug>-research-<topic>.md` on disk — not returned inline.
Rules:
- No silent skipping — if a researcher can't cover an assigned question, it must mark the ledger entry `blocked` with the reason.
- No PDF parsing unless the user asked for it. Prefer metadata, abstracts, HTML docs.
- At least 3 distinct queries when researching directly, covering definition/history, mechanism, and current usage.
### 3. Synthesize
You (the lead) write the draft, not a subagent. Save to `outputs/.drafts/<slug>-draft.md`.
Separate clearly:
- **Consensus** — claims multiple sources agree on
- **Disagreements** — explicitly name the split and who sits where
- **Open questions** — what the field hasn't settled
Before handing to the verifier, sweep every strong claim against your verification log. Downgrade anything inferred or single-source-critical.
### 4. Cite
Dispatch the `verifier` subagent against `outputs/.drafts/<slug>-draft.md`. Task:
> Add inline citations to every claim using the research files as source material. Verify each URL is reachable. Write the complete cited brief to `outputs/.drafts/<slug>-cited.md`.
After the verifier returns, confirm on disk that `outputs/.drafts/<slug>-cited.md` exists. If the verifier wrote elsewhere, move the file into place.
### 5. Review
Dispatch the `reviewer` subagent against the cited draft. Task:
> Check `outputs/.drafts/<slug>-cited.md` for: unsupported claims, logical gaps, zombie sections, single-source critical findings, overstated confidence. Categorize findings as FATAL / MAJOR / MINOR. Write to `<slug>-verification.md`.
- Fix all FATAL issues before delivery. If you fix FATALs, run one more review pass.
- Note MAJOR issues under "Open Questions" in the final draft.
- Accept MINOR issues.
When applying reviewer fixes: small localized edits for 13 corrections; for larger rewrites, write `outputs/.drafts/<slug>-revised.md` instead of making one giant edit call.
### 6. Deliver
Copy the final candidate (`<slug>-revised.md` if it exists, else `<slug>-cited.md`) to `outputs/<slug>.md`.
Write `outputs/<slug>.provenance.md` next to it:
```markdown
# Provenance: <topic>
- **Date:** YYYY-MM-DD
- **Rounds:** <number of research rounds>
- **Sources consulted:** <count>
- **Sources accepted:** <count>
- **Sources rejected:** <dead / unverifiable / removed>
- **Verification:** PASS | PASS WITH NOTES | BLOCKED
- **Plan:** outputs/.plans/<slug>.md
- **Research files:** <list>
```
Before responding, verify on disk that both files exist. Do not stop at an intermediate cited draft.
## Charts and diagrams
- **Quantitative comparison across papers** — Mermaid bar chart, or CSV in `outputs/.notes/<slug>-data.csv`. No invented numbers.
- **Taxonomies / method pipelines** — Mermaid `graph TD`. Every figure needs a provenance-bearing caption.
## Final response
Brief: link the final artifact, the provenance sidecar, and list any blocked checks.

View File

@@ -0,0 +1,64 @@
---
name: modal-compute
description: Run GPU workloads on Modal's serverless infrastructure. Use when the user needs remote GPU compute for training, inference, benchmarks, or batch processing and Modal CLI is available.
allowed-tools: Bash(modal:*)
---
# Modal Compute
Use the `modal` CLI for serverless GPU workloads. No pod lifecycle to manage — write a decorated Python script and run it. Works identically in Claude Code and OpenCode.
## Setup
```bash
pip install modal
modal setup # one-time auth
```
Check availability: `command -v modal`.
## Commands
| Command | Description |
|---------|-------------|
| `modal run script.py` | Run a script on Modal (ephemeral) |
| `modal run --detach script.py` | Run detached (background) |
| `modal deploy script.py` | Deploy persistently |
| `modal serve script.py` | Serve with hot-reload (dev) |
| `modal shell --gpu a100` | Interactive shell with GPU |
| `modal app list` | List deployed apps |
## GPU types
`T4`, `L4`, `A10G`, `L40S`, `A100`, `A100-80GB`, `H100`, `H200`, `B200`
Multi-GPU: `"H100:4"` for 4x H100s.
## Script pattern
```python
import modal
app = modal.App("experiment")
image = modal.Image.debian_slim(python_version="3.11").pip_install("torch==2.8.0")
@app.function(gpu="A100", image=image, timeout=600)
def train():
import torch
# training code here
@app.local_entrypoint()
def main():
train.remote()
```
## When to use
- Stateless burst GPU jobs (training, inference, benchmarks)
- No persistent state needed between runs
- Fast iteration — no pod provisioning delay
## When NOT to use
- Long-running experiments needing persistent SSH and filesystem state → use `runpod-compute` instead
- Multi-step pipelines that stream intermediate files between stages on the same host

View File

@@ -0,0 +1,98 @@
---
name: paper-code-audit
description: Compare a paper's claims against its public codebase. Use when the user asks to audit a paper, check code-claim consistency, verify reproducibility of a specific paper, or find mismatches between a paper and its implementation.
---
# Paper-Code Audit
Compare a paper's claimed methods, defaults, metrics, and data handling against the actual code. Surface mismatches, omissions, and reproduction risks.
Derive a slug from the audit target (lowercase, hyphens, ≤5 words).
## Subagent mapping
See `../_platform-mapping.md`. Dispatch `researcher` for evidence gathering, `verifier` for citation/URL verification.
## Workflow
### 1. Plan
Write `outputs/.plans/<slug>.md` with:
- **Paper** — title, arXiv id or DOI, and the specific version being audited
- **Repo** — canonical URL (be explicit about fork vs upstream)
- **Claims to check** — numbered list of specific claims (e.g. "claim 3: final layer uses GELU activation")
- **Verification approach** — per-claim, how will you check it (grep source, run a specific script, diff configs)
Summarize the plan briefly, continue immediately unless the user asked for plan review.
### 2. Gather evidence
**Non-trivial audits:** dispatch a `researcher` subagent to pull implementation details from paper sections and linked code. Use the `alpha-research` skill's `alpha code` command (or equivalent repo browsing) to read source files.
**Small audits (single claim, ≤3 files):** the lead agent gathers directly.
For each claim, record:
- The paper section or figure where the claim is made
- The code location (file:line or function name) where it should be implemented
- What the code actually does
### 3. Compare
Organize findings under these buckets:
| Bucket | Meaning |
|---|---|
| **MATCH** | Code matches the claim faithfully |
| **MISMATCH** | Code contradicts the paper |
| **OMITTED** | Claim is in paper but code doesn't implement it |
| **UNDOCUMENTED** | Code does something material that isn't in the paper |
| **AMBIGUOUS** | Paper's description is too vague to verify against code |
| **MISSING CODE** | The referenced module/experiment is not in the public repo |
### 4. Cite
For non-trivial audits, dispatch `verifier` against the draft to verify every URL (paper links, repo links, commit hashes) and add inline citations where missing.
### 5. Deliver
Save exactly one audit artifact to `outputs/<slug>-audit.md`:
```markdown
# Audit: <paper title>
**Paper:** <link> (<version/date>)
**Repo:** <link> (<commit hash used for audit>)
**Date:** YYYY-MM-DD
## Summary
- Claims checked: <N>
- MATCH: <n> | MISMATCH: <n> | OMITTED: <n> | UNDOCUMENTED: <n> | AMBIGUOUS: <n> | MISSING CODE: <n>
## Findings
### <claim 1>
- **Paper says:** <quote or summary> (<section>)
- **Code does:** <what you found> (<file:line>)
- **Verdict:** MATCH / MISMATCH / OMITTED / …
- **Impact on reproducibility:** <brief>
### <claim 2>
...
## Reproduction risks
- <risks ordered by severity>
## Sources
- <paper URL>
- <repo URL at audit commit>
```
End with a `Sources` section containing paper and repository URLs pinned to the version audited (commit hash, not `main`).
## What NOT to do
- Don't run the code unless the user explicitly asked for an execution audit. Reading is often enough to find the mismatch.
- Don't generalize from `src/models/transformer.py` to "the method" without checking the experiment scripts actually call it.
- Don't grade papers. The audit reports what is and isn't in the code; it doesn't pass judgement on the research.

View File

@@ -0,0 +1,100 @@
---
name: paper-writing
description: Turn research findings into a polished paper-style draft with sections, equations, and citations. Use when the user asks to write a paper, draft a report, write up findings, or produce a technical document from collected research.
---
# Paper Writing
Turn collected research notes into a polished paper-style draft with explicit claims, source-backed evidence, and clean Markdown+LaTeX formatting.
Derive a slug from the topic (lowercase, hyphens, ≤5 words). All files in this run use the slug as a prefix.
## Subagent mapping
See `../_platform-mapping.md`. Use `writer` (drafting) and `verifier` (citation + URL verification).
## Prerequisites
This skill assumes research notes already exist — from `deep-research`, `literature-review`, `source-comparison`, or the user's own notes. If research hasn't happened yet, run the appropriate research skill first.
## Workflow
### 1. Outline
Write `outputs/.plans/<slug>.md` with:
- **Proposed title**
- **Section structure** — title, abstract, problem statement, related work, method/synthesis, evidence/experiments, limitations, conclusion
- **Key claims** — numbered list of the strongest claims the paper will make
- **Source material** — which research notes or raw artifacts each claim will draw from
- **Verification log** — a row per critical claim, figure, and calculation; populated during drafting
Briefly summarize the outline to the user, continue immediately unless they asked for outline review.
### 2. Draft
**Option A — subagent-driven (preferred when the notes are dense and the outline is solid):** dispatch a `writer` subagent with the outline and note paths. Task:
> Write a paper-style draft from the outline in `outputs/.plans/<slug>.md` and the notes at `<slug>-research-*.md`. Save to `outputs/.drafts/<slug>-draft.md`. Include all sections from the outline. Use LaTeX where equations materially help. Do not invent results.
**Option B — lead-agent drafting:** write the draft yourself, saving to `outputs/.drafts/<slug>-draft.md`.
Section requirements (minimum):
- **Title** — descriptive, not clickbait
- **Abstract** — ≤200 words; problem, approach, headline result
- **Problem statement** — what's unsolved, why it matters
- **Related work** — honest positioning; no straw-manning
- **Method / synthesis** — clean exposition of what you're claiming
- **Evidence / experiments** — source-backed results; no invented tables
- **Limitations** — explicit, not buried
- **Conclusion** — what's established, what's next
### 3. Guardrails while drafting
- **No invented results.** If evidence is missing, leave a placeholder (`[TODO: verify claim against benchmark X]`) or describe the experiment you'd need to run, rather than fabricating.
- **Every number, figure, and benchmark** must map to a source URL, research note, or script output.
- **Tentative results** — mark them explicitly ("preliminary evidence suggests…" rather than "we show…").
- **Charts** — Mermaid or source-backed CSV, never invented.
### 4. Self-sweep before handoff
Before calling the verifier, sweep the draft:
- Does every strong claim have a traceable source?
- Is every figure caption provenance-bearing (names the source)?
- Are tentative claims marked as tentative?
- Are unsupported numerics removed or marked TODO?
If the sweep finds issues, fix them yourself — don't push the problem onto the verifier.
### 5. Cite
Dispatch `verifier` subagent to add inline citations and verify every URL:
> Add inline citations to `outputs/.drafts/<slug>-draft.md` using the research files as source material. Verify each URL is reachable. Write the complete cited draft to `outputs/.drafts/<slug>-cited.md`.
### 6. Deliver
Save the final draft to `papers/<slug>.md`. Write `papers/<slug>.provenance.md` with:
```markdown
# Provenance: <title>
- **Date:** YYYY-MM-DD
- **Outline:** outputs/.plans/<slug>.md
- **Research files:** <list>
- **Draft:** outputs/.drafts/<slug>-draft.md
- **Cited draft:** outputs/.drafts/<slug>-cited.md
- **Verification:** PASS | PASS WITH NOTES | BLOCKED
- **Sources:** <count accepted / count rejected>
```
End the paper with a `Sources` appendix listing direct URLs for all primary references.
## Format conventions
- Markdown with embedded LaTeX (`$\ldots$` inline, `$$\ldots$$` display). Equations only where they materially help comprehension.
- Mermaid `graph TD` / `flowchart` for architectures, pipelines, and taxonomies.
- Tables in standard Markdown; for complex tables, emit an HTML `<table>` block or a separate CSV.
- Figure captions always name the source.

View File

@@ -0,0 +1,99 @@
---
name: peer-review
description: Simulate a tough but constructive peer review of an AI research artifact. Use when the user asks for a review, critique, feedback on a paper or draft, or wants to identify weaknesses before submission.
---
# Peer Review
Simulate a rigorous AI-research peer review with likely objections, severity scoring, and a concrete revision plan. Output a structured review artifact in `outputs/`.
Derive a slug from the artifact name (lowercase, hyphens, ≤5 words).
## Subagent mapping
See `../_platform-mapping.md`. Use `researcher` to gather evidence on the artifact; a second `reviewer`-style subagent (or the lead agent with a reviewer prompt) writes the actual review.
## Workflow
### 1. Plan
Briefly outline:
- What will be reviewed (paper PDF, cited draft, code repo, or all of the above)
- Review criteria — novelty, empirical rigor, baselines, reproducibility, clarity, related-work coverage
- Verification-specific checks for claims, figures, and reported metrics
Summarize to the user in 23 sentences, continue immediately unless they asked for plan review.
### 2. Gather evidence (for non-trivial artifacts)
Dispatch a `researcher` subagent to:
- Read the paper / draft
- Inspect the code repo (via `alpha-research` `alpha code` or equivalent)
- Cross-check cited work for misrepresentation
- Look at any linked experimental artifacts (logs, tables, commit history)
Output: `<slug>-research.md` on disk.
For small/simple artifacts where a full research pass is overkill, the lead agent reads directly and skips this step.
### 3. Write the review
Either dispatch a `reviewer` subagent with `<slug>-research.md` as input, or write the review yourself.
Required structure for `outputs/<slug>-review.md`:
```markdown
# Review: <title>
**Artifact:** <link or path>
**Reviewer:** <lead agent / subagent>
**Date:** YYYY-MM-DD
## Summary (one paragraph)
<what the artifact claims and what it actually delivers>
## Strengths
- bulleted, concrete, with section references
## Objections (severity-scored)
### FATAL
- <issue> — why it breaks the paper's core claim, what would need to change
### MAJOR
- <issue> — substantive flaw that should be addressed before submission
### MINOR
- <issue> — would improve the paper but isn't blocking
## Reproducibility check
- Are baselines reported with seeds / runs / variance? <y/n + evidence>
- Is the code available and runnable? <y/n + evidence>
- Are datasets cited and accessible? <y/n + evidence>
## Related-work coverage
- Work the paper engages with adequately
- Work the paper ignores or misrepresents (name specific references)
## Revision plan
A numbered list of concrete edits, in priority order, that would address the FATAL + MAJOR objections.
## Sources
- <every external reference touched during the review>
```
### 4. Second pass (only if FATALs were fixed)
If the first review found FATALs and the author fixes them, run one verification-style pass before final delivery. This pass checks that the fix actually addresses the original objection — no "we updated the introduction" cop-outs for a methodology flaw.
### 5. Deliver
Save exactly one review artifact to `outputs/<slug>-review.md`. End with a `Sources` section with direct URLs for every inspected external source.
## What a peer review is NOT
- Not a summary. The review assumes the reader has read the artifact.
- Not a rewrite suggestion. Flag problems; don't draft the fix.
- Not a hit piece. Every objection should be actionable and specific.

View File

@@ -0,0 +1,62 @@
---
name: preview
description: Preview Markdown, LaTeX, PDF, or code artifacts in the browser or as PDF. Use when the user wants to review a written artifact, export a report, or view a rendered document.
allowed-tools: Bash(open:*), Bash(xdg-open:*), Bash(pandoc:*)
---
# Preview
Render and open artifacts produced by the research workflows. This is a thin wrapper over OS-native openers and `pandoc`.
> **Upstream note.** Feynman ships a `/preview` slash command. Outside Feynman, fall back to the bash commands below — both Claude Code and OpenCode can execute them.
## Open a file in the default app
macOS:
```bash
open <file.md>
open <file.pdf>
open <file.html>
```
Linux:
```bash
xdg-open <file.md>
xdg-open <file.pdf>
xdg-open <file.html>
```
The default app is whatever the OS has registered for the extension — usually a Markdown viewer, Preview/Evince, or a browser.
## Export Markdown to PDF
`pandoc` is the standard cross-platform renderer:
```bash
pandoc outputs/<slug>.md -o outputs/<slug>.pdf \
--pdf-engine=xelatex \
--variable geometry:margin=1in \
--toc
```
For papers with LaTeX equations, prefer `--pdf-engine=xelatex` or `lualatex`. If LaTeX is not installed, `--pdf-engine=weasyprint` is a lightweight alternative that renders HTML+CSS to PDF.
## Export to HTML
```bash
pandoc outputs/<slug>.md -o outputs/<slug>.html --standalone --mathjax
```
## When to use
- User asks to "preview", "render", "export", or "view" a written artifact
- Before delivering a paper or brief, sanity-check rendering
- Converting `.md` outputs to PDF for sharing outside the repo
## What to pass back to the user
- The absolute path to the rendered file
- Whether the render succeeded or had LaTeX/pandoc warnings
- If the user wanted it opened, confirm the OS opener returned exit 0

View File

@@ -0,0 +1,118 @@
---
name: replication
description: Plan or execute a replication of a paper, claim, or benchmark. Use when the user asks to replicate results, reproduce an experiment, verify a claim empirically, or build a replication package.
---
# Replication
Plan — and optionally execute — a replication of a paper, claim, or benchmark. Always confirm the execution environment with the user before running any code.
Derive a slug from the paper or claim (lowercase, hyphens, ≤5 words).
## Subagent mapping
See `../_platform-mapping.md`. Use `researcher` to extract implementation details from the paper and repo.
## Workflow
### 1. Extract
Dispatch a `researcher` subagent (or read directly for small papers) to pull implementation details from the target paper and any linked code:
- Algorithm / architecture specifics
- Hyperparameters, config defaults, random seeds
- Dataset — name, source, preprocessing
- Training regime — epochs, batch size, optimizer
- Metrics — exact definitions, evaluation splits
- Hardware — what they used, what you'll need
If `CHANGELOG.md` exists in the workspace, read the most recent relevant entries before planning or resuming.
### 2. Plan
Write `outputs/.plans/<slug>.md` with three explicit columns:
- **Verified** — details you confirmed from paper + code
- **Inferred** — details the paper/code implied but didn't state directly
- **Missing** — details the paper/code doesn't specify
Also include:
- **Check oracles** — the specific measurements that will decide whether the replication succeeded (e.g. "top-1 accuracy within ±0.5% of reported 78.2%")
- **Scope cut** — what's in (core claim) and what's out (ablations, follow-on experiments)
### 3. Environment — ask before running
Before executing anything, ask the user where to run:
- **Local** — current working directory
- **Virtual environment** — isolated venv/conda
- **Docker** — isolated container (see `docker` skill)
- **Modal** — serverless GPU (see `modal-compute` skill). Best for burst jobs without persistent state. Requires `modal` CLI.
- **RunPod** — persistent GPU pod with SSH (see `runpod-compute` skill). Best for long-running experiments. Requires `runpodctl` and `RUNPOD_API_KEY`.
- **Plan only** — produce the replication plan without executing
Do not install packages, run training, or execute experiments without an explicit answer.
### 4. Execute (if a runtime was chosen)
Implement and run the replication steps in the chosen environment. Save:
- **Scripts** — checked-in `.py` / `.sh` files in `experiments/<slug>/`
- **Configs** — exact configs used, checked in
- **Raw outputs** — logs, metrics, predictions, checkpoints (or at least checksums) in a reproducible layout
- **Results summary** — `outputs/<slug>-results.md` comparing your numbers to the paper's
Do not call the outcome "replicated" unless the planned oracles actually passed. If they didn't, write up what you observed and what diverged.
### 5. Log
For multi-step or resumable replication work, append concise entries to `CHANGELOG.md` after:
- Meaningful progress
- Failed attempts
- Major verification outcomes
- Before stopping for the session
Each entry: active objective, what changed, what was checked, next step.
### 6. Report
Save the final replication write-up to `outputs/<slug>-replication.md`:
```markdown
# Replication: <paper title>
**Paper:** <link> (<version>)
**Claim replicated:** <specific claim>
**Date:** YYYY-MM-DD
**Environment:** <chosen runtime>
## Oracles
- <oracle 1>: TARGET <x> — OBSERVED <y> — PASS / FAIL
- <oracle 2>: ...
## What matched
- <list>
## What diverged
- <list, with severity>
## Plausible causes
- <what could explain divergences>
## Reproducibility grade
- FULL | PARTIAL | FAILED | BLOCKED
## Sources
- <paper URL>
- <repo URL at commit used>
```
End with a `Sources` section containing paper and repository URLs pinned to the commit used for replication.
## Invariants
- **Confirm runtime before executing.** No silent installs or training runs.
- **Don't claim replication without oracle checks.** "Numbers look close" isn't a check; "top-1 within ±0.5%" is.
- **Log failures.** A failed replication with a written-up reason is more valuable than a hand-wavy "seems to work".

View File

@@ -0,0 +1,59 @@
---
name: runpod-compute
description: Provision and manage GPU pods on RunPod for long-running experiments. Use when the user needs persistent GPU compute with SSH access, large datasets, or multi-step experiments.
allowed-tools: Bash(runpodctl:*), Bash(ssh:*), Bash(scp:*)
---
# RunPod Compute
Use `runpodctl` CLI for persistent GPU pods with SSH access. Works identically in Claude Code and OpenCode.
## Setup
```bash
# macOS
brew install runpod/runpodctl/runpodctl
# Linux — download from https://github.com/runpod/runpodctl/releases
runpodctl config --apiKey=$RUNPOD_API_KEY
```
Check availability: `command -v runpodctl`.
## Commands
| Command | Description |
|---------|-------------|
| `runpodctl create pod --gpuType "NVIDIA A100 80GB PCIe" --imageName "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04" --name experiment` | Create a pod |
| `runpodctl get pod` | List all pods |
| `runpodctl stop pod <id>` | Stop (preserves volume) |
| `runpodctl start pod <id>` | Resume a stopped pod |
| `runpodctl remove pod <id>` | Terminate and delete |
| `runpodctl gpu list` | List available GPU types and prices |
| `runpodctl send <file>` | Transfer files to/from pods |
| `runpodctl receive <code>` | Receive transferred files |
## SSH access
```bash
ssh root@<IP> -p <PORT> -i ~/.ssh/id_ed25519
```
Get connection details from `runpodctl get pod <id>`. Pods must expose port `22/tcp`.
## GPU types
`NVIDIA GeForce RTX 4090`, `NVIDIA RTX A6000`, `NVIDIA A40`, `NVIDIA A100 80GB PCIe`, `NVIDIA H100 80GB HBM3`
## When to use
- Long-running experiments needing persistent state
- Large dataset processing
- Multi-step work with SSH access between iterations
- When the experiment writes and reads many intermediate files on the same host
## Lifecycle discipline
- Always stop or remove pods after experiments — running pods bill by the minute.
- Use `runpodctl stop pod` to preserve the volume for resume, `remove pod` to release everything.
- For reproducibility, snapshot the pod image before destroying.

View File

@@ -0,0 +1,65 @@
---
name: session-log
description: Write a durable session log capturing completed work, findings, open questions, and next steps. Use when the user asks to log progress, save session notes, write up what was done, or create a research diary entry.
---
# Session Log
Write a durable, readable log of the current research session. This is a lab-notebook entry, not a task tracker.
## Output location
`notes/session-logs/<YYYY-MM-DD>-<slug>.md`
Where `<slug>` is 15 hyphenated words describing the session's focus (e.g. `scaling-laws-comparison`, `nanogpt-replication`).
If more than one session happens in a day, append a suffix: `2026-04-18-nanogpt-replication-2.md`.
## Required sections
```markdown
# Session: <topic>
**Date:** YYYY-MM-DD
**Duration:** <approx. hh:mm or start/end>
**Slug:** <slug>
## What was done
- bulleted list of concrete actions: files written, experiments run, papers read
- cite the artifacts by path (e.g. `outputs/attention-scaling.md`, `papers/my-draft.md`)
## Key findings
- strongest claims, results, or decisions from the session
- mark each as `verified`, `unverified`, `inferred`, or `blocked` — match the honesty of the underlying evidence
## Open questions
- things you wanted to settle but couldn't
- each question should be concrete enough to hand off to another session
## Next steps
- one or two concrete actions the next session should take
- name the artifact or command to resume from
## Sources
- direct URLs for any external claim that matters
```
## Tie-in to `CHANGELOG.md`
If the workspace has a `CHANGELOG.md` (repo-level lab notebook), the session log and the changelog are complementary:
- Session log = the full narrative of one sitting
- CHANGELOG.md entry = a 24 line summary pointing at the session log
Append a changelog entry after writing the session log, referencing it by path.
## When to write
- At the end of any substantive research session
- Before switching projects or stepping away for more than a day
- When the user explicitly asks to "log progress" or "save session notes"
## When to skip
- Trivial one-question lookups that produced no artifacts
- Pure clarification exchanges with no research output

View File

@@ -0,0 +1,68 @@
---
name: session-search
description: Search past session transcripts to recover prior work, conversations, and research context. Use when the user references something from a previous session, asks "what did we do before", or when you suspect relevant past context exists.
allowed-tools: Bash(grep:*), Bash(rg:*), Bash(ls:*)
---
# Session Search
Recover context from prior sessions by searching transcript stores. The session store path depends on the runtime.
## Session store locations
| Runtime | Transcript path |
|---|---|
| Claude Code | `~/.claude/projects/<hash-of-cwd>/*.jsonl` |
| OpenCode | `~/.local/share/opencode/session/` |
| Codex (Feynman host) | `~/.feynman/sessions/*.jsonl` |
Transcripts are typically JSONL — one JSON record per line with `type` (`session`, `message`, `tool_use`, `model_change`) and `message.content` fields.
## Direct search (works everywhere)
```bash
# keyword search across all transcripts
rg -l "scaling laws" ~/.claude/projects/
rg -l "scaling laws" ~/.local/share/opencode/session/
# fallback when ripgrep is not installed
grep -ril "scaling laws" ~/.claude/projects/
```
For structured queries against JSONL (e.g. "find all user messages about X"):
```bash
rg --json "query" ~/.claude/projects/ | jq 'select(.type == "match") | .data.path.text'
```
## Claude Code native
If the `session-logs` community skill is installed, it provides richer search over the current project's transcripts. Prefer it when available.
## OpenCode native
OpenCode stores sessions under `~/.local/share/opencode/session/<session-id>/`. List available sessions:
```bash
ls -lt ~/.local/share/opencode/session/ | head
```
Resume a past session with `opencode run -c <session-id>` (see OpenCode docs — subject to version).
## What to look for
- User messages referencing the same topic, paper, or codebase
- Assistant outputs (artifacts) that were saved to `outputs/`, `papers/`, or `notes/`
- Plan files (`outputs/.plans/<slug>.md`) that may still be valid
- Failed approaches — often more informative than the successful ones
## When to use
- User says "we talked about X before" or "remember the report on Y"
- Before starting research on a topic that feels familiar
- When resuming a paused workflow mid-project
## What NOT to do
- Do not inline large transcript dumps back into context — read and summarize, don't paste.
- Do not invent past conversations. If search returns nothing, say so instead of confabulating.

View File

@@ -0,0 +1,72 @@
---
name: source-comparison
description: Compare multiple sources on a topic and produce a grounded comparison matrix. Use when the user asks to compare papers, tools, approaches, frameworks, or claims across multiple sources.
---
# Source Comparison
Build a side-by-side comparison matrix grounded in primary sources, distinguishing agreement, disagreement, and uncertainty.
Derive a short slug from the comparison topic (lowercase, hyphens, ≤5 words). All files use this prefix.
## Subagent mapping
See `../_platform-mapping.md`. Use `researcher` for evidence gathering, `verifier` for citation verification.
## Workflow
### 1. Plan
Write `outputs/.plans/<slug>.md`:
- **Items being compared** — name them exactly (paper A vs paper B, tool X vs tool Y, etc.)
- **Dimensions of comparison** — 48 concrete axes (claim, evidence type, methodology, benchmark, caveats, confidence, etc.)
- **Source expectations** — what primary source is required for each cell
- **Output structure** — matrix columns and rows
Summarize briefly to the user, continue immediately unless they asked for plan review.
### 2. Gather
- **Narrow comparison (23 items, clear primary sources):** lead agent gathers directly.
- **Broad comparison (5+ items, or wide surface area):** dispatch `researcher` subagents, one per item or per dimension, each writing to `<slug>-research-<item>.md`.
Require primary sources. Reject blog posts that themselves cite unverifiable claims.
### 3. Build the matrix
One row per item, one column per dimension. Every cell must be traceable to a source.
Markdown table pattern:
```markdown
| Item | Key claim | Evidence type | Methodology | Caveats | Confidence |
|------|-----------|---------------|-------------|---------|-----------|
| ... | ... | ... | ... | ... | ... |
```
Follow the matrix with:
- **Agreement section** — dimensions where items converge
- **Disagreement section** — dimensions where items diverge, with the specific split named
- **Uncertainty section** — dimensions where primary sources don't settle it
### 4. Charts
- Quantitative dimensions → Mermaid bar chart, or CSV in `outputs/.notes/<slug>-data.csv`. No invented numbers.
- Method/architecture differences → Mermaid `graph TD`.
### 5. Cite
Dispatch `verifier` subagent to add inline citations and verify every URL in the comparison draft. Output to `outputs/.drafts/<slug>-cited.md`.
### 6. Deliver
Save final comparison to `outputs/<slug>-comparison.md`. End with a `Sources` section containing direct URLs for every source used.
Write `outputs/<slug>-comparison.provenance.md` with the same format as other research artifacts (see `literature-review` skill for the template).
## What NOT to compare
- Versions of the same thing that only differ in config — that's a benchmark, not a source comparison.
- Items without public primary sources — say so in the plan and stop.

View File

@@ -0,0 +1,154 @@
---
name: summarize
description: Summarize any URL, local file, or PDF using the RLM pattern — source stored on disk, never injected raw into context. Use when the user asks to summarize a long document, paper, webpage, or PDF that might exceed safe context-window limits.
allowed-tools: Bash(curl:*), Bash(pdftotext:*), Bash(python3:*)
---
# Summarize (RLM Pattern)
Summarize a URL, local file, or PDF without injecting the full document into context. The source stays on disk as an external variable; only bounded windows enter context.
Derive a short slug from the source filename or URL domain (lowercase, hyphens, ≤5 words — e.g. `attention-is-all-you-need`). All files use this prefix.
## Why the RLM pattern
Standard summarization injects the full document into context. Above ~15k tokens, early content degrades as the window fills (context rot). This workflow keeps the document on disk and reads only bounded windows — context pressure is proportional to the window size, not the document size.
Tier 1 (<8k chars) is a deliberate exception: direct injection is safe at ~2k tokens and windowed reading would add unnecessary friction.
## Step 1 — Fetch, validate, measure
Run all guards before any tier logic. A failure here is cheap; a failure mid-Tier-3 is not.
- **GitHub repo URL** (`https://github.com/owner/repo` exactly 4 slashes): fetch the raw README instead. Try `https://raw.githubusercontent.com/{owner}/{repo}/main/README.md`, then `/master/README.md`. A repo HTML page is not the document the user wants to summarize.
- **Remote URL:** fetch to disk: `curl -sL -o outputs/.notes/<slug>-raw.txt <url>`. Do NOT use a fetch tool whose return value enters context directly that bypasses the RLM principle.
- **Local file or PDF:** copy or extract to `outputs/.notes/<slug>-raw.txt`. For PDFs, extract text via `pdftotext <file> outputs/.notes/<slug>-raw.txt` (or equivalent) before measuring.
- **Empty or failed fetch:** if the file is <50 bytes after fetching, stop and surface the error do not proceed to tier selection.
- **Binary content:** if the file is >1 KB but contains <100 readable text characters, stop and tell the user the content appears binary or unextracted.
- **Existing output:** if `outputs/<slug>-summary.md` already exists, ask whether to overwrite or use a different slug. Do not proceed until confirmed.
Measure decoded text characters (not bytes UTF-8 multi-byte chars would overcount). Log: `[summarize] source=<source> slug=<slug> chars=<count>`.
## Step 2 — Choose tier
| Chars | Tier | Strategy |
|---|---|---|
| <8 000 | 1 | Direct read full content enters context (safe at ~2k tokens) |
| 8 000 60 000 | 2 | RLM-lite windowed bash extraction, progressive notes to disk |
| >60 000 | 3 | Full RLM — bash chunking + parallel researcher subagents |
Log: `[summarize] tier=<N> chars=<count>`.
## Tier 1 — Direct read
Read `outputs/.notes/<slug>-raw.txt` in full. Summarize directly using the output format below. Write to `outputs/<slug>-summary.md`.
## Tier 2 — RLM-lite windowed read
The document stays on disk. Extract 6 000-char windows via bash/python:
```python
# f.seek/f.read: the Read tool uses line offsets, not char offsets.
# For exact char-boundary windowing across arbitrary text, bash/python is required.
with open("outputs/.notes/<slug>-raw.txt", encoding="utf-8") as f:
f.seek(n * 6000)
window = f.read(6000)
```
For each window:
1. Extract key claims and evidence.
2. **Append to `outputs/.notes/<slug>-notes.md` before reading the next window.** This is the checkpoint: if the session is interrupted, processed windows survive.
3. Log: `[summarize] window <N>/<total> done`.
After all windows, synthesize `outputs/.notes/<slug>-notes.md` into `outputs/<slug>-summary.md`.
## Tier 3 — Full RLM parallel chunks
Each chunk gets a fresh researcher subagent context window — context rot is impossible because no subagent sees more than 6 000 chars.
**Why 500-char overlap:** academic documents contain multi-sentence arguments that span chunk boundaries. 500 chars (~80 words) ensures a cross-boundary claim appears fully in at least one adjacent chunk.
### 3a. Chunk the document
```python
import os
os.makedirs("outputs/.notes", exist_ok=True)
with open("outputs/.notes/<slug>-raw.txt", encoding="utf-8") as f:
text = f.read()
chunk_size, overlap = 6000, 500
chunks, i = [], 0
while i < len(text):
chunks.append(text[i : i + chunk_size])
i += chunk_size - overlap
for n, chunk in enumerate(chunks):
# Zero-pad so files sort correctly (chunk-002 before chunk-010)
with open(f"outputs/.notes/<slug>-chunk-{n:03d}.txt", "w", encoding="utf-8") as f:
f.write(chunk)
print(f"[summarize] chunks={len(chunks)} chunk_size={chunk_size} overlap={overlap}")
```
### 3b. Confirm before spawning
Briefly summarize: "Source is ~<chars> chars → <N> chunks → <N> researcher subagents. This may take several minutes." Then continue automatically. Do not ask for confirmation or wait for a proceed response unless the user explicitly requested review before launching.
### 3c. Dispatch researcher subagents
Dispatch one subagent per chunk (see `../_platform-mapping.md` for role mapping). Each subagent's prompt:
> Read ONLY `outputs/.notes/<slug>-chunk-NNN.txt`. Extract:
> (1) key claims
> (2) methodology or technical approach
> (3) cited evidence
>
> Do NOT use web search or fetch external URLs — this is single-source summarization. If a claim appears to start or end mid-sentence at the file boundary, mark it `BOUNDARY PARTIAL`. Write to `outputs/.notes/<slug>-summary-chunk-NNN.md`.
Use `failFast: false` / equivalent so one chunk failure doesn't kill the batch. Cap concurrency at ~4 to avoid rate limits.
### 3d. Aggregate
After all subagents return, verify every expected `outputs/.notes/<slug>-summary-chunk-NNN.md` exists. Note any missing chunk indices — they appear in the **Coverage gaps** section of the output. Do not abort on partial coverage; a partial summary with gaps noted is more useful than none.
When synthesizing:
- **Deduplicate** — a claim in multiple chunks is one claim; keep the most complete formulation.
- **Resolve boundary conflicts** — for adjacent-chunk contradictions, prefer the version with more supporting context.
- **Remove `BOUNDARY PARTIAL` markers** where a complete version exists in a neighbouring chunk.
Write the final synthesis to `outputs/<slug>-summary.md`.
## Output format
All tiers produce the same artifact at `outputs/<slug>-summary.md`:
```markdown
# Summary: <document title or source filename>
**Source:** <URL or file path>
**Date:** YYYY-MM-DD
**Tier:** 1 | 2 (N windows) | 3 (N chunks)
## Key Claims
<37 most important assertions, each as a bullet>
## Methodology
<approach, dataset, evaluation, baselines omit for non-research documents>
## Limitations
<what the source explicitly flags as weak, incomplete, or out of scope>
## Verdict
<one paragraph: what this document establishes, its credibility, who should read it>
## Sources
1. <title or filename><URL or file path>
## Coverage gaps
<only for Tier 3 with missing chunks list missing indices and approximate byte ranges>
```
Before stopping, verify on disk that `outputs/<slug>-summary.md` exists. Sources contains only the single source confirmed reachable in Step 1. No verifier subagent is needed — there are no URLs constructed from memory to verify.

View File

@@ -0,0 +1,70 @@
---
name: watch
description: Set up a recurring research watch on a topic, company, paper area, or product surface. Use when the user asks to monitor a field, track new papers, watch for updates, or set up alerts on a research area.
---
# Watch
Establish a recurring or deferred research watch. The watch has two parts: a **baseline sweep** so future checks have something to diff against, and a **scheduled follow-up** that runs the same sweep later.
## Workflow
Derive a short slug from the watch topic (lowercase, hyphens, ≤5 words).
### 1. Plan
Write `outputs/.plans/<slug>.md` with:
- **What to monitor** — sources, keywords, specific repos/sites
- **Signals that matter** — e.g. new arXiv papers, new GitHub releases, new benchmark entries
- **What counts as a meaningful change** — filter out noise (typo edits, reformatting)
- **Check frequency** — daily, weekly, monthly
- **How results will be compared** — diff against previous baseline, or append to a log
Briefly summarize the plan to the user. Continue immediately unless the user asks for plan review.
### 2. Baseline sweep
Run the initial research pass now so the watch has a starting point. Use the `deep-research` or `literature-review` skill procedures as appropriate for the topic.
Save the baseline to `outputs/<slug>-baseline.md`.
End the baseline with a `Sources` section listing direct URLs — these are the surfaces the watch will re-check.
### 3. Schedule the follow-up
Don't merely promise to check later — register an actual schedule. Use the **scheduling** facility available in the host runtime:
| Runtime | Scheduling primitive |
|---|---|
| Claude Code | `CronCreate` via the `schedule` skill, or system `cron` invoking `claude -p "<prompt>"` |
| OpenCode | system `cron` invoking `opencode run "<prompt>"` |
| In-session self-wake (Claude only) | `ScheduleWakeup` with delay in seconds |
Example cron entry (weekly watch):
```cron
0 9 * * 1 cd /path/to/workspace && opencode run "Re-run the <slug> watch; compare against outputs/<slug>-baseline.md; write diff to outputs/<slug>-watch-$(date +\%Y\%m\%d).md"
```
### 4. Compare on each follow-up run
When the scheduled run fires:
1. Re-run the baseline sweep.
2. Diff against the most recent previous output (`outputs/<slug>-baseline.md` or the most recent `outputs/<slug>-watch-*.md`).
3. Write `outputs/<slug>-watch-<YYYYMMDD>.md` with:
- New items found
- Items that changed materially
- Items that disappeared (rare but meaningful — e.g. paper retracted, repo deleted)
4. If nothing meaningfully changed, write a one-line entry noting that.
### 5. Stop conditions
Tell the user explicitly how to stop the watch — e.g. `crontab -e` and remove the line, or `CronDelete <id>`. Watches that run forever are noise generators.
## Output artifacts
- `outputs/.plans/<slug>.md` — watch plan
- `outputs/<slug>-baseline.md` — initial sweep
- `outputs/<slug>-watch-<date>.md` — each follow-up run