Rebuild website from scratch on Tailwind v4 + shadcn/ui
- Fresh Astro 5 project with Tailwind v4 and shadcn/ui olive preset
- All shadcn components installed (Card, Button, Badge, Separator, etc.)
- Homepage with hero, terminal demo, workflows, agents, sources, compute
- Full docs system with 24 markdown pages across 5 sections
- Sidebar navigation with active state highlighting
- Prose styles for markdown content using shadcn color tokens
- Dark/light theme toggle with localStorage persistence
- Shiki everforest syntax themes for code blocks
- 404 page with VT323 font
- /docs redirect to installation page
- GitHub star count fetch
- Earthy green/cream oklch color palette matching TUI theme

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
title: Code Audit
description: Compare a paper's claims against its public codebase for reproducibility.
section: Workflows
order: 4
---

The code audit workflow compares a paper's claims against its public codebase to identify mismatches, undocumented deviations, and reproducibility risks. It bridges the gap between what a paper says and what the code actually does.

## Usage

```
/audit <item>
```

## What it does

Compares claims made in a paper against its public codebase. Surfaces mismatches, missing experiments, and reproducibility risks.

## What it checks

- Do the reported hyperparameters match the code?
- Are all claimed experiments present in the repository?
- Does the training loop match the described methodology?
- Are there undocumented preprocessing steps?
- Do evaluation metrics match the paper's claims?

## Example

From the REPL:

```
/audit arxiv:2401.12345
/audit https://github.com/org/repo --paper arxiv:2401.12345
```

From the CLI:

```bash
feynman audit 2401.12345
```

When given an arXiv ID, Feynman locates the associated code repository from the paper's links, Papers With Code, or GitHub search. You can also provide the repository URL directly.

## How it works

The audit workflow operates in two passes. First, the researcher agent reads the paper and extracts all concrete claims: hyperparameters, architecture details, training procedures, dataset splits, evaluation metrics, and reported results. Each claim is tagged with its location in the paper for traceability.

Second, the verifier agent examines the codebase to find the corresponding implementation for each claim. It checks configuration files, training scripts, model definitions, and evaluation code to verify that the code matches the paper's description. When it finds a discrepancy -- a hyperparameter that differs, a training step that was described but not implemented, or an evaluation procedure that deviates from the paper -- it documents the mismatch with exact file paths and line numbers.

The audit also checks for common reproducibility issues like missing random seeds, non-deterministic operations without pinned versions, hardcoded paths, and absent environment specifications.
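
The verification pass can be pictured as a simple matching step between extracted claims and parsed configuration. A minimal sketch, assuming hypothetical claim and config dictionaries -- the structures and field names are illustrative, not Feynman's actual internals:

```python
# Illustrative sketch of the verification pass: compare hyperparameters
# claimed in the paper against values parsed from the repo's config.
# The claim/config dictionaries are hypothetical, not Feynman's real API.

def check_claims(paper_claims, repo_config):
    report = {"confirmed": [], "mismatches": [], "missing": []}
    for name, claimed in paper_claims.items():
        if name not in repo_config:
            report["missing"].append(name)  # claim with no corresponding code
        elif repo_config[name] == claimed:
            report["confirmed"].append(name)  # paper and code agree
        else:
            # record evidence from both sides of the discrepancy
            report["mismatches"].append((name, claimed, repo_config[name]))
    return report
```

In the real workflow each entry would also carry the claim's location in the paper and the file path and line number in the repository.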

## Output format

The audit report contains:

- **Match Summary** -- Percentage of claims that match the code
- **Confirmed Claims** -- Claims that are accurately reflected in the codebase
- **Mismatches** -- Discrepancies between paper and code with evidence from both
- **Missing Implementations** -- Claims in the paper with no corresponding code
- **Reproducibility Risks** -- Issues like missing seeds, unpinned dependencies, or hardcoded paths

## When to use it

Use `/audit` when you are deciding whether to build on a paper's results, when replicating an experiment, or when reviewing a paper for a venue and want to verify its claims against the code. It is also useful for auditing your own papers before submission to catch inconsistencies between your writeup and implementation.

---
title: Autoresearch
description: Start an autonomous experiment loop that iteratively optimizes toward a goal.
section: Workflows
order: 8
---

The autoresearch workflow launches an autonomous research loop that iteratively designs experiments, runs them, analyzes results, and proposes next steps. It is designed for open-ended exploration where the goal is optimization or discovery rather than a specific answer.

## Usage

```
/autoresearch <idea>
```

## What it does

Runs an autonomous experiment loop:

1. **Edit** — Modify code or configuration
2. **Commit** — Save the change
3. **Benchmark** — Run evaluation
4. **Evaluate** — Compare against baseline
5. **Keep or revert** — Persist improvements, roll back regressions
6. **Repeat** — Continue until the target is hit

## Tracking

Metrics are tracked in:

- `autoresearch.md` — Human-readable progress log
- `autoresearch.jsonl` — Machine-readable metrics over time
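
Because the `.jsonl` log is machine-readable, it can be analyzed with ordinary tooling. A sketch of pulling the best-scoring record out of the log, assuming each line is a JSON object with `iteration` and `score` fields (the field names are assumed for illustration; the actual schema may differ):

```python
import json

def best_iteration(path):
    """Return the record with the highest score in an autoresearch.jsonl log."""
    best = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if best is None or record["score"] > best["score"]:
                best = record
    return best
```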

## Controls

From the REPL:

```
/autoresearch <idea>   # start or resume
/autoresearch off      # stop, keep data
/autoresearch clear    # delete all state, start fresh
```

## Example

```
/autoresearch Optimize prompt engineering strategies for math reasoning on GSM8K
```

From the CLI:

```bash
feynman autoresearch "Optimize prompt engineering strategies for math reasoning on GSM8K"
```

Autoresearch runs as a long-lived background process. You can monitor its progress, pause it, or redirect its focus at any time.

## How it works

The autoresearch workflow is powered by `@tmustier/pi-ralph-wiggum`, which provides long-running agent loops. The workflow begins by analyzing the research goal and designing an initial experiment plan. It then enters an iterative loop:

1. **Hypothesis** -- The agent proposes a hypothesis or modification based on current results
2. **Experiment** -- It designs and executes an experiment to test the hypothesis
3. **Analysis** -- Results are analyzed and compared against prior iterations
4. **Decision** -- The agent decides whether to continue the current direction, try a variation, or pivot to a new approach

Each iteration builds on the previous ones. The agent maintains a running log of what has been tried, what worked, what failed, and what the current best result is. This prevents repeating failed approaches and ensures the search progresses efficiently.
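
The edit, benchmark, evaluate, keep-or-revert cycle can be reduced to a few lines. A toy sketch with stand-in `propose` and `benchmark` callables (placeholders for illustration, not real Feynman APIs):

```python
def autoresearch_loop(state, propose, benchmark, iterations=10):
    """Toy keep-or-revert loop: keep a change only if it beats the best score."""
    best_score = benchmark(state)  # baseline
    history = []
    for i in range(iterations):
        candidate = propose(state)    # edit + commit a change
        score = benchmark(candidate)  # run the evaluation
        kept = score > best_score     # compare against baseline
        if kept:
            state, best_score = candidate, score  # keep the improvement
        # otherwise the candidate is discarded (revert)
        history.append({"iteration": i, "score": score, "kept": kept})
    return state, best_score, history
```

The real loop adds persistence, logging to `autoresearch.md`/`autoresearch.jsonl`, and agent-driven hypothesis selection, but the keep-or-revert decision is the core of it.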

## Monitoring and control

Check active autoresearch jobs:

```
/jobs
```

Autoresearch runs in the background, so you can continue using Feynman for other tasks while it works. The `/jobs` command shows the current status, iteration count, and best result so far. You can interrupt the loop at any time to provide guidance or redirect the search.

## Output format

Autoresearch produces a running experiment log that includes:

- **Experiment History** -- What was tried in each iteration with parameters and results
- **Best Configuration** -- The best-performing setup found so far
- **Ablation Results** -- Which factors mattered most based on the experiments run
- **Recommendations** -- Suggested next steps based on observed trends

## When to use it

Use `/autoresearch` for tasks that benefit from iterative exploration: hyperparameter optimization, prompt engineering, architecture search, or any problem where the search space is large and the feedback signal is clear. It is not the right tool for answering a specific question (use `/deepresearch` for that) but excels at finding what works best through systematic experimentation.

---
title: Source Comparison
description: Compare multiple sources and produce an agreement/disagreement matrix.
section: Workflows
order: 6
---

The source comparison workflow analyzes multiple papers, articles, or documents side by side and produces a structured matrix showing where they agree, disagree, and differ in methodology. It is useful for understanding conflicting results, evaluating competing approaches, and identifying which claims have broad support versus limited evidence.

## Usage

```
/compare <topic>
```

## What it does

Compares multiple sources on a topic. Builds an agreement/disagreement matrix showing where sources align and where they conflict.

## Example

From the REPL:

```
/compare approaches to constitutional AI training
/compare "GPT-4 vs Claude vs Gemini on reasoning benchmarks"
/compare arxiv:2401.12345 arxiv:2402.67890 arxiv:2403.11111
```

From the CLI:

```bash
feynman compare "topic or list of sources"
```

You can provide a topic and let Feynman find the sources, or list specific papers and documents for a targeted comparison.

## How it works

The comparison workflow begins by identifying or retrieving the sources to compare. If you provide a topic, the researcher agents find the most relevant and contrasting papers. If you provide specific IDs or files, they are used directly.

Each source is analyzed independently first: the researcher agents extract claims, results, methodology, and limitations from each document. Then the comparison engine aligns claims across sources -- identifying where two papers make the same claim (agreement), where they report contradictory results (disagreement), and where they measure different things entirely (non-overlapping scope).

The alignment step handles the nuance that papers often measure slightly different quantities or use different evaluation protocols. The comparison explicitly notes when an apparent disagreement might be explained by methodological differences rather than genuine conflicting results.
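
At its core, the matrix-building step is a grouping problem: collect each source's stance on each claim, then split claims into agreements and disagreements. A toy sketch over hypothetical (source, claim, stance) triples -- a deliberate simplification of what the researcher agents actually extract:

```python
def build_matrix(findings):
    """findings: iterable of (source, claim, stance), stance in {"supports", "refutes"}."""
    matrix = {}
    for source, claim, stance in findings:
        matrix.setdefault(claim, {})[source] = stance
    agreements, disagreements = [], []
    for claim, stances in matrix.items():
        if len(set(stances.values())) > 1:
            disagreements.append(claim)  # sources take opposing stances
        elif len(stances) > 1:
            agreements.append(claim)     # multiple sources, same stance
    return matrix, agreements, disagreements
```

The real alignment step is fuzzier than exact claim strings: it has to decide when two differently worded claims are about the same quantity before any counting can happen.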

## Output format

The comparison produces:

- **Source Summaries** -- One-paragraph summary of each source's key contributions
- **Agreement Matrix** -- Claims supported by multiple sources with citation evidence
- **Disagreement Matrix** -- Conflicting claims with analysis of why sources diverge
- **Methodology Differences** -- How the sources differ in approach, data, and evaluation
- **Synthesis** -- An overall assessment of which claims are well-supported and which remain contested

## When to use it

Use `/compare` when you encounter contradictory results in the literature, when evaluating competing approaches to the same problem, or when you need to understand how different research groups frame the same topic. It is also useful for writing related work sections where you need to accurately characterize the state of debate.

---
title: Deep Research
description: Run a thorough, multi-agent investigation that produces a cited research brief.
section: Workflows
order: 1
---

Deep research is the flagship Feynman workflow. It dispatches multiple researcher agents in parallel to search academic papers, web sources, and code repositories, then synthesizes everything into a structured research brief with inline citations.

## Usage

```
/deepresearch <topic>
```

## What it does

Deep research runs a thorough, source-heavy investigation. It plans the research scope, delegates to parallel researcher agents, synthesizes findings, and adds inline citations.

The workflow follows these steps:

1. **Plan** — Clarify the research question and identify search strategy
2. **Delegate** — Spawn parallel researcher agents to gather evidence from different source types (papers, web, repos)
3. **Synthesize** — Merge findings, resolve contradictions, identify gaps
4. **Cite** — Add inline citations and verify all source URLs
5. **Deliver** — Write a durable research brief to `outputs/`

## Example

From the REPL:

```
/deepresearch transformer scaling laws and their implications for compute-optimal training
/deepresearch What are the current approaches to mechanistic interpretability in LLMs?
```

From the CLI:

```bash
feynman deepresearch "What are the current approaches to mechanistic interpretability in LLMs?"
```

Both forms are equivalent. The workflow begins immediately and streams progress as agents discover and analyze sources.

## How it works

The deep research workflow proceeds through four phases. First, the researcher agents fan out to search AlphaXiv for relevant papers and the web for non-academic sources like blog posts, documentation, and code repositories. Each agent tackles a different angle of the topic to maximize coverage.

Second, the agents read and extract key findings from the most relevant sources. They pull claims, methodology details, results, and limitations from each paper or article. For academic papers, they access the full PDF through AlphaXiv when available.

Third, a synthesis step cross-references findings across sources, identifies areas of consensus and disagreement, and organizes the material into a coherent narrative. The writer agent structures the output as a research brief with sections for background, key findings, open questions, and references.

Finally, the verifier agent spot-checks claims against their cited sources to flag any misattributions or unsupported assertions. The finished report is saved to your session directory and can be previewed as rendered HTML with `/preview`.
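
The fan-out in the first phase is ordinary concurrent dispatch: one researcher per angle, merged when all return. A minimal sketch using a thread pool, with a stand-in `search` callable in place of a real agent:

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(topic, angles, search, max_workers=4):
    """Run one search per angle in parallel and collect results by angle."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(search, topic, angle): angle for angle in angles}
        # result() blocks until the worker finishes and re-raises its exceptions
        return {angle: future.result() for future, angle in futures.items()}
```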

## Output format

The research brief follows a consistent structure:

- **Summary** -- A concise overview of the topic and key takeaways
- **Background** -- Context and motivation for the research area
- **Key Findings** -- The main results organized by theme, with inline citations
- **Open Questions** -- Unresolved issues and promising research directions
- **References** -- Full citation list with links to source papers and articles

## Customization

You can steer the research by being specific in your prompt. Narrow topics produce more focused briefs. Broad topics produce survey-style overviews. You can also specify constraints like "focus on papers from 2024" or "only consider empirical results" to guide the agents.

---
title: Draft Writing
description: Generate a paper-style draft from research findings and session context.
section: Workflows
order: 7
---

The draft writing workflow generates structured academic-style documents from your research findings. It uses the writer agent to produce well-organized prose with proper citations, sections, and formatting suitable for papers, reports, or blog posts.

## Usage

```
/draft <topic>
```

## What it does

Produces a paper-style draft with structured sections. Writes to `papers/`.

## Structure

The generated draft includes:

- Title
- Abstract
- Introduction / Background
- Method or Approach
- Evidence and Analysis
- Limitations
- Conclusion
- Sources

## Example

From the REPL:

```
/draft A survey of retrieval-augmented generation techniques
/draft --from-session
```

From the CLI:

```bash
feynman draft "A survey of retrieval-augmented generation techniques"
```

The writer agent works only from supplied evidence — it never fabricates content. If evidence is insufficient, it explicitly notes the gaps.

When used with `--from-session`, the writer draws from the current session's research findings, making it a natural follow-up to a deep research or literature review workflow.

## How it works

The draft workflow leverages the writer agent, which specializes in producing structured academic prose. When given a topic, it first consults the researcher agents to gather source material, then organizes the findings into a coherent document with proper narrative flow.

When working from existing session context (after a deep research or literature review), the writer skips the research phase and works directly with the findings already gathered. This produces a more focused draft because the source material has already been vetted and organized.

The writer pays attention to academic conventions: claims are attributed to their sources with inline citations, methodology sections describe procedures precisely, and limitations are discussed honestly. The draft includes placeholder sections for any content the writer cannot generate from available sources, clearly marking what needs human input.

## Output format

The draft follows standard academic structure:

- **Abstract** -- Concise summary of the document's scope and findings
- **Introduction** -- Motivation, context, and contribution statement
- **Body Sections** -- Organized by topic with subsections as needed
- **Discussion** -- Interpretation of findings and implications
- **Limitations** -- Honest assessment of scope and gaps
- **References** -- Complete bibliography in a consistent citation format

## Preview and iteration

After generating the draft, use `/preview` to render it as HTML or PDF with proper formatting, math rendering, and typography. You can iterate on the draft by asking Feynman to revise specific sections, add more detail, or restructure the argument.

---
title: Literature Review
description: Run a structured literature review with consensus mapping and gap analysis.
section: Workflows
order: 2
---

The literature review workflow produces a structured survey of the academic landscape on a given topic. Unlike deep research, which aims for a comprehensive brief, the literature review focuses specifically on mapping the state of the field -- what researchers agree on, where they disagree, and what remains unexplored.

## Usage

```
/lit <topic>
```

## What it does

Runs a structured literature review that searches across academic papers and web sources. Explicitly separates consensus findings from disagreements and open questions.

## Example

From the REPL:

```
/lit multimodal reasoning benchmarks for large language models
/lit Scaling laws for language model performance
```

From the CLI:

```bash
feynman lit "Scaling laws for language model performance"
```

## How it works

The literature review workflow begins by having researcher agents search for papers on the topic across AlphaXiv and the web. The agents prioritize survey papers, highly cited foundational work, and recent publications to capture both established knowledge and the current frontier.

After gathering sources, the agents extract claims, results, and methodology from each paper. The synthesis step then organizes findings into a structured review that maps out where the community has reached consensus, where active debate exists, and where gaps in the literature remain.

The output is organized chronologically and thematically, showing how ideas evolved over time and how different research groups approach the problem differently. Citation counts and publication venues are used as signals for weighting claims, though the review explicitly notes when influential work contradicts the mainstream view.
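
The consensus/disagreement split is, at bottom, a counting exercise over extracted claims. A toy sketch with a hypothetical paper-to-claims mapping and an illustrative support threshold (the real synthesis also weights by citations and venue, as described above):

```python
def map_consensus(paper_claims, threshold=2):
    """paper_claims: dict of paper id -> list of claims it supports."""
    support = {}
    for paper, claims in paper_claims.items():
        for claim in claims:
            support.setdefault(claim, set()).add(paper)
    consensus = [c for c, papers in support.items() if len(papers) >= threshold]
    thin_evidence = [c for c, papers in support.items() if len(papers) < threshold]
    return consensus, thin_evidence
```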

## Output format

The literature review produces:

- **Scope and Methodology** -- What was searched and how papers were selected
- **Consensus** -- Claims that most papers agree on, with supporting citations
- **Disagreements** -- Active debates where papers present conflicting evidence or interpretations
- **Open Questions** -- Topics that the literature has not adequately addressed
- **Timeline** -- Key milestones and how the field evolved
- **References** -- Complete bibliography organized by relevance

## When to use it

Use `/lit` when you need a map of the research landscape rather than a deep dive into a specific question. It is particularly useful at the start of a new research project when you need to understand what has already been done, or when preparing a related work section for a paper.

---
title: Replication
description: Plan or execute a replication of a paper's experiments and claims.
section: Workflows
order: 5
---

The replication workflow helps you plan and execute reproductions of published experiments, benchmark results, or specific claims. It generates a detailed replication plan, identifies potential pitfalls, and can guide you through the execution step by step.

## Usage

```
/replicate <paper or claim>
```

## What it does

Extracts key implementation details from a paper, identifies what's needed to replicate the results, and asks where to run before executing anything.

Before running code, Feynman asks you to choose an execution environment:

- **Local** — run in the current working directory
- **Virtual environment** — create an isolated venv/conda env first
- **Docker** — run experiment code inside an isolated Docker container
- **Plan only** — produce the replication plan without executing

## Example

From the REPL:

```
/replicate "chain-of-thought prompting improves math reasoning"
/replicate arxiv:2401.12345
/replicate "The claim that sparse attention achieves 95% of dense attention quality at 60% compute"
```

From the CLI:

```bash
feynman replicate "paper or claim"
```

If an execution environment is selected, the workflow also produces runnable scripts and captured results.

You can point the workflow at a full paper for a comprehensive replication plan, or at a specific claim for a focused reproduction.

## How it works

The replication workflow starts with the researcher agent reading the target paper and extracting every detail needed for reproduction: model architecture, hyperparameters, training schedule, dataset preparation, evaluation protocol, and hardware requirements. It cross-references these details against the codebase (if available) using the same machinery as the code audit workflow.

Next, the workflow generates a structured replication plan that breaks the experiment into discrete steps, estimates compute and time requirements, and identifies where the paper is underspecified. For each underspecified detail, it suggests reasonable defaults based on common practices in the field and flags the assumption as a potential source of divergence.

The plan also includes a risk assessment: which parts of the experiment are most likely to cause replication failure, what tolerance to expect for numerical results, and which claims are most sensitive to implementation details.
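
The underspecified-detail check amounts to comparing what the paper reports against what a replication needs. A toy sketch with an illustrative required-field list (the real workflow's checklist is far richer and paper-specific):

```python
# Fields a replication typically needs; illustrative, not Feynman's checklist.
REQUIRED_DETAILS = [
    "architecture", "learning_rate", "batch_size",
    "random_seed", "dataset_split", "eval_protocol",
]

def underspecified(extracted):
    """Return the required details the paper does not report."""
    return [field for field in REQUIRED_DETAILS if field not in extracted]
```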

## Output format

The replication plan includes:

- **Requirements** -- Hardware, software, data, and estimated compute cost
- **Step-by-step Plan** -- Ordered steps from environment setup through final evaluation
- **Underspecified Details** -- Where the paper leaves out information needed for replication
- **Risk Assessment** -- Which steps are most likely to cause divergence from reported results
- **Success Criteria** -- What results would constitute a successful replication

## Iterative execution

After generating the plan, you can execute the replication interactively. Feynman walks you through each step, helps you write the code, monitors training runs, and compares intermediate results against the paper's reported values. When results diverge, it helps diagnose whether the cause is an implementation difference, a hyperparameter mismatch, or a genuine replication failure.

---
title: Peer Review
description: Simulate a rigorous peer review with severity-graded feedback.
section: Workflows
order: 3
---

The peer review workflow simulates a thorough academic peer review of a paper, draft, or research artifact. It produces severity-graded feedback with inline annotations, covering methodology, claims, writing quality, and reproducibility.

## Usage

```
/review <artifact>
```

## What it does

Simulates a tough-but-fair peer review for AI research artifacts. Evaluates novelty, empirical rigor, baselines, ablations, and reproducibility.

The reviewer agent identifies:

- Weak baselines
- Missing ablations
- Evaluation mismatches
- Benchmark leakage
- Under-specified implementation details

## Severity levels

Feedback is graded by severity:

- **FATAL** — Fundamental issues that invalidate the claims
- **MAJOR** — Significant problems that need addressing
- **MINOR** — Small improvements or clarifications

## Example

From the REPL:

```
/review outputs/scaling-laws-brief.md
/review arxiv:2401.12345
/review ~/papers/my-draft.pdf
```

From the CLI:

```bash
feynman review arxiv:2401.12345
feynman review my-draft.md
```

You can pass an arXiv ID, a URL, or a local file path. For arXiv papers, Feynman fetches the full PDF through AlphaXiv.

## How it works

The review workflow assigns the reviewer agent to read the document end-to-end and evaluate it against standard academic criteria. The reviewer examines the paper's claims, checks whether the methodology supports the conclusions, evaluates the experimental design for potential confounds, and assesses the clarity and completeness of the writing.

Each piece of feedback is assigned a severity level: **FATAL** (fundamental issues that undermine the artifact's validity), **MAJOR** (significant problems that should be addressed), or **MINOR** (smaller improvements, clarifications, and stylistic nits). This grading helps you triage feedback and focus on what matters most.

The reviewer also produces a summary assessment with an overall recommendation and a confidence score indicating how certain it is about each finding. When the reviewer identifies a claim that cannot be verified from the paper alone, it flags it as needing additional evidence.
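
Severity grading exists so feedback can be triaged mechanically. A toy sketch that sorts findings using the FATAL/MAJOR/MINOR scale (the tuple representation of a finding is illustrative, not the actual review data model):

```python
SEVERITY_ORDER = {"FATAL": 0, "MAJOR": 1, "MINOR": 2}

def triage(findings):
    """Sort (severity, comment) findings so the most serious come first."""
    return sorted(findings, key=lambda finding: SEVERITY_ORDER[finding[0]])
```

Because `sorted` is stable, findings at the same severity keep the order in which the reviewer raised them.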

## Output format

The review output includes:

- **Summary Assessment** -- Overall evaluation and recommendation
- **Strengths** -- What the paper does well
- **Fatal Issues** -- Fundamental problems that need to be addressed
- **Major Issues** -- Significant concerns with suggested fixes
- **Minor Issues** -- Smaller improvements and suggestions
- **Inline Annotations** -- Specific comments tied to sections of the document

## Customization

You can focus the review by specifying what to examine: "focus on the statistical methodology" or "check the claims in Section 4 against the experimental results." The reviewer adapts its analysis to your priorities while still performing a baseline check of the full document.

---
title: Watch
description: Set up recurring research monitoring on a topic.
section: Workflows
order: 9
---

The watch workflow sets up recurring research monitoring that periodically checks for new papers, articles, and developments on a topic you care about. It notifies you when something relevant appears and can automatically summarize new findings.

## Usage

```
/watch <topic>
```

## What it does

Schedules a recurring research watch. Sets a baseline of current knowledge and defines what constitutes a meaningful change worth reporting.

## Example

From the REPL:

```
/watch new papers on test-time compute scaling
/watch New developments in state space models for sequence modeling
```

From the CLI:

```bash
feynman watch "New developments in state space models for sequence modeling"
```

After setting up a watch, Feynman periodically runs searches on the topic and alerts you when it finds new relevant material.

## How it works

1. Feynman establishes a baseline by surveying current sources
2. Defines change signals (new papers, updated results, new repos)
3. Schedules periodic checks via `pi-schedule-prompt`
4. Reports only when meaningful changes are detected

The watch workflow is built on `pi-schedule-prompt`, which manages scheduled and recurring tasks. When you create a watch, Feynman stores the topic and search parameters, then runs a lightweight search at regular intervals (default: daily).

Each check searches AlphaXiv for new papers and the web for new articles matching your topic. Results are compared against what was found in previous checks to surface only genuinely new material. When new items are found, Feynman produces a brief summary of each and stores it in your session history.

The watch is smart about relevance. It does not just keyword-match -- it uses the same researcher agent that powers deep research to evaluate whether new papers are genuinely relevant to your topic or just superficially related. This keeps the signal-to-noise ratio high even for broad topics.
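
The "only genuinely new material" comparison reduces to set arithmetic over stable identifiers. A toy sketch, assuming each found item carries an `id` field and that the set of seen IDs persists between checks (both details are assumptions for illustration):

```python
def new_items(current, seen_ids):
    """Return items not seen in previous checks, plus the updated seen set."""
    fresh = [item for item in current if item["id"] not in seen_ids]
    return fresh, seen_ids | {item["id"] for item in current}
```

The relevance filtering described above happens on top of this: an unseen item still has to pass the researcher agent's relevance check before it is reported.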

## Managing watches

List active watches:

```
/jobs
```

The `/jobs` command shows all active watches along with their schedule, last check time, and number of new items found. You can pause, resume, or delete watches from within the REPL.

## Output format

Each watch check produces:

- **New Papers** -- Titles, authors, and one-paragraph summaries of newly discovered papers
- **New Articles** -- Relevant blog posts, documentation updates, or news articles
- **Relevance Notes** -- Why each item was flagged as relevant to your watch topic

## When to use it

Use `/watch` to stay current on a research area without manually searching every day. It is particularly useful for fast-moving fields where new papers appear frequently, for tracking specific research groups or topics related to your own work, and for monitoring the literature while you focus on other tasks.