Update runtime checks and installer behavior

Advait Paliwal
2026-03-25 13:55:32 -07:00
parent 572de7ba85
commit f6dbacc9d5
24 changed files with 431 additions and 21 deletions


@@ -14,3 +14,39 @@ Use this file to track chronology, not release notes. Keep entries short, factua
- Failed / learned: ...
- Blockers: ...
- Next: ...
### 2026-03-25 00:00 local — scaling-laws
- Objective: Set up a deep research workflow for scaling laws.
- Changed: Created plan artifact at `outputs/.plans/scaling-laws.md`; defined 4 disjoint researcher dimensions and acceptance criteria.
- Verified: Read `CHANGELOG.md` and checked prior memory for related plan `scaling-laws-implications`.
- Failed / learned: No prior run-specific changelog entries existed beyond the template.
- Blockers: Waiting for user confirmation before launching researcher round 1.
- Next: On confirmation, spawn 4 parallel researcher subagents and begin evidence collection.
### 2026-03-25 00:30 local — scaling-laws (T4 inference/time-scale pass)
- Objective: Complete T4 on inference/test-time scaling and reasoning-time compute, scoped to 2023-2026.
- Changed: Wrote `notes/scaling-laws-research-inference.md`; updated `outputs/.plans/scaling-laws.md` to mark T4 done and log the inference-scaling verification pass.
- Verified: Cross-read 13 primary/official sources covering Tree-of-Thoughts, PRMs, repeated sampling, compute-optimal test-time scaling, provable laws, o1, DeepSeek-R1, s1, verifier failures, Anthropic extended thinking, and OpenAI reasoning API docs (a reference pass@k sketch follows this entry).
- Failed / learned: OpenAI blog fetch for `learning-to-reason-with-llms` returned malformed content, so the note leans on the o1 system card and API docs instead of that blog post.
- Blockers: T2 and T5 remain open before final synthesis; no single unified law for inference-time scaling emerged from public sources.
- Next: Complete T5 implications synthesis, then reconcile T3/T4 with foundational T2 before drafting the cited brief.
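The repeated-sampling results reviewed for T4 are usually summarized via coverage/pass@k. As a reference point only, and not excerpted from `notes/scaling-laws-research-inference.md`, here is a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021), assuming `n` sampled completions of which `c` pass the verifier:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions, drawn without replacement from n samples (c correct),
    is correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 4 correct answers out of 64 samples, evaluated at k = 8.
print(round(pass_at_k(64, 4, 8), 3))
```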
### 2026-03-25 11:20 local — scaling-laws (T6 draft synthesis)
- Objective: Synthesize the four research notes into a single user-facing draft brief for the scaling-laws workflow.
- Changed: Wrote `outputs/.drafts/scaling-laws-draft.md` with an executive summary, curated reading list, qualitative meta-analysis, core-paper comparison table, explicit training-vs-inference distinction, and numbered inline citations with direct-URL sources.
- Verified: Cross-checked the draft against `notes/scaling-laws-research-foundations.md`, `notes/scaling-laws-research-revisions.md`, `notes/scaling-laws-research-inference.md`, and `notes/scaling-laws-research-implications.md` to ensure the brief explicitly states the literature is too heterogeneous for a pooled effect-size estimate.
- Failed / learned: The requested temp-run `context.md` and `plan.md` were absent, so the synthesis used `outputs/.plans/scaling-laws.md` plus the four note files as the working context.
- Blockers: Citation/claim verification pass still pending; this draft should be treated as pre-verification.
- Next: Run verifier/reviewer passes, then promote the draft into the final cited brief and provenance sidecar.
### 2026-03-25 11:28 local — scaling-laws (final brief + pdf)
- Objective: Deliver a paper guide and qualitative meta-analysis on AI scaling laws.
- Changed: Finalized `outputs/scaling-laws.md` and sidecar `outputs/scaling-laws.provenance.md`; rendered preview PDF at `outputs/scaling-laws.pdf`; updated plan ledger and verification log in `outputs/.plans/scaling-laws.md`.
- Verified: Ran a reviewer pass recorded in `notes/scaling-laws-verification.md`; spot-checked key primary papers via alpha-backed reads for Kaplan 2020, Chinchilla 2022, and Snell 2024 (see the loss-form sketch after this entry); confirmed the rendered PDF exists.
- Failed / learned: A pooled statistical meta-analysis would be misleading because the literature mixes heterogeneous outcomes, scaling axes, and evaluation regimes; final deliverable uses a qualitative meta-analysis instead.
- Blockers: None for this brief.
- Next: If needed, extend into a narrower sub-survey (e.g. only pretraining laws, only inference-time scaling, or only post-Chinchilla data-quality revisions).
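For orientation on the Kaplan/Chinchilla spot-checks logged above, the commonly cited parametric forms are sketched below as they are usually stated in the literature; they are not quoted from `outputs/scaling-laws.md`.

```latex
% Kaplan et al. (2020): single-axis power laws in parameters N and data D
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}

% Hoffmann et al. (2022, "Chinchilla"): joint parametric loss fit
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```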