- Upload progress bar with percentage, file count, speed, ETA
- Detects active upload from upload_*.log files automatically
- Last 10 upload lines shown with ✓/✗ color coding
- Combined log panel shows both setup.log and upload logs
- Upload folder distribution in API response
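The active-upload detection above could look roughly like this: pick the most recently modified `upload_*.log` as the live upload. This is a sketch under assumed names; the real monitor may use a different heuristic.

```python
import glob
import os

def latest_upload_log(log_dir="."):
    """Return the most recently modified upload_*.log, or None if no
    upload is in progress (illustrative helper, not the real code)."""
    logs = glob.glob(os.path.join(log_dir, "upload_*.log"))
    return max(logs, key=os.path.getmtime) if logs else None
```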
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scans PDF folders and scores each document (0-100) based on:
- Text content (word count, coherence, OCR garbage detection)
- Font presence (scanned vs text-based)
- File size, page count, filename quality
- Language detection (Arabic/Russian/Turkish/English)
Labels: high (70+), medium (40-69), low (20-39), noise (<20)
Outputs JSON + CSV. Can move noise to Arsiv/noise with --move.
Usage: --scan, --report, --export-csv, --move [--confirm]
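A toy version of the scoring described above, mapping signals to the 0-100 scale and the four labels. The weights and signal names here are invented for illustration; the real scorer's features and thresholds beyond the label cutoffs are in the tool.

```python
def score_pdf(word_count, has_fonts, page_count, garbage_ratio):
    """Score a document 0-100 and label it. Weights are illustrative."""
    score = 0
    score += min(40, word_count // 25)   # text content signal
    score += 25 if has_fonts else 0      # text-based vs scanned
    score += min(15, page_count)         # length signal
    score -= int(30 * garbage_ratio)     # OCR garbage penalty
    score = max(0, min(100, score))
    if score >= 70:
        label = "high"
    elif score >= 40:
        label = "medium"
    elif score >= 20:
        label = "low"
    else:
        label = "noise"
    return score, label
```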

Speed profiles control timeout, retries, batch size, and delays:
fast: 30s timeout, 7 retries, batch 10, 1s delay (~5x faster)
medium: 60s timeout, 5 retries, batch 5, 2s delay (default)
slow: 300s timeout, 3 retries, batch 5, 5s delay (safe)
Analysis showed 54% of batches hit 300s timeout on Olla bad routes,
wasting 7.7h on 155 batches. Fast mode reduces timeout waste from
300s to 30s per bad route — real embeds take ~18s on average.
Also reduced default batch delay from 5s to 2s in config.yaml.
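The three profiles could be encoded as a table like the following (values taken from the lines above; the real settings live in config.yaml and the key names here are assumptions):

```python
# Illustrative encoding of the speed profiles; actual config.yaml
# key names may differ.
SPEED_PROFILES = {
    "fast":   {"timeout": 30,  "retries": 7, "batch_size": 10, "delay": 1},
    "medium": {"timeout": 60,  "retries": 5, "batch_size": 5,  "delay": 2},
    "slow":   {"timeout": 300, "retries": 3, "batch_size": 5,  "delay": 5},
}
DEFAULT_BATCH_DELAY = 2  # reduced from 5s
```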
Parses batch timestamps from setup.log, averages last 20 batches,
calculates remaining time. Shows ETA, docs remaining, and avg
seconds per batch in both web summary cards and CLI header.
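The ETA math above amounts to averaging the last 20 inter-batch gaps and projecting over the remaining batches. A minimal sketch, with function and parameter names assumed:

```python
from datetime import datetime

def estimate_eta(batch_timestamps, docs_remaining, batch_size=5, window=20):
    """Average the last `window` inter-batch gaps (seconds) and project
    remaining time. Returns (avg_sec_per_batch, eta_seconds)."""
    recent = batch_timestamps[-(window + 1):]
    gaps = [(b - a).total_seconds() for a, b in zip(recent, recent[1:])]
    avg = sum(gaps) / len(gaps)
    batches_left = -(-docs_remaining // batch_size)  # ceiling division
    return avg, avg * batches_left
```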
Every API call:
- 5 retries with progressive backoff (Olla routes to random instances)
- Body error detection (API 200 but embed error in response)
Per persona verification:
- First batch: LanceDB must physically grow + query must return sources
- Every 10th batch: LanceDB growth check
- Final: triple check (LanceDB size + workspace doc count API + search query)
- Abort on model-not-found errors, skip after 5 consecutive failures
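The per-call guard above (retries with backoff, plus catching an embed error hidden inside a 200 response) could be sketched as follows. `post` is a stand-in for the real API call, and the error-body shape is an assumption:

```python
import time

def embed_with_retry(post, payload, retries=5, base_delay=2):
    """Call `post(payload)` with progressive backoff. A 200 response
    whose body reports an embed error is treated as a failure too."""
    for attempt in range(1, retries + 1):
        try:
            body = post(payload)
            # API can answer 200 yet report an embedding failure in the body
            if body.get("error"):
                raise RuntimeError(f"embed error in 200 body: {body}")
            return body
        except Exception:
            if attempt == retries:
                raise
            time.sleep(base_delay * attempt)  # backoff: 2s, 4s, 6s, ...
```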
- Pre-flight: test embedding model with 3 retries (120s timeout for cold start)
- First-batch verify: after batch 1, query workspace to confirm vectors searchable
- Abort on model errors: "not found" or "failed to embed" stops immediately
- Consecutive failure guard: 3 fails in a row → skip persona, continue others
- Response error check: API 200 but embed error in body → caught and logged
- Never record progress for failed embeds
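The consecutive-failure guard and the "never record failed embeds" rule could be combined like this (a sketch with invented names; the real batch loop has more state):

```python
def run_batches(batches, embed, max_consecutive=3):
    """Run embed() per batch. Three failures in a row skips this
    persona so others can continue; a success resets the counter.
    Only successful batches are recorded as progress."""
    fails, done = 0, []
    for batch in batches:
        try:
            embed(batch)
        except Exception:
            fails += 1
            if fails >= max_consecutive:
                return done, "skipped"
            continue
        fails = 0
        done.append(batch)  # progress recorded only on success
    return done, "ok"
```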
- Fetch real workspace slugs from AnythingLLM API instead of guessing
- Show KB instead of 0MB for small LanceDB/vector sizes
- Fixes incorrect vector detection after embedding engine change
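The KB-vs-0MB fix amounts to choosing the display unit by magnitude. A hypothetical helper mirroring the change:

```python
def fmt_size(nbytes):
    """Format a byte count, showing KB for sub-megabyte stores instead
    of rounding them down to 0MB (illustrative, not the real helper)."""
    if nbytes < 1024 * 1024:
        return f"{nbytes / 1024:.1f}KB"
    return f"{nbytes / (1024 * 1024):.1f}MB"
```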
- Reduce embed batch to 5 — AnythingLLM hangs on batches >10
- Fix check_script_running() to properly detect setup.py process
(was returning false because pgrep matched monitor.py too)
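The detection fix above comes down to matching the full command line precisely enough that `monitor.py` (whose arguments mention setup.log) no longer matches. A sketch of the matching logic, with the regex and names as assumptions:

```python
import re

def is_setup_running(cmdlines, script="setup.py"):
    """True if any command line is actually executing `script`, not
    merely mentioning it (e.g. `python3 monitor.py --log setup.log`)."""
    pat = re.compile(rf"(^|/| )python[0-9.]*\s+(\S*/)?{re.escape(script)}(\s|$)")
    return any(pat.search(c) for c in cmdlines)
```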
Sync README, skill, memory, and Obsidian note with current state:
- 29 persona workspaces across 5 clusters
- 88 mapped paths covering 39,754 files (67 GB)
- New --reassign --reset mode for fast vector recovery
- Expanded skip_extensions list
- Gitea repo reference added
Skips the slow folder scan (50K+ files) and upload phases — directly
re-embeds already-uploaded documents to workspaces using progress state.
Use with --reset to clear assignment tracking first.
28 persona workspaces with document upload, OCR pipeline, and vector embedding
assignment via AnythingLLM API. Supports 5 clusters (intel, cyber, military,
humanities, engineering) with batch processing and resume capability.