Scans PDF folders and scores each document (0-100) based on:
- Text content (word count, coherence, OCR garbage detection)
- Font presence (scanned vs text-based)
- File size, page count, filename quality
- Language detection (Arabic/Russian/Turkish/English)
Labels: high (70+), medium (40-69), low (20-39), noise (<20)
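
A minimal sketch of the score-to-label mapping, using only the thresholds listed above. The function name `label_for_score` is hypothetical and the real scoring code may be structured differently.

```python
def label_for_score(score: float) -> str:
    """Map a 0-100 quality score to a label (hypothetical helper name)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0..100, got {score}")
    if score >= 70:
        return "high"
    if score >= 40:
        return "medium"
    if score >= 20:
        return "low"
    return "noise"
```
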
Outputs JSON and CSV. Noise documents can be moved to Arsiv/noise with --move.
Usage: --scan, --report, --export-csv, --move [--confirm]
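
One way the flags above could be wired up with argparse. This is a sketch under assumptions: every flag is treated as a boolean switch, and --move without --confirm is treated as a dry run. Neither is confirmed by this message, and `build_parser` and `is_dry_run` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Assumption: all flags are simple on/off switches.
    parser = argparse.ArgumentParser(description="Score PDFs and triage noise")
    parser.add_argument("--scan", action="store_true", help="scan and score PDFs")
    parser.add_argument("--report", action="store_true", help="print a summary")
    parser.add_argument("--export-csv", action="store_true", help="write CSV output")
    parser.add_argument("--move", action="store_true", help="move noise documents")
    parser.add_argument("--confirm", action="store_true", help="actually perform the move")
    return parser


def is_dry_run(args: argparse.Namespace) -> bool:
    # Assumption: --move only touches files when --confirm is also given.
    return args.move and not args.confirm
```
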
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
28-persona workspace setup with document upload, OCR pipeline, and vector embedding
assignment via the AnythingLLM API. Supports 5 clusters (intel, cyber, military,
humanities, engineering) with batch processing and resume capability.
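
A generic sketch of batch processing with resume, assuming a JSON checkpoint file that records finished items. It makes no AnythingLLM API calls: `handle` stands in for the upload/embed step, and `process_with_resume`, `state_path`, and `batch_size` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def process_with_resume(
    items: Iterable[str],
    handle: Callable[[str], None],
    state_path: Path,
    batch_size: int = 10,
) -> list[str]:
    """Process items in batches, skipping any recorded in the checkpoint."""
    done = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    pending = [item for item in items if item not in done]
    processed = []
    for start in range(0, len(pending), batch_size):
        for item in pending[start:start + batch_size]:
            handle(item)  # placeholder for the real per-document work
            done.add(item)
            processed.append(item)
        # Checkpoint after each completed batch; if handle() raises mid-batch,
        # that batch's items are retried on the next run.
        state_path.write_text(json.dumps(sorted(done)))
    return processed
```
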
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>