3 Commits

Author SHA1 Message Date
salvacybersec
803e8be284 Add quality_analyzer.py — PDF quality scoring for FOIA/CIA filtering
Scans PDF folders and scores each document (0-100) based on:
- Text content (word count, coherence, OCR garbage detection)
- Font presence (scanned vs text-based)
- File size, page count, filename quality
- Language detection (Arabic/Russian/Turkish/English)

Labels: high (70+), medium (40-69), low (20-39), noise (<20)
Outputs JSON + CSV. Can move noise to Arsiv/noise with --move.

Usage: --scan, --report, --export-csv, --move [--confirm]
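
The label thresholds listed above can be sketched as a small mapping function. This is a minimal illustration, not the code in quality_analyzer.py: the function name `label_for` and the range check are assumptions, and only the cut-offs (70, 40, 20) come from the commit message.

```python
def label_for(score: float) -> str:
    """Map a 0-100 quality score to a label (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 70:
        return "high"
    if score >= 40:
        return "medium"
    if score >= 20:
        return "low"
    return "noise"  # anything below 20


if __name__ == "__main__":
    for s in (85, 55, 25, 5):
        print(s, label_for(s))
```

Documents labelled "noise" would be the candidates for the --move step described above.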

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:00:34 +03:00
salvacybersec
1028d11507 Add structured logging + log panel to monitor
- setup.py: logging module with file (setup.log) + console output
  - Line-buffered output (fixes background execution buffering)
  - API calls with timeout (300s), retry (3x), debug logging
  - Per-batch progress: [1/29] persona batch 1/20 (20 docs)
  - --verbose flag for debug-level console
- monitor.py: log tail in CLI + web dashboard
  - CLI: colorized last 15 log lines
  - Web: scrollable log panel with level-based colors
- Smaller embed batches (20 instead of 50) for reliability
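
The batching, retry, and progress format described above might look roughly like the following. This is a sketch under stated assumptions, not the code in setup.py: `batches`, `call_with_retry`, and `progress_line` are hypothetical names, and the 300s timeout is not modelled here because it would be passed to whatever HTTP client makes the API call.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: Sequence[T], size: int = 20) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items (20 per the commit)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_with_retry(fn: Callable[[], R], attempts: int = 3, delay: float = 0.0) -> R:
    """Call `fn` up to `attempts` times, re-raising the last error on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
    raise AssertionError("unreachable")


def progress_line(idx: int, total: int, name: str,
                  batch_no: int, batch_total: int, n_docs: int) -> str:
    """Format a per-batch progress line in the style shown in the commit."""
    return f"[{idx}/{total}] {name} batch {batch_no}/{batch_total} ({n_docs} docs)"


if __name__ == "__main__":
    docs = [f"doc{i}.pdf" for i in range(45)]  # placeholder file names
    chunks = list(batches(docs))
    for n, chunk in enumerate(chunks, start=1):
        print(progress_line(1, 29, "persona", n, len(chunks), len(chunk)))
```

Smaller batches mean a failed request costs at most 20 documents of rework when it is retried.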

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 00:30:29 +03:00
salvacybersec
9e9b75e0b3 Initial commit: AnythingLLM persona RAG integration
28 persona workspaces with document upload, OCR pipeline, and vector embedding
assignment via AnythingLLM API. Supports 5 clusters (intel, cyber, military,
humanities, engineering) with batch processing and resume capability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:07:44 +03:00