anything-llm-rag

salvacybersec/anything-llm-rag

Fork 0

Commit Graph

Author	SHA1	Message	Date
salvacybersec	803e8be284	Add quality_analyzer.py — PDF quality scoring for FOIA/CIA filtering Scans PDF folders and scores each document (0-100) based on: - Text content (word count, coherence, OCR garbage detection) - Font presence (scanned vs text-based) - File size, page count, filename quality - Language detection (Arabic/Russian/Turkish/English) Labels: high (70+), medium (40-69), low (20-39), noise (<20) Outputs JSON + CSV. Can move noise to Arsiv/noise with --move. Usage: --scan, --report, --export-csv, --move [--confirm] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 23:00:34 +03:00
salvacybersec	1028d11507	Add structured logging + log panel to monitor - setup.py: logging module with file (setup.log) + console output - Line-buffered output (fixes background execution buffering) - API calls with timeout (300s), retry (3x), debug logging - Per-batch progress: [1/29] persona batch 1/20 (20 docs) - --verbose flag for debug-level console - monitor.py: log tail in CLI + web dashboard - CLI: colorized last 15 log lines - Web: scrollable log panel with level-based colors - Smaller embed batches (20 instead of 50) for reliability Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 00:30:29 +03:00
salvacybersec	9e9b75e0b3	Initial commit: AnythingLLM persona RAG integration 28 persona workspace with document upload, OCR pipeline, and vector embedding assignment via AnythingLLM API. Supports 5 clusters (intel, cyber, military, humanities, engineering) with batch processing and resume capability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 23:07:44 +03:00

Author

SHA1

Message

Date

salvacybersec

803e8be284

Add quality_analyzer.py — PDF quality scoring for FOIA/CIA filtering

Scans PDF folders and scores each document (0-100) based on:
- Text content (word count, coherence, OCR garbage detection)
- Font presence (scanned vs text-based)
- File size, page count, filename quality
- Language detection (Arabic/Russian/Turkish/English)

Labels: high (70+), medium (40-69), low (20-39), noise (<20)
Outputs JSON + CSV. Can move noise to Arsiv/noise with --move.

Usage: --scan, --report, --export-csv, --move [--confirm]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-07 23:00:34 +03:00

salvacybersec

1028d11507

Add structured logging + log panel to monitor

- setup.py: logging module with file (setup.log) + console output
  - Line-buffered output (fixes background execution buffering)
  - API calls with timeout (300s), retry (3x), debug logging
  - Per-batch progress: [1/29] persona batch 1/20 (20 docs)
  - --verbose flag for debug-level console
- monitor.py: log tail in CLI + web dashboard
  - CLI: colorized last 15 log lines
  - Web: scrollable log panel with level-based colors
- Smaller embed batches (20 instead of 50) for reliability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-07 00:30:29 +03:00

salvacybersec

9e9b75e0b3

Initial commit: AnythingLLM persona RAG integration

28 persona workspace with document upload, OCR pipeline, and vector embedding
assignment via AnythingLLM API. Supports 5 clusters (intel, cyber, military,
humanities, engineering) with batch processing and resume capability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-06 23:07:44 +03:00

3 Commits