salvacybersec 9e9b75e0b3 Initial commit: AnythingLLM persona RAG integration
28 persona workspace with document upload, OCR pipeline, and vector embedding
assignment via AnythingLLM API. Supports 5 clusters (intel, cyber, military,
humanities, engineering) with batch processing and resume capability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:07:44 +03:00

AnythingLLM × Persona RAG Integration

A RAG system with 28 persona workspaces, fed from a book library. Each persona's workspace holds vector embeddings of the documents from its own area of expertise.

Architecture

  • AnythingLLM Desktop: http://localhost:3001
  • LLM: Ollama local (qwen3:14b)
  • Embedding: Google Gemini (gemini-embedding-001)
  • Vector DB: LanceDB
  • OCR: ocrmypdf (tur+eng)
  • Book source: /mnt/storage/Common/Books/

Personas (5 Clusters)

Cluster       Personas
Intel         Frodo, Echo, Ghost, Oracle, Wraith, Scribe, Polyglot
Cyber         Neo, Bastion, Sentinel, Specter, Phantom, Cipher, Vortex
Military      Marshal, Centurion, Corsair, Warden, Medic
Humanities    Chronos, Tribune, Arbiter, Ledger, Sage, Herald, Scholar, Gambit
Engineering   Forge, Architect
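
The cluster table can be expressed as a small lookup structure. The following is a minimal sketch, not the project's actual code: the names CLUSTERS, cluster_of, and personas_in are hypothetical, and the lowercase cluster keys are assumed from the `--cluster cyber` usage example.

```python
# Cluster-to-persona mapping copied from the table above.
# The dict name and lowercase keys are illustrative assumptions.
CLUSTERS: dict[str, list[str]] = {
    "intel": ["Frodo", "Echo", "Ghost", "Oracle", "Wraith", "Scribe", "Polyglot"],
    "cyber": ["Neo", "Bastion", "Sentinel", "Specter", "Phantom", "Cipher", "Vortex"],
    "military": ["Marshal", "Centurion", "Corsair", "Warden", "Medic"],
    "humanities": ["Chronos", "Tribune", "Arbiter", "Ledger", "Sage", "Herald",
                   "Scholar", "Gambit"],
    "engineering": ["Forge", "Architect"],
}


def cluster_of(persona: str) -> str:
    """Return the cluster a persona belongs to (case-insensitive match)."""
    wanted = persona.lower()
    for cluster, personas in CLUSTERS.items():
        if any(p.lower() == wanted for p in personas):
            return cluster
    raise KeyError(f"unknown persona: {persona}")


def personas_in(cluster: str) -> list[str]:
    """Return a copy of the persona list for a cluster."""
    return list(CLUSTERS[cluster.lower()])
```

A lookup like this would let a `--persona neo` filter resolve to its cluster without scanning the configuration by hand.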

Usage

# Status check
python3 setup.py --status

# Create / update workspaces
python3 setup.py --create-workspaces

# Full pipeline (upload + OCR + embed)
python3 setup.py --upload-documents --resume

# Single cluster or persona
python3 setup.py --upload-documents --cluster cyber --resume
python3 setup.py --upload-documents --persona neo --priority 1 --resume

# Preview
python3 setup.py --upload-documents --dry-run
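
For orientation, the flags used in the commands above could be declared with argparse roughly as follows. This is a sketch under assumptions, not the contents of setup.py: the function name build_parser, the help strings, and the integer type of --priority are guesses based only on the usage examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags that appear in the usage examples (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="setup.py")
    # Top-level actions.
    parser.add_argument("--status", action="store_true", help="show current state")
    parser.add_argument("--create-workspaces", action="store_true",
                        help="create or update persona workspaces")
    parser.add_argument("--upload-documents", action="store_true",
                        help="run the upload + OCR + embed pipeline")
    # Filters and modifiers for --upload-documents.
    parser.add_argument("--cluster", help="limit the run to one cluster, e.g. cyber")
    parser.add_argument("--persona", help="limit the run to one persona, e.g. neo")
    parser.add_argument("--priority", type=int,
                        help="priority level (integer type is an assumption)")
    parser.add_argument("--resume", action="store_true",
                        help="continue from upload_progress.json")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview without uploading")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that argparse converts `--dry-run` to the attribute `args.dry_run` and `--upload-documents` to `args.upload_documents`.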

Pipeline

Phase A: Upload text files
Phase B: OCR scanned PDFs (ocrmypdf)
Phase C: Upload the OCR'd files
Final:   Assign/embed into workspaces
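
The ordering of the phases can be sketched as a planning function. This is an illustration under assumptions rather than the logic in setup.py: plan_steps, ocr_command, and the ocr_dir output folder are hypothetical names, and the set of scanned PDFs is passed in because the README does not say how scanned files are detected. The `-l tur+eng` argument matches the OCR languages listed under the architecture.

```python
from pathlib import Path
from typing import Optional


def ocr_command(src: Path, dst: Path, languages: str = "tur+eng") -> list[str]:
    """Build an ocrmypdf command line; -l selects the OCR languages."""
    return ["ocrmypdf", "-l", languages, str(src), str(dst)]


def plan_steps(files: list[Path], scanned: set[Path],
               ocr_dir: Path) -> list[tuple[str, Optional[Path]]]:
    """Return the pipeline steps in phase order without executing them."""
    steps: list[tuple[str, Optional[Path]]] = []
    # Phase A: files that already contain text are uploaded directly.
    for f in files:
        if f not in scanned:
            steps.append(("upload", f))
    # Phase B: scanned PDFs go through OCR first.
    ocr_outputs = []
    for f in files:
        if f in scanned:
            steps.append(("ocr", f))
            ocr_outputs.append(ocr_dir / f.name)
    # Phase C: the OCR outputs are uploaded.
    for out in ocr_outputs:
        steps.append(("upload", out))
    # Final: assign uploaded documents to workspaces and embed them.
    steps.append(("embed", None))
    return steps
```

Planning the steps separately from running them is one way to support a preview mode such as `--dry-run`: the plan can be printed without invoking ocrmypdf or the API.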

Recovery

If the vector DB is deleted:

  1. Reset workspace_docs{} in upload_progress.json
  2. Run python3 setup.py --upload-documents --resume (this only re-embeds)
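
Step 1 can be done by hand or with a short script. The sketch below assumes that workspace_docs is a top-level key in upload_progress.json holding a JSON object; the real file layout is not shown in this README, so check it before running anything like this. The function name reset_workspace_docs is hypothetical.

```python
import json
from pathlib import Path


def reset_workspace_docs(progress_path: Path) -> dict:
    """Empty the workspace_docs mapping while keeping all other state intact."""
    state = json.loads(progress_path.read_text(encoding="utf-8"))
    # Assumption: workspace_docs is a top-level object in the progress file.
    state["workspace_docs"] = {}
    progress_path.write_text(
        json.dumps(state, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return state


if __name__ == "__main__":
    reset_workspace_docs(Path("upload_progress.json"))
```

Because the other keys are left untouched, a following `--resume` run would presumably skip the upload and OCR work and redo only the workspace assignment, as step 2 describes.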

Files

  • setup.py: Main integration script (upload, OCR, workspace assignment)
  • config.yaml: Persona-to-folder mappings, API config, batch settings
  • upload_progress.json: Upload/assignment state tracker (gitignored)