28 persona workspace with document upload, OCR pipeline, and vector embedding assignment via AnythingLLM API. Supports 5 clusters (intel, cyber, military, humanities, engineering) with batch processing and resume capability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1.8 KiB
AnythingLLM × Persona RAG Integration
A RAG system with 28 persona workspaces, fed from a book library. Each persona's workspace holds vector embeddings of documents from that persona's own area of expertise.
Architecture
- AnythingLLM Desktop: http://localhost:3001
- LLM: Ollama local (qwen3:14b)
- Embedding: Google Gemini (gemini-embedding-001)
- Vector DB: LanceDB
- OCR: ocrmypdf (tur+eng)
- Book source: /mnt/storage/Common/Books/
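As a rough illustration of how a script might talk to the AnythingLLM instance listed above, the sketch below builds an authenticated request without sending it. The base URL comes from this README; the endpoint path `/api/v1/workspaces`, the Bearer-token scheme, and the `ANYTHINGLLM_API_KEY` variable name are assumptions to be checked against your AnythingLLM version, not a description of what `setup.py` actually does.

```python
import os
import urllib.request

BASE_URL = "http://localhost:3001"  # AnythingLLM Desktop address from the list above


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "application/json",
        },
        method="GET",
    )


# ANYTHINGLLM_API_KEY is a hypothetical variable name; the path is an assumption.
req = build_request(
    "/api/v1/workspaces",
    os.environ.get("ANYTHINGLLM_API_KEY", "changeme"),
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a live server.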
Personas (5 Clusters)
| Cluster | Personas |
|---|---|
| Intel | Frodo, Echo, Ghost, Oracle, Wraith, Scribe, Polyglot |
| Cyber | Neo, Bastion, Sentinel, Specter, Phantom, Cipher, Vortex |
| Military | Marshal, Centurion, Corsair, Warden, Medic |
| Humanities | Chronos, Tribune, Arbiter, Ledger, Sage, Herald, Scholar, Gambit |
| Engineering | Forge, Architect |
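The table above can be mirrored as a plain mapping, which makes it easy to resolve a `--persona` value to its cluster. This is only a sketch: the persona names are copied from the table, while the lowercase cluster keys and the `cluster_of` helper are hypothetical and may not match how `config.yaml` stores them.

```python
from typing import Optional

# Cluster -> personas, copied from the table above.
CLUSTERS = {
    "intel": ["Frodo", "Echo", "Ghost", "Oracle", "Wraith", "Scribe", "Polyglot"],
    "cyber": ["Neo", "Bastion", "Sentinel", "Specter", "Phantom", "Cipher", "Vortex"],
    "military": ["Marshal", "Centurion", "Corsair", "Warden", "Medic"],
    "humanities": ["Chronos", "Tribune", "Arbiter", "Ledger", "Sage", "Herald",
                   "Scholar", "Gambit"],
    "engineering": ["Forge", "Architect"],
}


def cluster_of(persona: str) -> Optional[str]:
    """Return the cluster a persona belongs to (case-insensitive), or None."""
    wanted = persona.lower()
    for cluster, personas in CLUSTERS.items():
        if wanted in (name.lower() for name in personas):
            return cluster
    return None


print(cluster_of("neo"))
```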
Usage
# Status check
python3 setup.py --status
# Create / update workspaces
python3 setup.py --create-workspaces
# Full pipeline (upload + OCR + embed)
python3 setup.py --upload-documents --resume
# Single cluster or persona
python3 setup.py --upload-documents --cluster cyber --resume
python3 setup.py --upload-documents --persona neo --priority 1 --resume
# Preview
python3 setup.py --upload-documents --dry-run
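For readers who want to see how the flags above fit together, here is a minimal `argparse` sketch. The flag names are taken from the commands in this section; the help texts, types, and defaults are assumptions, and the real `setup.py` may parse its arguments differently. In particular, the exact meaning of `--priority` is not stated in this README, so it is modelled here as a plain integer.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="setup.py")
    parser.add_argument("--status", action="store_true", help="show current status")
    parser.add_argument("--create-workspaces", action="store_true",
                        help="create or update workspaces")
    parser.add_argument("--upload-documents", action="store_true",
                        help="run the upload + OCR + embed pipeline")
    parser.add_argument("--resume", action="store_true",
                        help="continue from saved progress")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview without making changes")
    parser.add_argument("--cluster", help="limit the run to one cluster")
    parser.add_argument("--persona", help="limit the run to one persona")
    parser.add_argument("--priority", type=int,
                        help="priority value (semantics assumed)")
    return parser


args = build_parser().parse_args(
    ["--upload-documents", "--persona", "neo", "--priority", "1", "--resume"]
)
print(args.persona, args.priority, args.resume)
```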
Pipeline
Phase A: Upload text files
Phase B: OCR the scanned PDFs (ocrmypdf)
Phase C: Upload the OCR'd files
Final: Assign/embed into the workspaces
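The phase ordering above can be sketched as a small planning function that only decides what would happen, without uploading or running OCR. The `ocrmypdf -l tur+eng` invocation matches the languages listed under the architecture; everything else (the `plan_phases` name, the caller-supplied `is_scanned` check, and the `ocr/` output directory) is hypothetical and not taken from `setup.py`.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def ocr_command(src: Path, dst: Path) -> List[str]:
    """Command line for one OCR job, using Turkish + English language packs."""
    return ["ocrmypdf", "-l", "tur+eng", str(src), str(dst)]


def plan_phases(files: Iterable[Path],
                is_scanned: Callable[[Path], bool],
                ocr_dir: Path) -> Dict[str, list]:
    """Split files into the phases described above (planning only)."""
    files = list(files)
    scanned = [f for f in files if is_scanned(f)]
    outputs = [ocr_dir / f.name for f in scanned]  # hypothetical output layout
    return {
        "A": [f for f in files if not is_scanned(f)],                   # upload as-is
        "B": [ocr_command(s, d) for s, d in zip(scanned, outputs)],     # OCR jobs
        "C": outputs,                                                   # upload after OCR
    }


# Toy detection rule for the demo; real scanned-PDF detection is not specified here.
plan = plan_phases(
    [Path("notes.txt"), Path("scan_manual.pdf")],
    is_scanned=lambda p: p.name.startswith("scan_"),
    ocr_dir=Path("ocr"),
)
print(plan["B"])
```

The final assign/embed step would then run over everything in phases A and C once the uploads have finished.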
Recovery
If the vector DB is deleted:
1. In upload_progress.json, reset workspace_docs to {}
2. Run python3 setup.py --upload-documents --resume (this only re-embeds)
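The reset of `workspace_docs` in the recovery procedure above can be done by hand or with a few lines of Python. The sketch below assumes `upload_progress.json` is a JSON object with a top-level `workspace_docs` key, as the recovery procedure implies; the `uploaded` key in the demo data is purely hypothetical and only shows that other state is left untouched.

```python
import json
import os
import tempfile
from pathlib import Path


def reset_workspace_docs(progress_path: Path) -> dict:
    """Set workspace_docs to {} and keep every other key unchanged."""
    state = json.loads(progress_path.read_text(encoding="utf-8"))
    state["workspace_docs"] = {}
    # Write to a temp file first, then swap it in, so a crash cannot truncate the tracker.
    fd, tmp_name = tempfile.mkstemp(dir=str(progress_path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2, ensure_ascii=False)
    os.replace(tmp_name, progress_path)
    return state


# Demo against a throwaway copy; point the function at the real file to use it.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo = Path(tmp_dir) / "upload_progress.json"
    demo.write_text(json.dumps({
        "workspace_docs": {"neo": ["book.pdf"]},
        "uploaded": {"book.pdf": True},  # hypothetical key
    }), encoding="utf-8")
    result = reset_workspace_docs(demo)
    print(result["workspace_docs"], sorted(result))
```

After the reset, `python3 setup.py --upload-documents --resume` should skip the uploads that are already recorded and only redo the embedding step, as described above.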
Files
- setup.py: Main integration script (upload, OCR, workspace assignment)
- config.yaml: Persona-to-folder mappings, API config, batch settings
- upload_progress.json: Upload/assignment state tracker (gitignored)