anything-llm-rag/README.md
salvacybersec 9e9b75e0b3 Initial commit: AnythingLLM persona RAG integration
28 persona workspace with document upload, OCR pipeline, and vector embedding
assignment via AnythingLLM API. Supports 5 clusters (intel, cyber, military,
humanities, engineering) with batch processing and resume capability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:07:44 +03:00

# AnythingLLM × Persona RAG Integration
A RAG system with 28 persona workspaces, fed from a book library. Each persona's workspace holds vector embeddings of the documents in its own area of expertise.
## Architecture
- **AnythingLLM Desktop** — `http://localhost:3001`
- **LLM:** Ollama local (qwen3:14b)
- **Embedding:** Google Gemini (gemini-embedding-001)
- **Vector DB:** LanceDB
- **OCR:** ocrmypdf (tur+eng)
- **Book source:** `/mnt/storage/Common/Books/`
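
As a rough illustration of how a script might talk to the AnythingLLM instance listed above, the sketch below builds (but does not send) a request that lists workspaces. The `/api/v1/workspaces` path and Bearer-token auth are assumptions based on the AnythingLLM developer API as commonly documented; check them against your instance's API docs. `build_workspaces_request` and the `api_key` value are hypothetical names, not taken from `setup.py`.

```python
import urllib.request

BASE_URL = "http://localhost:3001"  # AnythingLLM Desktop address from the list above


def build_workspaces_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a GET request that lists workspaces.

    Assumption: the developer API lives under /api/v1 and uses Bearer auth.
    """
    url = base_url.rstrip("/") + "/api/v1/workspaces"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate makes the URL and headers easy to inspect first.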
## Personas (5 Clusters)
| Cluster | Personas |
|---------|-----------|
| Intel | Frodo, Echo, Ghost, Oracle, Wraith, Scribe, Polyglot |
| Cyber | Neo, Bastion, Sentinel, Specter, Phantom, Cipher, Vortex |
| Military | Marshal, Centurion, Corsair, Warden, Medic |
| Humanities | Chronos, Tribune, Arbiter, Ledger, Sage, Herald, Scholar, Gambit |
| Engineering | Forge, Architect |
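
The table above can be held in memory as a plain mapping, which makes it easy to resolve a persona name to its cluster. This is only a sketch: the dictionary layout and the `cluster_of` helper are hypothetical, and the real `config.yaml` schema is not shown in this README.

```python
# Cluster -> personas, copied from the table above.
CLUSTERS = {
    "intel": ["Frodo", "Echo", "Ghost", "Oracle", "Wraith", "Scribe", "Polyglot"],
    "cyber": ["Neo", "Bastion", "Sentinel", "Specter", "Phantom", "Cipher", "Vortex"],
    "military": ["Marshal", "Centurion", "Corsair", "Warden", "Medic"],
    "humanities": ["Chronos", "Tribune", "Arbiter", "Ledger", "Sage", "Herald",
                   "Scholar", "Gambit"],
    "engineering": ["Forge", "Architect"],
}


def cluster_of(persona: str) -> str:
    """Return the cluster a persona belongs to (case-insensitive lookup)."""
    wanted = persona.strip().lower()
    for cluster, members in CLUSTERS.items():
        if wanted in (name.lower() for name in members):
            return cluster
    raise KeyError(f"unknown persona: {persona!r}")
```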
## Usage
```bash
# Status check
python3 setup.py --status
# Create / update workspaces
python3 setup.py --create-workspaces
# Full pipeline (upload + OCR + embed)
python3 setup.py --upload-documents --resume
# Single cluster or persona
python3 setup.py --upload-documents --cluster cyber --resume
python3 setup.py --upload-documents --persona neo --priority 1 --resume
# Preview (dry run)
python3 setup.py --upload-documents --dry-run
```
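
For readers who want to see how these flags fit together, here is a minimal `argparse` sketch that accepts the same command lines. It is not the actual parser in `setup.py`: the flag names come from the examples above, while the value types (for instance `--priority` as an integer) and the restriction of `--cluster` to the five cluster names are assumptions.

```python
import argparse

CLUSTER_NAMES = ["intel", "cyber", "military", "humanities", "engineering"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="setup.py")
    parser.add_argument("--status", action="store_true", help="show current state")
    parser.add_argument("--create-workspaces", action="store_true",
                        help="create or update persona workspaces")
    parser.add_argument("--upload-documents", action="store_true",
                        help="run the upload + OCR + embed pipeline")
    parser.add_argument("--resume", action="store_true",
                        help="continue from the saved progress file")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview without changing anything")
    parser.add_argument("--cluster", choices=CLUSTER_NAMES,
                        help="limit the run to one cluster")
    parser.add_argument("--persona", help="limit the run to one persona")
    parser.add_argument("--priority", type=int,
                        help="priority filter (assumed to be an integer)")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(
    ["--upload-documents", "--persona", "neo", "--priority", "1", "--resume"]
)
```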
## Pipeline
```
Phase A: Upload text files
Phase B: OCR scanned PDFs (ocrmypdf)
Phase C: Upload OCR'd files
Final:   Assign/embed into workspaces
```
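
The phase split above could be sketched roughly as follows. How `setup.py` actually decides that a PDF is scanned is not described here, so the sketch takes that decision as a caller-supplied predicate. `plan_phases` and `ocr_command` are hypothetical helper names. The `ocrmypdf -l tur+eng input output` form follows the OCRmyPDF command line and the language setting from the Architecture section.

```python
from pathlib import Path
from typing import Callable, Iterable


def ocr_command(src: Path, dst: Path, languages: str = "tur+eng") -> list[str]:
    """Build the ocrmypdf command line for one scanned PDF (Phase B)."""
    return ["ocrmypdf", "-l", languages, str(src), str(dst)]


def plan_phases(
    files: Iterable[Path],
    is_scanned: Callable[[Path], bool],
) -> tuple[list[Path], list[Path]]:
    """Split files into Phase A (upload as-is) and Phase B (OCR first).

    `is_scanned` is supplied by the caller; the detection rule is an assumption.
    """
    phase_a: list[Path] = []
    phase_b: list[Path] = []
    for path in files:
        (phase_b if is_scanned(path) else phase_a).append(path)
    return phase_a, phase_b
```

Each command from `ocr_command` could be run with `subprocess.run(cmd, check=True)`; the resulting files would then be uploaded in Phase C before the final assign/embed step.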
## Recovery
If the vector DB is deleted:
1. In `upload_progress.json`, reset `workspace_docs` to `{}`
2. Run `python3 setup.py --upload-documents --resume` (performs only the re-embed step)
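
Step 1 can be done by hand in an editor, or with a few lines of Python such as the sketch below. It assumes only that `upload_progress.json` is a JSON object with a top-level `workspace_docs` key, as the step implies; every other key is left untouched. `reset_workspace_docs` is a hypothetical helper, not part of `setup.py`.

```python
import json
from pathlib import Path


def reset_workspace_docs(progress_path: Path) -> dict:
    """Set `workspace_docs` to {} in the progress file, keeping all other keys."""
    state = json.loads(progress_path.read_text(encoding="utf-8"))
    state["workspace_docs"] = {}
    progress_path.write_text(
        json.dumps(state, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return state
```

Copying the progress file to a backup before running this is a sensible precaution, since the reset overwrites it in place.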
## Files
- `setup.py`: Main integration script (upload, OCR, workspace assignment)
- `config.yaml`: Persona-to-folder mappings, API config, batch settings
- `upload_progress.json`: Upload/assignment state tracker (gitignored)