Initial commit: AnythingLLM persona RAG integration
28 persona workspaces with document upload, OCR pipeline, and vector embedding assignment via the AnythingLLM API. Supports 5 clusters (intel, cyber, military, humanities, engineering) with batch processing and resume capability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
63
README.md
Normal file
@@ -0,0 +1,63 @@
# AnythingLLM × Persona RAG Integration

A RAG system with 28 persona workspaces, fed from a book library. Each persona's workspace holds vector embeddings of the documents in its own area of expertise.

## Architecture

- **AnythingLLM Desktop:** `http://localhost:3001`
- **LLM:** Ollama, local (qwen3:14b)
- **Embedding:** Google Gemini (gemini-embedding-001)
- **Vector DB:** LanceDB
- **OCR:** ocrmypdf (tur+eng)
- **Book source:** `/mnt/storage/Common/Books/`

## Personas (5 Clusters)

| Cluster | Personas |
|---------|----------|
| Intel | Frodo, Echo, Ghost, Oracle, Wraith, Scribe, Polyglot |
| Cyber | Neo, Bastion, Sentinel, Specter, Phantom, Cipher, Vortex |
| Military | Marshal, Centurion, Corsair, Warden, Medic |
| Humanities | Chronos, Tribune, Arbiter, Ledger, Sage, Herald, Scholar, Gambit |
| Engineering | Forge, Architect |
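
The cluster table can be mirrored in code as a plain mapping. The sketch below is illustrative only: `CLUSTERS`, `cluster_of`, and `all_personas` are hypothetical names, not taken from `setup.py` or `config.yaml`, whose actual structure is not shown here. Note that the table as written lists 29 names, one more than the 28 workspaces mentioned in the introduction.

```python
# Hypothetical mirror of the cluster table; the real mapping lives in config.yaml.
CLUSTERS = {
    "intel": ["frodo", "echo", "ghost", "oracle", "wraith", "scribe", "polyglot"],
    "cyber": ["neo", "bastion", "sentinel", "specter", "phantom", "cipher", "vortex"],
    "military": ["marshal", "centurion", "corsair", "warden", "medic"],
    "humanities": ["chronos", "tribune", "arbiter", "ledger", "sage", "herald",
                   "scholar", "gambit"],
    "engineering": ["forge", "architect"],
}


def cluster_of(persona: str) -> str:
    """Return the cluster a persona belongs to (case-insensitive)."""
    name = persona.strip().lower()
    for cluster, personas in CLUSTERS.items():
        if name in personas:
            return cluster
    raise KeyError(f"unknown persona: {persona!r}")


def all_personas() -> list[str]:
    """Flatten the mapping into a single list of persona names."""
    return [p for personas in CLUSTERS.values() for p in personas]
```

A lookup such as `cluster_of("Neo")` would be one way to resolve a `--persona` argument to its cluster before selecting source folders.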

## Usage

```bash
# Status check
python3 setup.py --status

# Create / update workspaces
python3 setup.py --create-workspaces

# Full pipeline (upload + OCR + embed)
python3 setup.py --upload-documents --resume

# Single cluster or persona
python3 setup.py --upload-documents --cluster cyber --resume
python3 setup.py --upload-documents --persona neo --priority 1 --resume

# Preview
python3 setup.py --upload-documents --dry-run
```
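
How these flags might be declared is sketched below with `argparse`. This is an assumption about the CLI's shape inferred from the commands above, not the actual parser in `setup.py`; the types, defaults, and help texts in particular are guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering the flags used in the commands above."""
    parser = argparse.ArgumentParser(prog="setup.py")
    # Top-level actions.
    parser.add_argument("--status", action="store_true", help="show current state")
    parser.add_argument("--create-workspaces", action="store_true",
                        help="create or update persona workspaces")
    parser.add_argument("--upload-documents", action="store_true",
                        help="run the upload + OCR + embed pipeline")
    # Filters and modifiers (types and defaults are assumptions).
    parser.add_argument("--cluster", help="limit the run to one cluster, e.g. cyber")
    parser.add_argument("--persona", help="limit the run to one persona, e.g. neo")
    parser.add_argument("--priority", type=int, help="priority level to process")
    parser.add_argument("--resume", action="store_true",
                        help="continue from saved progress")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview without uploading")
    return parser
```

Calling `build_parser().parse_args()` in a script's entry point would then expose the flags as attributes such as `args.upload_documents` and `args.dry_run`.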

## Pipeline

```
Phase A: Upload text files
Phase B: OCR scanned PDFs (ocrmypdf)
Phase C: Upload the OCR'd files
Final:   Assign to workspaces / embed
```
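
The split between Phase A and Phase B can be sketched as a small planning step. Everything here is an assumption for illustration: `plan_phases` and `ocr_command` are hypothetical helpers, and how `setup.py` actually detects a scanned PDF is not described, so the detection rule is passed in as a predicate. The `-l` option of `ocrmypdf` selects the OCR languages, joined with `+`.

```python
from pathlib import Path
from typing import Callable, Iterable


def ocr_command(src: Path, dst: Path, languages: str = "tur+eng") -> list[str]:
    """Build an ocrmypdf invocation; -l selects the Tesseract languages."""
    return ["ocrmypdf", "-l", languages, str(src), str(dst)]


def plan_phases(
    files: Iterable[Path], needs_ocr: Callable[[Path], bool]
) -> dict[str, list[Path]]:
    """Split files into direct uploads (Phase A) and OCR candidates (Phase B)."""
    plan: dict[str, list[Path]] = {"phase_a_upload": [], "phase_b_ocr": []}
    for f in files:
        key = "phase_b_ocr" if needs_ocr(f) else "phase_a_upload"
        plan[key].append(f)
    return plan
```

The command list returned by `ocr_command` could be handed to `subprocess.run(..., check=True)`; the resulting output files would then form the Phase C upload set.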

## Recovery

If the vector DB is deleted:

1. In `upload_progress.json`, reset `workspace_docs` to `{}`
2. Run `python3 setup.py --upload-documents --resume` (this only re-embeds)
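
Step 1 can be scripted rather than edited by hand. The sketch below assumes `upload_progress.json` is a JSON object with a top-level `workspace_docs` key, which is the only part of its layout stated here; the function name `reset_workspace_docs` is hypothetical.

```python
import json
from pathlib import Path


def reset_workspace_docs(path: Path) -> None:
    """Clear workspace assignments while keeping all other progress state."""
    state = json.loads(path.read_text(encoding="utf-8"))
    # An empty mapping marks every document as unassigned, so that (per the
    # steps above) the next --resume run re-embeds without re-uploading.
    state["workspace_docs"] = {}
    path.write_text(
        json.dumps(state, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Making a backup copy of the file before running `reset_workspace_docs(Path("upload_progress.json"))` is a cheap safeguard, since the upload state it preserves is what lets the resume skip re-uploading.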

## Files

- `setup.py`: Main integration script (upload, OCR, workspace assignment)
- `config.yaml`: Persona-to-folder mappings, API config, batch settings
- `upload_progress.json`: Upload/assignment state tracker (gitignored)