---
phase: 03-tier-3-9-providers
plan: 02
subsystem: providers
tags: [providers, tier-3, specialized, voice, image, embeddings]
requires: [PROV-10-schema, embed-loader]
provides: [PROV-03]
affects: [pkg/providers/registry, engine/detector]
tech_stack_added: []
patterns: [dual-location-yaml, keyword-only-fallback, tight-prefix-regex]
files_created:
- providers/perplexity.yaml
- providers/you.yaml
- providers/voyage.yaml
- providers/jina.yaml
- providers/unstructured.yaml
- providers/assemblyai.yaml
- providers/deepgram.yaml
- providers/elevenlabs.yaml
- providers/stability.yaml
- providers/runway.yaml
- providers/midjourney.yaml
- pkg/providers/definitions/perplexity.yaml
- pkg/providers/definitions/you.yaml
- pkg/providers/definitions/voyage.yaml
- pkg/providers/definitions/jina.yaml
- pkg/providers/definitions/unstructured.yaml
- pkg/providers/definitions/assemblyai.yaml
- pkg/providers/definitions/deepgram.yaml
- pkg/providers/definitions/elevenlabs.yaml
- pkg/providers/definitions/stability.yaml
- pkg/providers/definitions/runway.yaml
- pkg/providers/definitions/midjourney.yaml
files_modified: []
decisions:
- "Providers without documented key prefixes (You.com, Unstructured, Runway, Midjourney) use keyword-only detection (no regex) to avoid Phase 2 false-positive regression."
- "Providers with documented prefixes (Perplexity pplx-, Jina jina_, Voyage pa-, Stability sk-) use tight regex with high/medium confidence."
- "ElevenLabs/Deepgram/AssemblyAI use hex alphanumeric patterns with low confidence + entropy_min 4.0; keyword pre-filter guards against noise."
- "Midjourney has no official API; verify block uses empty URL as sentinel (no active verification possible)."
metrics:
duration_seconds: 70
tasks_completed: 2
files_changed: 22
completed_at: "2026-04-05T11:42:06Z"
---

# Phase 3 Plan 02: Tier 3 Specialized Providers Summary

Added 11 specialized Tier 3 LLM/AI providers (search, embeddings, voice, image/video) as dual-located YAML definitions, bringing the Tier 3 total to 12 together with the pre-existing huggingface provider.

## What Was Built

### Task 1: Search + Embeddings (commit `7ad9588`)

Added 6 providers covering search APIs, embedding and document-processing services, and one speech-to-text service (AssemblyAI):

| Provider | Type | Detection |
|----------|------|-----------|
| Perplexity AI | Search LLM | `pplx-[A-Za-z0-9]{48,}` (high) |
| You.com | Search | keyword-only |
| Voyage AI | Embeddings | `pa-[A-Za-z0-9_\-]{40,}` (medium) |
| Jina AI | Embeddings | `jina_[A-Za-z0-9]{40,}` (high) |
| Unstructured.io | Doc processing | keyword-only |
| AssemblyAI | Voice (STT) | `[a-f0-9]{32}` (low) |

### Task 2: Voice + Image/Video (commit `0ac12e5`)

Added 5 providers covering speech, image, and video services:

| Provider | Type | Detection |
|----------|------|-----------|
| Deepgram | Voice (STT) | `[a-f0-9]{40}` (low) |
| ElevenLabs | Voice (TTS) | `[a-f0-9]{32}` (low), keyword `XI_API_KEY` |
| Stability AI | Image | `sk-[A-Za-z0-9]{48}` (medium) |
| Runway | Video | keyword-only |
| Midjourney | Image | keyword-only (no official API) |

All 11 provider YAMLs are dual-located (`providers/` + `pkg/providers/definitions/`) to satisfy the embed loader contract.

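The prefix-anchored patterns in the two tables can be exercised with a short Go program. This is a minimal sketch, not the project's detector: the regular expressions are copied from the tables, while the `rule` type, `detect`, and `fakeKey` are hypothetical names introduced here for illustration, and the keys are placeholders rather than real credentials.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs one detection regex from the tables with its confidence label.
// The struct itself is an assumption of this sketch.
type rule struct {
	provider   string
	confidence string
	re         *regexp.Regexp
}

// rules holds the four prefix-anchored patterns, copied verbatim from the tables.
var rules = []rule{
	{"perplexity", "high", regexp.MustCompile(`pplx-[A-Za-z0-9]{48,}`)},
	{"voyage", "medium", regexp.MustCompile(`pa-[A-Za-z0-9_\-]{40,}`)},
	{"jina", "high", regexp.MustCompile(`jina_[A-Za-z0-9]{40,}`)},
	{"stability", "medium", regexp.MustCompile(`sk-[A-Za-z0-9]{48}`)},
}

// detect returns the providers whose pattern matches anywhere in s.
func detect(s string) []string {
	var hits []string
	for _, r := range rules {
		if r.re.MatchString(s) {
			hits = append(hits, r.provider)
		}
	}
	return hits
}

// fakeKey builds a placeholder key of the form prefix + n filler characters.
// Hypothetical helper; the result is not a real credential.
func fakeKey(prefix string, n int) string {
	return prefix + strings.Repeat("a", n)
}

func main() {
	for _, s := range []string{
		fakeKey("pplx-", 48),
		fakeKey("jina_", 40),
		fakeKey("pplx-", 10), // body too short for the Perplexity pattern
	} {
		fmt.Printf("%d chars -> %v\n", len(s), detect(s))
	}
}
```

Because the minimum body lengths are part of each expression, a string that merely starts with a known prefix does not match unless it is also long enough.
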
## Key Decisions

- **Keyword-only where no documented format exists.** Per Phase 3 lessons-learned, providers without distinctive prefixes (You.com, Unstructured, Runway, Midjourney) rely solely on keyword pre-filtering to avoid false positives.
- **Tight regex for documented prefixes.** Perplexity (`pplx-`), Jina (`jina_`), Voyage (`pa-`), and Stability (`sk-`) use prefix-anchored regex with high/medium confidence.
- **Low-confidence hex patterns backed by keyword pre-filter.** ElevenLabs, Deepgram, and AssemblyAI use lowercase-hex regex (32 or 40 chars) with `confidence: low` and `entropy_min: 4.0`. The Aho-Corasick keyword filter ensures these patterns fire only where a provider keyword has matched.
- **Midjourney verify sentinel.** Midjourney has no first-party API; `VerifySpec` uses empty URL/status fields as a sentinel for "cannot actively verify."

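The gating described for the low-confidence hex patterns can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: a plain `strings.Contains` loop stands in for the Aho-Corasick filter, entropy is computed as per-character Shannon entropy, and `shannonEntropy` and `candidates` are hypothetical names. How the project actually measures entropy is not shown in this summary. Under the per-character definition used here, a string over the 16 hex symbols cannot exceed 4.0 bits, so the demo passes a lower threshold than the `entropy_min: 4.0` quoted above.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

// hex32 is the 32-character lowercase-hex pattern from the provider tables.
var hex32 = regexp.MustCompile(`[a-f0-9]{32}`)

// shannonEntropy returns the Shannon entropy of s in bits per character.
// Assumption: the real engine may define entropy differently.
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// candidates returns hex32 matches in line, but only when at least one keyword
// occurs in the line (case-insensitive) and the match meets minEntropy.
// The substring loop is a stand-in for an Aho-Corasick keyword automaton.
func candidates(line string, keywords []string, minEntropy float64) []string {
	lower := strings.ToLower(line)
	found := false
	for _, k := range keywords {
		if strings.Contains(lower, strings.ToLower(k)) {
			found = true
			break
		}
	}
	if !found {
		return nil // no keyword context: the regex is never evaluated
	}
	var out []string
	for _, m := range hex32.FindAllString(line, -1) {
		if shannonEntropy(m) >= minEntropy {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	keywords := []string{"XI_API_KEY", "elevenlabs"}
	placeholder := "0123456789abcdef0123456789abcdef" // not a real key
	fmt.Println(candidates("XI_API_KEY="+placeholder, keywords, 3.0))
	fmt.Println(candidates("commit="+placeholder, keywords, 3.0))
}
```

The second call returns nothing even though the string matches the regex, which is the point of the pre-filter: a bare 32-character hex run is far too common (hashes, commit IDs) to report without surrounding context.
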
## Verification

- `go test ./pkg/providers/... -count=1` → **PASS**
- `go test ./pkg/engine/... -count=1` → **PASS**
- `diff providers/<name>.yaml pkg/providers/definitions/<name>.yaml` for all 11 providers → identical
- `grep -l 'tier: 3' providers/*.yaml | wc -l` → **12** (PROV-03 satisfied)

## Deviations from Plan

None. The plan was executed exactly as written.

## Requirements Satisfied

- **PROV-03**: 12 Tier 3 Specialized providers (11 new + pre-existing huggingface)

## Self-Check: PASSED

- All 22 files present on disk.
- Commits `7ad9588` and `0ac12e5` exist on current branch.
- `go test ./pkg/providers/... ./pkg/engine/...` green after each task.