keyhunter/.planning/phases/03-tier-3-9-providers/03-02-SUMMARY.md
salvacybersec e9948f4ccf docs(03-02): complete Tier 3 specialized providers plan
11 new Tier 3 providers (search, embeddings, voice, image/video). PROV-03 satisfied.
2026-04-05 14:43:01 +03:00


phase: 03-tier-3-9-providers
plan: 02
subsystem: providers
tags: providers, tier-3, specialized, voice, image, embeddings
requires: PROV-10-schema, embed-loader
provides: PROV-03
affects: pkg/providers/registry, engine/detector
tech_stack_added: (none listed)
patterns: dual-location-yaml, keyword-only-fallback, tight-prefix-regex
files_created:
  providers/perplexity.yaml
  providers/you.yaml
  providers/voyage.yaml
  providers/jina.yaml
  providers/unstructured.yaml
  providers/assemblyai.yaml
  providers/deepgram.yaml
  providers/elevenlabs.yaml
  providers/stability.yaml
  providers/runway.yaml
  providers/midjourney.yaml
  pkg/providers/definitions/perplexity.yaml
  pkg/providers/definitions/you.yaml
  pkg/providers/definitions/voyage.yaml
  pkg/providers/definitions/jina.yaml
  pkg/providers/definitions/unstructured.yaml
  pkg/providers/definitions/assemblyai.yaml
  pkg/providers/definitions/deepgram.yaml
  pkg/providers/definitions/elevenlabs.yaml
  pkg/providers/definitions/stability.yaml
  pkg/providers/definitions/runway.yaml
  pkg/providers/definitions/midjourney.yaml
files_modified: (none listed)
decisions:
  Providers without documented key prefixes (You.com, Unstructured, Runway, Midjourney) use keyword-only detection (no regex) to avoid Phase 2 false-positive regression.
  Providers with documented prefixes (Perplexity pplx-, Jina jina_, Voyage pa-, Stability sk-) use tight regex with high/medium confidence.
  ElevenLabs/Deepgram/AssemblyAI use hex alphanumeric patterns with low confidence + entropy_min 4.0; the keyword pre-filter guards against noise.
  Midjourney has no official API; verify block uses empty URL as sentinel (no active verification possible).
metrics:
  duration_seconds: 70
  tasks_completed: 2
  files_changed: 22
  completed_at: 2026-04-05T11:42:06Z

Phase 3 Plan 02: Tier 3 Specialized Providers Summary

11 specialized Tier 3 LLM/AI providers added (search, embeddings, voice, image/video) across dual-location YAML, bringing the total Tier 3 count to 12 including the pre-existing huggingface provider.

What Was Built

Task 1: Search + Embeddings (commit 7ad9588)

Added 6 providers covering search APIs, embedding/document-processing services, and one speech-to-text service (AssemblyAI):

| Provider | Type | Detection |
|---|---|---|
| Perplexity AI | Search LLM | `pplx-[A-Za-z0-9]{48,}` (high) |
| You.com | Search | keyword-only |
| Voyage AI | Embeddings | `pa-[A-Za-z0-9_\-]{40,}` (medium) |
| Jina AI | Embeddings | `jina_[A-Za-z0-9]{40,}` (high) |
| Unstructured.io | Doc processing | keyword-only |
| AssemblyAI | Voice (STT) | `[a-f0-9]{32}` (low) |

Task 2: Voice + Image/Video (commit 0ac12e5)

Added 5 providers covering speech, image, and video generation:

| Provider | Type | Detection |
|---|---|---|
| Deepgram | Voice (STT) | `[a-f0-9]{40}` (low) |
| ElevenLabs | Voice (TTS) | `[a-f0-9]{32}` (low), `XI_API_KEY` |
| Stability AI | Image | `sk-[A-Za-z0-9]{48}` (medium) |
| Runway | Video | keyword-only |
| Midjourney | Image | keyword-only (no official API) |

All 11 provider YAMLs dual-located (providers/ + pkg/providers/definitions/) to satisfy the embed loader contract.

Key Decisions

  • Keyword-only where no documented format exists. Per Phase 3 lessons-learned, providers without distinctive prefixes (You.com, Unstructured, Runway, Midjourney) rely solely on keyword pre-filtering to avoid false positives.
  • Tight regex for documented prefixes. Perplexity (pplx-), Jina (jina_), Voyage (pa-), Stability (sk-) use prefix-anchored regex with high/medium confidence.
  • Low-confidence hex patterns backed by keyword pre-filter. ElevenLabs, Deepgram, and AssemblyAI use lowercase-hex regex (32 or 40 chars) with confidence: low and entropy_min: 4.0; the Aho-Corasick keyword filter ensures these patterns fire only on input where a provider keyword has already matched.
  • Midjourney verify sentinel. Midjourney has no first-party API; VerifySpec uses empty URL/status fields as a sentinel for "cannot actively verify."

Verification

  • go test ./pkg/providers/... -count=1 → PASS
  • go test ./pkg/engine/... -count=1 → PASS
  • diff providers/<name>.yaml pkg/providers/definitions/<name>.yaml for all 11 providers → identical
  • grep -l 'tier: 3' providers/*.yaml | wc -l → 12 (PROV-03 satisfied)

Deviations from Plan

None — plan executed exactly as written.

Requirements Satisfied

  • PROV-03: 12 Tier 3 Specialized providers (11 new + pre-existing huggingface)

Self-Check: PASSED

  • All 22 files present on disk.
  • Commits 7ad9588 and 0ac12e5 exist on current branch.
  • go test ./pkg/providers/... ./pkg/engine/... green after each task.