docs(03): auto-generated context with Phase 2 lessons
75
.planning/phases/03-tier-3-9-providers/03-CONTEXT.md
Normal file
@@ -0,0 +1,75 @@
# Phase 3: Tier 3-9 Providers - Context

**Gathered:** 2026-04-05
**Status:** Ready for planning
**Mode:** Auto-generated (infrastructure phase, discuss skipped)

<domain>
## Phase Boundary

All 108+ LLM provider definitions exist, covering specialized models, Chinese/regional providers, infrastructure gateways, emerging tools, code assistants, self-hosted runtimes, and enterprise platforms. This completes the provider library started in Phase 2 (26 Tier 1-2 providers → 108+ total).

</domain>
<decisions>
## Implementation Decisions

### Claude's Discretion
All implementation choices are at Claude's discretion because this is a pure infrastructure/data phase. Use the existing YAML schema and patterns from Phase 2.

### Lessons from Phase 2 (CRITICAL)
- **Generic regex patterns cause false positives at scale.** Phase 2 Tier 2 providers without distinctive prefixes produced false positives on synthetic test fixtures. With 82 more providers in Phase 3 (mostly generic Chinese/regional/emerging providers), this risk compounds.
- **Mitigation strategy**:
  1. Providers WITHOUT distinctive key prefixes MUST rely on the keyword pre-filter alone, NOT on overly broad regex matching.
  2. Where no documented key format exists, either set confidence to "low" or omit the regex entirely and use keyword-only detection.
  3. Use a HIGH `entropy_min` (≥ 4.0) for generic alphanumeric patterns.
  4. Test each provider's regex against the existing `testdata/samples/` fixtures to prevent regressions.
- **Category field**: still not in `schema.go`. YAMLs should OMIT the `category` field unless the schema is updated this phase.
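
Mitigation item 3 relies on an entropy threshold, so a small sketch may help calibrate it. The source does not say how `entropy_min` is computed; the code below assumes Shannon entropy in bits per character over the candidate string. `shannonEntropy` and `passesEntropy` are hypothetical names, not functions from the repository.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per character.
// Assumption: the detector measures entropy this way; the source does not specify.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	counts := make(map[rune]int)
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// passesEntropy reports whether a candidate meets a pattern's entropy_min.
func passesEntropy(candidate string, entropyMin float64) bool {
	return shannonEntropy(candidate) >= entropyMin
}

func main() {
	candidates := []string{
		"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", // one repeated character
		"0123456789abcdef0123456789abcdef", // 16 distinct symbols, evenly distributed
	}
	for _, c := range candidates {
		fmt.Printf("%q entropy=%.2f pass=%v\n", c, shannonEntropy(c), passesEntropy(c, 4.0))
	}
}
```

Under this assumption, a string needs at least 16 evenly used distinct characters to reach 4.0 bits, so a threshold of 4.0 rejects short or repetitive alphanumeric runs but is already tight for pure hex keys.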
</decisions>
<code_context>
## Existing Code Insights

### Reusable Assets
- Phase 2 established a dual-location pattern: `providers/X.yaml` + `pkg/providers/definitions/X.yaml`
- `pkg/providers/tier12_test.go`: guardrail test pattern to replicate for Tier 3-9
- `pkg/providers/schema.go`: Provider struct (Name, DisplayName, Tier, FormatVersion, LastVerified, Keywords, Patterns, Verify)
- 27 existing provider YAMLs in `providers/` as templates

### Established Patterns
- Go `embed.FS` via `go:embed definitions/*.yaml`
- RE2-only regex (no lookahead, no backrefs)
- Keywords list must be non-empty (required for the Aho-Corasick pre-filter)
- `patterns` entries with `regex`, `entropy_min`, and `confidence` fields
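
Two of these patterns (RE2-only regex and non-empty keywords) are mechanical checks that a Tier 3-9 guardrail test could enforce. The sketch below is one way to do that, not the contents of `tier12_test.go`. `Pattern`, `Provider`, and `validate` are trimmed, hypothetical stand-ins for the real types in `schema.go`. It relies on the fact that Go's standard `regexp` package accepts only RE2 syntax, so a successful `regexp.Compile` doubles as the RE2 check.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Pattern and Provider are trimmed, hypothetical stand-ins for the
// real types in pkg/providers/schema.go.
type Pattern struct {
	Regex      string
	EntropyMin float64
	Confidence string
}

type Provider struct {
	Name     string
	Keywords []string
	Patterns []Pattern
}

// validate enforces two of the established patterns: a non-empty
// keyword list and regexes that compile under Go's RE2-based engine.
// A provider with no patterns (keyword-only) is accepted.
func validate(p Provider) error {
	if len(p.Keywords) == 0 {
		return errors.New(p.Name + ": keywords must be non-empty")
	}
	for _, pat := range p.Patterns {
		if _, err := regexp.Compile(pat.Regex); err != nil {
			return fmt.Errorf("%s: regex %q is not valid RE2: %w", p.Name, pat.Regex, err)
		}
	}
	return nil
}

func main() {
	// "demo-" is an illustrative prefix, not a real provider key format.
	good := Provider{
		Name:     "demo",
		Keywords: []string{"demo-"},
		Patterns: []Pattern{{Regex: `demo-[a-z0-9]{32}`, EntropyMin: 3.5, Confidence: "high"}},
	}
	lookahead := Provider{
		Name:     "demo-lookahead",
		Keywords: []string{"demo"},
		Patterns: []Pattern{{Regex: `(?=demo)[a-z]+`, Confidence: "low"}},
	}
	fmt.Println(validate(good))
	fmt.Println(validate(lookahead))
}
```

Running the same check over every embedded YAML would catch a lookahead or backreference copied from another scanner's rule set before it reaches the engine.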

### Integration Points
- `providers/` and `pkg/providers/definitions/`: add N new YAML files (likely 82-90)
- `pkg/providers/tier12_test.go`: extend or mirror for a tier3+ guardrail
- `keyhunter providers list --tier=enterprise`: requires tier filter support in `cmd/providers.go` (may need to be added this phase)
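
If the tier filter is added this phase, the core logic could be as small as the sketch below. It makes several assumptions. The real type of `Tier` in `schema.go` is not stated here, so the sketch uses a string label to match the `--tier=enterprise` form. `filterByTier` is a hypothetical helper, and the CLI flag wiring is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Provider is a trimmed, hypothetical stand-in for the schema.go type.
// Assumption: Tier is a string label; the real field may be numeric.
type Provider struct {
	Name string
	Tier string
}

// filterByTier returns the providers whose Tier matches tier,
// ignoring case. An empty tier means "no filter" and returns all.
func filterByTier(all []Provider, tier string) []Provider {
	if tier == "" {
		return all
	}
	var out []Provider
	for _, p := range all {
		if strings.EqualFold(p.Tier, tier) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	all := []Provider{
		{Name: "example-a", Tier: "enterprise"},
		{Name: "example-b", Tier: "self-hosted"},
		{Name: "example-c", Tier: "enterprise"},
	}
	for _, p := range filterByTier(all, "enterprise") {
		fmt.Println(p.Name)
	}
}
```

Keeping the filter as a pure function over the loaded provider slice would let it be unit-tested without touching the command layer.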
</code_context>
<specifics>
## Specific Ideas

Provider groups to cover (per ROADMAP success criteria):
- **Chinese/regional**: DeepSeek, Zhipu, Moonshot (Kimi), Qwen (Alibaba), Baidu (Wenxin/ERNIE), ByteDance (Doubao), 01.AI (Yi), MiniMax, StepFun, SenseTime
- **Self-hosted runtimes**: Ollama, vLLM, LocalAI, llama.cpp, LMStudio, text-generation-webui, Jan, GPT4All
- **Code assistants**: GitHub Copilot, Cursor, Tabnine, Codeium, Cody (Sourcegraph), Continue, Aider, Phind
- **Enterprise**: Salesforce (Einstein GPT), ServiceNow (Now Assist), SAP (Joule), Palantir (AIP), Databricks (DBRX), Snowflake (Cortex), Oracle (Generative AI), HPE, IBM Watsonx
- **Infrastructure gateways**: OpenRouter, LiteLLM, LangSmith, Helicone, PortKey, LangFuse, Traceloop
- **Specialized**: Perplexity (pplx-), ElevenLabs, AssemblyAI, Deepgram, Speechmatics, Runway, Pika, Stability AI, Midjourney, Ideogram, Leonardo AI, Suno, Udio, PlayHT, Descript
- **Emerging**: Writer, Jasper, Copy.ai, WordAI, ContentBot, Rytr
- Aim for 82-90 additional providers to reach 108+ total

</specifics>
<deferred>
## Deferred Ideas

- `--tier=enterprise` CLI filter: add to Phase 3 if the schema already has a Tier field; otherwise defer to a later phase
- Provider "category" enum: defer to a separate phase (schema change)
- Keyword-only detection mode for providers without documented key formats: a lightweight enhancement to `pkg/engine/detector.go`; consider including it if time permits
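
As a rough illustration of what keyword-only detection could look like, the sketch below reports a low-confidence finding whenever a provider keyword appears in the scanned content. It is not the design of `pkg/engine/detector.go`. `Finding` and `keywordOnlyDetect` are hypothetical names, case-insensitive matching is an assumption, and a plain `strings.Contains` loop stands in for the Aho-Corasick pre-filter the project actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// Finding is a hypothetical result type for this sketch.
type Finding struct {
	Provider   string
	Keyword    string
	Confidence string
}

// keywordOnlyDetect returns one low-confidence finding per keyword
// that occurs in content. Matching is case-insensitive (an assumption).
// strings.Contains stands in for the real Aho-Corasick pre-filter.
func keywordOnlyDetect(provider string, keywords []string, content string) []Finding {
	lower := strings.ToLower(content)
	var out []Finding
	for _, kw := range keywords {
		if kw == "" {
			continue // an empty keyword would match everything
		}
		if strings.Contains(lower, strings.ToLower(kw)) {
			out = append(out, Finding{Provider: provider, Keyword: kw, Confidence: "low"})
		}
	}
	return out
}

func main() {
	// The provider name, keyword, and content are illustrative only.
	content := "EXAMPLEAI_API_KEY=not-a-real-key"
	for _, f := range keywordOnlyDetect("exampleai", []string{"exampleai_api_key"}, content) {
		fmt.Printf("%s matched %q (confidence=%s)\n", f.Provider, f.Keyword, f.Confidence)
	}
}
```

Because such findings carry no format evidence, tagging them "low" keeps them separable from regex-backed matches in reports.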
</deferred>