Files
keyhunter/.planning/phases/03-tier-3-9-providers/03-01-SUMMARY.md

104 lines
5.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
phase: 03-tier-3-9-providers
plan: 01
subsystem: providers
tags: [providers, tier4, chinese, regional, keyword-only]
requires: [pkg/providers/registry, pkg/providers/schema]
provides:
- "16 Tier 4 Chinese/regional provider YAML definitions"
- "Keyword-only detection strategy for providers without documented key formats"
affects: [pkg/providers/definitions, providers]
tech-stack:
added: []
patterns:
- "Dual-located provider YAMLs (providers/ + pkg/providers/definitions/)"
- "Keyword-only detection (omit patterns field) for low-signal providers"
key-files:
created:
- providers/deepseek.yaml
- providers/zhipu.yaml
- providers/moonshot.yaml
- providers/qwen.yaml
- providers/baidu.yaml
- providers/bytedance.yaml
- providers/01ai.yaml
- providers/minimax.yaml
- providers/baichuan.yaml
- providers/stepfun.yaml
- providers/sensetime.yaml
- providers/iflytek.yaml
- providers/tencent.yaml
- providers/siliconflow.yaml
- providers/360ai.yaml
- providers/kuaishou.yaml
- pkg/providers/definitions/deepseek.yaml
- pkg/providers/definitions/zhipu.yaml
- pkg/providers/definitions/moonshot.yaml
- pkg/providers/definitions/qwen.yaml
- pkg/providers/definitions/baidu.yaml
- pkg/providers/definitions/bytedance.yaml
- pkg/providers/definitions/01ai.yaml
- pkg/providers/definitions/minimax.yaml
- pkg/providers/definitions/baichuan.yaml
- pkg/providers/definitions/stepfun.yaml
- pkg/providers/definitions/sensetime.yaml
- pkg/providers/definitions/iflytek.yaml
- pkg/providers/definitions/tencent.yaml
- pkg/providers/definitions/siliconflow.yaml
- pkg/providers/definitions/360ai.yaml
- pkg/providers/definitions/kuaishou.yaml
modified: []
decisions:
- "Providers without documented key formats (12 of 16) use keyword-only detection — no patterns field — to avoid the Phase 2 false-positive regression from overly-generic regex"
- "DeepSeek, Moonshot, Qwen, and SiliconFlow use documented sk- prefix patterns with medium/low confidence"
- "Empty verify blocks (url: \"\", valid_status: [], invalid_status: []) used for providers whose auth flows require SigV4-style request signing (Baidu Qianfan, Tencent TC3, iFlytek, ByteDance Volcengine, 360, Kuaishou, SenseTime) — live verification deferred to Phase 5"
metrics:
completed: "2026-04-05"
duration_minutes: 3
tasks: 2
files_created: 32
---
# Phase 03 Plan 01: Tier 4 Chinese/Regional Providers Summary
Added 16 Tier 4 Chinese/regional LLM provider YAML definitions (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360 AI, Kuaishou) using keyword-anchored detection to avoid false positives, dual-located in `providers/` and `pkg/providers/definitions/` for Go embed.
## What Was Built
- 32 new YAML files (16 providers × 2 locations for the Go embed pattern)
- 4 providers with documented key prefix patterns: DeepSeek (`sk-[a-f0-9]{32}`), Moonshot (`sk-[A-Za-z0-9]{48}`), Qwen DashScope (`sk-[a-f0-9]{32}`), SiliconFlow (`sk-[a-z]{20,}`, low confidence)
- 12 providers using keyword-only detection (patterns field omitted): Zhipu, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, 360 AI, Kuaishou
- All 16 providers carry strong keyword lists anchored on: SDK environment variable names, API hostnames, SDK package names, and model family identifiers
## Key Decisions
- **Keyword-only for low-signal providers**: Chinese/regional providers mostly lack publicly documented key formats. Rather than introduce generic `[A-Za-z0-9]{32,64}` patterns (which caused false positives in Phase 2), the patterns field is omitted entirely, relying on the Aho-Corasick keyword pre-filter alone. The registry already allows keyword-only providers.
- **Empty verify blocks for SigV4-style auth**: Providers using cloud-vendor-style request signing (Baidu Qianfan AK/SK, Tencent TC3-HMAC-SHA256, iFlytek HMAC, ByteDance Volcengine, 360, Kuaishou, SenseTime) cannot be verified with a simple Bearer token. Their verify blocks are stubbed (`url: ""`, empty status arrays) and will be populated in Phase 5 when the verification engine gains signed-request support.
## Deviations from Plan
None — plan executed exactly as written.
## Verification
- `for f in [all 16]; do diff providers/$f.yaml pkg/providers/definitions/$f.yaml; done` — zero diffs, all dual-located copies identical
- `go test ./pkg/providers/... -count=1` — PASS (registry loads all 16 new YAMLs without validation errors)
- `go test ./pkg/engine/... -count=1` — PASS (no detector regressions from the new keyword set)
- `grep -l 'tier: 4' providers/*.yaml | wc -l` — 16
- Keyword-only acceptance criterion satisfied: 12 of 16 YAMLs have no `patterns:` field
## Commits
- `35dbbc7` feat(03-01): add 8 Tier 4 Chinese providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax)
- `a019ba9` feat(03-01): add 8 Tier 4 providers (Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360AI, Kuaishou)
## Known Stubs
The 12 keyword-only providers and 7 of the 16 verify blocks (empty URL/status) are intentional and documented above. They are not stubs blocking this plan's goal (PROV-04: 16 Tier 4 providers exist, registry loads them, engine tests green) — they are the correct engineering choice given provider documentation gaps and auth-flow complexity. Live verification for signed-request providers is tracked for Phase 5.
## Self-Check: PASSED
- All 32 YAML files verified present on disk
- Commits 35dbbc7 and a019ba9 verified in `git log`
- go test ./pkg/providers/... and ./pkg/engine/... both green