From d34da519dcbbec40b7ad9758260df11dfcdb1b28 Mon Sep 17 00:00:00 2001
From: salvacybersec
Date: Sun, 5 Apr 2026 14:43:49 +0300
Subject: [PATCH] docs(03-01): complete Tier 4 Chinese/regional providers plan

---
 .planning/REQUIREMENTS.md                          |   2 +-
 .planning/ROADMAP.md                               |   4 +-
 .planning/STATE.md                                 |  13 ++-
 .../03-tier-3-9-providers/03-01-SUMMARY.md         | 103 ++++++++++++++++++
 4 files changed, 113 insertions(+), 9 deletions(-)
 create mode 100644 .planning/phases/03-tier-3-9-providers/03-01-SUMMARY.md

diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index a382124..431f928 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -22,7 +22,7 @@ Requirements for initial release. Each maps to roadmap phases.
 - [x] **PROV-01**: 12 Tier 1 Frontier provider YAML definitions (OpenAI, Anthropic, Google AI, Vertex, AWS Bedrock, Azure OpenAI, Meta AI, xAI, Cohere, Mistral, Inflection, AI21)
 - [x] **PROV-02**: 14 Tier 2 Inference Platform provider definitions (Together, Fireworks, Groq, Replicate, Anyscale, DeepInfra, Lepton, Modal, Baseten, Cerebrium, NovitaAI, Sambanova, OctoAI, Friendli)
 - [x] **PROV-03**: 12 Tier 3 Specialized provider definitions (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney, HuggingFace)
-- [ ] **PROV-04**: 16 Tier 4 Chinese/Regional provider definitions (DeepSeek, Baichuan, Zhipu, Moonshot, Yi, Qwen, Baidu, ByteDance, SenseTime, iFlytek, MiniMax, Stepfun, 360 AI, Kuaishou, Tencent, SiliconFlow)
+- [x] **PROV-04**: 16 Tier 4 Chinese/Regional provider definitions (DeepSeek, Baichuan, Zhipu, Moonshot, Yi, Qwen, Baidu, ByteDance, SenseTime, iFlytek, MiniMax, Stepfun, 360 AI, Kuaishou, Tencent, SiliconFlow)
 - [ ] **PROV-05**: 11 Tier 5 Infrastructure/Gateway provider definitions (Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, Kong, BricksAI, Aether, Not Diamond)
 - [ ] **PROV-06**: 15 Tier 6 Emerging/Niche provider definitions (Reka, Aleph Alpha, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon, Lamini)
 - [x] **PROV-07**: 10 Tier 7 Code/Dev Tools provider definitions (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index f477732..5f25154 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -82,13 +82,13 @@ Plans:
 **Plans**: 8 plans
 
 Plans:
-- [ ] 03-01-PLAN.md — Tier 4 Chinese/regional providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360 AI, Kuaishou)
+- [x] 03-01-PLAN.md — Tier 4 Chinese/regional providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360 AI, Kuaishou)
 - [x] 03-02-PLAN.md — Tier 3 Specialized (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney)
 - [x] 03-03-PLAN.md — Tier 5 Infrastructure/Gateway (OpenRouter, LiteLLM, Cloudflare AI, Vercel AI, Portkey, Helicone, Martian, Kong, BricksAI, Aether, Not Diamond)
 - [x] 03-04-PLAN.md — Tier 7 Code/Dev Tools (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
 - [ ] 03-05-PLAN.md — Tier 8 Self-Hosted runtimes (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan)
 - [x] 03-06-PLAN.md — Tier 9 Enterprise (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
-- [ ] 03-07-PLAN.md — Tier 6 Emerging/Niche (Reka, Aleph Alpha, Lamini, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon)
+- [x] 03-07-PLAN.md — Tier 6 Emerging/Niche (Reka, Aleph Alpha, Lamini, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon)
 - [ ] 03-08-PLAN.md — Tier 3-9 guardrail test: lock 108 total providers, per-tier counts, and name sets
 
 ### Phase 4: Input Sources
diff --git a/.planning/STATE.md b/.planning/STATE.md
index b7c75dc..9704175 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: executing
-stopped_at: Completed 03-06-PLAN.md
-last_updated: "2026-04-05T11:42:57.673Z"
+stopped_at: Completed 03-01-PLAN.md
+last_updated: "2026-04-05T11:43:45.831Z"
 last_activity: 2026-04-05
 progress:
   total_phases: 18
   completed_phases: 2
   total_plans: 18
-  completed_plans: 14
+  completed_plans: 16
   percent: 20
 ---
 
@@ -26,7 +26,7 @@ See: .planning/PROJECT.md (updated 2026-04-04)
 ## Current Position
 
 Phase: 03 (tier-3-9-providers) — EXECUTING
-Plan: 4 of 8
+Plan: 5 of 8
 Status: Ready to execute
 Last activity: 2026-04-05
 
@@ -63,6 +63,7 @@ Progress: [██░░░░░░░░] 20%
 | Phase 03-tier-3-9-providers P04 | 3m | 2 tasks | 20 files |
 | Phase 03-tier-3-9-providers P02 | 70 | 2 tasks | 22 files |
 | Phase 03-tier-3-9-providers P06 | 3m | 2 tasks | 16 files |
+| Phase 03-tier-3-9-providers P01 | 3m | 2 tasks | 32 files |
 
 ## Accumulated Context
 
@@ -96,6 +97,6 @@ None yet.
 
 ## Session Continuity
 
-Last session: 2026-04-05T11:42:57.670Z
-Stopped at: Completed 03-06-PLAN.md
+Last session: 2026-04-05T11:43:45.827Z
+Stopped at: Completed 03-01-PLAN.md
 Resume file: None
diff --git a/.planning/phases/03-tier-3-9-providers/03-01-SUMMARY.md b/.planning/phases/03-tier-3-9-providers/03-01-SUMMARY.md
new file mode 100644
index 0000000..a0cfcac
--- /dev/null
+++ b/.planning/phases/03-tier-3-9-providers/03-01-SUMMARY.md
@@ -0,0 +1,103 @@
+---
+phase: 03-tier-3-9-providers
+plan: 01
+subsystem: providers
+tags: [providers, tier4, chinese, regional, keyword-only]
+requires: [pkg/providers/registry, pkg/providers/schema]
+provides:
+  - "16 Tier 4 Chinese/regional provider YAML definitions"
+  - "Keyword-only detection strategy for providers without documented key formats"
+affects: [pkg/providers/definitions, providers]
+tech-stack:
+  added: []
+  patterns:
+    - "Dual-located provider YAMLs (providers/ + pkg/providers/definitions/)"
+    - "Keyword-only detection (omit patterns field) for low-signal providers"
+key-files:
+  created:
+    - providers/deepseek.yaml
+    - providers/zhipu.yaml
+    - providers/moonshot.yaml
+    - providers/qwen.yaml
+    - providers/baidu.yaml
+    - providers/bytedance.yaml
+    - providers/01ai.yaml
+    - providers/minimax.yaml
+    - providers/baichuan.yaml
+    - providers/stepfun.yaml
+    - providers/sensetime.yaml
+    - providers/iflytek.yaml
+    - providers/tencent.yaml
+    - providers/siliconflow.yaml
+    - providers/360ai.yaml
+    - providers/kuaishou.yaml
+    - pkg/providers/definitions/deepseek.yaml
+    - pkg/providers/definitions/zhipu.yaml
+    - pkg/providers/definitions/moonshot.yaml
+    - pkg/providers/definitions/qwen.yaml
+    - pkg/providers/definitions/baidu.yaml
+    - pkg/providers/definitions/bytedance.yaml
+    - pkg/providers/definitions/01ai.yaml
+    - pkg/providers/definitions/minimax.yaml
+    - pkg/providers/definitions/baichuan.yaml
+    - pkg/providers/definitions/stepfun.yaml
+    - pkg/providers/definitions/sensetime.yaml
+    - pkg/providers/definitions/iflytek.yaml
+    - pkg/providers/definitions/tencent.yaml
+    - pkg/providers/definitions/siliconflow.yaml
+    - pkg/providers/definitions/360ai.yaml
+    - pkg/providers/definitions/kuaishou.yaml
+  modified: []
+decisions:
+  - "Providers without documented key formats (12 of 16) use keyword-only detection — no patterns field — to avoid the Phase 2 false-positive regression from overly-generic regex"
+  - "DeepSeek, Moonshot, Qwen, and SiliconFlow use documented sk- prefix patterns with medium/low confidence"
+  - "Empty verify blocks (url: \"\", valid_status: [], invalid_status: []) used for providers whose auth flows require SigV4-style request signing (Baidu Qianfan, Tencent TC3, iFlytek, ByteDance Volcengine, 360, Kuaishou, SenseTime) — live verification deferred to Phase 5"
+metrics:
+  completed: "2026-04-05"
+  duration_minutes: 3
+  tasks: 2
+  files_created: 32
+---
+
+# Phase 03 Plan 01: Tier 4 Chinese/Regional Providers Summary
+
+Added 16 Tier 4 Chinese/regional LLM provider YAML definitions (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360 AI, Kuaishou) using keyword-anchored detection to avoid false positives, dual-located in `providers/` and `pkg/providers/definitions/` for Go embed.
+
+## What Was Built
+
+- 32 new YAML files (16 providers × 2 locations for the Go embed pattern)
+- 4 providers with documented key prefix patterns: DeepSeek (`sk-[a-f0-9]{32}`), Moonshot (`sk-[A-Za-z0-9]{48}`), Qwen DashScope (`sk-[a-f0-9]{32}`), SiliconFlow (`sk-[a-z]{20,}`, low confidence)
+- 12 providers using keyword-only detection (patterns field omitted): Zhipu, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, 360 AI, Kuaishou
+- All 16 providers carry strong keyword lists anchored on: SDK environment variable names, API hostnames, SDK package names, and model family identifiers
+
+## Key Decisions
+
+- **Keyword-only for low-signal providers**: Chinese/regional providers mostly lack publicly documented key formats. Rather than introduce generic `[A-Za-z0-9]{32,64}` patterns (which caused false positives in Phase 2), the patterns field is omitted entirely, relying on the Aho-Corasick keyword pre-filter alone. The registry already allows keyword-only providers.
+- **Empty verify blocks for SigV4-style auth**: Providers using cloud-vendor-style request signing (Baidu Qianfan AK/SK, Tencent TC3-HMAC-SHA256, iFlytek HMAC, ByteDance Volcengine, 360, Kuaishou, SenseTime) cannot be verified with a simple Bearer token. Their verify blocks are stubbed (`url: ""`, empty status arrays) and will be populated in Phase 5 when the verification engine gains signed-request support.
+
+## Deviations from Plan
+
+None — plan executed exactly as written.
+
+## Verification
+
+- `for f in [all 16]; do diff providers/$f.yaml pkg/providers/definitions/$f.yaml; done` — zero diffs, all dual-located copies identical
+- `go test ./pkg/providers/... -count=1` — PASS (registry loads all 16 new YAMLs without validation errors)
+- `go test ./pkg/engine/... -count=1` — PASS (no detector regressions from the new keyword set)
+- `grep -l 'tier: 4' providers/*.yaml | wc -l` — 16
+- Keyword-only acceptance criterion satisfied: 12 of 16 YAMLs have no `patterns:` field
+
+## Commits
+
+- `35dbbc7` feat(03-01): add 8 Tier 4 Chinese providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax)
+- `a019ba9` feat(03-01): add 8 Tier 4 providers (Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360AI, Kuaishou)
+
+## Known Stubs
+
+The 12 keyword-only providers and 7 of the 16 verify blocks (empty URL/status) are intentional and documented above. They are not stubs blocking this plan's goal (PROV-04: 16 Tier 4 providers exist, registry loads them, engine tests green) — they are the correct engineering choice given provider documentation gaps and auth-flow complexity. Live verification for signed-request providers is tracked for Phase 5.
+
+## Self-Check: PASSED
+
+- All 32 YAML files verified present on disk
+- Commits 35dbbc7 and a019ba9 verified in `git log`
+- go test ./pkg/providers/... and ./pkg/engine/... both green
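
Reviewer note, not part of the patch: the keyword-only decision in the summary can be illustrated with a small Go sketch. This is a sketch under stated assumptions and not the project's detector. The `provider` type, the `detect` function, and the keyword lists are hypothetical, and a plain `strings.Contains` scan stands in for the Aho-Corasick pre-filter the summary mentions. Only the DeepSeek pattern `sk-[a-f0-9]{32}` is taken from the summary itself.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// provider is a hypothetical, simplified stand-in for a provider definition.
// A nil pattern models a keyword-only provider (patterns field omitted).
type provider struct {
	name     string
	keywords []string // illustrative keywords, not the project's real lists
	pattern  *regexp.Regexp
}

// sampleProviders returns one pattern-backed and one keyword-only provider.
func sampleProviders() []provider {
	return []provider{
		{
			name:     "deepseek",
			keywords: []string{"deepseek"},
			pattern:  regexp.MustCompile(`sk-[a-f0-9]{32}`), // pattern quoted in the summary
		},
		{
			name:     "zhipu",
			keywords: []string{"zhipu"},
			pattern:  nil, // keyword-only: the keyword hit alone is the signal
		},
	}
}

// hasKeyword reports whether any keyword occurs in the lowercased text.
// Assumption: a linear substring scan replaces the Aho-Corasick pre-filter.
func hasKeyword(lower string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(lower, k) {
			return true
		}
	}
	return false
}

// detect returns the providers whose keywords appear in text and whose
// pattern, when one is defined, also matches.
func detect(text string, providers []provider) []string {
	lower := strings.ToLower(text)
	var hits []string
	for _, p := range providers {
		if !hasKeyword(lower, p.keywords) {
			continue // no keyword context, so no regex is ever tried
		}
		if p.pattern != nil && !p.pattern.MatchString(text) {
			continue // keyword present but no key-shaped token
		}
		hits = append(hits, p.name)
	}
	return hits
}

func main() {
	ps := sampleProviders()
	fmt.Println(detect("DEEPSEEK_API_KEY=sk-0123456789abcdef0123456789abcdef", ps))
	fmt.Println(detect("ZHIPU_API_KEY=abc.def", ps))
	fmt.Println(detect("token=sk-0123456789abcdef0123456789abcdef", ps))
}
```

In this sketch the third input yields no hits even though it contains a key-shaped token, because no provider keyword is nearby. That is the behavior the summary relies on to avoid generic-regex false positives; whether the real engine reports keyword-only hits in the same way is not stated in the patch.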