docs(03-04): complete Tier 7 code/dev tools providers plan

salvacybersec
2026-04-05 14:42:43 +03:00
parent d50f83ac2d
commit 0789b662c3
4 changed files with 99 additions and 9 deletions

@@ -25,7 +25,7 @@ Requirements for initial release. Each maps to roadmap phases.
 - [ ] **PROV-04**: 16 Tier 4 Chinese/Regional provider definitions (DeepSeek, Baichuan, Zhipu, Moonshot, Yi, Qwen, Baidu, ByteDance, SenseTime, iFlytek, MiniMax, Stepfun, 360 AI, Kuaishou, Tencent, SiliconFlow)
 - [ ] **PROV-05**: 11 Tier 5 Infrastructure/Gateway provider definitions (Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, Kong, BricksAI, Aether, Not Diamond)
 - [ ] **PROV-06**: 15 Tier 6 Emerging/Niche provider definitions (Reka, Aleph Alpha, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon, Lamini)
-- [ ] **PROV-07**: 10 Tier 7 Code/Dev Tools provider definitions (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
+- [x] **PROV-07**: 10 Tier 7 Code/Dev Tools provider definitions (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
 - [ ] **PROV-08**: 10 Tier 8 Self-Hosted provider definitions (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan AI)
 - [ ] **PROV-09**: 8 Tier 9 Enterprise provider definitions (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
 - [x] **PROV-10**: Provider YAML schema includes format_version and last_verified date for pattern health tracking

@@ -84,8 +84,8 @@ Plans:
 Plans:
 - [ ] 03-01-PLAN.md — Tier 4 Chinese/regional providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360 AI, Kuaishou)
 - [ ] 03-02-PLAN.md — Tier 3 Specialized (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney)
-- [ ] 03-03-PLAN.md — Tier 5 Infrastructure/Gateway (OpenRouter, LiteLLM, Cloudflare AI, Vercel AI, Portkey, Helicone, Martian, Kong, BricksAI, Aether, Not Diamond)
-- [ ] 03-04-PLAN.md — Tier 7 Code/Dev Tools (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
+- [x] 03-03-PLAN.md — Tier 5 Infrastructure/Gateway (OpenRouter, LiteLLM, Cloudflare AI, Vercel AI, Portkey, Helicone, Martian, Kong, BricksAI, Aether, Not Diamond)
+- [x] 03-04-PLAN.md — Tier 7 Code/Dev Tools (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
 - [ ] 03-05-PLAN.md — Tier 8 Self-Hosted runtimes (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan)
 - [ ] 03-06-PLAN.md — Tier 9 Enterprise (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
 - [ ] 03-07-PLAN.md — Tier 6 Emerging/Niche (Reka, Aleph Alpha, Lamini, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon)

@@ -4,13 +4,13 @@ milestone: v1.0
 milestone_name: milestone
 status: executing
 stopped_at: Completed 02-tier-1-2-providers 02-05-PLAN.md
-last_updated: "2026-04-05T11:23:32.224Z"
+last_updated: "2026-04-05T11:42:39.202Z"
 last_activity: 2026-04-05
 progress:
   total_phases: 18
   completed_phases: 2
-  total_plans: 10
-  completed_plans: 10
+  total_plans: 18
+  completed_plans: 12
   percent: 20
 ---
@@ -21,12 +21,12 @@ progress:
 See: .planning/PROJECT.md (updated 2026-04-04)
 **Core value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
-**Current focus:** Phase 02 — tier-1-2-providers
+**Current focus:** Phase 03 — tier-3-9-providers
 ## Current Position
-Phase: 3
-Plan: Not started
+Phase: 03 (tier-3-9-providers) — EXECUTING
+Plan: 2 of 8
 Status: Ready to execute
 Last activity: 2026-04-05
@@ -60,6 +60,7 @@ Progress: [██░░░░░░░░] 20%
 | Phase 02-tier-1-2-providers P01 | 3min | 2 tasks | 12 files |
 | Phase 02-tier-1-2-providers P04 | 1min | 2 tasks tasks | 14 files files |
 | Phase 02-tier-1-2-providers P05 | 2min | 1 tasks | 1 files |
+| Phase 03-tier-3-9-providers P04 | 3m | 2 tasks | 20 files |
 ## Accumulated Context

@@ -0,0 +1,89 @@
---
phase: 03-tier-3-9-providers
plan: 04
subsystem: providers
tags: [providers, tier-7, code-assistants, dev-tools]
requires: [pkg/providers/schema.go, pkg/providers/registry.go]
provides:
- "10 Tier 7 code/dev tools provider definitions"
- "GitHub Copilot ghu_/gho_ token detection"
- "Sourcegraph Cody sgp_ high-confidence pattern"
- "PROV-07 requirement satisfaction"
affects: [providers/, pkg/providers/definitions/]
tech-stack:
added: []
patterns: [dual-location YAML, keyword-only detection, documented-prefix regex]
key-files:
created:
- providers/github-copilot.yaml
- providers/cursor.yaml
- providers/tabnine.yaml
- providers/codeium.yaml
- providers/sourcegraph.yaml
- providers/codewhisperer.yaml
- providers/replit-ai.yaml
- providers/codestral.yaml
- providers/watsonx.yaml
- providers/oracle-ai.yaml
- pkg/providers/definitions/github-copilot.yaml
- pkg/providers/definitions/cursor.yaml
- pkg/providers/definitions/tabnine.yaml
- pkg/providers/definitions/codeium.yaml
- pkg/providers/definitions/sourcegraph.yaml
- pkg/providers/definitions/codewhisperer.yaml
- pkg/providers/definitions/replit-ai.yaml
- pkg/providers/definitions/codestral.yaml
- pkg/providers/definitions/watsonx.yaml
- pkg/providers/definitions/oracle-ai.yaml
modified: []
decisions:
- "Keyword-only detection for undocumented key formats (Cursor, Tabnine, Codeium, CodeWhisperer, Replit AI, Oracle AI) to avoid Phase-2 false-positive regression"
- "Codestral uses low-confidence generic 32-char pattern with entropy_min=4.5 per Phase-2 lesson"
- "GitHub Copilot reuses documented ghu_/gho_ GitHub token formats, disambiguated by 'copilot' keyword"
metrics:
duration: "~3 minutes"
tasks: 2
files: 20
completed: "2026-04-05"
---
# Phase 3 Plan 4: Tier 7 Code/Dev Tools Providers Summary
Added 10 Tier 7 code assistant/dev tool provider YAMLs (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph Cody, Amazon CodeWhisperer, Replit AI, Codestral, IBM watsonx, Oracle Generative AI) dual-located under `providers/` and `pkg/providers/definitions/`.
## Tasks Completed
| # | Name | Commit |
|---|------|--------|
| 1 | GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph YAMLs | 9f10357 |
| 2 | CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI YAMLs | fbbb54b |
## Key Decisions
- **Lessons-from-Phase-2 applied:** Providers without distinctive prefixes (Cursor, Tabnine, Codeium, CodeWhisperer, Replit AI, Oracle AI) use keyword-only detection with no regex. This avoids repeating the Phase-2 false-positive regression caused by generic alphanumeric patterns.
- **Sourcegraph sgp_ pattern:** High-confidence regex for the documented `sgp_<hex>_<hex40>` format plus a medium-confidence fallback for the shorter `sgp_<hex40>` variant.
- **Codestral low-confidence regex:** The Mistral Codestral key is a 32-char alphanumeric string with no distinctive prefix. entropy_min=4.5, confidence=low, and `codestral` keyword anchoring together keep false positives in check.
- **GitHub Copilot reuses `ghu_`/`gho_` GitHub tokens:** These are real GitHub OAuth/user tokens. The `copilot` keyword distinguishes them from generic GitHub tokens.
- **watsonx verification endpoint:** POST to `https://iam.cloud.ibm.com/identity/token` (IBM IAM token exchange). The watsonx inference endpoint itself requires a `project_id` plus a bearer token rather than a plain API key.
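The Sourcegraph decision uses two confidence tiers. A minimal Go sketch of how they could be matched follows. It assumes lowercase hex and does not constrain the length of the first hex segment, because the summary specifies neither. `Finding` and `scanSourcegraph` are hypothetical names for illustration, not the project's `pkg/providers` API. The actual patterns are defined in the provider YAML files.

```go
package main

import "regexp"

// Finding pairs a matched token with the confidence of the pattern that hit.
type Finding struct {
	Token      string
	Confidence string
}

var (
	// High confidence: sgp_<hex>_<hex40>. Lowercase hex and an unbounded
	// first segment are assumptions made for this sketch.
	sgpHigh = regexp.MustCompile(`\bsgp_[0-9a-f]+_[0-9a-f]{40}\b`)
	// Medium confidence: the shorter sgp_<hex40> variant.
	sgpMedium = regexp.MustCompile(`\bsgp_[0-9a-f]{40}\b`)
)

// scanSourcegraph reports every sgp_ token in text with its confidence tier.
// The trailing \b keeps the medium pattern from matching the front of a
// high-confidence token, because '_' counts as a word character in RE2.
func scanSourcegraph(text string) []Finding {
	var out []Finding
	for _, m := range sgpHigh.FindAllString(text, -1) {
		out = append(out, Finding{Token: m, Confidence: "high"})
	}
	for _, m := range sgpMedium.FindAllString(text, -1) {
		out = append(out, Finding{Token: m, Confidence: "medium"})
	}
	return out
}
```

Because the tiers cannot overlap, each token is reported once, tagged with the stronger pattern when both shapes are possible.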
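The Codestral decision layers three filters on a prefix-less 32-character pattern. The Go sketch below shows one way they could compose. A candidate is kept only when the line mentions `codestral` and the candidate's Shannon entropy, in bits per character, reaches `entropy_min`. Checking the keyword on the same line is an assumption, since the engine's actual keyword-proximity rule is not described here. `codestralCandidates` is a hypothetical name.

```go
package main

import (
	"math"
	"regexp"
	"strings"
)

// candidate32 matches a standalone run of exactly 32 ASCII letters or digits.
var candidate32 = regexp.MustCompile(`\b[A-Za-z0-9]{32}\b`)

// shannonEntropy returns the Shannon entropy of s in bits per character,
// using the character frequencies observed in s itself (0 for "").
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// codestralCandidates keeps 32-character candidates only when the line
// mentions "codestral" (case-insensitive) and the candidate's entropy is at
// least entropyMin. Anything returned is still low confidence.
func codestralCandidates(line string, entropyMin float64) []string {
	if !strings.Contains(strings.ToLower(line), "codestral") {
		return nil
	}
	var out []string
	for _, m := range candidate32.FindAllString(line, -1) {
		if shannonEntropy(m) >= entropyMin {
			out = append(out, m)
		}
	}
	return out
}
```

The ceiling for a 32-character string is log2(32) = 5 bits, reached only when all 32 characters are distinct. A threshold of 4.5 is therefore demanding and favors near-random strings.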
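The Copilot decision attributes a standard GitHub token to Copilot only when the `copilot` keyword is present. Here is a minimal Go sketch of that disambiguation. It assumes a 36-character alphanumeric body after the `ghu_`/`gho_` prefix and a case-insensitive keyword check over the same text. `copilotTokens` and `mentionsAny` are hypothetical helpers.

```go
package main

import (
	"regexp"
	"strings"
)

// ghTokenRe matches GitHub user-to-server (ghu_) and OAuth (gho_) tokens,
// assuming a 36-character alphanumeric body after the 4-character prefix.
var ghTokenRe = regexp.MustCompile(`\bgh[uo]_[A-Za-z0-9]{36}\b`)

// mentionsAny reports whether text contains any keyword, ignoring case.
func mentionsAny(text string, keywords ...string) bool {
	lower := strings.ToLower(text)
	for _, k := range keywords {
		if strings.Contains(lower, strings.ToLower(k)) {
			return true
		}
	}
	return false
}

// copilotTokens attributes ghu_/gho_ tokens to GitHub Copilot only when the
// same text mentions "copilot". Otherwise it returns nil, leaving the tokens
// to a generic GitHub detector.
func copilotTokens(text string) []string {
	if !mentionsAny(text, "copilot") {
		return nil
	}
	return ghTokenRe.FindAllString(text, -1)
}
```

With this design the regex stays identical to a generic GitHub rule, and the keyword alone decides provider attribution. Other GitHub prefixes such as `ghp_` are never claimed for Copilot.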
## Verification Results
- `go test ./pkg/providers/... -count=1` — PASS
- `go test ./pkg/engine/... -count=1` — PASS
- `grep -l 'tier: 7' providers/*.yaml | wc -l` — 10
- All 10 dual-location pairs `diff`-identical
## Deviations from Plan
None — plan executed exactly as written.
## Requirements Satisfied
- PROV-07: 10 Tier 7 Code/Dev Tools providers
## Self-Check: PASSED
- All 20 YAML files exist at expected paths
- Commits 9f10357 and fbbb54b present in git log
- Tier 7 count = 10
- Provider and engine test suites green