From 0789b662c3fa1db399cf1e0d79d4cb1755ee170c Mon Sep 17 00:00:00 2001
From: salvacybersec
Date: Sun, 5 Apr 2026 14:42:43 +0300
Subject: [PATCH] docs(03-04): complete Tier 7 code/dev tools providers plan

---
 .planning/REQUIREMENTS.md                          |  2 +-
 .planning/ROADMAP.md                               |  4 +-
 .planning/STATE.md                                 | 13 +--
 .../03-tier-3-9-providers/03-04-SUMMARY.md         | 89 +++++++++++++++++++
 4 files changed, 99 insertions(+), 9 deletions(-)
 create mode 100644 .planning/phases/03-tier-3-9-providers/03-04-SUMMARY.md

diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index ca934d0..33e6c82 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -25,7 +25,7 @@ Requirements for initial release. Each maps to roadmap phases.
 - [ ] **PROV-04**: 16 Tier 4 Chinese/Regional provider definitions (DeepSeek, Baichuan, Zhipu, Moonshot, Yi, Qwen, Baidu, ByteDance, SenseTime, iFlytek, MiniMax, Stepfun, 360 AI, Kuaishou, Tencent, SiliconFlow)
 - [ ] **PROV-05**: 11 Tier 5 Infrastructure/Gateway provider definitions (Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, Kong, BricksAI, Aether, Not Diamond)
 - [ ] **PROV-06**: 15 Tier 6 Emerging/Niche provider definitions (Reka, Aleph Alpha, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon, Lamini)
-- [ ] **PROV-07**: 10 Tier 7 Code/Dev Tools provider definitions (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
+- [x] **PROV-07**: 10 Tier 7 Code/Dev Tools provider definitions (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
 - [ ] **PROV-08**: 10 Tier 8 Self-Hosted provider definitions (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan AI)
 - [ ] **PROV-09**: 8 Tier 9 Enterprise provider definitions (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
 - [x] **PROV-10**: Provider YAML schema includes format_version and last_verified date for pattern health tracking
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 7398f0b..d38aa64 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -84,8 +84,8 @@ Plans:
 Plans:
 - [ ] 03-01-PLAN.md — Tier 4 Chinese/regional providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax, Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360 AI, Kuaishou)
 - [ ] 03-02-PLAN.md — Tier 3 Specialized (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney)
-- [ ] 03-03-PLAN.md — Tier 5 Infrastructure/Gateway (OpenRouter, LiteLLM, Cloudflare AI, Vercel AI, Portkey, Helicone, Martian, Kong, BricksAI, Aether, Not Diamond)
-- [ ] 03-04-PLAN.md — Tier 7 Code/Dev Tools (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
+- [x] 03-03-PLAN.md — Tier 5 Infrastructure/Gateway (OpenRouter, LiteLLM, Cloudflare AI, Vercel AI, Portkey, Helicone, Martian, Kong, BricksAI, Aether, Not Diamond)
+- [x] 03-04-PLAN.md — Tier 7 Code/Dev Tools (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
 - [ ] 03-05-PLAN.md — Tier 8 Self-Hosted runtimes (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan)
 - [ ] 03-06-PLAN.md — Tier 9 Enterprise (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
 - [ ] 03-07-PLAN.md — Tier 6 Emerging/Niche (Reka, Aleph Alpha, Lamini, Writer, Jasper, Typeface, Comet, W&B, LangSmith, Pinecone, Weaviate, Qdrant, Chroma, Milvus, Neon)
diff --git a/.planning/STATE.md b/.planning/STATE.md
index dec7aa8..282dd83 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -4,13 +4,13 @@ milestone: v1.0
 milestone_name: milestone
 status: executing
 stopped_at: Completed 02-tier-1-2-providers 02-05-PLAN.md
-last_updated: "2026-04-05T11:23:32.224Z"
+last_updated: "2026-04-05T11:42:39.202Z"
 last_activity: 2026-04-05
 progress:
   total_phases: 18
   completed_phases: 2
-  total_plans: 10
-  completed_plans: 10
+  total_plans: 18
+  completed_plans: 12
   percent: 20
 ---
 
@@ -21,12 +21,12 @@ progress:
 See: .planning/PROJECT.md (updated 2026-04-04)
 
 **Core value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
-**Current focus:** Phase 02 — tier-1-2-providers
+**Current focus:** Phase 03 — tier-3-9-providers
 
 ## Current Position
 
-Phase: 3
-Plan: Not started
+Phase: 03 (tier-3-9-providers) — EXECUTING
+Plan: 2 of 8
 Status: Ready to execute
 Last activity: 2026-04-05
 
@@ -60,6 +60,7 @@ Progress: [██░░░░░░░░] 20%
 | Phase 02-tier-1-2-providers P01 | 3min | 2 tasks | 12 files |
 | Phase 02-tier-1-2-providers P04 | 1min | 2 tasks tasks | 14 files files |
 | Phase 02-tier-1-2-providers P05 | 2min | 1 tasks | 1 files |
+| Phase 03-tier-3-9-providers P04 | 3m | 2 tasks | 20 files |
 
 ## Accumulated Context
 
diff --git a/.planning/phases/03-tier-3-9-providers/03-04-SUMMARY.md b/.planning/phases/03-tier-3-9-providers/03-04-SUMMARY.md
new file mode 100644
index 0000000..8574ef5
--- /dev/null
+++ b/.planning/phases/03-tier-3-9-providers/03-04-SUMMARY.md
@@ -0,0 +1,89 @@
+---
+phase: 03-tier-3-9-providers
+plan: 04
+subsystem: providers
+tags: [providers, tier-7, code-assistants, dev-tools]
+requires: [pkg/providers/schema.go, pkg/providers/registry.go]
+provides:
+  - "10 Tier 7 code/dev tools provider definitions"
+  - "GitHub Copilot ghu_/gho_ token detection"
+  - "Sourcegraph Cody sgp_ high-confidence pattern"
+  - "PROV-07 requirement satisfaction"
+affects: [providers/, pkg/providers/definitions/]
+tech-stack:
+  added: []
+  patterns: [dual-location YAML, keyword-only detection, documented-prefix regex]
+key-files:
+  created:
+    - providers/github-copilot.yaml
+    - providers/cursor.yaml
+    - providers/tabnine.yaml
+    - providers/codeium.yaml
+    - providers/sourcegraph.yaml
+    - providers/codewhisperer.yaml
+    - providers/replit-ai.yaml
+    - providers/codestral.yaml
+    - providers/watsonx.yaml
+    - providers/oracle-ai.yaml
+    - pkg/providers/definitions/github-copilot.yaml
+    - pkg/providers/definitions/cursor.yaml
+    - pkg/providers/definitions/tabnine.yaml
+    - pkg/providers/definitions/codeium.yaml
+    - pkg/providers/definitions/sourcegraph.yaml
+    - pkg/providers/definitions/codewhisperer.yaml
+    - pkg/providers/definitions/replit-ai.yaml
+    - pkg/providers/definitions/codestral.yaml
+    - pkg/providers/definitions/watsonx.yaml
+    - pkg/providers/definitions/oracle-ai.yaml
+  modified: []
+decisions:
+  - "Keyword-only detection for undocumented key formats (Cursor, Tabnine, Codeium, CodeWhisperer, Replit AI, Oracle AI) to avoid Phase-2 false-positive regression"
+  - "Codestral uses low-confidence generic 32-char pattern with entropy_min=4.5 per Phase-2 lesson"
+  - "GitHub Copilot reuses documented ghu_/gho_ GitHub token formats, disambiguated by 'copilot' keyword"
+metrics:
+  duration: "~3 minutes"
+  tasks: 2
+  files: 20
+  completed: "2026-04-05"
+---
+
+# Phase 3 Plan 4: Tier 7 Code/Dev Tools Providers Summary
+
+Added 10 Tier 7 code assistant/dev tool provider YAMLs (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph Cody, Amazon CodeWhisperer, Replit AI, Codestral, IBM watsonx, Oracle Generative AI) dual-located under `providers/` and `pkg/providers/definitions/`.
+
+## Tasks Completed
+
+| # | Name | Commit |
+|---|------|--------|
+| 1 | GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph YAMLs | 9f10357 |
+| 2 | CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI YAMLs | fbbb54b |
+
+## Key Decisions
+
+- **Lessons-from-Phase-2 applied:** Providers without distinctive prefixes (Cursor, Tabnine, Codeium, CodeWhisperer, Replit AI, Oracle AI) use keyword-only detection — no regex. This avoids Phase-2 false-positive regression on generic alphanumeric patterns.
+- **Sourcegraph sgp_ pattern:** High-confidence regex for the documented `sgp__` format plus a medium-confidence fallback for the shorter `sgp_` variant.
+- **Codestral low-confidence regex:** The Mistral Codestral key is a 32-char alphanumeric without a distinctive prefix; entropy_min=4.5 + confidence=low + `codestral` keyword anchoring contain false positives.
+- **GitHub Copilot reuses `ghu_`/`gho_` GitHub tokens:** These are real GitHub OAuth/user tokens; the `copilot` keyword disambiguates from generic GitHub tokens.
+- **watsonx verification endpoint:** POST to `https://iam.cloud.ibm.com/identity/token` (IBM IAM exchange) — the actual watsonx inference endpoint needs project_id + bearer, not a plain API key.
+
+## Verification Results
+
+- `go test ./pkg/providers/... -count=1` — PASS
+- `go test ./pkg/engine/... -count=1` — PASS
+- `grep -l 'tier: 7' providers/*.yaml | wc -l` — 10
+- All 10 dual-location pairs `diff`-identical
+
+## Deviations from Plan
+
+None — plan executed exactly as written.
+
+## Requirements Satisfied
+
+- PROV-07: 10 Tier 7 Code/Dev Tools providers
+
+## Self-Check: PASSED
+
+- All 20 YAML files exist at expected paths
+- Commits 9f10357 and fbbb54b present in git log
+- Tier 7 count = 10
+- Provider and engine test suites green