From a8c0a6db62ca5076d3a7da81e628eebde7288364 Mon Sep 17 00:00:00 2001
From: salvacybersec
Date: Sun, 5 Apr 2026 14:12:49 +0300
Subject: [PATCH] docs(02-02): complete tier 1 medium/low-confidence providers plan

---
 .planning/REQUIREMENTS.md                  |  4 +-
 .planning/ROADMAP.md                       |  8 +-
 .planning/STATE.md                         | 28 +++---
 .../02-tier-1-2-providers/02-02-SUMMARY.md | 97 +++++++++++++++++++
 4 files changed, 118 insertions(+), 19 deletions(-)
 create mode 100644 .planning/phases/02-tier-1-2-providers/02-02-SUMMARY.md

diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index 11bce04..ca934d0 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -19,8 +19,8 @@ Requirements for initial release. Each maps to roadmap phases.
 
 ### Providers
 
-- [ ] **PROV-01**: 12 Tier 1 Frontier provider YAML definitions (OpenAI, Anthropic, Google AI, Vertex, AWS Bedrock, Azure OpenAI, Meta AI, xAI, Cohere, Mistral, Inflection, AI21)
-- [ ] **PROV-02**: 14 Tier 2 Inference Platform provider definitions (Together, Fireworks, Groq, Replicate, Anyscale, DeepInfra, Lepton, Modal, Baseten, Cerebrium, NovitaAI, Sambanova, OctoAI, Friendli)
+- [x] **PROV-01**: 12 Tier 1 Frontier provider YAML definitions (OpenAI, Anthropic, Google AI, Vertex, AWS Bedrock, Azure OpenAI, Meta AI, xAI, Cohere, Mistral, Inflection, AI21)
+- [x] **PROV-02**: 14 Tier 2 Inference Platform provider definitions (Together, Fireworks, Groq, Replicate, Anyscale, DeepInfra, Lepton, Modal, Baseten, Cerebrium, NovitaAI, Sambanova, OctoAI, Friendli)
 - [ ] **PROV-03**: 12 Tier 3 Specialized provider definitions (Perplexity, You.com, Voyage, Jina, Unstructured, AssemblyAI, Deepgram, ElevenLabs, Stability, Runway, Midjourney, HuggingFace)
 - [ ] **PROV-04**: 16 Tier 4 Chinese/Regional provider definitions (DeepSeek, Baichuan, Zhipu, Moonshot, Yi, Qwen, Baidu, ByteDance, SenseTime, iFlytek, MiniMax, Stepfun, 360 AI, Kuaishou, Tencent, SiliconFlow)
 - [ ] **PROV-05**: 11 Tier 5 Infrastructure/Gateway provider definitions 
(Cloudflare AI, Vercel AI, LiteLLM, Portkey, Helicone, OpenRouter, Martian, Kong, BricksAI, Aether, Not Diamond)
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 1bd20d9..b937f7c 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -64,10 +64,10 @@ Plans:
 **Plans**: 5 plans
 
 Plans:
-- [ ] 02-01-PLAN.md — Tier 1 high-confidence prefixed providers (OpenAI upgrade, Anthropic upgrade, Google AI, Vertex AI, AWS Bedrock, xAI)
-- [ ] 02-02-PLAN.md — Tier 1 keyword-anchored providers (Azure OpenAI, Meta AI, Cohere, Mistral, Inflection, AI21)
-- [ ] 02-03-PLAN.md — Tier 2 inference platforms batch 1 (Groq, Replicate, Anyscale, Together, Fireworks, Baseten, DeepInfra)
-- [ ] 02-04-PLAN.md — Tier 2 inference platforms batch 2 (Lepton, Modal, Cerebrium, Novita, SambaNova, OctoAI, Friendli)
+- [x] 02-01-PLAN.md — Tier 1 high-confidence prefixed providers (OpenAI upgrade, Anthropic upgrade, Google AI, Vertex AI, AWS Bedrock, xAI)
+- [x] 02-02-PLAN.md — Tier 1 keyword-anchored providers (Azure OpenAI, Meta AI, Cohere, Mistral, Inflection, AI21)
+- [x] 02-03-PLAN.md — Tier 2 inference platforms batch 1 (Groq, Replicate, Anyscale, Together, Fireworks, Baseten, DeepInfra)
+- [x] 02-04-PLAN.md — Tier 2 inference platforms batch 2 (Lepton, Modal, Cerebrium, Novita, SambaNova, OctoAI, Friendli)
 - [ ] 02-05-PLAN.md — Registry guardrail test: assert 12 Tier 1 + 14 Tier 2 + regex compilation
 
 ### Phase 3: Tier 3-9 Providers
diff --git a/.planning/STATE.md b/.planning/STATE.md
index a89ddc5..fa72c59 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -2,15 +2,15 @@
 gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
-status: planning
-stopped_at: Completed 01-foundation 01-05-PLAN.md
-last_updated: "2026-04-05T09:32:56.054Z"
-last_activity: 2026-04-05
+status: executing
+stopped_at: Completed 02-tier-1-2-providers 02-03-PLAN.md
+last_updated: "2026-04-05T11:12:42.470Z"
+last_activity: 2026-04-05 -- Phase 02 execution started
 progress:
   total_phases: 18
   completed_phases: 1
-  total_plans: 5
-  completed_plans: 5
+  total_plans: 10
+  completed_plans: 9
   percent: 20
 ---
 
@@ -21,14 +21,14 @@ progress:
 See: .planning/PROJECT.md (updated 2026-04-04)
 
 **Core value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
-**Current focus:** Phase 1 — Foundation
+**Current focus:** Phase 02 — tier-1-2-providers
 
 ## Current Position
 
-Phase: 2 of 18 (tier 1 2 providers)
-Plan: Not started
-Status: Ready to plan
-Last activity: 2026-04-05
+Phase: 02 (tier-1-2-providers) — EXECUTING
+Plan: 1 of 5
+Status: Executing Phase 02
+Last activity: 2026-04-05 -- Phase 02 execution started
 
 Progress: [██░░░░░░░░] 20%
 
@@ -55,6 +55,8 @@ Progress: [██░░░░░░░░] 20%
 | Phase 01-foundation P02 | 9 | 2 tasks | 11 files |
 | Phase 01-foundation P04 | 5min | 2 tasks | 12 files |
 | Phase 01-foundation P05 | 4min | 2 tasks | 8 files |
+| Phase 02-tier-1-2-providers P02 | 1m | 2 tasks | 12 files |
+| Phase 02-tier-1-2-providers P03 | 3min | 2 tasks | 14 files |
 
 ## Accumulated Context
 
@@ -86,6 +88,6 @@ None yet.
 
 ## Session Continuity
 
-Last session: 2026-04-05T09:28:33.649Z
-Stopped at: Completed 01-foundation 01-05-PLAN.md
+Last session: 2026-04-05T11:12:42.467Z
+Stopped at: Completed 02-tier-1-2-providers 02-03-PLAN.md
 Resume file: None
diff --git a/.planning/phases/02-tier-1-2-providers/02-02-SUMMARY.md b/.planning/phases/02-tier-1-2-providers/02-02-SUMMARY.md
new file mode 100644
index 0000000..2b42178
--- /dev/null
+++ b/.planning/phases/02-tier-1-2-providers/02-02-SUMMARY.md
@@ -0,0 +1,97 @@
+---
+phase: 02-tier-1-2-providers
+plan: 02
+subsystem: providers
+tags: [providers, yaml, tier-1, keyword-anchoring]
+requires: [pkg/providers/schema.go, pkg/providers/loader.go, pkg/providers/registry.go]
+provides:
+  - Azure OpenAI 32-hex pattern provider definition
+  - Meta AI (Llama API) LLM|-prefixed token provider definition
+  - Cohere 40-char token provider definition
+  - Mistral AI 32-char keyword-anchored provider definition
+  - Inflection AI (Pi) opaque-token provider definition
+  - AI21 Labs 32+-char keyword-anchored provider definition
+affects: [providers/, pkg/providers/definitions/]
+tech_stack:
+  added: []
+  patterns: [yaml-provider-definitions, dual-location-embed, keyword-ac-prefilter]
+key_files:
+  created:
+    - providers/azure-openai.yaml
+    - providers/meta-ai.yaml
+    - providers/cohere.yaml
+    - providers/mistral.yaml
+    - providers/inflection.yaml
+    - providers/ai21.yaml
+    - pkg/providers/definitions/azure-openai.yaml
+    - pkg/providers/definitions/meta-ai.yaml
+    - pkg/providers/definitions/cohere.yaml
+    - pkg/providers/definitions/mistral.yaml
+    - pkg/providers/definitions/inflection.yaml
+    - pkg/providers/definitions/ai21.yaml
+  modified: []
+decisions:
+  - Low-confidence regex (generic character classes) compensated by strong keyword lists for Aho-Corasick pre-filtering
+  - Azure OpenAI and Inflection use empty verify blocks (no stable public endpoint / requires deployment URL)
+  - Meta AI pattern anchors on LLM| literal prefix to reduce false positives
+metrics:
+  duration: 1m
+  tasks: 2
+  files: 12
+  completed: "2026-04-05T11:11:55Z"
+requirements: [PROV-01]
+---
+
+# Phase 2 Plan 2: Tier 1 Medium/Low-Confidence Providers Summary
+
+Added 6 Tier 1 LLM provider YAML definitions (Azure OpenAI, Meta AI, Cohere, Mistral, Inflection, AI21) with strong keyword lists anchoring generic opaque-token regex patterns, completing Tier 1 at 12/12.
+
+## What Was Built
+
+Created 12 YAML files (6 providers x dual location) covering the remaining Tier 1 providers that lack distinctive high-confidence prefix patterns. These providers rely on Aho-Corasick keyword pre-filtering (azure, cohere, mistral, jamba, PI_API_KEY, etc.) to reduce false positives since their key formats are opaque hex/alphanumeric tokens.
+
+- **Azure OpenAI**: 32-hex pattern, no verify endpoint (requires per-deployment URL)
+- **Meta AI**: `LLM|` literal prefix anchor, api.llama.com verify endpoint
+- **Cohere**: 40-char alphanumeric, api.cohere.ai verify endpoint
+- **Mistral AI**: 32-char alphanumeric, api.mistral.ai verify endpoint
+- **Inflection AI**: 40+-char token, no public verify endpoint
+- **AI21 Labs**: 32+-char alphanumeric, api.ai21.com/studio verify endpoint
+
+All providers dual-located for Go embed.FS (pkg/providers/definitions/) and user visibility (providers/). Registry loader test passes with all 9 providers (3 existing + 6 new).
+
+## Tasks Completed
+
+| Task | Name                                    | Commit  | Files |
+|------|-----------------------------------------|---------|-------|
+| 1    | Azure OpenAI, Meta AI, Cohere YAMLs     | bca8422 | 6     |
+| 2    | Mistral, Inflection, AI21 YAMLs         | adad602 | 6     |
+
+## Verification
+
+- `go test ./pkg/providers/... -count=1` -- PASS (0.037s)
+- `diff` on all 6 provider pairs -- no drift between providers/ and pkg/providers/definitions/
+- `grep -l 'tier: 1' providers/*.yaml | wc -l` -- 12 (matches Tier 1 complete criterion)
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Decisions Made
+
+- **Empty verify blocks for Azure OpenAI and Inflection**: Azure OpenAI requires per-deployment URL (`https://{resource}.openai.azure.com/...`) which cannot be hardcoded; Inflection lacks a stable public verify endpoint. Both use empty `url: ""` and empty status arrays to signal "verification unsupported" for Phase 5.
+- **Low confidence on all 6**: Generic character-class regex patterns would cause high false-positive rates without keyword anchoring, so confidence is marked `low` and the AC automaton carries the detection weight.
+
+## Key Links
+
+- Provider `keywords[]` feeds `NewRegistry()` which builds the Aho-Corasick DFA across all providers
+- Aho-Corasick pre-filter gates per-pattern regex evaluation (established in Phase 01)
+
+## Self-Check: PASSED
+
+All 12 files verified present:
+- providers/{azure-openai,meta-ai,cohere,mistral,inflection,ai21}.yaml -- FOUND
+- pkg/providers/definitions/{azure-openai,meta-ai,cohere,mistral,inflection,ai21}.yaml -- FOUND
+
+Commits verified:
+- bca8422 -- FOUND
+- adad602 -- FOUND