merge: phase 14-03 frontend leaks

2026-04-06 13:21:39 +03:00
parent aeebf37174 95ee768266
commit 095b90ec07
38 changed files with 2644 additions and 29 deletions
--- a/RESEARCH_REPORT.md
+++ b/RESEARCH_REPORT.md
@@ -0,0 +1,548 @@
+# API Key Scanner Market Research Report
+**Date: April 4, 2026**
+
+---
+
+## Table of Contents
+1. [Existing Open-Source API Key Scanners](#1-existing-open-source-api-key-scanners)
+2. [LLM-Specific API Key Tools](#2-llm-specific-api-key-tools)
+3. [Top LLM API Providers (100+)](#3-top-llm-api-providers)
+4. [API Key Patterns by Provider](#4-api-key-patterns-by-provider)
+5. [Key Validation Approaches](#5-key-validation-approaches)
+6. [Market Gaps & Opportunities](#6-market-gaps--opportunities)
+
+---
+
+## 1. Existing Open-Source API Key Scanners
+
+### 1.1 TruffleHog
+- **GitHub:** https://github.com/trufflesecurity/trufflehog
+- **Stars:** ~25,500
+- **Language:** Go
+- **Detectors:** 800+ secret types
+- **Approach:** Detector-based (each detector is a small Go program for a specific credential type)
+- **Detection methods:**
+  - Pattern matching via dedicated detectors
+  - Active verification against live APIs
+  - Permission/scope analysis (~20 credential types)
+- **AI/LLM detectors confirmed:** OpenAI, OpenAI Admin Key, Anthropic
+- **Scanning sources:** Git repos, GitHub orgs, S3 buckets, GCS, Docker images, Jenkins, Elasticsearch, Postman, Slack, local filesystems
+- **Key differentiator:** Verification — not just "this looks like a key" but "this is an active key with these permissions"
+- **Limitations:**
+  - Heavy/slow compared to regex-only scanners
+  - Not all 800+ detectors have verification
+  - LLM provider coverage still incomplete (no confirmed Cohere, Mistral, Groq detectors)
+
+### 1.2 Gitleaks
+- **GitHub:** https://github.com/gitleaks/gitleaks
+- **Stars:** ~25,800
+- **Language:** Go
+- **Rules:** 150+ regex patterns in `gitleaks.toml`
+- **Approach:** Regex pattern matching with optional entropy checks
+- **Detection methods:**
+  - Regex patterns defined in TOML config
+  - Keyword matching
+  - Entropy thresholds
+  - Allowlists for false positive reduction
+- **AI/LLM rules confirmed:**
+  - `anthropic-admin-api-key`: `sk-ant-admin01-[a-zA-Z0-9_\-]{93}AA`
+  - `anthropic-api-key`: `sk-ant-api03-[a-zA-Z0-9_\-]{93}AA`
+  - `openai-api-key`: Updated to include `sk-proj-` and `sk-svcacct-` formats
+  - `cohere-api-token`: Keyword-based detection
+  - `huggingface-access-token`: `hf_[a-z]{34}` (matched case-insensitively)
+  - `huggingface-organization-api-token`: `api_org_[a-z]{34}` (matched case-insensitively)
+- **Key differentiator:** Fast, simple, excellent as pre-commit hook
+- **Limitations:**
+  - No active verification of detected keys
+  - Regex-only means higher false positive rate for generic patterns
+  - Limited LLM provider coverage beyond the six rules (four providers) above
+- **Note:** Gitleaks creator launched "Betterleaks" in 2026 as a successor built for the agentic era
+
+### 1.3 detect-secrets (Yelp)
+- **GitHub:** https://github.com/Yelp/detect-secrets
+- **Stars:** ~4,300
+- **Language:** Python
+- **Plugins:** 27 built-in detectors
+- **Approach:** Baseline methodology — tracks known secrets and flags new ones
+- **Detection methods:**
+  - Regex-based plugins (structured secrets)
+  - High entropy string detection (Base64, Hex)
+  - Keyword detection (variable name matching)
+  - Optional ML-based gibberish detector (v1.1+)
+- **AI/LLM plugins confirmed:**
+  - `OpenAIDetector` plugin exists
+  - No dedicated Anthropic, Cohere, Mistral, or Groq plugins
+- **Key differentiator:** Baseline approach — only flags NEW secrets, not historical ones; enterprise-friendly
+- **Limitations:**
+  - Minimal LLM provider coverage
+  - Only limited active verification (a few plugins can check a candidate against the provider)
+  - Fewer patterns than TruffleHog or Gitleaks
+  - Python-only (slower than Go/Rust alternatives)
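
The baseline methodology described above can be sketched in a few lines. This is a minimal illustration, not detect-secrets' actual implementation or baseline file format: `fingerprint`, `build_baseline`, and `new_findings` are hypothetical names, and the sample keys are synthetic placeholders.

```python
import hashlib


def fingerprint(secret: str) -> str:
    """Hash a secret so the baseline never stores the raw value."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def build_baseline(findings):
    """Record every (filename, secret) pair seen in an initial scan."""
    return {(filename, fingerprint(secret)) for filename, secret in findings}


def new_findings(baseline, findings):
    """Return only the findings that are absent from the baseline, in scan order."""
    return [
        (filename, secret)
        for filename, secret in findings
        if (filename, fingerprint(secret)) not in baseline
    ]


# Synthetic example: one historical secret, then a later scan that adds one more.
initial_scan = [("config.py", "sk-" + "a" * 48)]
baseline = build_baseline(initial_scan)
later_scan = initial_scan + [("app.py", "hf_" + "b" * 34)]
fresh = new_findings(baseline, later_scan)
print(fresh)
```

Only the `app.py` finding is reported on the later scan; the historical secret stays recorded in the baseline and is not flagged again.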
+
+### 1.4 Nosey Parker (Praetorian)
+- **GitHub:** https://github.com/praetorian-inc/noseyparker
+- **Stars:** ~2,300
+- **Language:** Rust
+- **Rules:** 188 high-precision regex rules
+- **Approach:** Hybrid regex + ML denoising
+- **Detection methods:**
+  - 188 tested regex rules tuned for low false positives
+  - ML model for false positive reduction (10-1000x improvement)
+  - Deduplication/grouping of findings
+- **Performance:** GB/s scanning speeds, tested on 20TB+ datasets
+- **Key differentiator:** ML-enhanced denoising, extreme performance
+- **Status:** RETIRED — replaced by Titus (https://github.com/praetorian-inc/titus)
+- **Limitations:**
+  - No specific LLM provider rules documented
+  - No active verification
+  - Project discontinued
+
+### 1.5 GitGuardian
+- **Website:** https://www.gitguardian.com
+- **Type:** Commercial + free tier for public repos
+- **Detectors:** 450+ secret types
+- **Approach:** Regex + AI-powered false positive reduction
+- **Detection methods:**
+  - Specific prefix-based detectors
+  - Fine-tuned code-LLM for false positive filtering
+  - Validity checking for supported detectors
+- **AI/LLM coverage:**
+  - Groq API Key (prefixed, with validity check)
+  - OpenAI, Anthropic, HuggingFace (confirmed)
+  - AI-related leaked secrets up 81% YoY in 2025
+  - 1,275,105 leaked AI service secrets detected in 2025
+- **Key differentiator:** AI-powered false positive reduction, massive scale (scans all public GitHub)
+- **Limitations:**
+  - Commercial/proprietary for private repos
+  - Regex patterns not publicly disclosed
+
+### 1.6 GitHub Secret Scanning (Native)
+- **Type:** Built into GitHub
+- **Approach:** Provider-partnered pattern matching + Copilot AI
+- **AI/LLM patterns supported (with push protection and validity status):**
+
+| Provider | Pattern | Push Protection | Validity Check |
+|----------|---------|:-:|:-:|
+| Anthropic | `anthropic_admin_api_key` | Yes | Yes |
+| Anthropic | `anthropic_api_key` | Yes | Yes |
+| Anthropic | `anthropic_session_id` | Yes | No |
+| Cohere | `cohere_api_key` | Yes | No |
+| DeepSeek | `deepseek_api_key` | No | Yes |
+| Google | `google_gemini_api_key` | No | No |
+| Groq | `groq_api_key` | Yes | Yes |
+| Hugging Face | `hf_org_api_key` | Yes | No |
+| Hugging Face | `hf_user_access_token` | Yes | Yes |
+| Mistral AI | `mistral_ai_api_key` | No | No |
+| OpenAI | `openai_api_key` | Yes | Yes |
+| Replicate | `replicate_api_token` | Yes | Yes |
+| xAI | `xai_api_key` | Yes | Yes |
+| Azure | `azure_openai_key` | Yes | No |
+
+- **Recent developments (March 2026):**
+  - Added 37 new secret detectors including Langchain
+  - Extended scanning to AI coding agents via MCP
+  - Copilot uses GPT-3.5-Turbo + GPT-4 for unstructured secret detection (94% FP reduction)
+  - Base64-encoded secret detection with push protection
+
+### 1.7 Other Notable Tools
+
+| Tool | Stars | Language | Patterns | Key Feature |
+|------|-------|----------|----------|-------------|
+| **KeyHacks** (streaak) | 6,100 | Markdown/Shell | 100+ services | Validation curl commands for bug bounty |
+| **keyhacks.sh** (gwen001) | ~500 | Bash | 50+ | Automated version of KeyHacks |
+| **Secrets Patterns DB** (mazen160) | 1,400 | YAML/Regex | 1,600+ | Largest open-source regex DB, exports to TruffleHog/Gitleaks format |
+| **secret-regex-list** (h33tlit) | ~1,000 | Regex | 100+ | Regex patterns for scraping secrets |
+| **regextokens** (odomojuli) | ~300 | Regex | 50+ | OAuth/API token regex patterns |
+| **Betterleaks** | New (2026) | Go | — | Gitleaks successor for agentic era |
+
+---
+
+## 2. LLM-Specific API Key Tools
+
+### 2.1 Dedicated LLM Key Validators
+
+| Tool | URL | Providers | Approach |
+|------|-----|-----------|----------|
+| **TestMyAPIKey.com** | testmyapikey.com | OpenAI, Anthropic Claude, + 13 others | Client-side regex + live API validation |
+| **SecurityWall Checker** | securitywall.co/tools/api-key-checker | 455+ patterns, 350+ services (incl. OpenAI, Anthropic) | Client-side regex, generates curl commands |
+| **VibeFactory Scanner** | vibefactory.ai/api-key-security-scanner | 150+ types (incl. OpenAI) | Scans deployed websites for exposed keys |
+| **KeyLeak Detector** | github.com/Amal-David/keyleak-detector | Multiple | Headless browser + network interception |
+| **OpenAI Key Tester** | trevorfox.com/api-key-tester/openai | OpenAI, Anthropic | Direct API validation |
+| **Chatbot API Tester** | apikeytester.netlify.app | OpenAI, DeepSeek, OpenRouter | Endpoint validation |
+| **SecurityToolkits** | securitytoolkits.com/tools/apikey-validator | Multiple | API key/token checker |
+
+### 2.2 LLM Gateways with Key Validation
+
+These tools validate keys as part of their proxy/gateway functionality:
+
+| Tool | Stars | Providers | Validation Approach |
+|------|-------|-----------|---------------------|
+| **LiteLLM** | ~18k | 107 providers | AuthenticationError mapping from all providers |
+| **OpenRouter** | — | 60+ providers, 500+ models | Unified API key, provider-level validation |
+| **Portkey AI** | ~5k | 30+ providers | AI gateway with key validation |
+| **LLM-API-Key-Proxy** | ~200 | OpenAI, Anthropic compatible | Self-hosted proxy with key validation |
+
+### 2.3 Key Gap: No Comprehensive LLM-Focused Scanner
+
+**Critical finding:** There is NO dedicated open-source tool that:
+1. Detects API keys from all major LLM providers (50+)
+2. Validates them against live APIs
+3. Reports provider, model access, rate limits, and spend
+4. Covers both legacy and new key formats
+
+The closest tools are:
+- TruffleHog (broadest verification, but only ~3 confirmed LLM detectors)
+- GitHub Secret Scanning (14 AI-related patterns, but GitHub-only)
+- GitGuardian (broad AI coverage, but commercial)
+
+---
+
+## 3. Top LLM API Providers
+
+### Tier 1: Major Cloud & Frontier Model Providers
+| # | Provider | Key Product | Notes |
+|---|----------|-------------|-------|
+| 1 | **OpenAI** | GPT-5, GPT-4o, o-series | Market leader |
+| 2 | **Anthropic** | Claude Opus 4, Sonnet, Haiku | Enterprise focus |
+| 3 | **Google (Gemini/Vertex AI)** | Gemini 2.5 Pro/Flash | 2M token context |
+| 4 | **AWS Bedrock** | Multi-model (Claude, Llama, etc.) | AWS ecosystem |
+| 5 | **Azure OpenAI** | GPT-4o, o-series | Enterprise SLA 99.9% |
+| 6 | **Google AI Studio** | Gemini API | Developer-friendly |
+| 7 | **xAI** | Grok 4.1 | 2M context, low cost |
+
+### Tier 2: Specialized & Competitive Providers
+| # | Provider | Key Product | Notes |
+|---|----------|-------------|-------|
+| 8 | **Mistral AI** | Mistral Large, Codestral | European, open-weight |
+| 9 | **Cohere** | Command R+ | Enterprise RAG focus |
+| 10 | **DeepSeek** | DeepSeek R1, V3 | Ultra-low cost reasoning |
+| 11 | **Perplexity** | Sonar Pro | Search-augmented LLM |
+| 12 | **Together AI** | 200+ open-source models | Low latency inference |
+| 13 | **Groq** | LPU inference | Fastest inference speeds |
+| 14 | **Fireworks AI** | Open-source model hosting | Sub-100ms latency |
+| 15 | **Replicate** | Model hosting platform | Pay-per-use |
+| 16 | **Cerebras** | Wafer-scale inference | Ultra-fast inference |
+| 17 | **SambaNova** | Enterprise inference | Custom silicon |
+| 18 | **AI21** | Jamba models | Long context |
+| 19 | **Stability AI** | Stable Diffusion, text models | Image + text |
+| 20 | **NVIDIA NIM** | Optimized model serving | GPU-optimized |
+
+### Tier 3: Infrastructure, Platform & Gateway Providers
+| # | Provider | Key Product | Notes |
+|---|----------|-------------|-------|
+| 21 | **Cloudflare Workers AI** | Edge inference | Edge computing |
+| 22 | **Vercel AI** | AI SDK, v0 | Frontend-focused |
+| 23 | **OpenRouter** | Multi-model gateway | 500+ models |
+| 24 | **HuggingFace** | Inference API, 300+ models | Open-source hub |
+| 25 | **DeepInfra** | Inference platform | Cost-effective |
+| 26 | **Novita AI** | 200+ production APIs | Multi-modal |
+| 27 | **Baseten** | Model serving | Custom deployments |
+| 28 | **Anyscale** | Ray-based inference | Scalable |
+| 29 | **Lambda AI** | GPU cloud + inference | |
+| 30 | **OctoAI** | Optimized inference | |
+| 31 | **Databricks** | DBRX, model serving | Data + AI |
+| 32 | **Snowflake** | Cortex AI | Data warehouse + AI |
+| 33 | **Oracle OCI** | OCI AI | Enterprise |
+| 34 | **SAP Generative AI Hub** | Enterprise AI | SAP ecosystem |
+| 35 | **IBM WatsonX** | Granite models | Enterprise |
+
+### Tier 4: Chinese & Regional Providers
+| # | Provider | Key Product | Notes |
+|---|----------|-------------|-------|
+| 36 | **Alibaba (Qwen/Dashscope)** | Qwen 2.5/3 series | Top Chinese open-source |
+| 37 | **Baidu (Wenxin/ERNIE)** | ERNIE 4.0 | Chinese market leader |
+| 38 | **ByteDance (Doubao)** | Doubao | TikTok parent |
+| 39 | **Zhipu AI** | GLM-4.5 | ChatGLM lineage |
+| 40 | **Baichuan** | Baichuan 4 | Domain-specific (law, finance) |
+| 41 | **Moonshot AI (Kimi)** | Kimi K1.5/K2 | 128K context |
+| 42 | **01.AI (Yi)** | Yi-Large, Yi-34B | Founded by Kai-Fu Lee |
+| 43 | **MiniMax** | MiniMax models | Chinese AI tiger |
+| 44 | **StepFun** | Step models | Chinese AI tiger |
+| 45 | **Tencent (Hunyuan)** | Hunyuan models | WeChat ecosystem |
+| 46 | **iFlyTek (Spark)** | Spark models | Voice/NLP specialist |
+| 47 | **SenseNova (SenseTime)** | SenseNova models | Vision + language |
+| 48 | **Volcano Engine (ByteDance)** | Cloud AI services | ByteDance cloud |
+| 49 | **Nebius AI** | Inference platform | Yandex spinoff |
+
+### Tier 5: Emerging, Niche & Specialized Providers
+| # | Provider | Key Product | Notes |
+|---|----------|-------------|-------|
+| 50 | **Aleph Alpha** | Luminous models | EU-focused, compliance |
+| 51 | **Comet API** | ML experiment tracking | |
+| 52 | **Writer** | Palmyra models | Enterprise content |
+| 53 | **Reka AI** | Reka Core/Flash | Multimodal |
+| 54 | **Upstage** | Solar models | Korean provider |
+| 55 | **FriendliAI** | Inference optimization | |
+| 56 | **Forefront AI** | Model hosting | |
+| 57 | **GooseAI** | GPT-NeoX hosting | Low cost |
+| 58 | **NLP Cloud** | Model hosting | |
+| 59 | **Predibase** | Fine-tuning platform | LoRA specialist |
+| 60 | **Clarifai** | Vision + LLM | |
+| 61 | **AiLAYER** | AI platform | |
+| 62 | **AIMLAPI** | Multi-model API | |
+| 63 | **Corcel** | Decentralized inference | Bittensor-based |
+| 64 | **HyperBee AI** | AI platform | |
+| 65 | **Lamini** | Fine-tuning + inference | |
+| 66 | **Monster API** | GPU inference | |
+| 67 | **Neets.ai** | TTS + LLM | |
+| 68 | **Featherless AI** | Inference | |
+| 69 | **Hyperbolic** | Inference platform | |
+| 70 | **Inference.net** | Open-source inference | |
+| 71 | **Galadriel** | Decentralized AI | |
+| 72 | **PublicAI** | Community inference | |
+| 73 | **Bytez** | Model hosting | |
+| 74 | **Chutes** | Inference | |
+| 75 | **GMI Cloud** | GPU cloud + inference | |
+| 76 | **Nscale** | Inference platform | |
+| 77 | **Scaleway** | European cloud AI | |
+| 78 | **OVHCloud AI** | European cloud AI | |
+| 79 | **Heroku AI** | PaaS AI add-on | |
+| 80 | **Sarvam.ai** | Indian AI models | |
+
+### Tier 6: Self-Hosted & Local Inference
+| # | Provider | Key Product | Notes |
+|---|----------|-------------|-------|
+| 81 | **Ollama** | Local LLM runner | No API key needed |
+| 82 | **LM Studio** | Desktop LLM | No API key needed |
+| 83 | **vLLM** | Inference engine | Self-hosted |
+| 84 | **Llamafile** | Single-file LLM | Self-hosted |
+| 85 | **Xinference** | Inference platform | Self-hosted |
+| 86 | **Triton Inference Server** | NVIDIA serving | Self-hosted |
+| 87 | **LlamaGate** | Gateway | Self-hosted |
+| 88 | **Docker Model Runner** | Container inference | Self-hosted |
+
+### Tier 7: Aggregators, Gateways & Middleware
+| # | Provider | Key Product | Notes |
+|---|----------|-------------|-------|
+| 89 | **LiteLLM** | AI gateway (107 providers) | Open-source |
+| 90 | **Portkey** | AI gateway | Observability |
+| 91 | **Helicone** | LLM observability | Proxy-based |
+| 92 | **Bifrost** | AI gateway (Go) | Fastest gateway |
+| 93 | **Kong AI Gateway** | API management | Enterprise |
+| 94 | **Vercel AI Gateway** | Edge AI | |
+| 95 | **Cloudflare AI Gateway** | Edge AI | |
+| 96 | **Agenta** | LLM ops platform | |
+| 97 | **Straico** | Multi-model | |
+| 98 | **AI302** | Gateway | |
+| 99 | **AIHubMix** | Gateway | |
+| 100 | **Zenmux** | Gateway | |
+| 101 | **Poe** | Multi-model chat | Quora |
+| 102 | **Gitee AI** | Chinese GitHub AI | |
+| 103 | **GitHub Models** | GitHub-hosted inference | |
+| 104 | **GitHub Copilot** | Code completion | |
+| 105 | **ModelScope** | Chinese model hub | Alibaba |
+| 106 | **Voyage AI** | Embeddings | |
+| 107 | **Jina AI** | Embeddings + search | |
+| 108 | **Deepgram** | Speech-to-text | |
+| 109 | **ElevenLabs** | Text-to-speech | |
+| 110 | **Black Forest Labs** | Image generation (FLUX) | |
+| 111 | **Fal AI** | Image/video generation | |
+| 112 | **RunwayML** | Video generation | |
+| 113 | **Recraft** | Image generation | |
+| 114 | **DataRobot** | ML platform | |
+| 115 | **Weights & Biases** | ML ops + inference | |
+| 116 | **CompactifAI** | Model compression | |
+| 117 | **GradientAI** | Fine-tuning | |
+| 118 | **Topaz** | AI platform | |
+| 119 | **Synthetic** | Data generation | |
+| 120 | **Infiniai** | Inference | |
+| 121 | **Higress** | AI gateway | Alibaba |
+| 122 | **PPIO** | Inference | |
+| 123 | **Qiniu** | Chinese cloud AI | |
+| 124 | **NanoGPT** | Lightweight inference | |
+| 125 | **Morph** | AI platform | |
+| 126 | **Milvus** | Vector DB + AI | |
+| 127 | **XiaoMi MiMo** | Xiaomi AI | |
+| 128 | **Petals** | Distributed inference | |
+| 129 | **ZeroOne** | AI platform | |
+| 130 | **Lemonade** | AI platform | |
+| 131 | **Taichu** | Chinese AI | |
+| 132 | **Amazon Nova** | AWS native models | |
+
+---
+
+## 4. API Key Patterns by Provider
+
+### 4.1 Confirmed Key Prefixes & Formats
+
+| Provider | Prefix | Regex Pattern | Confidence |
+|----------|--------|---------------|------------|
+| **OpenAI (legacy)** | `sk-` | `sk-[a-zA-Z0-9]{48}` | High |
+| **OpenAI (project)** | `sk-proj-` | `sk-proj-[a-zA-Z0-9_-]{80,}` | High |
+| **OpenAI (service account)** | `sk-svcacct-` | `sk-svcacct-[a-zA-Z0-9_-]{80,}` | High |
+| **OpenAI (legacy user)** | `sk-None-` | `sk-None-[a-zA-Z0-9_-]{80,}` | High |
+| **Anthropic (API)** | `sk-ant-api03-` | `sk-ant-api03-[a-zA-Z0-9_\-]{93}AA` | High |
+| **Anthropic (Admin)** | `sk-ant-admin01-` | `sk-ant-admin01-[a-zA-Z0-9_\-]{93}AA` | High |
+| **Google AI / Gemini** | `AIza` | `AIza[0-9A-Za-z\-_]{35}` | High |
+| **HuggingFace (user)** | `hf_` | `hf_[a-zA-Z]{34}` | High |
+| **HuggingFace (org)** | `api_org_` | `api_org_[a-zA-Z]{34}` | High |
+| **Groq** | `gsk_` | `gsk_[a-zA-Z0-9]{48,}` | High |
+| **Replicate** | `r8_` | `r8_[a-zA-Z0-9]{40}` | High |
+| **Fireworks AI** | `fw_` | `fw_[a-zA-Z0-9_-]{40,}` | Medium |
+| **Perplexity** | `pplx-` | `pplx-[a-zA-Z0-9]{48}` | High |
+| **AWS (general)** | `AKIA` | `AKIA[0-9A-Z]{16}` | High |
+| **GitHub PAT** | `ghp_` | `ghp_[a-zA-Z0-9]{36}` | High |
+| **Stripe (secret)** | `sk_live_` | `sk_live_[0-9a-zA-Z]{24}` | High |
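
A minimal sketch of how a subset of the patterns in the table above could be applied. The regexes are copied from the table; the tokenizer, the function names `classify` and `scan_text`, and the sample keys are assumptions made for illustration, not any existing scanner's behavior.

```python
import re

# Patterns copied from the table above (subset). Specific prefixes are listed
# before the generic legacy "sk-" pattern.
PATTERNS = [
    ("OpenAI (project)", r"sk-proj-[a-zA-Z0-9_-]{80,}"),
    ("OpenAI (service account)", r"sk-svcacct-[a-zA-Z0-9_-]{80,}"),
    ("Anthropic (API)", r"sk-ant-api03-[a-zA-Z0-9_\-]{93}AA"),
    ("OpenAI (legacy)", r"sk-[a-zA-Z0-9]{48}"),
    ("HuggingFace (user)", r"hf_[a-zA-Z]{34}"),
    ("Groq", r"gsk_[a-zA-Z0-9]{48,}"),
    ("AWS (general)", r"AKIA[0-9A-Z]{16}"),
]
COMPILED = [(name, re.compile(rx)) for name, rx in PATTERNS]

# Assumed tokenizer: split text into runs of characters that can appear in a key.
TOKEN = re.compile(r"[A-Za-z0-9_\-]+")


def classify(candidate):
    """Return the first provider label whose pattern matches the whole candidate."""
    for name, rx in COMPILED:
        if rx.fullmatch(candidate):
            return name
    return None


def scan_text(text):
    """Return (provider, token) pairs for every token that matches a pattern."""
    hits = []
    for token in TOKEN.findall(text):
        provider = classify(token)
        if provider is not None:
            hits.append((provider, token))
    return hits


sample = 'OPENAI_API_KEY="sk-' + "b" * 48 + '"'
print(scan_text(sample))
```

Using `fullmatch` on whole tokens keeps a legacy-style pattern from firing on a fragment of a longer key; a real scanner would also need to handle keys embedded in URLs or encoded blobs.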
+
+### 4.2 Providers with No Known Distinct Prefix
+
+These providers issue keys with either no prefix or a prefix shared with other providers (such as `sk-`), so pattern matching alone cannot attribute a key to them:
+
+| Provider | Key Format | Detection Approach |
+|----------|-----------|-------------------|
+| **Mistral AI** | Generic alphanumeric | Keyword-based (`MISTRAL_API_KEY`) |
+| **Cohere** | Generic alphanumeric | Keyword-based (`COHERE_API_KEY`, `CO_API_KEY`) |
+| **Together AI** | Generic alphanumeric | Keyword-based |
+| **DeepSeek** | `sk-` prefix (same as OpenAI legacy) | Keyword context needed |
+| **Azure OpenAI** | 32-char hex | Keyword-based |
+| **Stability AI** | `sk-` prefix | Keyword context needed |
+| **AI21** | Generic alphanumeric | Keyword-based |
+| **Cerebras** | Generic alphanumeric | Keyword-based |
+| **SambaNova** | Generic alphanumeric | Keyword-based |
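
Keyword-based detection, as listed in the table above, can be sketched as matching an assignment whose variable name is a known keyword. The variable names come from the table; the assignment regex, the 20-character minimum, and the function name `find_keyword_keys` are assumptions for illustration.

```python
import re

# Variable names taken from the table above.
KEYWORDS = {
    "Mistral AI": ["MISTRAL_API_KEY"],
    "Cohere": ["COHERE_API_KEY", "CO_API_KEY"],
}

# Assumed shape: NAME = "value" or NAME: value, with a value of 20+ key-like characters.
ASSIGNMENT = re.compile(
    r"""\b(?P<name>[A-Z][A-Z0-9_]*)\s*[:=]\s*["']?(?P<value>[A-Za-z0-9_\-]{20,})["']?"""
)


def find_keyword_keys(text):
    """Return (provider, value) pairs for assignments to known key variables."""
    lookup = {kw: provider for provider, kws in KEYWORDS.items() for kw in kws}
    hits = []
    for match in ASSIGNMENT.finditer(text):
        provider = lookup.get(match.group("name"))
        if provider is not None:
            hits.append((provider, match.group("value")))
    return hits


env_file = (
    'MISTRAL_API_KEY="' + "a1B2" * 8 + '"\n'
    "DEBUG=true\n"
    "CO_API_KEY=" + "z" * 40 + "\n"
)
print(find_keyword_keys(env_file))
```

Because the value itself carries no provider signal, this approach depends entirely on naming conventions and misses keys assigned to differently named variables.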
+
+### 4.3 Detection Difficulty Tiers
+
+**Easy (unique prefix):** OpenAI (sk-proj-, sk-svcacct-), Anthropic (sk-ant-), HuggingFace (hf_), Groq (gsk_), Replicate (r8_), Perplexity (pplx-), AWS (AKIA)
+
+**Medium (shared or short prefix):** OpenAI legacy (sk-), DeepSeek (sk-), Stability (sk-), Fireworks (fw_), Google (AIza)
+
+**Hard (no prefix, keyword-only):** Mistral, Cohere, Together AI, Azure OpenAI, AI21, Cerebras, most Chinese providers
+
+---
+
+## 5. Key Validation Approaches
+
+### 5.1 Common Validation Endpoints
+
+| Provider | Validation Method | Endpoint | Cost |
+|----------|-------------------|----------|------|
+| **OpenAI** | List models | `GET /v1/models` | Free (no tokens consumed) |
+| **Anthropic** | Send minimal message | `POST /v1/messages` (tiny prompt) | Minimal cost (~1 token) |
+| **Google Gemini** | List models | `GET /v1/models` | Free |
+| **Cohere** | Token check | `POST /v1/tokenize` or `/v1/generate` | Minimal |
+| **HuggingFace** | Whoami | `GET /api/whoami` | Free |
+| **Groq** | List models | `GET /v1/models` | Free |
+| **Replicate** | Get account | `GET /v1/account` | Free |
+| **Mistral** | List models | `GET /v1/models` | Free |
+| **AWS** | STS GetCallerIdentity | `POST sts.amazonaws.com` | Free |
+| **Azure OpenAI** | List deployments | `GET /openai/deployments` | Free |
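
A hedged sketch of the "list models" check from the table above, using only the Python standard library. The base URLs and the Bearer authorization header are assumptions added here (the table gives only paths), and the mapping from status codes to verdicts is a simplification rather than any provider's documented contract.

```python
import urllib.error
import urllib.request

# Assumed base URLs; the table above lists only the request path.
LIST_MODELS_URL = {
    "openai": "https://api.openai.com/v1/models",
    "mistral": "https://api.mistral.ai/v1/models",
}


def build_request(provider, key):
    """Build a GET request for the provider's list-models endpoint."""
    return urllib.request.Request(
        LIST_MODELS_URL[provider],
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )


def interpret_status(status):
    """Map an HTTP status code to a coarse verdict (simplified assumption)."""
    if status == 200:
        return "live"
    if status == 401:
        return "rejected"
    return "unknown"  # e.g. rate limiting or server errors prove nothing


def verify(provider, key, timeout=10.0):
    """Send the request and classify the outcome; network errors are 'unknown'."""
    try:
        with urllib.request.urlopen(build_request(provider, key), timeout=timeout) as resp:
            return interpret_status(resp.status)
    except urllib.error.HTTPError as err:
        return interpret_status(err.code)
    except urllib.error.URLError:
        return "unknown"
```

Calling `verify("openai", candidate)` sends the candidate key to the provider, so it should only be run against keys the operator is authorized to test.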
+
+### 5.2 Validation Strategy Patterns
+
+1. **Passive detection (regex only):** Fastest, highest false positive rate. Used by Gitleaks, detect-secrets baseline mode.
+
+2. **Passive + entropy:** Combines regex with entropy scoring. Reduces false positives for generic patterns. Used by detect-secrets with entropy plugins.
+
+3. **Active verification (API call):** Makes a lightweight API call to confirm the key is live. Used by TruffleHog, GitHub secret scanning. Removes most false positives (a verified key is a real credential), but requires network access and sends each candidate key to the provider.
+
+4. **Deep analysis (permission enumeration):** Beyond verification, enumerates what the key can access. Used by TruffleHog for ~20 credential types. Most actionable but slowest.
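
The entropy scoring mentioned in strategy 2 is typically Shannon entropy over the characters of a candidate string. A minimal sketch follows; the 3.5 bits-per-character threshold and 20-character minimum are illustrative assumptions, not values taken from any of the tools above.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_random(candidate, threshold=3.5, min_length=20):
    """Flag long strings whose character distribution is close to random."""
    return len(candidate) >= min_length and shannon_entropy(candidate) >= threshold


print(looks_random("a" * 40))
print(looks_random("0123456789abcdefghijklmnopqrstuv"))
```

A repeated-character string scores zero bits per character, while a string of 32 distinct characters scores five, which is why entropy helps separate placeholder values from generated keys.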
+
+### 5.3 How Existing Tools Validate
+
+| Tool | Passive | Entropy | Active Verification | Permission Analysis |
+|------|:-------:|:-------:|:-------------------:|:-------------------:|
+| TruffleHog | Yes | No | Yes (many of the 800+ detectors) | Yes (~20 types) |
+| Gitleaks | Yes | Optional | No | No |
+| detect-secrets | Yes | Yes | Limited | No |
+| Nosey Parker | Yes | ML-based | No | No |
+| GitGuardian | Yes | Yes | Yes (selected) | Limited |
+| GitHub Scanning | Yes | AI-based | Yes (selected) | No |
+| SecurityWall | Yes | No | Generates curl cmds | No |
+| KeyHacks | No | No | Manual curl cmds | Limited |
+
+---
+
+## 6. Market Gaps & Opportunities
+
+### 6.1 Underserved Areas
+
+1. **LLM-specific comprehensive scanner:** No tool covers all 50+ LLM API providers with both detection and validation.
+
+2. **New key format coverage:** OpenAI's `sk-proj-` and `sk-svcacct-` formats are recent; many scanners only detect legacy `sk-` format. Gitleaks only added these in late 2025 via PR #1780.
+
+3. **Chinese/regional provider detection:** Almost zero coverage for Qwen, Baichuan, Zhipu, Moonshot, Yi, ERNIE, Doubao API keys in any scanner.
+
+4. **Key metadata extraction:** No tool extracts org, project, rate limits, or spend from detected LLM keys.
+
+5. **Agentic AI context:** With AI agents increasingly using API keys, there's a growing need for scanners that understand multi-key configurations (e.g., an agent with OpenAI + Anthropic + Serp API keys).
+
+6. **Vibe coding exposure:** VibeFactory's scanner addresses the problem of API keys exposed in frontend JavaScript by vibe-coded apps, but this is still nascent.
+
+### 6.2 Scale of the Problem
+
+- **28 million credentials leaked on GitHub in 2025** (Snyk)
+- **1,275,105 leaked AI service secrets in 2025** (GitGuardian), up 81% YoY
+- **8 of 10 fastest-growing leaked secret categories are AI-related** (GitGuardian)
+- Fastest growing: Brave Search API (+1,255%), Supabase (+992%), Firecrawl (+796%)
+- Groq keys alone are found at **42.28 per million commits** (GitGuardian)
+
+### 6.3 Competitive Landscape Summary
+
+```
+                    Verification Depth
+                    |
+        TruffleHog  |  ████████████████  (800+ detectors, deep analysis)
+        GitGuardian |  ████████████      (450+ detectors, commercial)
+        GitHub      |  ██████████        (AI-powered, platform-locked)
+        Gitleaks    |  ████              (150+ regex, no verification)
+        detect-sec  |  ███               (27 plugins, baseline approach)
+        NoseyParker |  ██                (188 rules, ML denoising, retired)
+                    |
+                    +------ LLM Provider Coverage ------>
+                    
+        None of these tools provides more than 15 LLM provider detectors.
+        The market opportunity is a scanner focused on 50-100+ LLM providers
+        with active verification, permission analysis, and cost estimation.
+```
+
+---
+
+## Sources
+
+### Open-Source Scanner Tools
+- [TruffleHog - GitHub](https://github.com/trufflesecurity/trufflehog)
+- [TruffleHog Detectors](https://trufflesecurity.com/detectors)
+- [Gitleaks - GitHub](https://github.com/gitleaks/gitleaks)
+- [Gitleaks Config (gitleaks.toml)](https://github.com/gitleaks/gitleaks/blob/master/config/gitleaks.toml)
+- [detect-secrets - GitHub](https://github.com/Yelp/detect-secrets)
+- [Nosey Parker - GitHub](https://github.com/praetorian-inc/noseyparker)
+- [KeyHacks - GitHub](https://github.com/streaak/keyhacks)
+- [Secrets Patterns DB - GitHub](https://github.com/mazen160/secrets-patterns-db)
+- [regextokens - GitHub](https://github.com/odomojuli/regextokens)
+- [Betterleaks - Gitleaks Successor](https://www.aikido.dev/blog/betterleaks-gitleaks-successor)
+
+### Comparison & Analysis
+- [TruffleHog vs Gitleaks Comparison (Jit)](https://www.jit.io/resources/appsec-tools/trufflehog-vs-gitleaks-a-detailed-comparison-of-secret-scanning-tools)
+- [Best Secret Scanning Tools 2025 (Aikido)](https://www.aikido.dev/blog/top-secret-scanning-tools)
+- [8 Best Secret Scanning Tools 2026 (AppSec Santa)](https://appsecsanta.com/sast-tools/secret-scanning-tools)
+- [Secret Scanning Tools 2026 (GitGuardian)](https://blog.gitguardian.com/secret-scanning-tools/)
+
+### API Key Patterns & Validation
+- [OpenAI API Key Format Discussion](https://community.openai.com/t/regex-s-to-validate-api-key-and-org-id-format/44619)
+- [OpenAI sk-proj Key Format](https://community.openai.com/t/how-to-create-an-api-secret-key-with-prefix-sk-only-always-creates-sk-proj-keys/1263531)
+- [Gitleaks OpenAI Regex PR #1780](https://github.com/gitleaks/gitleaks/pull/1780)
+- [GitHub Leaked API Keys Patterns](https://gist.github.com/win3zz/0a1c70589fcbea64dba4588b93095855)
+- [GitGuardian Groq API Key Detector](https://docs.gitguardian.com/secrets-detection/secrets-detection-engine/detectors/specifics/groq_api_key)
+
+### LLM Key Validation Tools
+- [TestMyAPIKey.com](https://www.testmyapikey.com/)
+- [SecurityWall API Key Checker](https://securitywall.co/tools/api-key-checker)
+- [VibeFactory API Key Scanner](https://vibefactory.ai/api-key-security-scanner)
+- [KeyLeak Detector - GitHub](https://github.com/Amal-David/keyleak-detector)
+
+### LLM Provider Lists
+- [LiteLLM Providers (107)](https://docs.litellm.ai/docs/providers)
+- [Langbase Supported Providers](https://langbase.com/docs/supported-models-and-providers)
+- [LLM-Interface API Keys Doc](https://github.com/samestrin/llm-interface/blob/main/docs/api-keys.md)
+- [Artificial Analysis Provider Leaderboard](https://artificialanalysis.ai/leaderboards/providers)
+- [Top LLM API Providers 2026 (Future AGI)](https://futureagi.substack.com/p/top-11-llm-api-providers-in-2026)
+
+### GitHub Secret Scanning
+- [GitHub Supported Secret Scanning Patterns](https://docs.github.com/en/code-security/secret-scanning/introduction/supported-secret-scanning-patterns)
+- [GitHub Adds 37 New Detectors (March 2026)](https://devops.com/github-adds-37-new-secret-detectors-in-march-extends-scanning-to-ai-coding-agents/)
+- [GitHub Secret Scanning Coverage Update](https://github.blog/changelog/2026-03-31-github-secret-scanning-nine-new-types-and-more/)
+
+### Market Data
+- [State of Secrets Sprawl 2026 (GitGuardian/Hacker News)](https://thehackernews.com/2026/03/the-state-of-secrets-sprawl-2026-9.html)
+- [Why 28M Credentials Leaked on GitHub in 2025 (Snyk)](https://snyk.io/articles/state-of-secrets/)
+- [GitGuardian AI Security](https://www.gitguardian.com/agentic-ai-security)