merge: phase 14-03 frontend leaks

salvacybersec
2026-04-06 13:21:39 +03:00
38 changed files with 2644 additions and 29 deletions

RESEARCH_REPORT.md Normal file

@@ -0,0 +1,548 @@
# API Key Scanner Market Research Report
**Date: April 4, 2026**

---
## Table of Contents
1. [Existing Open-Source API Key Scanners](#1-existing-open-source-api-key-scanners)
2. [LLM-Specific API Key Tools](#2-llm-specific-api-key-tools)
3. [Top LLM API Providers (100+)](#3-top-llm-api-providers)
4. [API Key Patterns by Provider](#4-api-key-patterns-by-provider)
5. [Key Validation Approaches](#5-key-validation-approaches)
6. [Market Gaps & Opportunities](#6-market-gaps--opportunities)
---
## 1. Existing Open-Source API Key Scanners
### 1.1 TruffleHog
- **GitHub:** https://github.com/trufflesecurity/trufflehog
- **Stars:** ~25,500
- **Language:** Go
- **Detectors:** 800+ secret types
- **Approach:** Detector-based (each detector is a small Go program for a specific credential type)
- **Detection methods:**
- Pattern matching via dedicated detectors
- Active verification against live APIs
- Permission/scope analysis (~20 credential types)
- **AI/LLM detectors confirmed:** OpenAI, OpenAI Admin Key, Anthropic
- **Scanning sources:** Git repos, GitHub orgs, S3 buckets, GCS, Docker images, Jenkins, Elasticsearch, Postman, Slack, local filesystems
- **Key differentiator:** Verification — not just "this looks like a key" but "this is an active key with these permissions"
- **Limitations:**
- Heavy/slow compared to regex-only scanners
- Not all 800+ detectors have verification
- LLM provider coverage still incomplete (no confirmed Cohere, Mistral, Groq detectors)
### 1.2 Gitleaks
- **GitHub:** https://github.com/gitleaks/gitleaks
- **Stars:** ~25,800
- **Language:** Go
- **Rules:** 150+ regex patterns in `gitleaks.toml`
- **Approach:** Regex pattern matching with optional entropy checks
- **Detection methods:**
- Regex patterns defined in TOML config
- Keyword matching
- Entropy thresholds
- Allowlists for false positive reduction
- **AI/LLM rules confirmed:**
- `anthropic-admin-api-key`: `sk-ant-admin01-[a-zA-Z0-9_\-]{93}AA`
- `anthropic-api-key`: `sk-ant-api03-[a-zA-Z0-9_\-]{93}AA`
- `openai-api-key`: Updated to include `sk-proj-` and `sk-svcacct-` formats
- `cohere-api-token`: Keyword-based detection
- `huggingface-access-token`: `hf_[a-z]{34}`
- `huggingface-organization-api-token`: `api_org_[a-z]{34}`
- **Key differentiator:** Fast, simple, excellent as pre-commit hook
- **Limitations:**
- No active verification of detected keys
- Regex-only means higher false positive rate for generic patterns
- Limited LLM provider coverage beyond the four providers (six rules) listed above
- **Note:** Gitleaks creator launched "Betterleaks" in 2026 as a successor built for the agentic era
### 1.3 detect-secrets (Yelp)
- **GitHub:** https://github.com/Yelp/detect-secrets
- **Stars:** ~4,300
- **Language:** Python
- **Plugins:** 27 built-in detectors
- **Approach:** Baseline methodology — tracks known secrets and flags new ones
- **Detection methods:**
- Regex-based plugins (structured secrets)
- High entropy string detection (Base64, Hex)
- Keyword detection (variable name matching)
- Optional ML-based gibberish detector (v1.1+)
- **AI/LLM plugins confirmed:**
- `OpenAIDetector` plugin exists
- No dedicated Anthropic, Cohere, Mistral, or Groq plugins
- **Key differentiator:** Baseline approach — only flags NEW secrets, not historical ones; enterprise-friendly
- **Limitations:**
- Minimal LLM provider coverage
- Limited active verification (only a few plugins can verify a finding over the network)
- Fewer patterns than TruffleHog or Gitleaks
- Python-only (slower than Go/Rust alternatives)
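
The baseline idea described above can be illustrated with a minimal sketch. It is not detect-secrets' actual baseline file format or API: the `fingerprint` and `new_findings` helpers and the sample paths are hypothetical, and the only point is that known findings are stored as hashes and anything already in the baseline is suppressed.

```python
import hashlib


def fingerprint(path: str, secret: str) -> str:
    """Hash path + secret so the baseline never stores the raw value."""
    return hashlib.sha256(f"{path}:{secret}".encode()).hexdigest()


def new_findings(baseline: set, findings: list) -> list:
    """Return only the (path, secret) pairs that are not in the baseline."""
    return [(p, s) for p, s in findings if fingerprint(p, s) not in baseline]


# Hypothetical data: one secret is already known, one is new.
baseline = {fingerprint("config.py", "old-secret-value")}
current = [
    ("config.py", "old-secret-value"),
    ("app.py", "fresh-secret-value"),
]
print(new_findings(baseline, current))
```

Only the `app.py` finding is reported, which is why this approach suits repositories with a long history of already-triaged secrets.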
### 1.4 Nosey Parker (Praetorian)
- **GitHub:** https://github.com/praetorian-inc/noseyparker
- **Stars:** ~2,300
- **Language:** Rust
- **Rules:** 188 high-precision regex rules
- **Approach:** Hybrid regex + ML denoising
- **Detection methods:**
- 188 tested regex rules tuned for low false positives
- ML model for false positive reduction (10-1000x improvement)
- Deduplication/grouping of findings
- **Performance:** GB/s scanning speeds, tested on 20TB+ datasets
- **Key differentiator:** ML-enhanced denoising, extreme performance
- **Status:** RETIRED — replaced by Titus (https://github.com/praetorian-inc/titus)
- **Limitations:**
- No specific LLM provider rules documented
- No active verification
- Project discontinued
### 1.5 GitGuardian
- **Website:** https://www.gitguardian.com
- **Type:** Commercial + free tier for public repos
- **Detectors:** 450+ secret types
- **Approach:** Regex + AI-powered false positive reduction
- **Detection methods:**
- Specific prefix-based detectors
- Fine-tuned code-LLM for false positive filtering
- Validity checking for supported detectors
- **AI/LLM coverage:**
- Groq API Key (prefixed, with validity check)
- OpenAI, Anthropic, HuggingFace (confirmed)
- AI-related leaked secrets up 81% YoY in 2025
- 1,275,105 leaked AI service secrets detected in 2025
- **Key differentiator:** AI-powered false positive reduction, massive scale (scans all public GitHub)
- **Limitations:**
- Commercial/proprietary for private repos
- Regex patterns not publicly disclosed
### 1.6 GitHub Secret Scanning (Native)
- **Type:** Built into GitHub
- **Approach:** Provider-partnered pattern matching + Copilot AI
- **AI/LLM patterns supported (with push protection and validity status):**
| Provider | Pattern | Push Protection | Validity Check |
|----------|---------|:-:|:-:|
| Anthropic | `anthropic_admin_api_key` | Yes | Yes |
| Anthropic | `anthropic_api_key` | Yes | Yes |
| Anthropic | `anthropic_session_id` | Yes | No |
| Cohere | `cohere_api_key` | Yes | No |
| DeepSeek | `deepseek_api_key` | No | Yes |
| Google | `google_gemini_api_key` | No | No |
| Groq | `groq_api_key` | Yes | Yes |
| Hugging Face | `hf_org_api_key` | Yes | No |
| Hugging Face | `hf_user_access_token` | Yes | Yes |
| Mistral AI | `mistral_ai_api_key` | No | No |
| OpenAI | `openai_api_key` | Yes | Yes |
| Replicate | `replicate_api_token` | Yes | Yes |
| xAI | `xai_api_key` | Yes | Yes |
| Azure | `azure_openai_key` | Yes | No |
- **Recent developments (March 2026):**
- Added 37 new secret detectors including Langchain
- Extended scanning to AI coding agents via MCP
- Copilot uses GPT-3.5-Turbo + GPT-4 for unstructured secret detection (94% FP reduction)
- Base64-encoded secret detection with push protection
### 1.7 Other Notable Tools
| Tool | Stars | Language | Patterns | Key Feature |
|------|-------|----------|----------|-------------|
| **KeyHacks** (streaak) | 6,100 | Markdown/Shell | 100+ services | Validation curl commands for bug bounty |
| **keyhacks.sh** (gwen001) | ~500 | Bash | 50+ | Automated version of KeyHacks |
| **Secrets Patterns DB** (mazen160) | 1,400 | YAML/Regex | 1,600+ | Largest open-source regex DB, exports to TruffleHog/Gitleaks format |
| **secret-regex-list** (h33tlit) | ~1,000 | Regex | 100+ | Regex patterns for scraping secrets |
| **regextokens** (odomojuli) | ~300 | Regex | 50+ | OAuth/API token regex patterns |
| **Betterleaks** | New (2026) | Go | — | Gitleaks successor for agentic era |
---
## 2. LLM-Specific API Key Tools
### 2.1 Dedicated LLM Key Validators
| Tool | URL | Providers | Approach |
|------|-----|-----------|----------|
| **TestMyAPIKey.com** | testmyapikey.com | OpenAI, Anthropic Claude, + 13 others | Client-side regex + live API validation |
| **SecurityWall Checker** | securitywall.co/tools/api-key-checker | 455+ patterns, 350+ services (incl. OpenAI, Anthropic) | Client-side regex, generates curl commands |
| **VibeFactory Scanner** | vibefactory.ai/api-key-security-scanner | 150+ types (incl. OpenAI) | Scans deployed websites for exposed keys |
| **KeyLeak Detector** | github.com/Amal-David/keyleak-detector | Multiple | Headless browser + network interception |
| **OpenAI Key Tester** | trevorfox.com/api-key-tester/openai | OpenAI, Anthropic | Direct API validation |
| **Chatbot API Tester** | apikeytester.netlify.app | OpenAI, DeepSeek, OpenRouter | Endpoint validation |
| **SecurityToolkits** | securitytoolkits.com/tools/apikey-validator | Multiple | API key/token checker |
### 2.2 LLM Gateways with Key Validation
These tools validate keys as part of their proxy/gateway functionality:
| Tool | Stars | Providers | Validation Approach |
|------|-------|-----------|---------------------|
| **LiteLLM** | ~18k | 107 providers | AuthenticationError mapping from all providers |
| **OpenRouter** | — | 60+ providers, 500+ models | Unified API key, provider-level validation |
| **Portkey AI** | ~5k | 30+ providers | AI gateway with key validation |
| **LLM-API-Key-Proxy** | ~200 | OpenAI, Anthropic compatible | Self-hosted proxy with key validation |
### 2.3 Key Gap: No Comprehensive LLM-Focused Scanner
**Critical finding:** There is NO dedicated open-source tool that:
1. Detects API keys from all major LLM providers (50+)
2. Validates them against live APIs
3. Reports provider, model access, rate limits, and spend
4. Covers both legacy and new key formats
The closest tools are:
- TruffleHog (broadest verification, but only ~3 confirmed LLM detectors)
- GitHub Secret Scanning (14 AI-related patterns, but GitHub-only)
- GitGuardian (broad AI coverage, but commercial)
---
## 3. Top LLM API Providers
### Tier 1: Major Cloud & Frontier Model Providers
| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 1 | **OpenAI** | GPT-5, GPT-4o, o-series | Market leader |
| 2 | **Anthropic** | Claude Opus 4, Sonnet, Haiku | Enterprise focus |
| 3 | **Google (Gemini/Vertex AI)** | Gemini 2.5 Pro/Flash | 2M token context |
| 4 | **AWS Bedrock** | Multi-model (Claude, Llama, etc.) | AWS ecosystem |
| 5 | **Azure OpenAI** | GPT-4o, o-series | Enterprise SLA 99.9% |
| 6 | **Google AI Studio** | Gemini API | Developer-friendly |
| 7 | **xAI** | Grok 4.1 | 2M context, low cost |
### Tier 2: Specialized & Competitive Providers
| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 8 | **Mistral AI** | Mistral Large, Codestral | European, open-weight |
| 9 | **Cohere** | Command R+ | Enterprise RAG focus |
| 10 | **DeepSeek** | DeepSeek R1, V3 | Ultra-low cost reasoning |
| 11 | **Perplexity** | Sonar Pro | Search-augmented LLM |
| 12 | **Together AI** | 200+ open-source models | Low latency inference |
| 13 | **Groq** | LPU inference | Fastest inference speeds |
| 14 | **Fireworks AI** | Open-source model hosting | Sub-100ms latency |
| 15 | **Replicate** | Model hosting platform | Pay-per-use |
| 16 | **Cerebras** | Wafer-scale inference | Ultra-fast inference |
| 17 | **SambaNova** | Enterprise inference | Custom silicon |
| 18 | **AI21** | Jamba models | Long context |
| 19 | **Stability AI** | Stable Diffusion, text models | Image + text |
| 20 | **NVIDIA NIM** | Optimized model serving | GPU-optimized |
### Tier 3: Infrastructure, Platform & Gateway Providers
| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 21 | **Cloudflare Workers AI** | Edge inference | Edge computing |
| 22 | **Vercel AI** | AI SDK, v0 | Frontend-focused |
| 23 | **OpenRouter** | Multi-model gateway | 500+ models |
| 24 | **HuggingFace** | Inference API, 300+ models | Open-source hub |
| 25 | **DeepInfra** | Inference platform | Cost-effective |
| 26 | **Novita AI** | 200+ production APIs | Multi-modal |
| 27 | **Baseten** | Model serving | Custom deployments |
| 28 | **Anyscale** | Ray-based inference | Scalable |
| 29 | **Lambda AI** | GPU cloud + inference | |
| 30 | **OctoAI** | Optimized inference | |
| 31 | **Databricks** | DBRX, model serving | Data + AI |
| 32 | **Snowflake** | Cortex AI | Data warehouse + AI |
| 33 | **Oracle OCI** | OCI AI | Enterprise |
| 34 | **SAP Generative AI Hub** | Enterprise AI | SAP ecosystem |
| 35 | **IBM WatsonX** | Granite models | Enterprise |
### Tier 4: Chinese & Regional Providers
| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 36 | **Alibaba (Qwen/Dashscope)** | Qwen 2.5/3 series | Top Chinese open-source |
| 37 | **Baidu (Wenxin/ERNIE)** | ERNIE 4.0 | Chinese market leader |
| 38 | **ByteDance (Doubao)** | Doubao models | TikTok parent |
| 39 | **Zhipu AI** | GLM-4.5 | ChatGLM lineage |
| 40 | **Baichuan** | Baichuan 4 | Domain-specific (law, finance) |
| 41 | **Moonshot AI (Kimi)** | Kimi K1.5/K2 | 128K context |
| 42 | **01.AI (Yi)** | Yi-Large, Yi-34B | Founded by Kai-Fu Lee |
| 43 | **MiniMax** | MiniMax models | Chinese AI tiger |
| 44 | **StepFun** | Step models | Chinese AI tiger |
| 45 | **Tencent (Hunyuan)** | Hunyuan models | WeChat ecosystem |
| 46 | **iFlyTek (Spark)** | Spark models | Voice/NLP specialist |
| 47 | **SenseNova (SenseTime)** | SenseNova models | Vision + language |
| 48 | **Volcano Engine (ByteDance)** | Cloud AI services | ByteDance cloud |
| 49 | **Nebius AI** | Inference platform | Yandex spinoff |
### Tier 5: Emerging, Niche & Specialized Providers
| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 50 | **Aleph Alpha** | Luminous models | EU-focused, compliance |
| 51 | **Comet API** | ML experiment tracking | |
| 52 | **Writer** | Palmyra models | Enterprise content |
| 53 | **Reka AI** | Reka Core/Flash | Multimodal |
| 54 | **Upstage** | Solar models | Korean provider |
| 55 | **FriendliAI** | Inference optimization | |
| 56 | **Forefront AI** | Model hosting | |
| 57 | **GooseAI** | GPT-NeoX hosting | Low cost |
| 58 | **NLP Cloud** | Model hosting | |
| 59 | **Predibase** | Fine-tuning platform | LoRA specialist |
| 60 | **Clarifai** | Vision + LLM | |
| 61 | **AiLAYER** | AI platform | |
| 62 | **AIMLAPI** | Multi-model API | |
| 63 | **Corcel** | Decentralized inference | Bittensor-based |
| 64 | **HyperBee AI** | AI platform | |
| 65 | **Lamini** | Fine-tuning + inference | |
| 66 | **Monster API** | GPU inference | |
| 67 | **Neets.ai** | TTS + LLM | |
| 68 | **Featherless AI** | Inference | |
| 69 | **Hyperbolic** | Inference platform | |
| 70 | **Inference.net** | Open-source inference | |
| 71 | **Galadriel** | Decentralized AI | |
| 72 | **PublicAI** | Community inference | |
| 73 | **Bytez** | Model hosting | |
| 74 | **Chutes** | Inference | |
| 75 | **GMI Cloud** | GPU cloud + inference | |
| 76 | **Nscale** | Inference platform | |
| 77 | **Scaleway** | European cloud AI | |
| 78 | **OVHCloud AI** | European cloud AI | |
| 79 | **Heroku AI** | PaaS AI add-on | |
| 80 | **Sarvam.ai** | Indian AI models | |
### Tier 6: Self-Hosted & Local Inference
| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 81 | **Ollama** | Local LLM runner | No API key needed |
| 82 | **LM Studio** | Desktop LLM | No API key needed |
| 83 | **vLLM** | Inference engine | Self-hosted |
| 84 | **Llamafile** | Single-file LLM | Self-hosted |
| 85 | **Xinference** | Inference platform | Self-hosted |
| 86 | **Triton Inference Server** | NVIDIA serving | Self-hosted |
| 87 | **LlamaGate** | Gateway | Self-hosted |
| 88 | **Docker Model Runner** | Container inference | Self-hosted |
### Tier 7: Aggregators, Gateways & Middleware
| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 89 | **LiteLLM** | AI gateway (107 providers) | Open-source |
| 90 | **Portkey** | AI gateway | Observability |
| 91 | **Helicone** | LLM observability | Proxy-based |
| 92 | **Bifrost** | AI gateway (Go) | Fastest gateway |
| 93 | **Kong AI Gateway** | API management | Enterprise |
| 94 | **Vercel AI Gateway** | Edge AI | |
| 95 | **Cloudflare AI Gateway** | Edge AI | |
| 96 | **Agenta** | LLM ops platform | |
| 97 | **Straico** | Multi-model | |
| 98 | **AI302** | Gateway | |
| 99 | **AIHubMix** | Gateway | |
| 100 | **Zenmux** | Gateway | |
| 101 | **Poe** | Multi-model chat | Quora |
| 102 | **Gitee AI** | Chinese GitHub AI | |
| 103 | **GitHub Models** | GitHub-hosted inference | |
| 104 | **GitHub Copilot** | Code completion | |
| 105 | **ModelScope** | Chinese model hub | Alibaba |
| 106 | **Voyage AI** | Embeddings | |
| 107 | **Jina AI** | Embeddings + search | |
| 108 | **Deepgram** | Speech-to-text | |
| 109 | **ElevenLabs** | Text-to-speech | |
| 110 | **Black Forest Labs** | Image generation (FLUX) | |
| 111 | **Fal AI** | Image/video generation | |
| 112 | **RunwayML** | Video generation | |
| 113 | **Recraft** | Image generation | |
| 114 | **DataRobot** | ML platform | |
| 115 | **Weights & Biases** | ML ops + inference | |
| 116 | **CompactifAI** | Model compression | |
| 117 | **GradientAI** | Fine-tuning | |
| 118 | **Topaz** | AI platform | |
| 119 | **Synthetic** | Data generation | |
| 120 | **Infiniai** | Inference | |
| 121 | **Higress** | AI gateway | Alibaba |
| 122 | **PPIO** | Inference | |
| 123 | **Qiniu** | Chinese cloud AI | |
| 124 | **NanoGPT** | Lightweight inference | |
| 125 | **Morph** | AI platform | |
| 126 | **Milvus** | Vector DB + AI | |
| 127 | **XiaoMi MiMo** | Xiaomi AI | |
| 128 | **Petals** | Distributed inference | |
| 129 | **ZeroOne** | AI platform | |
| 130 | **Lemonade** | AI platform | |
| 131 | **Taichu** | Chinese AI | |
| 132 | **Amazon Nova** | AWS native models | |
---
## 4. API Key Patterns by Provider
### 4.1 Confirmed Key Prefixes & Formats
| Provider | Prefix | Regex Pattern | Confidence |
|----------|--------|---------------|------------|
| **OpenAI (legacy)** | `sk-` | `sk-[a-zA-Z0-9]{48}` | High |
| **OpenAI (project)** | `sk-proj-` | `sk-proj-[a-zA-Z0-9_-]{80,}` | High |
| **OpenAI (service account)** | `sk-svcacct-` | `sk-svcacct-[a-zA-Z0-9_-]{80,}` | High |
| **OpenAI (legacy user)** | `sk-None-` | `sk-None-[a-zA-Z0-9_-]{80,}` | High |
| **Anthropic (API)** | `sk-ant-api03-` | `sk-ant-api03-[a-zA-Z0-9_\-]{93}AA` | High |
| **Anthropic (Admin)** | `sk-ant-admin01-` | `sk-ant-admin01-[a-zA-Z0-9_\-]{93}AA` | High |
| **Google AI / Gemini** | `AIza` | `AIza[0-9A-Za-z\-_]{35}` | High |
| **HuggingFace (user)** | `hf_` | `hf_[a-zA-Z]{34}` | High |
| **HuggingFace (org)** | `api_org_` | `api_org_[a-zA-Z]{34}` | High |
| **Groq** | `gsk_` | `gsk_[a-zA-Z0-9]{48,}` | High |
| **Replicate** | `r8_` | `r8_[a-zA-Z0-9]{40}` | High |
| **Fireworks AI** | `fw_` | `fw_[a-zA-Z0-9_-]{40,}` | Medium |
| **Perplexity** | `pplx-` | `pplx-[a-zA-Z0-9]{48}` | High |
| **AWS (general)** | `AKIA` | `AKIA[0-9A-Z]{16}` | High |
| **GitHub PAT** | `ghp_` | `ghp_[a-zA-Z0-9]{36}` | High |
| **Stripe (secret)** | `sk_live_` | `sk_live_[0-9a-zA-Z]{24}` | High |
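
As a rough illustration of how the prefix patterns above could be applied, the following sketch compiles a subset of the table's regexes and scans a string for matches. The rule names and the `scan` helper are hypothetical labels chosen for this example, and the sample keys are synthetic strings built to fit the patterns.

```python
import re

# Regexes copied from the table above; the dictionary keys are illustrative.
PATTERNS = {
    "openai-project": r"sk-proj-[a-zA-Z0-9_-]{80,}",
    "anthropic-api": r"sk-ant-api03-[a-zA-Z0-9_\-]{93}AA",
    "google-ai": r"AIza[0-9A-Za-z\-_]{35}",
    "groq": r"gsk_[a-zA-Z0-9]{48,}",
    "replicate": r"r8_[a-zA-Z0-9]{40}",
    "perplexity": r"pplx-[a-zA-Z0-9]{48}",
}
COMPILED = {name: re.compile(rx) for name, rx in PATTERNS.items()}


def scan(text: str) -> list:
    """Return (rule_name, matched_text) for every pattern hit in text."""
    hits = []
    for name, rx in COMPILED.items():
        for match in rx.finditer(text):
            hits.append((name, match.group(0)))
    return hits


# Synthetic, non-functional keys that only satisfy the regex shapes.
sample = "GROQ=gsk_" + "a" * 48 + "\nR8=r8_" + "B" * 40
print(scan(sample))
```

A regex-only pass like this corresponds to the passive detection stage; it says nothing about whether a matched key is live.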
### 4.2 Providers with No Known Distinct Prefix
These providers use generic-looking API keys without distinguishing prefixes, making detection harder:
| Provider | Key Format | Detection Approach |
|----------|-----------|-------------------|
| **Mistral AI** | Generic alphanumeric | Keyword-based (`MISTRAL_API_KEY`) |
| **Cohere** | Generic alphanumeric | Keyword-based (`COHERE_API_KEY`, `CO_API_KEY`) |
| **Together AI** | Generic alphanumeric | Keyword-based |
| **DeepSeek** | `sk-` prefix (same as OpenAI legacy) | Keyword context needed |
| **Azure OpenAI** | 32-char hex | Keyword-based |
| **Stability AI** | `sk-` prefix | Keyword context needed |
| **AI21** | Generic alphanumeric | Keyword-based |
| **Cerebras** | Generic alphanumeric | Keyword-based |
| **SambaNova** | Generic alphanumeric | Keyword-based |
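
For prefix-less keys, keyword-based detection might look like the sketch below. The variable names come from the table; the value pattern (20 or more alphanumeric characters, optionally quoted) is an assumption made for illustration, not a documented key format for either provider.

```python
import re

# Variable names from the table above.
KEYWORDS = {
    "mistral": ["MISTRAL_API_KEY"],
    "cohere": ["COHERE_API_KEY", "CO_API_KEY"],
}
# Assumed value shape: ':' or '=' followed by 20+ alphanumerics, maybe quoted.
VALUE = r"\s*[:=]\s*['\"]?([A-Za-z0-9]{20,})['\"]?"


def keyword_hits(text: str) -> list:
    """Return (provider, variable_name, value) for keyword assignments."""
    hits = []
    for provider, names in KEYWORDS.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + VALUE
            for match in re.finditer(pattern, text):
                hits.append((provider, name, match.group(1)))
    return hits


# Synthetic values, not real keys.
sample = 'MISTRAL_API_KEY="' + "x" * 32 + '"\nCO_API_KEY=' + "y" * 40
print(keyword_hits(sample))
```

Because the value pattern is generic, the keyword is what carries the signal here; without it the same string would be indistinguishable from any other token.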
### 4.3 Detection Difficulty Tiers
- **Easy (unique prefix):** OpenAI (sk-proj-, sk-svcacct-), Anthropic (sk-ant-), HuggingFace (hf_), Groq (gsk_), Replicate (r8_), Perplexity (pplx-), AWS (AKIA)
- **Medium (shared or short prefix):** OpenAI legacy (sk-), DeepSeek (sk-), Stability (sk-), Fireworks (fw_), Google (AIza)
- **Hard (no prefix, keyword-only):** Mistral, Cohere, Together AI, Azure OpenAI, AI21, Cerebras, most Chinese providers
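
One way to handle the shared `sk-` prefix is to test the longest prefixes first, so that specific formats win over the ambiguous one. The sketch below assumes this longest-prefix ordering; the provider labels and the `classify` helper are hypothetical, and the prefixes are taken from the tables above.

```python
# Prefixes from the tables above, mapped to illustrative labels.
PREFIXES = {
    "sk-ant-api03-": "anthropic",
    "sk-ant-admin01-": "anthropic-admin",
    "sk-proj-": "openai-project",
    "sk-svcacct-": "openai-service-account",
    "sk-": "ambiguous (OpenAI legacy, DeepSeek, or Stability AI)",
    "gsk_": "groq",
    "hf_": "huggingface",
    "r8_": "replicate",
    "pplx-": "perplexity",
}


def classify(key: str) -> str:
    """Match the longest known prefix first; fall back to keyword context."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if key.startswith(prefix):
            return PREFIXES[prefix]
    return "unknown (needs keyword context)"


print(classify("sk-proj-example"))
print(classify("sk-example"))
```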
---
## 5. Key Validation Approaches
### 5.1 Common Validation Endpoints
| Provider | Validation Method | Endpoint | Cost |
|----------|-------------------|----------|------|
| **OpenAI** | List models | `GET /v1/models` | Free (no tokens consumed) |
| **Anthropic** | Send minimal message | `POST /v1/messages` (tiny prompt) | Minimal cost (~1 token) |
| **Google Gemini** | List models | `GET /v1/models` | Free |
| **Cohere** | Token check | `POST /v1/tokenize` or `/v1/generate` | Minimal |
| **HuggingFace** | Whoami | `GET /api/whoami` | Free |
| **Groq** | List models | `GET /v1/models` | Free |
| **Replicate** | Get account | `GET /v1/account` | Free |
| **Mistral** | List models | `GET /v1/models` | Free |
| **AWS** | STS GetCallerIdentity | `POST sts.amazonaws.com` | Free |
| **Azure OpenAI** | List deployments | `GET /openai/deployments` | Free |
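
A minimal sketch of the list-models check for OpenAI, using only the Python standard library, is shown below. The host `api.openai.com` and the `Authorization: Bearer` header are the commonly documented form of this endpoint; the status-code mapping in `interpret` is a simplifying assumption, and the helper names are hypothetical. Other providers in the table use different hosts and header schemes, which are not covered here.

```python
import urllib.error
import urllib.request

OPENAI_MODELS_URL = "https://api.openai.com/v1/models"


def build_check(key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/models request for the key."""
    return urllib.request.Request(
        OPENAI_MODELS_URL,
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )


def interpret(status: int) -> str:
    """Assumed mapping: 200 means live, 401/403 rejected, else unclear."""
    if status == 200:
        return "live"
    if status in (401, 403):
        return "rejected"
    return "inconclusive"


def check(key: str, timeout: float = 10.0) -> str:
    """Send the request and classify the outcome; needs network access."""
    try:
        with urllib.request.urlopen(build_check(key), timeout=timeout) as resp:
            return interpret(resp.status)
    except urllib.error.HTTPError as err:
        return interpret(err.code)
    except urllib.error.URLError:
        return "inconclusive"
```

Keeping request construction separate from sending makes the logic testable offline; `check` is the only function that touches the network.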
### 5.2 Validation Strategy Patterns
1. **Passive detection (regex only):** Fastest, highest false positive rate. Used by Gitleaks, detect-secrets baseline mode.
2. **Passive + entropy:** Combines regex with entropy scoring. Reduces false positives for generic patterns. Used by detect-secrets with entropy plugins.
3. **Active verification (API call):** Makes lightweight API call to confirm key is live. Used by TruffleHog, GitHub secret scanning. Eliminates false positives but requires network access.
4. **Deep analysis (permission enumeration):** Beyond verification, enumerates what the key can access. Used by TruffleHog for ~20 credential types. Most actionable but slowest.
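
The entropy scoring mentioned in strategy 2 is typically Shannon entropy over the characters of a candidate string. The sketch below shows the calculation; the 3.5 bits-per-character threshold in `looks_random` is an illustrative assumption, not a value taken from any of the tools above.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_random(s: str, threshold: float = 3.5) -> bool:
    """Flag strings whose entropy meets an assumed threshold."""
    return shannon_entropy(s) >= threshold


print(looks_random("aaaaaaaaaaaaaaaa"))
print(looks_random("0123456789abcdef"))
```

A repeated character scores zero, while sixteen distinct characters score four bits each, so the second string is flagged and the first is not.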
### 5.3 How Existing Tools Validate
| Tool | Passive | Entropy | Active Verification | Permission Analysis |
|------|:-------:|:-------:|:-------------------:|:-------------------:|
| TruffleHog | Yes | No | Yes (many, but not all, of the 800+ detectors) | Yes (~20 types) |
| Gitleaks | Yes | Optional | No | No |
| detect-secrets | Yes | Yes | Limited | No |
| Nosey Parker | Yes | ML-based | No | No |
| GitGuardian | Yes | Yes | Yes (selected) | Limited |
| GitHub Scanning | Yes | AI-based | Yes (selected) | No |
| SecurityWall | Yes | No | Generates curl cmds | No |
| KeyHacks | No | No | Manual curl cmds | Limited |
---
## 6. Market Gaps & Opportunities
### 6.1 Underserved Areas
1. **LLM-specific comprehensive scanner:** No tool covers all 50+ LLM API providers with both detection and validation.
2. **New key format coverage:** OpenAI's `sk-proj-` and `sk-svcacct-` formats are recent, and many scanners still detect only the legacy `sk-` format. Gitleaks added them only in late 2025, via PR #1780.
3. **Chinese/regional provider detection:** Almost zero coverage for Qwen, Baichuan, Zhipu, Moonshot, Yi, ERNIE, Doubao API keys in any scanner.
4. **Key metadata extraction:** No tool extracts org, project, rate limits, or spend from detected LLM keys.
5. **Agentic AI context:** With AI agents increasingly using API keys, there's a growing need for scanners that understand multi-key configurations (e.g., an agent with OpenAI + Anthropic + Serp API keys).
6. **Vibe coding exposure:** VibeFactory's scanner addresses the problem of API keys exposed in frontend JavaScript by vibe-coded apps, but this is still nascent.
### 6.2 Scale of the Problem
- **28 million credentials leaked on GitHub in 2025** (Snyk)
- **1,275,105 leaked AI service secrets in 2025** (GitGuardian), up 81% YoY
- **8 of 10 fastest-growing leaked secret categories are AI-related** (GitGuardian)
- Fastest growing: Brave Search API (+1,255%), Firecrawl (+796%), Supabase (+992%)
- Groq API keys alone are found at a rate of **42.28 per million commits** (GitGuardian)
### 6.3 Competitive Landscape Summary
```
Verification Depth
|
TruffleHog | ████████████████ (800+ detectors, deep analysis)
GitGuardian | ████████████ (450+ detectors, commercial)
GitHub | ██████████ (AI-powered, platform-locked)
Gitleaks | ████ (150+ regex, no verification)
detect-sec | ███ (27 plugins, baseline approach)
NoseyParker | ██ (188 rules, ML denoising, retired)
|
+------ LLM Provider Coverage ------>
```

None of these tools provides more than 15 LLM provider detectors. The market opportunity is a scanner focused on 50-100+ LLM providers with active verification, permission analysis, and cost estimation.

---
## Sources
### Open-Source Scanner Tools
- [TruffleHog - GitHub](https://github.com/trufflesecurity/trufflehog)
- [TruffleHog Detectors](https://trufflesecurity.com/detectors)
- [Gitleaks - GitHub](https://github.com/gitleaks/gitleaks)
- [Gitleaks Config (gitleaks.toml)](https://github.com/gitleaks/gitleaks/blob/master/config/gitleaks.toml)
- [detect-secrets - GitHub](https://github.com/Yelp/detect-secrets)
- [Nosey Parker - GitHub](https://github.com/praetorian-inc/noseyparker)
- [KeyHacks - GitHub](https://github.com/streaak/keyhacks)
- [Secrets Patterns DB - GitHub](https://github.com/mazen160/secrets-patterns-db)
- [regextokens - GitHub](https://github.com/odomojuli/regextokens)
- [Betterleaks - Gitleaks Successor](https://www.aikido.dev/blog/betterleaks-gitleaks-successor)
### Comparison & Analysis
- [TruffleHog vs Gitleaks Comparison (Jit)](https://www.jit.io/resources/appsec-tools/trufflehog-vs-gitleaks-a-detailed-comparison-of-secret-scanning-tools)
- [Best Secret Scanning Tools 2025 (Aikido)](https://www.aikido.dev/blog/top-secret-scanning-tools)
- [8 Best Secret Scanning Tools 2026 (AppSec Santa)](https://appsecsanta.com/sast-tools/secret-scanning-tools)
- [Secret Scanning Tools 2026 (GitGuardian)](https://blog.gitguardian.com/secret-scanning-tools/)
### API Key Patterns & Validation
- [OpenAI API Key Format Discussion](https://community.openai.com/t/regex-s-to-validate-api-key-and-org-id-format/44619)
- [OpenAI sk-proj Key Format](https://community.openai.com/t/how-to-create-an-api-secret-key-with-prefix-sk-only-always-creates-sk-proj-keys/1263531)
- [Gitleaks OpenAI Regex PR #1780](https://github.com/gitleaks/gitleaks/pull/1780)
- [GitHub Leaked API Keys Patterns](https://gist.github.com/win3zz/0a1c70589fcbea64dba4588b93095855)
- [GitGuardian Groq API Key Detector](https://docs.gitguardian.com/secrets-detection/secrets-detection-engine/detectors/specifics/groq_api_key)
### LLM Key Validation Tools
- [TestMyAPIKey.com](https://www.testmyapikey.com/)
- [SecurityWall API Key Checker](https://securitywall.co/tools/api-key-checker)
- [VibeFactory API Key Scanner](https://vibefactory.ai/api-key-security-scanner)
- [KeyLeak Detector - GitHub](https://github.com/Amal-David/keyleak-detector)
### LLM Provider Lists
- [LiteLLM Providers (107)](https://docs.litellm.ai/docs/providers)
- [Langbase Supported Providers](https://langbase.com/docs/supported-models-and-providers)
- [LLM-Interface API Keys Doc](https://github.com/samestrin/llm-interface/blob/main/docs/api-keys.md)
- [Artificial Analysis Provider Leaderboard](https://artificialanalysis.ai/leaderboards/providers)
- [Top LLM API Providers 2026 (Future AGI)](https://futureagi.substack.com/p/top-11-llm-api-providers-in-2026)
### GitHub Secret Scanning
- [GitHub Supported Secret Scanning Patterns](https://docs.github.com/en/code-security/secret-scanning/introduction/supported-secret-scanning-patterns)
- [GitHub Adds 37 New Detectors (March 2026)](https://devops.com/github-adds-37-new-secret-detectors-in-march-extends-scanning-to-ai-coding-agents/)
- [GitHub Secret Scanning Coverage Update](https://github.blog/changelog/2026-03-31-github-secret-scanning-nine-new-types-and-more/)
### Market Data
- [State of Secrets Sprawl 2026 (GitGuardian/Hacker News)](https://thehackernews.com/2026/03/the-state-of-secrets-sprawl-2026-9.html)
- [Why 28M Credentials Leaked on GitHub in 2025 (Snyk)](https://snyk.io/articles/state-of-secrets/)
- [GitGuardian AI Security](https://www.gitguardian.com/agentic-ai-security)