# API Key Scanner Market Research Report

**Date: April 4, 2026**

---

## Table of Contents

1. [Existing Open-Source API Key Scanners](#1-existing-open-source-api-key-scanners)
2. [LLM-Specific API Key Tools](#2-llm-specific-api-key-tools)
3. [Top LLM API Providers (100+)](#3-top-llm-api-providers)
4. [API Key Patterns by Provider](#4-api-key-patterns-by-provider)
5. [Key Validation Approaches](#5-key-validation-approaches)
6. [Market Gaps & Opportunities](#6-market-gaps--opportunities)

---

## 1. Existing Open-Source API Key Scanners

### 1.1 TruffleHog

- **GitHub:** https://github.com/trufflesecurity/trufflehog
- **Stars:** ~25,500
- **Language:** Go
- **Detectors:** 800+ secret types
- **Approach:** Detector-based (each detector is a small Go program for a specific credential type)
- **Detection methods:**
  - Pattern matching via dedicated detectors
  - Active verification against live APIs
  - Permission/scope analysis (~20 credential types)
- **AI/LLM detectors confirmed:** OpenAI, OpenAI Admin Key, Anthropic
- **Scanning sources:** Git repos, GitHub orgs, S3 buckets, GCS, Docker images, Jenkins, Elasticsearch, Postman, Slack, local filesystems
- **Key differentiator:** Verification: not just "this looks like a key" but "this is an active key with these permissions"
- **Limitations:**
  - Heavy/slow compared to regex-only scanners
  - Not all 800+ detectors have verification
  - LLM provider coverage still incomplete (no confirmed Cohere, Mistral, Groq detectors)

### 1.2 Gitleaks

- **GitHub:** https://github.com/gitleaks/gitleaks
- **Stars:** ~25,800
- **Language:** Go
- **Rules:** 150+ regex patterns in `gitleaks.toml`
- **Approach:** Regex pattern matching with optional entropy checks
- **Detection methods:**
  - Regex patterns defined in TOML config
  - Keyword matching
  - Entropy thresholds
  - Allowlists for false positive reduction
- **AI/LLM rules confirmed:**
  - `anthropic-admin-api-key`: `sk-ant-admin01-[a-zA-Z0-9_\-]{93}AA`
  - `anthropic-api-key`: `sk-ant-api03-[a-zA-Z0-9_\-]{93}AA`
  - `openai-api-key`: Updated to include `sk-proj-` and `sk-svcacct-` formats
  - `cohere-api-token`: Keyword-based detection
  - `huggingface-access-token`: `hf_[a-z]{34}`
  - `huggingface-organization-api-token`: `api_org_[a-z]{34}`
- **Key differentiator:** Fast, simple, excellent as pre-commit hook
- **Limitations:**
  - No active verification of detected keys
  - Regex-only means higher false positive rate for generic patterns
  - Limited LLM provider coverage beyond the rules listed above
- **Note:** Gitleaks creator launched "Betterleaks" in 2026 as a successor built for the agentic era

### 1.3 detect-secrets (Yelp)

- **GitHub:** https://github.com/Yelp/detect-secrets
- **Stars:** ~4,300
- **Language:** Python
- **Plugins:** 27 built-in detectors
- **Approach:** Baseline methodology: tracks known secrets and flags new ones
- **Detection methods:**
  - Regex-based plugins (structured secrets)
  - High entropy string detection (Base64, Hex)
  - Keyword detection (variable name matching)
  - Optional ML-based gibberish detector (v1.1+)
- **AI/LLM plugins confirmed:**
  - `OpenAIDetector` plugin exists
  - No dedicated Anthropic, Cohere, Mistral, or Groq plugins
- **Key differentiator:** Baseline approach: only flags NEW secrets, not historical ones; enterprise-friendly
- **Limitations:**
  - Minimal LLM provider coverage
  - No active verification
  - Fewer patterns than TruffleHog or Gitleaks
  - Python-only (slower than Go/Rust alternatives)

### 1.4 Nosey Parker (Praetorian)

- **GitHub:** https://github.com/praetorian-inc/noseyparker
- **Stars:** ~2,300
- **Language:** Rust
- **Rules:** 188 high-precision regex rules
- **Approach:** Hybrid regex + ML denoising
- **Detection methods:**
  - 188 tested regex rules tuned for low false positives
  - ML model for false positive reduction (10-1000x improvement)
  - Deduplication/grouping of findings
- **Performance:** GB/s scanning speeds, tested on 20TB+ datasets
- **Key differentiator:** ML-enhanced denoising, extreme performance
- **Status:** RETIRED, replaced by Titus (https://github.com/praetorian-inc/titus)
- **Limitations:**
  - No specific LLM provider rules documented
  - No active verification
  - Project discontinued

### 1.5 GitGuardian

- **Website:** https://www.gitguardian.com
- **Type:** Commercial + free tier for public repos
- **Detectors:** 450+ secret types
- **Approach:** Regex + AI-powered false positive reduction
- **Detection methods:**
  - Specific prefix-based detectors
  - Fine-tuned code-LLM for false positive filtering
  - Validity checking for supported detectors
- **AI/LLM coverage:**
  - Groq API Key (prefixed, with validity check)
  - OpenAI, Anthropic, HuggingFace (confirmed)
  - AI-related leaked secrets up 81% YoY in 2025
  - 1,275,105 leaked AI service secrets detected in 2025
- **Key differentiator:** AI-powered false positive reduction, massive scale (scans all public GitHub)
- **Limitations:**
  - Commercial/proprietary for private repos
  - Regex patterns not publicly disclosed

### 1.6 GitHub Secret Scanning (Native)

- **Type:** Built into GitHub
- **Approach:** Provider-partnered pattern matching + Copilot AI
- **AI/LLM patterns supported (with push protection and validity status):**

| Provider | Pattern | Push Protection | Validity Check |
|----------|---------|:-:|:-:|
| Anthropic | `anthropic_admin_api_key` | Yes | Yes |
| Anthropic | `anthropic_api_key` | Yes | Yes |
| Anthropic | `anthropic_session_id` | Yes | No |
| Cohere | `cohere_api_key` | Yes | No |
| DeepSeek | `deepseek_api_key` | No | Yes |
| Google | `google_gemini_api_key` | No | No |
| Groq | `groq_api_key` | Yes | Yes |
| Hugging Face | `hf_org_api_key` | Yes | No |
| Hugging Face | `hf_user_access_token` | Yes | Yes |
| Mistral AI | `mistral_ai_api_key` | No | No |
| OpenAI | `openai_api_key` | Yes | Yes |
| Replicate | `replicate_api_token` | Yes | Yes |
| xAI | `xai_api_key` | Yes | Yes |
| Azure | `azure_openai_key` | Yes | No |

- **Recent developments (March 2026):**
  - Added 37 new secret detectors including Langchain
  - Extended scanning to AI coding agents via MCP
  - Copilot uses GPT-3.5-Turbo + GPT-4 for unstructured secret detection (94% FP reduction)
  - Base64-encoded secret detection with push protection

### 1.7 Other Notable Tools

| Tool | Stars | Language | Patterns | Key Feature |
|------|-------|----------|----------|-------------|
| **KeyHacks** (streaak) | 6,100 | Markdown/Shell | 100+ services | Validation curl commands for bug bounty |
| **keyhacks.sh** (gwen001) | ~500 | Bash | 50+ | Automated version of KeyHacks |
| **Secrets Patterns DB** (mazen160) | 1,400 | YAML/Regex | 1,600+ | Largest open-source regex DB, exports to TruffleHog/Gitleaks format |
| **secret-regex-list** (h33tlit) | ~1,000 | Regex | 100+ | Regex patterns for scraping secrets |
| **regextokens** (odomojuli) | ~300 | Regex | 50+ | OAuth/API token regex patterns |
| **Betterleaks** | New (2026) | Go | n/a | Gitleaks successor for agentic era |

---

## 2. LLM-Specific API Key Tools

### 2.1 Dedicated LLM Key Validators

| Tool | URL | Providers | Approach |
|------|-----|-----------|----------|
| **TestMyAPIKey.com** | testmyapikey.com | OpenAI, Anthropic Claude, + 13 others | Client-side regex + live API validation |
| **SecurityWall Checker** | securitywall.co/tools/api-key-checker | 455+ patterns, 350+ services (incl. OpenAI, Anthropic) | Client-side regex, generates curl commands |
| **VibeFactory Scanner** | vibefactory.ai/api-key-security-scanner | 150+ types (incl. OpenAI) | Scans deployed websites for exposed keys |
| **KeyLeak Detector** | github.com/Amal-David/keyleak-detector | Multiple | Headless browser + network interception |
| **OpenAI Key Tester** | trevorfox.com/api-key-tester/openai | OpenAI, Anthropic | Direct API validation |
| **Chatbot API Tester** | apikeytester.netlify.app | OpenAI, DeepSeek, OpenRouter | Endpoint validation |
| **SecurityToolkits** | securitytoolkits.com/tools/apikey-validator | Multiple | API key/token checker |

### 2.2 LLM Gateways with Key Validation

These tools validate keys as part of their proxy/gateway functionality:

| Tool | Stars | Providers | Validation Approach |
|------|-------|-----------|---------------------|
| **LiteLLM** | ~18k | 107 providers | AuthenticationError mapping from all providers |
| **OpenRouter** | n/a | 60+ providers, 500+ models | Unified API key, provider-level validation |
| **Portkey AI** | ~5k | 30+ providers | AI gateway with key validation |
| **LLM-API-Key-Proxy** | ~200 | OpenAI, Anthropic compatible | Self-hosted proxy with key validation |

### 2.3 Key Gap: No Comprehensive LLM-Focused Scanner

**Critical finding:** There is NO dedicated open-source tool that:

1. Detects API keys from all major LLM providers (50+)
2. Validates them against live APIs
3. Reports provider, model access, rate limits, and spend
4. Covers both legacy and new key formats

The closest tools are:

- TruffleHog (broadest verification, but only ~3 confirmed LLM detectors)
- GitHub Secret Scanning (14 AI-related patterns, but GitHub-only)
- GitGuardian (broad AI coverage, but commercial)

---

## 3. Top LLM API Providers

### Tier 1: Major Cloud & Frontier Model Providers

| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 1 | **OpenAI** | GPT-5, GPT-4o, o-series | Market leader |
| 2 | **Anthropic** | Claude Opus 4, Sonnet, Haiku | Enterprise focus |
| 3 | **Google (Gemini/Vertex AI)** | Gemini 2.5 Pro/Flash | 2M token context |
| 4 | **AWS Bedrock** | Multi-model (Claude, Llama, etc.) | AWS ecosystem |
| 5 | **Azure OpenAI** | GPT-4o, o-series | Enterprise SLA 99.9% |
| 6 | **Google AI Studio** | Gemini API | Developer-friendly |
| 7 | **xAI** | Grok 4.1 | 2M context, low cost |

### Tier 2: Specialized & Competitive Providers

| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 8 | **Mistral AI** | Mistral Large, Codestral | European, open-weight |
| 9 | **Cohere** | Command R+ | Enterprise RAG focus |
| 10 | **DeepSeek** | DeepSeek R1, V3 | Ultra-low cost reasoning |
| 11 | **Perplexity** | Sonar Pro | Search-augmented LLM |
| 12 | **Together AI** | 200+ open-source models | Low latency inference |
| 13 | **Groq** | LPU inference | Fastest inference speeds |
| 14 | **Fireworks AI** | Open-source model hosting | Sub-100ms latency |
| 15 | **Replicate** | Model hosting platform | Pay-per-use |
| 16 | **Cerebras** | Wafer-scale inference | Ultra-fast inference |
| 17 | **SambaNova** | Enterprise inference | Custom silicon |
| 18 | **AI21** | Jamba models | Long context |
| 19 | **Stability AI** | Stable Diffusion, text models | Image + text |
| 20 | **NVIDIA NIM** | Optimized model serving | GPU-optimized |

### Tier 3: Infrastructure, Platform & Gateway Providers

| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 21 | **Cloudflare Workers AI** | Edge inference | Edge computing |
| 22 | **Vercel AI** | AI SDK, v0 | Frontend-focused |
| 23 | **OpenRouter** | Multi-model gateway | 500+ models |
| 24 | **HuggingFace** | Inference API, 300+ models | Open-source hub |
| 25 | **DeepInfra** | Inference platform | Cost-effective |
| 26 | **Novita AI** | 200+ production APIs | Multi-modal |
| 27 | **Baseten** | Model serving | Custom deployments |
| 28 | **Anyscale** | Ray-based inference | Scalable |
| 29 | **Lambda AI** | GPU cloud + inference | |
| 30 | **OctoAI** | Optimized inference | |
| 31 | **Databricks** | DBRX, model serving | Data + AI |
| 32 | **Snowflake** | Cortex AI | Data warehouse + AI |
| 33 | **Oracle OCI** | OCI AI | Enterprise |
| 34 | **SAP Generative AI Hub** | Enterprise AI | SAP ecosystem |
| 35 | **IBM WatsonX** | Granite models | Enterprise |

### Tier 4: Chinese & Regional Providers

| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 36 | **Alibaba (Qwen/Dashscope)** | Qwen 2.5/3 series | Top Chinese open-source |
| 37 | **Baidu (Wenxin/ERNIE)** | ERNIE 4.0 | Chinese market leader |
| 38 | **ByteDance (Doubao)** | Doubao | TikTok parent |
| 39 | **Zhipu AI** | GLM-4.5 | ChatGLM lineage |
| 40 | **Baichuan** | Baichuan 4 | Domain-specific (law, finance) |
| 41 | **Moonshot AI (Kimi)** | Kimi K1.5/K2 | 128K context |
| 42 | **01.AI (Yi)** | Yi-Large, Yi-34B | Founded by Kai-Fu Lee |
| 43 | **MiniMax** | MiniMax models | Chinese AI tiger |
| 44 | **StepFun** | Step models | Chinese AI tiger |
| 45 | **Tencent (Hunyuan)** | Hunyuan models | WeChat ecosystem |
| 46 | **iFlyTek (Spark)** | Spark models | Voice/NLP specialist |
| 47 | **SenseNova (SenseTime)** | SenseNova models | Vision + language |
| 48 | **Volcano Engine (ByteDance)** | Cloud AI services | ByteDance cloud |
| 49 | **Nebius AI** | Inference platform | Yandex spinoff |

### Tier 5: Emerging, Niche & Specialized Providers

| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 50 | **Aleph Alpha** | Luminous models | EU-focused, compliance |
| 51 | **Comet API** | ML experiment tracking | |
| 52 | **Writer** | Palmyra models | Enterprise content |
| 53 | **Reka AI** | Reka Core/Flash | Multimodal |
| 54 | **Upstage** | Solar models | Korean provider |
| 55 | **FriendliAI** | Inference optimization | |
| 56 | **Forefront AI** | Model hosting | |
| 57 | **GooseAI** | GPT-NeoX hosting | Low cost |
| 58 | **NLP Cloud** | Model hosting | |
| 59 | **Predibase** | Fine-tuning platform | LoRA specialist |
| 60 | **Clarifai** | Vision + LLM | |
| 61 | **AiLAYER** | AI platform | |
| 62 | **AIMLAPI** | Multi-model API | |
| 63 | **Corcel** | Decentralized inference | Bittensor-based |
| 64 | **HyperBee AI** | AI platform | |
| 65 | **Lamini** | Fine-tuning + inference | |
| 66 | **Monster API** | GPU inference | |
| 67 | **Neets.ai** | TTS + LLM | |
| 68 | **Featherless AI** | Inference | |
| 69 | **Hyperbolic** | Inference platform | |
| 70 | **Inference.net** | Open-source inference | |
| 71 | **Galadriel** | Decentralized AI | |
| 72 | **PublicAI** | Community inference | |
| 73 | **Bytez** | Model hosting | |
| 74 | **Chutes** | Inference | |
| 75 | **GMI Cloud** | GPU cloud + inference | |
| 76 | **Nscale** | Inference platform | |
| 77 | **Scaleway** | European cloud AI | |
| 78 | **OVHCloud AI** | European cloud AI | |
| 79 | **Heroku AI** | PaaS AI add-on | |
| 80 | **Sarvam.ai** | Indian AI models | |

### Tier 6: Self-Hosted & Local Inference

| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 81 | **Ollama** | Local LLM runner | No API key needed |
| 82 | **LM Studio** | Desktop LLM | No API key needed |
| 83 | **vLLM** | Inference engine | Self-hosted |
| 84 | **Llamafile** | Single-file LLM | Self-hosted |
| 85 | **Xinference** | Inference platform | Self-hosted |
| 86 | **Triton Inference Server** | NVIDIA serving | Self-hosted |
| 87 | **LlamaGate** | Gateway | Self-hosted |
| 88 | **Docker Model Runner** | Container inference | Self-hosted |

### Tier 7: Aggregators, Gateways & Middleware

| # | Provider | Key Product | Notes |
|---|----------|-------------|-------|
| 89 | **LiteLLM** | AI gateway (107 providers) | Open-source |
| 90 | **Portkey** | AI gateway | Observability |
| 91 | **Helicone** | LLM observability | Proxy-based |
| 92 | **Bifrost** | AI gateway (Go) | Fastest gateway |
| 93 | **Kong AI Gateway** | API management | Enterprise |
| 94 | **Vercel AI Gateway** | Edge AI | |
| 95 | **Cloudflare AI Gateway** | Edge AI | |
| 96 | **Agenta** | LLM ops platform | |
| 97 | **Straico** | Multi-model | |
| 98 | **AI302** | Gateway | |
| 99 | **AIHubMix** | Gateway | |
| 100 | **Zenmux** | Gateway | |
| 101 | **Poe** | Multi-model chat | Quora |
| 102 | **Gitee AI** | Chinese GitHub AI | |
| 103 | **GitHub Models** | GitHub-hosted inference | |
| 104 | **GitHub Copilot** | Code completion | |
| 105 | **ModelScope** | Chinese model hub | Alibaba |
| 106 | **Voyage AI** | Embeddings | |
| 107 | **Jina AI** | Embeddings + search | |
| 108 | **Deepgram** | Speech-to-text | |
| 109 | **ElevenLabs** | Text-to-speech | |
| 110 | **Black Forest Labs** | Image generation (FLUX) | |
| 111 | **Fal AI** | Image/video generation | |
| 112 | **RunwayML** | Video generation | |
| 113 | **Recraft** | Image generation | |
| 114 | **DataRobot** | ML platform | |
| 115 | **Weights & Biases** | ML ops + inference | |
| 116 | **CompactifAI** | Model compression | |
| 117 | **GradientAI** | Fine-tuning | |
| 118 | **Topaz** | AI platform | |
| 119 | **Synthetic** | Data generation | |
| 120 | **Infiniai** | Inference | |
| 121 | **Higress** | AI gateway | Alibaba |
| 122 | **PPIO** | Inference | |
| 123 | **Qiniu** | Chinese cloud AI | |
| 124 | **NanoGPT** | Lightweight inference | |
| 125 | **Morph** | AI platform | |
| 126 | **Milvus** | Vector DB + AI | |
| 127 | **XiaoMi MiMo** | Xiaomi AI | |
| 128 | **Petals** | Distributed inference | |
| 129 | **ZeroOne** | AI platform | |
| 130 | **Lemonade** | AI platform | |
| 131 | **Taichu** | Chinese AI | |
| 132 | **Amazon Nova** | AWS native models | |

---

## 4. API Key Patterns by Provider

### 4.1 Confirmed Key Prefixes & Formats

| Provider | Prefix | Regex Pattern | Confidence |
|----------|--------|---------------|------------|
| **OpenAI (legacy)** | `sk-` | `sk-[a-zA-Z0-9]{48}` | High |
| **OpenAI (project)** | `sk-proj-` | `sk-proj-[a-zA-Z0-9_-]{80,}` | High |
| **OpenAI (service account)** | `sk-svcacct-` | `sk-svcacct-[a-zA-Z0-9_-]{80,}` | High |
| **OpenAI (legacy user)** | `sk-None-` | `sk-None-[a-zA-Z0-9_-]{80,}` | High |
| **Anthropic (API)** | `sk-ant-api03-` | `sk-ant-api03-[a-zA-Z0-9_\-]{93}AA` | High |
| **Anthropic (Admin)** | `sk-ant-admin01-` | `sk-ant-admin01-[a-zA-Z0-9_\-]{93}AA` | High |
| **Google AI / Gemini** | `AIza` | `AIza[0-9A-Za-z\-_]{35}` | High |
| **HuggingFace (user)** | `hf_` | `hf_[a-zA-Z]{34}` | High |
| **HuggingFace (org)** | `api_org_` | `api_org_[a-zA-Z]{34}` | High |
| **Groq** | `gsk_` | `gsk_[a-zA-Z0-9]{48,}` | High |
| **Replicate** | `r8_` | `r8_[a-zA-Z0-9]{40}` | High |
| **Fireworks AI** | `fw_` | `fw_[a-zA-Z0-9_-]{40,}` | Medium |
| **Perplexity** | `pplx-` | `pplx-[a-zA-Z0-9]{48}` | High |
| **AWS (general)** | `AKIA` | `AKIA[0-9A-Z]{16}` | High |
| **GitHub PAT** | `ghp_` | `ghp_[a-zA-Z0-9]{36}` | High |
| **Stripe (secret)** | `sk_live_` | `sk_live_[0-9a-zA-Z]{24}` | High |

### 4.2 Providers with No Known Distinct Prefix

These providers use generic-looking API keys without distinguishing prefixes, making detection harder:

| Provider | Key Format | Detection Approach |
|----------|-----------|-------------------|
| **Mistral AI** | Generic alphanumeric | Keyword-based (`MISTRAL_API_KEY`) |
| **Cohere** | Generic alphanumeric | Keyword-based (`COHERE_API_KEY`, `CO_API_KEY`) |
| **Together AI** | Generic alphanumeric | Keyword-based |
| **DeepSeek** | `sk-` prefix (same as OpenAI legacy) | Keyword context needed |
| **Azure OpenAI** | 32-char hex | Keyword-based |
| **Stability AI** | `sk-` prefix | Keyword context needed |
| **AI21** | Generic alphanumeric | Keyword-based |
| **Cerebras** | Generic alphanumeric | Keyword-based |
| **SambaNova** | Generic alphanumeric | Keyword-based |

### 4.3 Detection Difficulty Tiers

**Easy (unique prefix):** OpenAI (sk-proj-, sk-svcacct-), Anthropic (sk-ant-), HuggingFace (hf_), Groq (gsk_), Replicate (r8_), Perplexity (pplx-), AWS (AKIA)

**Medium (shared or short prefix):** OpenAI legacy (sk-), DeepSeek (sk-), Stability (sk-), Fireworks (fw_), Google (AIza)

**Hard (no prefix, keyword-only):** Mistral, Cohere, Together AI, Azure OpenAI, AI21, Cerebras, most Chinese providers

---

## 5. Key Validation Approaches

### 5.1 Common Validation Endpoints

| Provider | Validation Method | Endpoint | Cost |
|----------|-------------------|----------|------|
| **OpenAI** | List models | `GET /v1/models` | Free (no tokens consumed) |
| **Anthropic** | Send minimal message | `POST /v1/messages` (tiny prompt) | Minimal cost (~1 token) |
| **Google Gemini** | List models | `GET /v1/models` | Free |
| **Cohere** | Token check | `POST /v1/tokenize` or `/v1/generate` | Minimal |
| **HuggingFace** | Whoami | `GET /api/whoami` | Free |
| **Groq** | List models | `GET /v1/models` | Free |
| **Replicate** | Get account | `GET /v1/account` | Free |
| **Mistral** | List models | `GET /v1/models` | Free |
| **AWS** | STS GetCallerIdentity | `POST sts.amazonaws.com` | Free |
| **Azure OpenAI** | List deployments | `GET /openai/deployments` | Free |

### 5.2 Validation Strategy Patterns

1. **Passive detection (regex only):** Fastest, highest false positive rate. Used by Gitleaks, detect-secrets baseline mode.
2. **Passive + entropy:** Combines regex with entropy scoring. Reduces false positives for generic patterns. Used by detect-secrets with entropy plugins.
3. **Active verification (API call):** Makes a lightweight API call to confirm the key is live. Used by TruffleHog, GitHub secret scanning. Eliminates false positives but requires network access.
4. **Deep analysis (permission enumeration):** Beyond verification, enumerates what the key can access. Used by TruffleHog for ~20 credential types. Most actionable but slowest.

### 5.3 How Existing Tools Validate

| Tool | Passive | Entropy | Active Verification | Permission Analysis |
|------|:-------:|:-------:|:-------------------:|:-------------------:|
| TruffleHog | Yes | No | Yes (subset of 800+ detectors) | Yes (~20 types) |
| Gitleaks | Yes | Optional | No | No |
| detect-secrets | Yes | Yes | Limited | No |
| Nosey Parker | Yes | ML-based | No | No |
| GitGuardian | Yes | Yes | Yes (selected) | Limited |
| GitHub Scanning | Yes | AI-based | Yes (selected) | No |
| SecurityWall | Yes | No | Generates curl cmds | No |
| KeyHacks | No | No | Manual curl cmds | Limited |

---

## 6. Market Gaps & Opportunities

### 6.1 Underserved Areas

1. **LLM-specific comprehensive scanner:** No tool covers all 50+ LLM API providers with both detection and validation.
2. **New key format coverage:** OpenAI's `sk-proj-` and `sk-svcacct-` formats are recent; many scanners only detect the legacy `sk-` format. Gitleaks only added these in late 2025 via PR #1780.
3. **Chinese/regional provider detection:** Almost zero coverage for Qwen, Baichuan, Zhipu, Moonshot, Yi, ERNIE, Doubao API keys in any scanner.
4. **Key metadata extraction:** No tool extracts org, project, rate limits, or spend from detected LLM keys.
5. **Agentic AI context:** With AI agents increasingly using API keys, there's a growing need for scanners that understand multi-key configurations (e.g., an agent with OpenAI + Anthropic + Serp API keys).
6. **Vibe coding exposure:** VibeFactory's scanner addresses the problem of API keys exposed in frontend JavaScript by vibe-coded apps, but this is still nascent.
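
To make items 1, 2, and 5 concrete, here is a minimal Python sketch of an LLM-focused detector. It reuses a handful of the regexes from Section 4.1 and scans a multi-key configuration line by line. The names `PATTERNS`, `redact`, `scan_text`, and `build_models_request` are hypothetical helpers introduced for illustration, not part of any tool surveyed here. The request builder assumes the OpenAI `GET /v1/models` check from Section 5.1 with a standard bearer token, and it only constructs the request without sending it.

```python
import re
import urllib.request

# Regexes copied from Section 4.1 (illustrative subset, not a complete rule set).
PATTERNS = {
    "openai-project": re.compile(r"sk-proj-[a-zA-Z0-9_-]{80,}"),
    "openai-service-account": re.compile(r"sk-svcacct-[a-zA-Z0-9_-]{80,}"),
    "openai-legacy": re.compile(r"sk-[a-zA-Z0-9]{48}"),
    "anthropic-api": re.compile(r"sk-ant-api03-[a-zA-Z0-9_\-]{93}AA"),
    "groq": re.compile(r"gsk_[a-zA-Z0-9]{48,}"),
}


def redact(key, visible=8):
    """Keep only a short prefix so findings can be logged without leaking the key."""
    return key[:visible] + "..."


def scan_text(text):
    """Return (line_number, provider, redacted_key) for every pattern match."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for provider, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_no, provider, redact(match.group(0))))
    return findings


def build_models_request(api_key, base_url="https://api.openai.com"):
    """Build (but do not send) a 'list models' request for active verification.

    Assumption: the provider accepts 'Authorization: Bearer <key>' on
    GET /v1/models, as OpenAI does. Other providers may need other headers.
    """
    return urllib.request.Request(
        base_url + "/v1/models",
        headers={"Authorization": "Bearer " + api_key},
    )


if __name__ == "__main__":
    # Synthetic keys built from filler characters; none of these are real.
    sample_env = "\n".join([
        "OPENAI_API_KEY=sk-proj-" + "a" * 80,
        "ANTHROPIC_API_KEY=sk-ant-api03-" + "b" * 93 + "AA",
        "GROQ_API_KEY=gsk_" + "c" * 48,
    ])
    for line_no, provider, preview in scan_text(sample_env):
        print(line_no, provider, preview)
```

A real scanner would add keyword context for the prefix-less providers in Section 4.2 and would send the verification request only with explicit authorization, since probing keys one does not own may be prohibited.
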
### 6.2 Scale of the Problem

- **28 million credentials leaked on GitHub in 2025** (Snyk)
- **1,275,105 leaked AI service secrets in 2025** (GitGuardian), up 81% YoY
- **8 of 10 fastest-growing leaked secret categories are AI-related** (GitGuardian)
- Fastest growing: Brave Search API (+1,255%), Supabase (+992%), Firecrawl (+796%)
- Groq keys alone are found at **42.28 per million commits** (GitGuardian)

### 6.3 Competitive Landscape Summary

```
Verification Depth
            |
TruffleHog  | ████████████████ (800+ detectors, deep analysis)
GitGuardian | ████████████ (450+ detectors, commercial)
GitHub      | ██████████ (AI-powered, platform-locked)
Gitleaks    | ████ (150+ regex, no verification)
detect-sec  | ███ (27 plugins, baseline approach)
NoseyParker | ██ (188 rules, ML denoising, retired)
            |
            +------ LLM Provider Coverage ------>

None of these tools provide >15 LLM provider detectors.
The market opportunity is a scanner focused on 50-100+ LLM providers
with active verification, permission analysis, and cost estimation.
```

---

## Sources

### Open-Source Scanner Tools

- [TruffleHog - GitHub](https://github.com/trufflesecurity/trufflehog)
- [TruffleHog Detectors](https://trufflesecurity.com/detectors)
- [Gitleaks - GitHub](https://github.com/gitleaks/gitleaks)
- [Gitleaks Config (gitleaks.toml)](https://github.com/gitleaks/gitleaks/blob/master/config/gitleaks.toml)
- [detect-secrets - GitHub](https://github.com/Yelp/detect-secrets)
- [Nosey Parker - GitHub](https://github.com/praetorian-inc/noseyparker)
- [KeyHacks - GitHub](https://github.com/streaak/keyhacks)
- [Secrets Patterns DB - GitHub](https://github.com/mazen160/secrets-patterns-db)
- [regextokens - GitHub](https://github.com/odomojuli/regextokens)
- [Betterleaks - Gitleaks Successor](https://www.aikido.dev/blog/betterleaks-gitleaks-successor)

### Comparison & Analysis

- [TruffleHog vs Gitleaks Comparison (Jit)](https://www.jit.io/resources/appsec-tools/trufflehog-vs-gitleaks-a-detailed-comparison-of-secret-scanning-tools)
- [Best Secret Scanning Tools 2025 (Aikido)](https://www.aikido.dev/blog/top-secret-scanning-tools)
- [8 Best Secret Scanning Tools 2026 (AppSec Santa)](https://appsecsanta.com/sast-tools/secret-scanning-tools)
- [Secret Scanning Tools 2026 (GitGuardian)](https://blog.gitguardian.com/secret-scanning-tools/)

### API Key Patterns & Validation

- [OpenAI API Key Format Discussion](https://community.openai.com/t/regex-s-to-validate-api-key-and-org-id-format/44619)
- [OpenAI sk-proj Key Format](https://community.openai.com/t/how-to-create-an-api-secret-key-with-prefix-sk-only-always-creates-sk-proj-keys/1263531)
- [Gitleaks OpenAI Regex PR #1780](https://github.com/gitleaks/gitleaks/pull/1780)
- [GitHub Leaked API Keys Patterns](https://gist.github.com/win3zz/0a1c70589fcbea64dba4588b93095855)
- [GitGuardian Groq API Key Detector](https://docs.gitguardian.com/secrets-detection/secrets-detection-engine/detectors/specifics/groq_api_key)

### LLM Key Validation Tools

- [TestMyAPIKey.com](https://www.testmyapikey.com/)
- [SecurityWall API Key Checker](https://securitywall.co/tools/api-key-checker)
- [VibeFactory API Key Scanner](https://vibefactory.ai/api-key-security-scanner)
- [KeyLeak Detector - GitHub](https://github.com/Amal-David/keyleak-detector)

### LLM Provider Lists

- [LiteLLM Providers (107)](https://docs.litellm.ai/docs/providers)
- [Langbase Supported Providers](https://langbase.com/docs/supported-models-and-providers)
- [LLM-Interface API Keys Doc](https://github.com/samestrin/llm-interface/blob/main/docs/api-keys.md)
- [Artificial Analysis Provider Leaderboard](https://artificialanalysis.ai/leaderboards/providers)
- [Top LLM API Providers 2026 (Future AGI)](https://futureagi.substack.com/p/top-11-llm-api-providers-in-2026)

### GitHub Secret Scanning

- [GitHub Supported Secret Scanning Patterns](https://docs.github.com/en/code-security/secret-scanning/introduction/supported-secret-scanning-patterns)
- [GitHub Adds 37 New Detectors (March 2026)](https://devops.com/github-adds-37-new-secret-detectors-in-march-extends-scanning-to-ai-coding-agents/)
- [GitHub Secret Scanning Coverage Update](https://github.blog/changelog/2026-03-31-github-secret-scanning-nine-new-types-and-more/)

### Market Data

- [State of Secrets Sprawl 2026 (GitGuardian/Hacker News)](https://thehackernews.com/2026/03/the-state-of-secrets-sprawl-2026-9.html)
- [Why 28M Credentials Leaked on GitHub in 2025 (Snyk)](https://snyk.io/articles/state-of-secrets/)
- [GitGuardian AI Security](https://www.gitguardian.com/agentic-ai-security)