docs(01-02): complete provider registry plan

- SUMMARY.md: schema validation + embed loader + Aho-Corasick registry
- STATE.md: updated progress (20%), decisions, metrics
- ROADMAP.md: phase 01 in-progress (1/5 summaries)
- REQUIREMENTS.md: marked CORE-02, CORE-03, CORE-06, PROV-10 complete
salvacybersec
2026-04-05 00:13:03 +03:00
parent a9859b3384
commit 62fdb14162
4 changed files with 187 additions and 9 deletions


@@ -10,11 +10,11 @@ Requirements for initial release. Each maps to roadmap phases.
### Core Engine
- [ ] **CORE-01**: Scanner engine detects API keys using keyword pre-filtering + regex matching pipeline
-- [ ] **CORE-02**: Provider definitions loaded from YAML files embedded at compile time via Go embed
-- [ ] **CORE-03**: Provider registry manages 108+ provider definitions with pattern, keyword, confidence, and verify metadata
+- [x] **CORE-02**: Provider definitions loaded from YAML files embedded at compile time via Go embed
+- [x] **CORE-03**: Provider registry manages 108+ provider definitions with pattern, keyword, confidence, and verify metadata
- [ ] **CORE-04**: Entropy analysis as secondary signal for low-confidence providers (generic key formats)
- [ ] **CORE-05**: Worker pool parallelism with configurable worker count (default: CPU count)
-- [ ] **CORE-06**: Aho-Corasick keyword pre-filter runs before regex for 10x performance on large files
+- [x] **CORE-06**: Aho-Corasick keyword pre-filter runs before regex for 10x performance on large files
- [ ] **CORE-07**: mmap-based large file reading for memory efficiency
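
The pipeline described by CORE-01 and CORE-06 (keyword pre-filter first, regex only for providers whose keyword appeared) could look roughly like the sketch below. It is not the project's code: the `provider` struct, the two example providers, their keywords and patterns, and all function names are hypothetical, and the automaton is a small hand-written Aho-Corasick over bytes rather than whatever library the registry actually uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// provider is a hypothetical, simplified provider definition.
type provider struct {
	name    string
	keyword string
	pattern *regexp.Regexp
}

// node is one state of the Aho-Corasick automaton.
type node struct {
	next map[byte]int // goto transitions
	fail int          // failure link (longest proper suffix state)
	out  []int        // indices of providers whose keyword ends here
}

// demoProviders returns two made-up providers for illustration only.
func demoProviders() []provider {
	return []provider{
		{"example-a", "sk-", regexp.MustCompile(`sk-[A-Za-z0-9]{16}`)},
		{"example-b", "ghx_", regexp.MustCompile(`ghx_[A-Za-z0-9]{8}`)},
	}
}

// buildAutomaton builds the keyword trie, then sets failure links by BFS.
// Keywords are assumed to be non-empty.
func buildAutomaton(providers []provider) []node {
	nodes := []node{{next: map[byte]int{}}}
	for i, p := range providers {
		cur := 0
		for j := 0; j < len(p.keyword); j++ {
			c := p.keyword[j]
			nxt, ok := nodes[cur].next[c]
			if !ok {
				nodes = append(nodes, node{next: map[byte]int{}})
				nxt = len(nodes) - 1
				nodes[cur].next[c] = nxt
			}
			cur = nxt
		}
		nodes[cur].out = append(nodes[cur].out, i)
	}
	queue := []int{}
	for _, child := range nodes[0].next {
		queue = append(queue, child) // depth-1 states fail to the root
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for c, child := range nodes[cur].next {
			f := nodes[cur].fail
			for {
				if t, ok := nodes[f].next[c]; ok {
					nodes[child].fail = t
					break
				}
				if f == 0 {
					break // no suffix matches; fail stays at the root
				}
				f = nodes[f].fail
			}
			// Inherit keywords that end at the failure state.
			nodes[child].out = append(nodes[child].out, nodes[nodes[child].fail].out...)
			queue = append(queue, child)
		}
	}
	return nodes
}

// matchedProviders makes one pass over data and reports which provider
// keywords occur anywhere in it.
func matchedProviders(nodes []node, data []byte) map[int]bool {
	hits := map[int]bool{}
	state := 0
	for _, c := range data {
		for {
			if t, ok := nodes[state].next[c]; ok {
				state = t
				break
			}
			if state == 0 {
				break
			}
			state = nodes[state].fail
		}
		for _, idx := range nodes[state].out {
			hits[idx] = true
		}
	}
	return hits
}

// scan runs a provider's regex only if its keyword was seen by the pre-filter.
func scan(providers []provider, nodes []node, data []byte) []string {
	var findings []string
	hits := matchedProviders(nodes, data)
	for i, p := range providers {
		if !hits[i] {
			continue
		}
		for _, m := range p.pattern.FindAll(data, -1) {
			findings = append(findings, p.name+":"+string(m))
		}
	}
	return findings
}

func main() {
	ps := demoProviders()
	nodes := buildAutomaton(ps)
	fmt.Println(scan(ps, nodes, []byte("export KEY=sk-abcdefgh12345678")))
}
```

The point of the design is that the input is walked once for all keywords together, so the cost of the regex stage scales with the number of providers that plausibly match rather than with the full registry size.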
### Providers
@@ -28,7 +28,7 @@ Requirements for initial release. Each maps to roadmap phases.
- [ ] **PROV-07**: 10 Tier 7 Code/Dev Tools provider definitions (GitHub Copilot, Cursor, Tabnine, Codeium, Sourcegraph, CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI)
- [ ] **PROV-08**: 10 Tier 8 Self-Hosted provider definitions (Ollama, vLLM, LocalAI, LM Studio, llama.cpp, GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan AI)
- [ ] **PROV-09**: 8 Tier 9 Enterprise provider definitions (Salesforce Einstein, ServiceNow, SAP AI Core, Palantir, Databricks, Snowflake, Oracle GenAI, HPE GreenLake)
-- [ ] **PROV-10**: Provider YAML schema includes format_version and last_verified date for pattern health tracking
+- [x] **PROV-10**: Provider YAML schema includes format_version and last_verified date for pattern health tracking
### Input Sources
@@ -288,7 +288,7 @@ Requirements for initial release. Each maps to roadmap phases.
| CORE-01, CORE-02, CORE-03, CORE-04, CORE-05, CORE-06, CORE-07 | Phase 1 | Pending |
| STOR-01, STOR-02, STOR-03 | Phase 1 | Pending |
| CLI-01, CLI-02, CLI-03, CLI-04, CLI-05 | Phase 1 | Pending |
-| PROV-10 | Phase 1 | Pending |
+| PROV-10 | Phase 1 | Complete |
| PROV-01, PROV-02 | Phase 2 | Pending |
| PROV-03, PROV-04, PROV-05, PROV-06, PROV-07, PROV-08, PROV-09 | Phase 3 | Pending |
| INPUT-01, INPUT-02, INPUT-03, INPUT-04, INPUT-05, INPUT-06 | Phase 4 | Pending |