# Architecture Patterns

**Domain:** API key / secret scanner with OSINT recon, web dashboard, and notification system

**Project:** KeyHunter

**Researched:** 2026-04-04

**Overall confidence:** HIGH (TruffleHog/Gitleaks internals verified via DeepWiki and official repos; Go patterns verified via official docs and production examples)

---

## Recommended Architecture

KeyHunter is a single Go binary composed of ten discrete subsystems. Each subsystem owns its own package boundary. Communication between subsystems flows through well-defined interfaces, not direct struct coupling.

```
CLI (Cobra)
  |
  +---> Scanning Engine (regex + entropy + Aho-Corasick pre-filter)
  |       +--> Provider Registry (YAML definitions, embed.FS at compile time)
  |       +--> Source Adapters (file, dir, git, URL, stdin, clipboard)
  |       +--> Worker Pool (goroutine pool + buffered channels)
  |       +--> Verification Engine (opt-in, per-provider HTTP endpoints)
  |
  +---> OSINT / Recon Engine (80+ sources, category-based orchestration)
  |       +--> Source Modules (one module per category, rate-limited)
  |       +--> Dork Engine (YAML dorks, multi-search-engine dispatch)
  |       +--> Recon Worker Pool (per-source concurrency + throttle)
  |
  +---> Import Adapters (TruffleHog JSON, Gitleaks JSON -> internal Finding)
  +---> Storage Layer (SQLite via go-sqlcipher, AES-256 at rest)
  +---> Web Dashboard (htmx + Tailwind, Go templates, embed.FS, SSE)
  +---> Notification System (Telegram bot, long polling, command router)
  +---> Scheduler (gocron, cron expressions, persisted job state)
```

---

## Component Boundaries

### 1. CLI Layer (`pkg/cli`)

**Responsibility:** Command routing only. Zero business logic. Parses flags, wires subcommands, starts the correct subsystem. Uses Cobra (industry standard for Go CLIs, used by TruffleHog v3 and Gitleaks).

**Communicates with:** All subsystems as the top-level entry point.

**Key commands:** `scan`, `verify`, `import`, `recon`, `keys`, `serve`, `dorks`, `providers`, `config`, `hook`, `schedule`.

**Build notes:** Define the Cobra subcommand tree in its own package (`cmd/`); keep `main.go` under 30 lines.
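
To make the routing-only rule concrete, the sketch below shows flag parsing and dispatch with only the standard library; the real tree uses Cobra, and `dispatch` is a hypothetical name standing in for the Cobra command wiring:

```go
package main

import (
	"flag"
	"fmt"
)

// dispatch routes a parsed command line to the right subsystem and
// returns a human-readable summary. Zero business logic lives here.
func dispatch(args []string) (string, error) {
	if len(args) == 0 {
		return "", fmt.Errorf("usage: keyhunter <scan|recon|serve> [flags]")
	}
	switch cmd := args[0]; cmd {
	case "scan":
		fs := flag.NewFlagSet("scan", flag.ContinueOnError)
		path := fs.String("path", ".", "target path")
		verify := fs.Bool("verify", false, "verify found keys")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		// Real code would hand a ScanConfig to the Scanning Engine here.
		return fmt.Sprintf("scan path=%s verify=%v", *path, *verify), nil
	default:
		return "", fmt.Errorf("unknown command %q", cmd)
	}
}

func main() {
	out, err := dispatch([]string{"scan", "--path", "./repo", "--verify"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // scan path=./repo verify=true
}
```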

---

### 2. Provider Registry (`pkg/providers`)

**Responsibility:** Load and serve provider definitions. Providers are YAML files embedded at compile time via `//go:embed providers/*.yaml`. The registry parses them on startup into an in-memory slice of `Provider` structs.

**Provider YAML schema:**

```yaml
name: openai
version: 1
keywords: ["sk-proj-", "openai"]
patterns:
  - regex: 'sk-proj-[A-Za-z0-9]{48}'
    entropy_min: 3.5
    confidence: high
verify:
  method: GET
  url: https://api.openai.com/v1/models
  headers:
    Authorization: "Bearer {KEY}"
  valid_status: [200]
  invalid_status: [401, 403]
```

**Communicates with:** Scanning Engine (provides patterns and keywords), Verification Engine (provides verify endpoint specs), Web Dashboard (provider listing pages).

**Build rationale:** Must be implemented first. Everything downstream depends on it. No external loading at runtime — compile-time embed gives single binary advantage TruffleHog documented as a key design goal.
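
A minimal sketch of what the in-memory registry could look like after parsing; struct fields mirror the YAML schema above, but the names and the hardcoded provider are illustrative stand-ins for the embedded YAML:

```go
package main

import (
	"fmt"
	"strings"
)

// Provider mirrors the YAML schema above (fields abridged).
type Provider struct {
	Name     string
	Keywords []string
	Patterns []string // regex sources; compiled once at startup in real code
}

// Registry holds all providers plus a keyword -> providers index
// that the keyword pre-filter consults.
type Registry struct {
	Providers []Provider
	byKeyword map[string][]*Provider
}

func NewRegistry(providers []Provider) *Registry {
	r := &Registry{Providers: providers, byKeyword: map[string][]*Provider{}}
	for i := range r.Providers {
		p := &r.Providers[i]
		for _, kw := range p.Keywords {
			k := strings.ToLower(kw)
			r.byKeyword[k] = append(r.byKeyword[k], p)
		}
	}
	return r
}

// ForKeyword returns the providers whose detectors should run when a
// chunk contains the given keyword.
func (r *Registry) ForKeyword(kw string) []*Provider {
	return r.byKeyword[strings.ToLower(kw)]
}

func main() {
	// In the real binary these would be parsed from embedded YAML files.
	reg := NewRegistry([]Provider{
		{Name: "openai", Keywords: []string{"sk-proj-", "openai"}},
	})
	fmt.Println(reg.ForKeyword("sk-proj-")[0].Name) // openai
}
```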

---

### 3. Scanning Engine (`pkg/engine`)

**Responsibility:** Core detection pipeline. Replicates TruffleHog v3's three-stage approach: keyword pre-filter → regex/entropy detection → optional verification. Manages the goroutine worker pool.

**Pipeline stages (mirrors TruffleHog's architecture):**

```
Source Adapter → chunker → [keyword pre-filter: Aho-Corasick]
                                 |
                                 v
                   [detector workers] (8x CPU multiplier)
                                 |
                                 v
              [verification workers] (1x multiplier, opt-in)
                                 |
                                 v
                          results channel
                                 |
                                 v
                        [output formatter]
```
**Aho-Corasick pre-filter:** Before running expensive regex, scan chunks for keyword presence. TruffleHog documented this delivers approximately 10x performance improvement on large codebases. Each provider supplies `keywords` — the Aho-Corasick automaton is built from all keywords at startup.
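
The shape of the pre-filter stage can be sketched as a goroutine between two channels; `strings.Contains` stands in here for the Aho-Corasick automaton, which would match every keyword in a single pass over the chunk:

```go
package main

import (
	"fmt"
	"strings"
)

type Chunk struct {
	Source string
	Data   []byte
}

// prefilter forwards only chunks containing at least one provider
// keyword. A real build would use an Aho-Corasick automaton compiled
// from all keywords at startup; strings.Contains is a stand-in.
func prefilter(keywords []string, in <-chan Chunk, out chan<- Chunk) {
	defer close(out)
	for c := range in {
		data := strings.ToLower(string(c.Data))
		for _, kw := range keywords {
			if strings.Contains(data, kw) {
				out <- c
				break
			}
		}
	}
}

func main() {
	in := make(chan Chunk, 4)
	out := make(chan Chunk, 4)
	in <- Chunk{Source: "a.txt", Data: []byte("key = sk-proj-abc123")}
	in <- Chunk{Source: "b.txt", Data: []byte("nothing to see here")}
	close(in)
	go prefilter([]string{"sk-proj-"}, in, out)
	for c := range out {
		fmt.Println(c.Source) // only a.txt survives the filter
	}
}
```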

**Channel-based communication:**

- `chunksChan chan Chunk` — raw chunks from sources
- `detectableChan chan Chunk` — keyword-matched chunks only
- `resultsChan chan Finding` — confirmed detections
- All channels are buffered to prevent goroutine starvation.

**Source adapters implement a single interface:**

```go
type Source interface {
	Name() string
	Chunks(ctx context.Context, ch chan<- Chunk) error
}
```

**Concrete source adapters:** `FileSource`, `DirSource`, `GitSource`, `URLSource`, `StdinSource`, `ClipboardSource`.

**Communicates with:** Provider Registry (fetches detector specs), Verification Engine (forwards candidates), Storage Layer (persists findings), Output Formatter (writes CLI results).

---

### 4. Verification Engine (`pkg/verify`)
**Responsibility:** Active key validation. Off by default, activated with `--verify`. Makes HTTP calls to provider-defined endpoints with the discovered key. Classifies results as `verified` (valid key), `invalid` (key rejected), or `unknown` (endpoint unreachable/ambiguous).
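
The classification step can be sketched as a pure function over the provider's `valid_status` / `invalid_status` lists (the function name is hypothetical):

```go
package main

import "fmt"

// classify maps an HTTP status from a provider's verify endpoint onto
// the three result states. A status in neither list is ambiguous.
func classify(status int, valid, invalid []int) string {
	for _, s := range valid {
		if status == s {
			return "verified"
		}
	}
	for _, s := range invalid {
		if status == s {
			return "invalid"
		}
	}
	return "unknown"
}

func main() {
	valid, invalid := []int{200}, []int{401, 403}
	fmt.Println(classify(200, valid, invalid)) // verified
	fmt.Println(classify(401, valid, invalid)) // invalid
	fmt.Println(classify(500, valid, invalid)) // unknown
}
```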

**Caching:** Results are cached in-memory per session to avoid duplicate API calls for the same key. Cache key = `provider:key_hash`.

**Rate limiting:** A per-provider token-bucket rate limiter prevents triggering account lockouts or abuse detection.

**Communicates with:** Scanning Engine (receives candidates), Storage Layer (updates finding status), Notification System (triggers alerts on verified finds).

---

### 5. OSINT / Recon Engine (`pkg/recon`)

**Responsibility:** Orchestrates searches across 80+ external sources in 18 categories. Acts as a dispatcher: receives a target query, fans out to all configured source modules, aggregates raw text results, and pipes them into the Scanning Engine.

**Category-module mapping:**

```
pkg/recon/
  sources/
    iot/        (shodan, censys, zoomeye, fofa, netlas, binaryedge)
    code/       (github, gitlab, bitbucket, huggingface, kaggle, ...)
    search/     (google, bing, duckduckgo, yandex, brave dorking)
    paste/      (pastebin, dpaste, hastebin, rentry, ix.io, ...)
    registry/   (npm, pypi, rubygems, crates.io, maven, nuget, ...)
    container/  (docker hub layers, k8s configs, terraform, helm)
    cloud/      (s3, gcs, azure blob, do spaces, minio)
    cicd/       (travis, circleci, github actions, jenkins)
    archive/    (wayback machine, commoncrawl)
    forum/      (stackoverflow, reddit, hackernews, dev.to, medium)
    collab/     (notion, confluence, trello)
    frontend/   (source maps, webpack, exposed .env, swagger)
    log/        (elasticsearch, grafana, sentry)
    intel/      (virustotal, intelx, urlhaus)
    mobile/     (apk decompile)
    dns/        (crt.sh, endpoint probing)
    api/        (postman, swaggerhub)
```

**Each source module implements:**

```go
type ReconSource interface {
	Name() string
	Category() string
	Search(ctx context.Context, query string, opts Options) ([]string, error)
	RateLimit() rate.Limit
}
```

**Orchestrator behavior:**

1. Fan out to all enabled source modules concurrently.
2. Each module uses its own `rate.Limiter` (respects per-source limits).
3. Stealth mode adds jitter delays and respects robots.txt.
4. Aggregated text results → chunked → fed to Scanning Engine.

**Dork Engine (`pkg/recon/dorks`):** Separate sub-component. Reads YAML dork definitions, formats them per search-engine syntax, and dispatches them to the search source modules.

**Communicates with:** Scanning Engine (sends chunked recon text for detection), Storage Layer (persists recon job state and results), CLI Layer.

---

### 6. Import Adapters (`pkg/importers`)

**Responsibility:** Parse external tool JSON output (TruffleHog, Gitleaks) and convert it to internal `Finding` structs for storage. Decouples third-party formats from the internal model.

**Adapters:**

- `TruffleHogAdapter` — parses TruffleHog v3 JSON output
- `GitleaksAdapter` — parses Gitleaks v8 JSON output
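
A sketch of the TruffleHog side of the mapping, assuming the `DetectorName` / `Verified` / `Raw` field names of TruffleHog v3's JSON-lines output (check them against the current release before relying on this):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// thResult models the subset of a TruffleHog v3 JSON result line that
// we need (field names should be verified against the current release).
type thResult struct {
	DetectorName string `json:"DetectorName"`
	Verified     bool   `json:"Verified"`
	Raw          string `json:"Raw"`
}

// Finding is KeyHunter's internal model (abridged).
type Finding struct {
	Provider string
	Key      string
	Status   string
}

// FromTruffleHog converts one TruffleHog result line into a Finding.
func FromTruffleHog(line []byte) (Finding, error) {
	var r thResult
	if err := json.Unmarshal(line, &r); err != nil {
		return Finding{}, err
	}
	status := "unverified"
	if r.Verified {
		status = "verified"
	}
	return Finding{Provider: r.DetectorName, Key: r.Raw, Status: status}, nil
}

func main() {
	line := []byte(`{"DetectorName":"OpenAI","Verified":true,"Raw":"sk-proj-abc"}`)
	f, err := FromTruffleHog(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s\n", f.Provider, f.Status) // OpenAI verified
}
```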

**Communicates with:** Storage Layer only (writes normalized findings).

---

### 7. Storage Layer (`pkg/storage`)

**Responsibility:** Persistence. All findings, provider data, recon jobs, scan metadata, dorks, and scheduler state live here. SQLite via go-sqlcipher (AES-256 encryption at rest).

**Schema boundaries:**

```
findings       (id, provider, key_masked, key_encrypted, status, source, path, timestamp, verified)
scans          (id, type, target, started_at, finished_at, finding_count)
recon_jobs     (id, query, categories, started_at, finished_at, source_count)
scheduled_jobs (id, cron_expr, scan_config_json, last_run, next_run, enabled)
settings       (key, value)
```
**Key masking:** Full keys are AES-256 encrypted in `key_encrypted`. Display value in `key_masked` is truncated to first 8 / last 4 characters. `--unmask` flag decrypts on access.
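
The masking rule can be sketched as follows; the 13-character cutoff (below which the value is fully starred) is an illustrative choice:

```go
package main

import (
	"fmt"
	"strings"
)

// mask keeps the first 8 and last 4 characters of a key for display;
// the full key is stored only in the encrypted column. Values too
// short for the 8/4 treatment are fully starred.
func mask(key string) string {
	if len(key) <= 12 {
		return strings.Repeat("*", len(key))
	}
	return key[:8] + strings.Repeat("*", len(key)-12) + key[len(key)-4:]
}

func main() {
	fmt.Println(mask("sk-proj-ABCDEFGHIJKLMNOPQRSTUVWX"))
	// sk-proj-********************UVWX
}
```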

**Communicates with:** All subsystems that need persistence (Scanning Engine, Recon Engine, Import Adapters, Dashboard, Scheduler, Notification System).

---

### 8. Web Dashboard (`pkg/dashboard`)

**Responsibility:** Embedded web UI. Go templates + htmx + Tailwind CSS, all embedded via `//go:embed` at compile time. No external JS framework. Server-sent events (SSE) for live scan progress without WebSocket complexity.

**Pages:** scans, keys, recon, providers, dorks, settings.

**HTTP server:** The standard library `net/http` is sufficient; no framework overhead is needed at this scale.

**SSE pattern for live updates:**

```go
// Scan progress is pushed to the browser via SSE.
// The browser uses the htmx SSE extension to update the scan status table.
```

**Communicates with:** Storage Layer (reads/writes), Scanning Engine (triggers scans, receives SSE events), Recon Engine (triggers recon jobs).

---

### 9. Notification System (`pkg/notify`)

**Responsibility:** Telegram bot integration. Sends alerts on verified findings and responds to bot commands. Uses long polling, which is preferred for a single-instance local tool: no public URL needed, and simpler setup than webhooks.

**Bot commands map to CLI commands:** `/scan`, `/verify`, `/recon`, `/status`, `/stats`, `/subscribe`, `/key`.
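
The command router might be sketched as a map from command to handler; the handler bodies are stand-ins for the real CLI code paths:

```go
package main

import (
	"fmt"
	"strings"
)

// handlers map bot commands onto the same code paths the CLI uses;
// the bodies here are stand-ins.
var handlers = map[string]func(args []string) string{
	"/status": func(args []string) string { return "scanner idle" },
	"/scan":   func(args []string) string { return "scan queued: " + strings.Join(args, " ") },
}

// route parses an incoming message ("/scan ./repo") and dispatches it.
func route(text string) string {
	fields := strings.Fields(text)
	if len(fields) == 0 {
		return "empty message"
	}
	// Telegram appends "@botname" to commands in group chats; strip it.
	cmd := strings.SplitN(fields[0], "@", 2)[0]
	h, ok := handlers[cmd]
	if !ok {
		return "unknown command " + cmd
	}
	return h(fields[1:])
}

func main() {
	fmt.Println(route("/scan ./repo")) // scan queued: ./repo
	fmt.Println(route("/status"))      // scanner idle
}
```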

**Subscribe pattern:** Users `/subscribe` to be notified when verified findings are discovered. Subscriber chat IDs are stored in the SQLite settings table.

**Communicates with:** Storage Layer (reads findings, subscriber list), Scanning Engine (receives verified finding events).

---

### 10. Scheduler (`pkg/scheduler`)

**Responsibility:** Cron-based recurring scan scheduling. Uses `go-co-op/gocron` (the actively maintained fork of jasonlvhit/gocron). Scheduled job definitions are persisted in SQLite so they survive restarts.

**Communicates with:** Storage Layer (reads/writes job definitions), Scanning Engine (triggers scans), Notification System (notifies on scan completion).

---

## Data Flow

### Flow 1: CLI Scan

```
User: keyhunter scan --path ./repo --verify

CLI Layer
  -> parses flags, builds ScanConfig
  -> calls Engine.Scan(ctx, config)

Scanning Engine
  -> GitSource.Chunks() produces chunks onto chunksChan
  -> Aho-Corasick filter passes keyword-matched chunks to detectableChan
  -> Detector Workers apply provider patterns, produce candidates on resultsChan
  -> Verification Workers (if --verify) call provider verify endpoints
  -> Findings written to Storage Layer
  -> Output Formatter writes colored table / JSON / SARIF to stdout
```

### Flow 2: Recon Job

```
User: keyhunter recon --query "OPENAI_API_KEY" --categories code,paste,search

CLI Layer
  -> calls Recon Engine with query + categories

Recon Engine
  -> fans out to all enabled source modules for selected categories
  -> each module rate-limits itself, fetches content
  -> raw text results chunked and sent to Scanning Engine via internal channel

Scanning Engine
  -> same pipeline as Flow 1
  -> findings tagged with recon source metadata
  -> persisted to Storage Layer
```

### Flow 3: Web Dashboard Live Scan

```
Browser: POST /api/scan (hx-post from htmx)
  -> Dashboard handler creates scan record in Storage Layer
  -> Dashboard handler starts Scanning Engine in goroutine
  -> Browser subscribes to SSE endpoint GET /api/scan/:id/events
  -> Engine emits progress events to SSE channel
  -> htmx SSE extension updates scan status table in real time
  -> On completion, full findings table rendered via hx-get
```

### Flow 4: Scheduled Scan + Telegram Notification

```
Scheduler (gocron)
  -> fires job at cron time
  -> reads ScanConfig from SQLite scheduled_jobs
  -> triggers Scanning Engine

Scanning Engine
  -> runs scan, persists findings

Notification System
  -> on verified finding: reads subscriber list from SQLite
  -> sends Telegram message to each subscriber via bot API (long poll loop)
```

### Flow 5: Import from External Tool

```
User: keyhunter import --tool trufflehog --file th_output.json

CLI Layer -> Import Adapter (TruffleHogAdapter)
  -> reads JSON, maps to []Finding
  -> writes to Storage Layer
  -> prints import summary to stdout
```

---

## Build Order (Phase Dependencies)

This ordering reflects hard dependencies — a later component cannot be meaningfully built without the earlier ones.

| Order | Component | Depends On | Why First |
|-------|-----------|------------|-----------|
| 1 | Provider Registry | nothing | All other subsystems depend on provider definitions. Must exist before any detection can be designed. |
| 2 | Storage Layer | nothing (schema only) | Findings model must be defined before anything writes to it. |
| 3 | Scanning Engine (core pipeline) | Provider Registry, Storage Layer | Engine is the critical path. Source adapters and worker pool pattern established here. |
| 4 | Verification Engine | Scanning Engine, Provider Registry | Layered on top of scanning; needs provider verify specs. |
| 5 | Output Formatters (table, JSON, SARIF, CSV) | Scanning Engine | Needed to validate scanner output before building anything on top. |
| 6 | Import Adapters | Storage Layer | Self-contained; only needs the storage model. Can run in parallel with 4/5. |
| 7 | OSINT / Recon Engine | Scanning Engine, Storage Layer | Builds on the established scanning pipeline as its consumer. |
| 8 | Dork Engine | Recon Engine (search sources) | Sub-component of Recon; needs search source modules to exist. |
| 9 | Scheduler | Scanning Engine, Storage Layer | Requires engine and persistence. Adds recurring execution on top. |
| 10 | Web Dashboard | Storage Layer, Scanning Engine, Recon Engine | Aggregates all subsystems into the UI; built once the engines exist. |
| 11 | Notification System | Storage Layer, Verification Engine | Triggered by verification events; needs findings and subscriber storage. |

**MVP critical path:** Provider Registry → Storage Layer → Scanning Engine → Verification Engine → Output Formatters.

Everything else (OSINT, Dashboard, Notifications, Scheduler) layers on top of this proven core.

---

## Patterns to Follow

### Pattern 1: Buffered Channel Pipeline (TruffleHog-derived)

**What:** Goroutine stages connected by buffered channels. Each stage has a configurable concurrency multiplier.

**When:** Any multi-stage concurrent processing (scanning, recon aggregation).

**Example:**

```go
// Engine spin-up: buffer sizes tuned to expected throughput per stage.
chunksChan := make(chan Chunk, 1000)
detectableChan := make(chan Chunk, 500)
resultsChan := make(chan Finding, 100)

// Stage goroutines: the detector stage gets an 8x CPU multiplier,
// the verification stage a 1x multiplier (network-bound, opt-in).
for i := 0; i < runtime.NumCPU()*8; i++ {
	go detectorWorker(detectableChan, resultsChan, providers)
}
for i := 0; i < runtime.NumCPU(); i++ {
	go verifyWorker(resultsChan, storage, notify)
}
```

**Why:** Decouples stages, prevents fast producers from blocking slow consumers, and enables independent scaling of each stage.

---

### Pattern 2: Source Interface + Adapter

**What:** All scan inputs implement a single `Source` interface. New sources are added by implementing the interface, not by changing the engine.

**When:** Adding any new input type (new code host, new file format).

**Example:**

```go
type Source interface {
	Name() string
	Chunks(ctx context.Context, ch chan<- Chunk) error
}
```

---

### Pattern 3: YAML Provider with compile-time embed

**What:** Provider definitions live in `providers/*.yaml`, embedded at compile time. No runtime file loading.

**When:** Adding new LLM provider detection support.

**Why:** Single binary distribution. Zero external dependencies at runtime. The community can submit PRs with YAML files — no Go code required to add a provider.

```go
//go:embed providers/*.yaml
var providersFS embed.FS
```

---

### Pattern 4: Rate Limiter per Recon Source

**What:** Each recon source module holds its own `golang.org/x/time/rate.Limiter`. The orchestrator does not centrally throttle.

**When:** All external HTTP calls in the recon engine.

**Why:** Different sources have wildly different rate limits (Shodan: 1 req/s free; GitHub: 30 req/min unauthenticated; Pastebin: no documented limit). Centralizing would set all to the slowest.
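
Production code would reach for `golang.org/x/time/rate`; purely to illustrate the token-bucket mechanics each source carries, here is a dependency-free sketch:

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a minimal token bucket; production code would use
// golang.org/x/time/rate.Limiter instead.
type bucket struct {
	tokens   float64
	capacity float64
	rate     float64 // tokens added per second
	last     time.Time
}

func newBucket(ratePerSec, capacity float64) *bucket {
	return &bucket{tokens: capacity, capacity: capacity, rate: ratePerSec, last: time.Now()}
}

// Allow reports whether one request may proceed now.
func (b *bucket) Allow() bool {
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// Shodan-style source: 1 request/second, burst of 1.
	lim := newBucket(1, 1)
	fmt.Println(lim.Allow()) // true: burst token available
	fmt.Println(lim.Allow()) // false: bucket drained
}
```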

---

### Pattern 5: SSE for Dashboard Live Updates

**What:** Server-Sent Events pushed from a Go HTTP handler to the htmx SSE extension. One-way server→browser push. No WebSocket needed.

**When:** Live scan progress, recon job status.

**Why:** SSE uses standard HTTP, works through proxies, is simpler than WebSockets for one-way push, and is supported natively by the htmx SSE extension.

---

## Anti-Patterns to Avoid

### Anti-Pattern 1: Global State for Provider Registry

**What:** Storing providers as package-level globals loaded once at startup.

**Why bad:** Makes testing impossible without full initialization. Prevents future per-scan provider subsets.

**Instead:** Pass a `*ProviderRegistry` explicitly to the engine constructor.

---

### Anti-Pattern 2: Unbuffered Result Channels

**What:** Using `make(chan Finding)` (unbuffered) for the results pipeline.

**Why bad:** A slow output writer blocks detector workers, collapsing parallelism. TruffleHog's architecture explicitly uses buffered channels to manage thousands of concurrent operations.

**Instead:** Buffer proportional to expected throughput (`make(chan Finding, 1000)`).

---

### Anti-Pattern 3: Direct HTTP in Detector Workers

**What:** Detector goroutines making HTTP calls to verify endpoints inline.

**Why bad:** Verification is slow (network I/O). It would block detector workers, killing throughput.

**Instead:** A separate verification worker pool as a distinct pipeline stage (TruffleHog's design).

---

### Anti-Pattern 4: Runtime YAML Loading for Providers

**What:** Loading provider YAML from the filesystem at scan time.

**Why bad:** Breaks single binary distribution. Users must manage provider files separately. Security risk (external file modification).

**Instead:** `//go:embed providers/*.yaml` at compile time.

---

### Anti-Pattern 5: Storing Plaintext Keys in SQLite

**What:** Storing full API keys as plaintext in the database.

**Why bad:** The database file becomes a credential dump. Any process with file access can read all found keys.

**Instead:** AES-256 encrypt the full key column. Store only a masked version for display. Decrypt on explicit `--unmask` or via an auth-gated dashboard endpoint.

---

### Anti-Pattern 6: Monolithic Recon Orchestrator

**What:** One giant function that loops through all 80+ sources sequentially.

**Why bad:** Recon over 80 sources sequentially would take hours. No per-source error isolation.

**Instead:** Fan-out pattern. Each source module runs concurrently in its own goroutine. Errors are per-source (one failing source doesn't abort the job).

---

## Package Structure
```
keyhunter/
  main.go           (< 30 lines, cobra root init)
  cmd/              (cobra command definitions)
    scan.go
    recon.go
    keys.go
    serve.go
    ...
  pkg/
    providers/      (Provider struct, YAML loader, embed.FS)
    engine/         (scanning pipeline, worker pool, Aho-Corasick)
    sources/        (Source interface + concrete adapters)
      file.go
      dir.go
      git.go
      url.go
      stdin.go
      clipboard.go
    verify/         (verification engine, HTTP client, cache)
    recon/          (recon orchestrator)
      sources/      (ReconSource interface + category modules)
        iot/
        code/
        search/
        paste/
        ...
      dorks/        (dork engine, YAML dork loader)
    importers/      (TruffleHog + Gitleaks JSON adapters)
    storage/        (SQLite layer, go-sqlcipher, schema, migrations)
    dashboard/      (HTTP handlers, Go templates, embed.FS)
      static/       (tailwind CSS, htmx JS — embedded)
      templates/    (HTML templates — embedded)
    notify/         (Telegram bot, long polling, command router)
    scheduler/      (gocron wrapper, SQLite persistence)
    output/         (table, JSON, SARIF, CSV formatters)
    config/         (Config struct, YAML config file, env vars)
  providers/        (YAML provider definitions — embedded at build)
    openai.yaml
    anthropic.yaml
    ...
  dorks/            (YAML dork definitions — embedded at build)
    github.yaml
    google.yaml
    ...
```

---

## Scalability Considerations

| Concern | Single user / local tool | Team / shared instance |
|---------|--------------------------|------------------------|
| Concurrency | Worker pool default: `8x NumCPU` detectors | Configurable via `--concurrency` flag |
| Storage | SQLite handles millions of findings at local scale | SQLite WAL mode for concurrent readers; migrate to PostgreSQL only if needed (out of scope per PROJECT.md) |
| Recon rate limits | Per-source rate limiters; stealth mode adds jitter | API keys / tokens configured per source for higher limits |
| Dashboard | Embedded single-instance; no auth by default | Optionally add basic auth via config for shared deployments |
| Verification | Opt-in; per-provider rate limiting prevents API abuse | Same — no change needed at team scale |

---

## Sources

- DeepWiki TruffleHog engine architecture: https://deepwiki.com/trufflesecurity/trufflehog/2.1-engine-configuration (HIGH confidence — generated from official source)
- TruffleHog v3 official repo: https://github.com/trufflesecurity/trufflehog (HIGH confidence)
- TruffleHog v3 source packages: https://pkg.go.dev/github.com/trufflesecurity/trufflehog/v3/pkg/sources (HIGH confidence)
- Gitleaks official repo: https://github.com/gitleaks/gitleaks (HIGH confidence)
- Go embed package: https://pkg.go.dev/embed (HIGH confidence — official)
- go-co-op/gocron: https://github.com/go-co-op/gocron (HIGH confidence)
- go-sqlcipher (AES-256): https://github.com/mutecomm/go-sqlcipher (MEDIUM confidence — check active maintenance status)
- SQLCipher: https://github.com/sqlcipher/sqlcipher (HIGH confidence)
- SSE with Go + htmx: https://threedots.tech/post/live-website-updates-go-sse-htmx/ (MEDIUM confidence — community blog, well-verified pattern)
- Telego (Telegram bot Go): https://github.com/mymmrac/telego (MEDIUM confidence)
- TruffleHog v3 introduction blog: https://trufflesecurity.com/blog/introducing-trufflehog-v3 (HIGH confidence — official)