Populate the GitHub source with 50 production dork queries covering every provider
category. Each dork is a real GitHub Code Search query formatted per the Dork schema
from Plan 08-01. The files are mirrored into `dorks/github/` (user-visible) and
`pkg/dorks/definitions/github/` (go:embed target) per the Phase 1 dual-location
pattern.
Purpose: Roughly a third of the 150+ dork requirement (DORK-02) lives here. GitHub is the
largest single source because it is the primary live executor (Plan 08-05) and
because leaked keys overwhelmingly show up in .env/config files.
Output: 50 GitHub dorks, embedded and loadable.
@.planning/phases/08-dork-engine/08-CONTEXT.md
@.planning/phases/08-dork-engine/08-01-PLAN.md
@pkg/providers/definitions/openai.yaml
@pkg/dorks/schema.go
Task 1: 25 GitHub dorks — frontier + specialized categories
pkg/dorks/definitions/github/frontier.yaml,
pkg/dorks/definitions/github/specialized.yaml,
dorks/github/frontier.yaml,
dorks/github/specialized.yaml
Create frontier.yaml and specialized.yaml using the YAML list format supported
by the loader: each file is a single YAML document whose top level is a list of
Dork entries. If the loader from 08-01 expects one Dork per file, check
pkg/dorks/loader.go and update it to also accept a list. Preferred: the loader
accepts either a `type dorkFile struct { Dorks []Dork }` wrapper or a top-level
list. These files use the top-level list form.
File format (list of Dork):
```yaml
- id: openai-github-envfile
  name: "OpenAI API Key in .env files"
  source: github
  category: frontier
  query: 'sk-proj- extension:env'
  description: "Finds OpenAI project keys committed in .env files"
  tags: [openai, env, tier1]
- id: openai-github-pyfile
  ...
```
**frontier.yaml — 15 dorks** covering Tier 1/2 providers. Each provider gets
1-2 dorks. Use real, validated prefixes from pkg/providers/definitions/*.yaml:
- openai-github-envfile: `sk-proj- extension:env`
- openai-github-pyfile: `sk-proj- extension:py`
- openai-github-jsonfile: `sk-proj- extension:json`
- anthropic-github-envfile: `sk-ant-api03- extension:env`
- anthropic-github-pyfile: `sk-ant-api03- extension:py`
- google-ai-github-envfile: `AIzaSy extension:env "GOOGLE_API_KEY"`
- google-ai-github-jsonfile: `AIzaSy extension:json "generativelanguage"`
- azure-openai-envfile: `AZURE_OPENAI_KEY extension:env`
- aws-bedrock-envfile: `AKIA extension:env "bedrock"`
- xai-envfile: `xai- extension:env`
- cohere-envfile: `COHERE_API_KEY extension:env`
- mistral-envfile: `MISTRAL_API_KEY extension:env`
- groq-envfile: `gsk_ extension:env`
- together-envfile: `TOGETHER_API_KEY extension:env`
- replicate-envfile: `r8_ extension:env`
All entries use category: frontier and tags in the style of the example above
(provider, file type, tier). Each query MUST be a literal GitHub Code Search
query, with no templating.
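The constraints above (unique ids, one category per file, literal queries) can be checked mechanically. The following is a minimal sketch, not the project's `Validate()`: the `Dork` struct here is a hypothetical stand-in for the type in pkg/dorks/schema.go, `checkDorks` is an illustrative helper name, and treating `{{` or `${` as a sign of templating is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Dork is a hypothetical stand-in carrying only the fields this check needs;
// the real struct in pkg/dorks/schema.go may differ.
type Dork struct {
	ID       string
	Category string
	Query    string
}

// checkDorks returns one message per problem found: an empty or duplicate id,
// a category other than wantCategory, an empty query, or a query that
// contains template markers (assumed here to be "{{" or "${").
func checkDorks(dorks []Dork, wantCategory string) []string {
	var problems []string
	seen := make(map[string]bool)
	for _, d := range dorks {
		switch {
		case d.ID == "":
			problems = append(problems, "entry with empty id")
		case seen[d.ID]:
			problems = append(problems, "duplicate id: "+d.ID)
		}
		seen[d.ID] = true
		if d.Category != wantCategory {
			problems = append(problems,
				fmt.Sprintf("%s: category %q, want %q", d.ID, d.Category, wantCategory))
		}
		if strings.TrimSpace(d.Query) == "" {
			problems = append(problems, d.ID+": empty query")
		}
		if strings.Contains(d.Query, "{{") || strings.Contains(d.Query, "${") {
			problems = append(problems, d.ID+": query looks templated")
		}
	}
	return problems
}

func main() {
	dorks := []Dork{
		{ID: "openai-github-envfile", Category: "frontier", Query: "sk-proj- extension:env"},
		{ID: "groq-envfile", Category: "frontier", Query: "gsk_ extension:env"},
	}
	for _, p := range checkDorks(dorks, "frontier") {
		fmt.Println(p)
	}
}
```

Running the same helper with `"specialized"` as the wanted category would cover specialized.yaml.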
**specialized.yaml — 10 dorks** covering Tier 3 providers:
- perplexity-envfile: `pplx- extension:env`
- voyage-envfile: `VOYAGE_API_KEY extension:env`
- jina-envfile: `jina_ extension:env`
- assemblyai-envfile: `ASSEMBLYAI_API_KEY extension:env`
- deepgram-envfile: `DEEPGRAM_API_KEY extension:env`
- elevenlabs-envfile: `ELEVENLABS_API_KEY extension:env`
- stability-envfile: `sk-stability- extension:env`
- huggingface-envfile: `hf_ extension:env`
- perplexity-config: `pplx- filename:config.yaml`
- deepgram-config: `DEEPGRAM filename:.env.local`
All entries use category: specialized.
Write identical content to both `pkg/dorks/definitions/github/{file}.yaml`
and `dorks/github/{file}.yaml`. The pkg/ copy is for go:embed, the dorks/
copy is user-visible.
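Because the two locations must stay byte-identical, a small drift check may be useful. This is a sketch under assumptions: `readYAMLDir` and `diffMirrors` are hypothetical helper names, only `*.yaml` files directly inside each directory are compared, and the program is assumed to run from the repository root.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// readYAMLDir loads every *.yaml file directly inside dir, keyed by base name.
// A directory with no matching files yields an empty map.
func readYAMLDir(dir string) (map[string][]byte, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.yaml"))
	if err != nil {
		return nil, err
	}
	files := make(map[string][]byte, len(paths))
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			return nil, err
		}
		files[filepath.Base(p)] = data
	}
	return files, nil
}

// diffMirrors returns the sorted names of files that are missing from one
// side or whose bytes differ between the two sides.
func diffMirrors(a, b map[string][]byte) []string {
	var out []string
	for name, data := range a {
		other, ok := b[name]
		if !ok || !bytes.Equal(data, other) {
			out = append(out, name)
		}
	}
	for name := range b {
		if _, ok := a[name]; !ok {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	embedded, err := readYAMLDir("pkg/dorks/definitions/github")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	visible, err := readYAMLDir("dorks/github")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if diff := diffMirrors(embedded, visible); len(diff) > 0 {
		fmt.Println("out of sync:", diff)
		os.Exit(1)
	}
	fmt.Println("mirrors match")
}
```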
**Adapt loader if needed.** If 08-01 wrote `yaml.Unmarshal(data, &Dork{})`
(single dork per file), change to:
```go
// Decode the file as a top-level list of Dork entries.
var list []Dork
if err := yaml.Unmarshal(data, &list); err != nil {
	return err
}
dorks = append(dorks, list...)
```
Run `go test ./pkg/dorks/...` to confirm.
cd /home/salva/Documents/apikey && go test ./pkg/dorks/... && { go run ./cmd/... 2>&1 || true; } && awk 'FNR==1{print FILENAME}/^- id:/{c++}END{print "count:",c}' pkg/dorks/definitions/github/frontier.yaml pkg/dorks/definitions/github/specialized.yaml
25 dorks loaded, all pass Validate(), tests pass.
Task 2: 25 GitHub dorks — infrastructure + emerging + enterprise
pkg/dorks/definitions/github/infrastructure.yaml,
pkg/dorks/definitions/github/emerging.yaml,
pkg/dorks/definitions/github/enterprise.yaml,
dorks/github/infrastructure.yaml,
dorks/github/emerging.yaml,
dorks/github/enterprise.yaml
Create six YAML files (three pairs) using the same list format as Task 1.
cd /home/salva/Documents/apikey && go test ./pkg/dorks/... && grep -c '^- id:' pkg/dorks/definitions/github/*.yaml | awk -F: '{s+=$NF}END{print "total github dorks:",s; if(s<50) exit 1}'
50 total GitHub dorks across 5 category files; the loader picks them all up and the count check passes.
`cd /home/salva/Documents/apikey && go test ./pkg/dorks/...` passes
Registry reports >= 50 dorks via a throwaway main or test assertion.
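One possible shape for the throwaway main is sketched below. It does not call the registry, whose API is not shown here; instead it counts top-level `- id:` lines in the embedded YAML files, mirroring the grep check. The `countEntries` helper name is hypothetical, and the program is assumed to run from the repository root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// countEntries counts lines that start with "- id:" at column 0, the same
// pattern the grep check uses for top-level list entries.
func countEntries(yamlText string) int {
	n := 0
	for _, line := range strings.Split(yamlText, "\n") {
		if strings.HasPrefix(line, "- id:") {
			n++
		}
	}
	return n
}

func main() {
	const want = 50
	paths, err := filepath.Glob("pkg/dorks/definitions/github/*.yaml")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	total := 0
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		n := countEntries(string(data))
		fmt.Printf("%s: %d\n", p, n)
		total += n
	}
	fmt.Println("total github dorks:", total)
	if total < want {
		os.Exit(1) // fewer entries than the plan requires
	}
}
```

A line count only shows that the entries exist in the files; a test assertion against the registry would additionally confirm that they load and validate.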
<success_criteria>
50 GitHub dorks loadable via pkg/dorks.NewRegistry()