docs(13-01): complete package registry sources plan

- SUMMARY.md with 4 sources, 16 tests, 8 files
- STATE.md updated with decisions and metrics
- Requirements RECON-PKG-01, RECON-PKG-02 marked complete
salvacybersec
2026-04-06 12:55:06 +03:00
parent 9907e2497a
commit c595fef148
3 changed files with 116 additions and 7 deletions

@@ -0,0 +1,106 @@
---
phase: 13-osint_package_registries_container_iac
plan: 01
subsystem: recon
tags: [npm, pypi, crates.io, rubygems, package-registry, osint]
requires:
- phase: 10-osint-code-hosting
provides: ReconSource interface, Client, BuildQueries, LimiterRegistry patterns
provides:
- NpmSource searching npm registry JSON API
- PyPISource scraping pypi.org search HTML
- CratesIOSource searching crates.io JSON API with custom User-Agent
- RubyGemsSource searching rubygems.org search.json API
affects: [13-osint_package_registries_container_iac, register.go]
tech-stack:
added: []
patterns: [JSON API source pattern, HTML scraping source pattern with extractAnchorHrefs reuse]
key-files:
created:
- pkg/recon/sources/npm.go
- pkg/recon/sources/npm_test.go
- pkg/recon/sources/pypi.go
- pkg/recon/sources/pypi_test.go
- pkg/recon/sources/cratesio.go
- pkg/recon/sources/cratesio_test.go
- pkg/recon/sources/rubygems.go
- pkg/recon/sources/rubygems_test.go
modified: []
key-decisions:
- "PyPI uses HTML scraping with extractAnchorHrefs (reusing Replit pattern) since PyPI has no public search JSON API"
- "CratesIO sets custom User-Agent per crates.io API requirements"
patterns-established:
- "Package registry source pattern: credentialless, JSON API search, bare keyword queries via BuildQueries"
requirements-completed: [RECON-PKG-01, RECON-PKG-02]
duration: 3min
completed: 2026-04-06
---
# Phase 13 Plan 01: Package Registry Sources Summary
**Four package registry ReconSources (npm, PyPI, crates.io, RubyGems) searching JS/Python/Rust/Ruby ecosystems for provider keyword matches**
## Performance
- **Duration:** 3 min
- **Started:** 2026-04-06T09:51:16Z
- **Completed:** 2026-04-06T09:54:00Z
- **Tasks:** 2
- **Files modified:** 8 (all newly created)
## Accomplishments
- NpmSource searches npm registry JSON API with 20-result pagination per keyword
- PyPISource scrapes pypi.org search HTML reusing extractAnchorHrefs from Replit pattern
- CratesIOSource queries crates.io JSON API with required custom User-Agent header
- RubyGemsSource queries rubygems.org search.json with fallback URL construction
- All four sources are credentialless, rate-limited, and context-aware, with httptest coverage for each
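
The JSON API sources above share a simple shape: build a search URL from a bare keyword, issue a context-aware GET, and decode the package names. The sketch below illustrates that shape against a local `httptest` fake. `searchNpm` and `demoNpmSearch` are hypothetical names, not the project's NpmSource. The `/-/v1/search` path with `text` and `size` parameters follows the public npm registry search endpoint, and the response struct keeps only the fields the sketch reads.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// npmSearchResponse keeps only the fields this sketch reads.
type npmSearchResponse struct {
	Objects []struct {
		Package struct {
			Name string `json:"name"`
		} `json:"package"`
	} `json:"objects"`
}

// searchNpm is a hypothetical helper, not the project's NpmSource.
// It requests one page of up to 20 results for a bare keyword.
func searchNpm(ctx context.Context, client *http.Client, baseURL, keyword string) ([]string, error) {
	q := url.Values{}
	q.Set("text", keyword)
	q.Set("size", "20")
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/-/v1/search?"+q.Encode(), nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var parsed npmSearchResponse
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(parsed.Objects))
	for _, o := range parsed.Objects {
		names = append(names, o.Package.Name)
	}
	return names, nil
}

// demoNpmSearch runs searchNpm against a local fake registry so the sketch
// never touches the network.
func demoNpmSearch() ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/-/v1/search" || r.URL.Query().Get("size") != "20" {
			http.Error(w, "unexpected request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		// Echo the keyword back as a made-up package name.
		fmt.Fprintf(w, `{"objects":[{"package":{"name":%q}}]}`, r.URL.Query().Get("text")+"-sdk")
	}))
	defer srv.Close()

	// The timeout makes the request cancellable, mirroring "context-aware".
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return searchNpm(ctx, srv.Client(), srv.URL, "openai")
}

func main() {
	names, err := demoNpmSearch()
	if err != nil {
		fmt.Println("search failed:", err)
		return
	}
	fmt.Println(names)
}
```

Taking the base URL as a parameter is what lets a test point the same function at a fake server instead of the real registry.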
## Task Commits
Each task was committed atomically:
1. **Task 1: Implement NpmSource and PyPISource** - `4b268d1` (feat)
2. **Task 2: Implement CratesIOSource and RubyGemsSource** - `9907e24` (feat)
## Files Created/Modified
- `pkg/recon/sources/npm.go` - NpmSource searching npm registry JSON API
- `pkg/recon/sources/npm_test.go` - httptest tests for NpmSource (4 tests)
- `pkg/recon/sources/pypi.go` - PyPISource scraping pypi.org search HTML
- `pkg/recon/sources/pypi_test.go` - httptest tests for PyPISource (4 tests)
- `pkg/recon/sources/cratesio.go` - CratesIOSource with custom User-Agent
- `pkg/recon/sources/cratesio_test.go` - httptest tests verifying User-Agent header (4 tests)
- `pkg/recon/sources/rubygems.go` - RubyGemsSource searching rubygems.org JSON API
- `pkg/recon/sources/rubygems_test.go` - httptest tests for RubyGemsSource (4 tests)
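
A test that verifies the User-Agent header can be sketched with a fake server that rejects any request lacking the expected value. The code below is a minimal illustration, not the contents of `cratesio_test.go`: `fetchStatus`, `checkUserAgent`, and the `demoUserAgent` string are hypothetical, and the 403 response is an assumption chosen for the demo.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// demoUserAgent is a made-up identifier used only in this sketch.
const demoUserAgent = "example-recon/0.1 (ops@example.invalid)"

// fetchStatus sends a GET and returns the HTTP status code. When userAgent
// is empty, Go's default User-Agent is sent instead.
func fetchStatus(client *http.Client, target, userAgent string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return 0, err
	}
	if userAgent != "" {
		req.Header.Set("User-Agent", userAgent)
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// Drain the body so the connection can be reused.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	return resp.StatusCode, nil
}

// checkUserAgent starts a fake registry that only accepts demoUserAgent and
// reports the status codes seen with and without the custom header.
func checkUserAgent() (withUA, withoutUA int, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("User-Agent") != demoUserAgent {
			http.Error(w, "missing or unexpected User-Agent", http.StatusForbidden)
			return
		}
		fmt.Fprint(w, `{"crates":[]}`)
	}))
	defer srv.Close()

	withUA, err = fetchStatus(srv.Client(), srv.URL, demoUserAgent)
	if err != nil {
		return 0, 0, err
	}
	withoutUA, err = fetchStatus(srv.Client(), srv.URL, "")
	if err != nil {
		return 0, 0, err
	}
	return withUA, withoutUA, nil
}

func main() {
	withUA, withoutUA, err := checkUserAgent()
	if err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println(withUA, withoutUA)
}
```

Asserting on the server side catches a dropped header even when the response body would otherwise parse cleanly.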
## Decisions Made
- PyPI uses HTML scraping with extractAnchorHrefs (reusing Replit pattern) since PyPI has no public search JSON API
- CratesIO sets custom User-Agent header per crates.io API policy requirements
- All sources use bare keyword queries via BuildQueries default path
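
The HTML scraping decision can be illustrated with a small anchor extractor. This is a sketch only: `projectLinks` is a hypothetical stand-in and does not reproduce the signature or behaviour of the project's `extractAnchorHrefs`. The `/project/` prefix and the sample markup are assumptions about how search results link to package pages.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hrefPattern captures the double-quoted href value of an <a> tag. A regular
// expression is enough for a sketch; a real scraper may prefer an HTML parser.
var hrefPattern = regexp.MustCompile(`(?i)<a\s[^>]*?href="([^"]+)"`)

// samplePage is made-up markup standing in for a search results page.
const samplePage = `
<a href="/help/">Help</a>
<a class="package-snippet" href="/project/openai/">openai</a>
<a class="package-snippet" href="/project/openai-helper/">openai-helper</a>
<a class="package-snippet" href="/project/openai/">openai (duplicate)</a>
`

// projectLinks returns the unique hrefs in page that start with prefix,
// in order of first appearance.
func projectLinks(page, prefix string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range hrefPattern.FindAllStringSubmatch(page, -1) {
		href := m[1]
		if !strings.HasPrefix(href, prefix) || seen[href] {
			continue
		}
		seen[href] = true
		out = append(out, href)
	}
	return out
}

func main() {
	fmt.Println(projectLinks(samplePage, "/project/"))
}
```

Filtering by prefix drops navigation links, and the `seen` map keeps a package that appears several times on the page from being reported more than once.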
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None
## User Setup Required
None - no external service configuration required.
## Known Stubs
None - all sources fully wired with real API endpoints and functional Sweep implementations.
## Next Phase Readiness
- Four package registry sources ready for RegisterAll wiring
- Pattern established for remaining registry sources (Maven, NuGet, GoProxy)
---
*Phase: 13-osint_package_registries_container_iac*
*Completed: 2026-04-06*