INSECURE FILE UPLOADS
Critical
Upload surfaces are high risk: they can lead to server-side execution (RCE), stored XSS, malware distribution, storage takeover, and DoS. Modern stacks mix direct-to-cloud uploads, background processors, and CDNs, so authorization and validation must hold at every step.
Scope
- Web/mobile/API uploads, direct-to-cloud (S3/GCS/Azure) presigned flows, resumable/multipart protocols (tus, S3 MPU)
- Image/document/media pipelines (ImageMagick/GraphicsMagick, Ghostscript, ExifTool, PDF engines, office converters)
- Admin/bulk importers, archive uploads (zip/tar), report/template uploads, rich text with attachments
- Serving paths: app directly, object storage, CDN, email attachments, previews/thumbnails
Methodology
- Map the pipeline: client → ingress (edge/app/gateway) → storage → processors (thumb, OCR, AV, CDR) → serving (app/storage/CDN). Note where validation and auth occur.
- Identify allowed types, size limits, filename rules, storage keys, and who serves the content. Collect baseline uploads per type and capture resulting URLs and headers.
- Exercise bypass families systematically: extension games, MIME/content-type, magic bytes, polyglots, metadata payloads, archive structure, chunk/finalize differentials.
- Validate execution and rendering: can uploaded content execute on the server or the client? Confirm with minimal PoCs and header analysis.
Discovery Techniques
Surface Map
- Endpoints/fields: upload, file, avatar, image, attachment, import, media, document, template
- Direct-to-cloud params: key, bucket, acl, Content-Type, Content-Disposition, x-amz-meta-*, cache-control
- Resumable APIs: create/init → upload/chunk → complete/finalize; check if metadata/headers can be altered late
- Background processors: thumbnails, PDF→image, virus scan queues; identify timing and status transitions
Capability Probes
- Small probe files of each claimed type; diff resulting Content-Type, Content-Disposition, and X-Content-Type-Options on download
- Magic bytes vs extension: JPEG/GIF/PNG headers; mismatches reveal reliance on extension or MIME sniffing
- SVG/HTML probe: do they render inline (text/html or image/svg+xml) or download (attachment)?
- Archive probe: simple zip with nested path traversal entries and symlinks to detect extraction rules
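The magic-bytes-versus-extension probe can be cross-checked offline. The following is a minimal sketch, assuming only three image types matter. The signature table and the function names `sniff_type` and `extension_matches_content` are illustrative, not part of any particular framework.

```python
from pathlib import PurePosixPath
from typing import Optional

# Leading byte signatures for a few common image formats.
MAGIC_SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
}

# Extensions conventionally associated with each sniffed type.
EXTENSIONS = {
    "jpeg": {".jpg", ".jpeg"},
    "png": {".png"},
    "gif": {".gif"},
}


def sniff_type(data: bytes) -> Optional[str]:
    """Return the image type implied by the leading bytes, or None."""
    for kind, signatures in MAGIC_SIGNATURES.items():
        if data.startswith(signatures):
            return kind
    return None


def extension_matches_content(filename: str, data: bytes) -> bool:
    """True only if the final extension agrees with the sniffed type."""
    kind = sniff_type(data)
    if kind is None:
        return False
    # Only the last suffix counts: 'avatar.jpg.php' yields '.php'.
    ext = PurePosixPath(filename).suffix.lower()
    return ext in EXTENSIONS[kind]
```

A mismatch between the uploaded name and the sniffed type that the server still accepts suggests it relies on the extension or the declared MIME type alone. A match on leading bytes is not proof of a valid image, since polyglots pass this check.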
Detection Channels
Server Execution
- Web shell execution (language dependent), config/handler uploads (.htaccess, .user.ini, web.config) enabling execution
- Interpreter-side template/script evaluation during conversion (ImageMagick/Ghostscript/ExifTool)
Client Execution
- Stored XSS via SVG/HTML/JS if served inline without correct headers; PDF JavaScript; office macros in previewers
Header And Render
- Missing X-Content-Type-Options: nosniff, which lets the browser sniff the content and treat it as HTML or script
- Content-Type reflection from upload vs server-set; Content-Disposition: inline vs attachment
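The header checks above can be turned into a small triage helper for captured download responses. This is a sketch under assumptions: the `ACTIVE_TYPES` set and the finding labels are illustrative choices, and the function only inspects headers, not the body.

```python
from typing import Dict, List

# Media types that a browser may render with script capability (illustrative set).
ACTIVE_TYPES = {
    "text/html",
    "image/svg+xml",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}


def assess_download_headers(headers: Dict[str, str]) -> List[str]:
    """Return a list of findings for one download response's headers."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    media_type = h.get("content-type", "").split(";")[0].strip().lower()
    disposition = h.get("content-disposition", "").split(";")[0].strip().lower()
    nosniff = h.get("x-content-type-options", "").lower() == "nosniff"

    findings: List[str] = []
    if not nosniff:
        findings.append("missing nosniff")
    if disposition != "attachment":
        findings.append("not served as attachment")
        if media_type in ACTIVE_TYPES:
            findings.append("active type rendered inline")
    if not media_type:
        findings.append("no Content-Type")
    return findings
```

Running this over the baseline download of each probe file gives a quick per-type view of which serving paths need manual browser testing.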
Process Side Effects
- AV/CDR race or absence; background job status allows access before scan completes; password-protected archives bypass scanning
Core Payloads
Web Shells And Configs
- PHP: GIF polyglot (starts with GIF89a) followed by <?php echo 1; ?>; place where PHP is executed
- .htaccess to map extensions to code (AddType/AddHandler); .user.ini (auto_prepend_file/auto_append_file) for PHP-FPM
- ASP/JSP equivalents where supported; IIS web.config to enable script execution
Stored XSS
- SVG with onload/onerror handlers served as image/svg+xml or text/html
- HTML file with script when served as text/html or sniffed due to missing nosniff
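To triage SVG uploads for the active content described above, a detector along these lines may help. It is a sketch, not a sanitizer: the function name and finding labels are hypothetical, the check list is deliberately short, and the standard-library XML parser is not hardened against every malicious XML construct, so untrusted input should be parsed in a constrained environment.

```python
import xml.etree.ElementTree as ET
from typing import List


def find_active_svg_content(svg_text: str) -> List[str]:
    """List script-capable constructs found in an SVG document."""
    findings: List[str] = []
    root = ET.fromstring(svg_text)
    for element in root.iter():
        # Drop the namespace: '{http://www.w3.org/2000/svg}script' -> 'script'.
        tag = element.tag.rsplit("}", 1)[-1].lower()
        if tag in {"script", "foreignobject"}:
            findings.append(f"element:{tag}")
        for name, value in element.attrib.items():
            attr = name.rsplit("}", 1)[-1].lower()
            if attr.startswith("on"):
                # Event handlers such as onload or onerror.
                findings.append(f"handler:{attr}")
            elif attr == "href" and value.strip().lower().startswith("javascript:"):
                findings.append("javascript-href")
    return findings
```

An empty result does not make a file safe to serve inline; it only means none of these specific constructs were found.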
MIME Magic Polyglots
- Double extensions: avatar.jpg.php, report.pdf.html; mixed casing: .pHp, .PhAr
- Magic-byte spoofing: valid JPEG header then embedded script; verify server uses content inspection, not extensions alone
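A server that checks magic bytes only will accept a file whose header is a valid image signature and whose body carries script. The sketch below shows one heuristic way to flag that pattern. The marker list and the function name are assumptions, and compressed image data can contain these byte sequences by chance, so a hit is a reason to look closer rather than a verdict.

```python
from typing import List

# Image signatures accepted by a naive magic-byte check.
IMAGE_MAGIC = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")

# Byte sequences that suggest embedded script (illustrative, not exhaustive).
SCRIPT_MARKERS = (b"<?php", b"<script")


def embedded_script_markers(data: bytes) -> List[bytes]:
    """Return script markers found inside data that starts like an image.

    Data without an image signature returns an empty list, because this
    helper only targets the 'valid header, script body' polyglot case.
    """
    if not data.startswith(IMAGE_MAGIC):
        return []
    lowered = data.lower()
    return [marker for marker in SCRIPT_MARKERS if marker in lowered]
```

Fully decoding and re-encoding the image is a stronger control than marker scanning, because it discards bytes that are not part of the pixel data.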
Archive Attacks
- Zip Slip: entries with ../../ to escape extraction dir; symlink-in-zip pointing outside target; nested zips
- Zip bomb: extreme compression ratios (e.g., 42.zip) to exhaust resources in processors
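The archive checks above can be sketched as a pre-extraction audit using the standard `zipfile` module. The thresholds `MAX_RATIO` and `MAX_TOTAL_BYTES` are arbitrary assumptions, and the sizes come from archive metadata that the uploader controls, so real extraction should also enforce limits while streaming.

```python
import io
import stat
import zipfile
from pathlib import PurePosixPath
from typing import List

MAX_RATIO = 100                      # assumed per-entry compression ratio limit
MAX_TOTAL_BYTES = 50 * 1024 * 1024   # assumed total uncompressed size limit


def audit_zip(data: bytes) -> List[str]:
    """Flag traversal names, symlinks, and suspicious ratios before extraction."""
    findings: List[str] = []
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            name = info.filename.replace("\\", "/")
            if name.startswith("/") or ".." in PurePosixPath(name).parts:
                findings.append(f"traversal:{info.filename}")
            # Unix mode bits live in the upper 16 bits of external_attr.
            if stat.S_ISLNK(info.external_attr >> 16):
                findings.append(f"symlink:{info.filename}")
            total += info.file_size
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                findings.append(f"ratio:{info.filename}")
    if total > MAX_TOTAL_BYTES:
        findings.append("total-size")
    return findings
```

This audit does not look inside nested archives; each inner zip would need the same treatment with a depth limit.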
Toolchain Exploits
- ImageMagick/GraphicsMagick legacy vectors (policy.xml may mitigate): crafted SVG/PS/EPS invoking external commands or reading files
- Ghostscript processing PDF/PS input that abuses file operators or the %pipe% device to run commands
- ExifTool metadata parsing bugs; overly large or crafted EXIF/IPTC/XMP fields
Cloud Storage Vectors
- S3/GCS presigned uploads: attacker controls Content-Type/Disposition; set text/html or image/svg+xml and inline rendering
- Public-read ACL or permissive bucket policies expose uploads broadly; object key injection via user-controlled path prefixes
- Signed URL reuse and stale URLs; serving directly from bucket without attachment + nosniff headers
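On the defensive side of presigned flows, the server can pin the key, ACL, type, disposition, and size in the upload policy so the client cannot choose them. The sketch below builds an S3-style POST policy document as a plain dictionary. The allowlist, the `uploads/` prefix, and the function name are hypothetical, and the result would still need to be base64-encoded and signed by whatever SDK is in use.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Hypothetical allowlist of types the application is willing to store.
ALLOWED_TYPES = {"image/png", "image/jpeg", "application/pdf"}


def build_post_policy(bucket: str, content_type: str, max_bytes: int,
                      ttl_seconds: int = 300) -> Dict[str, Any]:
    """Build a POST policy that fixes every security-relevant field."""
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"content type not allowed: {content_type}")
    # The object key is generated server-side; no user input reaches it.
    key = f"uploads/{secrets.token_hex(16)}"
    expires = datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds)
    return {
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "conditions": [
            {"bucket": bucket},
            {"key": key},
            {"acl": "private"},
            {"Content-Type": content_type},
            {"Content-Disposition": "attachment"},
            ["content-length-range", 1, max_bytes],
        ],
    }
```

Exact-match conditions mean an upload that sends a different Content-Type, ACL, or key should be rejected by the storage service. A short expiration limits the window for signed URL reuse.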
Advanced Techniques
Resumable Multipart
- Change metadata between init and complete (e.g., swap Content-Type/Disposition at finalize)
- Upload benign chunks, then swap the last chunk or complete with a different source object; this works when the server trusts only client-supplied digests
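The countermeasure to a chunk swap is for the server to hash the bytes it actually stored at finalize time instead of accepting a digest from the client. A minimal sketch, assuming the stored chunks are available as an ordered iterable of bytes (the function names are illustrative):

```python
import hashlib
from typing import Iterable


def assembled_digest(stored_chunks: Iterable[bytes]) -> str:
    """SHA-256 over the chunks as the server stored them, in order."""
    digest = hashlib.sha256()
    for chunk in stored_chunks:
        digest.update(chunk)
    return digest.hexdigest()


def finalize_matches(stored_chunks: Iterable[bytes], claimed_digest: str) -> bool:
    """Compare the server-computed digest with the one claimed at finalize."""
    return assembled_digest(stored_chunks) == claimed_digest.strip().lower()
```

A matching digest only proves integrity. Type, size, and content validation should also be rerun on the assembled object at completion, not only at init.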
Filename And Path
- Unicode homoglyphs, trailing dots/spaces, device names, reserved characters to bypass validators and filesystem rules
- Null-byte truncation on legacy stacks; overlong paths; case-insensitive collisions overwriting existing files
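The filename tricks above can be checked mechanically when testing a validator or building one. The following sketch flags several of them. The extension set, the length limit, and the finding labels are assumptions, and non-ASCII names are flagged only as candidates for homoglyph review, not as malicious.

```python
from typing import List

# Reserved device names on Windows filesystems.
WINDOWS_DEVICE_NAMES = (
    {"con", "prn", "aux", "nul"}
    | {f"com{i}" for i in range(1, 10)}
    | {f"lpt{i}" for i in range(1, 10)}
)

# Extensions that commonly map to server-side execution (illustrative).
SCRIPT_EXTENSIONS = {"php", "phtml", "phar", "asp", "aspx", "jsp"}


def filename_risks(name: str) -> List[str]:
    """Return labels for risky properties of a client-supplied filename."""
    risks: List[str] = []
    if "\x00" in name:
        risks.append("null-byte")
    if "/" in name or "\\" in name:
        risks.append("path-separator")
    if name != name.rstrip(". "):
        risks.append("trailing-dot-or-space")
    if name.startswith("."):
        risks.append("dotfile")
    if not name.isascii():
        risks.append("non-ascii")
    if len(name) > 255:
        risks.append("overlong")
    parts = name.rstrip(". ").lower().split(".")
    if parts[0] in WINDOWS_DEVICE_NAMES:
        risks.append("device-name")
    # A script extension in any position matters: 'a.php.jpg' and 'a.jpg.php'.
    if any(part in SCRIPT_EXTENSIONS for part in parts[1:]):
        risks.append("script-extension")
    return risks
```

Storing files under server-generated names avoids most of these cases entirely, which is why the checks are more useful for testing than as a primary defense.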
Processing Races
- Request file immediately after upload but before AV/CDR completes; or during derivative creation to get unprocessed content
- Trigger heavy conversions (large images, deep PDFs) to widen race windows
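The race exists when an object becomes retrievable before a scan verdict is recorded. One way to close it is a gate that serves only objects explicitly marked clean, sketched below with an in-memory dictionary standing in for whatever metadata store the pipeline uses. The class and method names are hypothetical.

```python
from enum import Enum
from typing import Dict


class ScanStatus(Enum):
    PENDING = "pending"
    CLEAN = "clean"
    INFECTED = "infected"


class QuarantineGate:
    """Tracks scan verdicts and denies access until an object is clean."""

    def __init__(self) -> None:
        self._status: Dict[str, ScanStatus] = {}

    def register(self, object_id: str) -> None:
        # Every new upload starts in quarantine.
        self._status[object_id] = ScanStatus.PENDING

    def record_verdict(self, object_id: str, clean: bool) -> None:
        if object_id not in self._status:
            raise KeyError(f"unknown object: {object_id}")
        self._status[object_id] = ScanStatus.CLEAN if clean else ScanStatus.INFECTED

    def may_serve(self, object_id: str) -> bool:
        # Unknown and pending objects are denied, not just infected ones.
        return self._status.get(object_id) is ScanStatus.CLEAN
```

The default-deny check is the important part: a tester probing for the race is looking for any path, including derivatives and direct storage URLs, that skips a check like `may_serve`.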
Metadata Abuse
- Oversized EXIF/XMP/IPTC blocks to trigger parser flaws; payloads in document properties of Office/PDF rendered by previewers
Header Manipulation
- Force inline rendering with Content-Type + inline Content-Disposition; test browsers with and without nosniff
- Cache poisoning via CDN when the cache key omits request inputs that change the response Content-Type/Content-Disposition (e.g., parameters or headers not covered by Vary)
Filter Bypasses
Validation Gaps
- Client-side-only checks; relying on JS validation or the browser-supplied MIME type; blindly trusting the per-part headers in multipart bodies
- Extension allowlists without server-side content inspection; magic-bytes only without full parsing
Evasion Tricks
- Double extensions, mixed case, hidden dotfiles, extra dots (file..png), long paths with allowed suffix
- Multipart name vs filename vs path discrepancies; duplicate parameters and late parameter precedence
Special Contexts
Rich Text Editors
- RTEs allow image/attachment uploads and embed links; verify sanitization and serving headers for embedded content
Mobile Clients
- Mobile SDKs may send nonstandard MIME or metadata; servers sometimes trust client-side transformations or EXIF orientation
Serverless And CDN
- Direct-to-bucket uploads with Lambda/Workers post-processing; verify that security decisions are not delegated to frontends
- CDN caching of uploaded content; ensure correct cache keys and headers (attachment, nosniff)
Parser Hardening
- Validate on server: strict allowlist by true type (parse enough to confirm), size caps, and structural checks (dimensions, page count)
- Strip active content: convert SVG→PNG; remove scripts/JS from PDF; disable macros; normalize EXIF; consider CDR for risky types
- Store outside web root; serve via application or signed, time-limited URLs with Content-Disposition: attachment and X-Content-Type-Options: nosniff
- For cloud: private buckets, per-request signed GET, enforce Content-Type/Disposition on GET responses from your app/gateway
- Disable execution in upload paths; ignore .htaccess/.user.ini; sanitize keys to prevent path injections; randomize filenames
- AV + CDR: scan synchronously when possible; quarantine until verdict; block password-protected archives or process in sandbox
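Two of the controls above, randomized storage names and fixed serving headers, can be sketched together. The type-to-extension map is a hypothetical allowlist, and the helper names are illustrative. The display name is reduced to a conservative character set so it cannot break out of the header value.

```python
import re
import secrets
from typing import Dict

# Hypothetical allowlist mapping verified types to stored extensions.
TYPE_TO_EXTENSION = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "application/pdf": ".pdf",
}


def new_storage_key(verified_type: str) -> str:
    """Random name plus an extension derived from the verified type only."""
    return secrets.token_hex(16) + TYPE_TO_EXTENSION[verified_type]


def download_headers(verified_type: str, display_name: str) -> Dict[str, str]:
    """Headers for serving an untrusted upload as a download."""
    if verified_type not in TYPE_TO_EXTENSION:
        # Anything outside the allowlist is served as opaque bytes.
        verified_type = "application/octet-stream"
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", display_name)[:100] or "download"
    return {
        "Content-Type": verified_type,
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        "X-Content-Type-Options": "nosniff",
    }
```

Because the stored key never contains client input, the original filename survives only as a sanitized label in the download header.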
Validation
- Demonstrate execution or rendering of active content: web shell reachable, or SVG/HTML executing JS when viewed.
- Show filter bypass: upload accepted despite restrictions (extension/MIME/magic mismatch) with evidence on retrieval.
- Prove header weaknesses: inline rendering without nosniff or missing attachment; present exact response headers.
- Show race or pipeline gap: access before AV/CDR; extraction outside intended directory; derivative creation from malicious input.
- Provide reproducible steps: request/response for upload and subsequent access, with minimal PoCs.
False Positives
- Upload stored but never served back; or always served as attachment with strict nosniff
- Converters run in locked-down sandboxes with no external IO and no script engines; no path traversal on archive extraction
- AV/CDR blocks the payload and quarantines; access before scan is impossible by design
Impact
- Remote code execution on application stack or media toolchain host
- Persistent cross-site scripting and session/token exfiltration via served uploads
- Malware distribution via public storage/CDN; brand/reputation damage
- Data loss or corruption via overwrite/zip slip; service degradation via zip bombs or oversized assets
Pro Tips
- Keep PoCs minimal: tiny SVG/HTML for XSS, a single-line PHP/ASP where relevant, and benign magic-byte polyglots.
- Always capture download response headers and final MIME from the server/CDN; that decides browser behavior.
- Prefer transforming risky formats to safe renderings (SVG→PNG) rather than attempting complex sanitization.
- In presigned flows, constrain all headers and object keys server-side; ignore client-supplied ACL and metadata.
- For archives, extract in a chroot/jail with explicit allowlist; drop symlinks and reject traversal.
- Test finalize/complete steps in resumable flows; many validations only run on init, not at completion.
- Verify background processors with EICAR and tiny polyglots; ensure quarantine gates access until safe.
- When you cannot get execution, aim for stored XSS or header-driven script execution; both are impactful.
- Validate that CDNs honor attachment/nosniff and do not override Content-Type/Disposition.
- Document full pipeline behavior per asset type; defenses must match actual processors and serving paths.
Remember
Secure uploads are a pipeline property. Enforce strict type, size, and header controls; transform or strip active content; never execute or inline-render untrusted uploads; and keep storage private with controlled, signed access.