Implements issue #335: improve detection of deleted/missing posts
## Changes
### New Deletion Detection System
- Created `deletion_detection.py` utility module with platform-specific
indicators for Twitter, Facebook, Instagram, TikTok, YouTube, Reddit,
VK, and Telegram
- Detects deletion via HTML content, page titles, error messages, and
video metadata
- Stores detailed deletion context (indicator, source, platform) in
metadata for investigators
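
A minimal sketch of how indicator-based detection could work, assuming a simple substring match per platform. The names `DELETION_INDICATORS`, `DeletionMatch`, and `detect_deletion` are hypothetical and are not taken from `deletion_detection.py`; only the Twitter string comes from this description, and the YouTube entry is illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical indicator table; the real module's strings may differ.
DELETION_INDICATORS = {
    "twitter": ["this page doesn't exist"],
    "youtube": ["video unavailable"],  # illustrative entry only
}


@dataclass
class DeletionMatch:
    indicator: str  # the phrase that matched
    source: str     # where it was found, e.g. "html", "title", "error"
    platform: str


def detect_deletion(platform: str, text: str, source: str) -> Optional[DeletionMatch]:
    """Return deletion context if a known indicator appears in text, else None."""
    # Normalise case and curly apostrophes so page copy matches the table.
    haystack = text.lower().replace("\u2019", "'")
    for indicator in DELETION_INDICATORS.get(platform, []):
        if indicator in haystack:
            return DeletionMatch(indicator=indicator, source=source, platform=platform)
    return None
```

Returning a small record rather than a boolean keeps the indicator, source, and platform together, so they can be stored in metadata as described above.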
### Integration Points
- **Antibot Extractor**: Checks HTML and page titles after page load;
resolves TODO about detecting deleted videos
- **Generic Extractor**: Checks yt-dlp video data and error messages
for deletion indicators
- **Twitter Dropin**: Enhanced detection when user/created_at fields
are missing
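
The Twitter Dropin check could look roughly like the sketch below, assuming the tweet arrives as a plain dictionary. The function name `check_tweet_fields` and the `status` key are hypothetical; the `indicator`, `source`, and `platform` keys mirror the context fields listed above.

```python
from typing import Any, Dict, Optional


def check_tweet_fields(tweet: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return deletion context if user/created_at are missing, else None."""
    # Treat absent or empty values as missing.
    missing = [field for field in ("user", "created_at") if not tweet.get(field)]
    if not missing:
        return None
    return {
        "status": "deleted_or_unavailable",  # hypothetical key name
        "indicator": "missing fields: " + ", ".join(missing),
        "source": "metadata",
        "platform": "twitter",
    }
```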
### Test Coverage
- Test suite covering all eight platforms listed above
- Tests for HTML, title, error message, and metadata detection
- Validates that normal content is not falsely flagged
## Impact for Conflict Documentation
This fix is critical for evidence preservation in war-torn regions:
- Investigators can now document that evidence existed but was deleted
- Prevents wasted archival attempts on deleted content
- Tracks patterns of content removal
- Preserves metadata about what was deleted and when
Twitter example: detects "Hmm...this page doesn't exist. Try searching
for something else" and flags the content as `deleted_or_unavailable`.
* wacz: allow exceptional cases where more than one resource image is available
* improves generic extractor edge-cases and yt-dlp updates
* REMOVES vk_extractor until further notice
* bumps browsertrix in docker image
* npm version bump on scripts/settings
* poetry updates
* changes log level of the gsheet_feeder_db "started" message from warning to info (#301)
* closes #305 and further fixes finding local downloads from uncommon yt-dlp extractors
* uses ffmpeg `-bitexact` to reduce storage of duplicate content
* formatting
* adds yt-dlp curl-cffi
* version bump
* linting
---------
Co-authored-by: Dave Mateer <davemateer@gmail.com>