msramalho
a6e3240af1
closes #399 and global dependency updates
2026-02-23 11:13:31 +00:00
dependabot[bot]
bf4c196cc2
Bump the actions group with 5 updates
Bumps the actions group with 5 updates:
| Package | From | To |
| --- | --- | --- |
| [actions/checkout](https://github.com/actions/checkout) | `4` | `6` |
| [docker/login-action](https://github.com/docker/login-action) | `3.4.0` | `3.7.0` |
| [docker/metadata-action](https://github.com/docker/metadata-action) | `5.7.0` | `5.10.0` |
| [actions/setup-python](https://github.com/actions/setup-python) | `5` | `6` |
| [actions/cache](https://github.com/actions/cache) | `4` | `5` |

Updates `actions/checkout` from 4 to 6
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)
Updates `docker/login-action` from 3.4.0 to 3.7.0
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](74a5d14239...c94ce9fb46)
Updates `docker/metadata-action` from 5.7.0 to 5.10.0
- [Release notes](https://github.com/docker/metadata-action/releases)
- [Commits](902fa8ec7d...c299e40c65)
Updates `actions/setup-python` from 5 to 6
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v5...v6)
Updates `actions/cache` from 4 to 5
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v4...v5)
---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: docker/login-action
  dependency-version: 3.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
- dependency-name: docker/metadata-action
  dependency-version: 5.10.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: actions/cache
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
...
Signed-off-by: dependabot[bot] <support@github.com>
2026-02-01 20:17:43 +00:00
Miguel Sozinho Ramalho
c640cc898a
Merge pull request #385 from bellingcat/dev
1.2.0 dependencies, small bugs, 1st time contributors
v1.2.0
2026-01-08 15:55:40 +00:00
msramalho
3e2c0b564b
wiki fix
2026-01-08 15:49:42 +00:00
msramalho
5fd23baa55
this is ruff
2026-01-08 15:48:08 +00:00
msramalho
8a450310c7
version bump for new release
2026-01-08 15:41:27 +00:00
msramalho
bef8a14089
pyperclip version bump closes #339
2026-01-08 15:40:17 +00:00
msramalho
cd0b093e7a
browsertrix-crawler to 1.9.2 see #383
2026-01-08 15:33:40 +00:00
msramalho
096c9d09ef
fix for unexpected types for json.dump
2026-01-08 15:18:19 +00:00
Miguel Sozinho Ramalho
df3521e9ca
Merge pull request #377 from m4cd4r4/fix/improve-deleted-post-detection
Fix #335: Add comprehensive deletion detection for removed/unavailable content
2026-01-08 15:06:21 +00:00
msramalho
a89d0193e4
removes patch file
2026-01-08 15:02:00 +00:00
msramalho
536cbd905f
puts tests file in correct directory
2026-01-08 14:55:40 +00:00
msramalho
a936921c4e
updates new utils file and test
2026-01-08 14:54:06 +00:00
Miguel Sozinho Ramalho
68f672a4fa
Merge branch 'dev' into fix/improve-deleted-post-detection
2026-01-08 14:36:17 +00:00
Miguel Sozinho Ramalho
4ee0ad1cf8
Merge pull request #359 from mjgaughan/specify-medatada-feature
implementing default metadata omission/user metadata selection
2026-01-08 14:34:50 +00:00
msramalho
bac809451c
expands tests to include non-predefined metadata keys
2026-01-08 14:33:16 +00:00
msramalho
53dc9904ce
refactors PR to follow the standard code approach
2026-01-08 14:30:26 +00:00
Miguel Sozinho Ramalho
c1f312d42a
Merge branch 'dev' into specify-medatada-feature
2026-01-08 14:04:42 +00:00
msramalho
23c9dfe717
updating dependencies
2026-01-08 13:53:44 +00:00
m4cd4r4
d02e7e0f02
Add comprehensive deletion detection for removed/unavailable content
Implements issue #335: improve detection of deleted/missing posts
## Changes
### New Deletion Detection System
- Created `deletion_detection.py` utility module with platform-specific
indicators for Twitter, Facebook, Instagram, TikTok, YouTube, Reddit,
VK, and Telegram
- Detects deletion via HTML content, page titles, error messages, and
video metadata
- Stores detailed deletion context (indicator, source, platform) in
metadata for investigators
### Integration Points
- **Antibot Extractor**: Checks HTML and page titles after page load;
resolves TODO about detecting deleted videos
- **Generic Extractor**: Checks yt-dlp video data and error messages
for deletion indicators
- **Twitter Dropin**: Enhanced detection when user/created_at fields
are missing
### Test Coverage
- Comprehensive test suite covering all platforms
- Tests for HTML, title, error message, and metadata detection
- Validates that normal content is not falsely flagged
## Impact for Conflict Documentation
This fix is critical for evidence preservation in war-torn regions:
- Investigators can now document that evidence existed but was deleted
- Prevents wasted archival attempts on deleted content
- Tracks patterns of content removal
- Preserves metadata about what was deleted and when
Twitter example: Detects "Hmm...this page doesn't exist. Try searching
for something else" and flags content as deleted_or_unavailable.
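A minimal sketch of the matching idea, for reviewers. It is not the code in `deletion_detection.py`: the names `DELETION_INDICATORS`, `DeletionResult` and `detect_deletion`, and the metadata keys in the usage lines, are hypothetical. Only the Twitter phrase quoted above is included; the per-platform indicators for the other platforms are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical indicator table. The real module covers Twitter, Facebook,
# Instagram, TikTok, YouTube, Reddit, VK and Telegram; only the Twitter
# phrase quoted in this commit message is reproduced here.
DELETION_INDICATORS: dict[str, list[str]] = {
    "twitter": ["Hmm...this page doesn't exist. Try searching for something else"],
}


@dataclass
class DeletionResult:
    indicator: str  # the phrase that matched
    source: str     # where it matched: "html", "title" or "error"
    platform: str


def detect_deletion(platform: str, html: str = "", title: str = "", error: str = "") -> Optional[DeletionResult]:
    """Return the first matching deletion indicator, or None if nothing matched."""
    indicators = DELETION_INDICATORS.get(platform, [])
    # Check each text source in turn, case-insensitively.
    for source, text in (("html", html), ("title", title), ("error", error)):
        lowered = text.lower()
        for indicator in indicators:
            if indicator.lower() in lowered:
                return DeletionResult(indicator=indicator, source=source, platform=platform)
    return None


if __name__ == "__main__":
    page = "<span>Hmm...this page doesn't exist. Try searching for something else.</span>"
    result = detect_deletion("twitter", html=page)
    if result:
        # Hypothetical metadata keys for the deletion context kept for investigators.
        context = {
            "deleted_or_unavailable": True,
            "deletion_indicator": result.indicator,
            "deletion_source": result.source,
            "deletion_platform": result.platform,
        }
        print(context)
```

Returning the indicator, source and platform together is what allows the deletion context to be stored alongside the archived item rather than just a boolean flag.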
2025-12-17 18:40:58 +08:00
Miguel Sozinho Ramalho
56526a9ac7
Merge pull request #365 from bellingcat/dev
Facebook reels fix
v1.1.6
2025-10-23 10:40:43 +01:00
msramalho
3a22cc28c0
skip tiktok antibot test in CI
2025-10-23 10:17:14 +01:00
msramalho
dbb3dfa04f
fixes wikipedia test
2025-10-23 10:04:44 +01:00
msramalho
01bdb35f5d
version bump
2025-10-23 09:51:31 +01:00
msramalho
43cbc6ac56
generic extractor improvements
2025-10-23 09:51:14 +01:00
msramalho
9c7cab1ae2
dependencies update
2025-10-22 21:07:12 +01:00
msramalho
a9a0bae083
dependencies update
v1.1.5
2025-10-22 18:11:36 +01:00
Miguel Sozinho Ramalho
97d133ce79
Merge pull request #357 from bellingcat/dev
small improvements on tiktok and version bumps
v1.1.4
2025-10-22 16:02:26 +01:00
msramalho
432ee3dcfd
version bump
2025-10-22 15:50:50 +01:00
mgaughan
94e0803fb3
implementing default metadata omission/user metadata selection
2025-09-22 20:16:40 -05:00
msramalho
794b4f6052
Merge branch 'dev' of https://github.com/bellingcat/auto-archiver into dev
2025-09-11 15:06:27 +01:00
msramalho
965d7d41dd
dependency updates
2025-09-11 15:06:25 +01:00
Miguel Sozinho Ramalho
e73faa70cc
Merge pull request #352 from mjgaughan/developer-documentation-updates
updating the style-checking code in the documentation
2025-08-11 10:42:53 +01:00
mgaughan
80beab9f23
ruff-fix -> ruff-clean; there is no ruff-fix in the Makefile. Maybe the command /should/ be ruff-fix to align with the underlying ruff command; for later discussion. This at least reconciles the documentation to the Makefile
2025-08-05 21:36:32 -04:00
Miguel Sozinho Ramalho
200cea4e12
Merge pull request #345 from mjgaughan/main
Correction of small documentation typos
2025-07-29 09:36:10 +01:00
mgaughan
1256fde159
updating location of .env.test.example in documentation
2025-07-23 13:04:48 -04:00
mgaughan
65e222e177
fixing typo in documentation pytest -> poetry
2025-07-22 17:20:59 -04:00
mgaughan
f2eb9ef784
correcting to double-dash in the poetry install documentation
2025-07-21 17:55:48 -04:00
msramalho
2081c16555
embed retry into timestamping
2025-07-10 14:49:53 +01:00
msramalho
d3efd7121c
avoid empty metadata comments
2025-07-06 14:05:17 +01:00
msramalho
9d3cd5774b
an improved approach for #295
2025-07-06 14:04:01 +01:00
Miguel Sozinho Ramalho
80d61e8b85
Merge pull request #341 from bellingcat/dev
...
Addresses several small bugs, including tiktok photo extraction and data-saving for proxy usage in generic_extractor.
v1.1.2
2025-07-05 20:28:00 +01:00
msramalho
d36cdbfa87
fixing pyperclip see issue #339
2025-07-05 19:07:23 +01:00
msramalho
c1506ee1cf
some wayback errors are expected and should be warnings
2025-07-05 18:31:39 +01:00
msramalho
3a34a49822
adds antibot tiktok logic for photos closes #295
2025-07-05 18:31:12 +01:00
msramalho
37c6d97275
new auth wall check logic and escaped CSS selector in selenium
2025-07-05 18:30:31 +01:00
msramalho
7234eda85f
expands Sheets API retries for really large spreadsheets
2025-07-05 18:29:33 +01:00
msramalho
a8c1ef3912
generic_extractor config to use proxy only when needed to avoid overzealousness
2025-07-05 16:54:58 +01:00
msramalho
52ed8196a5
updates dependencies
2025-07-05 16:03:47 +01:00
msramalho
2051e8e491
adds further exponential backoff for Sheets API worksheet enumeration
2025-07-05 16:02:07 +01:00