Commit Graph

636 Commits

Author SHA1 Message Date
Patrick Robertson
e2442b2f6b Merge pull request #243 from bellingcat/fix-long-path-names
Unit tests for storage types + fix storage too long issues for local storage
2025-03-11 10:05:09 +00:00
Patrick Robertson
3f6acc0917 fully working timestamping enricher 2025-03-11 10:04:46 +00:00
erinhmclark
e7fa88f1c7 Implementing ruff suggestions. 2025-03-10 21:45:30 +00:00
erinhmclark
ca44a40b88 Ruff fix on src. 2025-03-10 19:03:45 +00:00
erinhmclark
85abe1837a Ruff format with defaults. 2025-03-10 18:44:54 +00:00
Miguel Sozinho Ramalho
3fcec57492 minor string fix 2025-03-10 17:17:59 +00:00
Patrick Robertson
a9c3477289 Improve docs on the path_generator and filename_generator config options 2025-03-10 16:43:14 +00:00
Patrick Robertson
770f4c8a3d Refactoring of storage code:
1. Fix some bugs in local_storage
2. Refactor unit tests to not set Media.key explicitly (unless it's well-known beforehand, which it isn't)
3. Limit length of URL for 'url' type path_generator
4. Throw an error if 'save_to' of local storage is too long
5. A few other tidyups
2025-03-10 16:39:48 +00:00
Miguel Sozinho Ramalho
58bd38e292 Adds new extractor for tiktok via unofficial API (#237)
* minor update to defaults in api_db

* readme typo

* adds and tests new tikwm tiktok downloader

* addresses PR comments
2025-03-10 11:56:45 +00:00
Patrick Robertson
e89a8da3b4 Unit tests for storage types + fix storage too long issues for local storage 2025-03-10 11:30:15 +00:00
erinhmclark
76bb1496c8 Merge branch 'main' into feat/yt-dlp-pots
# Conflicts:
#	src/auto_archiver/modules/generic_extractor/__manifest__.py
2025-03-07 16:54:01 +00:00
Patrick Robertson
e519ba2433 Add 'reject all' cookie button 2025-03-07 16:40:34 +00:00
Patrick Robertson
be513e95aa Merge branch 'main' into merge_modules 2025-03-07 16:19:51 +00:00
Patrick Robertson
3fac353407 Merge pull request #217 from bellingcat/settings_page
Settings page user interface
2025-03-07 16:10:50 +00:00
erinhmclark
8fcec692b7 Add comments to highlight different steps of atlos_feeder_db_storage.py 2025-03-07 15:42:20 +00:00
erinhmclark
65109e377f Remove raising exception in atlos_feeder_db_storage.py 2025-03-07 15:39:15 +00:00
Erin Clark
85a75755e2 Merge pull request #236 from bellingcat/cleanup_fixes
Cleanup fixes
2025-03-07 15:37:05 +00:00
Patrick Robertson
333201acec Merge branch 'main' into settings_page 2025-03-07 15:17:42 +00:00
Patrick Robertson
027985024b Merge pull request #234 from bellingcat/update_suggestions
Auto Updates
2025-03-07 15:12:03 +00:00
Patrick Robertson
48b29d43f7 Merge pull request #233 from bellingcat/docker-webdriver-aarch64
Docker webdriver aarch64
2025-03-07 15:04:45 +00:00
erinhmclark
4df03255a4 Fix typo in __manifest__.py 2025-03-07 14:56:35 +00:00
Patrick Robertson
503ba3d1c1 Add note on auto updates to readme 2025-03-07 14:46:50 +00:00
erinhmclark
40e5fe7a7e Update __manifest__.py for merged Atlos module. 2025-03-07 13:46:09 +00:00
erinhmclark
89d2a8bb54 Update the __manifest__.py of the Instagram Extractor. 2025-03-07 12:34:19 +00:00
Patrick Robertson
e72b3e14ba Change default height of screenshots to attempt to capture more information 2025-03-07 12:08:29 +00:00
Patrick Robertson
dba44b1ac1 Use WebDriverWait when waiting for elements in screenshot enricher 2025-03-07 12:07:54 +00:00
Patrick Robertson
2c5e138263 Add a note on disabling the auto-update for yt-dlp 2025-03-07 11:44:24 +00:00
erinhmclark
fb56aac15e Catch edge case to ensure iterator is reached in instagram_tbot_extractor.py 2025-03-07 11:24:25 +00:00
erinhmclark
bdd35408ce Fix ref before assignment in orchestrator.py 2025-03-07 11:23:51 +00:00
Patrick Robertson
478f0b2171 Tidy-ups to auto-updating code 2025-03-07 09:59:18 +00:00
erinhmclark
fa1e65f54c Fix instagram_extractor.py typo, add warning to docs, and add basic regex test. 2025-03-06 16:25:38 +00:00
erinhmclark
b9c2f98f46 Update Atlos tests 2025-03-05 21:24:38 +00:00
erinhmclark
0f911543cd Atlos refactor 2025-03-05 13:49:11 +00:00
erinhmclark
6cb7afefdc Initial Atlos merge 2025-03-05 10:24:54 +00:00
Patrick Robertson
358884c5d1 Fix unit tests for yt-dlp update 2025-03-04 17:04:23 +00:00
Patrick Robertson
be09aa927d Make 'STARTED' command INFO not warning 2025-03-04 16:51:17 +00:00
Patrick Robertson
e6a578e60e Check for auto-archiver updates and present warning if there's a newer version available 2025-03-04 16:51:17 +00:00
Patrick Robertson
0eb112431b Auto-update yt-dlp based on generic_extractor.ytdlp_update_interval (default=5 days) 2025-03-04 16:43:46 +00:00
erinhmclark
d1c8d4ba0e Initial merge of Atlos Feeder and DB 2025-03-04 14:06:46 +00:00
erinhmclark
077b56c150 Merge GSheet Feeder and Database. 2025-03-04 14:05:19 +00:00
erinhmclark
7e4b44883b Add temp options for testing 2025-03-04 14:03:39 +00:00
erinhmclark
77b517cfc1 Merge remote-tracking branch 'origin/feat/yt-dlp-pots' into feat/yt-dlp-pots 2025-03-03 22:02:14 +00:00
erinhmclark
dd07b0b830 Allow flexible extractor_args in generic_extractor.py. 2025-03-03 21:11:34 +00:00
erinhmclark
a705a78632 Fix instagram_extractor.py typo in config value. 2025-03-03 21:06:09 +00:00
Patrick Robertson
0b5a0fcb32 Better error logs if users have XXXX_archiver modules enabled in config 2025-03-03 19:57:09 +00:00
Patrick Robertson
1fe023cd70 Throw a nicer error if a user has an orchestration.yaml file in the old format (feeder: / archivers: / formatter: ) 2025-03-03 19:51:55 +00:00
Patrick Robertson
0dfab2d1bc Add some code to attempt to click the cookies banners on various websites 2025-03-03 15:55:04 +00:00
Patrick Robertson
dea0a49600 Download correct gecko-driver for the platform + fix setting executable path when running in Docker
Fixes #232
2025-03-03 15:41:44 +00:00
Patrick Robertson
a0869bb3b2 Fixed up timestamp verifying - waiting on issue with rfc-client to be fixed
Ref: https://github.com/trailofbits/rfc3161-client/issues/104#issuecomment-2693890607
2025-03-03 10:28:30 +00:00
Patrick Robertson
65a9885d86 A few more manifest types 2025-02-27 21:33:04 +00:00