Commit Graph

976 Commits

Author SHA1 Message Date
Erin Clark
2e0e989793 Merge pull request #238 from bellingcat/cache_docker_to_registry
Add cache-from and cache-to to docker-publish.yaml, using Dockerhub Registry as the cache.
2025-03-07 15:23:50 +00:00
erinhmclark
87ab98c270 Update docker/build-push version 2025-03-07 15:14:20 +00:00
Patrick Robertson
027985024b Merge pull request #234 from bellingcat/update_suggestions
Auto Updates
2025-03-07 15:12:03 +00:00
erinhmclark
7bbf0da0d1 Add cache-from and cache-to to docker-publish.yaml. 2025-03-07 15:09:10 +00:00
Patrick Robertson
48b29d43f7 Merge pull request #233 from bellingcat/docker-webdriver-aarch64
Docker webdriver aarch64
2025-03-07 15:04:45 +00:00
Erin Clark
8ae3d9c031 Merge pull request #235 from bellingcat/instagram_extractor_bugfix
Instagram extractor bugfix:
- Fix typo from config changes
- Add warning message to documentation to alert to it not being maintained.
2025-03-07 15:02:05 +00:00
erinhmclark
4df03255a4 Fix typo in __manifest__.py 2025-03-07 14:56:35 +00:00
Patrick Robertson
503ba3d1c1 Add note on auto updates to readme 2025-03-07 14:46:50 +00:00
erinhmclark
89d2a8bb54 Update the __manifest__.py of the Instagram Extractor. 2025-03-07 12:34:19 +00:00
Patrick Robertson
e72b3e14ba Change default height of screenshots to attempt to capture more information 2025-03-07 12:08:29 +00:00
Patrick Robertson
dba44b1ac1 Use WebDriverWait when waiting for elements in screenshot enricher 2025-03-07 12:07:54 +00:00
Patrick Robertson
e756f1504f Remove geckodriver .tar file 2025-03-07 11:52:14 +00:00
Patrick Robertson
2c5e138263 Add a note on disabling the auto-update for yt-dlp 2025-03-07 11:44:24 +00:00
Patrick Robertson
478f0b2171 Tidy-ups to auto-updating code 2025-03-07 09:59:18 +00:00
erinhmclark
fa1e65f54c Fix instagram_extractor.py typo, add warning to docs, and add basic regex test. 2025-03-06 16:25:38 +00:00
Patrick Robertson
358884c5d1 Fix unit tests for yt-dlp update 2025-03-04 17:04:23 +00:00
Patrick Robertson
be09aa927d Make 'STARTED' command INFO not warning 2025-03-04 16:51:17 +00:00
Patrick Robertson
e6a578e60e Check for auto-archiver updates and present warning if there's a newer version available 2025-03-04 16:51:17 +00:00
Patrick Robertson
0eb112431b Auto-update yt-dlp based on generic_extractor.ytdlp_update_interval (default=5 days) 2025-03-04 16:43:46 +00:00
erinhmclark
a705a78632 Fix instagram_extractor.py typo in config value. 2025-03-03 21:06:09 +00:00
Patrick Robertson
a47e18ef9a Bump gecko driver to 0.36.0 2025-03-03 16:00:11 +00:00
Patrick Robertson
0dfab2d1bc Add some code to attempt to click the cookies banners on various websites 2025-03-03 15:55:04 +00:00
Patrick Robertson
dea0a49600 Download correct gecko-driver for the platform + fix setting executable path when running in Docker
Fixes #232
2025-03-03 15:41:44 +00:00
Erin Clark
011ded2bde Merge pull request #225 from bellingcat/small_issues
## GSheets Columns updates
- Update the available columns in the Google Sheet Feeder and Database.
- Update the Sheet Template to reflect this.

## Other Fixes
- Ensure test file cleanup.
- Additional tests.
- Correctly mark download test.
- Small typos.
2025-03-03 13:06:27 +00:00
erinhmclark
4280791f07 Fix mocking in test_wayback_enricher.py. 2025-02-27 11:25:58 +00:00
erinhmclark
8124bb831d Merge branch 'main' into small_issues
# Conflicts:
#	src/auto_archiver/core/base_module.py
#	src/auto_archiver/utils/misc.py
2025-02-26 13:19:49 +00:00
erinhmclark
b2e654aef9 Remove context manager from test_pdq_hash_enricher.py 2025-02-26 12:57:33 +00:00
erinhmclark
9157846930 Add docstrings to explain date formats. 2025-02-26 10:01:52 +00:00
erinhmclark
35b5ab2eb1 Update poetry.lock 2025-02-25 20:17:48 +00:00
erinhmclark
83a08dd215 Update date parsing to use dateutil.parser in misc.py 2025-02-25 20:17:31 +00:00
erinhmclark
9bc6dd5c3c Add set_content into generic_extractor.py. 2025-02-25 20:07:00 +00:00
erinhmclark
cf1219f798 Add text content into gsheet. 2025-02-25 20:06:44 +00:00
Patrick Robertson
1ad158c016 Merge pull request #211 from bellingcat/docs_improvements
Docs tidyups, howto on logging and authentication, remove exit(), small fixes
2025-02-25 14:13:13 +00:00
erinhmclark
1df5129268 Small typos. 2025-02-25 14:08:38 +00:00
erinhmclark
73b434aafc Tests for test_vk_extractor.py. 2025-02-25 14:08:28 +00:00
erinhmclark
2d276cb9c4 Fix tmp test file. 2025-02-25 14:08:14 +00:00
Patrick Robertson
d10c7fbe55 Better documentation based on the Discord feedback 2025-02-24 22:42:57 +00:00
Patrick Robertson
ca1ed418aa Throw an error for invalid __manifest__ syntax + fix: allow default values of False/None 2025-02-24 21:46:24 +00:00
Patrick Robertson
73a2e2d752 Fix tests for moving orchestration to secrets/orchestration.yaml 2025-02-21 19:05:39 +00:00
Patrick Robertson
091a19e25c Further docs improvements/tidy ups 2025-02-21 16:52:30 +00:00
Patrick Robertson
77212e8e3f Finishing touches to the how-tos 2025-02-20 15:45:48 +00:00
Patrick Robertson
9661e90a05 Allow disabling logging in auto_archiver with logging: enabled: false 2025-02-20 15:45:32 +00:00
Patrick Robertson
0bec71d203 Finish how to on authentication 2025-02-20 15:33:50 +00:00
Patrick Robertson
4174285898 Fix unit tests 2025-02-20 13:18:06 +00:00
Patrick Robertson
eda359a1ef Fix json loader - it should go in 'validators' not 'utils'
Fixes #214
2025-02-20 13:10:39 +00:00
Patrick Robertson
40488e0869 Use 'Auto Archiver' naming for consistency.
auto-archiver is reserved in the docs for when talking about the command line usage
2025-02-20 11:50:29 +00:00
Patrick Robertson
061f29c885 How-to on updating config file to version 0.13+ 2025-02-20 11:46:57 +00:00
Patrick Robertson
cbea551876 Better display name for wayback machine to emphasise it's typically used as an enricher 2025-02-20 11:46:57 +00:00
Patrick Robertson
b978484a89 Rename wacz_enricher to wacz_extractor_enricher. Fixes #205 2025-02-20 11:46:57 +00:00
Patrick Robertson
49b6c32058 Fix the 'full' mode which creates a complete config file 2025-02-20 11:34:05 +00:00