Commit Graph

16 Commits

Author SHA1 Message Date
Patrick Robertson
dea0a49600 Download correct gecko-driver for the platform + fix setting executable path when running in Docker
Fixes #232
2025-03-03 15:41:44 +00:00
erinhmclark
8124bb831d Merge branch 'main' into small_issues
# Conflicts:
#	src/auto_archiver/core/base_module.py
#	src/auto_archiver/utils/misc.py
2025-02-26 13:19:49 +00:00
erinhmclark
1df5129268 Small typos. 2025-02-25 14:08:38 +00:00
Patrick Robertson
0bec71d203 Finish how to on authentication 2025-02-20 15:33:50 +00:00
Patrick Robertson
7734a551fa Move 'assert_valid_url' out into utils, don't use assert but raise
assert is recommended only for debugging
2025-02-20 11:19:29 +00:00
Patrick Robertson
7dde8d609d Merge main 2025-02-20 10:29:57 +00:00
Patrick Robertson
a9802dd004 Remove the global _LAZY_LOADED_MODULES and allow each instance of ArchivingOrchestrator to load its own modules 2025-02-19 12:25:35 +00:00
Patrick Robertson
222a94563f WIP: Docs tidyups+add howto on logging and authentication
(Authentication is WIP)
2025-02-19 10:37:04 +00:00
Patrick Robertson
3c543a3a6a Various fixes for issues with new architecture (#208)
* Add formatters to the TOC - fixes #204

* Add 'steps' settings to the example YAML in the docs. Fixes #206

* Improved docs on authentication architecture

* Fix setting modules on the command line - they now override any module settings in the orchestration as opposed to appending

* Fix tests for gsheet-feeder: add a test service_account.json (note: not real keys in there)

* Rename the command line entrypoint to _command_line_run

Also: make it clear that code implementation should not call this
Make sure the command line entry returns (we don't want a generator)

* Fix unit tests to use now code-entry points

* Version bump

* Move iterating of generator up to __main__

* Breakpoint

* two minor fixes

* Fix unit tests + add new '__main__' entry point implementation test

* Skip youtube tests if running on CI. Should still run them locally

* Fix full implementation run on GH actions

* Fix skipif test for GH Actions CI

* Add skipifs for truth - it blocks GH:

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2025-02-18 19:10:09 +00:00
erinhmclark
e97ccf8a73 Separate setup() and module_setup(). 2025-02-10 18:07:47 +00:00
erinhmclark
2c3d1f591f Separate setup() and module_setup(). 2025-02-10 17:25:15 +00:00
msramalho
15abf686b1 decouples s3_storage from hash_enricher 2025-02-10 15:48:54 +00:00
Patrick Robertson
c574b694ed Set up screenshot enricher to use authentication/cookies 2025-02-03 17:25:59 +01:00
Patrick Robertson
7a2be5a0da Add cookie extraction to 'authentication' options, get generic_extractor working using this info 2025-02-03 16:03:07 +01:00
Patrick Robertson
c25d5cae84 Remove ArchivingContext completely
Context for a specific url/item is now passed around via the metadata (metadata.set_context('key', 'val') and metadata.get_context('key', default='something')
The only other thing that was passed around in ArchivingContext was the storage info, which is already accessible now via self.config
2025-01-30 17:50:54 +01:00
Patrick Robertson
d6b4b7a932 Further cleanup
* Removes (partly) the ArchivingOrchestrator
* Removes the cli_feeder module, and makes it the 'default', allowing you to pass URLs directly on the command line, without having to use the cumbersome --cli_feeder.urls. Just do auto-archiver https://my.url.com
* More unit tests
* Improved error handling
2025-01-30 16:44:40 +01:00