Commit Graph

62 Commits

Author SHA1 Message Date
Patrick Robertson
a9802dd004 Remove the global _LAZY_LOADED_MODULES and allow each instance of ArchivingOrchestrator to load its own modules 2025-02-19 12:25:35 +00:00
Patrick Robertson
3c543a3a6a Various fixes for issues with new architecture (#208)
* Add formatters to the TOC - fixes #204

* Add 'steps' settings to the example YAML in the docs. Fixes #206

* Improved docs on authentication architecture

* Fix setting modules on the command line - they now override any module settings in the orchestration as opposed to appending

* Fix tests for gsheet-feeder: add a test service_account.json (note: not real keys in there)

* Rename the command line entrypoint to _command_line_run

Also: make it clear that code implementation should not call this
Make sure the command line entry returns (we don't want a generator)

* Fix unit tests to use new code entry points

* Version bump

* Move iterating of generator up to __main__

* Breakpoint

* two minor fixes

* Fix unit tests + add new '__main__' entry point implementation test

* Skip youtube tests if running on CI. Should still run them locally

* Fix full implementation run on GH actions

* Fix skipif test for GH Actions CI

* Add skipifs for Truth - it blocks GH

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2025-02-18 19:10:09 +00:00
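The command-line fix in commit 3c543a3a6a above (modules given on the command line now override the orchestration settings instead of being appended) can be sketched roughly as follows. This is only an illustration: the function `resolve_steps`, the dictionary shape, and the module name `generic_extractor` are assumptions made for the example, not the project's actual code.

```python
def resolve_steps(config_steps, cli_steps):
    """Return the modules to run for each step.

    Both arguments map a step name to a list of module names
    (a hypothetical shape chosen for this sketch).
    """
    # Start from a copy so the loaded configuration is left untouched.
    resolved = {step: list(modules) for step, modules in config_steps.items()}
    for step, modules in cli_steps.items():
        if modules:  # only steps actually given on the command line
            resolved[step] = list(modules)  # replace, do not extend
    return resolved


config_steps = {"feeders": ["gsheet_feeder"], "extractors": ["generic_extractor"]}
cli_steps = {"feeders": ["csv_feeder"]}
steps = resolve_steps(config_steps, cli_steps)
```

With appending, `steps["feeders"]` would contain both feeders. With overriding, only the one named on the command line remains, and steps not mentioned on the command line keep their configured modules.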
Patrick Robertson
6d43bc7d4d Fix generator programmatic setup (#197)
* Fix returning a generator of a generator

* Move download test to pytest.mark.download
2025-02-15 17:36:44 +00:00
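The "generator of a generator" problem named in commit 6d43bc7d4d above is a common Python pitfall. The sketch below shows the general pattern and one usual fix with `yield from`. The function names and URLs are made up for illustration, and the commit's actual change may look different.

```python
def feed():
    """Stand-in for a feeder that yields items one by one."""
    yield "https://example.com/a"
    yield "https://example.com/b"


def run_nested():
    # Pitfall: this yields the inner generator object itself,
    # so callers receive one generator instead of two items.
    yield feed()


def run_flat():
    # Delegating with 'yield from' passes the inner items through.
    yield from feed()


nested = list(run_nested())
flat = list(run_flat())
```

Here `nested` holds a single generator object, while `flat` holds the two URLs.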
Patrick Robertson
29901da601 Merge branch 'load_modules' into docs_update 2025-02-11 14:10:56 +00:00
Patrick Robertson
2f51d3917a Further addition to docs: creating modules, configurations, installation 2025-02-11 13:49:30 +00:00
erinhmclark
d1d6cde008 Set mock timestamp without z format 2025-02-11 12:27:48 +00:00
erinhmclark
5e2e93382f Test fixes for 3.10 compliance. 2025-02-11 12:17:42 +00:00
erinhmclark
f97ec6a9e0 Fixed S3 module import 2025-02-11 11:58:28 +00:00
erinhmclark
89d9140d15 Fixed setup/ config_setup reference 2025-02-11 11:47:11 +00:00
erinhmclark
1792e02d1d skip authenticated tests in test_gdrive_storage.py 2025-02-11 11:34:36 +00:00
erinhmclark
18666ff027 skip authenticated tests in test_gsheet_feeder.py 2025-02-11 11:28:24 +00:00
erinhmclark
a69ac3e509 Fix file hash reference in S3 tests 2025-02-11 09:46:22 +00:00
erinhmclark
c4bb667cec Merge branch 'load_modules' into add_module_tests
# Conflicts:
#	src/auto_archiver/modules/s3_storage/s3_storage.py
#	src/auto_archiver/utils/gsheet.py
#	src/auto_archiver/utils/misc.py
2025-02-10 16:17:08 +00:00
erinhmclark
f311621e58 Small fixes.
Add timestamp helper method.
2025-02-10 15:57:42 +00:00
Patrick Robertson
f3f6b92817 Implementation test cleanup 2025-02-10 12:43:21 +00:00
Patrick Robertson
74207d7821 Implementation tests for auto-archiver 2025-02-10 13:27:11 +01:00
erinhmclark
e9ad1e1b85 Pass media to storage cdn_call 2025-02-06 22:01:55 +00:00
erinhmclark
266c7a14e6 Context related fixes, some more tests. 2025-02-06 16:53:00 +00:00
erinhmclark
67504a683e Merge branch 'load_modules' into add_module_tests 2025-02-06 10:13:37 +00:00
erinhmclark
5b0bad832f Updated test, test metadata 2025-02-06 10:11:56 +00:00
Patrick Robertson
6ab8fd2ee4 Tidy up setting modules as Orchestrator attributes on startup.
Don't override the values in config['steps'] – the config should be left as is
2025-02-06 10:20:05 +01:00
erinhmclark
52542812dc Merge tests from version with context. 2025-02-05 16:42:58 +00:00
Patrick Robertson
78e6418249 Unit tests for csv feeder + fix some bugs 2025-02-04 13:37:26 +01:00
Patrick Robertson
c25d5cae84 Remove ArchivingContext completely
Context for a specific url/item is now passed around via the metadata (metadata.set_context('key', 'val') and metadata.get_context('key', default='something'))
The only other thing that was passed around in ArchivingContext was the storage info, which is already accessible now via self.config
2025-01-30 17:50:54 +01:00
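Commit c25d5cae84 above names two metadata methods, set_context and get_context. A minimal sketch of how such per-item context could be stored is shown below. Only the two method names and the `default` keyword come from the commit message. The class name `SketchMetadata`, the `_context` dictionary, and the `None` default are assumptions for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchMetadata:
    """Hypothetical stand-in for the project's metadata object."""

    # Per-item context, kept on the object instead of in a global.
    _context: Dict[str, Any] = field(default_factory=dict)

    def set_context(self, key: str, val: Any) -> None:
        self._context[key] = val

    def get_context(self, key: str, default: Any = None) -> Any:
        return self._context.get(key, default)


metadata = SketchMetadata()
metadata.set_context("key", "val")
found = metadata.get_context("key", default="something")
missing = metadata.get_context("other", default="something")
```

Because each item carries its own context, two items processed one after the other cannot see each other's values, which a shared global context object does not guarantee.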
Patrick Robertson
d76063c3f3 Fix unit tests 2025-01-30 16:46:53 +01:00
Patrick Robertson
d6b4b7a932 Further cleanup
* Removes (partly) the ArchivingOrchestrator
* Removes the cli_feeder module, and makes it the 'default', allowing you to pass URLs directly on the command line, without having to use the cumbersome --cli_feeder.urls. Just do auto-archiver https://my.url.com
* More unit tests
* Improved error handling
2025-01-30 16:44:40 +01:00
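Commit d6b4b7a932 above makes URLs passable directly on the command line. One way to accept them as positional arguments is sketched below with the standard library's argparse. The parser layout and the argument name `urls` are assumptions for the example and are not taken from the project.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="auto-archiver")
    # Zero or more positional URLs, so no --cli_feeder.urls flag is needed.
    parser.add_argument("urls", nargs="*", help="URLs to archive")
    return parser


# Equivalent to running: auto-archiver https://my.url.com
args = build_parser().parse_args(["https://my.url.com"])
```

After parsing, `args.urls` is a list holding every URL given on the command line.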
Patrick Robertson
fade68c6f4 Fix up unit tests - dataclass + subclasses not having @dataclass was breaking it 2025-01-30 13:45:24 +01:00
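Commit fade68c6f4 above does not say how the missing @dataclass decorator broke things. The sketch below shows one way this kind of problem can appear in plain Python: a subclass of a dataclass that is not itself decorated does not turn its annotated attributes into fields. The class names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Base:
    name: str = "base"


class Undecorated(Base):
    # Without @dataclass this is only an annotated class attribute;
    # it is not added to the generated __init__.
    extra: int = 0


@dataclass
class Decorated(Base):
    extra: int = 0


undecorated_fields = [f.name for f in fields(Undecorated)]
decorated_fields = [f.name for f in fields(Decorated)]
```

`Decorated(extra=1)` works, while `Undecorated(extra=1)` raises a TypeError because the inherited `__init__` only knows about `name`.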
Patrick Robertson
b7d9145f6c Further tidyups + refactoring for new structure
* Add implementation tests for orchestrator + logging tests
* Standardise method/class vars for extractors to see if they are suitable
* Fix bugs with removing default loguru logger (allows further customisation)
* Fix bug loading required fields from file
2025-01-30 13:21:10 +01:00
Patrick Robertson
00a7018f36 Fix up dependency checking (use 'dependencies' instead of 'external_dependencies': simpler/easier to remember) 2025-01-29 19:25:22 +01:00
Patrick Robertson
3d37c494aa Tidy ups + unit tests:
1. Allow loading modules from --module_paths=/extra/path/here
2. Improved unit tests for module loading
3. Further small tidy ups/clean ups
2025-01-29 18:42:49 +01:00
Patrick Robertson
7a4871db6b Fix up unit tests for new structure 2025-01-28 14:40:12 +01:00
Patrick Robertson
14e2479599 Merge branch 'more_mainifests' into load_modules 2025-01-27 11:05:56 +01:00
erinhmclark
aa7ca93a43 Update manifests and modules 2025-01-24 12:58:16 +00:00
Patrick Robertson
9befb9776c Fix loading modules when entry_point isn't set 2025-01-23 21:08:54 +01:00
Patrick Robertson
b27bf8ffeb Fix up loading/storing configs + unit tests 2025-01-23 20:32:19 +01:00
erinhmclark
1274a1b231 More manifests, base modules and rename from archiver to extractor. 2025-01-23 16:40:48 +00:00
erinhmclark
79684f8348 Set up feeder manifests (not merged by source yet) 2025-01-23 09:16:42 +00:00
Patrick Robertson
241b35002c Initial changes to move to '__manifest__' format 2025-01-21 19:02:38 +01:00
Patrick Robertson
d3e3eb7639 unit tests for loading dropins 2025-01-21 16:59:45 +01:00
Patrick Robertson
dff0105659 Small fixups + implement Truth code for posts with multiple media 2025-01-20 18:40:46 +01:00
Patrick Robertson
fd2e7f973b Further tidy-ups, also adds some ytdlp utils to 'utils' 2025-01-20 16:31:28 +01:00
Patrick Robertson
befc92deb4 Further unit test tidy ups 2025-01-17 17:29:13 +01:00
Patrick Robertson
d4893ee05e Fix unit tests for base_archiver->generic_archiver rename 2025-01-17 17:08:00 +01:00
Patrick Robertson
17c1c9c360 Fix up core unit tests when a twitter api key isn't provided 2025-01-17 12:02:38 +01:00
Patrick Robertson
394bcd8d47 Further refactoring of youtubedl_archiver->base_archiver
* Keep twitter_api_archiver
* Remove unit tests for obsolete archivers
* Guess filename of media using the 'Content-Type' header
* Add mechanism to run 'expensive' tests last (see conftest.py) and also flag expensive tests to fail straight off (pytest.mark.incremental)
2025-01-17 11:56:08 +01:00
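Commit 394bcd8d47 above mentions guessing a media filename from the 'Content-Type' header. A rough sketch of that idea using the standard library's mimetypes module follows. The helper name `guess_filename` and the decision to return the bare stem for unknown types are assumptions, and the project's own logic may differ.

```python
import mimetypes


def guess_filename(stem, content_type):
    """Append an extension to 'stem' based on a Content-Type header value."""
    # Drop parameters such as '; charset=utf-8' and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    # guess_extension returns None for unknown types.
    ext = mimetypes.guess_extension(mime) or ""
    return stem + ext


png_name = guess_filename("media", "image/png; charset=binary")
unknown_name = guess_filename("media", "application/x-not-a-real-type")
```

The PNG header yields a name ending in `.png`. For a type the module does not recognise, this sketch leaves the stem unchanged.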
Patrick Robertson
3168bed0d9 Add (skipped) test for twitter extraction with youtubedlp 2025-01-15 19:00:57 +01:00
Patrick Robertson
5626bba815 Add test on bluesky and note on why it doesn't work 2025-01-15 18:31:20 +01:00
Patrick Robertson
74cf1f5f23 Merge branch 'main' into youtubedlp-rewrite 2025-01-15 17:47:23 +01:00
Patrick Robertson
4f2b9baa73 refactor youtubedlp archiver to work for all valid websites
1. Extract more metadata
2. Better extract thumbnail
3. Setup framework for specific sites to provide more granular metadata processing
2025-01-15 17:46:47 +01:00
Patrick Robertson
20726c1116 Remove tiktok-downloader - getting info is broken
TODO: switch to using youtube-dlp
2025-01-14 17:40:45 +01:00