Commit Graph

394 Commits

Author SHA1 Message Date
erinhmclark
96b35a272c Rm gsheet references in utils 2025-01-24 18:51:15 +00:00
erinhmclark
dd402b456f Fix and add types to manifest 2025-01-24 18:50:11 +00:00
Patrick Robertson
3fc6ddfe85 Tweaks to logging strings 2025-01-24 15:30:00 +01:00
Patrick Robertson
f1e9ab6751 Merge branch 'main' into load_modules 2025-01-24 15:23:15 +01:00
Patrick Robertson
e8138eac1c Add ubuntu-latest to the matrix of test runners (#181)
* Don't clutter logs with info about generic dropin

* Add ubuntu-latest to unit tests

This is currently failing due to an issue with oscrypto and newer openssl https://github.com/wbond/oscrypto/issues/78#issuecomment-1756317472

* fix oscrypto version for ubuntu 24 compatibility (boto3 too, see #180)

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2025-01-24 14:03:55 +00:00
Miguel Sozinho Ramalho
a6fc4e1bb1 modifies base docker image to use browsertrix 1.4.2 (#182)
* modifies base image to newest browsertrix version

* modify browsertrix cmd args based on recent experience
2025-01-24 13:59:29 +00:00
erinhmclark
1942e8b819 Gsheets utility revert 2025-01-24 13:34:30 +00:00
erinhmclark
024fe58377 fix config parsing in manifests, remove module level configs 2025-01-24 13:33:12 +00:00
erinhmclark
0453d95f56 fix config parsing in manifests 2025-01-24 13:24:54 +00:00
erinhmclark
aa7ca93a43 Update manifests and modules 2025-01-24 12:58:16 +00:00
Patrick Robertson
9befb9776c Fix loading modules when entry_point isn't set 2025-01-23 21:08:54 +01:00
Patrick Robertson
06f6e34d9d Revert changes to orchestrator to avoid merge conflicts 2025-01-23 20:38:36 +01:00
Patrick Robertson
b27bf8ffeb Fix up loading/storing configs + unit tests 2025-01-23 20:32:19 +01:00
erinhmclark
50f4ebcdc3 Move storage configs into individual manifests, assert format on usage. 2025-01-23 17:01:30 +00:00
erinhmclark
c3403ced26 Rename storages for clarity 2025-01-23 16:51:17 +00:00
erinhmclark
1274a1b231 More manifests, base modules and rename from archiver to extractor. 2025-01-23 16:40:48 +00:00
erinhmclark
9db26cdfc2 Merge branch 'load_modules' into more_mainifests
# Conflicts:
#	src/auto_archiver/core/orchestrator.py
2025-01-23 09:19:54 +00:00
erinhmclark
79684f8348 Set up feeder manifests (not merged by source yet) 2025-01-23 09:16:42 +00:00
Patrick Robertson
65ef46d01e Fix loading already loaded modules - don't load them twice 2025-01-23 00:09:39 +01:00
Patrick Robertson
550097ab7b Get module loading working properly 2025-01-22 23:54:21 +01:00
erinhmclark
c517d35bdf Merge branch 'load_modules' into more_mainifests
# Conflicts:
#	src/auto_archiver/databases/__init__.py
2025-01-22 18:19:43 +00:00
erinhmclark
99c8c69085 Manifests for databases 2025-01-22 18:18:13 +00:00
Patrick Robertson
ade5ea0f6f Tidy up imports + start on loading modules - program now starts much faster 2025-01-22 18:45:58 +01:00
Patrick Robertson
b6b085854c Switch back to using yaml with dot notation
(two simple helper functions to convert between dot and dict notation)
2025-01-22 17:40:51 +01:00
Patrick Robertson
54995ad6ab Further tweaks based on __manifest__.py files
Loading configs now works
2025-01-22 13:11:43 +01:00
erinhmclark
7b3a1468cd Create manifest files for archiver modules. 2025-01-22 10:21:27 +01:00
Patrick Robertson
4830f99300 Get parsing of manifest and combining with config file working 2025-01-21 20:03:10 +01:00
Patrick Robertson
241b35002c Initial changes to move to '__manifest__' format 2025-01-21 19:02:38 +01:00
Patrick Robertson
03f3770223 Add __manifest__.py for generic_extractor 2025-01-21 18:00:45 +01:00
Patrick Robertson
c41d93a634 Use already implemented helper to get version 2025-01-21 17:53:37 +01:00
Patrick Robertson
cd2ae3763f Minor adjustments
Co-authored-by: Miguel Sozinho Ramalho <19508417+msramalho@users.noreply.github.com>
2025-01-21 16:24:37 +00:00
Patrick Robertson
d3e3eb7639 unit tests for loading dropins 2025-01-21 16:59:45 +01:00
Patrick Robertson
9dde9b26d0 Patch in upstream changes to ytdlp for now
Seems like ytdlp may not merge https://github.com/yt-dlp/yt-dlp/pull/12098 anytime soon
2025-01-21 16:49:49 +01:00
Patrick Robertson
7c0dcbfd81 Re-add doc string to generic_archiver
(renamed from youtube_archiver)
2025-01-21 16:49:30 +01:00
Patrick Robertson
6388983815 Merge branch 'main' into youtubedlp-rewrite 2025-01-21 16:43:14 +01:00
Patrick Robertson
4bb4ebdf82 Further cleanup, abstracts 'dropins' out into generic files 2025-01-21 16:36:45 +01:00
erinhmclark
e83ccc0d7f Cleaning up configs reference and module level. 2025-01-21 09:48:46 +00:00
Patrick Robertson
dff0105659 Small fixups + implement Truth code for posts with multiple media 2025-01-20 18:40:46 +01:00
Patrick Robertson
fd2e7f973b Further tidy-ups, also adds some ytdlp utils to 'utils' 2025-01-20 16:31:28 +01:00
Patrick Robertson
9c5a9e1bcd Rename BaseArchiver to GenericArchiver + some other tidyups 2025-01-17 17:06:04 +01:00
Patrick Robertson
5b20288d06 Add a 'version' arg to get the current running version 2025-01-17 16:59:57 +01:00
Patrick Robertson
394bcd8d47 Further refactoring of youtubedl_archiver->base_archiver
* Keep twitter_api_archiver
* Remove unit tests for obsolete archivers
* Guess filename of media using the 'Content-Type' header
* Add mechanism to run 'expensive' tests last (see conftest.py) and also flag expensive tests to fail straight off (pytest.mark.incremental)
2025-01-17 11:56:08 +01:00
erinhmclark
6fabe2a189 Fixed twitter_archiver.py changes. 2025-01-16 09:56:54 +00:00
erinhmclark
bbb3269c2b Changes from main. 2025-01-16 09:30:32 +00:00
erinhmclark
d3eec5d90f Basic docs structure for RTD 2025-01-15 21:45:29 +00:00
Patrick Robertson
74cf1f5f23 Merge branch 'main' into youtubedlp-rewrite 2025-01-15 17:47:23 +01:00
Patrick Robertson
4f2b9baa73 refactor youtubedlp archiver to work for all valid websites
1. Extract more metadata
2. Better extract thumbnail
3. Setup framework for specific sites to provide more granular metadata processing
2025-01-15 17:46:47 +01:00
Patrick Robertson
c3dd19f309 Sniff filetype of downloaded media and add extension
Also download in chunks - fixes 2 x TODOs
2025-01-15 17:46:47 +01:00
Patrick Robertson
306df62a98 Fix all instances of utcnow() 2025-01-14 17:51:41 +01:00
Patrick Robertson
20726c1116 Remove tiktok-downloader - getting info is broken
TODO: switch to using youtube-dlp
2025-01-14 17:40:45 +01:00