Commit Graph

161 Commits

Author SHA1 Message Date
Patrick Robertson
b6b085854c Switch back to using yaml with dot notation
(two simple helper functions to convert between dot and dict notation)
2025-01-22 17:40:51 +01:00
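The commit above mentions two helpers for converting between dot and dict notation but does not show them. A minimal sketch of what such a pair could look like follows; the names `to_dot_notation` and `from_dot_notation` are hypothetical and are not taken from the repository.

```python
from typing import Any


def to_dot_notation(nested: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested dict into one level with dotted keys (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts, carrying the dotted path so far.
            flat.update(to_dot_notation(value, dotted))
        else:
            # Scalars, lists, and empty dicts are stored as leaves.
            flat[dotted] = value
    return flat


def from_dot_notation(flat: dict[str, Any]) -> dict[str, Any]:
    """Rebuild a nested dict from dotted keys (hypothetical helper)."""
    nested: dict[str, Any] = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for part in parents:
            # Create intermediate dicts on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


config = {"steps": {"feeder": "cli"}, "logging": {"level": "INFO"}}
flat = to_dot_notation(config)
restored = from_dot_notation(flat)
```

This sketch assumes keys never contain a literal dot; a key such as `"example.com"` would be split into two levels on the way back.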
Patrick Robertson
54995ad6ab Further tweaks based on __manifest__.py files
Loading configs now works
2025-01-22 13:11:43 +01:00
Patrick Robertson
4830f99300 Get parsing of manifest and combining with config file working 2025-01-21 20:03:10 +01:00
Patrick Robertson
241b35002c Initial changes to move to '__manifest__' format 2025-01-21 19:02:38 +01:00
Patrick Robertson
c41d93a634 Use already implemented helper to get version 2025-01-21 17:53:37 +01:00
Patrick Robertson
6388983815 Merge branch 'main' into youtubedlp-rewrite 2025-01-21 16:43:14 +01:00
Patrick Robertson
5b20288d06 Add a 'version' arg to get the current running version 2025-01-17 16:59:57 +01:00
erinhmclark
d3eec5d90f Basic docs structure for RTD 2025-01-15 21:45:29 +00:00
Patrick Robertson
306df62a98 Fix all instances of utcnow() 2025-01-14 17:51:41 +01:00
Patrick Robertson
4e13a09a87 Fix deprecation warning about utcnow 2025-01-14 11:01:40 +01:00
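The two commits above concern `datetime.utcnow()`, which is deprecated as of Python 3.12. The log does not show the actual change; the sketch below only illustrates the commonly recommended replacement, and the wrapper name `utc_now` is hypothetical.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime (hypothetical helper)."""
    # datetime.utcnow() returned a naive datetime (tzinfo=None);
    # datetime.now(timezone.utc) returns an aware one instead.
    return datetime.now(timezone.utc)


aware = utc_now()

# If surrounding code still expects naive UTC values, drop the tzinfo explicitly.
naive = aware.replace(tzinfo=None)
```

Aware and naive datetimes cannot be compared or subtracted directly, so a codebase would normally settle on one form throughout.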
R. Miles McCain
f603400d0d Add direct Atlos integration (#137)
* Add Atlos feeder

* Add Atlos db

* Add Atlos storage

* Fix Atlos storages

* Fix Atlos feeder

* Only include URLs in Atlos feeder once they're processed

* Remove print

* Add Atlos documentation to README

* Formatting fixes

* Don't archive existing material

* avoid KeyError in atlos_db

* version bump

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2024-04-15 19:25:17 +01:00
msramalho
75497f5773 minor bug fix when using an archiver_enricher in enrichers only 2024-04-15 19:02:40 +01:00
msramalho
601572d76e strip url 2024-02-29 11:54:01 +00:00
msramalho
d21e79a272 general security updates 2024-02-29 11:40:30 +00:00
msramalho
5324d562ba cleanup wacz patch 2024-02-21 18:14:30 +00:00
Miguel Sozinho Ramalho
7a21ae96af V0.9.0 - closes several open issues: new enrichers and bug fixes (#133)
* clean orchestrator code, add archiver cleanup logic

* improves documentation for database.py

* telethon archivers isolate sessions into copied files

* closes #127

* closes #125

* closes #84

* meta enricher applies to all media

* closes #61 adds subtitles and comments

* minor update

* minor fixes to yt-dlp subtitles and comments

* closes #17 but logic is imperfect.

* closes #85 ssl enhancer

* minifies HTML, JS refactor for preview of certificates

* closes #91 adds freetsa timestamp authority

* version bump

* simplify download_url method

* skip ssl if nothing archived

* html preview improvements

* adds retrying lib

* manual download archiver improvements

* meta only runs when relevant data available

* new metadata convenience method

* html template improvements

* removes debug message

* does not close #91 yet, will need more certificate chain logging

* adds verbosity config

* new instagram api archiver

* adds proxy support we

* adds proxy/end support and bug fix for yt-dlp

* proxy support for webdriver

* adds socks proxy to wacz_enricher

* refactor recursivity in inner media and display

* infinite recursive display

* foolproofing timestamping authorities

* version to 0.9.0

* minor fixes from code-review
2024-02-20 18:05:29 +00:00
msramalho
499832d146 fix datetime parsing 2023-12-13 18:41:48 +00:00
Miguel Sozinho Ramalho
a786d4bb0e chooses most complete result from api (#116) 2023-12-13 11:26:46 +00:00
Miguel Sozinho Ramalho
98fb574d89 fixing older db entries formats (#114) 2023-12-12 22:47:54 +00:00
Miguel Sozinho Ramalho
6f36e92e02 enables api_db cache queries if configured with new option (#113) 2023-12-12 19:20:26 +00:00
msramalho
a1742b5565 fixing whisper enricher 2023-08-05 13:57:09 +01:00
msramalho
bd231488ff parameter fix 2023-07-28 13:10:06 +01:00
msramalho
aa71c85a98 improving ignored content from waczs 2023-07-28 12:19:14 +01:00
msramalho
7a5c9c65bd detects duplicates before storing, eg: wacz getting media already fetched by another archiver 2023-07-28 10:51:48 +01:00
msramalho
fc93ebaba0 cleanup 2023-07-28 10:49:39 +01:00
msramalho
3dd3775cbd removes rearchiving logic 2023-07-27 20:14:50 +01:00
msramalho
e8f44b652e minor improvements 2023-07-27 15:42:23 +01:00
msramalho
a0971fc601 final code review changes 2023-06-26 17:32:19 +01:00
msramalho
0cba2c25c6 get all media method 2023-06-26 17:28:19 +01:00
msramalho
6cf3e109ed refactor discovery of inner media elements 2023-06-26 17:05:25 +01:00
msramalho
0a91863212 typing fixes 2023-05-24 11:18:39 +01:00
msramalho
2768225cd1 fix: generator not called 2023-05-23 19:05:47 +01:00
msramalho
1a5797d0f8 feat: orchestrator fed returns archive result 2023-05-23 18:12:04 +01:00
msramalho
613b1f1e50 properly overwrite configs 2023-05-19 12:35:19 +01:00
msramalho
a655b3c987 gsheet accepts ID too 2023-05-19 12:17:34 +01:00
msramalho
68e9d2a2ce allows yaml config to be overwritten 2023-05-19 11:49:02 +01:00
msramalho
9c25b33f1c fix: multiple storages with folder column 2023-05-09 12:14:07 +01:00
msramalho
c1a60fde8a fix: deprecates duration column 2023-05-09 11:26:19 +01:00
msramalho
9d44f4b207 content append instead of replace 2023-05-02 19:06:00 +01:00
msramalho
ae7ceba0e5 better debug 2023-05-02 19:05:18 +01:00
msramalho
97821a81bc log cleanup 2023-05-02 19:05:06 +01:00
msramalho
8c22a9df72 fixes "url-not-found" 2023-05-02 14:30:07 +01:00
msramalho
3d389ee05b add url info 2023-04-18 19:14:47 +01:00
msramalho
69bcfea2eb to_json fix 2023-04-18 18:48:51 +01:00
msramalho
493055a8d9 cleanup 2023-03-23 18:50:30 +00:00
msramalho
6f6eb2db7a Archiving Context refactor complete 2023-03-23 14:28:45 +00:00
msramalho
906ed0f6e0 creating global context and refactoring tmp_dir logic 2023-03-23 11:17:38 +00:00
msramalho
aa5430451e instagram archiver via telegram bot 2023-02-17 15:46:29 +00:00
msramalho
2a7ece5dcc cleanups and docs 2023-02-08 22:13:19 +00:00
msramalho
4854929a1d thumbnail and bot token 2023-02-02 13:49:56 +00:00