Patrick Robertson
168dfb6254
Unit tests for url utils
2025-03-21 11:53:47 +04:00
erinhmclark
85abe1837a
Ruff format with defaults.
2025-03-10 18:44:54 +00:00
Patrick Robertson
7734a551fa
Move 'assert_valid_url' out into utils, don't use assert but raise
assert is recommended only for debugging
2025-02-20 11:19:29 +00:00
Patrick Robertson
c574b694ed
Set up screenshot enricher to use authentication/cookies
2025-02-03 17:25:59 +01:00
Patrick Robertson
b7d9145f6c
Further tidyups + refactoring for new structure
* Add implementation tests for orchestrator + logging tests
* Standardise method/class vars for extractors to see if they are suitable
* Fix bugs with removing default loguru logger (allows further customisation)
* Fix bug loading required fields from file
2025-01-30 13:21:10 +01:00
Galen Reich
381940f5a8
Fix Selenium headless invocation (#106)
Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2023-11-13 11:56:35 +01:00
msramalho
ceb717ea65
exclude vk emojis
2023-08-17 18:11:26 +01:00
msramalho
6e4fb76940
exclude ok resource images from wacz enricher
2023-08-09 11:26:46 +01:00
msramalho
60a1f3a27a
minor fixes
2023-07-31 16:08:48 +01:00
msramalho
fb197f1064
excluding telegram embeds
2023-07-28 12:57:15 +01:00
msramalho
aa71c85a98
improving ignored content from waczs
2023-07-28 12:19:14 +01:00
msramalho
59551b3b20
minor improvements: finding best twitter image quality
2023-07-27 21:36:15 +01:00
msramalho
f086d89111
new escape message
2023-07-27 20:14:59 +01:00
msramalho
dd034da844
feat: WACZ enricher can now be probed for media, and used as an archiver OR enricher
2023-07-27 15:42:10 +01:00
msramalho
5505255ea3
url auth wall detect
2023-02-17 15:45:58 +00:00