| Author | Commit | Message | Date |
|---|---|---|---|
| msramalho | 23b781c866 | new check for edge case | 2025-06-17 20:36:22 +01:00 |
| msramalho | 2aec240128 | thumbnail enricher always run probe by default | 2025-06-17 20:28:20 +01:00 |
| msramalho | c5a2fd45f9 | log levels updated | 2025-06-17 20:04:40 +01:00 |
| msramalho | ad168785e7 | retry for Google API 503s | 2025-06-17 19:22:09 +01:00 |
| msramalho | 74a1561c3d | logging and clean up | 2025-06-17 19:21:40 +01:00 |
| msramalho | 55d9ffaacd | typo | 2025-06-17 18:51:21 +01:00 |
| msramalho | f19fb575a7 | logging updates | 2025-06-17 18:50:54 +01:00 |
| msramalho | f53b2075ba | fixes gdrive error | 2025-06-17 18:45:55 +01:00 |
| msramalho | 2f1a07abbf | renaming and code improvements to json_enricher | 2025-06-17 16:06:04 +01:00 |
| Dave Mateer | b3adc5603a | metadata.json hardcode in storage. add new metadata_json_enricher. log level change in orchestrator | 2025-06-17 09:51:19 +01:00 |
| msramalho | dfb361e3a0 | reset generic_extractor description in result | 2025-06-11 19:55:54 +01:00 |
| msramalho | aaa9ead39d | adds documentation for dropins | 2025-06-11 17:58:53 +01:00 |
| msramalho | 2adcf231f7 | new LinkedIn Dropin for Antibot | 2025-06-11 16:51:52 +01:00 |
| msramalho | b60469767a | more flexibility to antibot dropins media finding process | 2025-06-11 16:51:22 +01:00 |
| msramalho | e567bba6f9 | improves docs for how-to and migrations | 2025-06-11 13:37:03 +01:00 |
| msramalho | 1039e9631f | new reddit tests with .env.test | 2025-06-11 11:22:23 +01:00 |
| msramalho | 8314833ae8 | removes exclude_media_extensions option | 2025-06-10 18:34:33 +01:00 |
| msramalho | fc89d96517 | escape sequence | 2025-06-10 18:04:33 +01:00 |
| msramalho | 54fda9cad4 | antibot in docker uses a different user_data_dir | 2025-06-10 18:04:27 +01:00 |
| msramalho | 71636233cb | adds migration information and VkDropin info. | 2025-06-10 17:07:10 +01:00 |
| msramalho | fdbe96f2e4 | vk and reddit should work without credentials but log the error | 2025-06-10 16:44:14 +01:00 |
| msramalho | 773fa82f06 | introduces reddit dropin | 2025-06-10 16:31:19 +01:00 |
| msramalho | 6bbc7fb47a | improves antibot flow and makes auth_wall detection optional | 2025-06-10 16:29:07 +01:00 |
| msramalho | 809b8c7749 | default dropin introduced | 2025-06-10 16:14:42 +01:00 |
| msramalho | 6d82655cc4 | manifest improvement for antibot | 2025-06-10 16:14:34 +01:00 |
| msramalho | 6bd493a791 | dropin with new ytdlp feature and helper method | 2025-06-10 16:11:55 +01:00 |
| msramalho | 6f02493ff1 | adds clips extraction to VK, though generic_extractor should still be run for those | 2025-06-08 14:36:55 +01:00 |
| msramalho | 1f2d637928 | minor improvements | 2025-06-08 14:16:21 +01:00 |
| msramalho | c96fd71f35 | minor cleanup | 2025-06-07 20:06:53 +01:00 |
| msramalho | d13a5ef003 | adds tests in minor improvements | 2025-06-07 19:58:18 +01:00 |
| msramalho | 48c1ab3c1f | doc improvements | 2025-06-07 19:14:16 +01:00 |
| msramalho | b2ee42ee95 | adds the first antibot dropin: VKontakte | 2025-06-07 19:10:01 +01:00 |
| msramalho | 07ff5baf07 | adds Dropin flexible integration for antibot | 2025-06-07 19:09:37 +01:00 |
| msramalho | d202d79e0f | lint | 2025-06-07 19:06:14 +01:00 |
| msramalho | 952487da30 | adds missing bin dependency | 2025-06-07 18:14:42 +01:00 |
| msramalho | c7a84bc97a | generalizes ydl info to filename method for reusing | 2025-06-07 18:14:08 +01:00 |
| msramalho | 8a897cf601 | minimal changes: standard naming | 2025-06-04 15:06:08 +01:00 |
| Miguel Sozinho Ramalho | 14c8af5cc8 | Merge pull request #310 from djhmateer/waczscreenshot bug fix counter_screenshots to counter_warc_files in wacz_extractor so don't … | 2025-06-04 15:01:12 +01:00 |
| msramalho | 264ba82ea0 | finish removing screenshot_enricher references | 2025-06-04 14:31:07 +01:00 |
| msramalho | bc8cf2fb29 | minor TODO | 2025-06-04 14:10:19 +01:00 |
| msramalho | e6f3826a3a | dropping screenshot enricher | 2025-06-04 12:08:59 +01:00 |
| msramalho | e5a78a5d06 | antibot can be used out of the box | 2025-06-04 12:01:42 +01:00 |
| msramalho | 258fb4faaf | visual HTML preview improvements | 2025-06-04 12:00:40 +01:00 |
| msramalho | 22408e2a98 | adds test for antibot | 2025-06-04 11:59:59 +01:00 |
| msramalho | 378b1a6d22 | expand S3 objects content type for better preview results in non-latin languages | 2025-06-04 11:53:41 +01:00 |
| msramalho | d130c1b3fa | WIP attempt at ytdlp impersonation | 2025-06-04 11:53:18 +01:00 |
| msramalho | cbd189c97d | general cleanup | 2025-06-04 11:53:01 +01:00 |
| msramalho | d2e8f1a512 | introduces antibot step with seleniumbase | 2025-06-04 11:20:46 +01:00 |
| Dave Mateer | c772082f0e | counter_screenshots to counter_warc_files in wacz_extractor so don't get error about add multiple items with same id. | 2025-06-03 12:34:41 +01:00 |
| msramalho | ee68f3efee | Merge remote-tracking branch 'origin/main' into feat/seleniumbase | 2025-06-03 11:05:16 +01:00 |