msramalho | 2081c16555 | embed retry into timestamping | 2025-07-10 14:49:53 +01:00
msramalho | d3efd7121c | avoid empty metadata comments | 2025-07-06 14:05:17 +01:00
msramalho | 9d3cd5774b | an improved approach for #295 | 2025-07-06 14:04:01 +01:00
msramalho | c1506ee1cf | some wayback errors are expected and should be warnings | 2025-07-05 18:31:39 +01:00
msramalho | 3a34a49822 | adds antibot tiktok logic for photos closes #295 | 2025-07-05 18:31:12 +01:00
msramalho | 37c6d97275 | new auth wall check logic and escaped CSS selector in selenium | 2025-07-05 18:30:31 +01:00
msramalho | 7234eda85f | expands Sheets API retries for really large spreadsheets | 2025-07-05 18:29:33 +01:00
msramalho | a8c1ef3912 | generic_extractor config to use proxy only when needed to avoid overzealousness | 2025-07-05 16:54:58 +01:00
msramalho | 2051e8e491 | adds further exponential backoff for Sheets API worksheet enumeration | 2025-07-05 16:02:07 +01:00
msramalho | 21255db86a | stops using service that is not up for timestamping | 2025-07-05 16:00:46 +01:00
msramalho | eae0da08b3 | fix issue with two runs of antibot extractor | 2025-07-05 16:00:03 +01:00
msramalho | 649412053e | exclude non-ready code | 2025-06-30 02:27:21 +01:00
msramalho | b2648fa3cd | follow docs advice on exponential backoff of SheetsAPI | 2025-06-30 01:47:12 +01:00
msramalho | 4ad71b3589 | adds retry to worksheet read for slow worksheets | 2025-06-30 01:42:34 +01:00
msramalho | 7c9475cde2 | allow for human readable console logs, but defaults to JSON on file logs. | 2025-06-30 00:53:10 +01:00
msramalho | afd9090a4c | concludes logging standardization refactor | 2025-06-26 17:20:04 +01:00
msramalho | ad29cb4447 | adds post_data to metadata for instagram | 2025-06-26 15:48:10 +01:00
msramalho | ce4d7ac649 | WIP refactor logging | 2025-06-21 15:54:51 +01:00
msramalho | 12b457706b | closes #166 adds story URL feature to telethon extractor | 2025-06-18 17:37:44 +01:00
msramalho | 592dc30415 | closes #330 | 2025-06-18 16:40:55 +01:00
msramalho | d46eeee9b6 | docs improved | 2025-06-18 13:35:51 +01:00
msramalho | 302e6f4258 | logs improved | 2025-06-18 13:35:43 +01:00
msramalho | 76fd329fe5 | twitter tests fix | 2025-06-17 23:51:03 +01:00
msramalho | a3ae9ebbb3 | log level updates | 2025-06-17 20:36:33 +01:00
msramalho | 23b781c866 | new check for edge case | 2025-06-17 20:36:22 +01:00
msramalho | 2aec240128 | thumbnail enricher always run probe by default | 2025-06-17 20:28:20 +01:00
msramalho | c5a2fd45f9 | log levels updated | 2025-06-17 20:04:40 +01:00
msramalho | ad168785e7 | retry for Google API 503s | 2025-06-17 19:22:09 +01:00
msramalho | 74a1561c3d | logging and clean up | 2025-06-17 19:21:40 +01:00
msramalho | 55d9ffaacd | typo | 2025-06-17 18:51:21 +01:00
msramalho | f19fb575a7 | logging updates | 2025-06-17 18:50:54 +01:00
msramalho | f53b2075ba | fixes gdrive error | 2025-06-17 18:45:55 +01:00
msramalho | 6085a66c58 | revert metadata json renaming | 2025-06-17 16:10:24 +01:00
msramalho | 33cca734d9 | original_url changes still constitute empty result | 2025-06-17 16:06:25 +01:00
msramalho | 2f1a07abbf | renaming and code improvements to json_enricher | 2025-06-17 16:06:04 +01:00
msramalho | 664ee8d037 | fixes bugs and limited configuration of multi-level logs | 2025-06-17 14:10:46 +01:00
msramalho | 1b260788de | do not add commit comments to code | 2025-06-17 13:18:12 +01:00
Dave Mateer | b3adc5603a | metadata.json hardcode in storage. add new metadata_json_enricher. log level change in orchestrator | 2025-06-17 09:51:19 +01:00
Dave Mateer | ba3f1a52e8 | Logging each_level_in_separate_file feature | 2025-06-16 16:15:54 +01:00
Dave Mateer | a60d800b31 | Changed log level for media | 2025-06-16 15:07:39 +01:00
msramalho | dfb361e3a0 | reset generic_extractor description in result | 2025-06-11 19:55:54 +01:00
msramalho | aaa9ead39d | adds documentation for dropins | 2025-06-11 17:58:53 +01:00
msramalho | 2adcf231f7 | new LinkedIn Dropin for Antibot | 2025-06-11 16:51:52 +01:00
msramalho | cd19181d8f | minor improvements | 2025-06-11 16:51:42 +01:00
msramalho | b60469767a | more flexibility to antibot dropins media finding process | 2025-06-11 16:51:22 +01:00
msramalho | d60d02c16e | improves download_from_url | 2025-06-11 16:50:31 +01:00
msramalho | e567bba6f9 | improves docs for how-to and migrations | 2025-06-11 13:37:03 +01:00
msramalho | 3cf51dd874 | adds tracker remove feature and tests | 2025-06-11 11:56:42 +01:00
msramalho | 1039e9631f | new reddit tests with .env.test | 2025-06-11 11:22:23 +01:00
msramalho | 8314833ae8 | removes exclude_media_extensions option | 2025-06-10 18:34:33 +01:00