Tristan Lee
|
bb2e2806e6
|
got post transformers and channel_info transformers working for Rumble, Bitchute, Gettr
|
2022-06-21 19:05:41 -05:00 |
|
Tristan Lee
|
619fe42a31
|
got transformers for Bitchute, Rumble, and Gettr working for all raw_posts.
|
2022-06-20 21:45:41 -05:00 |
|
Tristan Lee
|
a2a7882f1c
|
fixed Gettr and Bitchute info transformers, added missing or incorrect TelegramTransformer fields, added Telegram mentions to the transformer.
|
2022-06-13 13:42:33 -05:00 |
|
Logan Williams
|
6e962de244
|
Don't scrape channel info unless specifically scraping channel info
|
2022-06-10 08:41:45 +00:00 |
|
Logan Williams
|
6183972b1a
|
Merge branch 'more-channel-info-transformers' of https://github.com/bellingcat/cisticola into main
|
2022-06-10 08:07:02 +00:00 |
|
Logan Williams
|
6294ea7ea7
|
Increment TelegramTelethonScraper version
|
2022-06-09 15:30:56 +02:00 |
|
Logan Williams
|
92d4839b5e
|
Revise Telethon scraper to use the same client connection
|
2022-06-09 10:01:27 +02:00 |
|
Logan Williams
|
9a30ecb243
|
Stop overwriting media when a large file is detected
|
2022-06-08 17:01:28 +02:00 |
|
Logan Williams
|
39358c7f23
|
Update platform ID and screenname when synchronizing with gsheet; highlight dupes
|
2022-06-06 16:36:39 +02:00 |
|
Tristan Lee
|
f4072183be
|
added transformer for Gettr
|
2022-05-20 02:22:34 -05:00 |
|
Tristan Lee
|
591f1986e8
|
added Rumble transformers and test
|
2022-05-19 19:40:48 -05:00 |
|
Tristan Lee
|
e2094522c9
|
updated Bitchute transformer and added test
|
2022-05-19 18:13:50 -05:00 |
|
Tristan Lee
|
f0414a4f4d
|
updated transformer tests
|
2022-05-19 16:34:19 -05:00 |
|
Tristan Lee
|
424c063ef2
|
Merge pull request #55 from bellingcat/channel-info-transformers
Transformers for raw channel info
|
2022-05-18 06:58:18 -07:00 |
|
Logan Williams
|
c279ced73d
|
Minor bug fixes from testing
|
2022-05-18 09:29:53 +01:00 |
|
Logan Williams
|
6145fd0b6b
|
Add Telegram transformer for channel info
|
2022-05-18 09:17:49 +01:00 |
|
Tristan Lee
|
317da2c9d4
|
Merge pull request #54 from bellingcat/transformers
Functional Telegram transformer
|
2022-05-17 10:18:12 -07:00 |
|
Logan Williams
|
9869612b67
|
Merge pull request #50 from bellingcat/odysee-refactor
Implemented Polyphemus refactoring changes into Odysee scraper
|
2022-05-16 12:25:35 +01:00 |
|
Logan Williams
|
7f55b721dd
|
Bug fixes in transformers
|
2022-05-13 15:39:01 +00:00 |
|
Logan Williams
|
34da733e7c
|
Add date_transformed; refinements to telethon transformer
|
2022-05-12 13:03:30 +00:00 |
|
Logan Williams
|
ab482443db
|
Merge branch 'main' of https://github.com/bellingcat/cisticola into transformers
|
2022-04-16 13:55:23 +00:00 |
|
Logan Williams
|
8535a87def
|
Ad hoc changes to transformers
|
2022-04-16 13:46:26 +00:00 |
|
Logan Williams
|
38e0104078
|
Separate logging; limit Telegram archive file size
|
2022-04-14 10:43:27 +00:00 |
|
Logan Williams
|
4c221d1133
|
Transformer for Telegram, base transformer NLP hydration; no media
|
2022-04-14 11:45:09 +02:00 |
|
Logan Williams
|
1ac8d6c603
|
Close sessions; sort channel info by least recently archived
|
2022-04-13 10:38:08 +00:00 |
|
Logan Williams
|
a0dbe7d92b
|
Catch errors in channel info
|
2022-04-13 10:10:29 +02:00 |
|
Tristan Lee
|
27b51267a7
|
fixed bugs from incorporating polyphemus refactoring changes
|
2022-04-13 00:02:12 -05:00 |
|
Tristan Lee
|
ef7afc0715
|
Merge branch 'main' into odysee-refactor
|
2022-04-12 23:26:18 -05:00 |
|
Tristan Lee
|
dfc5b77726
|
incorporated polyphemus refactoring changes
|
2022-04-12 23:23:21 -05:00 |
|
Logan Williams
|
d5f6ce485b
|
Merge pull request #46 from bellingcat/next-release
Release 2022-04-12
|
2022-04-12 14:59:01 +02:00 |
|
Logan Williams
|
d1f9dd0e01
|
Limit max # of archived files per session
|
2022-04-12 12:57:04 +00:00 |
|
Logan Williams
|
bbb9d283d5
|
Add RumbleScraper, YoutubeScraper, and BitchuteScraper to the active scrapers
|
2022-04-12 14:55:45 +02:00 |
|
Logan Williams
|
6f11b88f94
|
Use Youtube cookie for Rumble too
|
2022-04-12 14:55:18 +02:00 |
|
Logan Williams
|
7b8236e6db
|
No recursive retries
|
2022-04-12 14:55:05 +02:00 |
|
Logan Williams
|
b596d3e055
|
Merge pull request #42 from bellingcat/youtube-age-restricted
Enable download of age-restricted videos on YouTube
|
2022-04-12 11:14:45 +02:00 |
|
Logan Williams
|
1f7f957e62
|
Merge pull request #44 from bellingcat/bitchute-error
Catch errors while retrieving Bitchute videos
|
2022-04-12 11:13:52 +02:00 |
|
Logan Williams
|
e05f69bbee
|
Merge pull request #38 from bellingcat/youtube-dl-retry
Added 'retries' argument to youtube_dl options
|
2022-04-12 11:11:51 +02:00 |
|
Tristan Lee
|
1f667d532e
|
made get_videos_user use request_from_bitchute requests wrapper to catch errors
|
2022-04-06 11:40:43 -05:00 |
|
Tristan Lee
|
f17800b797
|
added required YOUTUBE_COOKIESTRING environment variable to be used by YoutubeScraper
|
2022-04-05 21:22:41 -05:00 |
|
Tristan Lee
|
a204041480
|
made requested changes to scraper version numbers
|
2022-04-05 17:03:45 -05:00 |
|
Logan Williams
|
b6386747d4
|
Add indices on appropriate columns; limit # of posts to archive
|
2022-04-04 10:54:27 +00:00 |
|
Tristan Lee
|
ed74c5692b
|
merged main
|
2022-04-03 19:35:16 -05:00 |
|
Tristan Lee
|
c7253148d1
|
added 'retries' argument to youtube-dl options, and made options consistent across youtube-dl instances.
|
2022-04-03 19:31:32 -05:00 |
|
Logan Williams
|
fccbad7a93
|
Remove 200 post limit; add log rotation
|
2022-04-03 16:32:00 +00:00 |
|
Logan Williams
|
0140b09ee8
|
Release Telethon, VK, and Gettr as 0.0.1; specify unreleased 0.0.0 otherwise
|
2022-04-03 15:29:24 +02:00 |
|
Logan Williams
|
96db662572
|
Don't add a timestamp to media that failed to archive
|
2022-04-03 14:16:03 +02:00 |
|
Logan Williams
|
ecae1aad05
|
Catch exceptions in archive_files so that archiver continues to run
|
2022-04-03 14:12:23 +02:00 |
|
Logan Williams
|
9c838aae39
|
Update media_archived column even when TG post has no media
|
2022-04-03 13:29:10 +02:00 |
|
Logan Williams
|
a82ec15f0e
|
Change archived_media to be timestamp for all scrapers
|
2022-04-03 12:02:27 +02:00 |
|
Logan Williams
|
8ee20a239c
|
Merge branch 'main' into initial-release
|
2022-04-03 11:35:12 +02:00 |
|