37 Commits

Author SHA1 Message Date
Tristan Lee
1eb82c5f3e sorted imports using isort and tried to add pre-commit hook for isort 2023-08-07 20:04:16 -05:00
Tristan Lee
1ec1d6190a implemented minor fixes recommended by pylint (unused imports, f-strings without patterns, etc.) 2023-08-07 19:39:03 -05:00
Tristan Lee
fab65a5d67 formatted with black, added pre-commit hook, pegged typing_extensions package version to fix spaCy issue 2023-08-04 14:51:00 -05:00
Tristan Lee
d3b8e1a3b3 removed unused archive_media argument passed to methods throughout codebase 2023-08-03 18:05:50 -05:00
Logan Williams
91de6482e0 Add rather hacky bulk insert functionality 2023-05-04 15:26:52 +02:00
Logan Williams
9dbf05fccb Streamline logging; fix markdown formatting in Telegram 2023-05-04 10:00:14 +00:00
Logan Williams
2320ea1efd Use telethon session CLI argument always; improvements to Telegram transformer (author id/username for chats, min_id via CLI argument, use the same session) 2023-03-04 09:51:15 +01:00
Logan Williams
531059ca02 Support related Telegram chats (associated discussion groups) 2023-03-02 16:21:43 +01:00
Logan Williams
351e471ff4 Change log retention and hackily improve transform speed 2023-01-26 13:21:07 +00:00
Logan Williams
c15022402d Add an option to scrape posts older than the database record as well as newer (Telegram only) 2022-09-05 13:48:01 +00:00
Logan Williams
ee24367caa Add features for running archive-media simultaneously 2022-07-20 09:26:47 +00:00
Logan Williams
9948af2c4a Media archiving ETL working for Telegram 2022-07-05 10:03:36 +02:00
Logan Williams
ed4723ed1e Fix merge error 2022-06-30 11:04:41 +00:00
Logan Williams
589ac3ba5b Merge branch 'main' of https://github.com/bellingcat/cisticola into main 2022-06-24 10:34:25 +00:00
Logan Williams
7215469a74 Correct logs for transforming posts 2022-06-24 10:32:27 +00:00
Tristan Lee
289a47d7b1 tested telegram transformers and implemented vk transformers 2022-06-23 15:06:10 -05:00
Tristan Lee
bb2e2806e6 got post transformers and channel_info transformers working for Rumble, Bitchute, Gettr 2022-06-21 19:05:41 -05:00
Logan Williams
39358c7f23 Update platform ID and screenname when synchronizing with gsheet; highlight dupes 2022-06-06 16:36:39 +02:00
Logan Williams
7c8147bb2a Add CLI for channel info transform 2022-05-18 09:20:33 +01:00
Logan Williams
4493618801 Synchronizing channels will update other info for existing channels 2022-05-12 13:02:14 +00:00
Logan Williams
ab482443db Merge branch 'main' of https://github.com/bellingcat/cisticola into transformers 2022-04-16 13:55:23 +00:00
Logan Williams
38e0104078 Separate logging; limit Telegram archive file size 2022-04-14 10:43:27 +00:00
Logan Williams
4c221d1133 Transformer for Telegram, base transformer NLP hydration; no media 2022-04-14 11:45:09 +02:00
Logan Williams
59bab0d812 Disable Youtube scraper for now 2022-04-13 10:12:20 +02:00
Logan Williams
209152ea69 Synchronize channels that have changed info 2022-04-12 18:13:52 +02:00
Logan Williams
bbb9d283d5 Add RumbleScraper, YoutubeScraper, and BitchuteScraper to the active scrapers 2022-04-12 14:55:45 +02:00
Logan Williams
fccbad7a93 Remove 200 post limit; add log rotation 2022-04-03 16:32:00 +00:00
Logan Williams
4c580519dd Remove Rumble scraper 2022-04-03 15:59:39 +02:00
Logan Williams
57b9082271 Remove Odysee scraper due to errors 2022-04-03 13:26:05 +02:00
Logan Williams
a82ec15f0e Change archived_media to be timestamp for all scrapers 2022-04-03 12:02:27 +02:00
Logan Williams
63633617d2 Configure with Telethon and VK only 2022-04-02 18:34:14 +00:00
Logan Williams
d20db5f828 Catch exceptions in get_posts so that archiving continues despite errors 2022-03-31 20:27:18 +02:00
Logan Williams
7f87b03de5 Add option to clear registered scrapers, necessary for tests 2022-03-31 16:17:35 +02:00
Logan Williams
a5cffa615f Fix Twitter profile scraper, catch exceptions in controller 2022-03-31 15:37:58 +02:00
Logan Williams
2dc9213d64 Use new RawChannelInfo class 2022-03-31 15:17:25 +02:00
Logan Williams
61c99d33f6 Add Postgres support with psycopg2 2022-03-31 08:15:53 +02:00
Logan Williams
cff1953d21 Initial CLI tool 2022-03-31 08:15:11 +02:00