Tristan Lee | 1eb82c5f3e | sorted imports using isort and tried to add pre-commit hook for isort | 2023-08-07 20:04:16 -05:00
Tristan Lee | 1ec1d6190a | implemented minor fixes recommended by pylint (unused imports, f-strings without patterns, etc.) | 2023-08-07 19:39:03 -05:00
Tristan Lee | fab65a5d67 | formatted with black, added pre-commit hook, pegged typing_extensions package version to fix spaCy issue | 2023-08-04 14:51:00 -05:00
Tristan Lee | d3b8e1a3b3 | removed unused archive_media argument passed to methods throughout codebase | 2023-08-03 18:05:50 -05:00
Logan Williams | 91de6482e0 | Add rather hacky bulk insert functionality | 2023-05-04 15:26:52 +02:00
Logan Williams | 9dbf05fccb | Streamline logging; fix markdown formatting in Telegram | 2023-05-04 10:00:14 +00:00
Logan Williams | 2320ea1efd | Use telethon session CLI argument always; improvements to Telegram transformer (author id/username for chats, min_id via CLI argument, use the same session) | 2023-03-04 09:51:15 +01:00
Logan Williams | 531059ca02 | Support related Telegram chats (associated discussion groups) | 2023-03-02 16:21:43 +01:00
Logan Williams | 351e471ff4 | Change log retention and hackily improve transform speed | 2023-01-26 13:21:07 +00:00
Logan Williams | c15022402d | Add an option to scrape posts older than the database record as well as newer (Telegram only) | 2022-09-05 13:48:01 +00:00
Logan Williams | ee24367caa | Add features for running archive-media simultaneously | 2022-07-20 09:26:47 +00:00
Logan Williams | 9948af2c4a | Media archiving ETL working for Telegram | 2022-07-05 10:03:36 +02:00
Logan Williams | ed4723ed1e | Fix merge error | 2022-06-30 11:04:41 +00:00
Logan Williams | 589ac3ba5b | Merge branch 'main' of https://github.com/bellingcat/cisticola into main | 2022-06-24 10:34:25 +00:00
Logan Williams | 7215469a74 | Correct logs for transforming posts | 2022-06-24 10:32:27 +00:00
Tristan Lee | 289a47d7b1 | tested telegram transformers and implemented vk transformers | 2022-06-23 15:06:10 -05:00
Tristan Lee | bb2e2806e6 | got post transformers and channel_info transformers working for Rumble, Bitchute, Gettr | 2022-06-21 19:05:41 -05:00
Logan Williams | 39358c7f23 | Update platform ID and screenname when synchronizing with gsheet; highlight dupes | 2022-06-06 16:36:39 +02:00
Logan Williams | 7c8147bb2a | Add CLI for channel info transform | 2022-05-18 09:20:33 +01:00
Logan Williams | 4493618801 | Synchronizing channels will update other info for existing channels | 2022-05-12 13:02:14 +00:00
Logan Williams | ab482443db | Merge branch 'main' of https://github.com/bellingcat/cisticola into transformers | 2022-04-16 13:55:23 +00:00
Logan Williams | 38e0104078 | Separate logging; limit Telegram archive file size | 2022-04-14 10:43:27 +00:00
Logan Williams | 4c221d1133 | Transformer for Telegram, base transformer NLP hydration; no media | 2022-04-14 11:45:09 +02:00
Logan Williams | 59bab0d812 | Disable Youtube scraper for now | 2022-04-13 10:12:20 +02:00
Logan Williams | 209152ea69 | Synchronize channels that have changed info | 2022-04-12 18:13:52 +02:00
Logan Williams | bbb9d283d5 | Add RumbleScraper, YoutubeScraper, and BitchuteScraper to the active scrapers | 2022-04-12 14:55:45 +02:00
Logan Williams | fccbad7a93 | Remove 200 post limit; add log rotation | 2022-04-03 16:32:00 +00:00
Logan Williams | 4c580519dd | Remove Rumble scraper | 2022-04-03 15:59:39 +02:00
Logan Williams | 57b9082271 | Remove Odysee scraper due to errors | 2022-04-03 13:26:05 +02:00
Logan Williams | a82ec15f0e | Change archived_media to be timestamp for all scrapers | 2022-04-03 12:02:27 +02:00
Logan Williams | 63633617d2 | Configure with Telethon and VK only | 2022-04-02 18:34:14 +00:00
Logan Williams | d20db5f828 | Catch exceptions in get_posts so that archiving continues despite errors | 2022-03-31 20:27:18 +02:00
Logan Williams | 7f87b03de5 | Add option to clear registered scrapers, necessary for tests | 2022-03-31 16:17:35 +02:00
Logan Williams | a5cffa615f | Fix Twitter profile scraper, catch exceptions in controller | 2022-03-31 15:37:58 +02:00
Logan Williams | 2dc9213d64 | Use new RawChannelInfo class | 2022-03-31 15:17:25 +02:00
Logan Williams | 61c99d33f6 | Add Postgres support with psycopg2 | 2022-03-31 08:15:53 +02:00
Logan Williams | cff1953d21 | Initial CLI tool | 2022-03-31 08:15:11 +02:00