Commit Graph

161 Commits

Author SHA1 Message Date
Tristan Lee
289a47d7b1 tested telegram transformers and implemented vk transformers 2022-06-23 15:06:10 -05:00
Tristan Lee
bb2e2806e6 got post transformers and channel_info transformers working for Rumble, Bitchute, Gettr 2022-06-21 19:05:41 -05:00
Tristan Lee
619fe42a31 got transformers for Bitchute, Rumble, and Gettr working for all raw_posts. 2022-06-20 21:45:41 -05:00
Tristan Lee
a2a7882f1c fixed Gettr and Bitchute info transformers, added missing or incorrect TelegramTransformer fields, added Telegram mentions to the transformer. 2022-06-13 13:42:33 -05:00
Logan Williams
6e962de244 Don't scrape channel info unless specifically scraping channel info 2022-06-10 08:41:45 +00:00
Logan Williams
6183972b1a Merge branch 'more-channel-info-transformers' of https://github.com/bellingcat/cisticola into main 2022-06-10 08:07:02 +00:00
Logan Williams
d83a13e0cc Merge branch 'main' of https://github.com/bellingcat/cisticola into main 2022-06-09 14:11:52 +00:00
Logan Williams
fba3a661c7 Sleep after every gsheet API call 2022-06-09 14:11:29 +00:00
Logan Williams
6294ea7ea7 Increment TelegramTelethonScraper version 2022-06-09 15:30:56 +02:00
Logan Williams
92d4839b5e Revise Telethon scraper to use the same client connection 2022-06-09 10:01:27 +02:00
Logan Williams
9a30ecb243 Stop overwriting media when a large file is detected 2022-06-08 17:01:28 +02:00
Logan Williams
708d952937 Merge branch 'main' of https://github.com/bellingcat/cisticola into main 2022-06-08 08:11:34 +00:00
Logan Williams
bf96f248c1 Merge pull request #57 from bellingcat/synchronization-improvements
Google sheet channel synchronization improvements
2022-06-08 10:08:23 +02:00
Logan Williams
143b20fc56 Update Pipfile to load pyexiftool from PyPi
Co-authored-by: Tristan Lee <tristan@bellingcat.com>
2022-06-08 10:08:18 +02:00
Logan Williams
f932034ab2 Add a bit more logging 2022-06-07 13:14:48 +02:00
Logan Williams
39358c7f23 Update platform ID and screenname when synchronizing with gsheet; highlight dupes 2022-06-06 16:36:39 +02:00
Logan Williams
4d22838a94 Fixes to Pipfile; easier spacy setup 2022-06-06 16:35:52 +02:00
Tristan Lee
f4072183be added transformer for Gettr 2022-05-20 02:22:34 -05:00
Tristan Lee
591f1986e8 added Rumble transformers and test 2022-05-19 19:40:48 -05:00
Tristan Lee
e2094522c9 updated Bitchute transformer and added test 2022-05-19 18:13:50 -05:00
Tristan Lee
f0414a4f4d updated transformer tests 2022-05-19 16:34:19 -05:00
Tristan Lee
424c063ef2 Merge pull request #55 from bellingcat/channel-info-transformers
Transformers for raw channel info
2022-05-18 06:58:18 -07:00
Logan Williams
c279ced73d Minor bug fixes from testing 2022-05-18 09:29:53 +01:00
Logan Williams
7c8147bb2a Add CLI for channel info transform 2022-05-18 09:20:33 +01:00
Logan Williams
6145fd0b6b Add Telegram transformer for channel info 2022-05-18 09:17:49 +01:00
Tristan Lee
317da2c9d4 Merge pull request #54 from bellingcat/transformers
Functional Telegram transformer
2022-05-17 10:18:12 -07:00
Logan Williams
9869612b67 Merge pull request #50 from bellingcat/odysee-refactor
Implemented Polyphemus refactoring changes into Odysee scraper
2022-05-16 12:25:35 +01:00
Logan Williams
7f55b721dd Bug fixes in transformers 2022-05-13 15:39:01 +00:00
Logan Williams
34da733e7c Add date_transformed; refinements to telethon transformer 2022-05-12 13:03:30 +00:00
Logan Williams
4493618801 Synchronizing channels will update other info for existing channels 2022-05-12 13:02:14 +00:00
Logan Williams
ab482443db Merge branch 'main' of https://github.com/bellingcat/cisticola into transformers 2022-04-16 13:55:23 +00:00
Logan Williams
8535a87def Ad hoc changes to transformers 2022-04-16 13:46:26 +00:00
Logan Williams
3b8b03283a Add logging to transform 2022-04-14 11:32:15 +00:00
Logan Williams
428af3575f Merge branch 'transformers' of https://github.com/bellingcat/cisticola into main 2022-04-14 11:31:34 +00:00
Logan Williams
38e0104078 Separate logging; limit Telegram archive file size 2022-04-14 10:43:27 +00:00
Logan Williams
4c221d1133 Transformer for Telegram, base transformer NLP hydration; no media 2022-04-14 11:45:09 +02:00
Logan Williams
214a4d7d19 Merge pull request #53 from bellingcat/sync-channels
Close sessions; sort channel info by least recently archived
v2022-04-13
2022-04-13 12:39:41 +02:00
Logan Williams
1ac8d6c603 Close sessions; sort channel info by least recently archived 2022-04-13 10:38:08 +00:00
Logan Williams
59bab0d812 Disable Youtube scraper for now 2022-04-13 10:12:20 +02:00
Logan Williams
a2e62cc489 Merge pull request #52 from bellingcat/next-release
Next release
2022-04-13 10:11:49 +02:00
Logan Williams
d96b8177a5 Merge pull request #49 from bellingcat/sync-channels
Synchronize channels as well as adding new ones
2022-04-13 10:11:34 +02:00
Logan Williams
e7c3771788 Merge pull request #51 from bellingcat/channel-info
Channel info
2022-04-13 10:11:21 +02:00
Logan Williams
a0dbe7d92b Catch errors in channel info 2022-04-13 10:10:29 +02:00
Tristan Lee
27b51267a7 fixed bugs from incorporating polyphemus refactoring changes 2022-04-13 00:02:12 -05:00
Tristan Lee
ef7afc0715 Merge branch 'main' into odysee-refactor 2022-04-12 23:26:18 -05:00
Tristan Lee
dfc5b77726 incorporated polyphemus refactoring changes 2022-04-12 23:23:21 -05:00
Logan Williams
209152ea69 Synchronize channels that have changed info 2022-04-12 18:13:52 +02:00
Logan Williams
d5f6ce485b Merge pull request #46 from bellingcat/next-release
Release 2022-04-12
v2022-04-12
2022-04-12 14:59:01 +02:00
Logan Williams
d1f9dd0e01 Limit max # of archived files per session 2022-04-12 12:57:04 +00:00
Logan Williams
bbb9d283d5 Add RumbleScraper, YoutubeScraper, and BitchuteScraper to the active scrapers 2022-04-12 14:55:45 +02:00