Commit Graph

218 Commits

Author SHA1 Message Date
Tristan Lee
edd772eb94 added and made more consistent docstrings, wrote script that makes minor edits to Sphinx apidocs to improve documentation clarity 2023-08-03 17:27:33 -05:00
Tristan Lee
b8ddc400f3 updated documentation, minor fixes like excluding very long cookiestring from docs 2023-08-03 01:59:30 -05:00
Tristan Lee
e2142966e7 refactored tests to reduce redundancy, got tests working for Telegram, Bitchute, Gettr, and Rumble 2023-08-03 00:53:38 -05:00
Tristan Lee
bd67806ed2 got Telegram scraper tests all working 2023-08-01 10:46:50 -05:00
Tristan Lee
249f411a1d fixed some issues with Telegram tests 2023-07-27 13:07:44 -05:00
Logan Williams
99cc4d80b2 Cache screenname ID lookup 2023-05-04 16:23:24 +02:00
Logan Williams
ca6e284cb3 Cache reply_to post IDs too 2023-05-04 16:14:03 +02:00
Logan Williams
91de6482e0 Add rather hacky bulk insert functionality 2023-05-04 15:26:52 +02:00
Logan Williams
f9bf2bc2ee Merge branch 'main' of github.com:bellingcat/cisticola 2023-05-04 14:06:59 +02:00
Logan Williams
ebbc6b69dd Add new function for insert post (faster/bulk) 2023-05-04 14:04:55 +02:00
Logan Williams
9dbf05fccb Streamline logging; fix markdown formatting in Telegram 2023-05-04 10:00:14 +00:00
Logan Williams
2320ea1efd Use telethon session CLI argument always; improvements to Telegram transformer (author id/username for chats, min_id via CLI argument, use the same session) 2023-03-04 09:51:15 +01:00
Logan Williams
7d55eace3d Update platform_id when it is empty 2023-03-03 15:28:30 +01:00
Logan Williams
eced79b278 Fix issue with insert_or_select 2023-03-03 10:47:21 +01:00
Logan Williams
793a783963 Revert to previous insert or select behavior 2023-03-02 23:04:34 +01:00
Logan Williams
d2db83ae93 Update other channel properties too for a linked channel 2023-03-02 17:03:24 +01:00
Logan Williams
3a6905e9c1 Adjust logic for changing source label 2023-03-02 16:48:15 +01:00
Logan Williams
7b2c597a24 Update channel source, only if non-researcher 2023-03-02 16:45:38 +01:00
Logan Williams
64aef7238c Merge branch 'main' of github.com:bellingcat/cisticola 2023-03-02 16:34:42 +01:00
Logan Williams
ffa8cdd8c6 Fix bitchute transformer 2023-03-02 16:33:48 +01:00
Logan Williams
7adf51b5d1 Merge branch 'main' of https://github.com/bellingcat/cisticola into main 2023-03-02 15:28:24 +00:00
Logan Williams
6226a9a76e Ignore lobs 2023-03-02 15:26:38 +00:00
Logan Williams
531059ca02 Support related Telegram chats (associated discussion groups) 2023-03-02 16:21:43 +01:00
Logan Williams
351e471ff4 Change log retention and hackily improve transform speed 2023-01-26 13:21:07 +00:00
Logan Williams
5c4dd51435 Fix issues with Gsheet sync 2023-01-11 14:44:17 +00:00
Logan Williams
3ec6f50213 Merge pull request #67 from bellingcat/sync-bug-fixes
fixed channel sync bugs
2022-10-27 09:27:02 +02:00
Tristan Lee
6dc61af7a5 fixed problem from gspread update where empty columns raised error, fixed problem where sync tried to process empty channel 2022-10-26 14:47:59 -05:00
Tristan Lee
10b33b3dbb Merge pull request #66 from bellingcat/country-language-searching
Updated ORM and sync to improve filtering by language and country
2022-10-26 12:25:59 -05:00
Tristan Lee
d9e2250c5a added country index 2022-10-26 08:42:35 -05:00
Tristan Lee
5a53ebacd0 removed special case 2022-10-26 08:22:13 -05:00
Tristan Lee
3bb5af11e6 changed ORM and Google Sheet sync to reflect converting channels.country to JSONB array, added index for detected_language 2022-10-26 08:16:49 -05:00
Logan Williams
90d1d0f29f Merge branch 'main' of https://github.com/bellingcat/cisticola into main 2022-10-26 13:12:12 +00:00
Logan Williams
b023e8044c Scrape snowball_complete sampled channels 2022-10-26 13:11:20 +00:00
Logan Williams
c15022402d Add an option to scrape posts older than the database record as well as newer (Telegram only) 2022-09-05 13:48:01 +00:00
Logan Williams
f000c6246e Merge branch 'main' of https://github.com/bellingcat/cisticola into main 2022-08-29 09:11:21 +00:00
Logan Williams
1a29c06062 Fix case where post is dummy (-1) 2022-08-29 09:11:06 +00:00
Logan Williams
86656f8ba3 Scrape snowball_it channels too 2022-08-26 15:56:46 +02:00
Logan Williams
a01d139bef Remove normalized_url column from channel creation 2022-08-24 15:35:08 +02:00
Logan Williams
4a17c3475d Add explicit source column to gsheet 2022-08-24 15:32:19 +02:00
Logan Williams
002e9458f5 Merge branch 'main' of https://github.com/bellingcat/cisticola into main 2022-08-01 10:00:07 +00:00
Logan Williams
f3997ff6ae Catch errors in Bitchute channel profile scraper; add multi index on posts forwarded from/channel 2022-08-01 09:58:52 +00:00
Logan Williams
7d72c0de05 Add index for network analysis 2022-07-29 12:16:17 +02:00
Logan Williams
3a04fb51d4 Merge branch 'main' of https://github.com/bellingcat/cisticola into main 2022-07-28 09:18:34 +00:00
Logan Williams
fee216386b Fix issue with chronological media archiving 2022-07-28 09:18:07 +00:00
Logan Williams
d05584a09f Minor bug fixes; helper tool for Telethon sessions 2022-07-28 08:42:59 +00:00
Logan Williams
ee24367caa Add features for running archive-media simultaneously 2022-07-20 09:26:47 +00:00
Logan Williams
fbb846b8d6 Fix two small bugs with media archiving 2022-07-05 13:30:39 +02:00
Logan Williams
b99958a894 Merge pull request #58 from bellingcat/media-etl
Media ETL
2022-07-05 11:51:08 +02:00
Logan Williams
51e5ca1f04 Use smaller batches for now 2022-07-05 09:48:57 +00:00
Logan Williams
6149c4279d Add some more fields to media DB, fix bugs in testing 2022-07-05 11:11:43 +02:00