241 Commits

Author SHA1 Message Date
Tristan Lee
d27ea4d3e5 merged recent changes in main 2023-08-07 20:42:02 -05:00
Tristan Lee
8a10451a72 updated developer guide 2023-08-07 20:28:52 -05:00
Tristan Lee
a89925b99e manually specified all steps in linting workflow 2023-08-07 20:17:59 -05:00
Tristan Lee
39f7dd0997 edited workflow file 2023-08-07 20:14:30 -05:00
Tristan Lee
dd7d8861cd edited github workflow file 2023-08-07 20:07:29 -05:00
Tristan Lee
1eb82c5f3e sorted imports using isort and tried to add pre-commit hook for isort 2023-08-07 20:04:16 -05:00
Tristan Lee
1ec1d6190a implemented minor fixes recommended by pyling (unused imports, f-strings without patterns, etc.) 2023-08-07 19:39:03 -05:00
Tristan Lee
6f4eb21ad0 started addressing mypy issues, updated several method type annotation signatures to be consistent with changes 2023-08-07 19:15:39 -05:00
Tristan Lee
89b5068108 added descriptions for undocumented attributes for classes in cisticola.base module 2023-08-07 17:07:45 -05:00
Logan Williams
1e2b62be57 Add link to documentation 2023-08-07 11:03:04 +02:00
Logan Williams
3aec25f74c Simplify transform method signature 2023-08-07 10:08:13 +02:00
Tristan Lee
1f0197200e removed pipenv run prefix from commands, to not use pipenv virtual env 2023-08-04 16:40:21 -05:00
Tristan Lee
7fd4260d71 added sloppy workaround to avoid build error from spaCy models not being downloaded 2023-08-04 16:34:34 -05:00
Tristan Lee
e6ca0fe515 added readtehdocs job to install pipenv and use pipenv to install dependencies 2023-08-04 16:18:30 -05:00
Tristan Lee
4811240091 added readthedocs config file 2023-08-04 15:46:21 -05:00
Tristan Lee
1d7e82ae4a revised database schema diagram 2023-08-04 15:39:06 -05:00
Tristan Lee
fab65a5d67 formatted with black, added pre-commit hook, pegged typing_extensions package version to fix spaCy issue 2023-08-04 14:51:00 -05:00
Tristan Lee
070ee3391d temporarily removed Dockerfile and crontab 2023-08-04 09:48:48 -05:00
Tristan Lee
8421fe7c48 Merge branch 'main' into tests-and-docs 2023-08-04 09:33:23 -05:00
Tristan Lee
30bb4e43e4 removed broken scrapers and added basic README 2023-08-04 09:15:53 -05:00
Logan Williams
d55c13c95d Add chat attributes, don't overwrite from the sheet if sheet is empty 2023-08-04 14:40:48 +02:00
Tristan Lee
ef9292bc90 added table diagram, and brief developer guide and deployment info for docs 2023-08-03 23:58:12 -05:00
Tristan Lee
d3b8e1a3b3 removed unused archive_media argument passed to methods throughout codebase 2023-08-03 18:05:50 -05:00
Tristan Lee
edd772eb94 added and made more consistent docstrings, wrote script that makes minor edits to Sphinx apidocs to improve documentation clarity 2023-08-03 17:27:33 -05:00
Tristan Lee
b8ddc400f3 updated documentation, minor fixes like excluding very long cookiestring from docs 2023-08-03 01:59:30 -05:00
Tristan Lee
e2142966e7 refactored tests to reduce redundancy, got tests workig for Telegram, Bitchute, Gettr, and Rumble 2023-08-03 00:53:38 -05:00
Tristan Lee
bd67806ed2 got Telegram scraper tests all working 2023-08-01 10:46:50 -05:00
Tristan Lee
249f411a1d fixed some issues with Telegram tests 2023-07-27 13:07:44 -05:00
Logan Williams
99cc4d80b2 Cache screenname ID lookup 2023-05-04 16:23:24 +02:00
Logan Williams
ca6e284cb3 Cache reply_to post IDs too 2023-05-04 16:14:03 +02:00
Logan Williams
91de6482e0 Add rather hacky bulk insert functionality 2023-05-04 15:26:52 +02:00
Logan Williams
f9bf2bc2ee Merge branch 'main' of github.com:bellingcat/cisticola 2023-05-04 14:06:59 +02:00
Logan Williams
ebbc6b69dd Add new function for insert post (faster/bulk) 2023-05-04 14:04:55 +02:00
Logan Williams
9dbf05fccb Streamline logging; fix markdown formating in Telegram 2023-05-04 10:00:14 +00:00
Logan Williams
2320ea1efd Use telethon session CLI argument always; improvements to Telegram transformer (author id/username for chats, min_id via CLI argument, use the same session) 2023-03-04 09:51:15 +01:00
Logan Williams
7d55eace3d Update platform_id when it is empty 2023-03-03 15:28:30 +01:00
Logan Williams
eced79b278 Fix issue with insert_or_select 2023-03-03 10:47:21 +01:00
Logan Williams
793a783963 Revert to previous insert or select behavior 2023-03-02 23:04:34 +01:00
Logan Williams
d2db83ae93 Update other channel properties too for a linked channel 2023-03-02 17:03:24 +01:00
Logan Williams
3a6905e9c1 Adjust logic for changing source label 2023-03-02 16:48:15 +01:00
Logan Williams
7b2c597a24 Update channel source, only if non-researcher 2023-03-02 16:45:38 +01:00
Logan Williams
64aef7238c Merge branch 'main' of github.com:bellingcat/cisticola 2023-03-02 16:34:42 +01:00
Logan Williams
ffa8cdd8c6 Fix bitchute transformer 2023-03-02 16:33:48 +01:00
Logan Williams
7adf51b5d1 Merge branch 'main' of https://github.com/bellingcat/cisticola into main 2023-03-02 15:28:24 +00:00
Logan Williams
6226a9a76e Ignore lobs 2023-03-02 15:26:38 +00:00
Logan Williams
531059ca02 Support related Telegram chats (associated discussion groups) 2023-03-02 16:21:43 +01:00
Logan Williams
351e471ff4 Change log retention and hackily improve transform speed 2023-01-26 13:21:07 +00:00
Logan Williams
5c4dd51435 Fix issues with Gsheet sync 2023-01-11 14:44:17 +00:00
Logan Williams
3ec6f50213 Merge pull request #67 from bellingcat/sync-bug-fixes fixed channel sync bugs 2022-10-27 09:27:02 +02:00
Tristan Lee
6dc61af7a5 fixed problem from gspread update where empty columns raised error, fixed problem where sync tried to process empty channel 2022-10-26 14:47:59 -05:00