Tristan Lee
|
10b33b3dbb
|
Merge pull request #66 from bellingcat/country-language-searching
Updated ORM and sync to improve filtering by language and country
|
2022-10-26 12:25:59 -05:00 |
|
Tristan Lee
|
d9e2250c5a
|
added country index
|
2022-10-26 08:42:35 -05:00 |
|
Tristan Lee
|
5a53ebacd0
|
removed special case
|
2022-10-26 08:22:13 -05:00 |
|
Tristan Lee
|
3bb5af11e6
|
changed ORM and Google Sheet sync to reflect converting channels.country to JSONB array, added index for detected_language
|
2022-10-26 08:16:49 -05:00 |
|
Logan Williams
|
90d1d0f29f
|
Merge branch 'main' of https://github.com/bellingcat/cisticola into main
|
2022-10-26 13:12:12 +00:00 |
|
Logan Williams
|
b023e8044c
|
Scrape snowball_complete sampled channels
|
2022-10-26 13:11:20 +00:00 |
|
Logan Williams
|
c15022402d
|
Add an option to scrape posts older than the database record as well as newer (Telegram only)
|
2022-09-05 13:48:01 +00:00 |
|
Logan Williams
|
f000c6246e
|
Merge branch 'main' of https://github.com/bellingcat/cisticola into main
|
2022-08-29 09:11:21 +00:00 |
|
Logan Williams
|
1a29c06062
|
Fix case where post is dummy (-1)
|
2022-08-29 09:11:06 +00:00 |
|
Logan Williams
|
86656f8ba3
|
Scrape snowball_it channels too
|
2022-08-26 15:56:46 +02:00 |
|
Logan Williams
|
a01d139bef
|
Remove normalized_url column from channel creation
|
2022-08-24 15:35:08 +02:00 |
|
Logan Williams
|
4a17c3475d
|
Add explicit source column to gsheet
|
2022-08-24 15:32:19 +02:00 |
|
Logan Williams
|
002e9458f5
|
Merge branch 'main' of https://github.com/bellingcat/cisticola into main
|
2022-08-01 10:00:07 +00:00 |
|
Logan Williams
|
f3997ff6ae
|
Catch errors in Bitchute channel profile scraper; add multi index on posts forwarded from/channel
|
2022-08-01 09:58:52 +00:00 |
|
Logan Williams
|
7d72c0de05
|
Add index for network analysis
|
2022-07-29 12:16:17 +02:00 |
|
Logan Williams
|
3a04fb51d4
|
Merge branch 'main' of https://github.com/bellingcat/cisticola into main
|
2022-07-28 09:18:34 +00:00 |
|
Logan Williams
|
fee216386b
|
Fix issue with chronological media archiving
|
2022-07-28 09:18:07 +00:00 |
|
Logan Williams
|
d05584a09f
|
Minor bug fixes; helper tool for Telethon sessions
|
2022-07-28 08:42:59 +00:00 |
|
Logan Williams
|
ee24367caa
|
Add features for running archive-media simultaneously
|
2022-07-20 09:26:47 +00:00 |
|
Logan Williams
|
fbb846b8d6
|
Fix two small bugs with media archiving
|
2022-07-05 13:30:39 +02:00 |
|
Logan Williams
|
b99958a894
|
Merge pull request #58 from bellingcat/media-etl
Media ETL
|
2022-07-05 11:51:08 +02:00 |
|
Logan Williams
|
51e5ca1f04
|
Use smaller batches for now
|
2022-07-05 09:48:57 +00:00 |
|
Logan Williams
|
6149c4279d
|
Add some more fields to media DB, fix bugs in testing
|
2022-07-05 11:11:43 +02:00 |
|
Logan Williams
|
4ddd8d6b63
|
Only select untransformed media; simplify insert function
|
2022-07-05 10:03:38 +02:00 |
|
Logan Williams
|
9948af2c4a
|
Media archiving ETL working for Telegram
|
2022-07-05 10:03:36 +02:00 |
|
Logan Williams
|
c24babb081
|
Fix bugs in Gettr/Rumble transformers, avoid offset in batch requests
|
2022-07-04 14:30:40 +00:00 |
|
Logan Williams
|
ed4723ed1e
|
Fix merge error
|
2022-06-30 11:04:41 +00:00 |
|
Logan Williams
|
589ac3ba5b
|
Merge branch 'main' of https://github.com/bellingcat/cisticola into main
|
2022-06-24 10:34:25 +00:00 |
|
Logan Williams
|
7215469a74
|
Correct logs for transforming posts
|
2022-06-24 10:32:27 +00:00 |
|
Logan Williams
|
fe0f4f9e2c
|
Merge pull request #62 from bellingcat/other-transformer-fixes
Fixed broken channel_info transformers, added Telegram post transformer fields
|
2022-06-24 11:00:50 +02:00 |
|
Tristan Lee
|
289a47d7b1
|
tested telegram transformers and implemented vk transformers
|
2022-06-23 15:06:10 -05:00 |
|
Tristan Lee
|
bb2e2806e6
|
got post transformers and channel_info transformers working for Rumble, Bitchute, Gettr
|
2022-06-21 19:05:41 -05:00 |
|
Tristan Lee
|
619fe42a31
|
got transformers for Bitchute, Rumble, and Gettr working for all raw_posts.
|
2022-06-20 21:45:41 -05:00 |
|
Tristan Lee
|
a2a7882f1c
|
fixed Gettr and Bitchute info transformers, added missing or incorrect TelegramTransformer fields, added Telegram mentions to the transformer.
|
2022-06-13 13:42:33 -05:00 |
|
Logan Williams
|
6e962de244
|
Don't scrape channel info unless specifically scraping channel info
|
2022-06-10 08:41:45 +00:00 |
|
Logan Williams
|
6183972b1a
|
Merge branch 'more-channel-info-transformers' of https://github.com/bellingcat/cisticola into main
|
2022-06-10 08:07:02 +00:00 |
|
Logan Williams
|
d83a13e0cc
|
Merge branch 'main' of https://github.com/bellingcat/cisticola into main
|
2022-06-09 14:11:52 +00:00 |
|
Logan Williams
|
fba3a661c7
|
Sleep after every gsheet API call
|
2022-06-09 14:11:29 +00:00 |
|
Logan Williams
|
6294ea7ea7
|
Increment TelegramTelethonScraper version
|
2022-06-09 15:30:56 +02:00 |
|
Logan Williams
|
92d4839b5e
|
Revise Telethon scraper to use the same client connection
|
2022-06-09 10:01:27 +02:00 |
|
Logan Williams
|
9a30ecb243
|
Stop overwriting media when a large file is detected
|
2022-06-08 17:01:28 +02:00 |
|
Logan Williams
|
708d952937
|
Merge branch 'main' of https://github.com/bellingcat/cisticola into main
|
2022-06-08 08:11:34 +00:00 |
|
Logan Williams
|
bf96f248c1
|
Merge pull request #57 from bellingcat/synchronization-improvements
Google sheet channel synchronization improvements
|
2022-06-08 10:08:23 +02:00 |
|
Logan Williams
|
143b20fc56
|
Update Pipfile to load pyexiftool from PyPI
Co-authored-by: Tristan Lee <tristan@bellingcat.com>
|
2022-06-08 10:08:18 +02:00 |
|
Logan Williams
|
f932034ab2
|
Add a bit more logging
|
2022-06-07 13:14:48 +02:00 |
|
Logan Williams
|
39358c7f23
|
Update platform ID and screenname when synchronizing with gsheet; highlight dupes
|
2022-06-06 16:36:39 +02:00 |
|
Logan Williams
|
4d22838a94
|
Fixes to Pipfile; easier spacy setup
|
2022-06-06 16:35:52 +02:00 |
|
Tristan Lee
|
f4072183be
|
added transformer for Gettr
|
2022-05-20 02:22:34 -05:00 |
|
Tristan Lee
|
591f1986e8
|
added Rumble transformers and test
|
2022-05-19 19:40:48 -05:00 |
|
Tristan Lee
|
e2094522c9
|
updated Bitchute transformer and added test
|
2022-05-19 18:13:50 -05:00 |
|