Logan Williams
|
c15022402d
|
Add an option to scrape posts older than the database record as well as newer (Telegram only)
|
2022-09-05 13:48:01 +00:00 |
|
Logan Williams
|
6149c4279d
|
Add some more fields to media DB, fix bugs in testing
|
2022-07-05 11:11:43 +02:00 |
|
Logan Williams
|
6183972b1a
|
Merge branch 'more-channel-info-transformers' of https://github.com/bellingcat/cisticola into main
|
2022-06-10 08:07:02 +00:00 |
|
Logan Williams
|
4d22838a94
|
Fixes to Pipfile; easier spacy setup
|
2022-06-06 16:35:52 +02:00 |
|
Tristan Lee
|
f0414a4f4d
|
updated transformer tests
|
2022-05-19 16:34:19 -05:00 |
|
Logan Williams
|
8535a87def
|
Ad hoc changes to transformers
|
2022-04-16 13:46:26 +00:00 |
|
Logan Williams
|
4c221d1133
|
Transformer for Telegram, base transformer NLP hydration; no media
|
2022-04-14 11:45:09 +02:00 |
|
Logan Williams
|
bbb9d283d5
|
Add RumbleScraper, YoutubeScraper, and BitchuteScraper to the active scrapers
|
2022-04-12 14:55:45 +02:00 |
|
Logan Williams
|
fccbad7a93
|
Remove 200 post limit; add log rotation
|
2022-04-03 16:32:00 +00:00 |
|
Tristan Lee
|
8ecb904249
|
merged main
|
2022-04-01 02:05:25 -05:00 |
|
Tristan Lee
|
282f33eff3
|
implemented deferred media archiving for all scrapers, and implemented tests for them. Refactored archiving methods of Instagram and Gettr scrapers to be able to use default archiving method
|
2022-04-01 01:30:49 -05:00 |
|
Logan Williams
|
d20db5f828
|
Catch exceptions in get_posts so that archiving continues despite errors
|
2022-03-31 20:27:18 +02:00 |
|
Logan Williams
|
2dc9213d64
|
Use new RawChannelInfo class
|
2022-03-31 15:17:25 +02:00 |
|
Logan Williams
|
1c1ff7fb6f
|
Fix bug with Telethon scraper and certain media; add media_archived flag to TwitterScraper
|
2022-03-31 08:15:09 +02:00 |
|
Tristan Lee
|
1f99e52436
|
refactored Gab scraper to use gabber instead of garc
|
2022-03-30 08:05:10 -05:00 |
|
Tristan Lee
|
b805d50132
|
made tests work, fixed several issues with Rumble scraper
|
2022-03-29 16:09:51 -05:00 |
|
Tristan Lee
|
16870d7daa
|
implemented methods for extracting profile metadata (still need to test)
|
2022-03-28 20:16:59 -05:00 |
|
Tristan Lee
|
ee9a8c10dd
|
merged main into branch
|
2022-03-15 09:16:11 -05:00 |
|
Tristan Lee
|
c3eab2f176
|
merged main
|
2022-03-14 18:19:57 -05:00 |
|
Tristan Lee
|
e4cf9daf73
|
added docstrings, improved Sphinx docs
|
2022-03-14 18:04:27 -05:00 |
|
Tristan Lee
|
750f0cc887
|
added scraper for Instagram
|
2022-03-14 10:28:10 -05:00 |
|
Tristan Lee
|
5783206ad8
|
implemented method to reset database, to enable the 'controller' fixture scope to be shared across the whole package, which will enable the transformer tests to be run without re-running the scrapers
|
2022-03-10 10:20:49 -06:00 |
|
Tristan Lee
|
739e1d8484
|
added capability of running scraper without archiving media, and implemented prototype Telethon scraper for Telegram
|
2022-03-09 12:12:01 -06:00 |
|
Tristan Lee
|
cd5f68e9e5
|
added basic unit tests
|
2022-03-04 12:36:09 -06:00 |
|
Tristan Lee
|
ee4d64750b
|
added prototype Rumble scraper
|
2022-02-28 18:38:33 -06:00 |
|
Tristan Lee
|
bc840e631d
|
added Gab scraper
|
2022-02-28 12:11:21 -06:00 |
|
Tristan Lee
|
7a257ea9f5
|
included comments in Odysee scraper
|
2022-02-28 09:15:09 -06:00 |
|
Tristan Lee
|
47dad8fb00
|
added odysee scraper, minor refactoring of url_to_blob method (added url_to_key method that can be overridden by child classes while still using the parent url_to_blob method) and changed test file to include only channels with a relatively small number of posts, to make testing faster
|
2022-02-25 20:28:00 -06:00 |
|
Tristan Lee
|
ef83cc4b0a
|
converted bitchute to yield, got video archiving working on bitchute and gettr, added url_to_blob method that downloads media bytes blob from url and converted archive_media to take in the media bytes blob instead of the media url.
|
2022-02-25 13:43:30 -06:00 |
|
Logan Williams
|
e64d845002
|
Archive media in Twitter scraper
|
2022-02-24 18:48:48 +01:00 |
|
Logan Williams
|
6092e4caa5
|
Add method for archiving media, reorganize scraper base classes
|
2022-02-24 16:36:55 +01:00 |
|
Logan Williams
|
e3d29bf811
|
Add documentation generation with Sphinx
|
2022-02-21 17:52:38 +01:00 |
|
Tristan Lee
|
139459e3b2
|
implemented Bitchute scraper
|
2022-02-18 12:45:10 -06:00 |
|
Tristan Lee
|
4668d4df11
|
implemented Gettr scraper
|
2022-02-18 10:13:37 -06:00 |
|
Logan Williams
|
0e5f9f77f3
|
Configure pipenv
|
2022-02-18 15:05:02 +01:00 |
|