26 Commits

Author | SHA1 | Message | Date
Tristan Lee | 8ecb904249 | merged main | 2022-04-01 02:05:25 -05:00
Tristan Lee | 282f33eff3 | implemented deferred media archiving for all scrapers, and implemented tests for them. Refactored archiving methods of Instagram and Gettr scrapers to be able to use default archiving method | 2022-04-01 01:30:49 -05:00
Logan Williams | d20db5f828 | Catch exceptions in get_posts so that archiving continues despite errors | 2022-03-31 20:27:18 +02:00
Logan Williams | 2dc9213d64 | Use new RawChannelInfo class | 2022-03-31 15:17:25 +02:00
Logan Williams | 1c1ff7fb6f | Fix bug with Telethon scraper and certain media; add media_archived flag to TwitterScraper | 2022-03-31 08:15:09 +02:00
Tristan Lee | 1f99e52436 | refactored Gab scraper to use gabber instead of garc | 2022-03-30 08:05:10 -05:00
Tristan Lee | b805d50132 | made tests work, fixed several issues with Rumble scraper | 2022-03-29 16:09:51 -05:00
Tristan Lee | 16870d7daa | implemented methods for extracting profile metadata (still need to test) | 2022-03-28 20:16:59 -05:00
Tristan Lee | ee9a8c10dd | merged main into branch | 2022-03-15 09:16:11 -05:00
Tristan Lee | c3eab2f176 | merged main | 2022-03-14 18:19:57 -05:00
Tristan Lee | e4cf9daf73 | added docstrings, improved Sphinx docs | 2022-03-14 18:04:27 -05:00
Tristan Lee | 750f0cc887 | added scraper for Instagram | 2022-03-14 10:28:10 -05:00
Tristan Lee | 5783206ad8 | implemented method to reset database, to enable the 'controller' fixture scope to be shared across the whole package, which will enable the transformer tests to be run without re-running the scrapers | 2022-03-10 10:20:49 -06:00
Tristan Lee | 739e1d8484 | added capability of running scraper without archiving media, and implemented prototype Telethon scraper for Telegram | 2022-03-09 12:12:01 -06:00
Tristan Lee | cd5f68e9e5 | added basic unit tests | 2022-03-04 12:36:09 -06:00
Tristan Lee | ee4d64750b | added prototype Rumble scraper | 2022-02-28 18:38:33 -06:00
Tristan Lee | bc840e631d | added Gab scraper | 2022-02-28 12:11:21 -06:00
Tristan Lee | 7a257ea9f5 | included comments in Odysee scraper | 2022-02-28 09:15:09 -06:00
Tristan Lee | 47dad8fb00 | added odysee scraper, minor refactoring of url_to_blob method (added url_to_key method that can be overridden by child classes while still using the parent url_to_blob method) and changed test file to include only channels with a relatively small number of posts, to make testing faster | 2022-02-25 20:28:00 -06:00
Tristan Lee | ef83cc4b0a | converted bitchute to yield, got video archiving working on bitchute and gettr, added url_to_blob method that downloads media bytes blob from url and converted archive_media to take in the media bytes blob instead of the media url. | 2022-02-25 13:43:30 -06:00
Logan Williams | e64d845002 | Archive media in Twitter scraper | 2022-02-24 18:48:48 +01:00
Logan Williams | 6092e4caa5 | Add method for archiving media, reorganize scraper base classes | 2022-02-24 16:36:55 +01:00
Logan Williams | e3d29bf811 | Add documentation generation with Sphinx | 2022-02-21 17:52:38 +01:00
Tristan Lee | 139459e3b2 | implemented Bitchute scraper | 2022-02-18 12:45:10 -06:00
Tristan Lee | 4668d4df11 | implemented Gettr scraper | 2022-02-18 10:13:37 -06:00
Logan Williams | 0e5f9f77f3 | Configure pipenv | 2022-02-18 15:05:02 +01:00