Commit Graph

17 Commits

Author SHA1 Message Date
Tristan Lee
fab65a5d67 formatted with black, added pre-commit hook, pegged typing_extensions package version to fix spaCy issue 2023-08-04 14:51:00 -05:00
Tristan Lee
e2142966e7 refactored tests to reduce redundancy, got tests working for Telegram, Bitchute, Gettr, and Rumble 2023-08-03 00:53:38 -05:00
Tristan Lee
249f411a1d fixed some issues with Telegram tests 2023-07-27 13:07:44 -05:00
Tristan Lee
a2a7882f1c fixed Gettr and Bitchute info transformers, added missing or incorrect TelegramTransformer fields, added Telegram mentions to the transformer. 2022-06-13 13:42:33 -05:00
Tristan Lee
282f33eff3 implemented deferred media archiving for all scrapers, and implemented tests for them. Refactored archiving methods of Instagram and Gettr scrapers to be able to use default archiving method 2022-04-01 01:30:49 -05:00
Logan Williams
94cf6c3d84 TelegramTelethonScraper: Use channel_id when channel has been previously encountered 2022-03-31 16:37:54 +02:00
Tristan Lee
b7871b060d added capability to scrape Gab group posts 2022-03-30 09:11:07 -05:00
Logan Williams
571b019137 Fix tests for Twitter transformer 2022-03-22 11:33:27 +01:00
Tristan Lee
e287fd03d9 merged scraper into main and fixed minor merge conflict 2022-03-15 09:12:12 -05:00
Tristan Lee
750f0cc887 added scraper for Instagram 2022-03-14 10:28:10 -05:00
Logan Williams
fd4b617743 Add TwitterTransformer test 2022-03-14 13:39:10 +01:00
Tristan Lee
965bf1e2dc added youtube scraper, moved from official youtube-dl repo to using yt-dlp because download speed for youtube videos is much better 2022-03-11 17:19:52 -06:00
Tristan Lee
821c39004b incorporated vkontakte scraper 2022-03-10 22:32:39 -06:00
Tristan Lee
5783206ad8 implemented method to reset database, to enable the 'controller' fixture scope to be shared across the whole package, which will enable the transformer tests to be run without re-running the scrapers 2022-03-10 10:20:49 -06:00
Tristan Lee
6cf3b8842d renamed 'archive_media' and 'media' to avoid name collision, changed scope of test fixture controller to 'function' so that db is fresh for each executed test 2022-03-09 13:19:35 -06:00
Tristan Lee
739e1d8484 added capability of running scraper without archiving media, and implemented prototype Telethon scraper for Telegram 2022-03-09 12:12:01 -06:00
Tristan Lee
cd5f68e9e5 added basic unit tests 2022-03-04 12:36:09 -06:00