Commit Graph

31 Commits

Author | SHA1 | Message | Date
Tristan Lee | c21e43ddfa | refactored import structure | 2022-03-04 10:55:54 -06:00
Tristan Lee | 75240bb060 | fixed various bugs related to archived URL creation and media downloading. Things seem to work well now | 2022-03-01 15:58:18 -06:00
Tristan Lee | f3d9dc91c6 | changed URL parsing to use urllib | 2022-03-01 14:13:04 -06:00
Tristan Lee | ee4d64750b | added prototype Rumble scraper | 2022-02-28 18:38:33 -06:00
Tristan Lee | bc840e631d | added Gab scraper | 2022-02-28 12:11:21 -06:00
Tristan Lee | 7a257ea9f5 | included comments in Odysee scraper | 2022-02-28 09:15:09 -06:00
Tristan Lee | 36fb95d9ae | Merge branch 'main' into media | 2022-02-25 20:30:36 -06:00
Tristan Lee | 47dad8fb00 | added odysee scraper, minor refactoring of url_to_blob method (added url_to_key method that can be overridden by child classes while still using the parent url_to_blob method) and changed test file to include only channels with a relatively small number of posts, to make testing faster | 2022-02-25 20:28:00 -06:00
Tristan Lee | ef83cc4b0a | converted bitchute to yield, got video archiving working on bitchute and gettr, added url_to_blob method that downloads media bytes blob from url and converted archive_media to take in the media bytes blob instead of the media url. | 2022-02-25 13:43:30 -06:00
Tristan Lee | bd7bbdf993 | Merge pull request #2 from bellingcat/media (WIP: Archiving media, organization improvements) | 2022-02-25 08:26:58 -06:00
Logan Williams | 8ab56ff5ba | Remove MAX_POSTS, auto detect MIME type (Co-authored-by: Tristan Lee <tristan@bellingcat.com>) | 2022-02-25 08:52:42 +01:00
Logan Williams | e6085689b5 | On second thought, don't share secrets | 2022-02-24 20:47:46 +01:00
Logan Williams | 3480452fac | Fix type hints | 2022-02-24 20:36:23 +01:00
Logan Williams | 1ad7c8bc11 | Search for since per-channel | 2022-02-24 20:26:10 +01:00
Logan Williams | 0b1c175dd9 | Modify GettrScraper to yield results, archive media (videos incomplete) | 2022-02-24 20:25:14 +01:00
Logan Williams | 456d592792 | Use user id for TwitterScraper | 2022-02-24 20:24:03 +01:00
Logan Williams | d159c09aa4 | yield data rather than returning a list | 2022-02-24 18:58:08 +01:00
Logan Williams | d163e6b3d9 | Fix logging logic in scraper controller | 2022-02-24 18:49:06 +01:00
Logan Williams | e64d845002 | Archive media in Twitter scraper | 2022-02-24 18:48:48 +01:00
Logan Williams | 214287b7a8 | Archive media in dictionary | 2022-02-24 17:35:24 +01:00
Logan Williams | a87cfd570a | Add Telegram channel scraper | 2022-02-24 16:37:13 +01:00
Logan Williams | 6092e4caa5 | Add method for archiving media, reoranize scraper base classes | 2022-02-24 16:36:55 +01:00
Logan Williams | e09e0f5202 | Merge pull request #1 from bellingcat/add-docs (Add documentation generation with Sphinx) | 2022-02-22 15:19:54 +01:00
Tristan Lee | 9fe3d90b0b | fixed warnings from sphinx-build, made build path consistent with gitignore (removed sphinx build directory from version control) | 2022-02-21 16:13:16 -06:00
Logan Williams | e3d29bf811 | Add documentation generation with Sphinx | 2022-02-21 17:52:38 +01:00
Tristan Lee | 139459e3b2 | implemented Bitchute scraper | 2022-02-18 12:45:10 -06:00
Tristan Lee | 4668d4df11 | implemented Gettr scraper | 2022-02-18 10:13:37 -06:00
Logan Williams | 0e5f9f77f3 | Configure pipenv | 2022-02-18 15:05:02 +01:00
Logan Williams | b824b98a95 | Reorganize transformer defition location | 2022-02-18 14:57:10 +01:00
Logan Williams | c5d49ef521 | Reorganize class definitions slightly | 2022-02-18 14:14:25 +01:00
Logan Williams | 82ad210b8e | Initial commit | 2022-02-18 14:01:49 +01:00