Logan Williams
|
d5f6ce485b
|
Merge pull request #46 from bellingcat/next-release
Release 2022-04-12
v2022-04-12
|
2022-04-12 14:59:01 +02:00 |
|
Logan Williams
|
d1f9dd0e01
|
Limit max # of archived files per session
|
2022-04-12 12:57:04 +00:00 |
|
Logan Williams
|
bbb9d283d5
|
Add RumbleScraper, YoutubeScraper, and BitchuteScraper to the active scrapers
|
2022-04-12 14:55:45 +02:00 |
|
Logan Williams
|
6f11b88f94
|
Use Youtube cookie for Rumble too
|
2022-04-12 14:55:18 +02:00 |
|
Logan Williams
|
7b8236e6db
|
No recursive retries
|
2022-04-12 14:55:05 +02:00 |
|
Logan Williams
|
b596d3e055
|
Merge pull request #42 from bellingcat/youtube-age-restricted
Enable download of age-restricted videos on YouTube
|
2022-04-12 11:14:45 +02:00 |
|
Logan Williams
|
1f7f957e62
|
Merge pull request #44 from bellingcat/bitchute-error
Catch errors while retrieving Bitchute videos
|
2022-04-12 11:13:52 +02:00 |
|
Logan Williams
|
e05f69bbee
|
Merge pull request #38 from bellingcat/youtube-dl-retry
Added 'retries' argument to youtube_dl options
|
2022-04-12 11:11:51 +02:00 |
|
Tristan Lee
|
1f667d532e
|
made get_videos_user use request_from_bitchute requests wrapper to catch errors
|
2022-04-06 11:40:43 -05:00 |
|
Tristan Lee
|
f17800b797
|
added required YOUTUBE_COOKIESTRING environment variable to be used by YoutubeScraper
|
2022-04-05 21:22:41 -05:00 |
|
Tristan Lee
|
a204041480
|
made requested changes to scraper version numbers
|
2022-04-05 17:03:45 -05:00 |
|
Logan Williams
|
36c81c8e17
|
Merge pull request #40 from bellingcat/next-release
Add indices on appropriate columns; limit # of posts to archive
v2022-04-04
|
2022-04-04 13:03:16 +02:00 |
|
Logan Williams
|
b6386747d4
|
Add indices on appropriate columns; limit # of posts to archive
|
2022-04-04 10:54:27 +00:00 |
|
Tristan Lee
|
ed74c5692b
|
merged main
|
2022-04-03 19:35:16 -05:00 |
|
Tristan Lee
|
c7253148d1
|
added 'retries' argument to youtube-dl options, and made options consistent across youtube-dl instances.
|
2022-04-03 19:31:32 -05:00 |
|
Logan Williams
|
fccbad7a93
|
Remove 200 post limit; add log rotation
v2022-04-03
|
2022-04-03 16:32:00 +00:00 |
|
Logan Williams
|
4c580519dd
|
Remove Rumble scraper
|
2022-04-03 15:59:39 +02:00 |
|
Logan Williams
|
0140b09ee8
|
Release Telethon, VK, and Gettr as 0.0.1; specify unreleased 0.0.0 otherwise
|
2022-04-03 15:29:24 +02:00 |
|
Logan Williams
|
96db662572
|
Don't add a timestamp to media that failed to archive
|
2022-04-03 14:16:03 +02:00 |
|
Logan Williams
|
ecae1aad05
|
Catch exceptions in archive_files so that archiver continues to run
|
2022-04-03 14:12:23 +02:00 |
|
Logan Williams
|
9c838aae39
|
Update media_archived column even when TG post has no media
|
2022-04-03 13:29:10 +02:00 |
|
Logan Williams
|
57b9082271
|
Remove Odysee scraper due to errors
|
2022-04-03 13:26:05 +02:00 |
|
Logan Williams
|
a82ec15f0e
|
Change archived_media to be timestamp for all scrapers
|
2022-04-03 12:02:27 +02:00 |
|
Logan Williams
|
8ee20a239c
|
Merge branch 'main' into initial-release
|
2022-04-03 11:35:12 +02:00 |
|
Tristan Lee
|
90c99aec00
|
ensured that Gettr username is lowercase for API requests to work correctly
|
2022-04-02 22:36:25 -05:00 |
|
Tristan Lee
|
b0a52e5ad7
|
handled case where Rumble video has no view information displayed
|
2022-04-02 21:26:29 -05:00 |
|
Logan Williams
|
01bbabe0cb
|
Fix issues with new datetime-based 'media_archived' column
|
2022-04-02 18:45:08 +00:00 |
|
Logan Williams
|
63633617d2
|
Configure with Telethon and VK only
|
2022-04-02 18:34:14 +00:00 |
|
Logan Williams
|
0099558c68
|
Merge pull request #26 from bellingcat/deferred-media-archiving
Implemented deferred media archiving for all scrapers
|
2022-04-02 14:15:35 +02:00 |
|
Tristan Lee
|
0bab20e371
|
ensured that before being scraped, all channels are added to the database, preventing channel.platform_id from being null.
|
2022-04-01 17:03:02 -05:00 |
|
Tristan Lee
|
8ecb904249
|
merged main
|
2022-04-01 02:05:25 -05:00 |
|
Tristan Lee
|
282f33eff3
|
implemented deferred media archiving for all scrapers, and implemented tests for them. Refactored archiving methods of Instagram and Gettr scrapers to be able to use default archiving method
|
2022-04-01 01:30:49 -05:00 |
|
Logan Williams
|
d20db5f828
|
Catch exceptions in get_posts so that archiving continues despite errors
|
2022-03-31 20:27:18 +02:00 |
|
Logan Williams
|
16aad4ef2c
|
TelegramTelethonScraper: Using the username is fine.
|
2022-03-31 16:50:20 +02:00 |
|
Logan Williams
|
94cf6c3d84
|
TelegramTelethonScraper: Use channel_id when channel has been previously encountered
|
2022-03-31 16:37:54 +02:00 |
|
Logan Williams
|
061af984ee
|
Merge pull request #20 from bellingcat/separate-media-archiving
WIP: Separate media archiving and CLI
|
2022-03-31 16:28:30 +02:00 |
|
Logan Williams
|
7f87b03de5
|
Add option to clear registered scrapers, necessary for tests
|
2022-03-31 16:17:35 +02:00 |
|
Logan Williams
|
c8d1b96e3f
|
Fix bug in handling retweets without media
|
2022-03-31 15:51:17 +02:00 |
|
Logan Williams
|
a5cffa615f
|
Fix Twitter profile scraper, catch exceptions in controller
|
2022-03-31 15:37:58 +02:00 |
|
Logan Williams
|
2dc9213d64
|
Use new RawChannelInfo class
|
2022-03-31 15:17:25 +02:00 |
|
Logan Williams
|
61c99d33f6
|
Add Postgres support with psycopg2
|
2022-03-31 08:15:53 +02:00 |
|
Logan Williams
|
cff1953d21
|
Initial CLI tool
|
2022-03-31 08:15:11 +02:00 |
|
Logan Williams
|
1c1ff7fb6f
|
Fix bug with Telethon scraper and certain media; add media_archived flag to TwitterScraper
|
2022-03-31 08:15:09 +02:00 |
|
Logan Williams
|
19056a1d9a
|
Merge pull request #23 from bellingcat/profile
Added methods for retrieving channel profile metadata, refactored Gab scraper to use gabber
|
2022-03-31 08:13:17 +02:00 |
|
Tristan Lee
|
b7871b060d
|
added capability to scrape Gab group posts
|
2022-03-30 09:11:07 -05:00 |
|
Tristan Lee
|
1f99e52436
|
refactored Gab scraper to use gabber instead of garc
|
2022-03-30 08:05:10 -05:00 |
|
Tristan Lee
|
b805d50132
|
made tests work, fixed several issues with Rumble scraper
|
2022-03-29 16:09:51 -05:00 |
|
Tristan Lee
|
67d1abf024
|
added methods for extracting channel profile metadata, and tests
|
2022-03-28 21:11:34 -05:00 |
|
Tristan Lee
|
ea40ea2640
|
merged main
|
2022-03-28 20:22:34 -05:00 |
|
Tristan Lee
|
5d6473e946
|
Merge pull request #19 from bellingcat/separate-media-archiving
Separate media archiving
|
2022-03-28 20:20:57 -05:00 |
|