Commit Graph

336 Commits

Author SHA1 Message Date
Tristan Lee
0822a9c354 Merge pull request #4 from JustAnotherArchivist/master
upstream merge
2022-05-24 23:10:38 -07:00
JustAnotherArchivist
faeffe2603 Merge pull request #474 from GeraniumKF/GeraniumKF-reddit-since-crash
Fix crash using --since with Reddit
2022-05-23 23:06:16 +00:00
Geranium
e3bdc02a7c Reddit: deprecate 'created' property for 'date'
This fixes a crash when using --since with the Reddit scraper,
as the CLI code expects items to have a date property.
2022-05-23 23:31:44 +01:00
Tristan Lee
e2d922301e forgot to save modified twitter.py module 2022-05-09 09:37:36 -05:00
Tristan Lee
b13e62eb5d Merge branch 'JustAnotherArchivist-master' 2022-05-09 09:35:35 -05:00
Tristan Lee
f38513503d fixed merge conflicts 2022-05-09 09:35:19 -05:00
Tristan Lee
0a4bd39ca6 Merge pull request #2 from bellingcat/telegram-media
Implemented JustAnotherArchivist's requested changes to Telegram scraper from PR
2022-05-09 07:23:39 -07:00
Tristan Lee
c18ca0f047 Merge branch 'master' into telegram-media 2022-05-09 09:21:40 -05:00
Tristan Lee
5648e957d0 improved consistency of code formatting and added _STYLE_MEDIA_URL_PATTERN as variable 2022-04-27 16:41:24 -05:00
Tristan Lee
21f7b620ec moved forward finding out of tgme_widget_message_text clause, since it wasn't correctly getting the forwarding information in forwarded posts that contained attachments but no text 2022-04-21 18:26:31 -05:00
Tristan Lee
9b3faec980 added additional attributes for hashtags and user mentions, removed redundant outlinks 2022-04-21 18:06:43 -05:00
Tristan Lee
97d38e5cde added additional termination criteria to Telegram scraper 2022-04-21 09:41:53 -05:00
Tristan Lee
b276c3cc27 fixed issue where some videos and photos weren't being scraped (because they weren't in a post containing a 'tgme_widget_message_text' div) 2022-04-17 06:50:43 -05:00
Tristan Lee
1e4e0c278d fixed issue where Telegram scraper terminated early because some pages didn't have a next page link (added reasonable default) 2022-04-17 04:33:22 -05:00
Tristan Lee
babcddda19 made Telegram scraper not return full channel info for forwarded_from attribute; fixed video edge cases. 2022-04-17 03:55:37 -05:00
JustAnotherArchivist
ed3ea944d1 Fix newsletter issue cards without an issue description
Fixes #456
2022-04-16 19:44:36 +00:00
JustAnotherArchivist
e7a6d38a5f Add support for community_details cards 2022-04-15 20:07:01 +00:00
JustAnotherArchivist
6c50eee31b Fix proxies not being applied correctly due to missing merge with environment settings
Fixes #447
2022-04-15 19:23:54 +00:00
JustAnotherArchivist
5103a33afa Fix t.co card URL replacement on retweets
Fixes #411
2022-04-15 03:18:45 +00:00
JustAnotherArchivist
247bd82d79 Refactor to tweetId variable 2022-04-15 03:14:29 +00:00
JustAnotherArchivist
5fc67f2bcf Add support for 'message me' cards 2022-04-15 02:52:37 +00:00
JustAnotherArchivist
65e7d8bd24 Fix warning on card URL translation to include the tweet ID 2022-04-15 02:52:03 +00:00
JustAnotherArchivist
3870282a42 Fix broadcast and event card crashes 2022-04-12 20:53:38 +00:00
JustAnotherArchivist
7c0fcdec43 Fix Periscope card crashes 2022-04-12 18:29:51 +00:00
JustAnotherArchivist
9af1f19034 Properly support all card types
Fixes #407
2022-04-12 18:11:26 +00:00
JustAnotherArchivist
5fc3c0e290 Fix crash in locals dumping on module-less frames 2022-04-12 18:03:36 +00:00
Tristan Lee
f978954bb3 Merge branch 'JustAnotherArchivist:master' into master 2022-04-03 01:49:28 -05:00
Tristan Lee
2ce014ade4 fixed edge case for videos that have data-link-attr but no href attribute 2022-04-03 01:45:25 -05:00
JustAnotherArchivist
5d156c6a15 Detect and raise error on redirect from GraphQL endpoint to login
#165
2022-04-03 02:34:30 +00:00
Tristan Lee
4e59638e7c added a forwardedUrl attribute to TelegramPost and made forwarded attribute type Channel. 2022-03-30 21:33:03 -05:00
Tristan Lee
a7eb54d226 implemented Media dataclasses for Telegram, and added variable for extracting a post's view count 2022-03-30 21:07:17 -05:00
Tristan Lee
d32c9add8a added capability to scrape multiple videos from a single post 2022-03-30 18:13:15 -05:00
Tristan Lee
fb8d73ac95 handled case where channel has no profile image 2022-03-29 13:15:53 -05:00
Tristan Lee
ed829163a0 added capability to extract the number of channel members when the string in membersDiv has the word 'subscribers' rather than 'members'. 2022-03-29 01:12:07 -05:00
JustAnotherArchivist
694657ef80 Fix broken exception references 2022-03-09 01:01:47 +00:00
JustAnotherArchivist
1ab0f4fccb Fix missing quoted tweet reference in certain buggy cases 2022-03-07 22:16:58 +00:00
JustAnotherArchivist
3a92b5bf0d Add log message for guest token file deletion 2022-02-26 19:32:55 +00:00
JustAnotherArchivist
2480b173f4 Fix crash on race condition in CLI guest token manager resets
Fixes #414
2022-02-26 19:31:08 +00:00
Logan Williams
de4ebed81f Fix KeyError caused by retweets without URLs in TwitterProfileScraper 2022-02-24 18:08:12 +01:00
Logan Williams
72b26f2373 Scrape images, video, and post forwarding information for Telegram channel posts 2022-02-24 15:31:02 +01:00
JustAnotherArchivist
77bbb9f61f Remove useless pass 2022-02-20 18:54:51 +00:00
JustAnotherArchivist
57a624c618 Merge pull request #410 from AccentuSoft/master
Fix Vkontakte-user module crash on users with millions of followers
2022-02-18 06:01:35 +00:00
AccentuSoft
b1cfd51121 Implementing changes 2022-02-17 21:52:15 +02:00
AccentuSoft
ace2c16f54 Fix Vkontakte-user module crash on users with millions of followers 2022-02-17 15:42:46 +02:00
JustAnotherArchivist
2f9c0457df Convert t.co card URLs to unshortened when possible 2022-02-17 01:50:15 +00:00
JustAnotherArchivist
878f2a3c7a Handle cards without descriptions and thumbnails
Fixes #407
2022-02-17 01:49:32 +00:00
JustAnotherArchivist
25ee014e29 Extract cards 2022-02-16 02:59:21 +00:00
JustAnotherArchivist
a192dc6236 Handle TweetWithVisibilityResults
Fixes #400
2022-02-14 18:08:59 +00:00
JustAnotherArchivist
a7242f340b Remove obsolete TODO
There is no retweetedTweetRef in Twitter's JS.
2022-02-14 18:08:29 +00:00
JustAnotherArchivist
359cc25cdf Fix crash on entity attribute when scraping suspended users
Fixes #396
2022-02-10 04:22:59 +00:00