Tristan Lee
|
0822a9c354
|
Merge pull request #4 from JustAnotherArchivist/master
upstream merge
|
2022-05-24 23:10:38 -07:00 |
|
JustAnotherArchivist
|
faeffe2603
|
Merge pull request #474 from GeraniumKF/GeraniumKF-reddit-since-crash
Fix crash using --since with Reddit
|
2022-05-23 23:06:16 +00:00 |
|
Geranium
|
e3bdc02a7c
|
Reddit: deprecate 'created' property for 'date'
This fixes a crash when using --since with the Reddit scraper,
as the CLI code expects items to have a date property.
|
2022-05-23 23:31:44 +01:00 |
|
Tristan Lee
|
e2d922301e
|
forgot to save modified twitter.py module
|
2022-05-09 09:37:36 -05:00 |
|
Tristan Lee
|
b13e62eb5d
|
Merge branch 'JustAnotherArchivist-master'
|
2022-05-09 09:35:35 -05:00 |
|
Tristan Lee
|
f38513503d
|
fixed merge conflicts
|
2022-05-09 09:35:19 -05:00 |
|
Tristan Lee
|
0a4bd39ca6
|
Merge pull request #2 from bellingcat/telegram-media
Implemented JustAnotherArchivist's requested changes to Telegram scraper from PR
|
2022-05-09 07:23:39 -07:00 |
|
Tristan Lee
|
c18ca0f047
|
Merge branch 'master' into telegram-media
|
2022-05-09 09:21:40 -05:00 |
|
Tristan Lee
|
5648e957d0
|
improved consistency of code formatting and added _STYLE_MEDIA_URL_PATTERN as variable
|
2022-04-27 16:41:24 -05:00 |
|
Tristan Lee
|
21f7b620ec
|
moved forward finding out of tgme_widget_message_text clause, since it wasn't correctly getting the forwarding information in forwarded posts that contained attachments but no text
|
2022-04-21 18:26:31 -05:00 |
|
Tristan Lee
|
9b3faec980
|
added additional attributes for hashtags and user mentions, removed redundant outlinks
|
2022-04-21 18:06:43 -05:00 |
|
Tristan Lee
|
97d38e5cde
|
added additional termination criteria to Telegram scraper
|
2022-04-21 09:41:53 -05:00 |
|
Tristan Lee
|
b276c3cc27
|
fixed issue where some videos and photos weren't being scraped (because they weren't in a post containing a 'tgme_widget_message_text' div)
|
2022-04-17 06:50:43 -05:00 |
|
Tristan Lee
|
1e4e0c278d
|
fixed issue where Telegram scraper terminated early because some pages didn't have a next page link (added reasonable default)
|
2022-04-17 04:33:22 -05:00 |
|
Tristan Lee
|
babcddda19
|
made Telegram scraper not return full channel info for forwarded_from attribute; fixed video edge cases.
|
2022-04-17 03:55:37 -05:00 |
|
JustAnotherArchivist
|
ed3ea944d1
|
Fix newsletter issue cards without an issue description
Fixes #456
|
2022-04-16 19:44:36 +00:00 |
|
JustAnotherArchivist
|
e7a6d38a5f
|
Add support for community_details cards
|
2022-04-15 20:07:01 +00:00 |
|
JustAnotherArchivist
|
6c50eee31b
|
Fix proxies not being applied correctly due to missing merge with environment settings
Fixes #447
|
2022-04-15 19:23:54 +00:00 |
|
JustAnotherArchivist
|
5103a33afa
|
Fix t.co card URL replacement on retweets
Fixes #411
|
2022-04-15 03:18:45 +00:00 |
|
JustAnotherArchivist
|
247bd82d79
|
Refactor to tweetId variable
|
2022-04-15 03:14:29 +00:00 |
|
JustAnotherArchivist
|
5fc67f2bcf
|
Add support for 'message me' cards
|
2022-04-15 02:52:37 +00:00 |
|
JustAnotherArchivist
|
65e7d8bd24
|
Fix warning on card URL translation to include the tweet ID
|
2022-04-15 02:52:03 +00:00 |
|
JustAnotherArchivist
|
3870282a42
|
Fix broadcast and event card crashes
|
2022-04-12 20:53:38 +00:00 |
|
JustAnotherArchivist
|
7c0fcdec43
|
Fix Periscope card crashes
|
2022-04-12 18:29:51 +00:00 |
|
JustAnotherArchivist
|
9af1f19034
|
Properly support all card types
Fixes #407
|
2022-04-12 18:11:26 +00:00 |
|
JustAnotherArchivist
|
5fc3c0e290
|
Fix crash in locals dumping on module-less frames
|
2022-04-12 18:03:36 +00:00 |
|
Tristan Lee
|
f978954bb3
|
Merge branch 'JustAnotherArchivist:master' into master
|
2022-04-03 01:49:28 -05:00 |
|
Tristan Lee
|
2ce014ade4
|
fixed edge case for videos that have data-link-attr but no href attribute
|
2022-04-03 01:45:25 -05:00 |
|
JustAnotherArchivist
|
5d156c6a15
|
Detect and raise error on redirect from GraphQL endpoint to login
#165
|
2022-04-03 02:34:30 +00:00 |
|
Tristan Lee
|
4e59638e7c
|
added a forwardedUrl attribute to TelegramPost and made forwarded attribute type Channel.
|
2022-03-30 21:33:03 -05:00 |
|
Tristan Lee
|
a7eb54d226
|
implemented Media dataclasses for Telegram, and added variable for extracting a post's view count
|
2022-03-30 21:07:17 -05:00 |
|
Tristan Lee
|
d32c9add8a
|
added capability to scrape multiple videos from a single post
|
2022-03-30 18:13:15 -05:00 |
|
Tristan Lee
|
fb8d73ac95
|
handled case where channel has no profile image
|
2022-03-29 13:15:53 -05:00 |
|
Tristan Lee
|
ed829163a0
|
added capability to extract the number of channel members when the string in membersDiv has the word 'subscribers' rather than 'members'.
|
2022-03-29 01:12:07 -05:00 |
|
JustAnotherArchivist
|
694657ef80
|
Fix broken exception references
|
2022-03-09 01:01:47 +00:00 |
|
JustAnotherArchivist
|
1ab0f4fccb
|
Fix missing quoted tweet reference in certain buggy cases
|
2022-03-07 22:16:58 +00:00 |
|
JustAnotherArchivist
|
3a92b5bf0d
|
Add log message for guest token file deletion
|
2022-02-26 19:32:55 +00:00 |
|
JustAnotherArchivist
|
2480b173f4
|
Fix crash on race condition in CLI guest token manager resets
Fixes #414
|
2022-02-26 19:31:08 +00:00 |
|
Logan Williams
|
de4ebed81f
|
Fix KeyError caused by retweets without URLs in TwitterProfileScraper
|
2022-02-24 18:08:12 +01:00 |
|
Logan Williams
|
72b26f2373
|
Scrape images, video, and post forwarding information for Telegram channel posts
|
2022-02-24 15:31:02 +01:00 |
|
JustAnotherArchivist
|
77bbb9f61f
|
Remove useless pass
|
2022-02-20 18:54:51 +00:00 |
|
JustAnotherArchivist
|
57a624c618
|
Merge pull request #410 from AccentuSoft/master
Fix Vkontakte-user module crash on users with millions of followers
|
2022-02-18 06:01:35 +00:00 |
|
AccentuSoft
|
b1cfd51121
|
Implementing changes
|
2022-02-17 21:52:15 +02:00 |
|
AccentuSoft
|
ace2c16f54
|
Fix Vkontakte-user module crash on users with millions of followers
|
2022-02-17 15:42:46 +02:00 |
|
JustAnotherArchivist
|
2f9c0457df
|
Convert t.co card URLs to unshortened when possible
|
2022-02-17 01:50:15 +00:00 |
|
JustAnotherArchivist
|
878f2a3c7a
|
Handle cards without descriptions and thumbnails
Fixes #407
|
2022-02-17 01:49:32 +00:00 |
|
JustAnotherArchivist
|
25ee014e29
|
Extract cards
|
2022-02-16 02:59:21 +00:00 |
|
JustAnotherArchivist
|
a192dc6236
|
Handle TweetWithVisibilityResults
Fixes #400
|
2022-02-14 18:08:59 +00:00 |
|
JustAnotherArchivist
|
a7242f340b
|
Remove obsolete TODO
There is no retweetedTweetRef in Twitter's JS.
|
2022-02-14 18:08:29 +00:00 |
|
JustAnotherArchivist
|
359cc25cdf
|
Fix crash on entity attribute when scraping suspended users
Fixes #396
|
2022-02-10 04:22:59 +00:00 |
|