317 Commits

Author SHA1 Message Date
JustAnotherArchivist
dc6bc9bf9d Refactor how links on Twitter are handled
All links in text (tweets, profile descriptions, and profile links) are now represented by TextLink objects, which contain all relevant information: the displayed text (if available), the URL, the short t.co URL, and the indices in the text at which it appears.

Closes #478
2022-05-29 07:16:04 +00:00
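
To illustrate the commit above, here is a minimal sketch of what a link object carrying those four pieces of information might look like. The field names and the helper `text_links_from_entities` are assumptions made for illustration, not taken from the commit itself; the input shape follows the `urls` entity format used by Twitter's API (`url`, `expanded_url`, `display_url`, `indices`).

```python
import dataclasses
import typing


@dataclasses.dataclass
class TextLink:
    # Field names are illustrative assumptions, not confirmed by the commit message.
    text: typing.Optional[str]       # displayed text, if available
    url: str                         # full (expanded) URL
    tcourl: str                      # short t.co URL
    indices: typing.Tuple[int, int]  # start/end offsets of the link in the text


def text_links_from_entities(entities: dict) -> typing.List[TextLink]:
    """Hypothetical helper: build TextLink objects from a Twitter-style entities dict."""
    links = []
    for u in entities.get('urls', []):
        links.append(TextLink(
            text=u.get('display_url'),  # may be missing, hence Optional
            url=u['expanded_url'],
            tcourl=u['url'],
            indices=tuple(u['indices']),
        ))
    return links


# Made-up sample data for demonstration only.
tweet_text = 'Docs: https://t.co/abc123'
entities = {
    'urls': [{
        'url': 'https://t.co/abc123',
        'expanded_url': 'https://example.org/docs',
        'display_url': 'example.org/docs',
        'indices': [6, 25],
    }],
}
links = text_links_from_entities(entities)
start, end = links[0].indices
print(links[0].url, tweet_text[start:end])
```

Keeping the indices alongside the URLs lets a consumer replace each short link in the text with its expanded form without searching the text again.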
JustAnotherArchivist
01cf6a09b3 Fix type of description URL objects 2022-05-29 05:08:23 +00:00
JustAnotherArchivist
ef7c4fad3e Fix AttributeError for DescriptionURL on from-import 2022-05-29 05:08:23 +00:00
JustAnotherArchivist
faeffe2603 Merge pull request #474 from GeraniumKF/GeraniumKF-reddit-since-crash
Fix crash using --since with Reddit
2022-05-23 23:06:16 +00:00
Geranium
e3bdc02a7c Reddit: deprecate 'created' property for 'date'
This fixes a crash when using --since with the Reddit scraper,
as the CLI code expects items to have a date property.
2022-05-23 23:31:44 +01:00
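
The deprecation described in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the actual patch: the `Submission` class and the `newer_than` filter are hypothetical stand-ins for a Reddit item and for date-based filtering such as `--since`.

```python
import dataclasses
import datetime
import warnings


@dataclasses.dataclass
class Submission:
    # Hypothetical item class; only the 'date'/'created' relationship matters here.
    id: str
    date: datetime.datetime

    @property
    def created(self):
        # Old name kept as a read-only alias that warns and forwards to 'date'.
        warnings.warn('created is deprecated, use date instead',
                      DeprecationWarning, stacklevel=2)
        return self.date


def newer_than(items, since):
    """Hypothetical filter that relies on every item exposing a 'date' attribute."""
    return [item for item in items if item.date >= since]


utc = datetime.timezone.utc
items = [
    Submission('a', datetime.datetime(2022, 5, 1, tzinfo=utc)),
    Submission('b', datetime.datetime(2022, 5, 20, tzinfo=utc)),
]
recent = newer_than(items, datetime.datetime(2022, 5, 10, tzinfo=utc))
print([item.id for item in recent])
```

Code that only knows about `date` works on every item, while existing callers of `created` keep working and receive a `DeprecationWarning`.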
JustAnotherArchivist
ed3ea944d1 Fix newsletter issue cards without an issue description
Fixes #456
2022-04-16 19:44:36 +00:00
JustAnotherArchivist
e7a6d38a5f Add support for community_details cards 2022-04-15 20:07:01 +00:00
JustAnotherArchivist
6c50eee31b Fix proxies not being applied correctly due to missing merge with environment settings
Fixes #447
2022-04-15 19:23:54 +00:00
JustAnotherArchivist
5103a33afa Fix t.co card URL replacement on retweets
Fixes #411
2022-04-15 03:18:45 +00:00
JustAnotherArchivist
247bd82d79 Refactor to tweetId variable 2022-04-15 03:14:29 +00:00
JustAnotherArchivist
5fc67f2bcf Add support for 'message me' cards 2022-04-15 02:52:37 +00:00
JustAnotherArchivist
65e7d8bd24 Fix warning on card URL translation to include the tweet ID 2022-04-15 02:52:03 +00:00
JustAnotherArchivist
3870282a42 Fix broadcast and event card crashes 2022-04-12 20:53:38 +00:00
JustAnotherArchivist
7c0fcdec43 Fix Periscope card crashes 2022-04-12 18:29:51 +00:00
JustAnotherArchivist
9af1f19034 Properly support all card types
Fixes #407
2022-04-12 18:11:26 +00:00
JustAnotherArchivist
5fc3c0e290 Fix crash in locals dumping on module-less frames 2022-04-12 18:03:36 +00:00
JustAnotherArchivist
5d156c6a15 Detect and raise error on redirect from GraphQL endpoint to login
#165
2022-04-03 02:34:30 +00:00
JustAnotherArchivist
694657ef80 Fix broken exception references 2022-03-09 01:01:47 +00:00
JustAnotherArchivist
1ab0f4fccb Fix missing quoted tweet reference in certain buggy cases 2022-03-07 22:16:58 +00:00
JustAnotherArchivist
3a92b5bf0d Add log message for guest token file deletion 2022-02-26 19:32:55 +00:00
JustAnotherArchivist
2480b173f4 Fix crash on race condition in CLI guest token manager resets
Fixes #414
2022-02-26 19:31:08 +00:00
JustAnotherArchivist
77bbb9f61f Remove useless pass 2022-02-20 18:54:51 +00:00
JustAnotherArchivist
57a624c618 Merge pull request #410 from AccentuSoft/master
Fix Vkontakte-user module crash on users with millions of followers
2022-02-18 06:01:35 +00:00
AccentuSoft
b1cfd51121 Implementing changes 2022-02-17 21:52:15 +02:00
AccentuSoft
ace2c16f54 Fix Vkontakte-user module crash on users with millions of followers 2022-02-17 15:42:46 +02:00
JustAnotherArchivist
2f9c0457df Convert t.co card URLs to unshortened when possible 2022-02-17 01:50:15 +00:00
JustAnotherArchivist
878f2a3c7a Handle cards without descriptions and thumbnails
Fixes #407
2022-02-17 01:49:32 +00:00
JustAnotherArchivist
25ee014e29 Extract cards 2022-02-16 02:59:21 +00:00
JustAnotherArchivist
a192dc6236 Handle TweetWithVisibilityResults
Fixes #400
2022-02-14 18:08:59 +00:00
JustAnotherArchivist
a7242f340b Remove obsolete TODO
There is no retweetedTweetRef in Twitter's JS.
2022-02-14 18:08:29 +00:00
JustAnotherArchivist
359cc25cdf Fix crash on entity attribute when scraping suspended users
Fixes #396
2022-02-10 04:22:59 +00:00
JustAnotherArchivist
01799a7391 Detect when CLI guest token from file has expired 2022-02-08 19:38:45 +00:00
JustAnotherArchivist
b0753c34ed Fix forgotten method name changes in 7d939c11
Fixes #393
2022-02-08 15:35:49 +00:00
JustAnotherArchivist
7f78fa0bc0 Recurse through all tweets encountered, not only ones with a positive replyCount
Fixes #266
2022-02-07 18:13:56 +00:00
JustAnotherArchivist
8702a9c7e2 Add Reddit submission scraper
Closes #312
2022-02-07 04:43:54 +00:00
JustAnotherArchivist
8ac1fd3ea8 Refactor Pushshift code to separate the general things from the search 2022-02-07 04:43:19 +00:00
JustAnotherArchivist
9235890f9a Fix KeyError crash on attempting to scrape inexistent tweet ID 2022-02-07 04:04:21 +00:00
JustAnotherArchivist
7d939c110c Port profile and tweet scrapers to GraphQL API
Fixes #367
2022-02-07 03:49:14 +00:00
JustAnotherArchivist
8e95e9a9a7 Fix crash on places without a bounding box
Fixes #374
2022-02-07 00:38:22 +00:00
JustAnotherArchivist
aa7d7d3dc3 Refactor automatic importing in snscrape.modules to something less hacky
Cf. #357
2022-02-05 03:22:55 +00:00
JustAnotherArchivist
560c78c5cf Make all optional scraper arguments keyword-only and fix Mastodon argument style to conform with the other scrapers
Cf. #376
2022-01-30 00:21:18 +00:00
JustAnotherArchivist
107c3c71c2 Remove unnecessary f-strings
Cf. #370
2022-01-28 21:22:13 +00:00
JustAnotherArchivist
7f88678253 Merge pull request #359 from own3dh2so4/master
Added proxy option to Scraper base
2022-01-13 23:08:28 +00:00
David Garcia Alvarez
52e4f9fb69 Added proxy option to Scraper base 2022-01-13 16:56:00 +01:00
JustAnotherArchivist
eebdfc1c55 Refactor username vs ID mess
Closes #354
2022-01-12 22:36:26 +00:00
JustAnotherArchivist
e6076353c8 Fix user ID being a string instead of an int on the entity 2022-01-12 22:35:50 +00:00
JustAnotherArchivist
a32d79fab2 Fix crash on certain mblogs that lack the raw_text attribute 2022-01-12 22:31:49 +00:00
JustAnotherArchivist
65391297f6 Move CLI methods to end of class definition for consistent code style 2022-01-12 21:09:38 +00:00
JustAnotherArchivist
deb2659dd6 Prefix CLI-related methods with an underscore
Closes #355
2022-01-12 21:07:10 +00:00
JustAnotherArchivist
93e62744d7 Fix missing timezone info 2022-01-07 00:42:09 +00:00