Commit Graph

299 Commits

Author SHA1 Message Date
Tristan Lee
ed829163a0 added capability to extract the number of channel members when the the string in membersDiv has the word 'subscribers' rather than 'members'. 2022-03-29 01:12:07 -05:00
Logan Williams
de4ebed81f Fix KeyError caused by retweets without URLs in TwitterProfileScraper 2022-02-24 18:08:12 +01:00
Logan Williams
72b26f2373 Scrape images, video, and post forwarding information for Telegram channel posts 2022-02-24 15:31:02 +01:00
JustAnotherArchivist
77bbb9f61f Remove useless pass 2022-02-20 18:54:51 +00:00
JustAnotherArchivist
57a624c618 Merge pull request #410 from AccentuSoft/master
Fix Vkontakte-user module crash on users with millions of followers
2022-02-18 06:01:35 +00:00
AccentuSoft
b1cfd51121 Implementing changes 2022-02-17 21:52:15 +02:00
AccentuSoft
ace2c16f54 Fix Vkontakte-user module crash on users with millions of followers 2022-02-17 15:42:46 +02:00
JustAnotherArchivist
2f9c0457df Convert t.co card URLs to unshortened when possible 2022-02-17 01:50:15 +00:00
JustAnotherArchivist
878f2a3c7a Handle cards without descriptions and thumbnails
Fixes #407
2022-02-17 01:49:32 +00:00
JustAnotherArchivist
25ee014e29 Extract cards 2022-02-16 02:59:21 +00:00
JustAnotherArchivist
a192dc6236 Handle TweetWithVisibilityResults
Fixes #400
2022-02-14 18:08:59 +00:00
JustAnotherArchivist
a7242f340b Remove obsolete TODO
There is no retweetedTweetRef in Twitter's JS.
2022-02-14 18:08:29 +00:00
JustAnotherArchivist
359cc25cdf Fix crash on entity attribute when scraping suspended users
Fixes #396
2022-02-10 04:22:59 +00:00
JustAnotherArchivist
01799a7391 Detect when CLI guest token from file has expired 2022-02-08 19:38:45 +00:00
JustAnotherArchivist
b0753c34ed Fix forgotten method name changes in 7d939c11
Fixes #393
2022-02-08 15:35:49 +00:00
JustAnotherArchivist
7f78fa0bc0 Recurse through all tweets encountered, not only ones with a positive replyCount
Fixes #266
2022-02-07 18:13:56 +00:00
JustAnotherArchivist
8702a9c7e2 Add Reddit submission scraper
Closes #312
2022-02-07 04:43:54 +00:00
JustAnotherArchivist
8ac1fd3ea8 Refactor Pushshift code to separate the general things from the search 2022-02-07 04:43:19 +00:00
JustAnotherArchivist
9235890f9a Fix KeyError crash on attempting to scrape inexistent tweet ID 2022-02-07 04:04:21 +00:00
JustAnotherArchivist
7d939c110c Port profile and tweet scrapers to GraphQL API
Fixes #367
2022-02-07 03:49:14 +00:00
JustAnotherArchivist
8e95e9a9a7 Fix crash on places without a bounding box
Fixes #374
2022-02-07 00:38:22 +00:00
JustAnotherArchivist
aa7d7d3dc3 Refactor automatic importing in snscrape.modules to something less hacky
Cf. #357
2022-02-05 03:22:55 +00:00
JustAnotherArchivist
560c78c5cf Make all optional scraper arguments keyword-only and fix Mastodon argument style to conform with the other scrapers
Cf. #376
2022-01-30 00:21:18 +00:00
JustAnotherArchivist
107c3c71c2 Remove unnecessary f-strings
Cf. #370
2022-01-28 21:22:13 +00:00
JustAnotherArchivist
7f88678253 Merge pull request #359 from own3dh2so4/master
Added proxy option to Scraper base
2022-01-13 23:08:28 +00:00
David Garcia Alvarez
52e4f9fb69 Added proxy option to Scraper base 2022-01-13 16:56:00 +01:00
JustAnotherArchivist
eebdfc1c55 Refactor username vs ID mess
Closes #354
2022-01-12 22:36:26 +00:00
JustAnotherArchivist
e6076353c8 Fix user ID being a string instead of an int on the entity 2022-01-12 22:35:50 +00:00
JustAnotherArchivist
a32d79fab2 Fix crash on certain mblogs that lack the raw_text attribute 2022-01-12 22:31:49 +00:00
JustAnotherArchivist
65391297f6 Move CLI methods to end of class definition for consistent code style 2022-01-12 21:09:38 +00:00
JustAnotherArchivist
deb2659dd6 Prefix CLI-related methods with an underscore
Closes #355
2022-01-12 21:07:10 +00:00
JustAnotherArchivist
93e62744d7 Fix missing timezone info 2022-01-07 00:42:09 +00:00
JustAnotherArchivist
3f3632d341 Add support for Mastodon profile and toot scrapes
Closes #43
2022-01-06 03:25:06 +00:00
JustAnotherArchivist
5070953feb Skip private fields and properties on dataclass-to-JSON conversion 2022-01-06 02:08:48 +00:00
JustAnotherArchivist
853848ed5d ScrollDirection is not part of the public API 2022-01-05 19:43:19 +00:00
JustAnotherArchivist
0b4abdc43f Fix baseUrl on tweet scrapes 2022-01-05 02:39:54 +00:00
JustAnotherArchivist
267b7d0e32 Rename CLI classmethods 2022-01-05 02:27:09 +00:00
JustAnotherArchivist
acb7f10a4f Cache Twitter tokens on disk from the CLI for reuse between scrapes
Closes #339
2022-01-05 02:20:40 +00:00
JustAnotherArchivist
ca00b480b1 Fix AssertionError on quoted comments
Fixes #340
2022-01-04 01:15:08 +00:00
JustAnotherArchivist
f189ab4241 Prefix all private API names with an underscore
Cf. #328
2022-01-03 17:51:23 +00:00
JustAnotherArchivist
c6e1e33a23 Fix crashing typos 2022-01-03 17:49:55 +00:00
JustAnotherArchivist
a37ea528d3 Refactor Reddit scrapers again to merge RedditPushshiftScraper and RedditScraper
Cf. #328
2022-01-03 17:48:35 +00:00
JustAnotherArchivist
eee06d8593 Refactor Reddit scrapers into a more reasonable code structure
Cf. #328
2021-12-24 04:58:32 +00:00
JustAnotherArchivist
4dd3ee6e47 Refactor Instagram scrapers to get rid of the awkward mode parameter
Cf. #328
2021-12-24 04:50:53 +00:00
JustAnotherArchivist
0336ce13ed Add support for fetching a guest token from the API 2021-12-23 04:26:50 +00:00
JustAnotherArchivist
193d4f80d6 Fix user agent in API headers staying constant 2021-12-23 04:25:23 +00:00
JustAnotherArchivist
e7d35ec1eb Fix date parsing on quoted posts 2021-12-15 16:55:14 +00:00
JustAnotherArchivist
8540045658 Fix typo 2021-12-15 16:36:28 +00:00
JustAnotherArchivist
1f1c1bd8af Fix docstring style 2021-12-14 20:05:51 +00:00
JustAnotherArchivist
7fdc8bcb53 Randomise user agent when the guest token can't be found 2021-12-14 20:04:46 +00:00