287 Commits

Author SHA1 Message Date
JustAnotherArchivist
359cc25cdf Fix crash on entity attribute when scraping suspended users
Fixes #396
2022-02-10 04:22:59 +00:00
JustAnotherArchivist
01799a7391 Detect when CLI guest token from file has expired 2022-02-08 19:38:45 +00:00
JustAnotherArchivist
b0753c34ed Fix forgotten method name changes in 7d939c11
Fixes #393
2022-02-08 15:35:49 +00:00
JustAnotherArchivist
7f78fa0bc0 Recurse through all tweets encountered, not only ones with a positive replyCount
Fixes #266
2022-02-07 18:13:56 +00:00
JustAnotherArchivist
8702a9c7e2 Add Reddit submission scraper
Closes #312
2022-02-07 04:43:54 +00:00
JustAnotherArchivist
8ac1fd3ea8 Refactor Pushshift code to separate the general things from the search 2022-02-07 04:43:19 +00:00
JustAnotherArchivist
9235890f9a Fix KeyError crash on attempting to scrape inexistent tweet ID 2022-02-07 04:04:21 +00:00
JustAnotherArchivist
7d939c110c Port profile and tweet scrapers to GraphQL API
Fixes #367
2022-02-07 03:49:14 +00:00
JustAnotherArchivist
8e95e9a9a7 Fix crash on places without a bounding box
Fixes #374
2022-02-07 00:38:22 +00:00
JustAnotherArchivist
aa7d7d3dc3 Refactor automatic importing in snscrape.modules to something less hacky
Cf. #357
2022-02-05 03:22:55 +00:00
JustAnotherArchivist
560c78c5cf Make all optional scraper arguments keyword-only and fix Mastodon argument style to conform with the other scrapers
Cf. #376
2022-01-30 00:21:18 +00:00
JustAnotherArchivist
107c3c71c2 Remove unnecessary f-strings
Cf. #370
2022-01-28 21:22:13 +00:00
JustAnotherArchivist
7f88678253 Merge pull request #359 from own3dh2so4/master
Added proxy option to Scraper base
2022-01-13 23:08:28 +00:00
David Garcia Alvarez
52e4f9fb69 Added proxy option to Scraper base 2022-01-13 16:56:00 +01:00
JustAnotherArchivist
eebdfc1c55 Refactor username vs ID mess
Closes #354
2022-01-12 22:36:26 +00:00
JustAnotherArchivist
e6076353c8 Fix user ID being a string instead of an int on the entity 2022-01-12 22:35:50 +00:00
JustAnotherArchivist
a32d79fab2 Fix crash on certain mblogs that lack the raw_text attribute 2022-01-12 22:31:49 +00:00
JustAnotherArchivist
65391297f6 Move CLI methods to end of class definition for consistent code style 2022-01-12 21:09:38 +00:00
JustAnotherArchivist
deb2659dd6 Prefix CLI-related methods with an underscore
Closes #355
2022-01-12 21:07:10 +00:00
JustAnotherArchivist
93e62744d7 Fix missing timezone info 2022-01-07 00:42:09 +00:00
JustAnotherArchivist
3f3632d341 Add support for Mastodon profile and toot scrapes
Closes #43
2022-01-06 03:25:06 +00:00
JustAnotherArchivist
5070953feb Skip private fields and properties on dataclass-to-JSON conversion 2022-01-06 02:08:48 +00:00
JustAnotherArchivist
853848ed5d ScrollDirection is not part of the public API 2022-01-05 19:43:19 +00:00
JustAnotherArchivist
0b4abdc43f Fix baseUrl on tweet scrapes 2022-01-05 02:39:54 +00:00
JustAnotherArchivist
267b7d0e32 Rename CLI classmethods 2022-01-05 02:27:09 +00:00
JustAnotherArchivist
acb7f10a4f Cache Twitter tokens on disk from the CLI for reuse between scrapes
Closes #339
2022-01-05 02:20:40 +00:00
JustAnotherArchivist
ca00b480b1 Fix AssertionError on quoted comments
Fixes #340
2022-01-04 01:15:08 +00:00
JustAnotherArchivist
f189ab4241 Prefix all private API names with an underscore
Cf. #328
2022-01-03 17:51:23 +00:00
JustAnotherArchivist
c6e1e33a23 Fix crashing typos 2022-01-03 17:49:55 +00:00
JustAnotherArchivist
a37ea528d3 Refactor Reddit scrapers again to merge RedditPushshiftScraper and RedditScraper
Cf. #328
2022-01-03 17:48:35 +00:00
JustAnotherArchivist
eee06d8593 Refactor Reddit scrapers into a more reasonable code structure
Cf. #328
2021-12-24 04:58:32 +00:00
JustAnotherArchivist
4dd3ee6e47 Refactor Instagram scrapers to get rid of the awkward mode parameter
Cf. #328
2021-12-24 04:50:53 +00:00
JustAnotherArchivist
0336ce13ed Add support for fetching a guest token from the API 2021-12-23 04:26:50 +00:00
JustAnotherArchivist
193d4f80d6 Fix user agent in API headers staying constant 2021-12-23 04:25:23 +00:00
JustAnotherArchivist
e7d35ec1eb Fix date parsing on quoted posts 2021-12-15 16:55:14 +00:00
JustAnotherArchivist
8540045658 Fix typo 2021-12-15 16:36:28 +00:00
JustAnotherArchivist
1f1c1bd8af Fix docstring style 2021-12-14 20:05:51 +00:00
JustAnotherArchivist
7fdc8bcb53 Randomise user agent when the guest token can't be found 2021-12-14 20:04:46 +00:00
JustAnotherArchivist
4b3c6aefe7 Add default values to user and tweet scrapers for a more intuitive usage 2021-12-12 04:57:16 +00:00
JustAnotherArchivist
525cd71225 Retry guest token retrieval
Fixes #325 (hopefully)
2021-12-12 00:10:59 +00:00
JustAnotherArchivist
72abff9e5c Reuse guest tokens across scrapes
Cf. #326
2021-12-11 23:18:42 +00:00
JustAnotherArchivist
bcaa477b3d Update list of scrapers 2021-12-08 08:29:02 +00:00
JustAnotherArchivist
66d4c99f82 Remove dev version notice 2021-12-08 08:25:21 +00:00
JustAnotherArchivist
0ac50f1383 Add README to package metadata 2021-12-08 08:18:25 +00:00
JustAnotherArchivist
c2257ad16e Add Python 3.10 classifier 2021-12-08 08:15:05 +00:00
JustAnotherArchivist
58f654405f Add --citation
Closes #229
2021-12-08 07:51:28 +00:00
JustAnotherArchivist
35fb61a327 Fix crash on dumping scopes which have a variable pointing to a dataclass 2021-11-24 03:39:06 +00:00
JustAnotherArchivist
a6b6f3faaa Throw an error on empty arguments
Fixes #290
2021-10-10 17:43:27 +00:00
JustAnotherArchivist
5e829e2541 Refactor class instantiation to remove the need to repeat 'retries' everywhere 2021-09-30 09:58:10 +00:00
JustAnotherArchivist
d4567da23c Improve list of scrapers on --help output
Don't list all scrapers in the usage line, and provide a sorted readable list instead.
2021-09-30 09:35:17 +00:00