JustAnotherArchivist
|
359cc25cdf
|
Fix crash on entity attribute when scraping suspended users
Fixes #396
|
2022-02-10 04:22:59 +00:00 |
|
JustAnotherArchivist
|
01799a7391
|
Detect when CLI guest token from file has expired
|
2022-02-08 19:38:45 +00:00 |
|
JustAnotherArchivist
|
b0753c34ed
|
Fix forgotten method name changes in 7d939c11
Fixes #393
|
2022-02-08 15:35:49 +00:00 |
|
JustAnotherArchivist
|
7f78fa0bc0
|
Recurse through all tweets encountered, not only ones with a positive replyCount
Fixes #266
|
2022-02-07 18:13:56 +00:00 |
|
JustAnotherArchivist
|
8702a9c7e2
|
Add Reddit submission scraper
Closes #312
|
2022-02-07 04:43:54 +00:00 |
|
JustAnotherArchivist
|
8ac1fd3ea8
|
Refactor Pushshift code to separate the general things from the search
|
2022-02-07 04:43:19 +00:00 |
|
JustAnotherArchivist
|
9235890f9a
|
Fix KeyError crash on attempting to scrape inexistent tweet ID
|
2022-02-07 04:04:21 +00:00 |
|
JustAnotherArchivist
|
7d939c110c
|
Port profile and tweet scrapers to GraphQL API
Fixes #367
|
2022-02-07 03:49:14 +00:00 |
|
JustAnotherArchivist
|
8e95e9a9a7
|
Fix crash on places without a bounding box
Fixes #374
|
2022-02-07 00:38:22 +00:00 |
|
JustAnotherArchivist
|
aa7d7d3dc3
|
Refactor automatic importing in snscrape.modules to something less hacky
Cf. #357
|
2022-02-05 03:22:55 +00:00 |
|
JustAnotherArchivist
|
560c78c5cf
|
Make all optional scraper arguments keyword-only and fix Mastodon argument style to conform with the other scrapers
Cf. #376
|
2022-01-30 00:21:18 +00:00 |
|
JustAnotherArchivist
|
107c3c71c2
|
Remove unnecessary f-strings
Cf. #370
|
2022-01-28 21:22:13 +00:00 |
|
JustAnotherArchivist
|
7f88678253
|
Merge pull request #359 from own3dh2so4/master
Added proxy option to Scraper base
|
2022-01-13 23:08:28 +00:00 |
|
David Garcia Alvarez
|
52e4f9fb69
|
Added proxy option to Scraper base
|
2022-01-13 16:56:00 +01:00 |
|
JustAnotherArchivist
|
eebdfc1c55
|
Refactor username vs ID mess
Closes #354
|
2022-01-12 22:36:26 +00:00 |
|
JustAnotherArchivist
|
e6076353c8
|
Fix user ID being a string instead of an int on the entity
|
2022-01-12 22:35:50 +00:00 |
|
JustAnotherArchivist
|
a32d79fab2
|
Fix crash on certain mblogs that lack the raw_text attribute
|
2022-01-12 22:31:49 +00:00 |
|
JustAnotherArchivist
|
65391297f6
|
Move CLI methods to end of class definition for consistent code style
|
2022-01-12 21:09:38 +00:00 |
|
JustAnotherArchivist
|
deb2659dd6
|
Prefix CLI-related methods with an underscore
Closes #355
|
2022-01-12 21:07:10 +00:00 |
|
JustAnotherArchivist
|
93e62744d7
|
Fix missing timezone info
|
2022-01-07 00:42:09 +00:00 |
|
JustAnotherArchivist
|
3f3632d341
|
Add support for Mastodon profile and toot scrapes
Closes #43
|
2022-01-06 03:25:06 +00:00 |
|
JustAnotherArchivist
|
5070953feb
|
Skip private fields and properties on dataclass-to-JSON conversion
|
2022-01-06 02:08:48 +00:00 |
|
JustAnotherArchivist
|
853848ed5d
|
ScrollDirection is not part of the public API
|
2022-01-05 19:43:19 +00:00 |
|
JustAnotherArchivist
|
0b4abdc43f
|
Fix baseUrl on tweet scrapes
|
2022-01-05 02:39:54 +00:00 |
|
JustAnotherArchivist
|
267b7d0e32
|
Rename CLI classmethods
|
2022-01-05 02:27:09 +00:00 |
|
JustAnotherArchivist
|
acb7f10a4f
|
Cache Twitter tokens on disk from the CLI for reuse between scrapes
Closes #339
|
2022-01-05 02:20:40 +00:00 |
|
JustAnotherArchivist
|
ca00b480b1
|
Fix AssertionError on quoted comments
Fixes #340
|
2022-01-04 01:15:08 +00:00 |
|
JustAnotherArchivist
|
f189ab4241
|
Prefix all private API names with an underscore
Cf. #328
|
2022-01-03 17:51:23 +00:00 |
|
JustAnotherArchivist
|
c6e1e33a23
|
Fix crashing typos
|
2022-01-03 17:49:55 +00:00 |
|
JustAnotherArchivist
|
a37ea528d3
|
Refactor Reddit scrapers again to merge RedditPushshiftScraper and RedditScraper
Cf. #328
|
2022-01-03 17:48:35 +00:00 |
|
JustAnotherArchivist
|
eee06d8593
|
Refactor Reddit scrapers into a more reasonable code structure
Cf. #328
|
2021-12-24 04:58:32 +00:00 |
|
JustAnotherArchivist
|
4dd3ee6e47
|
Refactor Instagram scrapers to get rid of the awkward mode parameter
Cf. #328
|
2021-12-24 04:50:53 +00:00 |
|
JustAnotherArchivist
|
0336ce13ed
|
Add support for fetching a guest token from the API
|
2021-12-23 04:26:50 +00:00 |
|
JustAnotherArchivist
|
193d4f80d6
|
Fix user agent in API headers staying constant
|
2021-12-23 04:25:23 +00:00 |
|
JustAnotherArchivist
|
e7d35ec1eb
|
Fix date parsing on quoted posts
|
2021-12-15 16:55:14 +00:00 |
|
JustAnotherArchivist
|
8540045658
|
Fix typo
|
2021-12-15 16:36:28 +00:00 |
|
JustAnotherArchivist
|
1f1c1bd8af
|
Fix docstring style
|
2021-12-14 20:05:51 +00:00 |
|
JustAnotherArchivist
|
7fdc8bcb53
|
Randomise user agent when the guest token can't be found
|
2021-12-14 20:04:46 +00:00 |
|
JustAnotherArchivist
|
4b3c6aefe7
|
Add default values to user and tweet scrapers for more intuitive usage
|
2021-12-12 04:57:16 +00:00 |
|
JustAnotherArchivist
|
525cd71225
|
Retry guest token retrieval
Fixes #325 (hopefully)
|
2021-12-12 00:10:59 +00:00 |
|
JustAnotherArchivist
|
72abff9e5c
|
Reuse guest tokens across scrapes
Cf. #326
|
2021-12-11 23:18:42 +00:00 |
|
JustAnotherArchivist
|
bcaa477b3d
|
Update list of scrapers
|
2021-12-08 08:29:02 +00:00 |
|
JustAnotherArchivist
|
66d4c99f82
|
Remove dev version notice
|
2021-12-08 08:25:21 +00:00 |
|
JustAnotherArchivist
|
0ac50f1383
|
Add README to package metadata
|
2021-12-08 08:18:25 +00:00 |
|
JustAnotherArchivist
|
c2257ad16e
|
Add Python 3.10 classifier
|
2021-12-08 08:15:05 +00:00 |
|
JustAnotherArchivist
|
58f654405f
|
Add --citation
Closes #229
|
2021-12-08 07:51:28 +00:00 |
|
JustAnotherArchivist
|
35fb61a327
|
Fix crash on dumping scopes which have a variable pointing to a dataclass
|
2021-11-24 03:39:06 +00:00 |
|
JustAnotherArchivist
|
a6b6f3faaa
|
Throw an error on empty arguments
Fixes #290
|
2021-10-10 17:43:27 +00:00 |
|
JustAnotherArchivist
|
5e829e2541
|
Refactor class instantiation to remove the need to repeat 'retries' everywhere
|
2021-09-30 09:58:10 +00:00 |
|
JustAnotherArchivist
|
d4567da23c
|
Improve list of scrapers on --help output
Don't list all scrapers in the usage line, and provide a sorted readable list instead.
|
2021-09-30 09:35:17 +00:00 |
|