Tristan Lee
|
4e59638e7c
|
added a forwardedUrl attribute to TelegramPost and made the forwarded attribute of type Channel.
|
2022-03-30 21:33:03 -05:00 |
|
Tristan Lee
|
a7eb54d226
|
implemented Media dataclasses for Telegram, and added a variable for extracting a post's view count
|
2022-03-30 21:07:17 -05:00 |
|
Tristan Lee
|
d32c9add8a
|
added capability to scrape multiple videos from a single post
|
2022-03-30 18:13:15 -05:00 |
|
Tristan Lee
|
fb8d73ac95
|
handled case where channel has no profile image
|
2022-03-29 13:15:53 -05:00 |
|
Tristan Lee
|
ed829163a0
|
added capability to extract the number of channel members when the string in membersDiv has the word 'subscribers' rather than 'members'.
|
2022-03-29 01:12:07 -05:00 |
|
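The 'subscribers' versus 'members' change above can be illustrated with a minimal sketch of a tolerant count parser. The helper name parse_member_count and the accepted formats ("12 345", "1.2K") are hypothetical assumptions for illustration; the actual text layout of membersDiv is not shown in this log.

```python
import re
from typing import Optional

# Hypothetical helper: the accepted number formats ("12 345", "1,234", "1.2K")
# are assumptions for illustration, not taken from the scraper's source.
_COUNT_RE = re.compile(r'^\s*([\d\s.,]+?)\s*([KM]?)\s+(?:members|subscribers)\s*$', re.IGNORECASE)
_MULTIPLIERS = {'': 1, 'K': 1_000, 'M': 1_000_000}

def parse_member_count(text: str) -> Optional[int]:
    """Return the count from e.g. '12 345 members' or '1.2K subscribers', else None."""
    match = _COUNT_RE.match(text)
    if match is None:
        return None
    number, suffix = match.groups()
    # Drop thousands separators (any whitespace or commas) before converting.
    number = re.sub(r'[\s,]', '', number)
    try:
        value = float(number)
    except ValueError:
        return None
    return round(value * _MULTIPLIERS[suffix.upper()])
```

Accepting either label in a single pattern keeps one code path for both wordings instead of branching on which word appears.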
Logan Williams
|
de4ebed81f
|
Fix KeyError caused by retweets without URLs in TwitterProfileScraper
|
2022-02-24 18:08:12 +01:00 |
|
Logan Williams
|
72b26f2373
|
Scrape images, video, and post forwarding information for Telegram channel posts
|
2022-02-24 15:31:02 +01:00 |
|
JustAnotherArchivist
|
77bbb9f61f
|
Remove useless pass
|
2022-02-20 18:54:51 +00:00 |
|
JustAnotherArchivist
|
57a624c618
|
Merge pull request #410 from AccentuSoft/master
Fix Vkontakte-user module crash on users with millions of followers
|
2022-02-18 06:01:35 +00:00 |
|
AccentuSoft
|
b1cfd51121
|
Implementing changes
|
2022-02-17 21:52:15 +02:00 |
|
AccentuSoft
|
ace2c16f54
|
Fix Vkontakte-user module crash on users with millions of followers
|
2022-02-17 15:42:46 +02:00 |
|
JustAnotherArchivist
|
2f9c0457df
|
Convert t.co card URLs to unshortened when possible
|
2022-02-17 01:50:15 +00:00 |
|
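A minimal sketch of the "when possible" fallback described in the t.co commit above: look the short link up in a list of URL entities and return the expanded form if one matches, otherwise keep the t.co link. It assumes entities shaped like Twitter's v1.1 URL entities (url / expanded_url keys); the function name unshorten is hypothetical.

```python
from typing import Iterable, Mapping

def unshorten(url: str, url_entities: Iterable[Mapping[str, str]]) -> str:
    """Return the expanded URL for a t.co link if an entity maps it, else the original.

    Hypothetical helper; assumes v1.1-style entities with 'url' and 'expanded_url' keys.
    """
    for entity in url_entities:
        if entity.get('url') == url and entity.get('expanded_url'):
            return entity['expanded_url']
    # No mapping available: keep the shortened URL rather than failing.
    return url
```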
JustAnotherArchivist
|
878f2a3c7a
|
Handle cards without descriptions and thumbnails
Fixes #407
|
2022-02-17 01:49:32 +00:00 |
|
JustAnotherArchivist
|
25ee014e29
|
Extract cards
|
2022-02-16 02:59:21 +00:00 |
|
JustAnotherArchivist
|
a192dc6236
|
Handle TweetWithVisibilityResults
Fixes #400
|
2022-02-14 18:08:59 +00:00 |
|
JustAnotherArchivist
|
a7242f340b
|
Remove obsolete TODO
There is no retweetedTweetRef in Twitter's JS.
|
2022-02-14 18:08:29 +00:00 |
|
JustAnotherArchivist
|
359cc25cdf
|
Fix crash on entity attribute when scraping suspended users
Fixes #396
|
2022-02-10 04:22:59 +00:00 |
|
JustAnotherArchivist
|
01799a7391
|
Detect when CLI guest token from file has expired
|
2022-02-08 19:38:45 +00:00 |
|
JustAnotherArchivist
|
b0753c34ed
|
Fix forgotten method name changes in 7d939c11
Fixes #393
|
2022-02-08 15:35:49 +00:00 |
|
JustAnotherArchivist
|
7f78fa0bc0
|
Recurse through all tweets encountered, not only ones with a positive replyCount
Fixes #266
|
2022-02-07 18:13:56 +00:00 |
|
JustAnotherArchivist
|
8702a9c7e2
|
Add Reddit submission scraper
Closes #312
|
2022-02-07 04:43:54 +00:00 |
|
JustAnotherArchivist
|
8ac1fd3ea8
|
Refactor Pushshift code to separate the general things from the search
|
2022-02-07 04:43:19 +00:00 |
|
JustAnotherArchivist
|
9235890f9a
|
Fix KeyError crash on attempting to scrape nonexistent tweet ID
|
2022-02-07 04:04:21 +00:00 |
|
JustAnotherArchivist
|
7d939c110c
|
Port profile and tweet scrapers to GraphQL API
Fixes #367
|
2022-02-07 03:49:14 +00:00 |
|
JustAnotherArchivist
|
8e95e9a9a7
|
Fix crash on places without a bounding box
Fixes #374
|
2022-02-07 00:38:22 +00:00 |
|
JustAnotherArchivist
|
aa7d7d3dc3
|
Refactor automatic importing in snscrape.modules to something less hacky
Cf. #357
|
2022-02-05 03:22:55 +00:00 |
|
JustAnotherArchivist
|
560c78c5cf
|
Make all optional scraper arguments keyword-only and fix Mastodon argument style to conform with the other scrapers
Cf. #376
|
2022-01-30 00:21:18 +00:00 |
|
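The keyword-only convention from the commit above can be sketched with Python's bare * marker in a signature. ExampleScraper and its parameter names are hypothetical; only the argument style reflects the commit message.

```python
from typing import Optional

class ExampleScraper:
    # Hypothetical class; only the argument style reflects the commit message.
    def __init__(self, query: str, *, retries: int = 3, maxResults: Optional[int] = None):
        # Parameters after the bare `*` can only be passed by keyword.
        self.query = query
        self.retries = retries
        self.maxResults = maxResults

scraper = ExampleScraper('snscrape', retries=5)

try:
    ExampleScraper('snscrape', 5)  # optional argument passed positionally
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```

Making optional arguments keyword-only lets new options be added or reordered later without silently changing what existing positional calls mean.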
JustAnotherArchivist
|
107c3c71c2
|
Remove unnecessary f-strings
Cf. #370
|
2022-01-28 21:22:13 +00:00 |
|
JustAnotherArchivist
|
7f88678253
|
Merge pull request #359 from own3dh2so4/master
Added proxy option to Scraper base
|
2022-01-13 23:08:28 +00:00 |
|
David Garcia Alvarez
|
52e4f9fb69
|
Added proxy option to Scraper base
|
2022-01-13 16:56:00 +01:00 |
|
JustAnotherArchivist
|
eebdfc1c55
|
Refactor username vs ID mess
Closes #354
|
2022-01-12 22:36:26 +00:00 |
|
JustAnotherArchivist
|
e6076353c8
|
Fix user ID being a string instead of an int on the entity
|
2022-01-12 22:35:50 +00:00 |
|
JustAnotherArchivist
|
a32d79fab2
|
Fix crash on certain mblogs that lack the raw_text attribute
|
2022-01-12 22:31:49 +00:00 |
|
JustAnotherArchivist
|
65391297f6
|
Move CLI methods to end of class definition for consistent code style
|
2022-01-12 21:09:38 +00:00 |
|
JustAnotherArchivist
|
deb2659dd6
|
Prefix CLI-related methods with an underscore
Closes #355
|
2022-01-12 21:07:10 +00:00 |
|
JustAnotherArchivist
|
93e62744d7
|
Fix missing timezone info
|
2022-01-07 00:42:09 +00:00 |
|
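For context on the timezone fix above, a minimal sketch of the difference between naive and timezone-aware datetimes in Python. Which field was missing its timezone is not stated in this log; from_unix and ensure_utc are hypothetical helpers, and ensure_utc assumes a naive input is already in UTC.

```python
import datetime

def from_unix(seconds: float) -> datetime.datetime:
    # fromtimestamp() without tz= returns a naive datetime in local time;
    # passing tz= yields an aware UTC datetime instead.
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)

def ensure_utc(dt: datetime.datetime) -> datetime.datetime:
    # Assumption: a naive input is already expressed in UTC.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=datetime.timezone.utc)
```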
JustAnotherArchivist
|
3f3632d341
|
Add support for Mastodon profile and toot scrapes
Closes #43
|
2022-01-06 03:25:06 +00:00 |
|
JustAnotherArchivist
|
5070953feb
|
Skip private fields and properties on dataclass-to-JSON conversion
|
2022-01-06 02:08:48 +00:00 |
|
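A minimal sketch of the rule in the commit above: when converting a dataclass to a JSON-ready dict, include public fields and public properties but skip any name starting with an underscore. The to_json_dict function and the Post class are hypothetical; the project's real converter is not shown in this log.

```python
import dataclasses
import inspect
import json
from typing import Any

def to_json_dict(obj: Any) -> dict:
    """Hypothetical converter: public dataclass fields plus public properties."""
    result = {}
    for field in dataclasses.fields(obj):
        if not field.name.startswith('_'):
            result[field.name] = getattr(obj, field.name)
    # getmembers() on the class returns property objects (including inherited ones).
    for name, _prop in inspect.getmembers(type(obj), lambda v: isinstance(v, property)):
        if not name.startswith('_'):
            result[name] = getattr(obj, name)
    return result

@dataclasses.dataclass
class Post:
    url: str
    _rawHtml: str = ''  # private field: skipped

    @property
    def id(self) -> str:
        return self.url.rsplit('/', 1)[-1]

    @property
    def _cacheKey(self) -> str:  # private property: skipped
        return 'post:' + self.url

print(json.dumps(to_json_dict(Post('https://example.com/p/42', '<div></div>'))))
# {"url": "https://example.com/p/42", "id": "42"}
```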
JustAnotherArchivist
|
853848ed5d
|
ScrollDirection is not part of the public API
|
2022-01-05 19:43:19 +00:00 |
|
JustAnotherArchivist
|
0b4abdc43f
|
Fix baseUrl on tweet scrapes
|
2022-01-05 02:39:54 +00:00 |
|
JustAnotherArchivist
|
267b7d0e32
|
Rename CLI classmethods
|
2022-01-05 02:27:09 +00:00 |
|
JustAnotherArchivist
|
acb7f10a4f
|
Cache Twitter tokens on disk from the CLI for reuse between scrapes
Closes #339
|
2022-01-05 02:20:40 +00:00 |
|
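A minimal sketch of caching a guest token on disk between CLI runs, together with the expiry check that the later commit 01799a7391 refers to. The JSON layout, the setTime key, and the three-hour lifetime are hypothetical assumptions; the project's actual cache file format, location, and expiry rule are not shown in this log.

```python
import json
import os
import tempfile
import time
from typing import Optional

# Hypothetical lifetime; the real expiry rule is not documented here.
TOKEN_TTL_SECONDS = 3 * 60 * 60

def save_token(path: str, token: str) -> None:
    # Store the token with the time it was saved so age can be checked later.
    with open(path, 'w') as fp:
        json.dump({'token': token, 'setTime': time.time()}, fp)

def load_token(path: str, now: Optional[float] = None) -> Optional[str]:
    """Return the cached token, or None if missing, unreadable, or expired."""
    try:
        with open(path) as fp:
            data = json.load(fp)
    except (OSError, json.JSONDecodeError):
        return None
    now = time.time() if now is None else now
    if now - data.get('setTime', 0) > TOKEN_TTL_SECONDS:
        return None  # expired: the caller should fetch a fresh guest token
    return data.get('token')

path = os.path.join(tempfile.mkdtemp(), 'token.json')
save_token(path, 'abc123')
print(load_token(path))  # abc123
print(load_token(path, now=time.time() + TOKEN_TTL_SECONDS + 1))  # None
```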
JustAnotherArchivist
|
ca00b480b1
|
Fix AssertionError on quoted comments
Fixes #340
|
2022-01-04 01:15:08 +00:00 |
|
JustAnotherArchivist
|
f189ab4241
|
Prefix all private API names with an underscore
Cf. #328
|
2022-01-03 17:51:23 +00:00 |
|
JustAnotherArchivist
|
c6e1e33a23
|
Fix crashing typos
|
2022-01-03 17:49:55 +00:00 |
|
JustAnotherArchivist
|
a37ea528d3
|
Refactor Reddit scrapers again to merge RedditPushshiftScraper and RedditScraper
Cf. #328
|
2022-01-03 17:48:35 +00:00 |
|
JustAnotherArchivist
|
eee06d8593
|
Refactor Reddit scrapers into a more reasonable code structure
Cf. #328
|
2021-12-24 04:58:32 +00:00 |
|
JustAnotherArchivist
|
4dd3ee6e47
|
Refactor Instagram scrapers to get rid of the awkward mode parameter
Cf. #328
|
2021-12-24 04:50:53 +00:00 |
|
JustAnotherArchivist
|
0336ce13ed
|
Add support for fetching a guest token from the API
|
2021-12-23 04:26:50 +00:00 |
|
JustAnotherArchivist
|
193d4f80d6
|
Fix user agent in API headers staying constant
|
2021-12-23 04:25:23 +00:00 |
|