Commit Graph

32 Commits

Author SHA1 Message Date
Tristan Lee
cb429909d0 added User dataclass as argument to VKontaktePost dataclass 2022-07-05 10:21:59 -05:00
Tristan Lee
2ce014ade4 fixed edge case for videos that have data-link-attr but no href attribute 2022-04-03 01:45:25 -05:00
AccentuSoft
b1cfd51121 Implementing changes 2022-02-17 21:52:15 +02:00
AccentuSoft
ace2c16f54 Fix Vkontakte-user module crash on users with millions of followers 2022-02-17 15:42:46 +02:00
JustAnotherArchivist
deb2659dd6 Prefix CLI-related methods with an underscore
Closes #355
2022-01-12 21:07:10 +00:00
JustAnotherArchivist
267b7d0e32 Rename CLI classmethods 2022-01-05 02:27:09 +00:00
JustAnotherArchivist
ca00b480b1 Fix AssertionError on quoted comments
Fixes #340
2022-01-04 01:15:08 +00:00
JustAnotherArchivist
f189ab4241 Prefix all private API names with an underscore
Cf. #328
2022-01-03 17:51:23 +00:00
JustAnotherArchivist
e7d35ec1eb Fix date parsing on quoted posts 2021-12-15 16:55:14 +00:00
JustAnotherArchivist
a6b6f3faaa Throw an error on empty arguments
Fixes #290
2021-10-10 17:43:27 +00:00
JustAnotherArchivist
5e829e2541 Refactor class instantiation to remove the need to repeat 'retries' everywhere 2021-09-30 09:58:10 +00:00
JustAnotherArchivist
892941b609 Fix crash on reposts of hidden profiles 2020-12-13 23:22:17 +00:00
JustAnotherArchivist
e3022628b6 Fix crash on photo reposts 2020-12-13 22:46:28 +00:00
JustAnotherArchivist
defe874bf4 Fix date extraction on VK
Only the most recent posts have the nice timestamp property...
2020-10-17 02:22:15 +00:00
JustAnotherArchivist
3f8935ee4d Fix crash on video reposts 2020-10-17 02:20:40 +00:00
JustAnotherArchivist
cd12500dbf Fix date extraction on quoted posts 2020-10-17 02:13:27 +00:00
JustAnotherArchivist
5dc61d50ac Add support for outlinks, photos, videos, and quoted posts on VK 2020-10-17 00:07:26 +00:00
JustAnotherArchivist
11a82e110a Remove obsolete comment
Cf. f296f9d2
2020-10-16 18:37:51 +00:00
JustAnotherArchivist
16ebe8bf48 Introduce dedicated IntWithGranularity type and deprecate the direct *Granularity fields 2020-10-16 18:20:47 +00:00
JustAnotherArchivist
bd53e729a0 Replace named tuples with dataclasses and move JSON conversion logic to the base classes
Named tuples were never really adequate for this since the order aspect of them doesn't make sense.
Further, named tuples don't support multiple inheritance. This meant that the objects returned by get_items() were not actually Items, for example. Since Python 3.9, such named tuples cannot be created anymore.

Fixes #111
2020-10-15 23:44:28 +00:00
JustAnotherArchivist
a70b361176 Use more assignment expressions where appropriate 2020-10-01 21:45:25 +00:00
JustAnotherArchivist
f296f9d21d Refactor post extraction of VK again to work around their weird behaviours
VK doesn't always return posts in chronological order, so that can't be used to filter out duplicates. Instead, remember the last 1k post IDs and filter using that. This should catch the vast majority of duplicates. (Also, duplicates don't happen only in the geoblocking workaround; sometimes, VK also simply returns the same post again for no obvious reason.)
2020-09-12 02:00:50 +00:00
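
The duplicate filter described in the commit message above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the commit: the function name `dedupe_recent`, the `window` parameter, and the use of a deque plus a set are all assumptions made for the example.

```python
from collections import deque
from typing import Hashable, Iterable, Iterator


def dedupe_recent(post_ids: Iterable[Hashable], window: int = 1000) -> Iterator[Hashable]:
    """Yield IDs, skipping any that match one of the last `window` yielded IDs.

    Hypothetical helper for illustration; not part of the actual scraper.
    """
    recent_order = deque()  # yielded IDs, oldest first, used for eviction
    recent_set = set()      # the same IDs, for constant-time membership tests
    for post_id in post_ids:
        if post_id in recent_set:
            continue  # seen recently: treat as a duplicate
        yield post_id
        recent_order.append(post_id)
        recent_set.add(post_id)
        if len(recent_order) > window:
            # Forget the oldest ID so memory use stays bounded.
            recent_set.discard(recent_order.popleft())


if __name__ == "__main__":
    print(list(dedupe_recent([1, 2, 1, 3, 2, 4])))
```

Because only a bounded window of IDs is remembered, a duplicate that reappears after more than `window` other posts would slip through, which matches the commit's wording that this catches the vast majority of duplicates rather than all of them.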
JustAnotherArchivist
8265ffc19e Work around geoblocked posts on VK
To get around the block, try to iterate over post offsets individually instead of in steps of 10. This means we should get every post that isn't blocked as long as there are at least 10 posts between two blocked ones.

Fixes #68
2020-09-12 02:00:26 +00:00
JustAnotherArchivist
f8efe98608 Fix post order on VK: reinsert pinned post at the correct location in the stream 2020-09-12 00:03:29 +00:00
JustAnotherArchivist
07d446fd19 Fix crash in VK scraper 2020-09-10 21:05:03 +00:00
JustAnotherArchivist
1a2e367a87 Cache entities 2020-09-01 02:34:21 +00:00
JustAnotherArchivist
9df4352089 Fix crash on VK pages without an info div 2020-08-24 17:42:33 +00:00
JustAnotherArchivist
dd25fd0526 Add support for extracting the entity behind a scrape
Closes #11

Backwards incompatibility: snscrape.modules.twitter.Account is now called User. However, this was previously only used on the list member scraper, which has been broken for a while since the list member list is no longer publicly accessible.

For compatibility reasons, the CLI does not output the entity by default; the new option --with-entity enables it.
2020-08-24 01:38:27 +00:00
JustAnotherArchivist
1459245258 Consistently raise ScraperException on fatal errors 2020-05-30 00:53:49 +00:00
Jody Leonard
b6772d3778 vkontakte-user: Handle additional un-scrapeable profile case 2019-10-31 16:01:29 -04:00
Jody Leonard
20ea117a2c Fix vkontakte-user pagination 2019-10-30 22:29:49 -04:00
JustAnotherArchivist
78c295f7e0 Add support for VKontakte (fixes #13) 2019-04-18 18:39:21 +02:00