JustAnotherArchivist
7499384110
Merge pull request #131 from gitshrl/facebook/fix-group-pagination
Fix pagination error for Facebook group scraper
2020-10-21 15:08:50 +00:00
sahrul
7a0f68b7ec
fix pagination for facebook group scraper
2020-10-21 21:30:00 +07:00
JustAnotherArchivist
1a219fd2b6
Merge pull request #129 from gitshrl/facebook/fix-group-scraper
Update base URL for Facebook group scraper
2020-10-21 14:03:59 +00:00
sahrul
6fb98dae12
update base url for facebook group scraper
2020-10-21 19:57:02 +07:00
JustAnotherArchivist
8c2c0fa47a
Remove workaround for http://bugs.python.org/issue16308 as snscrape requires 3.8+ now anyway
2020-10-18 20:25:54 +00:00
JustAnotherArchivist
58c8365c33
Add test extra requirements
2020-10-18 20:03:29 +00:00
JustAnotherArchivist
2c11ec38fa
Replace requests.models with plain requests
requests.models is all but undocumented, and the three types needed here are all in the requests namespace as well.
2020-10-18 02:35:55 +00:00
JustAnotherArchivist
fe5e23502d
collections.deque support and other minor improvements to snscrape._cli._repr
2020-10-18 02:12:09 +00:00
JustAnotherArchivist
644cd1d2fb
Add support for various further complicated types to snscrape._cli._repr
2020-10-18 01:42:45 +00:00
JustAnotherArchivist
5ccfab6314
Add .gitignore
2020-10-18 01:14:04 +00:00
JustAnotherArchivist
bf895ea5b1
Minor README cleanup
2020-10-17 21:21:20 +00:00
JustAnotherArchivist
e956e2562b
Replace pkg_resources with importlib.metadata
2020-10-17 21:16:45 +00:00
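A minimal sketch of the kind of lookup this replacement allows, assuming the goal is to read the installed version of a distribution at runtime. The helper name `get_version` and the fallback value are hypothetical and not taken from snscrape; `importlib.metadata` is in the standard library from Python 3.8 onwards.

```python
import importlib.metadata


def get_version(distribution, default='unknown'):
    """Return the installed version of a distribution (hypothetical helper)."""
    try:
        return importlib.metadata.version(distribution)
    except importlib.metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return default


print(get_version('snscrape'))
```

Unlike pkg_resources, this does not require setuptools to be importable at runtime.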
JustAnotherArchivist
defe874bf4
Fix date extraction on VK
Only the most recent posts have the nice timestamp property...
2020-10-17 02:22:15 +00:00
JustAnotherArchivist
3f8935ee4d
Fix crash on video reposts
2020-10-17 02:20:40 +00:00
JustAnotherArchivist
cd12500dbf
Fix date extraction on quoted posts
2020-10-17 02:13:27 +00:00
JustAnotherArchivist
5dc61d50ac
Add support for outlinks, photos, videos, and quoted posts on VK
2020-10-17 00:07:26 +00:00
JustAnotherArchivist
11a82e110a
Remove obsolete comment
Cf. f296f9d2
2020-10-16 18:37:51 +00:00
JustAnotherArchivist
16ebe8bf48
Introduce dedicated IntWithGranularity type and deprecate the direct *Granularity fields
2020-10-16 18:20:47 +00:00
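One way such a type could look, as a sketch only: an `int` subclass that carries the precision of the value alongside it. The constructor signature and the `granularity` attribute name are assumptions made for illustration and may differ from the actual snscrape code.

```python
class IntWithGranularity(int):
    """An int that also records how precise the value is (illustrative sketch)."""

    def __new__(cls, value, granularity):
        obj = super().__new__(cls, value)
        # Assumed attribute name: the step size the source rounded the value to.
        obj.granularity = granularity
        return obj


# A count displayed as "12K" is only known to the nearest thousand.
likes = IntWithGranularity(12_000, 1_000)
print(likes, likes.granularity)
```

Because the object is still an `int`, code that only needs the number keeps working; arithmetic on it returns a plain `int` without the granularity.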
JustAnotherArchivist
1bbe25647a
Refactor deprecated properties
2020-10-16 18:11:52 +00:00
JustAnotherArchivist
e22b461563
Add Python 3.9 classifier
2020-10-16 01:27:17 +00:00
JustAnotherArchivist
c4a5715e18
Fix Facebook user and community scrapers
Facebook is redirecting the previous user agent to the mobile site; use current Firefox ESR instead.
2020-10-16 01:20:50 +00:00
JustAnotherArchivist
5cb64faa72
Formally deprecate the already deprecated item attributes
2020-10-16 00:55:55 +00:00
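A minimal sketch of what formal deprecation of an attribute can look like in Python, assuming the old attribute stays readable but emits a `DeprecationWarning`. The class and attribute names (`Post`, `links`, `links_joined`) are hypothetical and not snscrape's.

```python
import warnings


class Post:
    def __init__(self, links):
        self.links = links

    @property
    def links_joined(self):
        # Hypothetical old attribute: still works, but warns on every access.
        warnings.warn(
            'links_joined is deprecated, use links instead',
            DeprecationWarning,
            stacklevel=2,
        )
        return ' '.join(self.links)


post = Post(['https://example.org/a', 'https://example.org/b'])
print(post.links)
```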
JustAnotherArchivist
0f78aa45fc
Refactor --format handling to avoid conversion to dict
2020-10-16 00:55:14 +00:00
JustAnotherArchivist
179112a310
Fix --format
Broken by the switch to dataclasses in bd53e729
2020-10-16 00:27:13 +00:00
JustAnotherArchivist
4ce9ed4eb3
Add --progress option that prints a status update every 100 results and at the end
Closes #116
2020-10-16 00:00:43 +00:00
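The behaviour described (a status update every 100 results and one at the end) can be sketched as a generator wrapper. The function name, the message wording, and the choice of stderr are assumptions for illustration, not the actual implementation.

```python
import sys


def with_progress(items, every=100, stream=sys.stderr):
    """Yield items unchanged, reporting progress to a stream (hypothetical helper)."""
    count = 0
    for count, item in enumerate(items, start=1):
        yield item
        if count % every == 0:
            print(f'Scraping, {count} results so far', file=stream)
    # Final status line once the source is exhausted.
    print(f'Finished, {count} results', file=stream)


for result in with_progress(range(250)):
    pass
```

Writing the status to a separate stream keeps it out of the results when the output is piped into a file.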
JustAnotherArchivist
11414cb68f
Rename cli module to make it clear that it is considered private API
2020-10-15 23:47:07 +00:00
JustAnotherArchivist
bd53e729a0
Replace named tuples with dataclasses and move JSON conversion logic to the base classes
Named tuples were never really adequate for this since the order aspect of them doesn't make sense.
Further, named tuples don't support multiple inheritance. This meant that the objects returned by get_items() were not actually Items, for example. Since Python 3.9, such named tuples cannot be created anymore.
Fixes #111
2020-10-15 23:44:28 +00:00
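A minimal sketch of the pattern this commit describes, assuming a base class that owns the JSON conversion and dataclass subclasses that only declare fields. The names `Item`, `Post`, and the `json` method are illustrative here and need not match the real signatures.

```python
import dataclasses
import json


class Item:
    """Illustrative base class holding the shared JSON conversion."""

    def json(self):
        return json.dumps(dataclasses.asdict(self))


@dataclasses.dataclass
class Post(Item):
    url: str
    content: str


post = Post(url='https://example.org/1', content='hello')
print(isinstance(post, Item))
print(post.json())
```

Since the dataclass inherits from the base class in the ordinary way, the returned objects really are instances of it, and fields are accessed by name rather than by position.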
JustAnotherArchivist
ffd9289edc
Reduce the logging level of retryable retrieval errors from WARNING to INFO
There is no real need to report these as WARNINGs as snscrape tries and in most cases manages to recover. Without --verbose, snscrape's output can be confusing (see #76). If the retries fail as well, snscrape will still log that as an ERROR and crash loudly.
2020-10-11 22:29:27 +00:00
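The logging policy described above can be sketched roughly as follows, assuming a simple retry loop: recoverable failures are logged at INFO, and only the final failure is logged at ERROR before the exception propagates. The function name, the retry count, and the use of `OSError` are assumptions for illustration.

```python
import logging

logger = logging.getLogger('retry_demo')


def fetch_with_retries(fetch, retries=3):
    """Call fetch() up to `retries` times (hypothetical helper)."""
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except OSError as e:
            if attempt < retries:
                # Recoverable: another attempt follows, so INFO is enough.
                logger.info('Attempt %d failed (%s), retrying', attempt, e)
            else:
                # Out of retries: report loudly and let the error propagate.
                logger.error('Attempt %d failed (%s), giving up', attempt, e)
                raise
```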
JustAnotherArchivist
b1a7b9607f
Skip individual Telegram photo/video links
2020-10-07 01:27:26 +00:00
JustAnotherArchivist
119e53d07c
Fix Telegram post URL extraction
2020-10-07 01:15:51 +00:00
JustAnotherArchivist
c3e2e12369
Deprecate outlinksss
2020-10-01 22:00:26 +00:00
JustAnotherArchivist
a70b361176
Use more assignment expressions where appropriate
2020-10-01 21:45:25 +00:00
JustAnotherArchivist
8b68f1a8af
Fix link previews for pure-image previews
... and any other preview that doesn't have all the things for some reason.
2020-10-01 18:56:55 +00:00
JustAnotherArchivist
c72bf3174f
Use assignment expressions for cleaner code
2020-10-01 18:54:57 +00:00
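For readers unfamiliar with assignment expressions (available since Python 3.8), a small generic example of the kind of cleanup they allow. The function and the URL pattern are hypothetical and not taken from the scraper code.

```python
import re


def extract_post_id(url):
    """Return the trailing numeric ID of a URL, or None (hypothetical helper)."""
    # The match is assigned and tested in one step instead of two statements.
    if (match := re.search(r'/(\d+)$', url)):
        return int(match.group(1))
    return None


print(extract_post_id('https://example.org/channel/42'))
```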
JustAnotherArchivist
472cef2382
Add support for link previews
2020-10-01 18:51:14 +00:00
JustAnotherArchivist
b1d8475a03
Fix link extraction on Telegram
2020-10-01 18:29:08 +00:00
JustAnotherArchivist
3d3faf80bf
Add python_requires to make it even clearer that 3.8+ is required
2020-09-26 16:32:00 +00:00
JustAnotherArchivist
bbb372284b
Bump Python version in README
2020-09-26 15:56:55 +00:00
JustAnotherArchivist
8cf81e9bfc
Fix twitter-profile scraper
The Twitter API returns different data structures there, leading to a variety of errors.
2020-09-25 02:45:07 +00:00
JustAnotherArchivist
d90f06b389
Extract more information on users from Twitter
Closes #78
2020-09-24 18:39:32 +00:00
JustAnotherArchivist
c519832755
Clarify twitter-list-posts argument value
2020-09-24 18:37:37 +00:00
JustAnotherArchivist
397a0b988e
Remove Twitter list member scraper
It has been broken for a while. Member lists were removed from the old design, and they're behind a login wall on the new design.
2020-09-24 18:34:15 +00:00
JustAnotherArchivist
f1428fa0e0
Fix crash on nested quoted tweets
2020-09-24 02:45:49 +00:00
JustAnotherArchivist
7d2c546ee5
Deprecate hacky fields in Tweet objects
2020-09-24 02:00:45 +00:00
JustAnotherArchivist
2332c30e26
Replace locale-dependent strptime date parsing with email.utils.parsedate_to_datetime
2020-09-24 02:00:21 +00:00
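A short illustration of why this replacement helps: `strptime` directives such as `%a` and `%b` match day and month names in the current locale, whereas `email.utils.parsedate_to_datetime` always expects the English names used in email-style dates. The date string below is an illustrative RFC 2822 value, not necessarily the exact format the scraper receives.

```python
import email.utils

# Illustrative input; English day/month names are parsed regardless of locale.
raw = 'Thu, 24 Sep 2020 02:00:21 +0000'
parsed = email.utils.parsedate_to_datetime(raw)
print(parsed.isoformat())  # 2020-09-24T02:00:21+00:00
```

The result is a timezone-aware `datetime`, so no separate handling of the UTC offset is needed.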
JustAnotherArchivist
b78bf3e642
Fix crash on banner-less profiles and nested descriptionUrls
2020-09-24 01:58:38 +00:00
JustAnotherArchivist
1a09f9b9a3
Extract more information from Twitter
Including: reply/retweet/like/quote counts, media (photos, videos, and GIFs), full user object, quoted tweets, mentioned users, rendered content, conversation ID, language, source
2020-09-24 01:45:08 +00:00
JustAnotherArchivist
5ae5ec7bcd
Bump Python version classifier
Python 3.8 is required since commit 1a2e367a.
2020-09-23 22:25:38 +00:00
JustAnotherArchivist
c0ff6631aa
Update README
2020-09-22 22:30:08 +00:00
JustAnotherArchivist
ae60a4d0fd
Add Weibo scraper
Closes #52
2020-09-13 02:27:35 +00:00