355 Commits

Author SHA1 Message Date
JustAnotherArchivist
a34f93076a Merge pull request #218 from NoeCampos22/Place_Data
Extract more information on Twitter places
2021-04-20 01:45:22 +00:00
NoeCampos22
8f1c470061 Tweet.place to Place dataclass 2021-04-19 15:13:33 -05:00
NoeCampos22
dbf2a2f689 Get more data from the place
Data like the country, place type and the single place name are now also returned on the JSON.
2021-04-19 12:01:14 -05:00
JustAnotherArchivist
39a34a57ac Handle API endpoints that don't include geolocation data (e.g. twitter-profile scraper)
Fixes #215
2021-04-13 20:15:42 +00:00
JustAnotherArchivist
f44b39705a Fix coordinate extraction from place bounding boxes 2021-04-06 20:53:05 +00:00
JustAnotherArchivist
f64ce217b7 Merge pull request #209 from Lukpier/master
Add tweet location (place full name & geo coordinates) where available
2021-04-06 16:19:33 +00:00
Luca Pierri
cdf87f4b8f Retrieve tweet location 2021-04-06 16:08:34 +00:00
JustAnotherArchivist
47fbc2a84d Add note on features exclusive to the dev version
Cf. #195
2021-02-24 19:39:45 +00:00
JustAnotherArchivist
5cd3b7d7cc Fix crash on rare weird 503 responses from Twitter without content 2021-01-26 22:39:02 +00:00
JustAnotherArchivist
0121fa51c2 Fix crash on users with a broken URL in the profile description 2021-01-26 18:33:34 +00:00
JustAnotherArchivist
892941b609 Fix crash on reposts of hidden profiles 2020-12-13 23:22:17 +00:00
JustAnotherArchivist
e3022628b6 Fix crash on photo reposts 2020-12-13 22:46:28 +00:00
JustAnotherArchivist
fdc33d0dba Include properties in JSON representation
This fixes the lack of the profile URL on Twitter users because it's generated using the username rather than set explicitly as a field.
2020-11-05 05:55:26 +00:00
JustAnotherArchivist
6d6411cc24 Fix KeyError on entity for inexistent Twitter accounts 2020-11-03 23:21:28 +00:00
JustAnotherArchivist
61a1ecffc5 Merge pull request #141 from gitshrl/twitter/split-source-url-label
Split tweet source into URL and label
2020-10-27 18:44:10 +00:00
sahrul
d2dce37fa0 add the original tweet source 2020-10-27 13:21:21 +07:00
sahrul
d65f0434da split source into url and label 2020-10-26 16:46:10 +07:00
JustAnotherArchivist
7499384110 Merge pull request #131 from gitshrl/facebook/fix-group-pagination
Fix pagination error for Facebook group scraper
2020-10-21 15:08:50 +00:00
sahrul
7a0f68b7ec fix pagination for facebook group scraper 2020-10-21 21:30:00 +07:00
JustAnotherArchivist
1a219fd2b6 Merge pull request #129 from gitshrl/facebook/fix-group-scraper
Update base URL for Facebook group scraper
2020-10-21 14:03:59 +00:00
sahrul
6fb98dae12 update base url for facebook group scraper 2020-10-21 19:57:02 +07:00
JustAnotherArchivist
8c2c0fa47a Remove workaround for http://bugs.python.org/issue16308 as snscrape requires 3.8+ now anyway 2020-10-18 20:25:54 +00:00
JustAnotherArchivist
58c8365c33 Add test extra requirements 2020-10-18 20:03:29 +00:00
JustAnotherArchivist
2c11ec38fa Replace requests.models with plain requests
requests.models is all but undocumented, and the three types needed here are all in the requests namespace as well.
2020-10-18 02:35:55 +00:00
JustAnotherArchivist
fe5e23502d collections.deque support and other minor improvements to snscrape._cli._repr 2020-10-18 02:12:09 +00:00
JustAnotherArchivist
644cd1d2fb Add support for various further complicated types to snscrape._cli._repr 2020-10-18 01:42:45 +00:00
JustAnotherArchivist
5ccfab6314 Add .gitignore 2020-10-18 01:14:04 +00:00
JustAnotherArchivist
bf895ea5b1 Minor README cleanup 2020-10-17 21:21:20 +00:00
JustAnotherArchivist
e956e2562b Replace pkg_resources with importlib.metadata 2020-10-17 21:16:45 +00:00
JustAnotherArchivist
defe874bf4 Fix date extraction on VK
Only the most recent posts have the nice timestamp property...
2020-10-17 02:22:15 +00:00
JustAnotherArchivist
3f8935ee4d Fix crash on video reposts 2020-10-17 02:20:40 +00:00
JustAnotherArchivist
cd12500dbf Fix date extraction on quoted posts 2020-10-17 02:13:27 +00:00
JustAnotherArchivist
5dc61d50ac Add support for outlinks, photos, videos, and quoted posts on VK 2020-10-17 00:07:26 +00:00
JustAnotherArchivist
11a82e110a Remove obsolete comment
Cf. f296f9d2
2020-10-16 18:37:51 +00:00
JustAnotherArchivist
16ebe8bf48 Introduce dedicated IntWithGranularity type and deprecate the direct *Granularity fields 2020-10-16 18:20:47 +00:00
JustAnotherArchivist
1bbe25647a Refactor deprecated properties 2020-10-16 18:11:52 +00:00
JustAnotherArchivist
e22b461563 Add Python 3.9 classifier 2020-10-16 01:27:17 +00:00
JustAnotherArchivist
c4a5715e18 Fix Facebook user and community scrapers
Facebook is redirecting the previous user agent to the mobile site; use current Firefox ESR instead.
2020-10-16 01:20:50 +00:00
JustAnotherArchivist
5cb64faa72 Formally deprecate the already deprecated item attributes 2020-10-16 00:55:55 +00:00
JustAnotherArchivist
0f78aa45fc Refactor --format handling to avoid conversion to dict 2020-10-16 00:55:14 +00:00
JustAnotherArchivist
179112a310 Fix --format
Broken by the switch to dataclasses in bd53e729
2020-10-16 00:27:13 +00:00
JustAnotherArchivist
4ce9ed4eb3 Add --progress option that prints a status update every 100 results and at the end
Closes #116
2020-10-16 00:00:43 +00:00
JustAnotherArchivist
11414cb68f Rename cli module to make it clear that it is considered private API 2020-10-15 23:47:07 +00:00
JustAnotherArchivist
bd53e729a0 Replace named tuples with dataclasses and move JSON conversion logic to the base classes
Named tuples were never really adequate for this since the order aspect of them doesn't make sense.
Further, named tuples don't support multiple inheritance. This meant that the objects returned by get_items() were not actually Items, for example. Since Python 3.9, such named tuples cannot be created anymore.

Fixes #111
2020-10-15 23:44:28 +00:00
JustAnotherArchivist
ffd9289edc Reduce the logging level of retryable retrieval errors from WARNING to INFO
There is no real need to report these as WARNINGs as snscrape tries and in most cases manages to recover. Without --verbose, snscrape's output can be confusing (see #76). If the retries fail as well, snscrape will still log that as an ERROR and crash loudly.
2020-10-11 22:29:27 +00:00
JustAnotherArchivist
b1a7b9607f Skip individual Telegram photo/video links 2020-10-07 01:27:26 +00:00
JustAnotherArchivist
119e53d07c Fix Telegram post URL extraction 2020-10-07 01:15:51 +00:00
JustAnotherArchivist
c3e2e12369 Deprecate outlinksss 2020-10-01 22:00:26 +00:00
JustAnotherArchivist
a70b361176 Use more assignment expressions where appropriate 2020-10-01 21:45:25 +00:00
JustAnotherArchivist
8b68f1a8af Fix link previews for pure-image previews
... and any other preview that doesn't have all the things for some reason.
2020-10-01 18:56:55 +00:00
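
Note on bd53e729a0 and fdc33d0dba: these two commits describe replacing named tuples with dataclasses, moving JSON conversion into the base classes, and including properties (such as a profile URL derived from the username) in the JSON output. The following is a minimal sketch of that idea, not snscrape's actual code. The class names `Item` and `User`, the `json()` method, and the URL pattern are all hypothetical and chosen only for illustration.

```python
import dataclasses
import json


@dataclasses.dataclass
class Item:
    """Hypothetical base class; NOT snscrape's actual implementation."""

    def json(self) -> str:
        # Start from the declared dataclass fields.
        out = {f.name: getattr(self, f.name) for f in dataclasses.fields(self)}
        # Add read-only properties defined anywhere in the class hierarchy,
        # so that derived values also show up in the output.
        for cls in type(self).__mro__:
            for name, attr in vars(cls).items():
                if isinstance(attr, property) and name not in out:
                    out[name] = getattr(self, name)
        return json.dumps(out)


@dataclasses.dataclass
class User(Item):
    username: str

    @property
    def url(self) -> str:
        # Derived from the username rather than stored as a field
        # (illustrative URL pattern, assumed for this sketch).
        return f'https://twitter.com/{self.username}'


print(User(username='example').json())
```

Because `User` inherits from `Item`, instances are genuine `Item` objects, which is the inheritance relationship the bd53e729a0 message says named tuples could not provide.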