Commit Graph

223 Commits

Author SHA1 Message Date
JustAnotherArchivist
f9a3fafb3f Fix --cursor on twitter-search 2021-08-01 20:59:16 +00:00
JustAnotherArchivist
660b8c7a0a Retry empty result sets from Twitter as a workaround for random early stops
#37
2021-07-18 23:59:52 +00:00
JustAnotherArchivist
0c22608dc7 Extract video view count
Also fix the broken ext values sent to Twitter

Closes #246
2021-07-01 17:58:45 +00:00
JustAnotherArchivist
2bb706feda Dump request and response attributes of RequestExceptions
Cf. #243
2021-06-30 21:44:02 +00:00
JustAnotherArchivist
5e6bc4ec50 Fix type of content field (may be None on text-less posts) 2021-05-27 00:33:12 +00:00
JustAnotherArchivist
57d0aaafc1 Remove dirtyUrl which does not appear to be used anymore by Instagram
#234
2021-05-27 00:32:03 +00:00
JustAnotherArchivist
157e4d4265 Fix default value of username field
#234
2021-05-27 00:29:33 +00:00
JustAnotherArchivist
54588e9c42 Add support for fetching top instead of live/chronological tweets
Closes #109
2021-05-23 03:24:30 +00:00
JustAnotherArchivist
9e7274f3d7 Clean up params dict construction 2021-05-23 03:24:11 +00:00
JustAnotherArchivist
ac4e335bdb Clean up duplicated default values 2021-05-23 03:03:32 +00:00
JustAnotherArchivist
1d255de48d Add hashtags and cashtags 2021-05-23 02:51:38 +00:00
JustAnotherArchivist
9c1dcd37f9 Add Tweet.{inReplyToTweetId,inReplyToUser}
This makes User.displayname optional because the replied-to user is not always present in the user mentions.
2021-05-23 02:44:40 +00:00
JustAnotherArchivist
f8dac183d0 Fix type of User.id 2021-05-23 02:43:53 +00:00
JustAnotherArchivist
45d1fa27de Add twitter-tweet scraper for retrieving tweets by ID, including scroll and recursion modes
Closes #51, closes #137
2021-05-23 02:12:13 +00:00
JustAnotherArchivist
98b798b0e5 Remove obsolete twitter-thread scraper
It was still based on the old, deprecated Twitter UI and broke a long time ago.

Closes #176
2021-05-22 22:37:21 +00:00
JustAnotherArchivist
f18b64e7da Add support for scraping Twitter users by ID
Closes #222
2021-05-22 21:17:14 +00:00
JustAnotherArchivist
460be9d581 Add _type attribute on all JSON objects, remove separate attribute on Twitter media 2021-05-22 18:14:54 +00:00
JustAnotherArchivist
97c8caea48 Set Accept-Language header on API requests to English 2021-04-20 01:50:14 +00:00
JustAnotherArchivist
a34f93076a Merge pull request #218 from NoeCampos22/Place_Data
Extract more information on Twitter places
2021-04-20 01:45:22 +00:00
NoeCampos22
8f1c470061 Tweet.place to Place dataclass 2021-04-19 15:13:33 -05:00
NoeCampos22
dbf2a2f689 Get more data from the place
Data like the country, place type and the single place name are now also returned on the JSON.
2021-04-19 12:01:14 -05:00
JustAnotherArchivist
39a34a57ac Handle API endpoints that don't include geolocation data (e.g. twitter-profile scraper)
Fixes #215
2021-04-13 20:15:42 +00:00
JustAnotherArchivist
f44b39705a Fix coordinate extraction from place bounding boxes 2021-04-06 20:53:05 +00:00
JustAnotherArchivist
f64ce217b7 Merge pull request #209 from Lukpier/master
Add tweet location (place full name & geo coordinates) where available
2021-04-06 16:19:33 +00:00
Luca Pierri
cdf87f4b8f Retrieve tweet location 2021-04-06 16:08:34 +00:00
JustAnotherArchivist
47fbc2a84d Add note on features exclusive to the dev version
Cf. #195
2021-02-24 19:39:45 +00:00
JustAnotherArchivist
5cd3b7d7cc Fix crash on rare weird 503 responses from Twitter without content 2021-01-26 22:39:02 +00:00
JustAnotherArchivist
0121fa51c2 Fix crash on users with a broken URL in the profile description 2021-01-26 18:33:34 +00:00
JustAnotherArchivist
892941b609 Fix crash on reposts of hidden profiles 2020-12-13 23:22:17 +00:00
JustAnotherArchivist
e3022628b6 Fix crash on photo reposts 2020-12-13 22:46:28 +00:00
JustAnotherArchivist
fdc33d0dba Include properties in JSON representation
This fixes the lack of the profile URL on Twitter users because it's generated using the username rather than set explicitly as a field.
2020-11-05 05:55:26 +00:00
JustAnotherArchivist
6d6411cc24 Fix KeyError on entity for inexistent Twitter accounts 2020-11-03 23:21:28 +00:00
JustAnotherArchivist
61a1ecffc5 Merge pull request #141 from gitshrl/twitter/split-source-url-label
Split tweet source into URL and label
2020-10-27 18:44:10 +00:00
sahrul
d2dce37fa0 add the original tweet source 2020-10-27 13:21:21 +07:00
sahrul
d65f0434da split source into url and label 2020-10-26 16:46:10 +07:00
JustAnotherArchivist
7499384110 Merge pull request #131 from gitshrl/facebook/fix-group-pagination
Fix pagination error for Facebook group scraper
2020-10-21 15:08:50 +00:00
sahrul
7a0f68b7ec fix pagination for facebook group scraper 2020-10-21 21:30:00 +07:00
JustAnotherArchivist
1a219fd2b6 Merge pull request #129 from gitshrl/facebook/fix-group-scraper
Update base URL for Facebook group scraper
2020-10-21 14:03:59 +00:00
sahrul
6fb98dae12 update base url for facebook group scraper 2020-10-21 19:57:02 +07:00
JustAnotherArchivist
8c2c0fa47a Remove workaround for http://bugs.python.org/issue16308 as snscrape requires 3.8+ now anyway 2020-10-18 20:25:54 +00:00
JustAnotherArchivist
58c8365c33 Add test extra requirements 2020-10-18 20:03:29 +00:00
JustAnotherArchivist
2c11ec38fa Replace requests.models with plain requests
requests.models is all but undocumented, and the three types needed here are all in the requests namespace as well.
2020-10-18 02:35:55 +00:00
JustAnotherArchivist
fe5e23502d collections.deque support and other minor improvements to snscrape._cli._repr 2020-10-18 02:12:09 +00:00
JustAnotherArchivist
644cd1d2fb Add support for various further complicated types to snscrape._cli._repr 2020-10-18 01:42:45 +00:00
JustAnotherArchivist
5ccfab6314 Add .gitignore 2020-10-18 01:14:04 +00:00
JustAnotherArchivist
bf895ea5b1 Minor README cleanup 2020-10-17 21:21:20 +00:00
JustAnotherArchivist
e956e2562b Replace pkg_resources with importlib.metadata 2020-10-17 21:16:45 +00:00
JustAnotherArchivist
defe874bf4 Fix date extraction on VK
Only the most recent posts have the nice timestamp property...
2020-10-17 02:22:15 +00:00
JustAnotherArchivist
3f8935ee4d Fix crash on video reposts 2020-10-17 02:20:40 +00:00
JustAnotherArchivist
cd12500dbf Fix date extraction on quoted posts 2020-10-17 02:13:27 +00:00