Commit Graph

213 Commits

Author SHA1 Message Date
JustAnotherArchivist
1d255de48d Add hashtags and cashtags 2021-05-23 02:51:38 +00:00
JustAnotherArchivist
9c1dcd37f9 Add Tweet.{inReplyToTweetId,inReplyToUser}
This makes User.displayname optional because the replied-to user is not always present in the user mentions.
2021-05-23 02:44:40 +00:00
JustAnotherArchivist
f8dac183d0 Fix type of User.id 2021-05-23 02:43:53 +00:00
JustAnotherArchivist
45d1fa27de Add twitter-tweet scraper for retrieving tweets by ID, including scroll and recursion modes
Closes #51, closes #137
2021-05-23 02:12:13 +00:00
JustAnotherArchivist
98b798b0e5 Remove obsolete twitter-thread scraper
It was still based on the old, deprecated Twitter UI and broke a long time ago.

Closes #176
2021-05-22 22:37:21 +00:00
JustAnotherArchivist
f18b64e7da Add support for scraping Twitter users by ID
Closes #222
2021-05-22 21:17:14 +00:00
JustAnotherArchivist
460be9d581 Add _type attribute on all JSON objects, remove separate attribute on Twitter media 2021-05-22 18:14:54 +00:00
JustAnotherArchivist
97c8caea48 Set Accept-Language header on API requests to English 2021-04-20 01:50:14 +00:00
JustAnotherArchivist
a34f93076a Merge pull request #218 from NoeCampos22/Place_Data
Extract more information on Twitter places
2021-04-20 01:45:22 +00:00
NoeCampos22
8f1c470061 Tweet.place to Place dataclass 2021-04-19 15:13:33 -05:00
NoeCampos22
dbf2a2f689 Get more data from the place
Data such as the country, place type, and the single place name are now also returned in the JSON.
2021-04-19 12:01:14 -05:00
JustAnotherArchivist
39a34a57ac Handle API endpoints that don't include geolocation data (e.g. twitter-profile scraper)
Fixes #215
2021-04-13 20:15:42 +00:00
JustAnotherArchivist
f44b39705a Fix coordinate extraction from place bounding boxes 2021-04-06 20:53:05 +00:00
JustAnotherArchivist
f64ce217b7 Merge pull request #209 from Lukpier/master
Add tweet location (place full name & geo coordinates) where available
2021-04-06 16:19:33 +00:00
Luca Pierri
cdf87f4b8f Retrieve tweet location 2021-04-06 16:08:34 +00:00
JustAnotherArchivist
47fbc2a84d Add note on features exclusive to the dev version
Cf. #195
2021-02-24 19:39:45 +00:00
JustAnotherArchivist
5cd3b7d7cc Fix crash on rare weird 503 responses from Twitter without content 2021-01-26 22:39:02 +00:00
JustAnotherArchivist
0121fa51c2 Fix crash on users with a broken URL in the profile description 2021-01-26 18:33:34 +00:00
JustAnotherArchivist
892941b609 Fix crash on reposts of hidden profiles 2020-12-13 23:22:17 +00:00
JustAnotherArchivist
e3022628b6 Fix crash on photo reposts 2020-12-13 22:46:28 +00:00
JustAnotherArchivist
fdc33d0dba Include properties in JSON representation
This fixes the lack of the profile URL on Twitter users because it's generated using the username rather than set explicitly as a field.
2020-11-05 05:55:26 +00:00
JustAnotherArchivist
6d6411cc24 Fix KeyError on entity for nonexistent Twitter accounts 2020-11-03 23:21:28 +00:00
JustAnotherArchivist
61a1ecffc5 Merge pull request #141 from gitshrl/twitter/split-source-url-label
Split tweet source into URL and label
2020-10-27 18:44:10 +00:00
sahrul
d2dce37fa0 add the original tweet source 2020-10-27 13:21:21 +07:00
sahrul
d65f0434da split source into url and label 2020-10-26 16:46:10 +07:00
JustAnotherArchivist
7499384110 Merge pull request #131 from gitshrl/facebook/fix-group-pagination
Fix pagination error for Facebook group scraper
2020-10-21 15:08:50 +00:00
sahrul
7a0f68b7ec fix pagination for facebook group scraper 2020-10-21 21:30:00 +07:00
JustAnotherArchivist
1a219fd2b6 Merge pull request #129 from gitshrl/facebook/fix-group-scraper
Update base URL for Facebook group scraper
2020-10-21 14:03:59 +00:00
sahrul
6fb98dae12 update base url for facebook group scraper 2020-10-21 19:57:02 +07:00
JustAnotherArchivist
8c2c0fa47a Remove workaround for http://bugs.python.org/issue16308 as snscrape requires 3.8+ now anyway 2020-10-18 20:25:54 +00:00
JustAnotherArchivist
58c8365c33 Add test extra requirements 2020-10-18 20:03:29 +00:00
JustAnotherArchivist
2c11ec38fa Replace requests.models with plain requests
requests.models is all but undocumented, and the three types needed here are all in the requests namespace as well.
2020-10-18 02:35:55 +00:00
JustAnotherArchivist
fe5e23502d collections.deque support and other minor improvements to snscrape._cli._repr 2020-10-18 02:12:09 +00:00
JustAnotherArchivist
644cd1d2fb Add support for various further complicated types to snscrape._cli._repr 2020-10-18 01:42:45 +00:00
JustAnotherArchivist
5ccfab6314 Add .gitignore 2020-10-18 01:14:04 +00:00
JustAnotherArchivist
bf895ea5b1 Minor README cleanup 2020-10-17 21:21:20 +00:00
JustAnotherArchivist
e956e2562b Replace pkg_resources with importlib.metadata 2020-10-17 21:16:45 +00:00
JustAnotherArchivist
defe874bf4 Fix date extraction on VK
Only the most recent posts have the nice timestamp property...
2020-10-17 02:22:15 +00:00
JustAnotherArchivist
3f8935ee4d Fix crash on video reposts 2020-10-17 02:20:40 +00:00
JustAnotherArchivist
cd12500dbf Fix date extraction on quoted posts 2020-10-17 02:13:27 +00:00
JustAnotherArchivist
5dc61d50ac Add support for outlinks, photos, videos, and quoted posts on VK 2020-10-17 00:07:26 +00:00
JustAnotherArchivist
11a82e110a Remove obsolete comment
Cf. f296f9d2
2020-10-16 18:37:51 +00:00
JustAnotherArchivist
16ebe8bf48 Introduce dedicated IntWithGranularity type and deprecate the direct *Granularity fields 2020-10-16 18:20:47 +00:00
JustAnotherArchivist
1bbe25647a Refactor deprecated properties 2020-10-16 18:11:52 +00:00
JustAnotherArchivist
e22b461563 Add Python 3.9 classifier 2020-10-16 01:27:17 +00:00
JustAnotherArchivist
c4a5715e18 Fix Facebook user and community scrapers
Facebook is redirecting the previous user agent to the mobile site; use current Firefox ESR instead.
2020-10-16 01:20:50 +00:00
JustAnotherArchivist
5cb64faa72 Formally deprecate the already deprecated item attributes 2020-10-16 00:55:55 +00:00
JustAnotherArchivist
0f78aa45fc Refactor --format handling to avoid conversion to dict 2020-10-16 00:55:14 +00:00
JustAnotherArchivist
179112a310 Fix --format
Broken by the switch to dataclasses in bd53e729
2020-10-16 00:27:13 +00:00
JustAnotherArchivist
4ce9ed4eb3 Add --progress option that prints a status update every 100 results and at the end
Closes #116
2020-10-16 00:00:43 +00:00
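
The --progress behaviour described in 4ce9ed4eb3 (a status update every 100 results and one at the end) could be sketched roughly as follows. This is a minimal illustration only, not the code from that commit: the function name with_progress, its parameters, and the message wording are all assumptions made for the example.

```python
import sys
from typing import Iterable, Iterator, TextIO, TypeVar

T = TypeVar('T')


def with_progress(items: Iterable[T], every: int = 100, out: TextIO = sys.stderr) -> Iterator[T]:
    """Yield items unchanged, reporting every `every` results and once at the end.

    Hypothetical helper; the name and message text are not taken from snscrape.
    """
    count = 0
    for count, item in enumerate(items, start=1):
        yield item
        # Periodic status update after each full batch of `every` results
        if count % every == 0:
            print(f'Scraping, {count} results so far', file=out)
    # Final status update, printed even if there were no results
    print(f'Finished, {count} results', file=out)
```

Wrapping the result iterator this way leaves the items themselves untouched; in this sketch the status lines go to stderr by default so they do not mix with results written to stdout.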