231 Commits

Author SHA1 Message Date
JustAnotherArchivist
67e8295293 Merge pull request #280 from edsu/master
User Labels
2021-09-19 03:35:49 +00:00
JustAnotherArchivist
5fc2562642 Add user label support on entity retrieval 2021-09-19 03:32:35 +00:00
JustAnotherArchivist
2825bd0a73 Remove accidental empty line 2021-09-19 03:31:56 +00:00
Ed Summers
9831f2a4a0 missing ext
While doing some long term data collection I found some user objects
that lack the key 'ext'. This would cause an exception unless it's
checked for before trying to dig out results.
2021-09-16 13:31:47 -04:00
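The guard described in this commit message could look roughly like the sketch below. Only the key name 'ext' comes from the message; the function name `extract_label` and the nested keys `label` and `description` are hypothetical stand-ins, not the project's actual structure.

```python
from typing import Optional


def extract_label(user: dict) -> Optional[str]:
    """Return a label description for a user object, or None if it is absent.

    Hypothetical layout: user['ext']['label']['description'].
    Only the 'ext' key is taken from the commit message.
    """
    ext = user.get('ext')
    if not ext:
        # Some user objects lack 'ext' entirely; indexing user['ext']
        # directly would raise a KeyError here.
        return None
    return ext.get('label', {}).get('description')


print(extract_label({'id': 1}))
print(extract_label({'id': 2, 'ext': {'label': {'description': 'example label'}}}))
```

Using `dict.get` keeps the missing-key case on the normal code path instead of relying on exception handling.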
Ed Summers
a11eef6b06 User label url
Each label also has a URL which is used for learning more about the
label. While there are more label descriptions than label URLs, the URLs
do seem to group language variants of the same label. For example
https://help.twitter.com/rules-and-policies/state-affiliated-china is
used for all of the following label descriptions:

* Média affilié à un État, Chine
* China state-affiliated media
* 中国官方媒体
* Çin devletine bağlı medya
* China government official

In some analysis contexts it could be useful to group these together.
2021-09-16 13:04:57 -04:00
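The grouping suggested in this commit message can be sketched as follows. The record layout (dicts with `description` and `url` keys) and the function name `group_descriptions_by_url` are assumptions made for illustration; the URL and descriptions are the ones quoted in the message.

```python
from collections import defaultdict

CHINA_URL = 'https://help.twitter.com/rules-and-policies/state-affiliated-china'


def group_descriptions_by_url(labels):
    """Map each label URL to the set of descriptions that point at it."""
    groups = defaultdict(set)
    for label in labels:
        groups[label['url']].add(label['description'])
    return dict(groups)


# Hypothetical records built from the descriptions listed in the commit message.
labels = [
    {'description': 'Média affilié à un État, Chine', 'url': CHINA_URL},
    {'description': 'China state-affiliated media', 'url': CHINA_URL},
    {'description': '中国官方媒体', 'url': CHINA_URL},
    {'description': 'Çin devletine bağlı medya', 'url': CHINA_URL},
    {'description': 'China government official', 'url': CHINA_URL},
]

groups = group_descriptions_by_url(labels)
for url, descriptions in groups.items():
    print(url, len(descriptions))
```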
Ed Summers
3fb731ade1 User Labels
In August of 2020 Twitter started to label the accounts of government
officials and state-affiliated media entities:

https://blog.twitter.com/en_us/topics/product/2020/new-labels-for-government-and-state-affiliated-media-accounts

This information is extremely important for researchers who are studying
the impact of social media on political discourse, especially because it is not
currently available through either Twitter's v1.1 or v2 API endpoints.

The code in this small PR may seem a bit brittle but I've been using it
to collect data with each of the twitter subcommands and it seems to
work reliably. While there are image and page URLs associated with each
label I chose to only collect the text description of the label since it
should be sufficient for finding the additional information later if
needed.
2021-09-16 08:06:05 -04:00
JustAnotherArchivist
c76f1637ce Handle 403s from Twitter search
Closes #269
2021-08-30 23:29:20 +00:00
JustAnotherArchivist
ed117e8891 Log response status code and redirects 2021-08-29 18:26:00 +00:00
JustAnotherArchivist
f9a3fafb3f Fix --cursor on twitter-search 2021-08-01 20:59:16 +00:00
JustAnotherArchivist
660b8c7a0a Retry empty result sets from Twitter as a workaround for random early stops
#37
2021-07-18 23:59:52 +00:00
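A retry-on-empty workaround of the kind this commit describes might be sketched like this. The helper name `fetch_with_retries`, the `fetch` callable, and the retry limit are all hypothetical; the commit message does not say how many retries are made or how they are spaced.

```python
def fetch_with_retries(fetch, max_retries=3):
    """Call fetch() again while it returns an empty result.

    fetch is any zero-argument callable returning a list of results.
    max_retries is an illustrative default, not a value from the commit.
    """
    result = fetch()
    for _ in range(max_retries):
        if result:
            break
        # An empty page may be a spurious early stop, so ask again.
        result = fetch()
    return result


# Simulated endpoint: two empty responses, then a real one.
responses = iter([[], [], ['tweet']])
print(fetch_with_retries(lambda: next(responses)))
```

If every attempt comes back empty, the empty result is returned and treated as the genuine end of the results.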
JustAnotherArchivist
0c22608dc7 Extract video view count
Also fix the broken ext values sent to Twitter

Closes #246
2021-07-01 17:58:45 +00:00
JustAnotherArchivist
2bb706feda Dump request and response attributes of RequestExceptions
Cf. #243
2021-06-30 21:44:02 +00:00
JustAnotherArchivist
5e6bc4ec50 Fix type of content field (may be None on text-less posts) 2021-05-27 00:33:12 +00:00
JustAnotherArchivist
57d0aaafc1 Remove dirtyUrl which does not appear to be used anymore by Instagram
#234
2021-05-27 00:32:03 +00:00
JustAnotherArchivist
157e4d4265 Fix default value of username field
#234
2021-05-27 00:29:33 +00:00
JustAnotherArchivist
54588e9c42 Add support for fetching top instead of live/chronological tweets
Closes #109
2021-05-23 03:24:30 +00:00
JustAnotherArchivist
9e7274f3d7 Clean up params dict construction 2021-05-23 03:24:11 +00:00
JustAnotherArchivist
ac4e335bdb Clean up duplicated default values 2021-05-23 03:03:32 +00:00
JustAnotherArchivist
1d255de48d Add hashtags and cashtags 2021-05-23 02:51:38 +00:00
JustAnotherArchivist
9c1dcd37f9 Add Tweet.{inReplyToTweetId,inReplyToUser}
This makes User.displayname optional because the replied-to user is not always present in the user mentions.
2021-05-23 02:44:40 +00:00
JustAnotherArchivist
f8dac183d0 Fix type of User.id 2021-05-23 02:43:53 +00:00
JustAnotherArchivist
45d1fa27de Add twitter-tweet scraper for retrieving tweets by ID, including scroll and recursion modes
Closes #51, closes #137
2021-05-23 02:12:13 +00:00
JustAnotherArchivist
98b798b0e5 Remove obsolete twitter-thread scraper
It was still based on the old, deprecated Twitter UI and broke a long time ago.

Closes #176
2021-05-22 22:37:21 +00:00
JustAnotherArchivist
f18b64e7da Add support for scraping Twitter users by ID
Closes #222
2021-05-22 21:17:14 +00:00
JustAnotherArchivist
460be9d581 Add _type attribute on all JSON objects, remove separate attribute on Twitter media 2021-05-22 18:14:54 +00:00
JustAnotherArchivist
97c8caea48 Set Accept-Language header on API requests to English 2021-04-20 01:50:14 +00:00
JustAnotherArchivist
a34f93076a Merge pull request #218 from NoeCampos22/Place_Data
Extract more information on Twitter places
2021-04-20 01:45:22 +00:00
NoeCampos22
8f1c470061 Tweet.place to Place dataclass 2021-04-19 15:13:33 -05:00
NoeCampos22
dbf2a2f689 Get more data from the place
Data like the country, place type and the single place name are now also returned in the JSON.
2021-04-19 12:01:14 -05:00
JustAnotherArchivist
39a34a57ac Handle API endpoints that don't include geolocation data (e.g. twitter-profile scraper)
Fixes #215
2021-04-13 20:15:42 +00:00
JustAnotherArchivist
f44b39705a Fix coordinate extraction from place bounding boxes 2021-04-06 20:53:05 +00:00
JustAnotherArchivist
f64ce217b7 Merge pull request #209 from Lukpier/master
Add tweet location (place full name & geo coordinates) where available
2021-04-06 16:19:33 +00:00
Luca Pierri
cdf87f4b8f Retrieve tweet location 2021-04-06 16:08:34 +00:00
JustAnotherArchivist
47fbc2a84d Add note on features exclusive to the dev version
Cf. #195
2021-02-24 19:39:45 +00:00
JustAnotherArchivist
5cd3b7d7cc Fix crash on rare weird 503 responses from Twitter without content 2021-01-26 22:39:02 +00:00
JustAnotherArchivist
0121fa51c2 Fix crash on users with a broken URL in the profile description 2021-01-26 18:33:34 +00:00
JustAnotherArchivist
892941b609 Fix crash on reposts of hidden profiles 2020-12-13 23:22:17 +00:00
JustAnotherArchivist
e3022628b6 Fix crash on photo reposts 2020-12-13 22:46:28 +00:00
JustAnotherArchivist
fdc33d0dba Include properties in JSON representation
This fixes the lack of the profile URL on Twitter users because it's generated using the username rather than set explicitly as a field.
2020-11-05 05:55:26 +00:00
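To illustrate why a property-based value can go missing from serialized output, here is a minimal sketch. The `User` class, the `to_json` helper, and the URL pattern are hypothetical and are not the project's actual code; the point is only that `dataclasses.asdict` covers fields, so properties have to be added separately.

```python
import dataclasses
import inspect
import json


@dataclasses.dataclass
class User:
    username: str

    @property
    def url(self) -> str:
        # Derived from the username rather than stored as a field.
        return f'https://twitter.com/{self.username}'


def to_json(obj) -> str:
    """Serialize dataclass fields plus any properties defined on the class."""
    out = dataclasses.asdict(obj)  # fields only; 'url' is not included yet
    for name, _ in inspect.getmembers(type(obj), lambda m: isinstance(m, property)):
        out[name] = getattr(obj, name)
    return json.dumps(out)


print(to_json(User('example')))
```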
JustAnotherArchivist
6d6411cc24 Fix KeyError on entity for nonexistent Twitter accounts 2020-11-03 23:21:28 +00:00
JustAnotherArchivist
61a1ecffc5 Merge pull request #141 from gitshrl/twitter/split-source-url-label
Split tweet source into URL and label
2020-10-27 18:44:10 +00:00
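Splitting a tweet source into a URL and a label could be sketched as below, assuming the source arrives as an HTML anchor string. The function name `split_source` and the regular expression are illustrative assumptions, not the code from this pull request.

```python
import re

# Matches an anchor tag and captures its href and its text content.
_SOURCE_RE = re.compile(r'<a\s[^>]*href="(?P<url>[^"]*)"[^>]*>(?P<label>.*?)</a>')


def split_source(source: str):
    """Return (url, label) from an anchor string, or (None, None) if it does not match."""
    match = _SOURCE_RE.search(source)
    if match is None:
        return None, None
    return match.group('url'), match.group('label')


print(split_source('<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'))
```

Keeping the original source string alongside the two parts, as the follow-up commit does, means nothing is lost when the pattern fails to match.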
sahrul
d2dce37fa0 add the original tweet source 2020-10-27 13:21:21 +07:00
sahrul
d65f0434da split source into url and label 2020-10-26 16:46:10 +07:00
JustAnotherArchivist
7499384110 Merge pull request #131 from gitshrl/facebook/fix-group-pagination
Fix pagination error for Facebook group scraper
2020-10-21 15:08:50 +00:00
sahrul
7a0f68b7ec fix pagination for facebook group scraper 2020-10-21 21:30:00 +07:00
JustAnotherArchivist
1a219fd2b6 Merge pull request #129 from gitshrl/facebook/fix-group-scraper
Update base URL for Facebook group scraper
2020-10-21 14:03:59 +00:00
sahrul
6fb98dae12 update base url for facebook group scraper 2020-10-21 19:57:02 +07:00
JustAnotherArchivist
8c2c0fa47a Remove workaround for http://bugs.python.org/issue16308 as snscrape requires 3.8+ now anyway 2020-10-18 20:25:54 +00:00
JustAnotherArchivist
58c8365c33 Add test extra requirements 2020-10-18 20:03:29 +00:00
JustAnotherArchivist
2c11ec38fa Replace requests.models with plain requests
requests.models is all but undocumented, and the three types needed here are all in the requests namespace as well.
2020-10-18 02:35:55 +00:00