JustAnotherArchivist
e382891642
Fix Twitter trends not having a str representation
2021-09-21 21:40:50 +00:00
JustAnotherArchivist
e5f4389464
Add Twitter trend scraper
...
Due to restrictions on Twitter's side, it is not possible to get trends from a custom location as that would require using an account and/or their API.
Closes #206
2021-09-21 21:28:41 +00:00
JustAnotherArchivist
d91f971f51
Refactor user label implementation and add support for bot accounts
...
Closes #281
2021-09-21 19:39:40 +00:00
JustAnotherArchivist
67e8295293
Merge pull request #280 from edsu/master
...
User Labels
2021-09-19 03:35:49 +00:00
JustAnotherArchivist
5fc2562642
Add user label support on entity retrieval
2021-09-19 03:32:35 +00:00
JustAnotherArchivist
2825bd0a73
Remove accidental empty line
2021-09-19 03:31:56 +00:00
Ed Summers
9831f2a4a0
missing ext
...
While doing some long-term data collection I found some user objects
that lack the key 'ext'. This causes an exception unless the key is
checked for before trying to dig out results.
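A minimal sketch of the kind of guard described above, not the actual patch. The helper name `dig`, the sample user dicts, and the nested key names under 'ext' are assumptions made for illustration; only the key 'ext' itself comes from this commit message.

```python
from typing import Any, Optional


def dig(obj: Any, *keys: str) -> Optional[Any]:
    """Walk nested dicts key by key; return None as soon as a key is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


# Hypothetical user objects: one with 'ext', one without it.
user_with_ext = {'id': 1, 'ext': {'highlightedLabel': {'description': 'China state-affiliated media'}}}
user_without_ext = {'id': 2}

label_present = dig(user_with_ext, 'ext', 'highlightedLabel', 'description')
label_missing = dig(user_without_ext, 'ext', 'highlightedLabel', 'description')
```

With a helper like this, a user object lacking 'ext' yields None instead of raising a KeyError.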
2021-09-16 13:31:47 -04:00
Ed Summers
a11eef6b06
User label url
...
Each label also has a URL which is used for learning more about the
label. While there are more label descriptions than label URLs, the URLs
do seem to group language variants of the same label. For example,
https://help.twitter.com/rules-and-policies/state-affiliated-china is
used for all of the following label descriptions:
* Média affilié à un État, Chine
* China state-affiliated media
* 中国官方媒体
* Çin devletine bağlı medya
* China government official
In some analysis contexts it could be useful to group these together.
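A rough sketch of how such grouping could look in an analysis script. The function name `group_by_url`, the `(description, url)` tuple layout, and the example.invalid placeholder entry are assumptions for illustration; the China URL and its descriptions are the ones listed above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

CHINA_URL = 'https://help.twitter.com/rules-and-policies/state-affiliated-china'


def group_by_url(labels: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each label URL to the set of descriptions seen with it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for description, url in labels:
        groups[url].add(description)
    return dict(groups)


# (description, url) pairs; the last entry is a made-up placeholder.
collected = [
    ('Média affilié à un État, Chine', CHINA_URL),
    ('China state-affiliated media', CHINA_URL),
    ('中国官方媒体', CHINA_URL),
    ('Çin devletine bağlı medya', CHINA_URL),
    ('China government official', CHINA_URL),
    ('Some other label', 'https://example.invalid/other-label'),
]

groups = group_by_url(collected)
```

Keying on the URL rather than the description keeps all language variants of a label in one bucket.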
2021-09-16 13:04:57 -04:00
Ed Summers
3fb731ade1
User Labels
...
In August of 2020 Twitter started to label the accounts of government
officials and state-affiliated media entities:
https://blog.twitter.com/en_us/topics/product/2020/new-labels-for-government-and-state-affiliated-media-accounts
This information is extremely important for researchers who are studying
the impact of social media on political discourse, especially because it is not
currently available through either Twitter's v1.1 or v2 API endpoints.
The code in this small PR may seem a bit brittle, but I've been using it
to collect data with each of the twitter subcommands and it seems to
work reliably. While there are image and page URLs associated with each
label, I chose to only collect the text description of the label since it
should be sufficient for finding the additional information later if
needed.
2021-09-16 08:06:05 -04:00
JustAnotherArchivist
c76f1637ce
Handle 403s from Twitter search
...
Closes #269
2021-08-30 23:29:20 +00:00
JustAnotherArchivist
ed117e8891
Log response status code and redirects
2021-08-29 18:26:00 +00:00
JustAnotherArchivist
f9a3fafb3f
Fix --cursor on twitter-search
2021-08-01 20:59:16 +00:00
JustAnotherArchivist
660b8c7a0a
Retry empty result sets from Twitter as a workaround for random early stops
...
#37
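A generic sketch of the retry-on-empty idea named in the subject line, not the code from this commit. The function name, the `max_retries` parameter, and the fake fetcher are all hypothetical.

```python
from typing import Callable, List


def fetch_with_empty_retries(fetch_page: Callable[[], List[dict]], max_retries: int = 3) -> List[dict]:
    """Call fetch_page again while it returns an empty result set, up to max_retries extra times."""
    results = fetch_page()
    retries = 0
    while not results and retries < max_retries:
        retries += 1
        results = fetch_page()
    return results


# Fake fetcher simulating two spurious empty responses before real data.
responses = iter([[], [], [{'id': 1}]])
calls = []


def flaky_fetch() -> List[dict]:
    calls.append(1)
    return next(responses)


page = fetch_with_empty_retries(flaky_fetch)
```

Bounding the number of retries matters because a genuinely exhausted result set also looks empty.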
2021-07-18 23:59:52 +00:00
JustAnotherArchivist
0c22608dc7
Extract video view count
...
Also fix the broken ext values sent to Twitter
Closes #246
2021-07-01 17:58:45 +00:00
JustAnotherArchivist
2bb706feda
Dump request and response attributes of RequestExceptions
...
Cf. #243
2021-06-30 21:44:02 +00:00
JustAnotherArchivist
5e6bc4ec50
Fix type of content field (may be None on text-less posts)
2021-05-27 00:33:12 +00:00
JustAnotherArchivist
57d0aaafc1
Remove dirtyUrl which does not appear to be used anymore by Instagram
...
#234
2021-05-27 00:32:03 +00:00
JustAnotherArchivist
157e4d4265
Fix default value of username field
...
#234
2021-05-27 00:29:33 +00:00
JustAnotherArchivist
54588e9c42
Add support for fetching top instead of live/chronological tweets
...
Closes #109
2021-05-23 03:24:30 +00:00
JustAnotherArchivist
9e7274f3d7
Clean up params dict construction
2021-05-23 03:24:11 +00:00
JustAnotherArchivist
ac4e335bdb
Clean up duplicated default values
2021-05-23 03:03:32 +00:00
JustAnotherArchivist
1d255de48d
Add hashtags and cashtags
2021-05-23 02:51:38 +00:00
JustAnotherArchivist
9c1dcd37f9
Add Tweet.{inReplyToTweetId,inReplyToUser}
...
This makes User.displayname optional because the replied-to user is not always present in the user mentions.
2021-05-23 02:44:40 +00:00
JustAnotherArchivist
f8dac183d0
Fix type of User.id
2021-05-23 02:43:53 +00:00
JustAnotherArchivist
45d1fa27de
Add twitter-tweet scraper for retrieving tweets by ID, including scroll and recursion modes
...
Closes #51, closes #137
2021-05-23 02:12:13 +00:00
JustAnotherArchivist
98b798b0e5
Remove obsolete twitter-thread scraper
...
It was still based on the old, deprecated Twitter UI and broke a long time ago.
Closes #176
2021-05-22 22:37:21 +00:00
JustAnotherArchivist
f18b64e7da
Add support for scraping Twitter users by ID
...
Closes #222
2021-05-22 21:17:14 +00:00
JustAnotherArchivist
460be9d581
Add _type attribute on all JSON objects, remove separate attribute on Twitter media
2021-05-22 18:14:54 +00:00
JustAnotherArchivist
97c8caea48
Set Accept-Language header on API requests to English
2021-04-20 01:50:14 +00:00
JustAnotherArchivist
a34f93076a
Merge pull request #218 from NoeCampos22/Place_Data
...
Extract more information on Twitter places
2021-04-20 01:45:22 +00:00
NoeCampos22
8f1c470061
Tweet.place to Place dataclass
2021-04-19 15:13:33 -05:00
NoeCampos22
dbf2a2f689
Get more data from the place
...
Data like the country, place type and the single place name are now also returned in the JSON.
2021-04-19 12:01:14 -05:00
JustAnotherArchivist
39a34a57ac
Handle API endpoints that don't include geolocation data (e.g. twitter-profile scraper)
...
Fixes #215
2021-04-13 20:15:42 +00:00
JustAnotherArchivist
f44b39705a
Fix coordinate extraction from place bounding boxes
2021-04-06 20:53:05 +00:00
JustAnotherArchivist
f64ce217b7
Merge pull request #209 from Lukpier/master
...
Add tweet location (place full name & geo coordinates) where available
2021-04-06 16:19:33 +00:00
Luca Pierri
cdf87f4b8f
Retrieve tweet location
2021-04-06 16:08:34 +00:00
JustAnotherArchivist
47fbc2a84d
Add note on features exclusive to the dev version
...
Cf. #195
2021-02-24 19:39:45 +00:00
JustAnotherArchivist
5cd3b7d7cc
Fix crash on rare weird 503 responses from Twitter without content
2021-01-26 22:39:02 +00:00
JustAnotherArchivist
0121fa51c2
Fix crash on users with a broken URL in the profile description
2021-01-26 18:33:34 +00:00
JustAnotherArchivist
892941b609
Fix crash on reposts of hidden profiles
2020-12-13 23:22:17 +00:00
JustAnotherArchivist
e3022628b6
Fix crash on photo reposts
2020-12-13 22:46:28 +00:00
JustAnotherArchivist
fdc33d0dba
Include properties in JSON representation
...
This fixes the lack of the profile URL on Twitter users because it's generated using the username rather than set explicitly as a field.
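A small sketch of the general technique, assuming a dataclass-based item; it is not the project's implementation. The class `User`, the helper `to_dict`, and the profile URL pattern are assumptions for illustration.

```python
import dataclasses
import json


@dataclasses.dataclass
class User:
    username: str

    @property
    def url(self) -> str:
        # Derived from the username instead of being stored as a field.
        return f'https://twitter.com/{self.username}'


def to_dict(obj) -> dict:
    """Serialise dataclass fields, then add the value of every property defined on the class."""
    out = dataclasses.asdict(obj)
    for cls in type(obj).__mro__:
        for name, value in vars(cls).items():
            if isinstance(value, property) and name not in out:
                out[name] = getattr(obj, name)
    return out


serialised = json.dumps(to_dict(User('example')))
```

`dataclasses.asdict` only sees declared fields, which is why a property such as `url` has to be added separately.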
2020-11-05 05:55:26 +00:00
JustAnotherArchivist
6d6411cc24
Fix KeyError on entity for inexistent Twitter accounts
2020-11-03 23:21:28 +00:00
JustAnotherArchivist
61a1ecffc5
Merge pull request #141 from gitshrl/twitter/split-source-url-label
...
Split tweet source into URL and label
2020-10-27 18:44:10 +00:00
sahrul
d2dce37fa0
add the original tweet source
2020-10-27 13:21:21 +07:00
sahrul
d65f0434da
split source into url and label
2020-10-26 16:46:10 +07:00
JustAnotherArchivist
7499384110
Merge pull request #131 from gitshrl/facebook/fix-group-pagination
...
Fix pagination error for Facebook group scraper
2020-10-21 15:08:50 +00:00
sahrul
7a0f68b7ec
fix pagination for facebook group scraper
2020-10-21 21:30:00 +07:00
JustAnotherArchivist
1a219fd2b6
Merge pull request #129 from gitshrl/facebook/fix-group-scraper
...
Update base URL for Facebook group scraper
2020-10-21 14:03:59 +00:00
sahrul
6fb98dae12
update base url for facebook group scraper
2020-10-21 19:57:02 +07:00