JustAnotherArchivist
0336ce13ed
Add support for fetching a guest token from the API
2021-12-23 04:26:50 +00:00
JustAnotherArchivist
193d4f80d6
Fix user agent in API headers staying constant
2021-12-23 04:25:23 +00:00
JustAnotherArchivist
e7d35ec1eb
Fix date parsing on quoted posts
2021-12-15 16:55:14 +00:00
JustAnotherArchivist
8540045658
Fix typo
2021-12-15 16:36:28 +00:00
JustAnotherArchivist
1f1c1bd8af
Fix docstring style
2021-12-14 20:05:51 +00:00
JustAnotherArchivist
7fdc8bcb53
Randomise user agent when the guest token can't be found
2021-12-14 20:04:46 +00:00
JustAnotherArchivist
4b3c6aefe7
Add default values to user and tweet scrapers for a more intuitive usage
2021-12-12 04:57:16 +00:00
JustAnotherArchivist
525cd71225
Retry guest token retrieval
Fixes #325 (hopefully)
2021-12-12 00:10:59 +00:00
JustAnotherArchivist
72abff9e5c
Reuse guest tokens across scrapes
Cf. #326
2021-12-11 23:18:42 +00:00
JustAnotherArchivist
bcaa477b3d
Update list of scrapers
2021-12-08 08:29:02 +00:00
JustAnotherArchivist
66d4c99f82
Remove dev version notice
2021-12-08 08:25:21 +00:00
JustAnotherArchivist
0ac50f1383
Add README to package metadata
2021-12-08 08:18:25 +00:00
JustAnotherArchivist
c2257ad16e
Add Python 3.10 classifier
2021-12-08 08:15:05 +00:00
JustAnotherArchivist
58f654405f
Add --citation
Closes #229
2021-12-08 07:51:28 +00:00
JustAnotherArchivist
35fb61a327
Fix crash on dumping scopes which have a variable pointing to a dataclass
2021-11-24 03:39:06 +00:00
JustAnotherArchivist
a6b6f3faaa
Throw an error on empty arguments
Fixes #290
2021-10-10 17:43:27 +00:00
JustAnotherArchivist
5e829e2541
Refactor class instantiation to remove the need to repeat 'retries' everywhere
2021-09-30 09:58:10 +00:00
JustAnotherArchivist
d4567da23c
Improve list of scrapers on --help output
Don't list all scrapers in the usage line, and provide a sorted readable list instead.
2021-09-30 09:35:17 +00:00
JustAnotherArchivist
e5e0da25a0
Remove unused imports
2021-09-30 09:24:18 +00:00
JustAnotherArchivist
821326bcfb
Fix a few f-strings
2021-09-30 09:23:56 +00:00
JustAnotherArchivist
4bf9ef239c
Restructure usage section
2021-09-30 09:18:43 +00:00
JustAnotherArchivist
e382891642
Fix Twitter trends not having a str representation
2021-09-21 21:40:50 +00:00
JustAnotherArchivist
e5f4389464
Add Twitter trend scraper
Due to restrictions on Twitter's side, it is not possible to get trends from a custom location as that would require using an account and/or their API.
Closes #206
2021-09-21 21:28:41 +00:00
JustAnotherArchivist
d91f971f51
Refactor user label implementation and add support for bot accounts
Closes #281
2021-09-21 19:39:40 +00:00
JustAnotherArchivist
67e8295293
Merge pull request #280 from edsu/master
User Labels
2021-09-19 03:35:49 +00:00
JustAnotherArchivist
5fc2562642
Add user label support on entity retrieval
2021-09-19 03:32:35 +00:00
JustAnotherArchivist
2825bd0a73
Remove accidental empty line
2021-09-19 03:31:56 +00:00
Ed Summers
9831f2a4a0
missing ext
While doing some long term data collection I found some user objects
that lack the key 'ext'. This would cause an exception unless it's
checked for before trying to dig out results.
2021-09-16 13:31:47 -04:00
Ed Summers
a11eef6b06
User label url
Each label also has a URL which is used for learning more about the
label. While there are more label descriptions than label URLs the URLs
do seem to group language variants of the same label. For example
https://help.twitter.com/rules-and-policies/state-affiliated-china is
used for all of the following label descriptions:
* Média affilié à un État, Chine
* China state-affiliated media
* 中国官方媒体
* Çin devletine bağlı medya
* China government official
In some analysis contexts it could be useful to group these together.
2021-09-16 13:04:57 -04:00
Ed Summers
3fb731ade1
User Labels
In August of 2020 Twitter started to label the accounts of government
officials and state-affiliated media entities:
https://blog.twitter.com/en_us/topics/product/2020/new-labels-for-government-and-state-affiliated-media-accounts
This information is extremely important for researchers who are studying
the impact of social media on political discourse, especially because it is not
currently available through either Twitter's v1.1 or v2 API endpoints.
The code in this small PR may seem a bit brittle but I've been using it
to collect data with each of the twitter subcommands and it seems to
work reliably. While there are image and page URLs associated with each
label I chose to only collect the text description of the label since it
should be sufficient for finding the additional information later if
needed.
2021-09-16 08:06:05 -04:00
JustAnotherArchivist
c76f1637ce
Handle 403s from Twitter search
Closes #269
2021-08-30 23:29:20 +00:00
JustAnotherArchivist
ed117e8891
Log response status code and redirects
2021-08-29 18:26:00 +00:00
JustAnotherArchivist
f9a3fafb3f
Fix --cursor on twitter-search
2021-08-01 20:59:16 +00:00
JustAnotherArchivist
660b8c7a0a
Retry empty result sets from Twitter as a workaround for random early stops
#37
2021-07-18 23:59:52 +00:00
JustAnotherArchivist
0c22608dc7
Extract video view count
Also fix the broken ext values sent to Twitter
Closes #246
2021-07-01 17:58:45 +00:00
JustAnotherArchivist
2bb706feda
Dump request and response attributes of RequestExceptions
Cf. #243
2021-06-30 21:44:02 +00:00
JustAnotherArchivist
5e6bc4ec50
Fix type of content field (may be None on text-less posts)
2021-05-27 00:33:12 +00:00
JustAnotherArchivist
57d0aaafc1
Remove dirtyUrl which does not appear to be used anymore by Instagram
#234
2021-05-27 00:32:03 +00:00
JustAnotherArchivist
157e4d4265
Fix default value of username field
#234
2021-05-27 00:29:33 +00:00
JustAnotherArchivist
54588e9c42
Add support for fetching top instead of live/chronological tweets
Closes #109
2021-05-23 03:24:30 +00:00
JustAnotherArchivist
9e7274f3d7
Clean up params dict construction
2021-05-23 03:24:11 +00:00
JustAnotherArchivist
ac4e335bdb
Clean up duplicated default values
2021-05-23 03:03:32 +00:00
JustAnotherArchivist
1d255de48d
Add hashtags and cashtags
2021-05-23 02:51:38 +00:00
JustAnotherArchivist
9c1dcd37f9
Add Tweet.{inReplyToTweetId,inReplyToUser}
This makes User.displayname optional because the replied-to user is not always present in the user mentions.
2021-05-23 02:44:40 +00:00
JustAnotherArchivist
f8dac183d0
Fix type of User.id
2021-05-23 02:43:53 +00:00
JustAnotherArchivist
45d1fa27de
Add twitter-tweet scraper for retrieving tweets by ID, including scroll and recursion modes
Closes #51, closes #137
2021-05-23 02:12:13 +00:00
JustAnotherArchivist
98b798b0e5
Remove obsolete twitter-thread scraper
It was still based on the old, deprecated Twitter UI and broke a long time ago.
Closes #176
2021-05-22 22:37:21 +00:00
JustAnotherArchivist
f18b64e7da
Add support for scraping Twitter users by ID
Closes #222
2021-05-22 21:17:14 +00:00
JustAnotherArchivist
460be9d581
Add _type attribute on all JSON objects, remove separate attribute on Twitter media
2021-05-22 18:14:54 +00:00
JustAnotherArchivist
97c8caea48
Set Accept-Language header on API requests to English
2021-04-20 01:50:14 +00:00