JustAnotherArchivist
1bbe25647a
Refactor deprecated properties
2020-10-16 18:11:52 +00:00
JustAnotherArchivist
5cb64faa72
Formally deprecate the already deprecated item attributes
2020-10-16 00:55:55 +00:00
JustAnotherArchivist
bd53e729a0
Replace named tuples with dataclasses and move JSON conversion logic to the base classes
Named tuples were never really adequate for this since the order aspect of them doesn't make sense.
Further, named tuples don't support multiple inheritance. This meant that the objects returned by get_items() were not actually Items, for example. Since Python 3.9, such named tuples cannot be created anymore.
Fixes #111
2020-10-15 23:44:28 +00:00
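A minimal sketch of the dataclass approach described above. The class names `Item`, `Entity`, `Tweet`, and `User` are illustrative stand-ins, and the `to_json` method is a hypothetical simplification of "moving JSON conversion logic to the base classes", not snscrape's actual code. It shows a dataclass inheriting from two plain base classes and picking up shared JSON conversion. Unlike named tuples, these instances are not indexable by position, so the ordering aspect disappears.

```python
import dataclasses
import json


class Item:
    '''Hypothetical base class; the JSON conversion is written once here.'''

    def to_json(self) -> str:
        # asdict works on any dataclass instance, so every dataclass subclass
        # of Item inherits this conversion without repeating it.
        return json.dumps(dataclasses.asdict(self))


class Entity:
    '''Hypothetical second base class, used only to show multiple inheritance.'''


@dataclasses.dataclass
class Tweet(Item):
    url: str
    id: int


@dataclasses.dataclass
class User(Item, Entity):
    # Inherits from two base classes at once, which named tuples don't support.
    username: str
    id: int
```

Because `Tweet` is a subclass of `Item`, an `isinstance(obj, Item)` check on returned objects now succeeds.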
JustAnotherArchivist
a70b361176
Use more assignment expressions where appropriate
2020-10-01 21:45:25 +00:00
JustAnotherArchivist
8cf81e9bfc
Fix twitter-profile scraper
The Twitter API returns different data structures there, leading to a variety of errors.
2020-09-25 02:45:07 +00:00
JustAnotherArchivist
d90f06b389
Extract more information on users from Twitter
Closes #78
2020-09-24 18:39:32 +00:00
JustAnotherArchivist
c519832755
Clarify twitter-list-posts argument value
2020-09-24 18:37:37 +00:00
JustAnotherArchivist
397a0b988e
Remove Twitter list member scraper
It has been broken for a while. Member lists were removed from the old design, and they're behind a login wall on the new design.
2020-09-24 18:34:15 +00:00
JustAnotherArchivist
f1428fa0e0
Fix crash on nested quoted tweets
2020-09-24 02:45:49 +00:00
JustAnotherArchivist
7d2c546ee5
Deprecate hacky fields in Tweet objects
2020-09-24 02:00:45 +00:00
JustAnotherArchivist
2332c30e26
Replace locale-dependent strptime date parsing with email.utils.parsedate_to_datetime
2020-09-24 02:00:21 +00:00
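A small sketch of locale-independent date parsing with `email.utils.parsedate_to_datetime`. It assumes the timestamp looks like the `created_at` format of Twitter's v1.1 API (for example `Wed Oct 10 20:19:24 +0000 2018`). The wrapper name `parse_twitter_date` is hypothetical. Because the day and month names are matched against fixed English lists, the result does not depend on the process locale, unlike `strptime` with `%a`/`%b`.

```python
import email.utils
from datetime import datetime


def parse_twitter_date(value: str) -> datetime:
    '''Parse a timestamp such as 'Wed Oct 10 20:19:24 +0000 2018'.

    parsedate_to_datetime matches English day and month names regardless of
    the current locale and returns an aware datetime when an offset is given.
    '''
    return email.utils.parsedate_to_datetime(value)
```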
JustAnotherArchivist
b78bf3e642
Fix crash on banner-less profiles and nested descriptionUrls
2020-09-24 01:58:38 +00:00
JustAnotherArchivist
1a09f9b9a3
Extract more information from Twitter
Including: reply/retweet/like/quote counts, media (photos, videos, and GIFs), full user object, quoted tweets, mentioned users, rendered content, conversation ID, language, source
2020-09-24 01:45:08 +00:00
JustAnotherArchivist
039b2c6719
Restructure Twitter classes since the 'common' scraper is only used for the old design anymore
2020-09-07 02:38:27 +00:00
JustAnotherArchivist
70a3d9ba3a
Fix infinite loop at the end of profile pages
2020-09-01 04:01:27 +00:00
JustAnotherArchivist
bd619bf4e9
Log and ignore tweets which are not contained in the globalObjects
Fixes #61
2020-09-01 03:45:23 +00:00
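A sketch of the "log and ignore" behaviour. It assumes a simplified response shape where tweet objects live in a `globalObjects['tweets']` mapping keyed by ID string, and the timeline only references them by ID. The function `resolve_tweets` and the flat list of IDs are hypothetical simplifications, not the real response parsing.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_tweets(obj: dict, tweet_ids: list) -> list:
    '''Return the tweet objects referenced by tweet_ids, skipping missing ones.'''
    tweets = obj.get('globalObjects', {}).get('tweets', {})
    result = []
    for tweet_id in tweet_ids:
        if tweet_id not in tweets:
            # The timeline references a tweet that the response did not include:
            # log it and carry on instead of failing with a KeyError.
            logger.warning('Skipping tweet %s which is not in globalObjects', tweet_id)
            continue
        result.append(tweets[tweet_id])
    return result
```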
JustAnotherArchivist
072519f539
Fix pagination on profile pages
2020-09-01 03:23:45 +00:00
JustAnotherArchivist
ba250aabf2
Extract retweeted tweet if present
2020-09-01 03:15:21 +00:00
JustAnotherArchivist
0cc4f0c016
Add support for Twitter profile pages
Closes #5
2020-09-01 03:13:49 +00:00
JustAnotherArchivist
1a2e367a87
Cache entities
2020-09-01 02:34:21 +00:00
JustAnotherArchivist
4f24843f89
Extract user ID
2020-09-01 02:26:13 +00:00
JustAnotherArchivist
bfb92a47b9
Move Tweet object generation to TwitterAPIScraper
2020-09-01 02:25:00 +00:00
JustAnotherArchivist
dc5d55004b
Refactor API interaction into something cleaner and more reusable
2020-09-01 01:56:07 +00:00
JustAnotherArchivist
bb83d1d72f
Validate Twitter usernames
Closes #55
2020-08-24 19:03:52 +00:00
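A minimal username check, assuming Twitter's published rule that usernames consist of 1 to 15 ASCII letters, digits, or underscores. This is not necessarily the exact check snscrape performs. The helper name `is_valid_username` is hypothetical. An explicit character class is used instead of `\w`, because in Python 3 `\w` also matches non-ASCII letters.

```python
import re

# Assumed rule: 1 to 15 ASCII letters, digits, or underscores.
_USERNAME_PATTERN = re.compile(r'[A-Za-z0-9_]{1,15}')


def is_valid_username(username: str) -> bool:
    '''Return True if username matches the assumed Twitter username rule.'''
    # fullmatch rejects trailing characters, including a trailing newline.
    return _USERNAME_PATTERN.fullmatch(username) is not None
```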
JustAnotherArchivist
dd25fd0526
Add support for extracting the entity behind a scrape
Closes #11
Backwards incompatibility: snscrape.modules.twitter.Account is now called User. However, this was previously only used on the list member scraper, which has been broken for a while since the list of members is no longer publicly accessible.
For compatibility reasons, the CLI does not output the entity by default; the new option --with-entity enables it.
2020-08-24 01:38:27 +00:00
JustAnotherArchivist
924c35f883
Refactor guest token extraction code
2020-08-22 22:59:43 +00:00
JustAnotherArchivist
588ec415ff
Force TwitterThreadScraper to fetch the old design (take 2)
2020-08-12 17:19:42 +00:00
JustAnotherArchivist
966a6ebd8e
Skip promoted tweets/ads
Fixes #67
2020-08-11 20:28:35 +00:00
JustAnotherArchivist
4d3d0fe0d7
Update search API parameter values to the ones currently used on Twitter
2020-08-11 20:26:56 +00:00
JustAnotherArchivist
7b967ff82a
Twitter reverted their guest token change (90f9598e)
2020-07-08 22:07:18 +00:00
JustAnotherArchivist
90f9598ecc
Adjust to Twitter's new method of handing out guest tokens
Fixes #64
2020-06-24 21:22:58 +00:00
JustAnotherArchivist
1459245258
Consistently raise ScraperException on fatal errors
2020-05-30 00:53:49 +00:00
JustAnotherArchivist
722bfd5f7c
Handle Twitter tombstones
Fixes #63
2020-05-29 22:12:37 +00:00
JustAnotherArchivist
b6cc3180d9
Force TwitterThreadScraper and TwitterListMembersScraper to fetch the old design
2020-03-04 00:40:49 +00:00
JustAnotherArchivist
613395d1c2
Port TwitterSearchScraper to redesign
Fixes #57
2020-03-04 00:40:49 +00:00
JustAnotherArchivist
14e11b28d2
Add support for Twitter lists
Closes #46
2019-06-30 14:39:29 +00:00
JustAnotherArchivist
1a07b3b7e8
Add support for Twitter threads
2019-06-30 02:11:46 +00:00
JustAnotherArchivist
757818474d
Add tweet ID and username fields to Tweet items
2019-06-23 11:48:54 +00:00
JustAnotherArchivist
7d1916292c
Twitter: stop recursion based on whether the server returns the same position instead of detecting an empty feed
Fixes #37
2019-06-10 14:38:25 +00:00
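A sketch of the stopping rule from this commit: keep requesting pages until the server hands back the same position it was given, rather than stopping at the first empty page. It is written as a loop instead of recursion. `fetch_page` is a hypothetical callable standing in for the real request code. It takes a position (`None` for the first page) and returns the page's items plus the next position.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: position in, (items on that page, next position) out.
FetchPage = Callable[[Optional[str]], Tuple[List[str], str]]


def paginate(fetch_page: FetchPage) -> Iterator[str]:
    '''Yield items until the server returns the position it was given.'''
    position = None
    while True:
        items, new_position = fetch_page(position)
        yield from items
        if new_position == position:
            # The position did not advance, so there is nothing further to fetch.
            break
        # An empty page by itself does not end the loop; only an unchanged
        # position does.
        position = new_position
```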
JustAnotherArchivist
907a003a59
Fix crash when Twitter search produces no results (fixes #41)
2019-05-24 11:51:50 +00:00
JustAnotherArchivist
8ada279b57
Add warning if Twitter module gets no results
2019-05-24 11:50:39 +00:00
JustAnotherArchivist
7989af27b5
Handle tweets by temporarily blocked accounts (which show up in the search results but don't have a date or content)
2019-05-21 22:37:43 +00:00
JustAnotherArchivist
32a427dac3
Fix pagination on Twitter (fixes #40)
2019-05-18 01:08:00 +00:00
JustAnotherArchivist
64438afc92
Work around tweet URLs that don't have a data-expanded-url attribute (fixes #38)
2019-05-16 22:51:22 +00:00
JustAnotherArchivist
9c8bbf051c
Fix order of processing in Twitter module for more useful locals dump output
2019-05-16 22:22:53 +00:00
JustAnotherArchivist
3817aa59d4
Add support for extracting links from tweets (including cards)
Both the t.co and the original URLs can be extracted. Note that card links are always t.co since Twitter's HTML does not include the original URL for those.
2019-05-16 16:42:52 +00:00
JustAnotherArchivist
f91979eb32
Add --max-position option to twitter-search scraper as a workaround for pagination stopping early (#37)
The value needs to be of the format 'TWEET-<seenID>-<newestID>' where <seenID> is the last result that was returned by a previous scrape and <newestID> is the first result returned by the initial scrape.
2019-05-10 17:30:15 +00:00
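A small sketch for building and checking a `--max-position` value in the `TWEET-<seenID>-<newestID>` format described above. The helper names `build_max_position` and `parse_max_position` are hypothetical and not part of snscrape. The sketch only enforces the documented shape, two numeric tweet IDs.

```python
import re

_MAX_POSITION = re.compile(r'TWEET-(\d+)-(\d+)')


def build_max_position(seen_id: int, newest_id: int) -> str:
    '''seen_id: last tweet ID returned by the previous scrape.
    newest_id: first tweet ID returned by the initial scrape.'''
    return f'TWEET-{seen_id}-{newest_id}'


def parse_max_position(value: str) -> tuple:
    '''Split a TWEET-<seenID>-<newestID> value back into its two IDs.'''
    match = _MAX_POSITION.fullmatch(value)
    if match is None:
        raise ValueError(f'Not of the form TWEET-<seenID>-<newestID>: {value!r}')
    return int(match.group(1)), int(match.group(2))
```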
JustAnotherArchivist
85fff319bc
Disable Twitter's spelling correction
src=typd means "this is what was typed in and could be incorrect". src=spxr is "no, I really mean that". src=sprv appears to be an alias of spxr that is no longer used.
2019-05-10 16:43:59 +00:00
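A sketch of a search query string that opts out of spelling correction. Only the meaning of the `src` values comes from the commit message. The helper `search_query_string` and the choice of `q` for the query parameter are illustrative assumptions about the request format.

```python
from urllib.parse import urlencode


def search_query_string(query: str) -> str:
    '''Hypothetical helper building the query part of a search request.'''
    # src=spxr: "no, I really mean that", so Twitter searches for the query
    # as given. src=typd would let it substitute a spelling correction.
    return urlencode({'q': query, 'src': 'spxr'})
```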
JustAnotherArchivist
536fcb3303
Return proper items from scrapers including clean URLs (fixes #9 and #10)
2019-04-18 14:44:21 +02:00
JustAnotherArchivist
73bc99596f
Treat Twitter responses without a Content-Type header as invalid (fixes #21)
2019-04-18 02:24:35 +02:00