409 Commits

Author SHA1 Message Date
JustAnotherArchivist
e47fbe3d1f Bump user agent
Fixes #760
2023-03-14 03:03:50 +00:00
JustAnotherArchivist
99050710d7 Fix AttributeError crashes on resolving user IDs to usernames or vice-versa 2023-03-03 02:25:48 +00:00
JustAnotherArchivist
3f7bb0516d Fix crash due to missing profile timeline on unavailable users (e.g. protected) 2023-03-03 01:32:07 +00:00
JustAnotherArchivist
98b50ff9e9 Separate warnings for empty responses and unavailable users/communities 2023-03-03 01:16:49 +00:00
JustAnotherArchivist
fd75fff202 Fix crash on communities without a description 2023-03-03 00:39:08 +00:00
JustAnotherArchivist
c77d19da5d Fix crash on some deleted tweets in communities 2023-03-03 00:31:30 +00:00
JustAnotherArchivist
945bfbde04 Merge pull request #743 from kelcheone/master
Add Twitter cashtag scraper
2023-03-02 21:24:07 +00:00
KΞVIN KΞLCHΞ
0942beedd6 fix: code style line spacing 2023-03-02 19:08:53 +00:00
KΞVIN KΞLCHΞ
3545837637 fix: code style line spacing 2023-03-02 19:05:16 +00:00
KΞVIN KΞLCHΞ
aa8d93e07c Merge branch 'JustAnotherArchivist:master' into master 2023-03-01 22:49:43 +03:00
kelche
7061ad2eb5 fix: code style 2023-03-01 18:09:34 +03:00
JustAnotherArchivist
03ef3debaf Fix behaviour on SIGPIPE/BrokenPipeError 2023-02-28 20:20:28 +00:00
JustAnotherArchivist
42cb6d8170 Fix crash on quotedRefResult without an actual result
Fixes #740
2023-02-28 20:16:55 +00:00
JustAnotherArchivist
ea7c6786c2 Handle TweetWithVisibilityResults on quoted tweets
Fixes #604
2023-02-28 20:16:07 +00:00
kelche
61dbbba6b1 feat: cashtag func 2023-02-27 22:39:31 +03:00
kelche
d1592177ab feat: cashtag func 2023-02-27 22:35:21 +03:00
JustAnotherArchivist
21cf626803 Update list of scrapers 2023-02-21 22:10:33 +00:00
JustAnotherArchivist
f329b69ed4 Add support for scraping Twitter's user search
#263
2023-02-21 22:07:40 +00:00
JustAnotherArchivist
f109f3fd46 Fix forgotten warning name change (cf. 7327a013) 2023-02-21 21:59:06 +00:00
JustAnotherArchivist
7330e0a9a0 Rename private logger variable 2023-02-21 21:26:00 +00:00
JustAnotherArchivist
4e6956e564 Remove dead code 2023-02-21 21:25:01 +00:00
JustAnotherArchivist
4e70306f99 Deprecate Entity type
There is no meaningful distinction from Items, and it complicates the integration of scrapers for user searches
2023-02-21 21:24:00 +00:00
JustAnotherArchivist
7327a01397 Refactor module-level deprecation code 2023-02-21 21:23:12 +00:00
JustAnotherArchivist
880a0a7f55 Handle TweetUnavailable results
Fixes #433
2023-02-21 20:16:23 +00:00
JustAnotherArchivist
57b126c656 Add support for scraping Twitter Communities
Closes #614
2023-02-21 20:15:57 +00:00
JustAnotherArchivist
82f64a6472 Remove dead code 2023-02-21 06:22:13 +00:00
JustAnotherArchivist
6a6b02cb28 Handle tombstones
Closes #392
Fixes #603
2023-02-21 04:23:47 +00:00
JustAnotherArchivist
3d6cd63a00 Fix more logger typos 2023-02-21 04:23:47 +00:00
JustAnotherArchivist
9a2f1524c2 Remove dead code 2023-02-21 04:23:47 +00:00
JustAnotherArchivist
b5694e01a2 Fix logger typo 2023-02-21 04:23:47 +00:00
JustAnotherArchivist
280b972f22 Fix extraction of tweets behind 'offensive' replies button 2023-02-21 04:23:47 +00:00
JustAnotherArchivist
6ba478657b Merge pull request #733 from mrunderline/fix/telegram_channel_members_count
fix: telegram channel members count
2023-02-20 19:16:03 +00:00
Ali Madihi
71fb33af70 fix: telegram channel members count 2023-02-20 22:14:34 +03:30
JustAnotherArchivist
c65e36a094 Bump GraphQL endpoints 2023-02-19 06:21:40 +00:00
JustAnotherArchivist
206907612d Fix double dump on exceptions with --dump-locals 2023-02-19 05:12:47 +00:00
JustAnotherArchivist
fe5d90b748 Fix tweets behind 'Show more replies' button getting missed
Fixes #572
2023-02-19 03:29:39 +00:00
JustAnotherArchivist
f1cb96b685 Merge pull request #724 from quentinwolf/patch-1
Twitter: change fullUrl to use 'orig' instead of 'large'
2023-02-19 02:55:27 +00:00
JustAnotherArchivist
8709282ba0 Add deprecated properties to JSON
Cf. #611
2023-02-19 02:51:47 +00:00
quentinwolf
0933a30e37 change fullUrl to use 'orig' instead of 'large'
Changing fullUrl from '&name=large' to '&name=orig', since large is capped at half the resolution of orig, which may not be ideal for scraping/archiving.

Large images are 2048px × 1365px
Original images are up to 4096px × 2730px

Alternatively, one could add largeUrl to download the Large image and utilize fullUrl as above to download the original image, for those that do wish to save either version, but I feel there is no reason for saving the middle-resolution image.
2023-02-13 16:45:44 -07:00
JustAnotherArchivist
d60ce38b6a Make (most) consistency errors in unified cards non-fatal
Fixes #703
2023-02-10 02:39:06 +00:00
JustAnotherArchivist
23ebdd2a3c Fix YAML syntax 2023-02-02 21:03:52 +00:00
JustAnotherArchivist
35c0c32c38 Refine bug report template 2023-02-02 21:02:16 +00:00
JustAnotherArchivist
b515a66b93 Fix crash in recursive tweet scraping
Introduced by 3e297c9a

Fixes #684
2023-01-19 16:18:15 +00:00
JustAnotherArchivist
36e85c54c1 Log response headers for debugging 2023-01-16 03:48:21 +00:00
JustAnotherArchivist
49270f6d3a Fix debug messages for redirects to report the correct status code and redirect location 2023-01-16 03:47:46 +00:00
JustAnotherArchivist
d0fb9ab8a9 Log TLS connection details for debugging 2023-01-16 02:39:05 +00:00
JustAnotherArchivist
5d3f27bc2b Fix title-less BroadcastCard crash 2023-01-15 16:36:04 +00:00
JustAnotherArchivist
b7cb270b6e Fix crash on empty user objects 2023-01-15 12:31:28 +00:00
JustAnotherArchivist
8ad26fc7d1 Switch from setup.py to pyproject.toml 2023-01-13 18:52:03 +00:00
JustAnotherArchivist
1fb5c39168 Add Python 3.11 classifier 2023-01-13 10:12:39 +00:00