Tristan Lee
40b8d9f267
Merge pull request #7 from bellingcat/more-tg-info
More tg info
2022-07-05 08:29:09 -07:00
Tristan Lee
fdc40f7411
Merge pull request #6 from bellingcat/add-vk-user
added User dataclass as argument to VKontaktePost dataclass
2022-07-05 08:28:01 -07:00
Tristan Lee
82351800d6
Merge pull request #5 from JustAnotherArchivist/master
merge upstream
2022-07-05 08:25:20 -07:00
Tristan Lee
73f10a4f24
fixed edge case where channel with no members fails _get_entity
2022-07-05 10:23:26 -05:00
Tristan Lee
cb429909d0
added User dataclass as argument to VKontaktePost dataclass
2022-07-05 10:21:59 -05:00
JustAnotherArchivist
d72b51953f
Fix missing r prefix on string with regex backslashes
2022-06-24 23:12:50 +00:00
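Note on the commit above: a minimal sketch of why the `r` prefix matters for regex strings. The pattern and sample text are illustrative assumptions, not the string that was actually changed in that commit.

```python
import re

# Hypothetical pattern, chosen only to illustrate the problem.
# Without the r prefix, Python turns '\b' into a backspace character
# (0x08) before the regex engine ever sees it.
plain = '\bviews\b'
# With the r prefix, the backslashes reach the regex engine intact,
# so '\b' is interpreted as a word boundary.
raw = r'\bviews\b'

text = '1234 views'
plain_match = re.search(plain, text)  # looks for literal backspaces
raw_match = re.search(raw, text)      # looks for the word "views"

print(plain_match)
print(raw_match.group(0))
```

Escapes that Python does not recognise, such as `\d`, are passed through unchanged but trigger a warning in recent Python versions, which is another reason to prefer raw strings for patterns.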
Tristan Lee
056cd6215c
incorporated requested changes from maintainer, removed modifications to VK module
2022-06-23 15:47:18 -05:00
JustAnotherArchivist
d5b406bc1b
Update API parameters to what Twitter currently uses
The `count` reduction does not affect anything as Twitter ignores that parameter now. Cf. #481
2022-06-23 19:50:17 +00:00
Tristan Lee
56e4232083
fixed typo
2022-06-23 11:51:13 -05:00
JustAnotherArchivist
50899c01f3
Fix crash on malformed guest token cache file
Fixes #494
2022-06-16 17:12:04 +00:00
JustAnotherArchivist
bcad6923c2
Rename Tweet.content to rawContent and User.description to renderedDescription for consistency
Closes #479
2022-06-14 00:35:02 +00:00
JustAnotherArchivist
0d361685ff
Fix AttributeError crash on scrapers using the default CLI constructor
Introduced by 267b7d0e
Fixes #483
2022-06-01 17:35:38 +00:00
JustAnotherArchivist
530f4fa122
Fix KeyErrors on display_url and expanded_url for certain users with broken profile links
Fixes #480
2022-05-29 17:23:43 +00:00
JustAnotherArchivist
dc6bc9bf9d
Refactor how links on Twitter are handled
All links in text (tweets, profile descriptions, and profile links) are now represented by TextLink objects, which contain all relevant information: the displayed text (if available), the URL, the short t.co URL, and the indices in the text at which it appears.
Closes #478
2022-05-29 07:16:04 +00:00
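Note on the commit above: a rough sketch of what a link object with the four described pieces of information could look like. The field names and types are assumptions based only on the commit message, not a copy of the project's class definition.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextLink:
    # Field names are assumed for illustration.
    text: Optional[str]       # displayed text, if available
    url: str                  # the URL the link points to
    tcourl: Optional[str]     # the short t.co URL
    indices: Tuple[int, int]  # (start, end) position in the text


# Hypothetical tweet text and link values.
content = 'Read more: https://t.co/abc123'
short = 'https://t.co/abc123'
start = content.index(short)

link = TextLink(
    text='example.org/post',
    url='https://example.org/post',
    tcourl=short,
    indices=(start, start + len(short)),
)

# The indices locate the link inside the text it was found in.
begin, end = link.indices
print(content[begin:end])
```

Keeping the indices alongside the URLs lets a consumer replace the short URL in the text with the full one without searching the string again.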
JustAnotherArchivist
01cf6a09b3
Fix type of description URL objects
2022-05-29 05:08:23 +00:00
JustAnotherArchivist
ef7c4fad3e
Fix AttributeError for DescriptionURL on from-import
2022-05-29 05:08:23 +00:00
Tristan Lee
65723f10ff
fixed merge
2022-05-25 06:47:47 -05:00
Tristan Lee
07a5f6fd7d
merged master into more-tg-info to update upstream PR
2022-05-25 01:18:48 -05:00
Tristan Lee
0822a9c354
Merge pull request #4 from JustAnotherArchivist/master
upstream merge
2022-05-24 23:10:38 -07:00
JustAnotherArchivist
faeffe2603
Merge pull request #474 from GeraniumKF/GeraniumKF-reddit-since-crash
Fix crash using --since with Reddit
2022-05-23 23:06:16 +00:00
Geranium
e3bdc02a7c
Reddit: deprecate 'created' property for 'date'
This fixes a crash when using --since with the Reddit scraper,
as the CLI code expects items to have a date property.
2022-05-23 23:31:44 +01:00
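Note on the commit above: one possible way to deprecate a property in favour of a new one while keeping old callers working. The class name, warning text, and the `since` comparison are hypothetical and stand in for the real Reddit item classes and CLI code.

```python
import warnings
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Submission:
    # Hypothetical item class; 'date' is the property a date filter reads.
    date: datetime

    @property
    def created(self) -> datetime:
        # Old name kept as an alias that warns and forwards to 'date'.
        warnings.warn(
            'created is deprecated, use date instead',
            DeprecationWarning,
            stacklevel=2,
        )
        return self.date


item = Submission(date=datetime(2022, 5, 23, tzinfo=timezone.utc))

# A date filter in the style of --since can now rely on item.date
# existing on every item.
since = datetime(2022, 1, 1, tzinfo=timezone.utc)
keep = item.date >= since
print(keep)
```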
Tristan Lee
e2d922301e
forgot to save modified twitter.py module
2022-05-09 09:37:36 -05:00
Tristan Lee
b13e62eb5d
Merge branch 'JustAnotherArchivist-master'
2022-05-09 09:35:35 -05:00
Tristan Lee
f38513503d
fixed merge conflicts
2022-05-09 09:35:19 -05:00
Tristan Lee
0a4bd39ca6
Merge pull request #2 from bellingcat/telegram-media
Implemented JustAnotherArchivist's requested changes to Telegram scraper from PR
2022-05-09 07:23:39 -07:00
Tristan Lee
c18ca0f047
Merge branch 'master' into telegram-media
2022-05-09 09:21:40 -05:00
Tristan Lee
5648e957d0
improved consistency of code formatting and added _STYLE_MEDIA_URL_PATTERN as variable
2022-04-27 16:41:24 -05:00
Tristan Lee
21f7b620ec
moved forward finding out of tgme_widget_message_text clause, since it wasn't correctly getting the forwarding information in forwarded posts that contained attachments but no text
2022-04-21 18:26:31 -05:00
Tristan Lee
9b3faec980
added additional attributes for hashtags and user mentions, removed redundant outlinks
2022-04-21 18:06:43 -05:00
Tristan Lee
97d38e5cde
added additional termination criteria to Telegram scraper
2022-04-21 09:41:53 -05:00
Tristan Lee
b276c3cc27
fixed issue where some videos and photos weren't being scraped (because they weren't in a post containing a 'tgme_widget_message_text' div)
2022-04-17 06:50:43 -05:00
Tristan Lee
1e4e0c278d
fixed issue where Telegram scraper terminated early because some pages didn't have a next page link (added reasonable default)
2022-04-17 04:33:22 -05:00
Tristan Lee
babcddda19
made Telegram scraper not return full channel info for forwarded_from attribute; fixed video edge cases.
2022-04-17 03:55:37 -05:00
JustAnotherArchivist
ed3ea944d1
Fix newsletter issue cards without an issue description
Fixes #456
2022-04-16 19:44:36 +00:00
JustAnotherArchivist
e7a6d38a5f
Add support for community_details cards
2022-04-15 20:07:01 +00:00
JustAnotherArchivist
6c50eee31b
Fix proxies not being applied correctly due to missing merge with environment settings
Fixes #447
2022-04-15 19:23:54 +00:00
JustAnotherArchivist
5103a33afa
Fix t.co card URL replacement on retweets
Fixes #411
2022-04-15 03:18:45 +00:00
JustAnotherArchivist
247bd82d79
Refactor to tweetId variable
2022-04-15 03:14:29 +00:00
JustAnotherArchivist
5fc67f2bcf
Add support for 'message me' cards
2022-04-15 02:52:37 +00:00
JustAnotherArchivist
65e7d8bd24
Fix warning on card URL translation to include the tweet ID
2022-04-15 02:52:03 +00:00
JustAnotherArchivist
3870282a42
Fix broadcast and event card crashes
2022-04-12 20:53:38 +00:00
JustAnotherArchivist
7c0fcdec43
Fix Periscope card crashes
2022-04-12 18:29:51 +00:00
JustAnotherArchivist
9af1f19034
Properly support all card types
Fixes #407
2022-04-12 18:11:26 +00:00
JustAnotherArchivist
5fc3c0e290
Fix crash in locals dumping on module-less frames
2022-04-12 18:03:36 +00:00
Tristan Lee
f978954bb3
Merge branch 'JustAnotherArchivist:master' into master
2022-04-03 01:49:28 -05:00
Tristan Lee
2ce014ade4
fixed edge case for videos that have data-link-attr but no href attribute
2022-04-03 01:45:25 -05:00
JustAnotherArchivist
5d156c6a15
Detect and raise error on redirect from GraphQL endpoint to login
#165
2022-04-03 02:34:30 +00:00
Tristan Lee
4e59638e7c
added a forwardedUrl attribute to TelegramPost and made forwarded attribute type Channel.
2022-03-30 21:33:03 -05:00
Tristan Lee
a7eb54d226
implemented Media dataclasses for Telegram, and added variable for extracting a post's view count
2022-03-30 21:07:17 -05:00
Tristan Lee
d32c9add8a
added capability to scrape multiple videos from a single post
2022-03-30 18:13:15 -05:00