355 Commits

Author SHA1 Message Date
Tristan Lee
40b8d9f267 Merge pull request #7 from bellingcat/more-tg-info
More tg info
2022-07-05 08:29:09 -07:00
Tristan Lee
fdc40f7411 Merge pull request #6 from bellingcat/add-vk-user
added User dataclass as argument to VKontaktePost dataclass
2022-07-05 08:28:01 -07:00
Tristan Lee
82351800d6 Merge pull request #5 from JustAnotherArchivist/master
merge upstream
2022-07-05 08:25:20 -07:00
Tristan Lee
73f10a4f24 fixed edge case where channel with no members fails _get_entity 2022-07-05 10:23:26 -05:00
Tristan Lee
cb429909d0 added User dataclass as argument to VKontaktePost dataclass 2022-07-05 10:21:59 -05:00
JustAnotherArchivist
d72b51953f Fix missing r prefix on string with regex backslashes 2022-06-24 23:12:50 +00:00
Tristan Lee
056cd6215c incorporated requested changes from maintainer, removed modifications to VK module 2022-06-23 15:47:18 -05:00
JustAnotherArchivist
d5b406bc1b Update API parameters to what Twitter currently uses
The `count` reduction does not affect anything as Twitter ignores that parameter now. Cf. #481
2022-06-23 19:50:17 +00:00
Tristan Lee
56e4232083 fixed typo 2022-06-23 11:51:13 -05:00
JustAnotherArchivist
50899c01f3 Fix crash on malformed guest token cache file
Fixes #494
2022-06-16 17:12:04 +00:00
JustAnotherArchivist
bcad6923c2 Rename Tweet.content to rawContent and User.description to renderedDescription for consistency
Closes #479
2022-06-14 00:35:02 +00:00
JustAnotherArchivist
0d361685ff Fix AttributeError crash on scrapers using the default CLI constructor
Introduced by 267b7d0e

Fixes #483
2022-06-01 17:35:38 +00:00
JustAnotherArchivist
530f4fa122 Fix KeyErrors on display_url and expanded_url for certain users with broken profile links
Fixes #480
2022-05-29 17:23:43 +00:00
JustAnotherArchivist
dc6bc9bf9d Refactor how links on Twitter are handled
All links in text (tweets, profile descriptions, and profile links) are now represented by TextLink objects, which contain all relevant information: the displayed text (if available), the URL, the short t.co URL, and the indices in the text at which it appears.

Closes #478
2022-05-29 07:16:04 +00:00
JustAnotherArchivist
01cf6a09b3 Fix type of description URL objects 2022-05-29 05:08:23 +00:00
JustAnotherArchivist
ef7c4fad3e Fix AttributeError for DescriptionURL on from-import 2022-05-29 05:08:23 +00:00
Tristan Lee
65723f10ff fixed merge 2022-05-25 06:47:47 -05:00
Tristan Lee
07a5f6fd7d merged master into more-tg-info to update upstream PR 2022-05-25 01:18:48 -05:00
Tristan Lee
0822a9c354 Merge pull request #4 from JustAnotherArchivist/master
upstream merge
2022-05-24 23:10:38 -07:00
JustAnotherArchivist
faeffe2603 Merge pull request #474 from GeraniumKF/GeraniumKF-reddit-since-crash
Fix crash using --since with Reddit
2022-05-23 23:06:16 +00:00
Geranium
e3bdc02a7c Reddit: deprecate 'created' property for 'date'
This fixes a crash when using --since with the Reddit scraper,
as the CLI code expects items to have a date property.
2022-05-23 23:31:44 +01:00
Tristan Lee
e2d922301e forgot to save modified twitter.py module 2022-05-09 09:37:36 -05:00
Tristan Lee
b13e62eb5d Merge branch 'JustAnotherArchivist-master' 2022-05-09 09:35:35 -05:00
Tristan Lee
f38513503d fixed merge conflicts 2022-05-09 09:35:19 -05:00
Tristan Lee
0a4bd39ca6 Merge pull request #2 from bellingcat/telegram-media
Implemented JustAnotherArchivist's requested changes to Telegram scraper from PR
2022-05-09 07:23:39 -07:00
Tristan Lee
c18ca0f047 Merge branch 'master' into telegram-media 2022-05-09 09:21:40 -05:00
Tristan Lee
5648e957d0 improved consistency of code formatting and added _STYLE_MEDIA_URL_PATTERN as variable 2022-04-27 16:41:24 -05:00
Tristan Lee
21f7b620ec moved forward finding out of tgme_widget_message_text clause, since it wasn't correctly getting the forwarding information in forwarded posts that contained attachments but no text 2022-04-21 18:26:31 -05:00
Tristan Lee
9b3faec980 added additional attributes for hashtags and user mentions, removed redundant outlinks 2022-04-21 18:06:43 -05:00
Tristan Lee
97d38e5cde added additional termination criteria to Telegram scraper 2022-04-21 09:41:53 -05:00
Tristan Lee
b276c3cc27 fixed issue where some videos and photos weren't being scraped (because they weren't in a post containing a 'tgme_widget_message_text' div) 2022-04-17 06:50:43 -05:00
Tristan Lee
1e4e0c278d fixed issue where Telegram scraper terminated early because some pages didn't have a next page link (added reasonable default) 2022-04-17 04:33:22 -05:00
Tristan Lee
babcddda19 made Telegram scraper not return full channel info for forwarded_from attribute; fixed video edge cases. 2022-04-17 03:55:37 -05:00
JustAnotherArchivist
ed3ea944d1 Fix newsletter issue cards without an issue description
Fixes #456
2022-04-16 19:44:36 +00:00
JustAnotherArchivist
e7a6d38a5f Add support for community_details cards 2022-04-15 20:07:01 +00:00
JustAnotherArchivist
6c50eee31b Fix proxies not being applied correctly due to missing merge with environment settings
Fixes #447
2022-04-15 19:23:54 +00:00
JustAnotherArchivist
5103a33afa Fix t.co card URL replacement on retweets
Fixes #411
2022-04-15 03:18:45 +00:00
JustAnotherArchivist
247bd82d79 Refactor to tweetId variable 2022-04-15 03:14:29 +00:00
JustAnotherArchivist
5fc67f2bcf Add support for 'message me' cards 2022-04-15 02:52:37 +00:00
JustAnotherArchivist
65e7d8bd24 Fix warning on card URL translation to include the tweet ID 2022-04-15 02:52:03 +00:00
JustAnotherArchivist
3870282a42 Fix broadcast and event card crashes 2022-04-12 20:53:38 +00:00
JustAnotherArchivist
7c0fcdec43 Fix Periscope card crashes 2022-04-12 18:29:51 +00:00
JustAnotherArchivist
9af1f19034 Properly support all card types
Fixes #407
2022-04-12 18:11:26 +00:00
JustAnotherArchivist
5fc3c0e290 Fix crash in locals dumping on module-less frames 2022-04-12 18:03:36 +00:00
Tristan Lee
f978954bb3 Merge branch 'JustAnotherArchivist:master' into master 2022-04-03 01:49:28 -05:00
Tristan Lee
2ce014ade4 fixed edge case for videos that have data-link-attr but no href attribute 2022-04-03 01:45:25 -05:00
JustAnotherArchivist
5d156c6a15 Detect and raise error on redirect from GraphQL endpoint to login
#165
2022-04-03 02:34:30 +00:00
Tristan Lee
4e59638e7c added a forwardedUrl attribute to TelegramPost and made forwarded attribute type Channel. 2022-03-30 21:33:03 -05:00
Tristan Lee
a7eb54d226 implemented Media dataclasses for Telegram, and added variable for extracting a post's view count 2022-03-30 21:07:17 -05:00
Tristan Lee
d32c9add8a added capability to scrape multiple videos from a single post 2022-03-30 18:13:15 -05:00