Commit Graph

  • d5b406bc1b Update API parameters to what Twitter currently uses JustAnotherArchivist 2022-06-23 19:50:17 +00:00
  • 56e4232083 fixed typo Tristan Lee 2022-06-23 11:51:13 -05:00
  • 50899c01f3 Fix crash on malformed guest token cache file JustAnotherArchivist 2022-06-16 17:12:04 +00:00
  • bcad6923c2 Rename Tweet.content to rawContent and User.description to renderedDescription for consistency JustAnotherArchivist 2022-06-14 00:35:02 +00:00
  • 0d361685ff Fix AttributeError crash on scrapers using the default CLI constructor JustAnotherArchivist 2022-06-01 17:35:38 +00:00
  • 530f4fa122 Fix KeyErrors on display_url and expanded_url for certain users with broken profile links JustAnotherArchivist 2022-05-29 17:23:38 +00:00
  • dc6bc9bf9d Refactor how links on Twitter are handled JustAnotherArchivist 2022-05-29 07:16:04 +00:00
  • 01cf6a09b3 Fix type of description URL objects JustAnotherArchivist 2022-05-29 05:06:51 +00:00
  • ef7c4fad3e Fix AttributeError for DescriptionURL on from-import JustAnotherArchivist 2022-05-29 05:06:34 +00:00
  • 65723f10ff fixed merge Tristan Lee 2022-05-25 06:47:47 -05:00
  • 07a5f6fd7d merged master into more-tg-info to update upstream PR Tristan Lee 2022-05-25 01:18:48 -05:00
  • 0822a9c354 Merge pull request #4 from JustAnotherArchivist/master Tristan Lee 2022-05-24 23:10:38 -07:00
  • faeffe2603 Merge pull request #474 from GeraniumKF/GeraniumKF-reddit-since-crash JustAnotherArchivist 2022-05-23 23:06:16 +00:00
  • e3bdc02a7c Reddit: deprecate 'created' property for 'date' Geranium 2022-05-23 22:43:33 +01:00
  • e2d922301e forgot to save modified twitter.py module Tristan Lee 2022-05-09 09:37:36 -05:00
  • b13e62eb5d Merge branch 'JustAnotherArchivist-master' Tristan Lee 2022-05-09 09:35:35 -05:00
  • f38513503d fixed merge conflicts Tristan Lee 2022-05-09 09:35:19 -05:00
  • 0a4bd39ca6 Merge pull request #2 from bellingcat/telegram-media Tristan Lee 2022-05-09 07:23:39 -07:00
  • c18ca0f047 Merge branch 'master' into telegram-media Tristan Lee 2022-05-09 09:21:40 -05:00
  • 5648e957d0 improved consistency of code formatting and added _STYLE_MEDIA_URL_PATTERN as variable Tristan Lee 2022-04-27 16:41:24 -05:00
  • 21f7b620ec moved forward finding out of tgme_widget_message_text clause, since it wasn't correctly getting the forwarding information in forwarded posts that contained attachments but no text Tristan Lee 2022-04-21 18:26:31 -05:00
  • 9b3faec980 added additional attributes for hashtags and user mentions, removed redundant outlinks Tristan Lee 2022-04-21 18:06:43 -05:00
  • 97d38e5cde added additional termination criteria to Telegram scraper Tristan Lee 2022-04-21 09:41:53 -05:00
  • b276c3cc27 fixed issue where some videos and photos weren't being scraped (because they weren't in a post containing a 'tgme_widget_message_text' div) Tristan Lee 2022-04-17 06:50:43 -05:00
  • 1e4e0c278d fixed issue where Telegram scraper terminated early because some pages didn't have a next page link (added reasonable default) Tristan Lee 2022-04-17 04:33:22 -05:00
  • babcddda19 made Telegram scraper not return full channel info for forwarded_from attribute; fixed video edge cases. Tristan Lee 2022-04-17 03:55:37 -05:00
  • ed3ea944d1 Fix newsletter issue cards without an issue description JustAnotherArchivist 2022-04-16 19:44:36 +00:00
  • e7a6d38a5f Add support for community_details cards JustAnotherArchivist 2022-04-15 20:07:01 +00:00
  • 6c50eee31b Fix proxies not being applied correctly due to missing merge with environment settings JustAnotherArchivist 2022-04-15 19:23:54 +00:00
  • 5103a33afa Fix t.co card URL replacement on retweets JustAnotherArchivist 2022-04-15 03:18:45 +00:00
  • 247bd82d79 Refactor to tweetId variable JustAnotherArchivist 2022-04-15 03:14:29 +00:00
  • 5fc67f2bcf Add support for 'message me' cards JustAnotherArchivist 2022-04-15 02:52:37 +00:00
  • 65e7d8bd24 Fix warning on card URL translation to include the tweet ID JustAnotherArchivist 2022-04-15 02:52:03 +00:00
  • 3870282a42 Fix broadcast and event card crashes JustAnotherArchivist 2022-04-12 20:53:38 +00:00
  • 7c0fcdec43 Fix Periscope card crashes JustAnotherArchivist 2022-04-12 18:29:51 +00:00
  • 9af1f19034 Properly support all card types JustAnotherArchivist 2022-04-12 18:11:26 +00:00
  • 5fc3c0e290 Fix crash in locals dumping on module-less frames JustAnotherArchivist 2022-04-12 18:03:36 +00:00
  • f978954bb3 Merge branch 'JustAnotherArchivist:master' into master Tristan Lee 2022-04-03 01:49:28 -05:00
  • 2ce014ade4 fixed edge case for videos that have data-link-attr but no href attribute Tristan Lee 2022-04-03 01:45:25 -05:00
  • 5d156c6a15 Detect and raise error on redirect from GraphQL endpoint to login JustAnotherArchivist 2022-04-03 02:34:30 +00:00
  • 4e59638e7c added a forwardedUrl attribute to TelegramPost and made forwarded attribute type Channel. Tristan Lee 2022-03-30 21:33:03 -05:00
  • a7eb54d226 implemented Media dataclasses for Telegram, and added variable for extracting a post's view count Tristan Lee 2022-03-30 21:07:17 -05:00
  • d32c9add8a added capability to scrape multiple videos from a single post Tristan Lee 2022-03-30 18:13:15 -05:00
  • fb8d73ac95 handled case where channel has no profile image Tristan Lee 2022-03-29 13:15:53 -05:00
  • ed829163a0 added capability to extract the number of channel members when the string in membersDiv has the word 'subscribers' rather than 'members'. Tristan Lee 2022-03-29 01:12:07 -05:00
  • 694657ef80 Fix broken exception references JustAnotherArchivist 2022-03-09 01:01:47 +00:00
  • b8efce2a12 Clean up unnecessary imports Logan Williams 2022-03-08 15:10:15 +01:00
  • 1ab0f4fccb Fix missing quoted tweet reference in certain buggy cases JustAnotherArchivist 2022-03-07 22:16:58 +00:00
  • 3a92b5bf0d Add log message for guest token file deletion JustAnotherArchivist 2022-02-26 19:32:55 +00:00
  • 2480b173f4 Fix crash on race condition in CLI guest token manager resets JustAnotherArchivist 2022-02-26 19:31:08 +00:00
  • de4ebed81f Fix KeyError caused by retweets without URLs in TwitterProfileScraper Logan Williams 2022-02-24 18:08:12 +01:00
  • 72b26f2373 Scrape images, video, and post forwarding information for Telegram channel posts Logan Williams 2020-10-15 08:14:14 -07:00
  • 77bbb9f61f Remove useless pass JustAnotherArchivist 2022-02-20 18:54:51 +00:00
  • 57a624c618 Merge pull request #410 from AccentuSoft/master JustAnotherArchivist 2022-02-18 06:01:35 +00:00
  • b1cfd51121 Implementing changes AccentuSoft 2022-02-17 21:52:15 +02:00
  • ace2c16f54 Fix Vkontakte-user module crash on users with millions of followers AccentuSoft 2022-02-17 15:42:46 +02:00
  • 2f9c0457df Convert t.co card URLs to unshortened when possible JustAnotherArchivist 2022-02-17 01:50:15 +00:00
  • 878f2a3c7a Handle cards without descriptions and thumbnails JustAnotherArchivist 2022-02-17 01:49:32 +00:00
  • 25ee014e29 Extract cards JustAnotherArchivist 2022-02-16 02:59:21 +00:00
  • a192dc6236 Handle TweetWithVisibilityResults JustAnotherArchivist 2022-02-14 18:08:59 +00:00
  • a7242f340b Remove obsolete TODO JustAnotherArchivist 2022-02-14 18:08:29 +00:00
  • 359cc25cdf Fix crash on entity attribute when scraping suspended users JustAnotherArchivist 2022-02-10 04:22:59 +00:00
  • 01799a7391 Detect when CLI guest token from file has expired JustAnotherArchivist 2022-02-08 19:38:45 +00:00
  • b0753c34ed Fix forgotten method name changes in 7d939c11 JustAnotherArchivist 2022-02-08 15:35:49 +00:00
  • 7f78fa0bc0 Recurse through all tweets encountered, not only ones with a positive replyCount JustAnotherArchivist 2022-02-07 18:13:56 +00:00
  • 8702a9c7e2 Add Reddit submission scraper JustAnotherArchivist 2022-02-07 04:43:54 +00:00
  • 8ac1fd3ea8 Refactor Pushshift code to separate the general things from the search JustAnotherArchivist 2022-02-07 04:43:19 +00:00
  • 9235890f9a Fix KeyError crash on attempting to scrape inexistent tweet ID JustAnotherArchivist 2022-02-07 04:04:21 +00:00
  • 7d939c110c Port profile and tweet scrapers to GraphQL API JustAnotherArchivist 2022-02-07 03:49:14 +00:00
  • 8e95e9a9a7 Fix crash on places without a bounding box JustAnotherArchivist 2022-02-07 00:38:22 +00:00
  • aa7d7d3dc3 Refactor automatic importing in snscrape.modules to something less hacky JustAnotherArchivist 2022-02-05 03:22:24 +00:00
  • 560c78c5cf Make all optional scraper arguments keyword-only and fix Mastodon argument style to conform with the other scrapers JustAnotherArchivist 2022-01-30 00:21:18 +00:00
  • 107c3c71c2 Remove unnecessary f-strings JustAnotherArchivist 2022-01-28 21:22:13 +00:00
  • 7f88678253 Merge pull request #359 from own3dh2so4/master JustAnotherArchivist 2022-01-13 23:08:28 +00:00
  • 52e4f9fb69 Added proxy option to Scraper base David Garcia Alvarez 2022-01-13 16:56:00 +01:00
  • eebdfc1c55 Refactor username vs ID mess JustAnotherArchivist 2022-01-12 22:36:26 +00:00
  • e6076353c8 Fix user ID being a string instead of an int on the entity JustAnotherArchivist 2022-01-12 22:35:50 +00:00
  • a32d79fab2 Fix crash on certain mblogs that lack the raw_text attribute JustAnotherArchivist 2022-01-12 22:31:49 +00:00
  • 65391297f6 Move CLI methods to end of class definition for consistent code style JustAnotherArchivist 2022-01-12 21:09:38 +00:00
  • deb2659dd6 Prefix CLI-related methods with an underscore JustAnotherArchivist 2022-01-12 21:07:10 +00:00
  • 93e62744d7 Fix missing timezone info JustAnotherArchivist 2022-01-07 00:42:09 +00:00
  • 3f3632d341 Add support for Mastodon profile and toot scrapes JustAnotherArchivist 2022-01-06 03:25:06 +00:00
  • 5070953feb Skip private fields and properties on dataclass-to-JSON conversion JustAnotherArchivist 2022-01-06 02:08:48 +00:00
  • 853848ed5d ScrollDirection is not part of the public API JustAnotherArchivist 2022-01-05 19:43:19 +00:00
  • 0b4abdc43f Fix baseUrl on tweet scrapes JustAnotherArchivist 2022-01-05 02:39:54 +00:00
  • 267b7d0e32 Rename CLI classmethods JustAnotherArchivist 2022-01-05 02:27:09 +00:00
  • acb7f10a4f Cache Twitter tokens on disk from the CLI for reuse between scrapes JustAnotherArchivist 2022-01-05 02:20:40 +00:00
  • afb6bfc429 add feature_request and question templates TheTechRobo 2022-01-04 12:55:41 -05:00
  • ec5626097a Create bug_report.yml TheTechRobo 2022-01-04 12:39:49 -05:00
  • ca00b480b1 Fix AssertionError on quoted comments JustAnotherArchivist 2022-01-04 01:15:08 +00:00
  • f189ab4241 Prefix all private API names with an underscore JustAnotherArchivist 2022-01-03 17:51:23 +00:00
  • c6e1e33a23 Fix crashing typos JustAnotherArchivist 2022-01-03 17:49:55 +00:00
  • a37ea528d3 Refactor Reddit scrapers again to merge RedditPushshiftScraper and RedditScraper JustAnotherArchivist 2022-01-03 17:47:46 +00:00
  • eee06d8593 Refactor Reddit scrapers into a more reasonable code structure JustAnotherArchivist 2021-12-24 04:58:32 +00:00
  • 4dd3ee6e47 Refactor Instagram scrapers to get rid of the awkward mode parameter JustAnotherArchivist 2021-12-24 04:50:53 +00:00
  • 0336ce13ed Add support for fetching a guest token from the API JustAnotherArchivist 2021-12-23 04:26:50 +00:00
  • 193d4f80d6 Fix user agent in API headers staying constant JustAnotherArchivist 2021-12-23 04:25:23 +00:00
  • e7d35ec1eb Fix date parsing on quoted posts JustAnotherArchivist 2021-12-15 16:55:14 +00:00
  • 8540045658 Fix typo JustAnotherArchivist 2021-12-15 16:36:28 +00:00
  • 1f1c1bd8af Fix docstring style JustAnotherArchivist 2021-12-14 20:05:51 +00:00