Commit Graph

  • 773fa82f06 introduces reddit dropin msramalho 2025-06-10 16:31:19 +01:00
  • ef0e909a72 extractor to auto detect best quality msramalho 2025-06-10 16:29:35 +01:00
  • 6bbc7fb47a improves antibot flow and makes auth_wall detection optional msramalho 2025-06-10 16:29:07 +01:00
  • 809b8c7749 default dropin introduced msramalho 2025-06-10 16:14:42 +01:00
  • 6d82655cc4 manifest improvement for antibot msramalho 2025-06-10 16:14:34 +01:00
  • 6bd493a791 dropin with new ytdlp feature and helper method msramalho 2025-06-10 16:11:55 +01:00
  • 287e823f43 improves twitter URL cleaning and introduces another bestquality check msramalho 2025-06-10 16:09:38 +01:00
  • c815488daa adds new URLs to ignore msramalho 2025-06-10 15:44:52 +01:00
  • f53e34d6bd Bump webrecorder/browsertrix-crawler from 1.6.1 to 1.6.2 dependabot[bot] 2025-06-09 20:55:07 +00:00
  • 4cfbc3008b Merge pull request #313 from bellingcat/feat/antibot-auth Miguel Sozinho Ramalho 2025-06-08 14:42:35 +01:00
  • 6f02493ff1 adds clips extraction to VK, though generic_extractor should still be run for those msramalho 2025-06-08 14:36:55 +01:00
  • 1f2d637928 minor improvements msramalho 2025-06-08 14:16:21 +01:00
  • 18cc05a2fe allows auth_for_site to receive do.main directly msramalho 2025-06-08 14:16:12 +01:00
  • c96fd71f35 minor cleanup msramalho 2025-06-07 20:06:53 +01:00
  • b3183510ea installs ffmpeg in GH actions msramalho 2025-06-07 20:03:26 +01:00
  • d13a5ef003 adds tests in minor improvements msramalho 2025-06-07 19:58:18 +01:00
  • 48c1ab3c1f doc improvements msramalho 2025-06-07 19:14:16 +01:00
  • b2ee42ee95 adds the first antibot dropin: VKontakte msramalho 2025-06-07 19:10:01 +01:00
  • 07ff5baf07 adds Dropin flexible integration for antibot msramalho 2025-06-07 19:09:37 +01:00
  • d202d79e0f lint msramalho 2025-06-07 19:06:14 +01:00
  • e2e6490b49 minimal changes msramalho 2025-06-07 18:15:21 +01:00
  • 952487da30 adds missing bin dependency msramalho 2025-06-07 18:14:42 +01:00
  • c7a84bc97a generalizes ydl info to filename method for reusing msramalho 2025-06-07 18:14:08 +01:00
  • c0be41950d Merge branch 'dev' of https://github.com/bellingcat/auto-archiver into dev msramalho 2025-06-04 17:06:42 +01:00
  • ae547ef83f Merge pull request #308 from bellingcat/dependabot/npm_and_yarn/scripts/settings/actions-a541a3dacb Miguel Sozinho Ramalho 2025-06-04 15:06:59 +01:00
  • 8a897cf601 minimal changes: standard naming msramalho 2025-06-04 15:06:08 +01:00
  • 14c8af5cc8 Merge pull request #310 from djhmateer/waczscreenshot bug fix Miguel Sozinho Ramalho 2025-06-04 15:01:12 +01:00
  • 8e2e18ef75 Merge pull request #311 from bellingcat/feat/seleniumbase Miguel Sozinho Ramalho 2025-06-04 14:53:31 +01:00
  • 5491f3e9e7 fixing s3 storage tests msramalho 2025-06-04 14:41:00 +01:00
  • 264ba82ea0 finish removing screenshot_enricher references msramalho 2025-06-04 14:31:07 +01:00
  • 05231445d9 removes unnecessary ignored files msramalho 2025-06-04 14:19:25 +01:00
  • 2c6be4447f linting msramalho 2025-06-04 14:17:38 +01:00
  • 5f68c151a0 removes webdriver utils used by screenshot enricher msramalho 2025-06-04 14:17:19 +01:00
  • 6d2aec032f Merge remote-tracking branch 'origin/main' into dev msramalho 2025-06-04 14:15:14 +01:00
  • bc8cf2fb29 minor TODO msramalho 2025-06-04 14:10:19 +01:00
  • f066111d49 removes geckodriver dependencies following screenshot enricher removal msramalho 2025-06-04 14:09:13 +01:00
  • e6f3826a3a dropping screenshot enricher msramalho 2025-06-04 12:08:59 +01:00
  • e5a78a5d06 antibot can be used out of the box msramalho 2025-06-04 12:01:42 +01:00
  • 258fb4faaf visual HTML preview improvements msramalho 2025-06-04 12:00:40 +01:00
  • 5ec00f7811 adds dependencies for seleniumbase msramalho 2025-06-04 12:00:22 +01:00
  • 22408e2a98 adds test for antibot msramalho 2025-06-04 11:59:59 +01:00
  • 378b1a6d22 expand S3 objects content type for better preview results in non-latin languages msramalho 2025-06-04 11:53:41 +01:00
  • d130c1b3fa WIP attempt at ytdlp impersonation msramalho 2025-06-04 11:53:18 +01:00
  • cbd189c97d general cleanup msramalho 2025-06-04 11:53:01 +01:00
  • d2e8f1a512 introduces antibot step with seleniumbase msramalho 2025-06-04 11:20:46 +01:00
  • 488802b632 poetry update msramalho 2025-06-04 11:08:44 +01:00
  • c772082f0e counter_screenshots to counter_warc_files in wacz_extractor so don't get error about add multiple items with same id. Dave Mateer 2025-06-03 12:30:18 +01:00
  • ee68f3efee Merge remote-tracking branch 'origin/main' into feat/seleniumbase msramalho 2025-06-03 11:05:16 +01:00
  • efe2a1a8b6 Bump the actions group in /scripts/settings with 4 updates dependabot[bot] 2025-06-02 20:21:07 +00:00
  • 6735fa890b v1.0.1 dependency updates, generic extractor improvements (#307) v1.0.1 Miguel Sozinho Ramalho 2025-06-02 20:57:12 +01:00
  • 69028588b3 linting msramalho 2025-06-02 20:04:34 +01:00
  • b351a33593 version bump msramalho 2025-06-02 20:03:48 +01:00
  • 87e1cdc102 adds yt-dlp curl-cffi msramalho 2025-06-02 20:02:35 +01:00
  • 4170c2011c formatting msramalho 2025-06-02 19:33:55 +01:00
  • dd4e372703 use ffmpeg -bitexact to reduce duplicate content storing msramalho 2025-06-02 19:33:53 +01:00
  • b9f7927a3b closes 305 and further fixes finding local downloads from uncommon ytdlp extractors msramalho 2025-06-02 19:14:09 +01:00
  • d99b7c9efe Merge remote-tracking branch 'origin/main' into dev msramalho 2025-06-02 13:25:34 +01:00
  • 48be13fb2a catch for if self.comments are true but no actual comments in video (#303) Dave Mateer 2025-06-02 13:02:19 +01:00
  • 4aae5047f5 Changed log level on gsheet_feeder_db started from warning to info (#301) Dave Mateer 2025-06-02 12:53:21 +01:00
  • 258e56aa26 poetry updates msramalho 2025-06-02 12:52:26 +01:00
  • 9ad6213efa npm version bump on scripts/settings msramalho 2025-06-02 12:47:31 +01:00
  • 2f36e50e0b bumps browsertrix in docker image msramalho 2025-06-02 12:06:14 +01:00
  • 2d7206f99d REMOVES vk_extractor until further notice msramalho 2025-06-02 12:06:02 +01:00
  • ac24fd8f49 improves generic extractor edge-cases and yt-dlp updates msramalho 2025-06-02 12:03:51 +01:00
  • ee3e871dd8 wacz: allow exceptional cases where more than one resource image is available msramalho 2025-05-28 11:53:29 +01:00
  • e6fdef66df improves instructions on docker setup with an example URL msramalho 2025-04-28 11:16:01 +01:00
  • 5cf640af8a experiments with seleniumbase msramalho 2025-04-28 11:08:00 +01:00
  • 33cacd145f Update tests-download.yaml Miguel Sozinho Ramalho 2025-04-07 21:15:18 +01:00
  • 0f69b5fe0c update repo badges v1.0.0 Miguel Sozinho Ramalho 2025-03-31 16:19:29 +01:00
  • ad2e8397b2 Merge pull request #287 from bellingcat/fix/insta_tbot_empty Erin Clark 2025-03-31 14:31:46 +01:00
  • 144adaad5b Only return success for instagram_tbot_extractor.py with content. erinhmclark 2025-03-31 14:14:36 +01:00
  • c7c7eb00a1 Merge pull request #286 from bellingcat/version_comparison Erin Clark 2025-03-31 12:40:42 +01:00
  • 7e4ba62918 Small code change erinhmclark 2025-03-31 12:05:39 +01:00
  • 9c2b506189 update runner os to matrix os. erinhmclark 2025-03-31 12:00:24 +01:00
  • 8940580638 Add poetry cache clear, and small code change erinhmclark 2025-03-31 11:41:26 +01:00
  • c2821d7c83 Fix poetry install deletion erinhmclark 2025-03-31 11:25:51 +01:00
  • a590647279 Small code tidy to trigger tests. erinhmclark 2025-03-31 11:23:49 +01:00
  • 1edfdae03e Update download tests to match cache process. erinhmclark 2025-03-31 11:17:40 +01:00
  • 6c7f6af4b4 Add cache action with key to OS, py version and lockfile hash, and install packages from source. erinhmclark 2025-03-31 11:11:56 +01:00
  • 8685b6bf13 Merge pull request #285 from bellingcat/fix-ubuntu-22 Erin Clark 2025-03-28 15:38:03 +00:00
  • 0ce7f5a1b5 Disable caching Patrick Robertson 2025-03-28 18:40:02 +04:00
  • 85d3f2fa02 Revert changes Patrick Robertson 2025-03-28 18:36:11 +04:00
  • fd540bd03a Code change to trigger tests Patrick Robertson 2025-03-28 18:26:14 +04:00
  • 86f328515c Use cache key that includes os version Patrick Robertson 2025-03-28 18:24:24 +04:00
  • 68992025b0 Update version comparison. erinhmclark 2025-03-28 14:29:44 +00:00
  • 6544934825 Merge pull request #283 from bellingcat/1.0-release Patrick Robertson 2025-03-28 18:06:59 +04:00
  • 197599b406 Merge pull request #284 from bellingcat/revert-downloads-test Patrick Robertson 2025-03-28 18:06:49 +04:00
  • 96efdcbba1 Merge pull request #281 from bellingcat/add_inst_api_script Erin Clark 2025-03-28 13:58:37 +00:00
  • 2ec494b4b9 Revert downloads CI tests changes Patrick Robertson 2025-03-28 17:55:58 +04:00
  • 1d18399d70 Merge pull request #222 from bellingcat/feat/yt-dlp-pots Erin Clark 2025-03-28 13:54:27 +00:00
  • 3550a009e6 v1.0.0 release 🎉 Patrick Robertson 2025-03-28 13:53:29 +00:00
  • dd7d85b4b4 Lock erinhmclark 2025-03-28 13:47:18 +00:00
  • c510c04643 Update config reference in test_generic_extractor.py erinhmclark 2025-03-28 13:43:46 +00:00
  • a0d955fe84 lock erinhmclark 2025-03-28 13:39:58 +00:00
  • 5e7c57650b Update "default" to "auto" for clarity, update docs erinhmclark 2025-03-28 13:16:16 +00:00
  • 1db7d6702d Update the documentation erinhmclark 2025-03-28 12:27:18 +00:00
  • b1a8792f9f Remove duplicate line erinhmclark 2025-03-28 11:44:37 +00:00
  • f715100dd5 Add run_instagrapi_server.sh and update docs erinhmclark 2025-03-28 11:31:23 +00:00
  • dbcf19d1b8 Update update path reference erinhmclark 2025-03-28 10:55:21 +00:00
  • 0840b7283c Format erinhmclark 2025-03-28 10:43:00 +00:00