Commit Graph

  • 17d9bf694f fix docker image so as not to remove browsertrix files msramalho 2023-09-06 17:07:10 +01:00
  • 368395ffa8 Merge pull request #88 from djhmateer/v6-test Miguel Sozinho Ramalho 2023-08-28 11:09:28 +01:00
  • 21d7d2e16c format youtubedl_archiver.py Miguel Sozinho Ramalho 2023-08-28 11:09:03 +01:00
  • 0bbb4c9b08 Added noplaylist true to youtubedl so that videos in playlists will work Dave Mateer 2023-08-27 17:26:36 +01:00
  • a30607801f Bump version to v0.6.6 for release v0.6.6 msramalho 2023-08-24 17:10:16 +01:00
  • c75d54a4ec Merge pull request #87 from bellingcat/fix-wacz Miguel Sozinho Ramalho 2023-08-24 17:09:49 +01:00
  • 804fcb1204 browsertrix dependencies isolated into dockerfile msramalho 2023-08-24 16:57:58 +01:00
  • b2adceff25 Bump version to v0.6.5 for release v0.6.5 msramalho 2023-08-24 12:43:49 +01:00
  • 92a0a92b47 closes #86 msramalho 2023-08-24 12:43:28 +01:00
  • bf3c04b3fc Bump version to v0.6.4 for release v0.6.4 msramalho 2023-08-18 21:25:17 +01:00
  • 7eebecdb2c update dependencies msramalho 2023-08-18 21:25:13 +01:00
  • b17b5953dd closes #59 msramalho 2023-08-17 18:11:58 +01:00
  • ceb717ea65 exclude vk emojis msramalho 2023-08-17 18:11:26 +01:00
  • 6e4fb76940 exclude ok resource images from wacz enricher msramalho 2023-08-09 11:26:46 +01:00
  • 810a31b1f0 fix: whisper handle error http code msramalho 2023-08-08 18:06:48 +01:00
  • 8b15d733b1 adds whisper endpoints msramalho 2023-08-05 14:03:57 +01:00
  • ca37d54b7f Bump version to v0.6.3 for release v0.6.3 msramalho 2023-08-05 13:58:39 +01:00
  • a1742b5565 fixing whisper enricher msramalho 2023-08-05 13:57:09 +01:00
  • 60a1f3a27a minor fixes msramalho 2023-07-31 16:08:48 +01:00
  • 31c07a02e1 Bump version to v0.6.2 for release v0.6.2 msramalho 2023-07-28 13:10:14 +01:00
  • bd231488ff parameter fix msramalho 2023-07-28 13:10:06 +01:00
  • fb197f1064 excluding telegram embeds msramalho 2023-07-28 12:57:15 +01:00
  • ec1a78e973 Bump version to v0.6.1 for release v0.6.1 msramalho 2023-07-28 12:51:37 +01:00
  • 139bdec051 excludes files from perceptual hash msramalho 2023-07-28 12:51:24 +01:00
  • f15a70f859 missing hash_enricher import msramalho 2023-07-28 12:51:04 +01:00
  • 419eaef449 fixes unsued tmp_dir msramalho 2023-07-28 12:50:52 +01:00
  • 1695954c98 new metadata enricher msramalho 2023-07-28 12:46:30 +01:00
  • aa71c85a98 improving ignored content from waczs msramalho 2023-07-28 12:19:14 +01:00
  • 7a5c9c65bd detects duplicates before storing, eg: wacz getting media already fetched by another archiver msramalho 2023-07-28 10:51:48 +01:00
  • fc93ebaba0 cleanup msramalho 2023-07-28 10:49:39 +01:00
  • 1b44a302cd removing some reverse search engines msramalho 2023-07-28 10:49:20 +01:00
  • 1368f7aebc feat: making grayscale a toggle msramalho 2023-07-28 10:49:03 +01:00
  • e3a0003a47 adding WACZ screenshots msramalho 2023-07-27 21:36:25 +01:00
  • 59551b3b20 minor improvements: finding best twitter image quality msramalho 2023-07-27 21:36:15 +01:00
  • f086d89111 new escape message msramalho 2023-07-27 20:14:59 +01:00
  • 3dd3775cbd removes rearchiving logic msramalho 2023-07-27 20:14:50 +01:00
  • 1e66a2c905 Bump version to v0.6.0 for release v0.6.0 msramalho 2023-07-27 15:42:29 +01:00
  • e8f44b652e minor improvements msramalho 2023-07-27 15:42:23 +01:00
  • dd034da844 feat: WACZ enricher can now be probed for media, and used as an archiver OR enricher msramalho 2023-07-27 15:42:10 +01:00
  • 65e3c99483 Bump version to v0.5.28 for release v0.5.28 msramalho 2023-07-26 16:13:14 +01:00
  • 888ad8f004 fix: twitter hack videos extension detection msramalho 2023-07-26 16:12:56 +01:00
  • 086a9e6c84 fix: remove unnecessary log msramalho 2023-07-11 12:17:15 +01:00
  • 4d80ee6f02 Bump version to v0.5.27 for release v0.5.27 msramalho 2023-07-11 12:16:06 +01:00
  • 92569ae6be fix: telegram archiver was outdated for images msramalho 2023-07-11 12:15:56 +01:00
  • abaf86c776 Bump version to v0.5.26 for release v0.5.26 msramalho 2023-07-02 18:42:59 +02:00
  • 8005a1955a fixes #82 twitter api walls msramalho 2023-07-02 18:42:43 +02:00
  • b7889a182d readme update msramalho 2023-06-26 18:18:46 +01:00
  • 04f827f183 Bump version to v0.5.25 for release v0.5.25 msramalho 2023-06-26 18:15:45 +01:00
  • 485901da3c security update msramalho 2023-06-26 18:15:19 +01:00
  • a2c6cdc111 Bump version to v0.5.24 for release v0.5.24 msramalho 2023-06-26 17:58:47 +01:00
  • 8bb7883eeb Merge pull request #81 from emieldatalytica/add_perceptual_hash Miguel Sozinho Ramalho 2023-06-26 17:34:27 +01:00
  • a0971fc601 final code review changes msramalho 2023-06-26 17:32:19 +01:00
  • 0cba2c25c6 get all media method msramalho 2023-06-26 17:28:19 +01:00
  • 7c0b05b276 new column msramalho 2023-06-26 17:27:57 +01:00
  • 3bbfdf6eba fix: excluding screenshots msramalho 2023-06-26 17:27:49 +01:00
  • a7a6bda1c2 improve missing col behaviour to error log msramalho 2023-06-26 17:27:37 +01:00
  • d80145002d formatter to accommodate properties of inner media msramalho 2023-06-26 17:06:50 +01:00
  • b4f86d0e8d refactor to hash all images and save hex string msramalho 2023-06-26 17:06:30 +01:00
  • 6cf3e109ed refactor discovery of inner media elements msramalho 2023-06-26 17:05:25 +01:00
  • d4f983e575 adds missing lib numpy msramalho 2023-06-26 16:55:19 +01:00
  • 88b07d777b cleanup example file msramalho 2023-06-26 16:55:05 +01:00
  • 222e6ddb28 add perceptual hashing with pdq Emiel de Heij 2023-06-26 15:42:44 +02:00
  • 3e340b2580 change to old status Emiel de Heij 2023-06-26 15:37:47 +02:00
  • 9fc09c724b add module for perceptual hashing with pdq Emiel de Heij 2023-06-26 15:25:55 +02:00
  • f6e5a14d75 add dependencies Emiel de Heij 2023-06-26 15:24:55 +02:00
  • 0e9c765b96 Merge pull request #80 from brrttwrks/update_orchestration_example Miguel Sozinho Ramalho 2023-06-26 13:25:52 +01:00
  • 87f553661b add csb_db config to exapmle.orchestration.yaml Eric Nicholas Barrett 2023-06-21 20:54:14 +04:00
  • cc66ee3fd4 bump to patch 23 v0.5.23 Logan Williams 2023-06-06 12:24:43 -06:00
  • b3b727b005 Fix ValueError v0.5.22 v0.5.21 Logan Williams 2023-06-06 12:13:08 -06:00
  • ee37b20e6c fix: on missing col msramalho 2023-05-24 20:25:30 +01:00
  • a184bf7b97 Bump version to v0.5.20 for release v0.5.20 msramalho 2023-05-24 20:24:35 +01:00
  • e535f44a88 optional folder msramalho 2023-05-24 20:24:15 +01:00
  • 0f28bf0e35 Bump version to v0.5.19 for release v0.5.19 msramalho 2023-05-24 19:57:51 +01:00
  • 18a8636552 feat: new DB for auto-archiver-api msramalho 2023-05-24 19:24:53 +01:00
  • 81be65c828 Bump version to v0.5.18 for release v0.5.18 msramalho 2023-05-24 11:19:02 +01:00
  • 0a91863212 typing fixes msramalho 2023-05-24 11:18:39 +01:00
  • 3ad8349e3f Bump version to v0.5.17 for release v0.5.17 msramalho 2023-05-23 19:05:53 +01:00
  • 2768225cd1 fix: generator not called msramalho 2023-05-23 19:05:47 +01:00
  • 3e44b9b577 Bump version to v0.5.16 for release v0.5.16 msramalho 2023-05-23 18:12:56 +01:00
  • 1a5797d0f8 feat: orchestrator fed returns archive result msramalho 2023-05-23 18:12:04 +01:00
  • 768b8fce9f Bump version to v0.5.15 for release v0.5.15 msramalho 2023-05-19 12:35:26 +01:00
  • 613b1f1e50 properly overwrite configs msramalho 2023-05-19 12:35:19 +01:00
  • 919c37bfb6 Bump version to v0.5.14 for release v0.5.14 msramalho 2023-05-19 12:18:02 +01:00
  • a655b3c987 gsheet accepts ID too msramalho 2023-05-19 12:17:34 +01:00
  • d645b840ee disable duplicate GH actions msramalho 2023-05-19 12:17:03 +01:00
  • 3da9c9cf8f Bump version to v0.5.13 for release v0.5.13 msramalho 2023-05-19 11:49:38 +01:00
  • 987bbcaad0 removes conflicting unused dep msramalho 2023-05-19 11:49:29 +01:00
  • 68e9d2a2ce allows yaml config to be overwritten msramalho 2023-05-19 11:49:02 +01:00
  • 76be271c18 Update workflows to work with main branch Logan Williams 2023-05-15 10:14:53 +02:00
  • 074f132ad9 Merge branch 'dockerize' Logan Williams 2023-05-11 15:09:02 +02:00
  • c47da0a46f Fix issue with profiles in browsertrix Logan Williams 2023-05-11 15:08:27 +02:00
  • eb82936a04 Merge pull request #76 from bellingcat/dockerize Miguel Sozinho Ramalho 2023-05-11 13:57:37 +01:00
  • cc03ad7c49 Update README.md Miguel Sozinho Ramalho 2023-05-11 13:55:28 +01:00
  • 6d2aa3dd7a Add invocation example Logan Williams 2023-05-11 14:32:23 +02:00
  • f2e580de4e Update README images Logan Williams 2023-05-11 14:30:27 +02:00
  • 3f48d75d8f Merge branch 'dockerize' of github.com:bellingcat/auto-archiver into dockerize Logan Williams 2023-05-11 11:33:47 +02:00
  • 80ea912d0e Update README Logan Williams 2023-05-11 11:32:46 +02:00
  • b7c69c0f0d Bump version to v0.5.12 for release v0.5.12 msramalho 2023-05-10 18:58:34 +01:00
  • c98991cdfb fix: vk-url-scraper version update msramalho 2023-05-10 18:57:45 +01:00
  • 45b982ec38 fix: max chars on sheets cell msramalho 2023-05-10 18:57:33 +01:00