Commit Graph

  • 29901da601 Merge branch 'load_modules' into docs_update Patrick Robertson 2025-02-11 14:10:56 +00:00
  • 895c843f04 Add a cheat sheet for configs and better folder structure for core modules Patrick Robertson 2025-02-11 14:06:53 +00:00
  • 2f51d3917a Further addition to docs: creating modules, configurations, installation Patrick Robertson 2025-02-11 13:49:30 +00:00
  • aa5ac18d6a Merge pull request #189 from bellingcat/add_module_tests Erin Clark 2025-02-11 13:11:41 +00:00
  • 7d87b858d6 Merge branch 'load_modules' into docs_update Patrick Robertson 2025-02-11 13:09:38 +00:00
  • c8cd7ea63c Merge branch 'load_modules' into add_module_tests erinhmclark 2025-02-11 13:08:08 +00:00
  • 977618b4ce doc: adds note about telethon vs telegram extractors msramalho 2025-02-11 13:04:59 +00:00
  • d90d3cec28 fix telethon_extractor setup msramalho 2025-02-11 13:03:18 +00:00
  • 977f06c37a renames api_db property for clarity msramalho 2025-02-11 12:56:33 +00:00
  • 5c59029221 updates api_db for new API endpoint msramalho 2025-02-11 12:53:58 +00:00
  • 4eeb39477c improves gsheetdb feedback on retrieve sheet failure msramalho 2025-02-11 12:53:46 +00:00
  • 6fdd5f0e66 fix cases of single : vs :: in entrypoint msramalho 2025-02-11 12:53:12 +00:00
  • e6594ad3dc merge result into cached results for context preservation msramalho 2025-02-11 12:52:42 +00:00
  • 7309cd32e7 fix: context to be updated on Metadata.merge msramalho 2025-02-11 12:51:17 +00:00
  • d1d6cde008 Set mock timestamp without z format erinhmclark 2025-02-11 12:27:48 +00:00
  • 5e2e93382f Test fixes for 3.10 compliance. erinhmclark 2025-02-11 12:17:42 +00:00
  • f97ec6a9e0 Fixed S3 module import erinhmclark 2025-02-11 11:58:28 +00:00
  • 89d9140d15 Fixed setup/ config_setup reference erinhmclark 2025-02-11 11:47:11 +00:00
  • 1792e02d1d skip authenticated tests in test_gdrive_storage.py erinhmclark 2025-02-11 11:34:36 +00:00
  • 18666ff027 skip authenticated tests in test_gsheet_feeder.py erinhmclark 2025-02-11 11:28:24 +00:00
  • a69ac3e509 Fix file hash reference in S3 tests erinhmclark 2025-02-11 09:46:22 +00:00
  • ed81dcdaf0 Remove dangling 'b = ' from config.py Patrick Robertson 2025-02-10 23:07:03 +00:00
  • e7273bc741 Fix link Patrick Robertson 2025-02-10 23:04:41 +00:00
  • dbc564e18b Add sphinx_book_theme theme to poetry Patrick Robertson 2025-02-10 22:58:52 +00:00
  • 2650cd8fb2 Use a script to auto-generate documentation for the core modules from the manifest file Patrick Robertson 2025-02-10 22:51:04 +00:00
  • 8d894066f2 Merge branch 'load_modules' into add_module_tests erinhmclark 2025-02-10 19:00:05 +00:00
  • 3dae2337a1 remove cdn_url check before storage. erinhmclark 2025-02-10 18:56:46 +00:00
  • e97ccf8a73 Separate setup() and module_setup(). erinhmclark 2025-02-10 18:07:47 +00:00
  • 2c3d1f591f Separate setup() and module_setup(). erinhmclark 2025-02-10 17:25:15 +00:00
  • 12f14cccc9 fixes gsheet feeder<->db connection via context. msramalho 2025-02-10 16:58:35 +00:00
  • ab6cf52533 fixes bad hash initialization msramalho 2025-02-10 16:45:28 +00:00
  • 824728739a Start fleshing out the docs more - rearrange, separate out modules section, move files over to md (from rst) Patrick Robertson 2025-02-10 16:24:16 +00:00
  • c4bb667cec Merge branch 'load_modules' into add_module_tests erinhmclark 2025-02-10 16:17:08 +00:00
  • f311621e58 Small fixes. Add timestamp helper method. erinhmclark 2025-02-10 15:57:42 +00:00
  • 15abf686b1 decouples s3_storage from hash_enricher msramalho 2025-02-10 15:48:54 +00:00
  • 8fb3dc754b fixing telethon extractor to use default entrypoint msramalho 2025-02-10 14:59:51 +00:00
  • 7c848046e8 adds better info about wrong/missing modules msramalho 2025-02-10 14:59:32 +00:00
  • f3f6b92817 Implementation test cleanup Patrick Robertson 2025-02-10 12:43:21 +00:00
  • 74207d7821 Implementation tests for auto-archiver Patrick Robertson 2025-02-10 13:27:11 +01:00
  • e9dd321dcd Fix setting cli_feeder as default feeder on clean install Patrick Robertson 2025-02-10 13:06:24 +01:00
  • 1fad37fd93 Remove blank file Patrick Robertson 2025-02-07 23:08:30 +01:00
  • 63aba6ad39 Fix sphinx-autoapi imports Patrick Robertson 2025-02-07 21:54:49 +01:00
  • 950624dd4b Fix S3 storage to media in whisper_enricher.py. erinhmclark 2025-02-07 20:26:00 +00:00
  • 2920cf685f Small fixes to whisper_enricher.py. erinhmclark 2025-02-07 12:35:40 +00:00
  • e9ad1e1b85 Pass media to storage cdn_call erinhmclark 2025-02-06 22:01:55 +00:00
  • 266c7a14e6 Context related fixes, some more tests. erinhmclark 2025-02-06 16:53:00 +00:00
  • 67504a683e Merge branch 'load_modules' into add_module_tests erinhmclark 2025-02-06 10:13:37 +00:00
  • 5b0bad832f Updated test, test metadata erinhmclark 2025-02-06 10:11:56 +00:00
  • a506f2a88f Clarify that an extractor's method can also return False if no valid data was found Patrick Robertson 2025-02-06 10:19:28 +01:00
  • 6ab8fd2ee4 Tidy up setting modules as Orchestrator attributes on startup. Patrick Robertson 2025-02-05 20:39:53 +01:00
  • 52542812dc Merge tests from version with context. erinhmclark 2025-02-05 16:42:58 +00:00
  • 48abb5e66b Remove dangling screenshot_enricher file. Moved to modules/screenshot_enricher Patrick Robertson 2025-02-04 18:16:03 +01:00
  • 91ca325fd5 Update yt-dlp to latest version + remove code no longer needed from bluesky dropin Patrick Robertson 2025-02-04 17:46:46 +01:00
  • 0633e17998 Close the facebook 'login' window if it's there - to allow for proper screenshots Patrick Robertson 2025-02-04 14:18:46 +01:00
  • 034197a81f Fix typos in csv feeder docs (in manifest) Patrick Robertson 2025-02-04 13:40:07 +01:00
  • 78e6418249 Unit tests for csv feeder + fix some bugs Patrick Robertson 2025-02-04 13:37:17 +01:00
  • b301f60ea3 Fix using validators set in __manifest__.py Patrick Robertson 2025-02-04 13:36:05 +01:00
  • a873e56b87 Remove old csv_feeder file - now inside a module Patrick Robertson 2025-02-04 12:57:35 +01:00
  • 72b5ea9ab6 Restore headless arg Patrick Robertson 2025-02-03 17:40:40 +01:00
  • c574b694ed Set up screenshot enricher to use authentication/cookies Patrick Robertson 2025-02-03 17:25:59 +01:00
  • 7ec328ab40 Remove cookie options from generic_extractor - it now uses 'authentication' global settings :D Patrick Robertson 2025-02-03 16:04:36 +01:00
  • 7a2be5a0da Add cookie extraction to 'authentication' options, get generic_extractor working using this info Patrick Robertson 2025-02-03 16:03:07 +01:00
  • 9c9e9b370e Remove lingering reference to ArchivingContext Patrick Robertson 2025-02-03 16:02:38 +01:00
  • 9a8c94b641 Fix getting/setting folder context for metadata Patrick Robertson 2025-02-03 16:02:17 +01:00
  • c25d5cae84 Remove ArchivingContext completely Patrick Robertson 2025-01-30 17:50:54 +01:00
  • d76063c3f3 Fix unit tests Patrick Robertson 2025-01-30 16:46:53 +01:00
  • d6b4b7a932 Further cleanup Patrick Robertson 2025-01-30 16:43:09 +01:00
  • 953011f368 Don't make modules 'dataclasses' Patrick Robertson 2025-01-30 14:39:52 +01:00
  • 527438826c Fix manifests for required configs. erinhmclark 2025-01-30 13:04:51 +00:00
  • fade68c6f4 Fix up unit tests - dataclass + subclasses not having @dataclass was breaking it Patrick Robertson 2025-01-30 13:45:24 +01:00
  • b7d9145f6c Further tidyups + refactoring for new structure Patrick Robertson 2025-01-30 13:21:10 +01:00
  • cddae65a90 Update modules for new core structure. erinhmclark 2025-01-30 08:42:23 +00:00
  • 18ff36ce15 Add ruamel to dependencies (replaces pyyaml) Patrick Robertson 2025-01-29 19:37:41 +01:00
  • 00a7018f36 Fix up dependency checking (use 'dependencies' instead of 'external_dependencies' -> simpler/easier to remember Patrick Robertson 2025-01-29 19:25:22 +01:00
  • 3d37c494aa Tidy ups + unit tests: Patrick Robertson 2025-01-29 18:42:12 +01:00
  • 4c1c8953ca Add unit tests for timestamping_enricher Patrick Robertson 2025-01-29 12:20:52 +01:00
  • dcd5576f29 set metadata enricher to requires_setup=True (requires exiftool which isn't installed by default on most machines) Patrick Robertson 2025-01-29 00:10:40 +01:00
  • 7a4871db6b Fix up unit tests for new structure Patrick Robertson 2025-01-28 14:40:12 +01:00
  • 9635449ac0 more user friendly error logging when config issues are found Patrick Robertson 2025-01-28 11:44:52 +01:00
  • 27b25c5bd4 Validate orchestration.yaml file inputs - so if a user enters invalid values, it also validates them Patrick Robertson 2025-01-28 11:37:23 +01:00
  • 1d2a1d4db7 Allow framework for config settings that should not be stored in config (e.g. cli_feeder.urls Patrick Robertson 2025-01-28 11:14:12 +01:00
  • 57b3bec935 Google sheets feeder and database implemented. erinhmclark 2025-01-27 20:13:12 +00:00
  • 6c67effd8c remove name reference in local_storage.py erinhmclark 2025-01-27 19:17:18 +00:00
  • e1a9373336 Refactoring for new config setup erinhmclark 2025-01-27 19:03:02 +00:00
  • e3074013d0 Fix loading/saving to orchestration file with comments Patrick Robertson 2025-01-27 14:28:04 +01:00
  • f68e2726f2 Refactor loader + step into module, use LazyBaseModule and BaseModule Patrick Robertson 2025-01-27 14:01:36 +01:00
  • 7fd95866a1 Further fixes/changes to loading 'types' for config + manifest edits Patrick Robertson 2025-01-27 11:48:04 +01:00
  • 14e2479599 Merge branch 'more_mainifests' into load_modules Patrick Robertson 2025-01-27 11:05:56 +01:00
  • 0b03f54f4e Fix up config validation, and allow for custom 'validators' Patrick Robertson 2025-01-27 11:00:52 +01:00
  • ebebd27897 Fix archiver to extractor naming erinhmclark 2025-01-27 09:11:45 +00:00
  • 21a7ff0520 Fix types in manifests erinhmclark 2025-01-27 08:43:18 +00:00
  • 96b35a272c Rm gsheet references in utils erinhmclark 2025-01-24 18:51:15 +00:00
  • dd402b456f Fix and add types to manifest erinhmclark 2025-01-24 18:50:11 +00:00
  • 3fc6ddfe85 Tweaks to logging strings Patrick Robertson 2025-01-24 15:30:00 +01:00
  • f1e9ab6751 Merge branch 'main' into load_modules Patrick Robertson 2025-01-24 15:23:15 +01:00
  • e8138eac1c Add ubuntu-latest to the matrix of test runners (#181) Patrick Robertson 2025-01-24 15:03:55 +01:00
  • a6fc4e1bb1 modifies base docker image to use browsertrix 1.4.2 (#182) Miguel Sozinho Ramalho 2025-01-24 13:59:29 +00:00
  • 1942e8b819 Gsheets utility revert erinhmclark 2025-01-24 13:34:30 +00:00
  • 024fe58377 fix config parsing in manifests, remove module level configs erinhmclark 2025-01-24 13:33:12 +00:00
  • 0453d95f56 fix config parsing in manifests erinhmclark 2025-01-24 13:24:54 +00:00