# Module Documentation
These pages describe the core modules that come with auto-archiver and provide the main functionality for archiving websites on the internet. There are five core module types:
- Feeders - these 'feed' information (the URLs) from various sources to auto-archiver for processing
- Extractors - these 'extract' the page data for a given URL that is fed in by a feeder
- Enrichers - these 'enrich' the data extracted in the previous step with additional information
- Storage - these 'store' the data in a persistent location (on disk, Google Drive, etc.)
- Databases - these 'store' the status of the entire archiving process in a log file or database
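The module types can be pictured as a pipeline that each URL passes through in the order listed above. The following is a minimal, hypothetical Python sketch of that flow, assuming that order. The `Item` class and the `feeder`, `extractor`, `enricher`, `storage` and `database` functions are illustrative names only, not auto-archiver's actual classes or API. The "page data" is simulated, and storage and database are in-memory stand-ins for disk, Google Drive or a real log file/database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


# Hypothetical record passed between steps (not auto-archiver's own class).
@dataclass
class Item:
    url: str
    data: dict = field(default_factory=dict)
    status: str = "pending"


def feeder() -> Iterator[str]:
    # Feeder: yields URLs from some source (here, a hard-coded list).
    yield from ["https://example.com/a", "https://example.com/b"]


def extractor(url: str) -> Item:
    # Extractor: gets the page data for one URL (simulated, no network access).
    return Item(url=url, data={"html": f"<html>{url}</html>"})


def enricher(item: Item) -> Item:
    # Enricher: adds extra information derived from the extracted data.
    item.data["length"] = len(item.data["html"])
    return item


def storage(item: Item, store: dict[str, dict]) -> None:
    # Storage: persists a copy of the data (in memory instead of disk/Drive).
    store[item.url] = dict(item.data)


def database(item: Item, log: list[tuple[str, str]]) -> None:
    # Database: records the archiving status of each item.
    item.status = "archived"
    log.append((item.url, item.status))


def run() -> tuple[dict[str, dict], list[tuple[str, str]]]:
    # Drive every URL from the feeder through the remaining steps in order.
    store: dict[str, dict] = {}
    log: list[tuple[str, str]] = []
    for url in feeder():
        item = enricher(extractor(url))
        storage(item, store)
        database(item, log)
    return store, log
```

Keeping each step as a separate function mirrors the idea of interchangeable module types: swapping `storage` for a version that writes to disk would not require changing the other steps.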
```{toctree}
:maxdepth: 1
:caption: Core Modules
:hidden:

modules/config_cheatsheet
modules/feeder
modules/extractor
modules/enricher
modules/storage
modules/database
modules/formatter
```