mirror of
https://github.com/bellingcat/auto-archiver.git
synced 2026-06-11 12:48:28 +03:00
Docs tidy ups and re-organising
@@ -3,6 +3,24 @@

This section of the documentation provides guidelines for configuring the tool.

## Configuring using a file

The recommended way to configure auto-archiver for long-term and deployed projects is a configuration file, typically called `orchestration.yaml`. This is a YAML file containing all the settings for your entire workflow.

The orchestration file is split into two parts: `steps` (which [steps](../flow_overview.md) to use) and `configurations` (settings for the individual modules).

A default `orchestration.yaml` will be created for you the first time you run auto-archiver (without any arguments). Here's what it looks like:

<details>
<summary>View example orchestration.yaml</summary>

```{literalinclude} ../example.orchestration.yaml
:language: yaml
:caption: orchestration.yaml
```

</details>

## Configuring from the Command Line

You can run auto-archiver directly from the command line, without the need for a configuration file. Command line arguments are parsed using the format `module_name.config_value`. For example, a config value of `api_key` in the `instagram_extractor` module would be passed on the command line with the flag `--instagram_extractor.api_key=API_KEY`.

@@ -14,23 +32,10 @@ auto-archiver --instagram_extractor.api_key=123 --other_module.setting --store
# will store the new settings into the configuration file (default: orchestration.yaml)
```

```{note} Arguments passed on the command line override those saved in your settings file. Save them to your config file using the `-s` or `--store` flag.
```

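To make the `module_name.config_value` convention and the override rule more concrete, here is a minimal Python sketch. It is not auto-archiver's actual parser: the helper names `parse_dotted_flags` and `apply_overrides`, the `timeout` setting, and the sample values are all assumptions made for illustration.

```python
def parse_dotted_flags(argv):
    """Turn ['--module.key=value', ...] into {'module': {'key': 'value'}}."""
    parsed = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # this sketch ignores bare flags such as --store
        name, value = arg[2:].split("=", 1)
        if "." not in name:
            continue  # not a module-scoped setting
        module, key = name.split(".", 1)
        parsed.setdefault(module, {})[key] = value
    return parsed


def apply_overrides(file_config, cli_config):
    """Return a new config in which command-line values win over file values."""
    merged = {module: dict(settings) for module, settings in file_config.items()}
    for module, settings in cli_config.items():
        merged.setdefault(module, {}).update(settings)
    return merged


# Hypothetical values standing in for what a saved orchestration.yaml might hold.
saved = {"instagram_extractor": {"api_key": "from-file", "timeout": "30"}}
cli = parse_dotted_flags(["--instagram_extractor.api_key=123", "--store"])
effective = apply_overrides(saved, cli)
print(effective)
```

In this sketch the command-line `api_key` replaces the saved one, while the settings that were not passed on the command line keep their saved values.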
## Seeing all Configuration Options

View the configurable settings for the core modules on the individual doc pages for each [](../core_modules.md).
You can also view all settings available for the modules you have on your system using the `--help` flag in auto-archiver.

@@ -38,21 +38,52 @@ Docker works like a virtual machine running inside your computer, it isolates ev
2. `$PWD/local_archive` is a folder `local_archive/` in case you want to archive locally and have the files accessible outside Docker
3. `/app/local_archive` is a folder inside Docker that you can reference in your `orchestration.yaml` file

### Example invocations

The invocations below will run the auto-archiver Docker image using a configuration file that you have specified:

```bash
# all the configurations come from ./secrets/orchestration.yaml
docker run --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver --config secrets/orchestration.yaml
# uses the same configurations but for another Google Sheets document
# with a header on row 2 and with some different column names
# notice that columns is a dictionary, so you need to pass it as JSON; it will override only the values provided
docker run --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver --config secrets/orchestration.yaml --gsheet_feeder.sheet="use it on another sheets doc" --gsheet_feeder.header=2 --gsheet_feeder.columns='{"url": "link"}'
# all the configurations come from orchestration.yaml, and s3 files are set to private
docker run --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver --config secrets/orchestration.yaml --s3_storage.private=1
```

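The comment about `columns` in the invocations above describes a partial override: only the keys present in the JSON are replaced. Below is a minimal Python sketch of that behaviour. The `DEFAULT_COLUMNS` mapping and the `override_columns` helper are hypothetical; the real defaults live in the `gsheet_feeder` module and may differ.

```python
import json

# Hypothetical defaults, for illustration only.
DEFAULT_COLUMNS = {"url": "link to archive", "status": "archive status"}


def override_columns(defaults, json_text):
    """Merge a JSON object passed on the command line over the default mapping."""
    overrides = json.loads(json_text)
    if not isinstance(overrides, dict):
        raise ValueError("columns must be passed as a JSON object")
    # Keys present in the JSON replace the defaults; all other keys are kept.
    return {**defaults, **overrides}


columns = override_columns(DEFAULT_COLUMNS, '{"url": "link"}')
print(columns)
```

Here only `url` changes, and `status` keeps its default, which is the behaviour the comment describes for `--gsheet_feeder.columns='{"url": "link"}'`.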
## Installing Locally with Pip

1. Make sure you have Python 3.10 or higher installed
2. Install the package with your preferred package manager: `pip/pipenv/conda install auto-archiver` or `poetry add auto-archiver`
3. Test that it is installed with `auto-archiver --help`
4. Install the other local dependency requirements (see [Installing Local Requirements](#installing-local-requirements))
5. Run it with your orchestration file, passing any flags you want on the command line: `auto-archiver --config secrets/orchestration.yaml` if your orchestration file is inside a `secrets/` folder, which we advise

### Example invocations

Once all your [local requirements](#installing-local-requirements) are correctly installed, the invocations below will run auto-archiver using a configuration file that you have specified:

```bash
# all the configurations come from ./secrets/orchestration.yaml
auto-archiver --config secrets/orchestration.yaml
# uses the same configurations but for another Google Sheets document
# with a header on row 2 and with some different column names
# notice that columns is a dictionary, so you need to pass it as JSON; it will override only the values provided
auto-archiver --config secrets/orchestration.yaml --gsheet_feeder.sheet="use it on another sheets doc" --gsheet_feeder.header=2 --gsheet_feeder.columns='{"url": "link"}'
# all the configurations come from orchestration.yaml, and s3 files are set to private
auto-archiver --config secrets/orchestration.yaml --s3_storage.private=1
```

### Installing Local Requirements

If using the local installation method, you will also need to install the following dependencies locally:

1. [ffmpeg](https://www.ffmpeg.org/) - for handling of downloaded videos
2. [firefox](https://www.mozilla.org/en-US/firefox/new/) and [geckodriver](https://github.com/mozilla/geckodriver/releases) in a folder on your `PATH`, such as `/usr/local/bin` - for taking webpage screenshots with the screenshot enricher
3. (optional) [fonts-noto](https://fonts.google.com/noto) to deal with multiple unicode characters during selenium/geckodriver's screenshots: `sudo apt install fonts-noto -y`.
4. [Browsertrix Crawler docker image](https://hub.docker.com/r/webrecorder/browsertrix-crawler) for the WACZ enricher/archiver

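As a quick sanity check before running a local installation, the requirements listed above can be probed with a short Python sketch. This is not part of auto-archiver. It assumes the executables are named `ffmpeg`, `firefox` and `geckodriver` and are reachable through `PATH`, which may not hold on every system (for example, Firefox is often not on `PATH` on macOS). The helper name `missing_requirements` is hypothetical.

```python
import shutil
import sys

# Assumed executable names; adjust them if your system uses different ones.
REQUIRED_TOOLS = ("ffmpeg", "firefox", "geckodriver")


def missing_requirements(which=shutil.which, version=sys.version_info):
    """Return a list of requirements that could not be found."""
    problems = []
    if tuple(version[:2]) < (3, 10):
        problems.append("python>=3.10")
    # shutil.which returns None when an executable is not found on PATH.
    problems.extend(tool for tool in REQUIRED_TOOLS if which(tool) is None)
    return problems


if __name__ == "__main__":
    missing = missing_requirements()
    print("missing:", ", ".join(missing) if missing else "nothing")
```

The `which` and `version` parameters exist only so the check can be exercised without touching the real environment. The optional fonts and the Browsertrix Crawler Docker image are not covered by this sketch.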