improves docs for how-to and migrations

This commit is contained in:
msramalho
2025-06-11 13:37:03 +01:00
parent 3cf51dd874
commit e567bba6f9
7 changed files with 20 additions and 11 deletions

View File

@@ -15,19 +15,29 @@ We have dropped the `vk_extractor` because of problems in a project we relied on
Module 'vk_extractor' not found. Are you sure it's installed/exists?
```
## Dropping `screenshot_enricher` module
We have dropped the `screenshot_enricher` module because a new `antibot_extractor_enricher` (see below) module replaces its functionality more robustly and with less dependency hassle on geckodriver/firefox. You will need to remove it from your configuration file, otherwise you will see an error like:
```{code} console
Module 'screenshot_enricher' not found. Are you sure it's installed/exists?
```
## New `antibot_extractor_enricher` module and VkDropin
We have added a new `antibot_extractor_enricher` module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:
We have added a new [`antibot_extractor_enricher`](../modules/autogen/extractor/antibot_extractor_enricher.md) module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:
```{code} yaml
steps:
extractors:
- antibot_extractor_enricher
extractors:
- antibot_extractor_enricher
# or alternatively, if you want to use it as an enricher:
enrichers:
- antibot_extractor_enricher
# or alternatively, if you want to use it as an enricher:
enrichers:
- antibot_extractor_enricher
```
It will take a full page screenshot, a PDF capture, extract HTML source code, and any other relevant media.
It comes with Dropins that we will be adding and maintaining.
> Dropin: A module with site-specific behaviours that is loaded automatically. You don't need to add them to your configuration steps for them to run. Sometimes they need `authentication` configurations though.
@@ -36,9 +46,9 @@ One such Dropin is the VkDropin which uses this automated browser to access VKon
```{code} yaml
authentication:
vk:
username: your_username
password: your_password
vk.com:
username: your_username
password: your_password
```
See all available Dropins in [the source code](https://github.com/bellingcat/auto-archiver/tree/main/src/auto_archiver/modules/antibot_extractor_enricher/dropins). Usually each Dropin needs its own authentication settings, similarly to the VkDropin.