diff --git a/docs/source/how_to/authentication_how_to.md b/docs/source/how_to/01_authentication_how_to.md
similarity index 100%
rename from docs/source/how_to/authentication_how_to.md
rename to docs/source/how_to/01_authentication_how_to.md
diff --git a/docs/source/how_to/gsheets_setup.md b/docs/source/how_to/02_gsheets_setup.md
similarity index 100%
rename from docs/source/how_to/gsheets_setup.md
rename to docs/source/how_to/02_gsheets_setup.md
diff --git a/docs/source/how_to/logging.md b/docs/source/how_to/03_logging.md
similarity index 100%
rename from docs/source/how_to/logging.md
rename to docs/source/how_to/03_logging.md
diff --git a/docs/source/how_to/run_instagrapi_server.md b/docs/source/how_to/04_run_instagrapi_server.md
similarity index 100%
rename from docs/source/how_to/run_instagrapi_server.md
rename to docs/source/how_to/04_run_instagrapi_server.md
diff --git a/docs/source/how_to/upgrading_1_0_1_to_1_1_0.md b/docs/source/how_to/05_upgrading_to_1_1_0.md
similarity index 61%
rename from docs/source/how_to/upgrading_1_0_1_to_1_1_0.md
rename to docs/source/how_to/05_upgrading_to_1_1_0.md
index 81e00e2..57bc253 100644
--- a/docs/source/how_to/upgrading_1_0_1_to_1_1_0.md
+++ b/docs/source/how_to/05_upgrading_to_1_1_0.md
@@ -15,19 +15,29 @@ We have dropped the `vk_extractor` because of problems in a project we relied on
 Module 'vk_extractor' not found. Are you sure it's installed/exists?
 ```
 
+## Dropping `screenshot_enricher` module
+We have dropped the `screenshot_enricher` module because a new `antibot_extractor_enricher` (see below) module replaces its functionality more robustly and with less dependency hassle on geckodriver/firefox. You will need to remove it from your configuration file, otherwise you will see an error like:
+
+```{code} console
+Module 'screenshot_enricher' not found. Are you sure it's installed/exists?
+```
+
+
 ## New `antibot_extractor_enricher` module and VkDropin
-We have added a new `antibot_extractor_enricher` module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:
+We have added a new [`antibot_extractor_enricher`](../modules/autogen/extractor/antibot_extractor_enricher.md) module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:
 
 ```{code} yaml
 steps:
-    extractors:
-        - antibot_extractor_enricher
+  extractors:
+    - antibot_extractor_enricher
 
-    # or alternatively, if you want to use it as an enricher:
-    enrichers:
-        - antibot_extractor_enricher
+  # or alternatively, if you want to use it as an enricher:
+  enrichers:
+    - antibot_extractor_enricher
 ```
 
+It will take a full page screenshot, a PDF capture, extract HTML source code, and any other relevant media.
+
 It comes with Dropins that we will be adding and maintaining.
 
 > Dropin: A module with site-specific behaviours that is loaded automatically. You don't need to add them to your configuration steps for them to run. Sometimes they need `authentication` configurations though.
@@ -36,9 +46,9 @@ One such Dropin is the VkDropin which uses this automated browser to access VKon
 
 ```{code} yaml
 authentication:
-    vk:
-        username: your_username
-        password: your_password
+  vk.com:
+    username: your_username
+    password: your_password
 ```
 
 See all available Dropins in [the source code](https://github.com/bellingcat/auto-archiver/tree/main/src/auto_archiver/modules/antibot_extractor_enricher/dropins). Usually each Dropin needs its own authentication settings, similarly to the VkDropin.
\ No newline at end of file
diff --git a/docs/source/how_to/new_config_format.md b/docs/source/how_to/06_new_config_format.md
similarity index 100%
rename from docs/source/how_to/new_config_format.md
rename to docs/source/how_to/06_new_config_format.md
diff --git a/src/auto_archiver/modules/generic_extractor/twitter.py b/src/auto_archiver/modules/generic_extractor/twitter.py
index 189a7e6..9006e57 100644
--- a/src/auto_archiver/modules/generic_extractor/twitter.py
+++ b/src/auto_archiver/modules/generic_extractor/twitter.py
@@ -7,8 +7,7 @@ from slugify import slugify
 from auto_archiver.core.metadata import Metadata, Media
 from auto_archiver.utils import url as UrlUtil, get_datetime_from_str
 from auto_archiver.core.extractor import Extractor
-
-from .dropin import GenericDropin, InfoExtractor
+from auto_archiver.modules.generic_extractor.dropin import GenericDropin, InfoExtractor
 
 
 class Twitter(GenericDropin):