From 42162c5e3f74a381cba45a15d37f60591c74ebf1 Mon Sep 17 00:00:00 2001
From: Patrick Robertson
Date: Mon, 17 Mar 2025 09:23:43 +0000
Subject: [PATCH] Various docs improvements based on Friday Office Hours discussion

---
 docs/source/installation/faq.md               | 60 +++++++++++++++++++
 docs/source/installation/installation.md      |  6 ++
 docs/source/installation/setup.md             |  2 +-
 docs/source/installation/upgrading.md         | 30 ++++++++++
 .../modules/generic_extractor/__manifest__.py |  3 +
 5 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/installation/faq.md
 create mode 100644 docs/source/installation/upgrading.md

diff --git a/docs/source/installation/faq.md b/docs/source/installation/faq.md
new file mode 100644
index 0000000..bc6b38f
--- /dev/null
+++ b/docs/source/installation/faq.md
@@ -0,0 +1,60 @@
+# Frequently Asked Questions
+
+
+### Q: What websites does the Auto Archiver support?
+**A:** The Auto Archiver works for a large variety of sites. Firstly, the Auto Archiver can download
+and archive any video website supported by YT-DLP, a powerful video-downloading tool ([full list of
+sites here](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)). Aside from these sites,
+there are various 'Extractors' for specific websites. See the full list of extractors that
+are available on the [extractors](../modules/extractor.md) page. Some sites supported include:
+
+* Twitter
+* Instagram
+* Telegram
+* VKontakte
+* TikTok
+* Bluesky
+
+```{note} What websites the Auto Archiver can archive depends on what extractors you have enabled in
+your configuration. See [configuration](./configurations.md) for more info.
+```
+
+### Q: Does the Auto Archiver only work for social media posts?
+**A:** No, the Auto Archiver can archive any web page on the internet, not just social media posts.
+However, for social media posts Auto Archiver can extract more relevant/useful information (such as
+post comments, likes, author etc.) which may not be available for a generic website. If you are looking
+to more generally archive webpages, then you should make sure to enable the [](../modules/autogen/extractor/wacz_extractor_enricher.md)
+and the [](../modules/autogen/extractor/wayback_extractor_enricher.md).
+
+### Q: What kind of data is stored for each webpage that's archived?
+**A:** This depends on the website archived, but more generally, for social media posts any videos and photos in
+the post will be archived. For video sites, the video will be downloaded separately. For most of these sites, additional
+metadata such as published date, uploader/author and ratings/comments will also be saved. Additionally, further data can be
+saved depending on the enrichers that you have enabled. Some other types of data saved are timestamps if you have the
+[](../modules/autogen/enricher/timestamping_enricher.md) or [](../modules/autogen/enricher/opentimestamps_enricher.md) enabled,
+screenshots of the web page with the [](../modules/autogen/enricher/screenshot_enricher.md), and for videos, thumbnails of the
+video with the [](../modules/autogen/enricher/thumbnail_enricher.md). You can also store things like hashes (SHA256, or pdq hashes)
+with the various hash enrichers.
+
+### Q: Where is my data stored?
+**A:** With the default configuration, data is stored on your local computer in the `local_storage` folder. You can adjust these settings by
+changing the [storage modules](../modules/storage.md) you have enabled. For example, you could choose to store your data in an S3 bucket or
+on Google Drive.
+
+```{note}
+You can choose to store your data in multiple places, for example your local drive **and** an S3 bucket for redundancy.
+```
+
+### Q: What should I do if something doesn't work?
+**A:** First, read through the log files to see if you can find a specific reason why something isn't working. Learn more about logging
+and how to enable debug logging in the [Logging Howto](../how_to/logging.md).
+
+If you cannot find an answer in the logs, then try searching this documentation or existing / closed issues on the [GitHub Issue Tracker](https://github.com/bellingcat/auto-archiver/issues?q=is%3Aissue%20). If you still cannot find an answer, then consider opening an issue on the GitHub Issue Tracker or asking in the Bellingcat Discord
+'Auto Archiver' group.
+
+#### Common reasons why archiving might not work:
+
+* The website may have temporarily adjusted its settings - sometimes sites like Telegram or Twitter adjust their scraping protection settings. Often,
+waiting a day or two and then trying again can work.
+* The site requires you to be logged in - make sure the
+* The website you're trying to archive has changed its settings/structure. Make sure you're using the latest version of Auto Archiver and try again.
diff --git a/docs/source/installation/installation.md b/docs/source/installation/installation.md
index eff0720..40b21cf 100644
--- a/docs/source/installation/installation.md
+++ b/docs/source/installation/installation.md
@@ -1,5 +1,11 @@
 # Installation
 
+```{toctree}
+:maxdepth: 1
+
+upgrading.md
+```
+
 There are 3 main ways to use the auto-archiver. We recommend the 'docker' method for most uses. This installs all the requirements in one command.
 
 1. Easiest (recommended): [via docker](#installing-with-docker)
diff --git a/docs/source/installation/setup.md b/docs/source/installation/setup.md
index f5b6e9d..0911b75 100644
--- a/docs/source/installation/setup.md
+++ b/docs/source/installation/setup.md
@@ -1,7 +1,6 @@
 # Getting Started
 
 ```{toctree}
-:maxdepth: 1
 :hidden:
 
 installation.md
@@ -9,6 +8,7 @@ configurations.md
 config_editor.md
 authentication.md
 requirements.md
+faq.md
 config_cheatsheet.md
 ```
 
diff --git a/docs/source/installation/upgrading.md b/docs/source/installation/upgrading.md
new file mode 100644
index 0000000..3c77dd8
--- /dev/null
+++ b/docs/source/installation/upgrading.md
@@ -0,0 +1,30 @@
+
+# Upgrading
+
+If an update is available, then you will see a message in the logs when you
+run Auto Archiver. Here's what those logs look like:
+
+```{code} bash
+********* IMPORTANT: UPDATE AVAILABLE ********
+A new version of auto-archiver is available (v0.13.6, you have 0.13.4)
+Make sure to update to the latest version using: `pip install --upgrade auto-archiver`
+```
+
+Upgrading Auto Archiver depends on the way you installed it.
+
+## Docker
+
+To upgrade using docker, update the docker image with:
+
+```
+docker pull bellingcat/auto-archiver:latest
+```
+
+## Pip
+
+To upgrade the pip package, use:
+
+```
+pip install --upgrade auto-archiver
+```
+
diff --git a/src/auto_archiver/modules/generic_extractor/__manifest__.py b/src/auto_archiver/modules/generic_extractor/__manifest__.py
index 274a4ba..128b006 100644
--- a/src/auto_archiver/modules/generic_extractor/__manifest__.py
+++ b/src/auto_archiver/modules/generic_extractor/__manifest__.py
@@ -15,6 +15,9 @@ supported by `yt-dlp`, such as YouTube, Facebook, and others. It provides functi
 for retrieving videos, subtitles, comments, and other metadata, and it integrates with the broader archiving framework.
+For a full list of video platforms supported by `yt-dlp`, see the
+[official documentation](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
+
 ### Features
 - Supports downloading videos and playlists.
 - Retrieves metadata like titles, descriptions, upload dates, and durations.