WIP: Docs tidyups+add howto on logging and authentication

(Authentication is WIP)
2026-06-11 20:58:29 +03:00 · 2025-02-19 10:29:05 +00:00
parent 3c543a3a6a
commit 222a94563f
15 changed files with 173 additions and 82 deletions
--- a/docs/source/installation/authentication.md
+++ b/docs/source/installation/authentication.md
@@ -0,0 +1,72 @@
+# Authentication
+
+The Authentication framework for auto-archiver allows you to add login details for various websites in a flexible way, directly from the configuration file.
+
+There are two main use cases for authentication:
+* Some websites require authentication to view their content. Examples include Facebook and Telegram.
+* Some websites use anti-bot systems to block bot-like tools from accessing the website. Adding real login information to auto-archiver can sometimes bypass this.
+
+## The Authentication Config
+
+You can save your authentication information directly inside your orchestration config file, or in a separate file (for security/multi-deploy purposes). The configuration format is the same in both cases. Currently, auto-archiver supports the following authentication types:
+
+**Username & Password:**
+- `username`: str - the username to use for login
+- `password`: str - the password to use for login
+
+**API:**
+- `api_key`: str - the API key to use for login
+- `api_secret`: str - the API secret to use for login
+
+**Cookies:**
+- `cookie`: str - a cookie string to use for login (specific to this site)
+- `cookies_from_browser`: str - load cookies from this browser, for this site only.
+- `cookies_file`: str - load cookies from this file, for this site only.
+
+```{note} 
+
+The Username & Password and API settings only work with the Generic Extractor. Other modules (like the screenshot enricher) can only use the cookie options. Furthermore, many sites can still detect bots and block username/password logins; Twitter/X and YouTube are two prominent examples.
+
+One of the 'Cookies' options is recommended for the most robust archiving.
+```
+
+```{code} yaml
+authentication:
+   # optional file to load authentication information from, for security or multi-system deploy purposes
+   load_from_file: path/to/authentication/file.txt
+   # optional setting to load cookies from the named browser on the system, for **ALL** websites
+   cookies_from_browser: firefox
+   # optional setting to load cookies from a cookies.txt/cookies.jar file, for **ALL** websites. See note below on extracting these
+   cookies_file: path/to/cookies.jar
+
+   mysite.com:
+      username: myusername
+      password: 123
+
+   facebook.com:
+      cookie: single_cookie
+
+   othersite.com:
+      api_key: 123
+      api_secret: 1234
+
+```
+
+
+### Recommendations for authentication
+
+1. **Store authentication information separately:**
+The authentication part of your configuration contains sensitive information, so take care not to share it with others. For extra security, use the `load_from_file` option to keep your authentication settings out of your configuration file, ideally in a different folder.
+
+2. **Don't use your own personal credentials:**
+Depending on the website you are extracting information from, there may be rules (Terms of Service) that prohibit you from scraping or extracting information using a bot. If you use your own personal account, it might get blocked or disabled. It's recommended to set up a separate, 'throwaway' account. That way, if it gets blocked, you can easily create another one to continue your archiving.
+
+
+### How to create a cookies.jar or pass cookies directly to auto-archiver
+
+auto-archiver uses yt-dlp's powerful cookies features under the hood. For instructions on how to extract a cookies.jar (or cookies.txt) file directly from your browser, see the FAQ in the [yt-dlp documentation](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp).
+
+```{note} For developers:
+
+For information on how to access and use authentication settings from within your module, see the `{generic_extractor}` for an example, or view the [`auth_for_site()` function in BaseModule](../autoapi/core/base_module/index.rst).
+```
--- a/docs/source/installation/configurations.md
+++ b/docs/source/installation/configurations.md
@@ -23,7 +23,7 @@ A default `orchestration.yaml` will be created for you the first time you run au

 ## Configuring from the Command Line

-You can run auto-archiver directy from the command line, without the need for a configuration file, command line arguments are parsed using the format `module_name.config_value`. For example, a config value of `api_key` in the `instagram_extractor` module would be passed on the command line with the flag `--instagram_extractor.api_key=API_KEY`.
+You can run auto-archiver directly from the command line, without the need for a configuration file. Command line arguments are parsed using the format `module_name.config_value`. For example, a config value of `api_key` in the `instagram_extractor` module would be passed on the command line with the flag `--instagram_extractor.api_key=API_KEY`.

 The command line arguments are useful for testing or editing config values and enabling/disabling modules on the fly. When you are happy with your settings, you can store them back in your configuration file by passing the `-s/--store` flag on the command line.

--- a/docs/source/installation/installation.md
+++ b/docs/source/installation/installation.md
@@ -5,6 +5,7 @@
 :hidden:

 configurations.md
+authentication.md
 config_cheatsheet.md
 ```