Files
auto-archiver/docs/source/how_to/authentication.md
Patrick Robertson 3c543a3a6a Various fixes for issues with new architecture (#208)
* Add formatters to the TOC - fixes #204

* Add 'steps' settings to the example YAML in the docs. Fixes #206

* Improved docs on authentication architecture

* Fix setting modules on the command line - they now override any module settings in the orchestration as opposed to appending

* Fix tests for gsheet-feeder: add a test service_account.json (note: not real keys in there)

* Rename the command line entrypoint to _command_line_run

Also: make it clear that code implementation should not call this
Make sure the command line entry returns (we don't want a generator)

* Fix unit tests to use now code-entry points

* Version bump

* Move iterating of generator up to __main__

* Breakpoint

* two minor fixes

* Fix unit tests + add new '__main__' entry point implementation test

* Skip youtube tests if running on CI. Should still run them locally

* Fix full implementation run on GH actions

* Fix skipif test for GH Actions CI

* Add skipifs for truth - it blocks GH:

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2025-02-18 19:10:09 +00:00

3.2 KiB

Authentication

The Authentication framework for auto-archiver allows you to add login details for various websites in a flexible way, directly from the configuration file.

There are two main use cases for authentication:

  • Some websites require some kind of authentication in order to view the content. Examples include Facebook, Telegram etc.
  • Some websites use anti-bot systems to block bot-like tools from accessig the website. Adding real login information to auto-archiver can sometimes bypass this.

The Authentication Config

You can save your authentication information directly inside your orchestration config file, or as a separate file (for security/multi-deploy purposes). Whether storing your settings inside the orchestration file, or as a separate file, the configuration format is the same.

authentication:
   # optional file to load authentication information from, for security or multi-system deploy purposes
   load_from_file: path/to/authentication/file.txt
   # optional setting to load cookies from the named browser on the system.
   cookies_from_browser: firefox
   # optional setting to load cookies from a cookies.txt/cookies.jar file. See note below on extracting these
   cookies_file: path/to/cookies.jar

   twitter.com,x.com:
      username: myusername
      password: 123
    
    facebook.com:
       cookie: single_cookie

    othersite.com:
       api_key: 123
       api_secret: 1234

# All available options:
  # - username: str - the username to use for login
  # - password: str - the password to use for login
  # - api_key: str - the API key to use for login
  # - api_secret: str - the API secret to use for login
  # - cookie: str - a cookie string to use for login (specific to this site)

Recommendations for authentication

  1. Store authentication information separately: The authentication part of your configuration contains sensitive information. You should make efforts not to share this with others. For extra security, use the load_from_file option to keep your authentication settings out of your configuration file, ideally in a different folder.

  2. Don't use your own personal credentials Depending on the website you are extracting information from, there may be rules (Terms of Service) that prohibit you from scraping or extracting information using a bot. If you use your own personal account, there's a possibility it might get blocked/disabled. It's recommended to set up a separate, 'throwaway' account. In that way, if it gets blocked you can easily create another one to continue your archiving.

How to create a cookies.jar or pass cookies directly to auto-archiver

auto-archiver uses yt-dlp's powerful cookies features under the hood. For instructions on how to extract a cookies.jar (or cookies.txt) file directly from your browser, see the FAQ in the yt-dlp documentation


For information on how to access and use authentication settings from within your module, see the `{generic_extractor}` for an example, or view the [`auth_for_site()` function in BaseModule](../autoapi/core/base_module/index.rst)