mirror of
https://github.com/bellingcat/auto-archiver.git
synced 2026-06-12 13:18:28 +03:00
Various fixes for issues with new architecture (#208)
* Add formatters to the TOC - fixes #204
* Add 'steps' settings to the example YAML in the docs. Fixes #206
* Improved docs on authentication architecture
* Fix setting modules on the command line - they now override any module settings in the orchestration as opposed to appending
* Fix tests for gsheet-feeder: add a test service_account.json (note: not real keys in there)
* Rename the command line entrypoint to _command_line_run
  Also: make it clear that code implementation should not call this
  Make sure the command line entry returns (we don't want a generator)
* Fix unit tests to use now code-entry points
* Version bump
* Move iterating of generator up to __main__
* Breakpoint
* two minor fixes
* Fix unit tests + add new '__main__' entry point implementation test
* Skip youtube tests if running on CI. Should still run them locally
* Fix full implementation run on GH actions
* Fix skipif test for GH Actions CI
* Add skipifs for truth - it blocks GH:
---------
Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
committed by GitHub
parent 6d43bc7d4d
commit 3c543a3a6a
@@ -63,12 +63,6 @@ class BaseModule(ABC):
    def config_setup(self, config: dict):

        authentication = config.get('authentication', {})
        # extract out concatenated sites
        for key, val in copy(authentication).items():
            if "," in key:
                for site in key.split(","):
                    authentication[site] = val
                del authentication[key]

        # this is important. Each instance is given its own deepcopied config, so modules cannot
        # change values to affect other modules
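The hunk above touches the loop that expands concatenated site keys such as "x.com,twitter.com" into one entry per site. As a rough illustration of that expansion only, here is a minimal standalone sketch. The function name `expand_concatenated_sites`, the sample config, and the `strip()` call are assumptions made for this example and are not taken from the commit; unlike the method in the diff, the sketch also works on a deep copy so the caller's dict is left untouched.

```python
from copy import copy, deepcopy


def expand_concatenated_sites(authentication: dict) -> dict:
    """Hypothetical helper: split keys like "x.com,twitter.com" into one key per site."""
    # Work on a deep copy so the caller's dict is not modified (an assumption of this sketch).
    expanded = deepcopy(authentication)
    # Iterate over a shallow copy because the dict is mutated inside the loop.
    for key, val in copy(expanded).items():
        if "," in key:
            for site in key.split(","):
                # strip() tolerates "x.com, twitter.com"; this is an addition, not in the diff.
                expanded[site.strip()] = val
            # Remove the original concatenated key once every site has its own entry.
            del expanded[key]
    return expanded


# Illustrative config only; the keys and values are made up.
auth = {"x.com,twitter.com": {"api_key": "k"}, "example.com": {"username": "u"}}
result = expand_concatenated_sites(auth)
```

After the call, `result` has separate `x.com` and `twitter.com` entries that share the same settings, and `auth` still holds the original concatenated key.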
@@ -89,16 +83,21 @@ class BaseModule(ABC):
        Returns the authentication information for a given site. This is used to authenticate
        with a site before extracting data. The site should be the domain of the site, e.g. 'twitter.com'

        extract_cookies: bool - whether or not to extract cookies from the given browser and return the
        cookie jar (disabling can speed up) processing if you don't actually need the cookies jar
        :param site: the domain of the site to get authentication information for
        :param extract_cookies: whether or not to extract cookies from the given browser/file and return the cookie jar (disabling can speed up processing if you don't actually need the cookies jar).

        Currently, the dict can have keys of the following types:
        - username: str - the username to use for login
        - password: str - the password to use for login
        - api_key: str - the API key to use for login
        - api_secret: str - the API secret to use for login
        - cookie: str - a cookie string to use for login (specific to this site)
        - cookies_jar: YoutubeDLCookieJar | http.cookiejar.MozillaCookieJar - a cookie jar compatible with requests (e.g. `requests.get(cookies=cookie_jar)`)
        :returns: authdict dict of login information for the given site

        **Global options:**\n
        * cookies_from_browser: str - the name of the browser to extract cookies from (e.g. 'chrome', 'firefox' - uses ytdlp under the hood to extract\n
        * cookies_file: str - the path to a cookies file to use for login\n

        **Currently, the sites dict can have keys of the following types:**\n
        * username: str - the username to use for login\n
        * password: str - the password to use for login\n
        * api_key: str - the API key to use for login\n
        * api_secret: str - the API secret to use for login\n
        * cookie: str - a cookie string to use for login (specific to this site)\n
        """
        # TODO: think about if/how we can deal with sites that have multiple domains (main one is x.com/twitter.com)
        # for now the user must enter them both, like "x.com,twitter.com" in their config. Maybe we just hard-code?
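The docstring above describes a per-site lookup that returns a dict of login information and, optionally, a cookie jar. The sketch below shows one way such a lookup could behave, using only the standard library. It is not the project's implementation: the function name `lookup_auth`, the handling of a `www.` prefix, and the assumption that `cookies_file` sits at the top level of the authentication dict are all hypothetical choices made for illustration.

```python
from http.cookiejar import MozillaCookieJar
from urllib.parse import urlparse


def lookup_auth(authentication: dict, site: str, extract_cookies: bool = True) -> dict:
    """Hypothetical lookup: return the login settings configured for `site`."""
    # Accept either a bare domain ("twitter.com") or a full URL.
    domain = urlparse(site).netloc if "://" in site else site
    # Dropping "www." is an assumption of this sketch (requires Python 3.9+).
    domain = domain.removeprefix("www.")

    # Copy the per-site settings so callers cannot mutate the shared config.
    authdict = dict(authentication.get(domain, {}))

    # Assumed layout: the global cookies_file option lives at the top level of the dict.
    cookies_file = authentication.get("cookies_file")
    if extract_cookies and cookies_file:
        # MozillaCookieJar reads Netscape-format cookies.txt files.
        jar = MozillaCookieJar(cookies_file)
        jar.load(ignore_discard=True, ignore_expires=True)
        authdict["cookies_jar"] = jar
    return authdict


# Illustrative config only; the credentials are placeholders.
config = {"twitter.com": {"username": "u", "password": "p"}}
found = lookup_auth(config, "https://www.twitter.com/some_user")
missing = lookup_auth(config, "unknown.org")
```

Passing `extract_cookies=False` skips loading the cookie file entirely, which matches the docstring's note that disabling extraction can speed up processing when the jar is not needed.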