Commit Graph

94 Commits

Author SHA1 Message Date
Lilia Kai
e37b848ef5 Refactor auth methods
Deduplicate some common codepaths. Also, for routes accepting basic
authentication, allow bearer auth as an alternative. This allows
clients to switch to bearer auth opportunistically, so we won't
have to coordinate deployments.

Basic auth should be deprecated since we don't really use a
user/password auth scheme.
2024-01-31 12:38:37 -10:00
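A minimal sketch of the idea in the commit above: a route that previously took basic auth also accepts a bearer token. The helper names (`extract_token`, `is_authorized`) are hypothetical, and reading the token from the password half of the basic credentials is an assumption; the commit does not say how the existing basic scheme carries the token.

```python
import base64
import hmac


def extract_token(authorization: str):
    """Return the credential from a Basic or Bearer Authorization header, or None."""
    scheme, _, value = authorization.partition(" ")
    scheme = scheme.lower()
    value = value.strip()
    if not value:
        return None
    if scheme == "bearer":
        return value
    if scheme == "basic":
        try:
            decoded = base64.b64decode(value, validate=True).decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            return None
        # Assumption: the username is ignored and the password carries the token.
        _user, sep, password = decoded.partition(":")
        return password if sep else None
    return None


def is_authorized(authorization: str, expected_token: str) -> bool:
    """Accept either scheme as long as the extracted token matches."""
    token = extract_token(authorization)
    if token is None:
        return False
    # Constant-time comparison on bytes avoids leaking match length via timing.
    return hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8"))
```

Because both schemes resolve to the same token check, a client can move from `Basic` to `Bearer` at any time without the server being redeployed in step.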
Lilia Kai
5258da65a5 Instrument exceptions by type 2024-01-08 16:45:25 +01:00
msramalho
c058bfd067 fixes unique constraint issues for archives containing the same url in archive_urls 2023-12-20 18:38:28 +00:00
msramalho
cff6f713bd aa dep update 2023-12-20 14:28:24 +00:00
msramalho
23beab0eb8 logging correct emails in sheet_service endpoint 2023-12-17 23:55:26 +00:00
msramalho
3599ab2c19 dependency updates 2023-12-13 19:06:02 +00:00
msramalho
496a3651e5 detecting already inserted entries 2023-12-13 14:59:51 +00:00
msramalho
74f93ef856 catch cached inserts 2023-12-13 14:28:28 +00:00
msramalho
50417481f4 dep updated 2023-12-13 14:16:15 +00:00
msramalho
7dd0503d90 slight /metrics improvement 2023-12-13 13:46:53 +00:00
msramalho
48272cc8e9 dependencies update 2023-12-13 13:46:41 +00:00
msramalho
b92b8e3f8a auto-archiver dep update 2023-12-13 11:51:23 +00:00
msramalho
0e8864c68e updates auto-archiver 2023-12-13 10:38:38 +00:00
Miguel Sozinho Ramalho
1b7e6602db Merge pull request #33 from bellingcat/allow-query-before-archive 2023-12-13 10:30:36 +00:00
msramalho
99acfb113f most recent first 2023-12-12 22:43:31 +00:00
msramalho
3d4d7979a5 fixes data leak 2023-12-12 22:24:36 +00:00
msramalho
bb4ac31c12 version updated 2023-12-12 19:17:24 +00:00
msramalho
6874d123eb adds logic to test if archive is needed, if specified by the user 2023-12-12 19:14:10 +00:00
Lilia Kai
76c99af48b Remove static file endpoint 2023-12-11 13:43:44 +01:00
msramalho
3ab5477e6c removing tmp log 2023-10-25 15:01:51 +01:00
msramalho
5e0024c726 temp changes 2023-10-25 14:59:25 +01:00
msramalho
7ed54c18d7 fixing sql non-null constraint 2023-10-25 14:51:41 +01:00
msramalho
e3c128c4fd adds access control to new endpoint 2023-10-17 16:08:35 +01:00
Lilia Kai
d8bb637532 Add db task endpoint 2023-10-16 14:53:08 +02:00
Miguel Sozinho Ramalho
d99ddea9a9 Merge pull request #13 from bellingcat/get_status 2023-09-22 10:30:29 +01:00
msramalho
f017dbe1f2 quick fix author_id 2023-09-20 13:52:14 +01:00
msramalho
c6cd027e13 allows search to happen with API_TOKEN 2023-09-20 11:30:57 +01:00
Lilia Kai
f20dd05928 Refactor get_status and create_archive_task error handling
Raise exceptions instead of returning error messages from the worker in
create_archive_task. This ensures consistency in how the errors are
presented on the task result: the Exception will be the result instead
of *maybe* being wrapped in an object like {error: Exception}.

This lets us simplify error handling in get_status so we have only one
try/except block where the error can be returned to the client.
2023-09-20 11:43:55 +02:00
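The difference described in the commit above can be sketched without a task queue. Everything here is a stand-in: `run` imitates a worker that stores a raised exception as the task result, and the two `archive_*` functions and the response fields are hypothetical. The real get_status uses a try/except block; the `isinstance` check below is a simplification of the same single decision point.

```python
def archive_returning_error(url):
    # Older style: the failure is hidden inside a normal return value.
    if not url.startswith("http"):
        return {"error": ValueError(f"invalid url: {url}")}
    return {"url": url}


def archive_raising(url):
    # Newer style: the failure is raised, so the exception itself becomes the result.
    if not url.startswith("http"):
        raise ValueError(f"invalid url: {url}")
    return {"url": url}


def run(task, url):
    """Stand-in for the worker: capture a raised exception as the task result."""
    try:
        return task(url)
    except Exception as exc:
        return exc


def get_status(result):
    # A single check is enough once failures are always exceptions.
    if isinstance(result, Exception):
        return {"status": "FAILURE", "error": str(result)}
    return {"status": "SUCCESS", "result": result}
```

With `archive_returning_error`, a bad URL still reports `SUCCESS` unless get_status also inspects the payload for an `error` key; with `archive_raising`, the same input reports `FAILURE` through the one shared path.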
Lilia Kai
00201770ba Create archive task returns dict instead of string
This will save the task result in redis as a json object instead of a
json-encoded string. This makes for a nicer response from get_status and
prevents the client having to parse a json string to work with the
result.
2023-09-20 11:43:55 +02:00
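To illustrate why returning a dict gives a nicer response, here is a small sketch using only the standard `json` module. The `store` function is a stand-in for a result backend that JSON-encodes whatever the task returns, and the result fields are hypothetical.

```python
import json

result = {"url": "https://example.com", "status": "ok"}  # hypothetical fields


def store(value):
    """Stand-in for a result backend that JSON-encodes the task's return value."""
    return json.dumps(value)


# The task returned a JSON string: the backend encodes it a second time.
stored_from_string = store(json.dumps(result))

# The task returned a dict: the backend stores a plain JSON object.
stored_from_dict = store(result)

# Client side: one decode for the dict case, two for the string case.
decoded_once = json.loads(stored_from_dict)
decoded_twice = json.loads(json.loads(stored_from_string))
```

In the string case a single `json.loads` yields another string rather than an object, which is the extra parsing step the commit removes for clients.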
msramalho
f7160aad91 updating auto-archiver dependency 2023-09-20 10:24:24 +01:00
Lilia Kai
1b39f2c291 Rename variables in get_status
There are no logic changes in this commit, just renamed variables so
that fewer things are called "result" which seemed confusing.

Instead of result.result = task_result.result,
we can say response.result = task.result
2023-09-20 11:01:00 +02:00
msramalho
ceb5c9764d updates aa to latest version 2023-09-15 20:20:32 +01:00
msramalho
fc01ba1194 updates auto-archiver dependency 2023-09-15 01:04:52 +01:00
Lilia Kai
8e4801f3d3 Run browsertrix in docker on the host
Install docker in the container

Add a named volume called `browsertrix`

Mount the named volume in the worker at /crawls

Expose the host docker socket

Override the environment variable from auto-archiver's Dockerfile so
that it will call docker.

This will require setting new configs in orchestration.yaml:

 wacz_archiver_enricher:
  browsertrix_home: auto-archiver-api_browsertrix
  wacz_collections: /crawls
2023-09-12 20:37:25 +02:00
Lilia Kai
91762f58b7 Add option to serve local archive files
Set an environment variable in the docker compose file, then reference
that variable in main.py to mount the local archive so that the links
generated by auto-archiver will work correctly. Fixes #8
2023-09-05 16:10:37 +02:00
Lilia Kai
9b622d1393 Update src/.example.env
Removes some configs that are no longer used and adds some that are.
2023-09-04 19:29:49 +02:00
Lilia Kai
3b46554aa1 Fix get_user_first_group for user with no groups
If the email is defined in user-groups.yaml but has no groups, groups is
assigned None and len(groups) throws an exception.

Intuitively, one would expect groups to default to [] rather than None
because [] is passed as the second argument to Dictionary.get, but this
default only applies if the key is not found in the dictionary. In this
case the key is defined but has a value of None.
2023-08-31 20:56:48 +02:00
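The `dict.get` behaviour described in the commit above is easy to reproduce. The mapping below is a hypothetical stand-in for the parsed user-groups.yaml, and `first_group` is an illustrative helper, not the project's get_user_first_group; returning None for a user without groups is an assumption.

```python
# Hypothetical parsed contents of user-groups.yaml.
user_groups = {
    "a@example.com": ["team-a", "team-b"],
    "b@example.com": None,  # key is present, but the YAML value is empty
}

# The default is only used when the key is missing, not when its value is None.
present_but_none = user_groups.get("b@example.com", [])
missing_key = user_groups.get("c@example.com", [])


def first_group(email, mapping):
    """Return the user's first group, or None if the user has no groups."""
    # `or []` covers both a missing key and a key whose value is None.
    groups = mapping.get(email) or []
    return groups[0] if len(groups) > 0 else None
```

Here `present_but_none` is None, so calling `len()` on it would raise a TypeError, while `missing_key` is the expected empty list.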
msramalho
ce1599b160 wacz working in docker 2023-08-24 17:44:37 +01:00
msramalho
03164b9ede version updates 2023-08-18 21:33:45 +01:00
msramalho
75b42c0f33 filter by date archived before/after 2023-08-18 16:15:06 +01:00
msramalho
55dc977bfa removing duplicate env var from dockerfile 2023-07-28 16:53:34 +01:00
msramalho
4741638c33 wacz working within docker 2023-07-28 16:01:45 +01:00
msramalho
c1d76fae81 missing reqs 2023-07-28 14:46:41 +01:00
msramalho
ee2db3f950 archiver-api updates 2023-07-28 13:55:35 +01:00
msramalho
6b9f5149e8 ensuring email is lowercase 2023-07-24 16:23:38 +01:00
msramalho
8c6ff8cb91 version bump 2023-07-11 15:44:13 +01:00
msramalho
344cc8d2bd fix: group permissions 2023-07-11 15:42:44 +01:00
msramalho
409eb07b44 fix: update aa version 2023-07-11 12:32:37 +01:00
msramalho
fafe821432 pulling twitter scraper fix 2023-07-02 19:00:24 +02:00
msramalho
707b19b4fa feat: email domain-level access 2023-06-27 14:50:13 +01:00