Bump the python group across 1 directory with 9 updates

Bumps the python group with 9 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [pytest-loguru](https://github.com/mcarans/pytest-loguru) | `0.4.0` | `0.4.1` |
| [ruff](https://github.com/astral-sh/ruff) | `0.15.11` | `0.15.15` |
| [sphinxcontrib-mermaid](https://github.com/mgaitan/sphinxcontrib-mermaid) | `1.2.3` | `2.0.2` |
| [google-api-python-client](https://github.com/googleapis/google-api-python-client) | `2.194.0` | `2.197.0` |
| [google-auth-httplib2](https://github.com/googleapis/google-cloud-python) | `0.3.1` | `0.4.0` |
| [google-auth-oauthlib](https://github.com/googleapis/google-cloud-python) | `1.3.1` | `1.4.0` |
| [boto3](https://github.com/boto/boto3) | `1.42.94` | `1.43.20` |
| [rich-argparse](https://github.com/hamdanal/rich-argparse) | `1.7.2` | `1.8.0` |
| [cryptography](https://github.com/pyca/cryptography) | `46.0.7` | `48.0.0` |

Updates `pytest-loguru` from 0.4.0 to 0.4.1
- [Release notes](https://github.com/mcarans/pytest-loguru/releases)
- [Commits](https://github.com/mcarans/pytest-loguru/compare/0.4.0...0.4.1)

Updates `ruff` from 0.15.11 to 0.15.15
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.15.11...0.15.15)

Updates `sphinxcontrib-mermaid` from 1.2.3 to 2.0.2
- [Changelog](https://github.com/mgaitan/sphinxcontrib-mermaid/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mgaitan/sphinxcontrib-mermaid/compare/1.2.3...2.0.2)

Updates `google-api-python-client` from 2.194.0 to 2.197.0
- [Release notes](https://github.com/googleapis/google-api-python-client/releases)
- [Commits](https://github.com/googleapis/google-api-python-client/compare/v2.194.0...v2.197.0)

Updates `google-auth-httplib2` from 0.3.1 to 0.4.0
- [Release notes](https://github.com/googleapis/google-cloud-python/releases)
- [Changelog](https://github.com/googleapis/google-cloud-python/blob/main/CHANGELOG.md)
- [Commits](https://github.com/googleapis/google-cloud-python/compare/google-auth-httplib2-v0.3.1...google-auth-httplib2-v0.4.0)

Updates `google-auth-oauthlib` from 1.3.1 to 1.4.0
- [Release notes](https://github.com/googleapis/google-cloud-python/releases)
- [Changelog](https://github.com/googleapis/google-cloud-python/blob/main/packages/gcp-sphinx-docfx-yaml/CHANGELOG.md)
- [Commits](https://github.com/googleapis/google-cloud-python/compare/google-auth-oauthlib-v1.3.1...google-auth-oauthlib-v1.4.0)

Updates `boto3` from 1.42.94 to 1.43.20
- [Release notes](https://github.com/boto/boto3/releases)
- [Commits](https://github.com/boto/boto3/compare/1.42.94...1.43.20)

Updates `rich-argparse` from 1.7.2 to 1.8.0
- [Release notes](https://github.com/hamdanal/rich-argparse/releases)
- [Changelog](https://github.com/hamdanal/rich-argparse/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hamdanal/rich-argparse/compare/v1.7.2...v1.8.0)

Updates `cryptography` from 46.0.7 to 48.0.0
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/46.0.7...48.0.0)

---
updated-dependencies:
- dependency-name: pytest-loguru
  dependency-version: 0.4.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: python
- dependency-name: ruff
  dependency-version: 0.15.15
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: python
- dependency-name: sphinxcontrib-mermaid
  dependency-version: 2.0.2
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: python
- dependency-name: google-api-python-client
  dependency-version: 2.197.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: google-auth-httplib2
  dependency-version: 0.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: google-auth-oauthlib
  dependency-version: 1.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: boto3
  dependency-version: 1.43.20
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: rich-argparse
  dependency-version: 1.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: cryptography
  dependency-version: 48.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python
...

Signed-off-by: dependabot[bot] <support@github.com>
Merge pull request #430 from bellingcat/dev
2026-06-08 11:28:28 +03:00 · 2026-06-03 06:40:09 +00:00 · 2026-04-27 15:52:39 +01:00 · 2026-04-27 12:35:54 +01:00 · 2026-04-27 12:34:47 +01:00 · 2026-04-24 11:08:28 +01:00
112 changed files with 5539 additions and 2523 deletions
--- a/.github/workflows/docker-publish.yaml
+++ b/.github/workflows/docker-publish.yaml
@@ -22,7 +22,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repo
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
        
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
@@ -33,14 +33,14 @@ jobs:
        uses: docker/setup-buildx-action@v3

      - name: Log in to Docker Hub
-        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
-        uses: docker/metadata-action@902fa8ec7d6ecbf8d84d538b9b233a880e428804
+        uses: docker/metadata-action@c299e40c65443455700f0fdfc63efafe5b349051
        with:
          images: bellingcat/auto-archiver
      
--- a/.github/workflows/python-publish.yaml
+++ b/.github/workflows/python-publish.yaml
@@ -22,10 +22,10 @@ jobs:

    steps:
    - name: Checkout Repository
-      uses: actions/checkout@v4
+      uses: actions/checkout@v6

    - name: Set up Python
-      uses: actions/setup-python@v5
+      uses: actions/setup-python@v6
      with:
        python-version-file: pyproject.toml

--- a/.github/workflows/ruff.yaml
+++ b/.github/workflows/ruff.yaml
@@ -20,11 +20,11 @@ jobs:
  build:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
      - name: Install Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
        with:
-          python-version: "3.11"
+          python-version: "3.12"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
--- a/.github/workflows/tests-core.yaml
+++ b/.github/workflows/tests-core.yaml
@@ -26,13 +26,13 @@ jobs:
        working-directory: ./

    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6

      - name: Install ffmpeg
        run: sudo apt-get update && sudo apt-get install -y ffmpeg

      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python-version }}

@@ -40,7 +40,7 @@ jobs:
        run: pipx install poetry
      
      - name: Cache Poetry and pip artifacts
-        uses: actions/cache@v4
+        uses: actions/cache@v5
        with:
          path: |
            ~/.cache/pypoetry
--- a/.github/workflows/tests-download.yaml
+++ b/.github/workflows/tests-download.yaml
@@ -20,13 +20,13 @@ jobs:
        working-directory: ./

    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6

      - name: Install ffmpeg
        run: sudo apt-get update && sudo apt-get install -y ffmpeg

      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python-version }}

@@ -34,7 +34,7 @@ jobs:
        run: pipx install poetry

      - name: Cache Poetry and pip artifacts
-        uses: actions/cache@v4
+        uses: actions/cache@v5
        with:
          path: |
            ~/.cache/pypoetry
@@ -47,4 +47,4 @@ jobs:
      - name: Run Download Tests
        run: poetry run pytest -ra -v -x -m "download"
        env:
-          TWITTER_BEARER_TOKEN: ${{ secrets.TWITTER_BEARER_TOKEN }}
+          TWITTER_BEARER_TOKEN: ${{ secrets.TWITTER_BEARER_TOKEN || '' }}
--- a/.readthedocs.yaml
+++ b/.readthedocs.yaml
@@ -7,6 +7,8 @@ version: 2

 build:
  os: ubuntu-22.04
+  apt_packages:
+    - ffmpeg
  tools:
    python: "3.10"
    nodejs: "22"
--- a/17
+++ b/17
@@ -1,18 +1,17 @@
-FROM webrecorder/browsertrix-crawler:1.6.3 AS base
+FROM webrecorder/browsertrix-crawler:1.12.4 AS base

 ENV RUNNING_IN_DOCKER=1 \
    LANG=C.UTF-8 \
    LC_ALL=C.UTF-8 \
    PYTHONDONTWRITEBYTECODE=1 \
-    PYTHONFAULTHANDLER=1 \
-    PATH="/root/.local/bin:$PATH"
+    PYTHONFAULTHANDLER=1


 ARG TARGETARCH

 # Installing system dependencies
 RUN	apt-get update && \
-    apt-get install -y --no-install-recommends gcc ffmpeg fonts-noto exiftool python3-tk 
+    apt-get install -y --no-install-recommends gcc ffmpeg fonts-noto exiftool python3-tk

 # Poetry and runtime
 FROM base AS runtime
@@ -41,11 +40,21 @@ COPY ./src/ .
 RUN /poetry-venv/bin/poetry install --only main --no-cache


+# Run as non-root user to avoid permission issues with mounted volumes (see #342)
+# The base image already has an 'ubuntu' user at UID/GID 1000.
+# Ensure directories that need write access at runtime are writable.
+RUN chown 1000:1000 /app && \
+    chown -R 1000:1000 /app/.venv/lib/python3.12/site-packages/seleniumbase/drivers/ && \
+    mkdir -p /app/local_archive /app/secrets /tmp/archive && \
+    chown -R 1000:1000 /app/local_archive /app/secrets /tmp/archive
+
 # Update PATH to include virtual environment binaries
 # Allowing entry point to run the application directly with Python
 ENV VIRTUAL_ENV=/app/.venv \
    PATH="/app/.venv/bin:$PATH"

+USER 1000
+
 ENTRYPOINT ["python3", "-m", "auto_archiver"]

 # should be executed with 2 volumes (3 if local_storage is used)
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -6,6 +6,9 @@ services:
      context: .
      dockerfile: Dockerfile
    container_name: auto-archiver
+    # Override user to match host UID/GID and avoid permission issues on volumes.
+    # Set USER_ID and GROUP_ID env vars, or defaults to 1000:1000.
+    user: "${USER_ID:-1000}:${GROUP_ID:-1000}"
    volumes:
      - ./secrets:/app/secrets
      - ./local_archive:/app/local_archive
--- a/docs/source/development/developer_guidelines.md
+++ b/docs/source/development/developer_guidelines.md
@@ -21,7 +21,7 @@ This allows you to run the auto-archiver without the `poetry run` prefix.
 ### Optional Development Packages

 Install development packages (used for unit tests etc.) using:
-`poetry install -with dev`
+`poetry install --with dev`


 ```{toctree}
@@ -33,4 +33,4 @@ docs
 release
 settings_page
 style_guide
-```
+```
--- a/docs/source/development/style_guide.md
+++ b/docs/source/development/style_guide.md
@@ -50,7 +50,7 @@ Note not all warnings can be fixed automatically.

 Most fixes are safe, but some non-standard practices such as dynamic loading are not picked up by linters. Ensure you check any modifications by this before committing them.
 ```shell
-make ruff-fix
+make ruff-clean
 ```

 **Changing Configurations ⚙️**
@@ -67,4 +67,4 @@ One example is to extend the selected rules for linting the `pyproject.toml` fil
 extend-select = ["B"]
 ```

-Then re-run the `make ruff-check` command to see the new rules in action.
+Then re-run the `make ruff-check` command to see the new rules in action.
--- a/docs/source/development/testing.md
+++ b/docs/source/development/testing.md
@@ -8,7 +8,7 @@

 ## Running Tests 

-1. Make sure you've installed the dev dependencies with `pytest install --with dev`
+1. Make sure you've installed the dev dependencies with `poetry install --with dev`
 2. Tests can be run as follows:
 ```{code} bash
 #### Command prefix of 'poetry run' removed here for simplicity
@@ -26,7 +26,7 @@ pytest -ra -v tests/test_file.py
 pytest -ra -v tests/test_file.py::test_function_name
 ```

-3. Some tests require environment variables to be set. You can use the example `.env.test.example` file as a template. Copy it to `.env.test` and fill in the required values. This file will be loaded automatically by `pytest`.
+3. Some tests require environment variables to be set. You can use the example `tests/.env.test.example` file as a template. Copy it to `tests/.env.test` and fill in the required values. This file will be loaded automatically by `pytest`.
 ```{code} bash
-cp .env.test.example .env.test
-```
+cp tests/.env.test.example tests/.env.test
+```
--- a/docs/source/how_to/03_logging.md
+++ b/docs/source/how_to/03_logging.md
@@ -24,7 +24,7 @@ This will disable all logs from Auto Archiver, but it does not disable logs for

 #### Logging Level

-There are 7 logging levels in total, with 5 of them used in this tool. They are: `DEBUG`, `INFO`, `SUCCESS`, `WARNING` and `ERROR`.
+There are 7 logging levels in total, with 5 of them used in this tool. They are: `DEBUG`, `INFO`, `SUCCESS`, `WARNING` and `ERROR`. If you select a level, only that and higher (more serious) levels will be included. `DEBUG` is the most verbose, while `ERROR` is the least verbose. 

 Change the warning level by setting the value in your orchestration config file:

@@ -42,6 +42,20 @@ For normal usage, it is recommended to use the `INFO` level, or if you prefer qu
 ```{note} To learn about all logging levels, see the [loguru documentation](https://loguru.readthedocs.io/en/stable/api/logger.html)
 ```

+### Logging Format
+By default, the console logs are formatted in a human-readable way and the file logs are formatted in JSON. This is new from version 1.1.1. If you want to change the format of the console logs to JSON too you can set the `format:` option in your logging settings. 
+
+```{code} yaml
+:caption: orchestration.yaml
+
+logging:
+    format: json
+```
+
+When the Auto Archiver is writing logs it will include context about specific tasks, so if you are archiving a URL from a Google Sheet, both the URL (and a unique `trace_id` for that URL's archiving attempt) and the Spreadsheet name and row will be included in the logs. This is useful for debugging and understanding what the Auto Archiver is doing.
+
+Using JSON allows you to easily parse the logs and extract specific information, tools like [`jq`](https://jqlang.org/) can be used to filter and search through the logs.
+
 ### Logging to a file

 As default, auto-archiver will log to the console. But if you wish to store your logs for future reference, or you are running the auto-archiver from within code a implementation, then you may wish to enable file logging. This can be done by setting the `file:` config value in the logging settings.
@@ -84,6 +98,7 @@ The below example logs only `DEBUG` logs to the console and to the file `/my/fil

 logging:
    level: DEBUG
+    format: json
    file: /my/file.log
    rotation: 1 week
 ```
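Note on the `format: json` option documented in docs/source/how_to/03_logging.md: the docs suggest `jq` for filtering, and the same thing can be done from Python. The sketch below is only an illustration. It assumes each log line is a single JSON object, and the key names `level`, `message` and `url` are hypothetical, since the diff does not show which keys the serialized record contains. `iter_records` and `filter_records` are made-up helper names, not part of auto-archiver.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per well-formed JSON object line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON lines instead of failing
        if isinstance(record, dict):
            yield record


def filter_records(lines: Iterable[str], **wanted) -> list:
    """Keep only records whose fields equal every keyword argument given."""
    return [
        record
        for record in iter_records(lines)
        if all(record.get(key) == value for key, value in wanted.items())
    ]


# Hypothetical sample lines; the real key names may differ.
sample = [
    '{"level": "INFO", "message": "Started processing", "url": "https://example.com/a"}',
    '{"level": "ERROR", "message": "Extractor failed", "url": "https://example.com/a"}',
    "not json",
]
errors = filter_records(sample, level="ERROR")
```

To run this against a real log file, pass the open file handle as `lines`, for example `filter_records(open("/my/file.log"), level="ERROR")`, after checking the actual key names in one log line.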
--- a/docs/source/modules/extractor.md
+++ b/docs/source/modules/extractor.md
@@ -4,8 +4,9 @@ Extractor modules are used to extract the content of a given URL. Typically, one

 Extractors that are able to extract content from a wide range of websites include:
 1. Generic Extractor: parses videos and images on sites using the powerful yt-dlp library.
-2. Wayback Machine Extractor: sends pages to the Wayback machine for archiving, and stores the link.
-3. WACZ Extractor: runs a web browser to 'browse' the URL and save a copy of the page in WACZ format. 
+2. Antibot Extractor: uses a headless browser to bypass bot detection and extract content.
+3. WACZ Extractor: runs a web browser to 'browse' the URL and save a copy of the page in WACZ format.
+4. Wayback Machine Extractor: sends pages to the Wayback machine for archiving, and stores the archived link.

 ```{include} autogen/extractor.md
 ```
--- a/poetry.lock
+++ b/poetry.lock
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

 [project]
 name = "auto-archiver"
-version = "1.1.0"
+version = "1.2.7"
 description = "Automatically archive links to videos, images, and social media content from Google Sheets (and more)."

 requires-python = ">=3.10,<3.13"
@@ -28,9 +28,9 @@ dependencies = [
    "loguru (>=0.0.0)",
    "ffmpeg-python (>=0.0.0)",
    "telethon (>=0.0.0)",
-    "google-api-python-client (>=0.0.0)",
-    "google-auth-httplib2 (>=0.0.0)",
-    "google-auth-oauthlib (>=0.0.0)",
+    "google-api-python-client (>=2.197.0)",
+    "google-auth-httplib2 (>=0.4.0)",
+    "google-auth-oauthlib (>=1.4.0)",
    "oauth2client (>=0.0.0)",
    "pdqhash (>=0.0.0)",
    "pillow (>=0.0.0)",
@@ -40,7 +40,7 @@ dependencies = [
    "instaloader (>=0.0.0)",
    "tqdm (>=0.0.0)",
    "jinja2 (>=0.0.0)",
-    "boto3 (>=1.28.0,<2.0.0)",
+    "boto3 (>=1.43.20,<2.0.0)",
    "dataclasses-json (>=0.0.0)",
    "numpy (==2.1.3)",
    "requests[socks] (>=0.0.0)",
@@ -48,30 +48,31 @@ dependencies = [
    "jsonlines (>=0.0.0)",
    "pysubs2 (>=0.0.0)",
    "retrying (>=0.0.0)",
-    "rich-argparse (>=1.6.0,<2.0.0)",
+    "rich-argparse (>=1.8.0,<2.0.0)",
    "ruamel-yaml (>=0.18.10,<0.19.0)",
-    "rfc3161-client (>=1.0.1,<2.0.0)",
-    "cryptography (>44.0.1,<45.0.0)",
+    "rfc3161-client (>=1.0.5)",
+    "cryptography (>=48.0.0)",
    "opentimestamps (>=0.4.5,<0.5.0)",
    "bgutil-ytdlp-pot-provider (>=1.0.0)",
-    "yt-dlp[curl-cffi,default] (>=2025.5.22,<2026.0.0)",
+    "yt-dlp[curl-cffi,default] (>=2025.5.22)",
    "secretstorage (>=3.3.3,<4.0.0)",
    "seleniumbase (>=4.36.4,<5.0.0)",
    "pyautogui (>=0.9.54,<0.10.0)",
+    "pyperclip (>=1.9.0)", 
 ]

 [tool.poetry.group.dev.dependencies]
 pytest = "^8.3.4"
 autopep8 = "^2.3.1"
-pytest-loguru = "^0.4.0"
+pytest-loguru = "^0.4.1"
 pytest-mock = "^3.14.0"
-ruff = "^0.9.10"
+ruff = "^0.15.15"
 pre-commit = "^4.1.0"

 [tool.poetry.group.docs.dependencies]
 sphinx = "^8.1.3"
 sphinx-autoapi = "^3.4.0"
-sphinxcontrib-mermaid = "^1.0.0"
+sphinxcontrib-mermaid = "^2.0.2"
 sphinx-autobuild = "^2024.10.3"
 sphinx-copybutton = "^0.5.2"
 myst-parser = "^4.0.0"
--- a/scripts/settings/package-lock.json
+++ b/scripts/settings/package-lock.json
--- a/scripts/settings/package.json
+++ b/scripts/settings/package.json
@@ -14,7 +14,7 @@
    "@emotion/react": "latest",
    "@emotion/styled": "latest",
    "@mui/icons-material": "^7.1.1",
-    "@mui/material": "latest",
+    "@mui/material": "^7.1.1",
    "react": "19.1.0",
    "react-dom": "19.1.0",
    "react-markdown": "^10.0.0",
--- a/scripts/settings/src/App.tsx
+++ b/scripts/settings/src/App.tsx
@@ -31,7 +31,7 @@ import {
  Stack,
  Button,
 } from '@mui/material';
-import Grid from '@mui/material/Grid2';
+import Grid from '@mui/material/Grid';

 import { parseDocument, Document, YAMLSeq, YAMLMap, Scalar } from 'yaml'
 import StepCard from './StepCard';
--- a/scripts/settings/src/StepCard.tsx
+++ b/scripts/settings/src/StepCard.tsx
@@ -25,7 +25,7 @@ import {
    Typography,
    InputAdornment,
 } from '@mui/material';
-import Grid from '@mui/material/Grid2';
+import Grid from '@mui/material/Grid';
 import DragIndicatorIcon from '@mui/icons-material/DragIndicator';
 import Visibility from '@mui/icons-material/Visibility';
 import VisibilityOff from '@mui/icons-material/VisibilityOff';
--- a/scripts/telegram_setup.py
+++ b/scripts/telegram_setup.py
@@ -14,7 +14,7 @@ You will need to provide your phone number and a 2FA code the first time you run

 import os
 from telethon.sync import TelegramClient
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger


 # Create a
@@ -24,4 +24,4 @@ SESSION_FILE = "secrets/anon-insta"

 os.makedirs("secrets", exist_ok=True)
 with TelegramClient(SESSION_FILE, API_ID, API_HASH) as client:
-    logger.success(f"New session file created: {SESSION_FILE}.session")
+    logger.success(f"new session file created: {SESSION_FILE}.session")
--- a/src/auto_archiver/core/base_module.py
+++ b/src/auto_archiver/core/base_module.py
@@ -7,7 +7,7 @@ from tempfile import TemporaryDirectory
 from auto_archiver.utils import url as UrlUtil
 from auto_archiver.core.consts import MODULE_TYPES as CONF_MODULE_TYPES

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 if TYPE_CHECKING:
    from .module import ModuleFactory
--- a/src/auto_archiver/core/config.py
+++ b/src/auto_archiver/core/config.py
@@ -10,7 +10,7 @@ from ruamel.yaml import YAML, CommentedMap
 import json
 import os

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from copy import deepcopy
 from auto_archiver.core.consts import MODULE_TYPES
@@ -118,8 +118,7 @@ class DefaultValidatingParser(argparse.ArgumentParser):
        """
        Override of error to format a nicer looking error message using logger
        """
-        logger.error("Problem with configuration file (tip: use --help to see the available options):")
-        logger.error(message)
+        logger.error(f"Problem with configuration file (tip: use --help to see the available options): \n{message}")
        self.exit(2)

    def parse_known_args(self, args=None, namespace=None):
@@ -136,8 +135,7 @@ class DefaultValidatingParser(argparse.ArgumentParser):
                    try:
                        self._check_value(action, action.default)
                    except argparse.ArgumentError as e:
-                        logger.error(f"You have an invalid setting in your configuration file ({action.dest}):")
-                        logger.error(e)
+                        logger.error(f"You have an invalid setting in your configuration file ({action.dest}):\n {e}")
                        exit()

        return super().parse_known_args(args, namespace)
--- a/src/auto_archiver/core/extractor.py
+++ b/src/auto_archiver/core/extractor.py
@@ -12,7 +12,7 @@ from contextlib import suppress
 import mimetypes
 import os
 import requests
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from retrying import retry
 import re

@@ -94,7 +94,7 @@ class Extractor(BaseModule):
                to_filename = to_filename[-64:]
        to_filename = os.path.join(self.tmp_dir, to_filename)
        if verbose:
-            logger.debug(f"downloading {url[0:50]=} {to_filename=}")
+            logger.debug(f"Downloading {to_filename=}")
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
        }
@@ -117,7 +117,7 @@ class Extractor(BaseModule):
            return to_filename

        except requests.RequestException as e:
-            logger.warning(f"Failed to fetch the Media URL: {str(e)[:250]}")
+            logger.warning(f"Failed to fetch the Media URL: {e}")
        if try_best_quality:
            return None, url

--- a/src/auto_archiver/core/media.py
+++ b/src/auto_archiver/core/media.py
@@ -11,7 +11,7 @@ from dataclasses import dataclass, field
 from dataclasses_json import dataclass_json, config
 import mimetypes

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger


@dataclass_json  # annotation order matters
@@ -86,7 +86,7 @@ class Media:
    @property  # getter .mimetype
    def mimetype(self) -> str:
        if not self.filename or len(self.filename) == 0:
-            logger.warning(f"cannot get mimetype from media without filename: {self}")
+            logger.warning(f"Cannot get mimetype from media without filename: {self}")
            return ""
        if not self._mimetype:
            self._mimetype = mimetypes.guess_type(self.filename)[0]
@@ -116,13 +116,12 @@ class Media:
        # self.is_video() should be used together with this method
        try:
            streams = ffmpeg.probe(self.filename, select_streams="v")["streams"]
-            logger.debug(f"STREAMS FOR {self.filename} {streams}")
+            logger.debug(f"Streams for {self.filename}: {streams}")
            return any(s.get("duration_ts", 0) > 0 for s in streams)
        except Error:
            return False  # ffmpeg errors when reading bad files
        except Exception as e:
-            logger.error(e)
-            logger.error(traceback.format_exc())
+            logger.error(f"{e}: {traceback.format_exc()}")
            try:
                fsize = os.path.getsize(self.filename)
                return fsize > 20_000
--- a/src/auto_archiver/core/metadata.py
+++ b/src/auto_archiver/core/metadata.py
@@ -11,13 +11,14 @@ Key Functionalities:

 from __future__ import annotations
 import hashlib
+import os
 from typing import Any, List, Union, Dict
 from dataclasses import dataclass, field
 from dataclasses_json import dataclass_json
 import datetime
 from urllib.parse import urlparse
 from dateutil.parser import parse as parse_dt
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from .media import Media

@@ -181,8 +182,14 @@ class Metadata:
        media_hashes = set()
        new_media = []
        for m in self.media:
+            if not m.filename:
+                new_media.append(m)
+                continue
            h = m.get("hash")
            if not h:
+                if not os.path.exists(m.filename):
+                    logger.warning(f"Skipping missing media file: {m.filename}")
+                    continue
                h = calculate_hash_in_chunks(hashlib.sha256(), int(1.6e7), m.filename)
            if len(h) and h in media_hashes:
                continue
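Note on the src/auto_archiver/core/metadata.py hunk above: the loop now keeps media entries that have no filename, skips entries whose file is missing on disk, and drops entries whose hash was already seen. The sketch below restates that control flow on plain file paths so it can be run in isolation. It is an assumption-laden illustration, not the project's code: `sha256_in_chunks` and `dedupe_paths` are hypothetical names, and the bookkeeping after the duplicate check (recording the hash and appending the entry) is presumed, since the hunk does not show it.

```python
import hashlib
import os
import tempfile
from typing import List, Optional


def sha256_in_chunks(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file without loading it into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_paths(paths: List[Optional[str]]) -> List[Optional[str]]:
    """Drop missing files and repeated content, keeping first occurrences in order."""
    seen = set()
    kept = []
    for path in paths:
        if not path:
            kept.append(path)  # nothing to hash, so keep the entry untouched
            continue
        if not os.path.exists(path):
            continue  # the diff logs a warning at this point, then skips
        file_hash = sha256_in_chunks(path)
        if file_hash in seen:
            continue  # same content already kept
        seen.add(file_hash)
        kept.append(path)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    first, dup, other = (os.path.join(tmp, name) for name in ("a.bin", "b.bin", "c.bin"))
    for path, data in ((first, b"same"), (dup, b"same"), (other, b"other")):
        with open(path, "wb") as fh:
            fh.write(data)
    missing = os.path.join(tmp, "missing.bin")
    result = dedupe_paths([first, dup, None, missing, other])
    expected = [first, None, other]
```

Checking for the file before hashing means an entry whose file has gone missing is dropped with a warning instead of raising when the file is opened.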
--- a/src/auto_archiver/core/module.py
+++ b/src/auto_archiver/core/module.py
@@ -16,7 +16,7 @@ import sys
 from importlib.util import find_spec
 import os
 from os.path import join
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 import auto_archiver
 from auto_archiver.core.consts import DEFAULT_MANIFEST, MANIFEST_FILE, SetupError

--- a/src/auto_archiver/core/orchestrator.py
+++ b/src/auto_archiver/core/orchestrator.py
@@ -15,9 +15,11 @@ import traceback
 from copy import copy

 from rich_argparse import RichHelpFormatter
-from loguru import logger
+from auto_archiver.utils.custom_logger import format_for_human_readable_console, logger
 import requests

+from auto_archiver.utils.misc import random_str
+
 from .metadata import Metadata, Media
 from auto_archiver.version import __version__
 from .config import (
@@ -342,7 +344,14 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
        # add other logging info
        if self.logger_id is None:  # note - need direct comparison to None since need to consider falsy value 0
            use_level = logging_config["level"]
-            self.logger_id = logger.add(sys.stderr, level=use_level)
+            self.logger_id = logger.add(
+                sys.stderr,
+                level=use_level,
+                catch=True,
+                format="<level>{extra[serialized]}</level>"
+                if logging_config.get("format", "").lower() == "json"
+                else format_for_human_readable_console(),
+            )

            rotation = logging_config["rotation"]
            log_file = logging_config["file"]
@@ -356,9 +365,10 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
                        f"{log_file}.{i}_{level.lower()}",
                        filter=lambda rec, lvl=level: rec["level"].name == lvl,
                        rotation=rotation,
+                        format="{extra[serialized]}",
                    )
            elif log_file:
-                logger.add(log_file, rotation=rotation, level=use_level)
+                logger.add(log_file, rotation=rotation, level=use_level, format="{extra[serialized]}")

    def install_modules(self, modules_by_type):
        """
@@ -457,7 +467,11 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
        return self.setup_complete_parser(basic_config, yaml_config, unused_args)

    def check_for_updates(self):
-        response = requests.get("https://pypi.org/pypi/auto-archiver/json").json()
+        try:
+            response = requests.get("https://pypi.org/pypi/auto-archiver/json", timeout=10).json()
+        except Exception as e:
+            logger.debug(f"Unable to check for updates: {e}")
+            return
        latest_version = version.parse(response["info"]["version"])
        current_version = version.parse(__version__)
        # check version compared to current version
@@ -466,13 +480,9 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
                update_cmd = "`docker pull bellingcat/auto-archiver:latest`"
            else:
                update_cmd = "`pip install --upgrade auto-archiver`"
-            logger.warning("")
-            logger.warning("********* IMPORTANT: UPDATE AVAILABLE ********")
            logger.warning(
-                f"A new version of auto-archiver is available (v{latest_version}, you have v{current_version})"
+                f"\n********* IMPORTANT: UPDATE AVAILABLE ********\nA new version of auto-archiver is available (v{latest_version}, you have v{current_version})\nMake sure to update to the latest version using: {update_cmd}\n"
            )
-            logger.warning(f"Make sure to update to the latest version using: {update_cmd}")
-            logger.warning("")

    def setup(self, args: list):
        """
@@ -522,7 +532,7 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
            self.setup(args)
            return self.feed()
        except Exception as e:
-            logger.error(e)
+            logger.error(f"{e}: {traceback.format_exc()}")
            exit(1)

    def cleanup(self) -> None:
@@ -534,8 +544,10 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
        url_count = 0
        for feeder in self.feeders:
            for item in feeder:
-                yield self.feed_item(item)
-                url_count += 1
+                with logger.contextualize(url=item.get_url(), trace=random_str(12)):
+                    logger.info("Started processing")
+                    yield self.feed_item(item)
+                    url_count += 1

        logger.info(f"Processed {url_count} URL(s)")
        self.cleanup()
@@ -555,13 +567,13 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
            return self.archive(item)
        except KeyboardInterrupt:
            # catches keyboard interruptions to do a clean exit
-            logger.warning(f"caught interrupt on {item=}")
+            logger.warning("Caught interrupt")
            for d in self.databases:
                d.aborted(item)
            self.cleanup()
            exit()
        except Exception as e:
-            logger.error(f"Got unexpected error on item {item}: {e}\n{traceback.format_exc()}")
+            logger.error(f"Got unexpected error: {e}\n{traceback.format_exc()}")
            for d in self.databases:
                if isinstance(e, AssertionError):
                    d.failed(item, str(e))
@@ -589,7 +601,7 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
        try:
            check_url_or_raise(original_url)
        except ValueError as e:
-            logger.error(f"Error archiving URL {original_url}: {e}")
+            logger.error(f"Error archiving: {e}")
            raise e

        # 1 - sanitize - each archiver is responsible for cleaning/expanding its own URLs
@@ -599,7 +611,7 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_

        result.set_url(url)
        if original_url != url:
-            logger.debug(f"Sanitized URL from {original_url} to {url}")
+            logger.debug(f"Sanitized URL to {url}")
            result.set("original_url", original_url)

        # 2 - notify start to DBs, propagate already archived if feature enabled in DBs
@@ -614,25 +626,25 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
                try:
                    d.done(cached_result, cached=True)
                except Exception as e:
-                    logger.error(f"ERROR database {d.name}: {e}: {traceback.format_exc()}")
+                    logger.error(f"Database {d.name}: {e}: {traceback.format_exc()}")
            return cached_result

        # 3 - call extractors until one succeeds
        for a in self.extractors:
-            logger.info(f"Trying extractor {a.name} for {url}")
+            logger.info(f"Trying extractor {a.name}")
            try:
                result.merge(a.download(result))
                if result.is_success():
                    break
            except Exception as e:
-                logger.error(f"ERROR archiver {a.name}: {e}: {traceback.format_exc()}")
+                logger.error(f"Extractor {a.name}: {e}: {traceback.format_exc()}")

        # 4 - call enrichers to work with archived content
        for e in self.enrichers:
            try:
                e.enrich(result)
            except Exception as exc:
-                logger.error(f"ERROR enricher {e.name}: {exc}: {traceback.format_exc()}")
+                logger.error(f"Enricher {e.name}: {exc}: {traceback.format_exc()}")

        # 5 - store all downloaded/generated media
        result.store(storages=self.storages)
@@ -651,7 +663,7 @@ Here's how that would look: \n\nsteps:\n  extractors:\n  - [your_extractor_name_
            try:
                d.done(result)
            except Exception as e:
-                logger.error(f"ERROR database {d.name}: {e}: {traceback.format_exc()}")
+                logger.error(f"Database {d.name}: {e}: {traceback.format_exc()}")

        return result

--- a/src/auto_archiver/core/storage.py
+++ b/src/auto_archiver/core/storage.py
@@ -24,7 +24,7 @@ from abc import abstractmethod
 from typing import IO
 import os

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from slugify import slugify

 from auto_archiver.utils.misc import random_str
--- a/src/auto_archiver/modules/antibot_extractor_enricher/antibot_extractor_enricher.py
+++ b/src/auto_archiver/modules/antibot_extractor_enricher/antibot_extractor_enricher.py
@@ -7,7 +7,7 @@ from urllib.parse import urljoin
 import glob
 import importlib.util

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 import selenium
 from seleniumbase import SB

@@ -16,6 +16,7 @@ from auto_archiver.modules.antibot_extractor_enricher.dropin import Dropin
 from auto_archiver.modules.antibot_extractor_enricher.dropins.default import DefaultDropin
 from auto_archiver.utils.misc import random_str
 from auto_archiver.utils.url import is_relevant_url
+from auto_archiver.utils.deletion_detection import detect_deletion, flag_as_deleted


 class AntibotExtractorEnricher(Extractor, Enricher):
@@ -57,7 +58,7 @@ class AntibotExtractorEnricher(Extractor, Enricher):
                    continue  # Skip imported modules/classes/functions
                if isinstance(obj, type) and issubclass(obj, Dropin):
                    dropins.append(obj)
-        logger.debug(f"ANTIBOT loaded drop-in classes: {', '.join([d.__name__ for d in dropins])}")
+        logger.debug(f"Loaded drop-in classes: {', '.join([d.__name__ for d in dropins])}")
        return dropins

    def sanitize_url(self, url: str) -> str:
@@ -72,6 +73,7 @@ class AntibotExtractorEnricher(Extractor, Enricher):
        if self.enrich(result):
            result.status = "antibot"
            return result
+        return False

    def _prepare_user_data_dir(self):
        if self.user_data_dir:
@@ -81,30 +83,59 @@ class AntibotExtractorEnricher(Extractor, Enricher):
            os.makedirs(self.user_data_dir, exist_ok=True)

    def enrich(self, to_enrich: Metadata, custom_data_dir: bool = True) -> bool:
+        if to_enrich.get_media_by_id("html_source_code"):
+            logger.info("Antibot has already been executed, skipping.")
+            return True
        using_user_data_dir = self.user_data_dir if custom_data_dir else None
        url = to_enrich.get_url()
-        url_sample = url[:75]
+
+        # Use xvfb in Docker environments where no display is available
+        use_xvfb = bool(os.environ.get("RUNNING_IN_DOCKER"))

        try:
-            with SB(uc=True, agent=self.agent, headed=None, user_data_dir=using_user_data_dir, proxy=self.proxy) as sb:
-                logger.info(f"ANTIBOT selenium browser is up with agent {self.agent}, opening {url_sample}...")
+            with SB(
+                uc=True,
+                agent=self.agent,
+                headed=None,
+                user_data_dir=using_user_data_dir,
+                proxy=self.proxy,
+                xvfb=use_xvfb,
+            ) as sb:
+                logger.info(f"Selenium browser is up with agent {self.agent}, opening url...")
                sb.uc_open_with_reconnect(url, 4)

-                logger.debug(f"ANTIBOT handling CAPTCHAs for {url_sample}...")
+                logger.debug("Handling CAPTCHAs...")
                sb.uc_gui_handle_cf()
                sb.uc_gui_click_rc()  # NB: using handle instead of click breaks some sites like reddit, for now we separate here but can have dropins deciding this in the future

                dropin = self._get_suitable_dropin(url, sb)
-                dropin.open_page(url)
+                if not dropin.open_page(url):
+                    # Check for deletion indicators
+                    page_title = sb.get_title()
+                    html_source = sb.get_page_source()
+                    deletion_info = detect_deletion(html_content=html_source, page_title=page_title, url=url)
+                    if deletion_info:
+                        flag_as_deleted(to_enrich, deletion_info)
+                        return to_enrich
+                    logger.warning("Failed to open drop-in page (not detected as deleted)")
+                    return False

-                if self.detect_auth_wall and self._hit_auth_wall(sb):
-                    logger.warning(f"ANTIBOT SKIP since auth wall or CAPTCHA was detected for {url_sample}")
+                if self.detect_auth_wall and (dropin.hit_auth_wall() and self._hit_auth_wall(sb)):
+                    logger.warning("Skipping since auth wall or CAPTCHA was detected")
                    return False

                sb.wait_for_ready_state_complete()
                sb.sleep(1)  # margin for the page to load completely

-                to_enrich.set_title(sb.get_title())
+                page_title = sb.get_title()
+                html_source = sb.get_page_source()
+
+                # Check if the page indicates content was deleted
+                deletion_info = detect_deletion(html_content=html_source, page_title=page_title, url=url)
+                if deletion_info:
+                    flag_as_deleted(to_enrich, deletion_info)
+
+                to_enrich.set_title(page_title)
                self._enrich_html_source_code(sb, to_enrich)

                self._enrich_full_page_screenshot(sb, to_enrich)
@@ -125,18 +156,18 @@ class AntibotExtractorEnricher(Extractor, Enricher):
                    js_css_selector=dropin.js_for_video_css_selectors(),
                    max_media=self.max_download_videos - downloaded_videos,
                )
-                logger.info(f"ANTIBOT completed for {url_sample}")
+                logger.info("Completed")

            return to_enrich
        except selenium.common.exceptions.SessionNotCreatedException as e:
            if custom_data_dir:  # the retry logic only works once
                logger.error(
-                    f"ANTIBOT session not created error: {e}. Please remove the user_data_dir {self.user_data_dir} and try again, will retry without user data dir though."
+                    f"Session not created error: {e}. Please remove the user_data_dir {self.user_data_dir} and try again, will retry without user data dir though."
                )
                return self.enrich(to_enrich, custom_data_dir=False)
            raise e  # re-raise
        except Exception as e:
-            logger.error(f"ANTIBOT runtime error: {e}: {traceback.format_exc()}")
+            logger.error(f"Runtime error: {e}: {traceback.format_exc()}")
            return False

    def _get_suitable_dropin(self, url: str, sb: SB):
@@ -146,7 +177,7 @@ class AntibotExtractorEnricher(Extractor, Enricher):
        """
        for dropin in self.dropins:
            if dropin.suitable(url):
-                logger.debug(f"ANTIBOT using drop-in {dropin.__name__} for {url}")
+                logger.debug(f"Using drop-in {dropin.__name__}")
                return dropin(sb, self)

        return DefaultDropin(sb, self)
@@ -275,8 +306,14 @@ class AntibotExtractorEnricher(Extractor, Enricher):
            return
        url = to_enrich.get_url()
        all_urls = set()
+        logger.debug(f"Extracting media for {js_css_selector=}")
+
+        try:
+            sources = sb.execute_script(js_css_selector)
+        except selenium.common.exceptions.JavascriptException as e:
+            logger.error(f"Error executing JavaScript selector {js_css_selector}: {e}")
+            return

-        sources = sb.execute_script(js_css_selector)
        # js_for_css_selectors
        for src in sources:
            if len(all_urls) >= max_media:
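Review note on the auth-wall condition in this file: `self.detect_auth_wall and (dropin.hit_auth_wall() and self._hit_auth_wall(sb))` means a drop-in can veto the global detector but cannot force a skip on its own. The sketch below isolates that logic with plain callables; `should_skip_for_auth_wall` and the lambda stand-ins are hypothetical and not part of the module.

```python
def should_skip_for_auth_wall(detect_auth_wall, dropin_check, global_check):
    """Mirror the combined condition from the patched `enrich` method.

    `dropin_check` and `global_check` are zero-argument callables standing in
    for `dropin.hit_auth_wall()` and `self._hit_auth_wall(sb)`.
    """
    # `and` short-circuits, so the global detector only runs when the
    # feature is enabled and the drop-in has not ruled out an auth wall.
    return bool(detect_auth_wall and dropin_check() and global_check())


global_calls = []


def global_detector():
    # Record each invocation so the short-circuit behavior is observable.
    global_calls.append(1)
    return True


# A drop-in that returns False (as the TikTok drop-in does) vetoes the check.
vetoed = should_skip_for_auth_wall(True, lambda: False, global_detector)
calls_after_veto = len(global_calls)

# The base-class default returns True, deferring to the global detector.
deferred = should_skip_for_auth_wall(True, lambda: True, global_detector)

# With the feature disabled, neither check is consulted.
disabled = should_skip_for_auth_wall(False, lambda: True, global_detector)
```

One consequence worth confirming in review: a drop-in returning `True` does not by itself mark the page as walled; the global detector still has to agree.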
--- a/src/auto_archiver/modules/antibot_extractor_enricher/captcha_services/.gitignore
+++ b/src/auto_archiver/modules/antibot_extractor_enricher/captcha_services/.gitignore
@@ -0,0 +1 @@
+*.py
--- a/src/auto_archiver/modules/antibot_extractor_enricher/dropin.py
+++ b/src/auto_archiver/modules/antibot_extractor_enricher/dropin.py
@@ -1,6 +1,8 @@
+import json
 import os
+import traceback
 from typing import Mapping
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from seleniumbase import SB
 import yt_dlp

@@ -73,8 +75,11 @@ class Dropin:

        You can overwrite this instead of `images_selector` for more control over scraped images.
        """
+        if not self.images_selectors():
+            return "return [];"
+        safe_selector = json.dumps(self.images_selectors())
        return f"""
-            return Array.from(document.querySelectorAll("{self.images_selectors()}")).map(el => el.src || el.href).filter(Boolean);
+            return Array.from(document.querySelectorAll({safe_selector})).map(el => el.src || el.href).filter(Boolean);
        """

    def js_for_video_css_selectors(self) -> str:
@@ -83,8 +88,11 @@ class Dropin:

        You can overwrite this instead of `video_selector` for more control over scraped videos.
        """
+        if not self.video_selectors():
+            return "return [];"
+        safe_selector = json.dumps(self.video_selectors())
        return f"""
-            return Array.from(document.querySelectorAll("{self.video_selectors()}")).map(el => el.src || el.href).filter(Boolean);
+            return Array.from(document.querySelectorAll({safe_selector})).map(el => el.src || el.href).filter(Boolean);
        """

    def open_page(self, url) -> bool:
@@ -102,6 +110,12 @@ class Dropin:
        """
        return 0, 0

+    def hit_auth_wall(self) -> bool:
+        """
+        Custom check to see if the current page is behind an authentication wall. If True is returned, the default global auth wall detector is used to decide. If False, no auth wall is detected and the page is considered open.
+        """
+        return True
+
    def _get_username_password(self, site) -> tuple[str, str]:
        """
        Get the username and password for the site from the extractor's auth data.
@@ -143,7 +157,7 @@ class Dropin:
        with yt_dlp.YoutubeDL(validated_options) as ydl:
            for url in video_urls:
                try:
-                    logger.debug(f"Downloading video from URL: {url}")
+                    logger.debug(f"Downloading video from url: {url}")
                    info = ydl.extract_info(url, download=True)
                    filename = ydl_entry_to_filename(ydl, info)
                    if not filename:  # Failed to download video.
@@ -155,5 +169,5 @@ class Dropin:
                    to_enrich.add_media(media)
                    downloaded += 1
                except Exception as e:
-                    logger.error(f"Error downloading {url}: {e}")
+                    logger.error(f"Download failed: {e} {traceback.format_exc()}")
        return downloaded
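Review note on the `json.dumps` change in this file: the old code interpolated the selector between literal double quotes, so a selector containing `"` (such as the TikTok one added in this PR) produced invalid JavaScript. A minimal sketch of the patched behavior follows; `js_for_selector` is a hypothetical stand-alone helper that mirrors the shape of `js_for_image_css_selectors` / `js_for_video_css_selectors`, not the module's actual code.

```python
import json


def js_for_selector(selector):
    """Build a JS snippet that collects src/href values for a CSS selector."""
    if not selector:
        # No selector configured: return an empty array instead of passing
        # an invalid argument to querySelectorAll.
        return "return [];"
    # json.dumps emits a double-quoted literal with inner quotes and
    # backslashes escaped, which is also a valid JavaScript string literal.
    safe_selector = json.dumps(selector)
    return (
        f"return Array.from(document.querySelectorAll({safe_selector}))"
        ".map(el => el.src || el.href).filter(Boolean);"
    )


# Selector taken from the TikTok drop-in in this PR; it contains double quotes.
tiktok_js = js_for_selector('[data-e2e="detail-photo"] img')

# A drop-in returning None (as TikTok does for videos) yields the no-op script.
empty_js = js_for_selector(None)
```

The early return also explains why `video_selectors()` returning `None` is safe after this change, although its `-> str` annotation no longer matches.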
--- a/src/auto_archiver/modules/antibot_extractor_enricher/dropins/linkedin.py
+++ b/src/auto_archiver/modules/antibot_extractor_enricher/dropins/linkedin.py
@@ -1,5 +1,5 @@
 from typing import Mapping
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from auto_archiver.modules.antibot_extractor_enricher.dropin import Dropin


@@ -62,7 +62,7 @@ class LinkedinDropin(Dropin):
            self.sb.wait_for_ready_state_complete()

        username, password = self._get_username_password("linkedin.com")
-        logger.debug("LinkedinDropin Logging in to Linkedin with username: {}", username)
+        logger.debug("Logging in to Linkedin with username: {}", username)
        self.sb.type("#username", username)
        self.sb.type("#password", password)
        self.sb.click_if_visible("#password-visibility-toggle", timeout=0.5)
--- a/src/auto_archiver/modules/antibot_extractor_enricher/dropins/reddit.py
+++ b/src/auto_archiver/modules/antibot_extractor_enricher/dropins/reddit.py
@@ -3,7 +3,7 @@ from typing import Mapping
 from auto_archiver.core.metadata import Metadata
 from auto_archiver.modules.antibot_extractor_enricher.dropin import Dropin

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger


 class RedditDropin(Dropin):
@@ -50,7 +50,7 @@ class RedditDropin(Dropin):
        self._close_cookies_banner()

        username, password = self._get_username_password("reddit.com")
-        logger.debug("RedditDropin Logging in to Reddit with username: {}", username)
+        logger.debug("Logging in to Reddit with username: {}", username)

        self.sb.type("#login-username", username)
        self.sb.type("#login-password", password)
@@ -68,7 +68,7 @@ class RedditDropin(Dropin):
            self.sb.click_link_text("Log in")
            self.sb.wait_for_ready_state_complete()
            if self.sb.is_text_visible("Welcome back"):
-                logger.debug("RedditDropin Login successful")
+                logger.debug("Login successful")
                self.sb.click_if_visible("this link")

    def _close_cookies_banner(self):
@@ -88,5 +88,5 @@ class RedditDropin(Dropin):
            .map(el => el.src || el.href)
            .filter(url => url && /\.(m3u8|mpd|ism)$/.test(url));
        """)
-        logger.debug("RedditDropin Found {} video URLs", len(filtered_urls))
+        logger.debug("Found {} video URLs", len(filtered_urls))
        return 0, self._download_videos_with_ytdlp(filtered_urls, to_enrich)
--- a/src/auto_archiver/modules/antibot_extractor_enricher/dropins/tiktok.py
+++ b/src/auto_archiver/modules/antibot_extractor_enricher/dropins/tiktok.py
@@ -0,0 +1,56 @@
+from contextlib import suppress
+from typing import Mapping
+
+from auto_archiver.utils.custom_logger import logger
+from auto_archiver.modules.antibot_extractor_enricher.dropin import Dropin
+
+
+class TikTokDropin(Dropin):
+    """
+    A class to handle TikTok drop-in functionality for the antibot extractor enricher module.
+
+    """
+
+    def documentation() -> Mapping[str, str]:
+        return {
+            "name": "TikTok Dropin",
+            "description": "Handles TikTok posts and works without authentication.\nNOTE: This dropin is highly susceptible to TikTok's bot detection mechanisms and may not work reliably if you reuse the same IP. The GenericExtractor is recommended for TikTok posts, as it handles video/image downloads more reliably. In the future we plan to implement better anti-CAPTCHA measures for this dropin.",
+            "site": "tiktok.com",
+        }
+
+    @staticmethod
+    def suitable(url: str) -> bool:
+        return "tiktok.com" in url
+
+    @staticmethod
+    def images_selectors() -> str:
+        return '[data-e2e="detail-photo"] img'
+
+    @staticmethod
+    def video_selectors() -> str:
+        return None  # TikTok videos should be handled by the generic extractor
+
+    def open_page(self, url) -> bool:
+        self.sb.wait_for_ready_state_complete()
+        self._close_cookies_banner()
+        # TODO: implement login logic
+        if url != self.sb.get_current_url():
+            return False
+        if self.sb.is_text_visible("Video currently unavailable"):
+            logger.debug("Video may have been removed or is private.")
+            return False
+        return True
+
+    def hit_auth_wall(self) -> bool:
+        return False  # TikTok does not require authentication for public posts
+
+    def _close_cookies_banner(self):
+        with suppress(Exception):  # selenium.common.exceptions.JavascriptException
+            self.sb.execute_script("""
+                document
+                    .querySelector("tiktok-cookie-banner")
+                    .shadowRoot.querySelector("faceplate-dialog")
+                    .querySelector("button")
+                    .click()
+            """)
+        self.sb.click_if_visible("Skip")
--- a/src/auto_archiver/modules/antibot_extractor_enricher/dropins/vk.py
+++ b/src/auto_archiver/modules/antibot_extractor_enricher/dropins/vk.py
@@ -4,7 +4,7 @@ from typing import Mapping
 from auto_archiver.core.metadata import Metadata
 from auto_archiver.modules.antibot_extractor_enricher.dropin import Dropin

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger


 class VkDropin(Dropin):
--- a/src/auto_archiver/modules/api_db/api_db.py
+++ b/src/auto_archiver/modules/api_db/api_db.py
@@ -2,7 +2,7 @@ from typing import Union

 import os
 import requests
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Database
 from auto_archiver.core import Metadata
@@ -36,9 +36,9 @@ class AAApiDb(Database):
        if not self.store_results:
            return
        if cached:
-            logger.debug(f"skipping saving archive of {item.get_url()} to the AA API because it was cached")
+            logger.debug("Skipping saving archive to AA API because it was cached")
            return
-        logger.debug(f"saving archive of {item.get_url()} to the AA API.")
+        logger.debug("Saving archive to the AA API.")

        payload = {
            "author_id": self.author_id,
--- a/src/auto_archiver/modules/atlos_feeder_db_storage/atlos_feeder_db_storage.py
+++ b/src/auto_archiver/modules/atlos_feeder_db_storage/atlos_feeder_db_storage.py
@@ -3,7 +3,7 @@ import os
 from typing import IO, Iterator, Optional, Union

 import requests
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Database, Feeder, Media, Metadata, Storage
 from auto_archiver.utils import calculate_file_hash
@@ -66,13 +66,13 @@ class AtlosFeederDbStorage(Feeder, Database, Storage):
        """Mark an item as failed in Atlos, if the ID exists."""
        atlos_id = item.metadata.get("atlos_id")
        if not atlos_id:
-            logger.info(f"Item {item.get_url()} has no Atlos ID, skipping")
+            logger.info("No Atlos ID available, skipping")
            return
        self._post(
            f"/api/v2/source_material/metadata/{atlos_id}/auto_archiver",
            json={"metadata": {"processed": True, "status": "error", "error": reason}},
        )
-        logger.info(f"Stored failure for {item.get_url()} (ID {atlos_id}) on Atlos: {reason}")
+        logger.info(f"Stored failure ID {atlos_id} on Atlos: {reason}")

    def fetch(self, item: Metadata) -> Union[Metadata, bool]:
        """check and fetch if the given item has been archived already, each
@@ -88,7 +88,7 @@ class AtlosFeederDbStorage(Feeder, Database, Storage):
        """Mark an item as successfully archived in Atlos."""
        atlos_id = item.metadata.get("atlos_id")
        if not atlos_id:
-            logger.info(f"Item {item.get_url()} has no Atlos ID, skipping")
+            logger.info("Item has no Atlos ID, skipping")
            return
        self._post(
            f"/api/v2/source_material/metadata/{atlos_id}/auto_archiver",
@@ -100,7 +100,7 @@ class AtlosFeederDbStorage(Feeder, Database, Storage):
                }
            },
        )
-        logger.info(f"Stored success for {item.get_url()} (ID {atlos_id}) on Atlos")
+        logger.info(f"Stored success ID {atlos_id} on Atlos")

    # ! Atlos Module - Storage Methods

--- a/src/auto_archiver/modules/cli_feeder/cli_feeder.py
+++ b/src/auto_archiver/modules/cli_feeder/cli_feeder.py
@@ -1,5 +1,3 @@
-from loguru import logger
-
 from auto_archiver.core.feeder import Feeder
 from auto_archiver.core.metadata import Metadata
 from auto_archiver.core.consts import SetupError
@@ -16,8 +14,5 @@ class CLIFeeder(Feeder):
    def __iter__(self) -> Metadata:
        urls = self.config["urls"]
        for url in urls:
-            logger.debug(f"Processing {url}")
            m = Metadata().set_url(url)
            yield m
-
-        logger.success(f"Processed {len(urls)} URL(s)")
--- a/src/auto_archiver/modules/console_db/console_db.py
+++ b/src/auto_archiver/modules/console_db/console_db.py
@@ -1,4 +1,4 @@
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Database
 from auto_archiver.core import Metadata
--- a/src/auto_archiver/modules/csv_db/csv_db.py
+++ b/src/auto_archiver/modules/csv_db/csv_db.py
@@ -1,5 +1,5 @@
 import os
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from csv import DictWriter
 from dataclasses import asdict

--- a/src/auto_archiver/modules/csv_feeder/csv_feeder.py
+++ b/src/auto_archiver/modules/csv_feeder/csv_feeder.py
@@ -1,4 +1,4 @@
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 import csv

 from auto_archiver.core import Feeder
@@ -35,5 +35,4 @@ class CSVFeeder(Feeder):
                        logger.warning(f"Not a valid URL in row: {row}, skipping")
                        continue
                    url = row[url_column]
-                    logger.debug(f"Processing {url}")
                    yield Metadata().set_url(url)
--- a/src/auto_archiver/modules/gdrive_storage/gdrive_storage.py
+++ b/src/auto_archiver/modules/gdrive_storage/gdrive_storage.py
@@ -8,7 +8,7 @@ from google.oauth2 import service_account
 from google.oauth2.credentials import Credentials
 from googleapiclient.discovery import build
 from googleapiclient.http import MediaFileUpload
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Media
 from auto_archiver.core import Storage
@@ -62,7 +62,7 @@ class GDriveStorage(Storage):
        parent_id, folder_id = self.root_folder_id, None
        path_parts = media.key.split(os.path.sep)
        filename = path_parts[-1]
-        logger.info(f"looking for folders for {path_parts[0:-1]} before getting url for {filename=}")
+        logger.info(f"Looking for folders for {path_parts[0:-1]} before getting url for {filename=}")
        for folder in path_parts[0:-1]:
            folder_id = self._get_id_from_parent_and_name(parent_id, folder, use_mime_type=True, raise_on_missing=True)
            parent_id = folder_id
@@ -70,7 +70,7 @@ class GDriveStorage(Storage):
        file_id = self._get_id_from_parent_and_name(folder_id, filename, raise_on_missing=True)
        if not file_id:
            #
-            logger.info(f"file {filename} not found in folder {folder_id}")
+            logger.info(f"File {filename} not found in folder {folder_id}")
            return None
        return f"https://drive.google.com/file/d/{file_id}/view?usp=sharing"

@@ -83,7 +83,7 @@ class GDriveStorage(Storage):
        parent_id, upload_to = self.root_folder_id, None
        path_parts = media.key.split(os.path.sep)
        filename = path_parts[-1]
-        logger.info(f"checking folders {path_parts[0:-1]} exist (or creating) before uploading {filename=}")
+        logger.info(f"Checking folders {path_parts[0:-1]} exist (or creating) before uploading {filename=}")
        for folder in path_parts[0:-1]:
            upload_to = self._get_id_from_parent_and_name(parent_id, folder, use_mime_type=True, raise_on_missing=False)
            if upload_to is None:
@@ -91,7 +91,7 @@ class GDriveStorage(Storage):
            parent_id = upload_to

        # upload file to gd
-        logger.debug(f"uploading {filename=} to folder id {upload_to}")
+        logger.debug(f"Uploading {filename=} to folder id {upload_to}")
        file_metadata = {"name": [filename], "parents": [upload_to]}
        try:
            media = MediaFileUpload(media.filename, resumable=True)
@@ -100,11 +100,11 @@ class GDriveStorage(Storage):
                .create(supportsAllDrives=True, body=file_metadata, media_body=media, fields="id")
                .execute()
            )
-            logger.debug(f"uploadf: uploaded file {gd_file['id']} successfully in folder={upload_to}")
+            logger.debug(f"Uploadf: uploaded file {gd_file['id']} successfully in folder={upload_to}")
        except FileNotFoundError as e:
-            logger.error(f"gd uploadf: file not found {media.filename=} - {e}")
+            logger.error(f"GD uploadf: file not found {media.filename=} - {e}")
        except Exception as e:
-            logger.error(f"gd uploadf: error uploading {media.filename=} to {upload_to} - {e}")
+            logger.error(f"GD uploadf: error uploading {media.filename=} to {upload_to} - {e}")

    # must be implemented even if unused
    def uploadf(self, file: IO[bytes], key: str, **kwargs: dict) -> bool:
@@ -133,7 +133,7 @@ class GDriveStorage(Storage):
            self.api_cache = getattr(self, "api_cache", {})
            cache_key = f"{parent_id}_{name}_{use_mime_type}"
            if cache_key in self.api_cache:
-                logger.debug(f"cache hit for {cache_key=}")
+                logger.debug(f"Cache hit for {cache_key=}")
                return self.api_cache[cache_key]

        # API logic
@@ -168,7 +168,7 @@ class GDriveStorage(Storage):
            else:
                logger.debug(f"{debug_header} not found, attempt {attempt + 1}/{retries}.")
                if attempt < retries - 1:
-                    logger.debug(f"sleeping for {sleep_seconds} second(s)")
+                    logger.debug(f"Sleeping for {sleep_seconds} second(s)")
                    time.sleep(sleep_seconds)

        if raise_on_missing:
--- a/src/auto_archiver/modules/generic_extractor/manifest.py
+++ b/src/auto_archiver/modules/generic_extractor/manifest.py
@@ -58,7 +58,11 @@ If you are having issues with the extractor, you can review the version of `yt-d
        },
        "proxy": {
            "default": "",
-            "help": "http/socks (https seems to not work atm) proxy to use for the webdriver, eg https://proxy-user:password@proxy-ip:port",
+            "help": "http/https/socks proxy to use for the webdriver, e.g. https://proxy-user:password@proxy-ip:port",
+        },
+        "proxy_on_failure_only": {
+            "default": True,
+            "help": "Applies only if a proxy is set. If True, the extractor uses the proxy only when the initial request fails; if False, the extractor always uses the proxy.",
        },
        "end_means_success": {
            "default": True,
--- a/src/auto_archiver/modules/generic_extractor/bluesky.py
+++ b/src/auto_archiver/modules/generic_extractor/bluesky.py
@@ -1,4 +1,4 @@
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core.extractor import Extractor
 from auto_archiver.core.metadata import Metadata, Media
@@ -39,12 +39,18 @@ class Bluesky(GenericDropin):
        media_url = "https://bsky.social/xrpc/com.atproto.sync.getBlob?cid={}&did={}"
        for image_media in image_medias:
            url = media_url.format(image_media["image"]["ref"]["$link"], post["author"]["did"])
-            image_media = archiver.download_from_url(url)
-            media.append(Media(image_media))
+            filename = archiver.download_from_url(url)
+            if filename:
+                media.append(Media(filename))
+            else:
+                logger.warning(f"Failed to download Bluesky image from {url}")
        for video_media in video_medias:
            url = media_url.format(video_media["ref"]["$link"], post["author"]["did"])
-            video_media = archiver.download_from_url(url)
-            media.append(Media(video_media))
+            filename = archiver.download_from_url(url)
+            if filename:
+                media.append(Media(filename))
+            else:
+                logger.warning(f"Failed to download Bluesky video from {url}")
        return media

    def _get_post_data(self, post: dict) -> dict:
--- a/src/auto_archiver/modules/generic_extractor/facebook.py
+++ b/src/auto_archiver/modules/generic_extractor/facebook.py
@@ -34,7 +34,7 @@ def _extract_metadata(self, webpage, video_id):
            ...,
            "attachments",
            ...,
-            lambda k, v: (k == "media" and str(v["id"]) == video_id and v["__typename"] == "Video"),
+            lambda k, v: k == "media" and str(v["id"]) == video_id and v["__typename"] == "Video",
        ),
        expected_type=dict,
    )
--- a/src/auto_archiver/modules/generic_extractor/generic_extractor.py
+++ b/src/auto_archiver/modules/generic_extractor/generic_extractor.py
@@ -4,6 +4,7 @@ import datetime
 import os
 import importlib
 import subprocess
+import traceback
 import zipfile

 from typing import Generator, Type
@@ -14,12 +15,13 @@ from yt_dlp.extractor.common import InfoExtractor
 from yt_dlp.utils import MaxDownloadsReached
 import pysubs2

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core.extractor import Extractor
 from auto_archiver.core import Metadata, Media
 from auto_archiver.utils import get_datetime_from_str
 from auto_archiver.utils.misc import ydl_entry_to_filename
+from auto_archiver.utils.deletion_detection import detect_deletion, flag_as_deleted
 from .dropin import GenericDropin


@@ -63,8 +65,7 @@ class GenericExtractor(Extractor):
            if os.environ.get("AUTO_ARCHIVER_ALLOW_RESTART", "1") != "1":
                logger.warning("yt-dlp or plugin was updated — please restart auto-archiver manually")
            else:
-                logger.warning("yt-dlp or plugin was updated — restarting auto-archiver")
-                logger.warning(" ======= RESTARTING ======= ")
+                logger.warning("yt-dlp or plugin was updated — restarting auto-archiver\n ======= RESTARTING ======= ")
                os.execv(sys.executable, [sys.executable] + sys.argv)

    def update_package(self, package_name: str) -> bool:
@@ -80,7 +81,7 @@ class GenericExtractor(Extractor):
                return True
            logger.info(f"{package_name} already up to date")
        except Exception as e:
-            logger.error(f"Error updating {package_name}: {e}")
+            logger.error(f"Failed to update {package_name}: {e}")
        return False

    def setup_po_tokens(self) -> None:
@@ -203,10 +204,13 @@ class GenericExtractor(Extractor):
        if thumbnail_url:
            try:
                cover_image_path = self.download_from_url(thumbnail_url)
-                media = Media(cover_image_path)
-                metadata.add_media(media, id="cover")
+                if cover_image_path:
+                    media = Media(cover_image_path)
+                    metadata.add_media(media, id="cover")
+                else:
+                    logger.warning(f"Failed to download cover image from {thumbnail_url}")
            except Exception as e:
-                logger.error(f"Error downloading cover image {thumbnail_url}: {e}")
+                logger.error(f"Could not download cover image {thumbnail_url}: {e}")

        dropin = self.dropin_for_name(info_extractor.ie_key())
        if dropin:
@@ -306,9 +310,9 @@ class GenericExtractor(Extractor):
            result.set_url(url)

        if "description" in video_data and not result.get("content"):
-            result.set_content(video_data.get("description"))
+            result.set_content(video_data.pop("description"))
        # extract comments if enabled
-        if self.comments and video_data.get("comments", []) is not None:
+        if self.comments and video_data.get("comments", None) is not None:
            result.set(
                "comments",
                [
@@ -354,7 +358,7 @@ class GenericExtractor(Extractor):
        if not dropin:
            # TODO: add a proper link to 'how to create your own dropin'
            logger.debug(f"""Could not find valid dropin for {info_extractor.ie_key()}.
-                     Why not try creating your own, and make sure it has a valid function called 'create_metadata'. Learn more: https://auto-archiver.readthedocs.io/en/latest/user_guidelines.html#""")
+                     Why not try creating your own, and make sure it has a valid function called 'create_metadata'. Learn more: https://auto-archiver.readthedocs.io/en/latest/modules/autogen/extractor/generic_extractor.html#dropins""")
            return False

        post_data = dropin.extract_post(url, ie_instance)
@@ -375,7 +379,7 @@ class GenericExtractor(Extractor):
        if "entries" in data:
            entries = data.get("entries", [])
            if not len(entries):
-                logger.info("YoutubeDLArchiver could not find any video")
+                logger.info("GenericExtractor could not find any video")
                return False
        else:
            entries = [data]
@@ -407,9 +411,9 @@ class GenericExtractor(Extractor):
                            logger.error(f"Error loading subtitle file {val.get('filepath')}: {e}")
                result.add_media(new_media)
            except Exception as e:
-                logger.error(f"Error processing entry {entry}: {e}")
+                logger.error(f"Error processing entry {str(entry)[:256]}: {e} {traceback.format_exc()}")
        if not len(result.media):
-            logger.info(f"No media found for entry {entry}, skipping.")
+            logger.info(f"No media found for entry {str(entry)[:256]}, skipping.")
            return False

        return self.add_metadata(data, info_extractor, url, result)
@@ -484,6 +488,13 @@ class GenericExtractor(Extractor):
            # don't download since it can be a live stream
            data = ydl.extract_info(url, ie_key=info_extractor.ie_key(), download=False)

+            # Check for deletion indicators in video data
+            deletion_info = detect_deletion(video_data=data, url=url)
+            if deletion_info:
+                result = Metadata()
+                flag_as_deleted(result, deletion_info)
+                return result
+
            result = _helper_for_successful_extract_info(data, info_extractor, url, ydl)

        except MaxDownloadsReached:
@@ -503,6 +514,16 @@ class GenericExtractor(Extractor):
            try:
                result = self.get_metadata_for_post(info_extractor, url, ydl)
            except (yt_dlp.utils.DownloadError, yt_dlp.utils.ExtractorError) as post_e:
+                # Check if the error indicates deletion
+                deletion_info = detect_deletion(error_message=str(post_e), url=url)
+                if deletion_info:
+                    result = Metadata()
+                    flag_as_deleted(result, deletion_info)
+                    return result
+
+                if "NSFW tweet requires authentication." in str(post_e):
+                    logger.warning(str(post_e))
+                    return False
                logger.error("Error downloading metadata for post: {error}", error=str(post_e))
                return False
            except Exception as generic_e:
@@ -514,7 +535,7 @@ class GenericExtractor(Extractor):
                )
                return False

-        if result:
+        if result and not result.is_success():
            extractor_name = "yt-dlp"
            if info_extractor:
                extractor_name += f"_{info_extractor.ie_key()}"
@@ -526,7 +547,7 @@ class GenericExtractor(Extractor):

        return result

-    def download(self, item: Metadata) -> Metadata:
+    def download(self, item: Metadata, skip_proxy: bool = False) -> Metadata:
        url = item.get_url()

        # TODO: this is a temporary hack until this issue is closed: https://github.com/yt-dlp/yt-dlp/issues/11025
@@ -534,6 +555,16 @@ class GenericExtractor(Extractor):
            url = url.replace("https://ya.ru", "https://yandex.ru")
            item.set("replaced_url", url)

+        # proxy_on_failure_only logic
+        if self.proxy and self.proxy_on_failure_only and not skip_proxy:
+            # when proxy_on_failure_only is True, we first try to download without a proxy and only continue with execution if that fails
+            try:
+                if without_proxy := self.download(item, skip_proxy=True):
+                    logger.info("Downloaded successfully without proxy.")
+                    return without_proxy
+            except Exception:
+                logger.debug("Download without proxy failed, trying with proxy...")
+
        ydl_options = [
            "-o",
            os.path.join(self.tmp_dir, "%(id)s.%(ext)s"),
@@ -544,10 +575,12 @@ class GenericExtractor(Extractor):
            "--live-from-start" if self.live_from_start else "--no-live-from-start",
            "--postprocessor-args",
            "ffmpeg:-bitexact",  # ensure bitexact output to avoid mismatching hashes for same video
+            "--js-runtimes",
+            "node",  # yt-dlp defaults to deno-only; node is available in the base image
        ]

        # proxy handling
-        if self.proxy:
+        if self.proxy and not skip_proxy:
            ydl_options.extend(["--proxy", self.proxy])

        # max_downloads handling
@@ -560,17 +593,17 @@ class GenericExtractor(Extractor):
        # order of importance: username/password -> api_key -> cookie -> cookies_from_browser -> cookies_file
        if auth:
            if "username" in auth and "password" in auth:
-                logger.debug(f"Using provided auth username and password for {url}")
+                logger.debug("Using provided auth username and password")
                ydl_options.extend(("--username", auth["username"]))
                ydl_options.extend(("--password", auth["password"]))
            elif "cookie" in auth:
-                logger.debug(f"Using provided auth cookie for {url}")
+                logger.debug("Using provided auth cookie")
                yt_dlp.utils.std_headers["cookie"] = auth["cookie"]
            elif "cookies_from_browser" in auth:
-                logger.debug(f"Using extracted cookies from browser {auth['cookies_from_browser']} for {url}")
+                logger.debug(f"Using extracted cookies from browser {auth['cookies_from_browser']}")
                ydl_options.extend(("--cookies-from-browser", auth["cookies_from_browser"]))
            elif "cookies_file" in auth:
-                logger.debug(f"Using cookies from file {auth['cookies_file']} for {url}")
+                logger.debug(f"Using cookies from file {auth['cookies_file']}")
                ydl_options.extend(("--cookies", auth["cookies_file"]))

        # Applying user-defined extractor_args
@@ -584,7 +617,7 @@ class GenericExtractor(Extractor):
                ydl_options.extend(["--extractor-args", f"{key}:{arg_str}"])

        if self.ytdlp_args:
-            logger.debug("Adding additional ytdlp arguments: {self.ytdlp_args}")
+            logger.debug(f"Adding additional ytdlp arguments: {self.ytdlp_args}")
            ydl_options += self.ytdlp_args.split(" ")

        *_, validated_options = yt_dlp.parse_options(ydl_options)
@@ -592,9 +625,9 @@ class GenericExtractor(Extractor):
            validated_options
        )  # allsubtitles and subtitleslangs not working as expected, so default lang is always "en"

+        result: Metadata | None = None
        for info_extractor in self.suitable_extractors(url):
-            result = self.download_for_extractor(info_extractor, url, ydl)
-            if result:
-                return result
-
-        return False
+            local_result: Metadata = self.download_for_extractor(info_extractor, url, ydl)
+            if local_result:
+                result = result.merge(local_result) if result else local_result
+        return result if result else False
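Review note: the `proxy_on_failure_only` change above tries a direct download first and only continues to the proxied path when that attempt fails or returns nothing. A minimal standalone sketch of that control flow, assuming a hypothetical `fetch(proxy)` callable in place of `GenericExtractor.download` (the names `download_with_fallback` and `fetch` are illustrative and do not appear in the patch):

```python
from typing import Callable, Optional


def download_with_fallback(
    fetch: Callable[[Optional[str]], Optional[dict]],
    proxy: Optional[str],
    proxy_on_failure_only: bool = True,
) -> Optional[dict]:
    """Try a direct fetch first; use the proxy only if the direct attempt fails."""
    if proxy and proxy_on_failure_only:
        try:
            # `fetch(None)` stands in for the recursive call with skip_proxy=True
            if direct := fetch(None):
                return direct
        except Exception:
            # a failed direct attempt is not fatal; fall through to the proxied attempt
            pass
    # either no fallback was configured or the direct attempt did not succeed
    return fetch(proxy)
```

As in the patch, a falsy result from the direct attempt is treated the same as an exception, so the proxied attempt runs in both cases.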
--- a/src/auto_archiver/modules/generic_extractor/tiktok.py
+++ b/src/auto_archiver/modules/generic_extractor/tiktok.py
@@ -1,5 +1,6 @@
+import re
 import requests
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from yt_dlp.extractor.tiktok import TikTokIE, TikTokLiveIE, TikTokVMIE, TikTokUserIE

@@ -14,70 +15,109 @@ class Tiktok(GenericDropin):
    It's useful for capturing content that requires a login, like sensitive content.
    """

+    # Regex pattern to match TikTok photo post URLs
+    PHOTO_URL_REGEX = r"https?://(?:www\.)?tiktok\.com/@[\w\.-]+/photo/\d+"
    TIKWM_ENDPOINT = "https://www.tikwm.com/api/?url={url}"

    def suitable(self, url, info_extractor) -> bool:
        """This dropin (which uses Tikvm) is suitable for *all* Tiktok type URLs - videos, lives, VMs, and users.
        Return the 'suitable' method from the TikTokIE class."""
-        return any(extractor().suitable(url) for extractor in (TikTokIE, TikTokLiveIE, TikTokVMIE, TikTokUserIE))
+        return any(extractor().suitable(url) for extractor in (TikTokIE, TikTokLiveIE, TikTokVMIE, TikTokUserIE)) or (
+            re.match(self.PHOTO_URL_REGEX, url) is not None
+        )

    def extract_post(self, url: str, ie_instance):
-        logger.debug(f"Using Tikwm API to attempt to download tiktok video from {url=}")
+        logger.debug("Using Tikwm API to attempt to download tiktok video")

        endpoint = self.TIKWM_ENDPOINT.format(url=url)

        r = requests.get(endpoint)
        if r.status_code != 200:
-            raise ValueError(f"unexpected status code '{r.status_code}' from tikwm.com for {url=}:")
+            raise ValueError(f"Unexpected status code '{r.status_code}' from tikwm.com")

        try:
            json_response = r.json()
        except ValueError:
-            raise ValueError(f"failed to parse JSON response from tikwm.com for {url=}")
+            raise ValueError("Failed to parse JSON response from tikwm.com")

        if not json_response.get("msg") == "success" or not (api_data := json_response.get("data", {})):
-            raise ValueError(f"failed to get a valid response from tikwm.com for {url=}: {repr(json_response)}")
+            raise ValueError(f"Unable to download with tikwm.com: {repr(json_response)}")

        # tries to get the non-watermarked version first
-        video_url = api_data.pop("play", api_data.pop("wmplay", None))
-        if not video_url:
-            raise ValueError(f"no valid video URL found in response from tikwm.com for {url=}")
-
-        api_data["video_url"] = video_url
+        play_url = api_data.pop("play", api_data.pop("wmplay", None))
+        if play_url and "mime_type=audio" in play_url:
+            play_url = None
+        if play_url:
+            api_data["video_url"] = play_url
        return api_data

    def keys_to_clean(self, video_data: dict, info_extractor):
-        return ["video_url", "title", "create_time", "author", "cover", "origin_cover", "ai_dynamic_cover", "duration"]
+        return [
+            "video_url",
+            "title",
+            "create_time",
+            "author",
+            "cover",
+            "origin_cover",
+            "ai_dynamic_cover",
+            "duration",
+            "size",
+            "wm_size",
+            "music",
+            "music_info",
+            "play_count",
+            "digg_count",
+            "comment_count",
+            "share_count",
+            "download_count",
+            "collect_count",
+            "anchors",
+            "anchors_extras",
+            "is_ad",
+            "commerce_info",
+            "commercial_video_info",
+            "item_comment_settings",
+            "mentioned_users",
+        ]  # all of these will be added via api_data in a single metadata field vs individual ones in the generic extractor

    def create_metadata(self, post: dict, ie_instance, archiver, url):
        # prepare result, start by downloading video
        result = Metadata()
-        video_url = post.pop("video_url")
-
+        is_success = False
        # get the cover if possible
        cover_url = post.pop("origin_cover", post.pop("cover", post.pop("ai_dynamic_cover", None)))
        if cover_url and (cover_downloaded := archiver.download_from_url(cover_url)):
            result.add_media(Media(cover_downloaded))

-        # get the video or fail
-        video_downloaded = archiver.download_from_url(video_url, f"vid_{post.get('id', '')}")
-        if not video_downloaded:
-            logger.error(f"failed to download video from {video_url}")
-            return False
-        video_media = Media(video_downloaded)
-        if duration := post.get("duration", None):
-            video_media.set("duration", duration)
-        result.add_media(video_media)
+        for image_url in post.pop("images", []):
+            if image_downloaded := archiver.download_from_url(image_url):
+                result.add_media(Media(image_downloaded))
+                is_success = True  # this is an images post and we got it/them
+
+        # get the video if present, could be an image post
+        if video_url := post.pop("video_url", None):
+            video_downloaded = archiver.download_from_url(video_url, f"vid_{post.get('id', '')}")
+            if not video_downloaded:
+                logger.error("Failed to download video")
+                return False
+            video_media = Media(video_downloaded)
+            if duration := post.pop("duration", None):
+                video_media.set("duration", duration)
+            result.add_media(video_media)
+            is_success = True  # this is a video post and we got it

        # add remaining metadata
-        result.set_title(post.get("title", ""))
+        result.set_title(post.pop("title", ""))

-        if created_at := post.get("create_time", None):
+        if created_at := post.pop("create_time", None):
            result.set_timestamp(datetime.fromtimestamp(created_at, tz=timezone.utc))

-        if author := post.get("author", None):
+        if author := post.pop("author", None):
            result.set("author", author)

-        result.set("api_data", post)
-
+        result.set("api_data", {k: v for k, v in post.items() if v})
+        if is_success:
+            result.success("yt-dlp_TikTok")
+        else:
+            raise ValueError("Unable to download any media from TikTok post, possibly deleted or private.")
        return result
--- a/src/auto_archiver/modules/generic_extractor/truth.py
+++ b/src/auto_archiver/modules/generic_extractor/truth.py
@@ -1,6 +1,7 @@
 from typing import Type

 from auto_archiver.utils import traverse_obj
+from auto_archiver.utils.custom_logger import logger
 from auto_archiver.core.metadata import Metadata, Media
 from auto_archiver.core.extractor import Extractor
 from yt_dlp.extractor.common import InfoExtractor
@@ -58,6 +59,9 @@ class Truth(GenericDropin):
        # add the media
        for media in post.get("media_attachments", []):
            filename = archiver.download_from_url(media["url"])
+            if not filename:
+                logger.warning(f"Failed to download media from {media['url']}")
+                continue
            result.add_media(Media(filename), id=media.get("id"))

        return result
--- a/src/auto_archiver/modules/generic_extractor/twitter.py
+++ b/src/auto_archiver/modules/generic_extractor/twitter.py
@@ -1,13 +1,16 @@
 import re
 import mimetypes

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from slugify import slugify

 from auto_archiver.core.metadata import Metadata, Media
 from auto_archiver.utils import url as UrlUtil, get_datetime_from_str
 from auto_archiver.core.extractor import Extractor
+from auto_archiver.utils.deletion_detection import detect_deletion, flag_as_deleted
 from auto_archiver.modules.generic_extractor.dropin import GenericDropin, InfoExtractor
+import requests
+from retrying import retry


 class Twitter(GenericDropin):
@@ -28,7 +31,85 @@ class Twitter(GenericDropin):

    def extract_post(self, url: str, ie_instance: InfoExtractor):
        twid = ie_instance._match_valid_url(url).group("id")
-        return ie_instance._extract_status(twid=twid)
+        try:
+            post_data = ie_instance._extract_status(twid=twid)
+            if not post_data or not post_data.get("user") or not post_data.get("created_at"):
+                raise ValueError("Error retrieving post with twitter dropin")
+            return post_data
+        except Exception as e:
+            logger.debug(f"yt-dlp twitter extraction failed: {e}")
+            # try fxtwitter API as fallback
+            return self._fetch_fxtwitter(twid)
+
+    def _fetch_fxtwitter(self, twid: str) -> dict:
+        """Fetch tweet data from fxtwitter API and convert to expected format."""
+        fxtwitter_url = f"https://api.fxtwitter.com/status/{twid}"
+        logger.info(f"Falling back to fxtwitter API for tweet extraction: {fxtwitter_url}")
+
+        @retry(wait_random_min=500, wait_random_max=2000, stop_max_attempt_number=3)
+        def fetch_fxtwitter_data(url):
+            headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"}
+            resp = requests.get(url, headers=headers, timeout=15)
+            if resp.status_code != 200:
+                raise ValueError(f"Failed to retrieve tweet from fxtwitter API: {resp.status_code}")
+            data = resp.json()
+            if "tweet" not in data:
+                raise ValueError(f"No tweet data in fxtwitter response: {data.get('message', 'Unknown error')}")
+            return data["tweet"]
+
+        tweet = fetch_fxtwitter_data(fxtwitter_url)
+
+        # Convert fxtwitter format to expected format
+        author = tweet.get("author", {}).get("name", "")
+        created_at = tweet.get("created_at", "")  # Format: "Sun Feb 08 18:45:00 +0000 2026"
+        full_text = tweet.get("text", "") or tweet.get("raw_text", "")
+
+        # Convert media format
+        media = []
+        fx_media = tweet.get("media", {})
+
+        # Handle photos
+        for photo in fx_media.get("photos", []):
+            media.append({"type": "photo", "media_url_https": photo.get("url", "")})
+
+        # Handle videos
+        for video in fx_media.get("videos", []):
+            variants = video.get("variants", [])
+            # Convert to expected variant format
+            converted_variants = []
+            for var in variants:
+                converted_variants.append(
+                    {
+                        "url": var.get("url", ""),
+                        "content_type": var.get("content_type", "video/mp4"),
+                        "bitrate": var.get("bitrate", 0),
+                    }
+                )
+            if converted_variants:
+                media.append({"type": "video", "video_info": {"variants": converted_variants}})
+
+        # Handle animated gifs (fxtwitter may include these in videos)
+        for item in fx_media.get("all", []):
+            if item.get("type") == "gif":
+                variants = item.get("variants", [])
+                converted_variants = []
+                for var in variants:
+                    converted_variants.append(
+                        {
+                            "url": var.get("url", ""),
+                            "content_type": var.get("content_type", "video/mp4"),
+                            "bitrate": var.get("bitrate", 0),
+                        }
+                    )
+                if converted_variants:
+                    media.append({"type": "animated_gif", "video_info": {"variants": converted_variants}})
+
+        return {
+            "user": {"name": author},
+            "created_at": created_at,
+            "full_text": full_text,
+            "entities": {"media": media},
+        }

    def keys_to_clean(self, video_data, info_extractor):
        return ["user", "created_at", "entities", "favorited", "translator_type"]
@@ -37,7 +118,15 @@ class Twitter(GenericDropin):
        result = Metadata()
        try:
            if not tweet.get("user") or not tweet.get("created_at"):
-                raise ValueError("Error retreiving post. Are you sure it exists?")
+                # Check for deletion indicators
+                deletion_info = detect_deletion(
+                    video_data=tweet, url=url, error_message="Missing user or created_at fields"
+                )
+                if deletion_info:
+                    flag_as_deleted(result, deletion_info)
+                    return result
+
+                raise ValueError("Error retrieving post. Are you sure it exists?")
            timestamp = get_datetime_from_str(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
        except (ValueError, KeyError) as ex:
            logger.warning(f"Unable to parse tweet: {str(ex)}\nRetreived tweet data: {tweet}")
@@ -68,5 +157,8 @@ class Twitter(GenericDropin):
                mimetype = variant["content_type"]
            ext = mimetypes.guess_extension(mimetype)
            media.filename = archiver.download_from_url(media.get("src"), f"{slugify(url)}_{i}{ext}")
+            if not media.filename:
+                logger.warning(f"Failed to download media from {media.get('src')}")
+                continue
            result.add_media(media)
        return result
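Review note: `_fetch_fxtwitter` repeats the same variant conversion for videos and GIFs. The sketch below shows one way the mapping could be factored into a helper. It assumes the fxtwitter `media` object has the `photos`, `videos`, and `all` keys used in the patch; the function names and the sample payload are hypothetical.

```python
def convert_variants(variants: list) -> list:
    """Map fxtwitter-style variants to the variant dicts the dropin expects."""
    return [
        {
            "url": var.get("url", ""),
            "content_type": var.get("content_type", "video/mp4"),
            "bitrate": var.get("bitrate", 0),
        }
        for var in variants
    ]


def convert_fx_media(fx_media: dict) -> list:
    """Build the `entities.media` list from an fxtwitter-style `media` object."""
    media = [{"type": "photo", "media_url_https": p.get("url", "")} for p in fx_media.get("photos", [])]
    for video in fx_media.get("videos", []):
        if variants := convert_variants(video.get("variants", [])):
            media.append({"type": "video", "video_info": {"variants": variants}})
    for item in fx_media.get("all", []):
        # only GIF entries are taken from "all", mirroring the patch
        if item.get("type") == "gif":
            if variants := convert_variants(item.get("variants", [])):
                media.append({"type": "animated_gif", "video_info": {"variants": variants}})
    return media


# hypothetical payload, shaped like the fields the patch reads
sample = {
    "photos": [{"url": "https://example.com/a.jpg"}],
    "videos": [{"variants": [{"url": "https://example.com/v.mp4", "bitrate": 832000}]}],
    "all": [{"type": "gif", "variants": [{"url": "https://example.com/g.mp4"}]}],
}
converted = convert_fx_media(sample)
```

Entries with no variants are skipped, so the downstream download loop never sees a media item with an empty variant list.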
--- a/src/auto_archiver/modules/ghostarchive_enricher/__init__.py
+++ b/src/auto_archiver/modules/ghostarchive_enricher/__init__.py
@@ -0,0 +1 @@
+from .ghostarchive_enricher import GhostarchiveEnricher
--- a/src/auto_archiver/modules/ghostarchive_enricher/__manifest__.py
+++ b/src/auto_archiver/modules/ghostarchive_enricher/__manifest__.py
@@ -0,0 +1,58 @@
+{
+    "name": "Ghost Archive Enricher",
+    "type": ["enricher"],
+    "entry_point": "ghostarchive_enricher::GhostarchiveEnricher",
+    "requires_setup": False,
+    "dependencies": {
+        "python": ["loguru", "requests", "bs4", "seleniumbase"],
+    },
+    "configs": {
+        "timeout": {
+            "default": 120,
+            "type": "int",
+            "help": "seconds to wait for successful archive confirmation from Ghost Archive.",
+        },
+        "check_existing": {
+            "default": True,
+            "type": "bool",
+            "help": "whether to search for an existing archive before submitting a new one.",
+        },
+        "proxy_http": {
+            "default": None,
+            "help": "http proxy to use for requests, eg http://proxy-user:password@proxy-ip:port",
+        },
+        "proxy_https": {
+            "default": None,
+            "help": "https proxy to use for requests, eg https://proxy-user:password@proxy-ip:port",
+        },
+    },
+    "description": """
+    Submits the current URL to [Ghost Archive](https://ghostarchive.org/) for archiving and returns the archived page URL.
+
+    Used as an **enricher** to add a Ghost Archive URL to items already extracted by other modules.
+
+    ### Features
+    - Archives any public URL using the Ghost Archive service.
+    - Optionally checks for existing archives before submitting a new one.
+    - Supports HTTP and HTTPS proxies for requests.
+    - Parses HTML responses to extract archive URLs (Ghost Archive has no JSON API).
+
+    ### Important
+    - This module confirms that Ghost Archive accepted the URL submission and returned an archive link.
+      It does **not** verify the contents or completeness of the archived page.
+
+    ### Notes
+    - Ghost Archive is a free service with no authentication required.
+    - Archived pages must be smaller than 50 MB (including CSS, fonts, images, etc.).
+    - Videos are archived up to 360p and must be under 100 MB and shorter than 30 minutes.
+    - Archival may take up to 5 minutes depending on the queue and page complexity.
+    - Archived content is stored indefinitely.
+    - Ghost Archive does not archive pages that require authentication or form submission.
+
+    ### Limitations
+    - No official API — this module interacts with the Ghost Archive web interface.
+    - The submission endpoint is protected by Cloudflare, so a headless browser (SeleniumBase) is used for new submissions.
+    - Searching for existing archives uses plain HTTP requests and does not require a browser.
+    - Rate limiting may apply; consider using a delay between requests if archiving many URLs.
+    """,
+}
--- a/src/auto_archiver/modules/ghostarchive_enricher/ghostarchive_enricher.py
+++ b/src/auto_archiver/modules/ghostarchive_enricher/ghostarchive_enricher.py
@@ -0,0 +1,153 @@
+import time
+import re
+
+import requests
+from bs4 import BeautifulSoup
+from seleniumbase import SB
+from auto_archiver.utils.custom_logger import logger
+from auto_archiver.utils import url as UrlUtil
+from auto_archiver.core import Enricher, Metadata
+
+
+class GhostarchiveEnricher(Enricher):
+    """
+    Submits the current URL to Ghost Archive (ghostarchive.org) for archiving
+    and stores the archived page URL as enrichment metadata.
+
+    Ghost Archive has no official API — this module interacts with the web form
+    and parses HTML responses. The submission endpoint is protected by Cloudflare,
+    so a headless browser (SeleniumBase) is used for archival submissions, while
+    plain HTTP requests are used for searching existing archives.
+
+    Note: this module only confirms that Ghost Archive accepted the submission
+    and returned an archive URL. It does not verify that the archived page
+    content is complete or correctly rendered.
+    """
+
+    GHOSTARCHIVE_BASE = "https://ghostarchive.org"
+    ARCHIVE_ENDPOINT = f"{GHOSTARCHIVE_BASE}/archive2"
+    SEARCH_ENDPOINT = f"{GHOSTARCHIVE_BASE}/search"
+    ARCHIVE_URL_PATTERN = re.compile(r"/archive/([A-Za-z0-9]+)")
+
+    def _get_proxies(self) -> dict:
+        proxies = {}
+        if self.proxy_http:
+            proxies["http"] = self.proxy_http
+        if self.proxy_https:
+            proxies["https"] = self.proxy_https
+        return proxies
+
+    def _get_headers(self) -> dict:
+        return {
+            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
+        }
+
+    def _normalize_archive_href(self, href: str) -> str | None:
+        """Normalize an archive link href to a full HTTPS URL, filtering out replay links."""
+        if "/archive/" not in href or "/replay/" in href:
+            return None
+        if href.startswith("/"):
+            return f"{self.GHOSTARCHIVE_BASE}{href}"
+        if href.startswith("http://ghostarchive.org"):
+            return href.replace("http://", "https://")
+        if href.startswith("https://ghostarchive.org"):
+            return href
+        return None
+
+    def _search_existing(self, url: str) -> str | None:
+        """
+        Search Ghost Archive for an existing archive of the given URL.
+        Returns the archive URL if found, otherwise None.
+        """
+        try:
+            r = requests.get(
+                self.SEARCH_ENDPOINT,
+                params={"term": url},
+                headers=self._get_headers(),
+                proxies=self._get_proxies(),
+                timeout=30,
+            )
+            if r.status_code != 200:
+                logger.warning(f"Ghost Archive search returned status {r.status_code}")
+                return None
+
+            soup = BeautifulSoup(r.text, "html.parser")
+            for link in soup.find_all("a", href=True):
+                archive_url = self._normalize_archive_href(link["href"])
+                if archive_url:
+                    logger.info(f"Found existing Ghost Archive: {archive_url}")
+                    return archive_url
+
+        except requests.exceptions.RequestException as e:
+            logger.warning(f"Ghost Archive search failed: {e}")
+
+        return None
+
+    def _submit_url(self, url: str) -> str | None:
+        """
+        Submit a URL to Ghost Archive for archiving using a headless browser.
+        The /archive2 endpoint is Cloudflare-protected, requiring JS execution.
+        Returns the archive URL if successful, otherwise None.
+        """
+        try:
+            with SB(uc=True, headless=True) as sb:
+                logger.debug("Opening Ghost Archive homepage in headless browser")
+                sb.open(self.GHOSTARCHIVE_BASE)
+
+                # fill in the archive form and submit
+                sb.type('input[name="archive"]', url)
+                sb.click('input[type="submit"][value="Submit for archival"]')
+
+                # wait for navigation to /archive/{id} or timeout
+                start_time = time.time()
+                while time.time() - start_time < self.timeout:
+                    current_url = sb.get_current_url()
+                    if self.ARCHIVE_URL_PATTERN.search(current_url):
+                        archive_url = current_url.split("?")[0]
+                        logger.info(f"Ghost Archive saved: {archive_url}")
+                        return archive_url
+                    time.sleep(2)
+
+                # if we didn't redirect, try parsing the page source
+                page_source = sb.get_page_source()
+                return self._parse_archive_url(page_source)
+
+        except Exception as e:
+            logger.warning(f"Ghost Archive submission failed: {e}")
+            return None
+
+    def _parse_archive_url(self, html: str) -> str | None:
+        """Parse HTML response to find an archive URL."""
+        soup = BeautifulSoup(html, "html.parser")
+        for link in soup.find_all("a", href=True):
+            archive_url = self._normalize_archive_href(link["href"])
+            if archive_url:
+                return archive_url
+        return None
+
+    def enrich(self, to_enrich: Metadata) -> bool:
+        url = to_enrich.get_url()
+        if UrlUtil.is_auth_wall(url):
+            logger.debug("[SKIP] Ghost Archive since url is behind AUTH WALL")
+            return False
+
+        if to_enrich.get("ghostarchive"):
+            logger.info(f"Ghost Archive enricher had already been executed: {to_enrich.get('ghostarchive')}")
+            return True
+
+        # optionally check for existing archive first
+        archive_url = None
+        if self.check_existing:
+            logger.debug(f"Searching Ghost Archive for existing archive of {url}")
+            archive_url = self._search_existing(url)
+
+        if not archive_url:
+            logger.debug(f"Submitting {url} to Ghost Archive")
+            archive_url = self._submit_url(url)
+
+        if archive_url:
+            to_enrich.set("ghostarchive", archive_url)
+            return True
+
+        logger.warning(f"Ghost Archive failed to archive {url}")
+        return False
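
Note on the redirect wait in `_submit_url`: the loop polls the browser URL until it matches an archive pattern or the timeout runs out. A minimal standalone sketch of that polling pattern follows. The pattern, the `wait_for_archive_url` name, and the injectable `sleep`/`clock` parameters are assumptions for illustration, not part of this change.

```python
import re
import time
from typing import Callable, Optional

# Hypothetical pattern; the real ARCHIVE_URL_PATTERN is defined outside this hunk.
ARCHIVE_URL_PATTERN = re.compile(r"/archive/[A-Za-z0-9]+")


def wait_for_archive_url(
    get_current_url: Callable[[], str],
    timeout: float,
    poll_interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll get_current_url until it matches the pattern or the timeout elapses."""
    deadline = clock() + timeout
    while clock() < deadline:
        current_url = get_current_url()
        if ARCHIVE_URL_PATTERN.search(current_url):
            # Drop the query string, as the diff does with split("?")[0].
            return current_url.split("?")[0]
        sleep(poll_interval)
    # Timed out without seeing a matching URL.
    return None
```

Using a monotonic clock keeps the deadline stable if the wall clock is adjusted during the wait, and passing `sleep` in makes the loop testable without real delays.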
--- a/src/auto_archiver/modules/gsheet_feeder_db/gsheet_feeder_db.py
+++ b/src/auto_archiver/modules/gsheet_feeder_db/gsheet_feeder_db.py
@@ -10,11 +10,12 @@ The filtered rows are processed into `Metadata` objects.
 """

 import os
+import traceback
 from typing import Tuple, Union, Iterator
 from urllib.parse import quote

 import gspread
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from slugify import slugify
 from retrying import retry

@@ -31,28 +32,39 @@ class GsheetsFeederDB(Feeder, Database):
        if not self.sheet and not self.sheet_id:
            raise ValueError("You need to define either a 'sheet' name or a 'sheet_id' in your manifest.")

-    def open_sheet(self):
+    @retry(
+        wait_exponential_multiplier=1000,  # retrying waits are in milliseconds
+        stop_max_attempt_number=6,
+    )
+    def open_sheet(self) -> gspread.Spreadsheet:
        if self.sheet:
            return self.gsheets_client.open(self.sheet)
        else:
            return self.gsheets_client.open_by_key(self.sheet_id)

-    def __iter__(self) -> Iterator[Metadata]:
-        sh = self.open_sheet()
-        for ii, worksheet in enumerate(sh.worksheets()):
-            if not self.should_process_sheet(worksheet.title):
-                logger.debug(f"SKIPPED worksheet '{worksheet.title}' due to allow/block rules")
-                continue
-            logger.info(f"Opening worksheet {ii=}: {worksheet.title=} header={self.header}")
-            gw = GWorksheet(worksheet, header_row=self.header, columns=self.columns)
-            if len(missing_cols := self.missing_required_columns(gw)):
-                logger.debug(
-                    f"SKIPPED worksheet '{worksheet.title}' due to missing required column(s) for {missing_cols}"
-                )
-                continue
+    @retry(
+        wait_exponential_multiplier=1000,  # retrying waits are in milliseconds
+        stop_max_attempt_number=6,
+    )
+    def enumerate_sheets(self, sheet) -> Iterator[gspread.Worksheet]:
+        # Return eagerly so the retry covers the API call; a generator body would only run after @retry has returned.
+        return iter(sheet.worksheets())

-            # process and yield metadata here:
-            yield from self._process_rows(gw)
+    def __iter__(self) -> Iterator[Metadata]:
+        spreadsheet = self.open_sheet()
+        for worksheet in self.enumerate_sheets(spreadsheet):
+            with logger.contextualize(worksheet=f"{spreadsheet.title}:{worksheet.title}"):
+                if not self.should_process_sheet(worksheet.title):
+                    logger.debug("Skipped worksheet due to allow/block rules")
+                    continue
+                logger.info(f"Opening worksheet header={self.header}")
+                gw = GWorksheet(worksheet, header_row=self.header, columns=self.columns)
+                if len(missing_cols := self.missing_required_columns(gw)):
+                    logger.debug(f"Skipped worksheet due to missing required column(s) for {missing_cols}")
+                    continue
+
+                # process and yield metadata here:
+                yield from self._process_rows(gw)
            logger.info(f"Finished worksheet {worksheet.title}")

    def _process_rows(self, gw: GWorksheet):
@@ -69,7 +81,9 @@ class GsheetsFeederDB(Feeder, Database):
            # All checks done - archival process starts here
            m = Metadata().set_url(url)
            self._set_context(m, gw, row)
-            yield m
+
+            with logger.contextualize(row=row):
+                yield m

    def _set_context(self, m: Metadata, gw: GWorksheet, row: int) -> Metadata:
        # TODO: Check folder value not being recognised
@@ -99,16 +113,16 @@ class GsheetsFeederDB(Feeder, Database):
        return missing

    def started(self, item: Metadata) -> None:
-        logger.info(f"STARTED {item}")
+        logger.info("STARTED")
        gw, row = self._retrieve_gsheet(item)
        gw.set_cell(row, "status", "Archive in progress")

    def failed(self, item: Metadata, reason: str) -> None:
-        logger.error(f"FAILED {item}")
+        logger.error("FAILED")
        self._safe_status_update(item, f"Archive failed {reason}")

    def aborted(self, item: Metadata) -> None:
-        logger.warning(f"ABORTED {item}")
+        logger.warning("ABORTED")
        self._safe_status_update(item, "")

    def fetch(self, item: Metadata) -> Union[Metadata, bool]:
@@ -117,13 +131,13 @@ class GsheetsFeederDB(Feeder, Database):

    def done(self, item: Metadata, cached: bool = False) -> None:
        """archival result ready - should be saved to DB"""
-        logger.success(f"DONE {item.get_url()}")
        gw, row = self._retrieve_gsheet(item)
-        # self._safe_status_update(item, 'done')

        cell_updates = []
        row_values = gw.get_row(row)

+        logger.info("DONE")
+
        def batch_if_valid(col, val, final_value=None):
            final_value = final_value or val
            try:
@@ -175,9 +189,7 @@ class GsheetsFeederDB(Feeder, Database):
            )

        @retry(
-            wait_incrementing_start=1000,
-            wait_incrementing_increment=3000,
-            wait_incrementing_max=20_000,
+            wait_exponential_multiplier=1000,  # retrying waits are in milliseconds
            stop_max_attempt_number=5,
        )
        def batch_set_cell_with_retry(gw, cell_updates: list):
@@ -190,15 +202,13 @@ class GsheetsFeederDB(Feeder, Database):
            gw, row = self._retrieve_gsheet(item)
            gw.set_cell(row, "status", new_status)
        except Exception as e:
-            logger.debug(f"Unable to update sheet: {e}")
+            logger.debug(f"Unable to update sheet: {e}: {traceback.format_exc()}")

    def _retrieve_gsheet(self, item: Metadata) -> Tuple[GWorksheet, int]:
        if gsheet := item.get_context("gsheet"):
            gw: GWorksheet = gsheet.get("worksheet")
            row: int = gsheet.get("row")
        elif self.sheet_id:
-            logger.error(
-                f"Unable to retrieve Gsheet for {item.get_url()}, GsheetDB must be used alongside GsheetFeeder."
-            )
+            logger.error("Unable to retrieve Gsheet, GsheetDB must be used alongside GsheetFeeder.")

        return gw, row
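
Note on combining a retry decorator with functions that yield: the decorator only covers what happens during the decorated call itself. The sketch below uses a hand-written stand-in decorator, not the `retrying` API, and a hypothetical `FlakySheet` class to show how a generator body escapes the retry while an eager return does not.

```python
import functools
from typing import Callable, Iterator, List


def retry(max_attempts: int) -> Callable:
    """Tiny stand-in for a retry decorator (hypothetical, not the retrying API)."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator


class FlakySheet:
    """Hypothetical spreadsheet whose first worksheets() call fails."""

    def __init__(self) -> None:
        self.calls = 0

    def worksheets(self) -> List[str]:
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("transient API error")
        return ["Sheet1", "Sheet2"]


@retry(max_attempts=3)
def enumerate_lazy(sheet: FlakySheet) -> Iterator[str]:
    # Generator function: the body, including worksheets(), runs only when
    # the caller starts iterating, after the retry wrapper has returned.
    for worksheet in sheet.worksheets():
        yield worksheet


@retry(max_attempts=3)
def enumerate_eager(sheet: FlakySheet) -> Iterator[str]:
    # worksheets() runs inside the retried call, so a failure is retried.
    return iter(sheet.worksheets())
```

With these definitions, iterating `enumerate_lazy(FlakySheet())` raises `ConnectionError` on the first `next()`, while `enumerate_eager(FlakySheet())` succeeds on the second attempt.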
--- a/src/auto_archiver/modules/gsheet_feeder_db/gworksheet.py
+++ b/src/auto_archiver/modules/gsheet_feeder_db/gworksheet.py
@@ -1,4 +1,5 @@
 from gspread import utils
+from retrying import retry


 class GWorksheet:
@@ -26,6 +27,10 @@ class GWorksheet:
        "replaywebpage": "replaywebpage",
    }

+    @retry(
+        wait_exponential_multiplier=1000,  # retrying waits are in milliseconds
+        stop_max_attempt_number=6,
+    )
    def __init__(self, worksheet, columns=COLUMN_NAMES, header_row=1):
        self.wks = worksheet
        self.columns = columns
--- a/src/auto_archiver/modules/hash_enricher/hash_enricher.py
+++ b/src/auto_archiver/modules/hash_enricher/hash_enricher.py
@@ -9,7 +9,7 @@ making it suitable for handling large files efficiently.
 """

 import hashlib
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Enricher
 from auto_archiver.core import Metadata
@@ -22,10 +22,12 @@ class HashEnricher(Enricher):
    """

    def enrich(self, to_enrich: Metadata) -> None:
-        url = to_enrich.get_url()
-        logger.debug(f"calculating media hashes for {url=} (using {self.algorithm})")
+        logger.debug(f"Calculating media hashes with algo={self.algorithm}")

        for i, m in enumerate(to_enrich.media):
+            if not m.filename:
+                logger.warning(f"Skipping hash for media without filename: {m}")
+                continue
            if len(hd := self.calculate_hash(m.filename)):
                to_enrich.media[i].set("hash", f"{self.algorithm}:{hd}")

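
Note on the hashing step: the module docstring describes the enricher as suitable for large files. A minimal sketch of chunked hashing with `hashlib` follows. The function name `hash_file_in_chunks` and the `chunk_size` default are assumptions for illustration; the module's own `calculate_hash` is not shown in this hunk.

```python
import hashlib


def hash_file_in_chunks(filename: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file without loading it fully into memory."""
    hasher = hashlib.new(algorithm)
    with open(filename, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

A caller would format the result the way the enricher does, for example `f"{algorithm}:{digest}"`.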
--- a/src/auto_archiver/modules/html_formatter/html_formatter.py
+++ b/src/auto_archiver/modules/html_formatter/html_formatter.py
@@ -4,7 +4,7 @@ import os
 import pathlib
 from jinja2 import Environment, FileSystemLoader
 from urllib.parse import quote
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 import json
 import base64

@@ -35,7 +35,7 @@ class HtmlFormatter(Formatter):
    def format(self, item: Metadata) -> Media:
        url = item.get_url()
        if item.is_empty():
-            logger.debug(f"[SKIP] FORMAT there is no media or metadata to format: {url=}")
+            logger.debug("Nothing to format, skipping")
            return

        content = self.template.render(
--- a/src/auto_archiver/modules/instagram_api_extractor/manifest.py
+++ b/src/auto_archiver/modules/instagram_api_extractor/manifest.py
@@ -22,7 +22,7 @@
        "full_profile_max_posts": {
            "default": 0,
            "type": "int",
-            "help": "Use to limit the number of posts to download when full_profile is true. 0 means no limit. limit is applied softly since posts are fetched in batch, once to: posts, tagged posts, and highlights",
+            "help": "Use to limit the number of posts to download when full_profile is true or when a URL for multiple posts is passed (like /stories /highlights ...). 0 means no limit. When full_profile is true, content is downloaded in the order stories -> posts -> tagged posts -> highlights, so a value of 10 could download 2 stories, 7 posts, 1 tagged post, and 0 highlights.",
        },
        "minimize_json_output": {
            "default": True,
--- a/src/auto_archiver/modules/instagram_api_extractor/instagram_api_extractor.py
+++ b/src/auto_archiver/modules/instagram_api_extractor/instagram_api_extractor.py
@@ -8,11 +8,13 @@ data, reducing JSON output size, and handling large profiles.

 """

+import math
 import re
 from datetime import datetime
+import traceback

 import requests
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from retrying import retry
 from tqdm import tqdm

@@ -35,17 +37,19 @@ class InstagramAPIExtractor(Extractor):
    def setup(self) -> None:
        if self.api_endpoint[-1] == "/":
            self.api_endpoint = self.api_endpoint[:-1]
+        self.full_profile_max_posts = int(self.full_profile_max_posts or 0)
+        if self.full_profile_max_posts == 0:
+            self.full_profile_max_posts = math.inf

    def download(self, item: Metadata) -> Metadata:
        url = item.get_url()
-
        url.replace("instagr.com", "instagram.com").replace("instagr.am", "instagram.com")
        insta_matches = self.valid_url.findall(url)
-        logger.info(f"{insta_matches=}")
+
        if not len(insta_matches) or len(insta_matches[0]) != 3:
            return
        if len(insta_matches) > 1:
-            logger.warning(f"Multiple instagram matches found in {url=}, using the first one")
+            logger.debug("Multiple instagram matches found, skipping")
            return
        g1, g2, g3 = insta_matches[0][0], insta_matches[0][1], insta_matches[0][2]
        if g1 == "":
@@ -61,13 +65,13 @@ class InstagramAPIExtractor(Extractor):
                return self.download_post(item, id=g3, context="story")
            return self.download_stories(item, g2)
        else:
-            logger.warning(f"Unknown instagram regex group match {g1=} found in {url=}")
+            logger.warning(f"Unknown instagram regex group match {g1=}")
            return

    @retry(wait_random_min=1000, wait_random_max=3000, stop_max_attempt_number=5)
    def call_api(self, path: str, params: dict) -> dict:
        headers = {"accept": "application/json", "x-access-key": self.access_token}
-        logger.debug(f"calling {self.api_endpoint}/{path} with {params=}")
+        logger.debug(f"Calling {self.api_endpoint}/{path} with {params=}")
        return requests.get(f"{self.api_endpoint}/{path}", headers=headers, params=params).json()

    def cleanup_dict(self, d: dict | list) -> dict:
@@ -95,67 +99,89 @@ class InstagramAPIExtractor(Extractor):
        result.set_title(user.get("full_name", username)).set("data", user)
        if pic_url := user.get("profile_pic_url_hd", user.get("profile_pic_url")):
            filename = self.download_from_url(pic_url)
-            result.add_media(Media(filename=filename), id="profile_picture")
+            if filename:
+                result.add_media(Media(filename=filename), id="profile_picture")
+            else:
+                logger.warning(f"Failed to download profile picture from {pic_url}")

+        count_posts = 0
        if self.full_profile:
            user_id = user.get("pk")
            # download all stories
            try:
-                stories = self._download_stories_reusable(result, username)
+                stories = self._download_stories_reusable(
+                    result, username, max_to_download=self.full_profile_max_posts - count_posts
+                )
+                count_posts += len(stories)
                result.set("#stories", len(stories))
            except Exception as e:
                result.append("errors", f"Error downloading stories for {username}")
-                logger.error(f"Error downloading stories for {username}: {e}")
+                logger.error(f"Error downloading stories for {username}: {e} {traceback.format_exc()}")

            # download all posts
            try:
-                self.download_all_posts(result, user_id)
+                if count_posts < self.full_profile_max_posts:
+                    count_posts += self.download_all_posts(
+                        result, user_id, max_to_download=self.full_profile_max_posts - count_posts
+                    )
            except Exception as e:
                result.append("errors", f"Error downloading posts for {username}")
-                logger.error(f"Error downloading posts for {username}: {e}")
+                logger.error(f"Error downloading posts for {username}: {e} {traceback.format_exc()}")

            # download all tagged
            try:
-                self.download_all_tagged(result, user_id)
+                if count_posts < self.full_profile_max_posts:
+                    count_posts += self.download_all_tagged(
+                        result, user_id, max_to_download=self.full_profile_max_posts - count_posts
+                    )
            except Exception as e:
                result.append("errors", f"Error downloading tagged posts for {username}")
-                logger.error(f"Error downloading tagged posts for {username}: {e}")
+                logger.error(f"Error downloading tagged posts for {username}: {e} {traceback.format_exc()}")

            # download all highlights
            try:
-                self.download_all_highlights(result, username, user_id)
+                if count_posts < self.full_profile_max_posts:
+                    count_posts += self.download_all_highlights(
+                        result, username, user_id, max_to_download=self.full_profile_max_posts - count_posts
+                    )
            except Exception as e:
                result.append("errors", f"Error downloading highlights for {username}")
-                logger.error(f"Error downloading highlights for {username}: {e}")
+                logger.error(f"Error downloading highlights for {username}: {e} {traceback.format_exc()}")

        result.set_url(url)  # reset as scrape_item modifies it
        return result.success("insta profile")

-    def download_all_highlights(self, result, username, user_id):
+    def download_all_highlights(self, result, username, user_id, max_to_download: int) -> int:
        count_highlights = 0
        highlights = self.call_api("v1/user/highlights", {"user_id": user_id})
+        highlights = highlights[: min(max_to_download, len(highlights))]  # newest to oldest
        for h in highlights:
            try:
-                h_info = self._download_highlights_reusable(result, h.get("pk"))
+                h_info = self._download_highlights_reusable(result, h.get("pk"), max_to_download=max_to_download - count_highlights)
                count_highlights += len(h_info.get("items", []))
            except Exception as e:
                result.append(
                    "errors",
                    f"Error downloading highlight id{h.get('pk')} for {username}",
                )
-                logger.error(f"Error downloading highlight id{h.get('pk')} for {username}: {e}")
-            if self.full_profile_max_posts and count_highlights >= self.full_profile_max_posts:
-                logger.info(f"HIGHLIGHTS reached full_profile_max_posts={self.full_profile_max_posts}")
+                logger.error(
+                    f"Error downloading highlight id{h.get('pk')} for {username}: {e} {traceback.format_exc()}"
+                )
+            if count_highlights >= max_to_download:
+                logger.debug(f"HIGHLIGHTS reached {max_to_download=}")
                break
        result.set("#highlights", count_highlights)
+        return count_highlights

-    def download_post(self, result: Metadata, code: str = None, id: str = None, context: str = None) -> Metadata:
+    def download_post(self, result: Metadata, code: str = None, id: str = None, context: str = "") -> Metadata:
        if id:
            post = self.call_api("v1/media/by/id", {"id": id})
        else:
            post = self.call_api("v1/media/by/code", {"code": code})
        assert post, f"Post {id or code} not found"

+        result.set(f"{context}_data", post)
+
        if caption_text := post.get("caption_text"):
            result.set_title(caption_text)

@@ -166,54 +192,58 @@ class InstagramAPIExtractor(Extractor):
        return result.success(f"insta {context or 'post'}")

    def download_highlights(self, result: Metadata, id: str) -> Metadata:
-        h_info = self._download_highlights_reusable(result, id)
+        h_info = self._download_highlights_reusable(result, id, self.full_profile_max_posts)
        items = len(h_info.get("items", []))
        del h_info["items"]
        result.set_title(h_info.get("title")).set("data", h_info).set("#reels", items)
        return result.success("insta highlights")

-    def _download_highlights_reusable(self, result: Metadata, id: str) -> dict:
+    def _download_highlights_reusable(self, result: Metadata, id: str, max_to_download: int) -> dict:
        full_h = self.call_api("v2/highlight/by/id", {"id": id})
        h_info = full_h.get("response", {}).get("reels", {}).get(f"highlight:{id}")
        assert h_info, f"Highlight {id} not found: {full_h=}"

        if cover_media := h_info.get("cover_media", {}).get("cropped_image_version", {}).get("url"):
            filename = self.download_from_url(cover_media)
-            result.add_media(Media(filename=filename), id=f"cover_media highlight {id}")
+            if filename:
+                result.add_media(Media(filename=filename), id=f"cover_media highlight {id}")
+            else:
+                logger.warning(f"Failed to download cover media from {cover_media}")

        items = h_info.get("items", [])[::-1]  # newest to oldest
+        items = items[: min(max_to_download, len(items))]
        for h in tqdm(items, desc="downloading highlights", unit="highlight"):
            try:
                self.scrape_item(result, h, "highlight")
            except Exception as e:
                result.append("errors", f"Error downloading highlight {h.get('id')}")
-                logger.error(f"Error downloading highlight, skipping {h.get('id')}: {e}")
+                logger.error(f"Error downloading highlight, skipping {h.get('id')}: {e} {traceback.format_exc()}")

        return h_info

    def download_stories(self, result: Metadata, username: str) -> Metadata:
        now = datetime.now().strftime("%Y-%m-%d_%H-%M")
-        stories = self._download_stories_reusable(result, username)
+        stories = self._download_stories_reusable(result, username, max_to_download=self.full_profile_max_posts)
        if stories == []:
            return result.success("insta no story")
        result.set_title(f"stories {username} at {now}").set("#stories", len(stories))
        return result.success(f"insta stories {now}")

-    def _download_stories_reusable(self, result: Metadata, username: str) -> list[dict]:
+    def _download_stories_reusable(self, result: Metadata, username: str, max_to_download: int) -> list[dict]:
        stories = self.call_api("v1/user/stories/by/username", {"username": username})
        if not stories or not len(stories):
            return []
-        stories = stories[::-1]  # newest to oldest
+        stories = stories[::-1][: min(max_to_download, len(stories))]  # newest to oldest

        for s in tqdm(stories, desc="downloading stories", unit="story"):
            try:
                self.scrape_item(result, s, "story")
            except Exception as e:
                result.append("errors", f"Error downloading story {s.get('id')}")
-                logger.error(f"Error downloading story, skipping {s.get('id')}: {e}")
+                logger.error(f"Error downloading story, skipping {s.get('id')}: {e} {traceback.format_exc()}")
        return stories

-    def download_all_posts(self, result: Metadata, user_id: str):
+    def download_all_posts(self, result: Metadata, user_id: str, max_to_download: int) -> int:
        end_cursor = None
        pbar = tqdm(desc="downloading posts")

@@ -223,22 +253,23 @@ class InstagramAPIExtractor(Extractor):
            if not posts or not isinstance(posts, list) or len(posts) != 2:
                break
            posts, end_cursor = posts[0], posts[1]
-            logger.info(f"parsing {len(posts)} posts, next {end_cursor=}")
-
+            posts = posts[: min(max_to_download - post_count, len(posts))]
+            logger.info(f"Parsing {len(posts)} posts, next {end_cursor=} {post_count=} {max_to_download=}")
            for p in posts:
                try:
                    self.scrape_item(result, p, "post")
                except Exception as e:
                    result.append("errors", f"Error downloading post {p.get('id')}")
-                    logger.error(f"Error downloading post, skipping {p.get('id')}: {e}")
+                    logger.error(f"Error downloading post, skipping {p.get('id')}: {e} {traceback.format_exc()}")
                pbar.update(1)
                post_count += 1
-            if self.full_profile_max_posts and post_count >= self.full_profile_max_posts:
-                logger.info(f"POSTS reached full_profile_max_posts={self.full_profile_max_posts}")
+            if post_count >= max_to_download:
+                logger.info(f"POSTS reached {max_to_download=}")
                break
        result.set("#posts", post_count)
+        return post_count

-    def download_all_tagged(self, result: Metadata, user_id: str):
+    def download_all_tagged(self, result: Metadata, user_id: str, max_to_download: int) -> int:
        next_page_id = ""
        pbar = tqdm(desc="downloading tagged posts")

@@ -250,22 +281,23 @@ class InstagramAPIExtractor(Extractor):
                break
            next_page_id = resp.get("next_page_id")

-            logger.info(f"parsing {len(posts)} tagged posts, next {next_page_id=}")
-
+            logger.info(f"Parsing {len(posts)} tagged posts, next {next_page_id=}")
+            posts = posts[: min(max_to_download - tagged_count, len(posts))]
            for p in posts:
                try:
                    self.scrape_item(result, p, "tagged")
                except Exception as e:
                    result.append("errors", f"Error downloading tagged post {p.get('id')}")
-                    logger.error(f"Error downloading tagged post, skipping {p.get('id')}: {e}")
+                    logger.error(f"Error downloading tagged post, skipping {p.get('id')}: {e} {traceback.format_exc()}")
                pbar.update(1)
                tagged_count += 1
-            if self.full_profile_max_posts and tagged_count >= self.full_profile_max_posts:
-                logger.info(f"TAGS reached full_profile_max_posts={self.full_profile_max_posts}")
+            if tagged_count >= max_to_download:
+                logger.info(f"TAGS reached {max_to_download=}")
                break
        result.set("#tagged", tagged_count)
+        return tagged_count

-    ### reusable parsing utils below
+    # reusable parsing utils below

    def scrape_item(self, result: Metadata, item: dict, context: str = None) -> dict:
        """
@@ -319,7 +351,10 @@ class InstagramAPIExtractor(Extractor):
        image_media = None
        if image_url := item.get("thumbnail_url"):
            filename = self.download_from_url(image_url, verbose=False)
-            image_media = Media(filename=filename)
+            if filename:
+                image_media = Media(filename=filename)
+            else:
+                logger.warning(f"Failed to download thumbnail from {image_url}")

        # retrieve video info
        best_id = item.get("id", item.get("pk"))
@@ -331,16 +366,19 @@ class InstagramAPIExtractor(Extractor):

        if video_url := item.get("video_url"):
            filename = self.download_from_url(video_url, verbose=False)
-            video_media = Media(filename=filename)
-            if taken_at:
-                video_media.set("date", taken_at)
-            if code:
-                video_media.set("url", f"https://www.instagram.com/p/{code}")
-            if caption_text:
-                video_media.set("text", caption_text)
-            video_media.set("preview", [image_media])
-            video_media.set("data", [item])
-            return item, video_media, f"{context or 'video'} {best_id}"
+            if filename:
+                video_media = Media(filename=filename)
+                if taken_at:
+                    video_media.set("date", taken_at)
+                if code:
+                    video_media.set("url", f"https://www.instagram.com/p/{code}")
+                if caption_text:
+                    video_media.set("text", caption_text)
+                video_media.set("preview", [image_media])
+                video_media.set("data", [item])
+                return item, video_media, f"{context or 'video'} {best_id}"
+            else:
+                logger.warning(f"Failed to download video from {video_url}")
        elif image_media:
            if taken_at:
                image_media.set("date", taken_at)
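
Note on the shared download budget: `full_profile_max_posts` is spent in order across stories, posts, tagged posts, and highlights, with 0 treated as no limit. A minimal sketch of that accounting, separated from the API calls, follows. The `split_budget` helper and the per-category counts are hypothetical and only mirror the example in the manifest help text.

```python
import math
from typing import Dict, List, Tuple


def split_budget(max_posts: int, available: List[Tuple[str, int]]) -> Dict[str, int]:
    """Spend a shared download budget over categories in the given order."""
    # 0 means no limit, mirroring the manifest help text.
    remaining = math.inf if max_posts == 0 else max_posts
    taken: Dict[str, int] = {}
    for name, count in available:
        # Take as many items as the category has, capped by what is left.
        n = int(min(count, remaining))
        taken[name] = n
        remaining -= n
    return taken


# Hypothetical profile: 2 stories, 7 posts, 5 tagged posts, 3 highlights.
profile = [("stories", 2), ("posts", 7), ("tagged", 5), ("highlights", 3)]
print(split_budget(10, profile))
```

With a budget of 10, this profile yields 2 stories, 7 posts, 1 tagged post, and 0 highlights, which matches the example in the help text. With a budget of 0, every category is taken in full.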
--- a/src/auto_archiver/modules/instagram_extractor/instagram_extractor.py
+++ b/src/auto_archiver/modules/instagram_extractor/instagram_extractor.py
@@ -7,8 +7,9 @@ highlights, and tagged posts. Authentication is required via username/password o
 import re
 import os
 import shutil
+import traceback
 import instaloader
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Extractor
 from auto_archiver.core import Metadata
@@ -29,8 +30,9 @@ class InstagramExtractor(Extractor):
    # TODO: links to stories

    def setup(self) -> None:
-        logger.warning("Instagram Extractor is not actively maintained, and may not work as expected.")
-        logger.warning("Please consider using the Instagram Tbot Extractor or Instagram API Extractor instead.")
+        logger.warning(
+            "Instagram Extractor is not actively maintained, and may not work as expected.\nPlease consider using the Instagram Tbot Extractor or Instagram API Extractor instead."
+        )

        self.insta = instaloader.Instaloader(
            download_geotags=True,
@@ -43,8 +45,7 @@ class InstagramExtractor(Extractor):
            self.insta.load_session_from_file(self.username, self.session_file)
        except Exception:
            try:
-                logger.debug("Session file failed", exc_info=True)
-                logger.info("No valid session file found - Attempting login with use and password.")
+                logger.info("No valid session file found - Attempting login with username and password.")
                self.insta.login(self.username, self.password)
                self.insta.save_session_to_file(self.session_file)
            except Exception as e:
@@ -79,7 +80,7 @@ class InstagramExtractor(Extractor):
        return result

    def download_post(self, url: str, post_id: str) -> Metadata:
-        logger.debug(f"Instagram {post_id=} detected in {url=}")
+        logger.debug(f"Instagram {post_id=} detected")

        post = instaloader.Post.from_shortcode(self.insta.context, post_id)
        if self.insta.download_post(post, target=post.owner_username):
@@ -87,7 +88,7 @@ class InstagramExtractor(Extractor):

    def download_profile(self, url: str, username: str) -> Metadata:
        # gets posts, posts where username is tagged, igtv postss, stories, and highlights
-        logger.debug(f"Instagram {username=} detected in {url=}")
+        logger.debug(f"Instagram {username=} detected")

        profile = instaloader.Profile.from_username(self.insta.context, username)
        try:
@@ -95,27 +96,27 @@ class InstagramExtractor(Extractor):
                try:
                    self.insta.download_post(post, target=f"profile_post_{post.owner_username}")
                except Exception as e:
-                    logger.error(f"Failed to download post: {post.shortcode}: {e}")
+                    logger.error(f"Failed to download post: {post.shortcode}: {e} {traceback.format_exc()}")
        except Exception as e:
-            logger.error(f"Failed profile.get_posts: {e}")
+            logger.error(f"Failed profile.get_posts: {e}: {traceback.format_exc()}")

        try:
            for post in profile.get_tagged_posts():
                try:
                    self.insta.download_post(post, target=f"tagged_post_{post.owner_username}")
                except Exception as e:
-                    logger.error(f"Failed to download tagged post: {post.shortcode}: {e}")
+                    logger.error(f"Failed to download tagged post: {post.shortcode}: {e} {traceback.format_exc()}")
        except Exception as e:
-            logger.error(f"Failed profile.get_tagged_posts: {e}")
+            logger.error(f"Failed profile.get_tagged_posts: {e} {traceback.format_exc()}")

        try:
            for post in profile.get_igtv_posts():
                try:
                    self.insta.download_post(post, target=f"igtv_post_{post.owner_username}")
                except Exception as e:
-                    logger.error(f"Failed to download igtv post: {post.shortcode}: {e}")
+                    logger.error(f"Failed to download igtv post: {post.shortcode}: {e} {traceback.format_exc()}")
        except Exception as e:
-            logger.error(f"Failed profile.get_igtv_posts: {e}")
+            logger.error(f"Failed profile.get_igtv_posts: {e} {traceback.format_exc()}")

        try:
            for story in self.insta.get_stories([profile.userid]):
@@ -123,9 +124,9 @@ class InstagramExtractor(Extractor):
                    try:
                        self.insta.download_storyitem(item, target=f"story_item_{story.owner_username}")
                    except Exception as e:
-                        logger.error(f"Failed to download story item: {item}: {e}")
+                        logger.error(f"Failed to download story item: {item}: {e} {traceback.format_exc()}")
        except Exception as e:
-            logger.error(f"Failed get_stories: {e}")
+            logger.error(f"Failed get_stories: {e} {traceback.format_exc()}")

        try:
            for highlight in self.insta.get_highlights(profile.userid):
@@ -133,9 +134,9 @@ class InstagramExtractor(Extractor):
                    try:
                        self.insta.download_storyitem(item, target=f"highlight_item_{highlight.owner_username}")
                    except Exception as e:
-                        logger.error(f"Failed to download highlight item: {item}: {e}")
+                        logger.error(f"Failed to download highlight item: {item}: {e} {traceback.format_exc()}")
        except Exception as e:
-            logger.error(f"Failed get_highlights: {e}")
+            logger.error(f"Failed get_highlights: {e} {traceback.format_exc()}")

        return self.process_downloads(url, f"@{username}", profile._asdict(), None)

@@ -158,4 +159,4 @@ class InstagramExtractor(Extractor):

            return result.success("instagram")
        except Exception as e:
-            logger.error(f"Could not fetch instagram post {url} due to: {e}")
+            logger.error(f"Could not fetch instagram post due to: {e} {traceback.format_exc()}")
--- a/src/auto_archiver/modules/instagram_tbot_extractor/instagram_tbot_extractor.py
+++ b/src/auto_archiver/modules/instagram_tbot_extractor/instagram_tbot_extractor.py
@@ -12,7 +12,7 @@ import shutil
 import time
 from sqlite3 import OperationalError

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from telethon.sync import TelegramClient

 from auto_archiver.core import Extractor
@@ -32,7 +32,7 @@ class InstagramTbotExtractor(Extractor):
        1. makes a copy of session_file that is removed in cleanup
        2. checks if the session file is valid
        """
-        logger.info(f"SETUP {self.name} checking login...")
+        logger.debug(f"SETUP {self.name} checking login...")
        self._prepare_session_file()
        self._initialize_telegram_client()

@@ -58,10 +58,10 @@ class InstagramTbotExtractor(Extractor):
                "If you do, disable at least one of the archivers for the first-time setup of the telethon session: {e}"
            )
        with self.client.start():
-            logger.info(f"SETUP {self.name} login works.")
+            logger.debug(f"SETUP {self.name} login works.")

    def cleanup(self) -> None:
-        logger.info(f"CLEANUP {self.name}.")
+        logger.debug(f"CLEANUP {self.name}.")
        session_file_name = self.session_file + ".session"
        if os.path.exists(session_file_name):
            os.remove(session_file_name)
@@ -79,17 +79,17 @@ class InstagramTbotExtractor(Extractor):

            # This may be outdated and replaced by the below message, but keeping until confirmed
            if "You must enter a URL to a post" in message:
-                logger.debug(f"invalid link {url=} for {self.name}: {message}")
+                logger.debug(f"Invalid link for {self.name}: {message}")
                return False

            if "Media not found or unavailable" in message:
-                logger.debug(f"No media found for link {url=} for {self.name}: {message}")
+                logger.debug(f"No media found for {self.name}: {message}")
                return False

            if message:
                result.set_content(message).set_title(message[:128])
            elif result.is_empty():
-                logger.debug(f"No media found for link {url=} for {self.name}: {message}")
+                logger.debug(f"No media found for {self.name}: {message}")
                return False
            return result.success("insta-via-bot")

--- a/src/auto_archiver/modules/json_enricher/json_enricher.py
+++ b/src/auto_archiver/modules/json_enricher/json_enricher.py
@@ -1,5 +1,5 @@
 import json
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 import os

 from auto_archiver.core import Enricher
@@ -8,9 +8,7 @@ from auto_archiver.core import Media, Metadata

 class JsonEnricher(Enricher):
    def enrich(self, to_enrich: Metadata) -> None:
-        url = to_enrich.get_url()
-
-        logger.debug(f"JSON Enricher for {url=}")
+        logger.debug("Enriching as JSON")

        item_path = os.path.join(self.tmp_dir, "metadata.json")
        with open(item_path, mode="w", encoding="utf-8") as outf:
--- a/src/auto_archiver/modules/local_storage/local_storage.py
+++ b/src/auto_archiver/modules/local_storage/local_storage.py
@@ -1,7 +1,7 @@
 import shutil
 from typing import IO
 import os
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Media
 from auto_archiver.core import Storage
@@ -38,8 +38,7 @@ class LocalStorage(Storage):
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        logger.debug(f"[{self.__class__.__name__}] storing file {media.filename} with key {media.key} to {dest}")

-        res = shutil.copy2(media.filename, dest)
-        logger.info(res)
+        shutil.copy2(media.filename, dest)
        return True

    # must be implemented even if unused
--- a/src/auto_archiver/modules/meta_enricher/meta_enricher.py
+++ b/src/auto_archiver/modules/meta_enricher/meta_enricher.py
@@ -1,6 +1,6 @@
 import datetime
 import os
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Enricher
 from auto_archiver.core import Metadata
@@ -12,22 +12,22 @@ class MetaEnricher(Enricher):
    """

    def enrich(self, to_enrich: Metadata) -> None:
-        url = to_enrich.get_url()
        if to_enrich.is_empty():
-            logger.debug(f"[SKIP] META_ENRICHER there is no media or metadata to enrich: {url=}")
+            logger.debug("[SKIP] META_ENRICHER there is no media or metadata to enrich")
            return

-        logger.debug(f"calculating archive metadata information for {url=}")
+        logger.debug("Calculating archive metadata information")

        self.enrich_file_sizes(to_enrich)
        self.enrich_archive_duration(to_enrich)

    def enrich_file_sizes(self, to_enrich: Metadata):
-        logger.debug(
-            f"calculating archive file sizes for url={to_enrich.get_url()} ({len(to_enrich.media)} media files)"
-        )
+        logger.debug(f"Calculating archive file sizes for {len(to_enrich.media)} media files")
        total_size = 0
        for media in to_enrich.get_all_media():
+            if not media.filename:
+                logger.warning(f"Skipping file size for media without filename: {media}")
+                continue
            file_stats = os.stat(media.filename)
            media.set("bytes", file_stats.st_size)
            media.set("size", self.human_readable_bytes(file_stats.st_size))
@@ -44,7 +44,7 @@ class MetaEnricher(Enricher):
            size /= 1024

    def enrich_archive_duration(self, to_enrich):
-        logger.debug(f"calculating archive duration for url={to_enrich.get_url()} ")
+        logger.debug("Calculating archive duration")

        archive_duration = datetime.datetime.now(datetime.timezone.utc) - to_enrich.get("_processed_at")
        to_enrich.set("archive_duration_seconds", archive_duration.seconds)
--- a/src/auto_archiver/modules/metadata_enricher/manifest.py
+++ b/src/auto_archiver/modules/metadata_enricher/manifest.py
@@ -3,6 +3,13 @@
    "type": ["enricher"],
    "requires_setup": True,
    "dependencies": {"python": ["loguru"], "bin": ["exiftool"]},
+    "configs": {
+        "look_for_keys": {
+            "default": [],
+            "help": "List of lowercased metadata keys to include in the enriched metadata. The special keys 'author', 'datetime' and 'location' include all related metadata fields. The default empty list `[]` means all metadata will be included.",
+            "type": "list",
+        },
+    },
    "description": """
    Extracts metadata information from files using ExifTool.

--- a/src/auto_archiver/modules/metadata_enricher/metadata_enricher.py
+++ b/src/auto_archiver/modules/metadata_enricher/metadata_enricher.py
@@ -1,6 +1,6 @@
 import subprocess
 import traceback
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Enricher
 from auto_archiver.core import Metadata
@@ -12,11 +12,12 @@ class MetadataEnricher(Enricher):
    """

    def enrich(self, to_enrich: Metadata) -> None:
-        url = to_enrich.get_url()
-        logger.debug(f"extracting EXIF metadata for {url=}")
+        logger.debug("Extracting EXIF metadata")

        for i, m in enumerate(to_enrich.media):
            if len(md := self.get_metadata(m.filename)):
+                if self.look_for_keys != []:
+                    md = self.select_metadata(md, self.look_for_keys)
                to_enrich.media[i].set("metadata", md)

    def get_metadata(self, filename: str) -> dict:
@@ -24,15 +25,44 @@ class MetadataEnricher(Enricher):
            # Run ExifTool command to extract metadata from the file
            cmd = ["exiftool", filename]
            result = subprocess.run(cmd, capture_output=True, text=True)
-
            # Process the output to extract individual metadata fields
            metadata = {}
            for line in result.stdout.splitlines():
                field, value = line.strip().split(":", 1)
                metadata[field.strip()] = value.strip()
            return metadata
-        except FileNotFoundError:
-            logger.error("[exif_enricher] ExifTool not found. Make sure ExifTool is installed and added to PATH.")
+        except FileNotFoundError as e:
+            logger.error(f"ExifTool not found. Make sure ExifTool is installed and added to PATH. {e}")
        except Exception as e:
            logger.error(f"Error occurred: {e}: {traceback.format_exc()}")
        return {}
+
+    def select_metadata(self, all_md, requested_metadata_keys):
+        """
+        Filters the full ExifTool output down to the metadata keys requested by the user.
+        """
+        # substrings identifying the groups of metadata fields pulled in by each special key
+        author_key_terms = ["author", "producer", "creator"]
+        datetime_key_terms = ["date", "time"]
+        location_key_terms = ["gps", "latitude", "longitude"]
+
+        specified_md = {}
+        for md_key in all_md.keys():
+            md_key_lower = md_key.lower()
+            # a special key pulls in every non-empty field whose name contains one of its group's terms
+            if ("author" in requested_metadata_keys) and any(
+                term in md_key_lower and len(all_md[md_key]) for term in author_key_terms
+            ):
+                specified_md[md_key] = all_md[md_key]
+            if ("datetime" in requested_metadata_keys) and any(
+                term in md_key_lower and len(all_md[md_key]) for term in datetime_key_terms
+            ):
+                specified_md[md_key] = all_md[md_key]
+            if ("location" in requested_metadata_keys) and any(
+                term in md_key_lower and len(all_md[md_key]) for term in location_key_terms
+            ):
+                specified_md[md_key] = all_md[md_key]
+            # if the metadata key is requested directly (and its value is non-empty)
+            if (md_key_lower in requested_metadata_keys or md_key in requested_metadata_keys) and len(all_md[md_key]):
+                specified_md[md_key] = all_md[md_key]
+        return specified_md
--- a/src/auto_archiver/modules/opentimestamps_enricher/opentimestamps_enricher.py
+++ b/src/auto_archiver/modules/opentimestamps_enricher/opentimestamps_enricher.py
@@ -1,6 +1,7 @@
 import os
+import traceback

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 import opentimestamps
 from opentimestamps.calendar import RemoteCalendar, DEFAULT_CALENDAR_WHITELIST
 from opentimestamps.core.timestamp import Timestamp, DetachedTimestampFile
@@ -14,13 +15,12 @@ from auto_archiver.utils.misc import get_current_timestamp

 class OpentimestampsEnricher(Enricher):
    def enrich(self, to_enrich: Metadata) -> None:
-        url = to_enrich.get_url()
-        logger.debug(f"OpenTimestamps timestamping files for {url=}")
+        logger.debug("OpenTimestamps timestamping files")

        # Get the media files to timestamp
        media_files = [m for m in to_enrich.media if m.filename and not m.get("opentimestamps")]
        if not media_files:
-            logger.debug(f"No files found to timestamp in {url=}")
+            logger.debug("No files found to timestamp")
            return

        timestamp_files = []
@@ -94,7 +94,7 @@ class OpentimestampsEnricher(Enricher):
                        detached_timestamp.serialize(ctx)
                        f.write(ctx.getbytes())
                except Exception as e:
-                    logger.warning(f"Failed to serialize timestamp file: {e}")
+                    logger.warning(f"Failed to serialize timestamp file: {e} {traceback.format_exc()}")
                    continue

                # Create media for the timestamp file
@@ -113,16 +113,16 @@ class OpentimestampsEnricher(Enricher):
                media.set("opentimestamps", True)

            except Exception as e:
-                logger.warning(f"Error while timestamping {media.filename}: {e}")
+                logger.warning(f"Error while timestamping {media.filename}: {e} {traceback.format_exc()}")

        # Add timestamp files to the metadata
        if timestamp_files:
            to_enrich.set("opentimestamped", True)
            to_enrich.set("opentimestamps_count", len(timestamp_files))
-            logger.info(f"{len(timestamp_files)} OpenTimestamps proofs created for {url=}")
+            logger.info(f"{len(timestamp_files)} OpenTimestamps proofs created")
        else:
            to_enrich.set("opentimestamped", False)
-            logger.warning(f"No successful timestamps created for {url=}")
+            logger.warning("No successful timestamps created")

    def verify_timestamp(self, detached_timestamp):
        """
--- a/src/auto_archiver/modules/pdq_hash_enricher/pdq_hash_enricher.py
+++ b/src/auto_archiver/modules/pdq_hash_enricher/pdq_hash_enricher.py
@@ -15,7 +15,7 @@ import traceback
 import pdqhash
 import numpy as np
 from PIL import Image, UnidentifiedImageError
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Enricher
 from auto_archiver.core import Metadata
@@ -28,8 +28,7 @@ class PdqHashEnricher(Enricher):
    """

    def enrich(self, to_enrich: Metadata) -> None:
-        url = to_enrich.get_url()
-        logger.debug(f"calculating perceptual hashes for {url=}")
+        logger.debug("Calculating perceptual hashes")
        media_with_hashes = []

        for m in to_enrich.media:
@@ -44,7 +43,7 @@ class PdqHashEnricher(Enricher):
                    media.set("pdq_hash", hd)
                    media_with_hashes.append(media.filename)

-        logger.debug(f"calculated '{len(media_with_hashes)}' perceptual hashes for {url=}: {media_with_hashes}")
+        logger.debug(f"Calculated '{len(media_with_hashes)}' perceptual hashes: {media_with_hashes}")

    def calculate_pdq_hash(self, filename):
        # returns a hexadecimal string with the perceptual hash for the given filename
--- a/src/auto_archiver/modules/s3_storage/s3_storage.py
+++ b/src/auto_archiver/modules/s3_storage/s3_storage.py
@@ -2,7 +2,7 @@ from typing import IO

 import boto3
 import os
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Media
 from auto_archiver.core import Storage
@@ -56,7 +56,7 @@ class S3Storage(Storage):
            if existing_key := self.file_in_folder(path):
                media._key = existing_key
                media.set("previously archived", True)
-                logger.debug(f"skipping upload of {media.filename} because it already exists in {media.key}")
+                logger.debug(f"Skipping upload of {media.filename} because it already exists in {media.key}")
                return False

            _, ext = os.path.splitext(media.key)
--- a/src/auto_archiver/modules/ssl_enricher/ssl_enricher.py
+++ b/src/auto_archiver/modules/ssl_enricher/ssl_enricher.py
@@ -2,7 +2,7 @@ import ssl
 import os
 from slugify import slugify
 from urllib.parse import urlparse
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Enricher
 from auto_archiver.core import Metadata, Media
@@ -19,10 +19,10 @@ class SSLEnricher(Enricher):

        url = to_enrich.get_url()
        parsed = urlparse(url)
-        assert parsed.scheme in ["https"], f"Invalid URL scheme {url=}"
+        assert parsed.scheme in ["https"], "Invalid URL scheme"

        domain = parsed.netloc
-        logger.debug(f"fetching SSL certificate for {domain=} in {url=}")
+        logger.debug(f"Fetching SSL certificate for {domain=}")

        cert = ssl.get_server_certificate((domain, 443))
        cert_fn = os.path.join(self.tmp_dir, f"{slugify(domain)}.pem")
--- a/src/auto_archiver/modules/telegram_extractor/telegram_extractor.py
+++ b/src/auto_archiver/modules/telegram_extractor/telegram_extractor.py
@@ -2,7 +2,7 @@ import requests
 import re
 import html
 from bs4 import BeautifulSoup
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Extractor
 from auto_archiver.core import Metadata, Media
@@ -38,7 +38,7 @@ class TelegramExtractor(Extractor):

        video = s.find("video")
        if video is None:
-            logger.warning("could not find video")
+            logger.warning("Could not find video")
            image_tags = s.find_all(class_="tgme_widget_message_photo_wrap")

            image_urls = []
@@ -49,10 +49,18 @@ class TelegramExtractor(Extractor):
            if not len(image_urls):
                return False
            for img_url in image_urls:
-                result.add_media(Media(self.download_from_url(img_url)))
+                filename = self.download_from_url(img_url)
+                if not filename:
+                    logger.warning(f"Failed to download image from {img_url}")
+                    continue
+                result.add_media(Media(filename))
        else:
            video_url = video.get("src")
-            m_video = Media(self.download_from_url(video_url))
+            video_filename = self.download_from_url(video_url)
+            if not video_filename:
+                logger.warning(f"Failed to download video from {video_url}")
+                return False
+            m_video = Media(video_filename)
            # extract duration from HTML
            try:
                duration = s.find_all("time")[0].contents[0]
--- a/src/auto_archiver/modules/telethon_extractor/telethon_extractor.py
+++ b/src/auto_archiver/modules/telethon_extractor/telethon_extractor.py
@@ -1,3 +1,4 @@
+import asyncio
 import os
 import shutil
 import re
@@ -5,6 +6,7 @@ import time
 from pathlib import Path
 from datetime import date

+from telethon import functions
 from telethon.sync import TelegramClient
 from telethon.errors import ChannelInvalidError
 from telethon.tl.functions.messages import ImportChatInviteRequest
@@ -16,7 +18,7 @@ from telethon.errors.rpcerrorlist import (
 )

 from tqdm import tqdm
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Extractor
 from auto_archiver.core import Metadata, Media
@@ -24,7 +26,7 @@ from auto_archiver.utils import random_str


 class TelethonExtractor(Extractor):
-    valid_url = re.compile(r"https:\/\/t\.me(\/c){0,1}\/(.+)\/(\d+)")
+    valid_url = re.compile(r"https:\/\/t\.me(\/c){0,1}\/(.+?)(\/s){0,1}\/(\d+)")
    invite_pattern = re.compile(r"t.me(\/joinchat){0,1}\/\+?(.+)")

    def setup(self) -> None:
@@ -52,6 +54,16 @@ class TelethonExtractor(Extractor):
        logger.debug(f"Making a copy of the session file {base_session_filepath} to {self.session_file}.session")
        shutil.copy(base_session_filepath, f"{self.session_file}.session")

+        # ensure an open event loop is set for this thread (needed when used by Celery workers, which may close the default one)
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_closed():
+                loop = asyncio.new_event_loop()
+                asyncio.set_event_loop(loop)
+        except RuntimeError:
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+
        # initiate the client
        self.client = TelegramClient(self.session_file, self.api_id, self.api_hash)

@@ -64,7 +76,7 @@ class TelethonExtractor(Extractor):
                # get currently joined channels
                # https://docs.telethon.dev/en/stable/modules/custom.html#module-telethon.tl.custom.dialog
                joined_channel_ids = [c.id for c in self.client.get_dialogs() if c.is_channel]
-                logger.info(f"already part of {len(joined_channel_ids)} channels")
+                logger.info(f"Already part of {len(joined_channel_ids)} channels")

                i = 0
                pbar = tqdm(desc=f"joining {len(self.channel_invites)} invite links", total=len(self.channel_invites))
@@ -79,22 +91,22 @@ class TelethonExtractor(Extractor):
                            else:
                                ent = self.client.get_entity(invite)  # fails if not a member
                                logger.warning(
-                                    f"please add the property id='{ent.id}' to the 'channel_invites' configuration where {invite=}, not doing so can lead to a minutes-long setup time due to telegram's rate limiting."
+                                    f"Please add the property id='{ent.id}' to the 'channel_invites' configuration where {invite=}, not doing so can lead to a minutes-long setup time due to telegram's rate limiting."
                                )
                        except ValueError:
-                            logger.info(f"joining new channel {invite=}")
+                            logger.info(f"Joining new channel {invite=}")
                            try:
                                self.client(ImportChatInviteRequest(match.group(2)))
                            except UserAlreadyParticipantError:
-                                logger.info(f"already joined {invite=}")
+                                logger.info(f"Already joined {invite=}")
                            except InviteRequestSentError:
-                                logger.warning(f"already sent a join request with {invite} still no answer")
+                                logger.warning(f"Already sent a join request with {invite}, still no answer")
                            except InviteHashExpiredError:
                                logger.warning(f"{invite=} has expired please find a more recent one")
                            except Exception as e:
-                                logger.error(f"could not join channel with {invite=} due to {e}")
+                                logger.error(f"Could not join channel with {invite=} due to {e}")
                        except FloodWaitError as e:
-                            logger.warning(f"got a flood error, need to wait {e.seconds} seconds")
+                            logger.warning(f"Got a flood error, need to wait {e.seconds} seconds")
                            time.sleep(e.seconds)
                            continue
                    else:
@@ -116,68 +128,94 @@ class TelethonExtractor(Extractor):
        url = item.get_url()
        # detect URLs that we definitely cannot handle
        match = self.valid_url.search(url)
-        logger.debug(f"TELETHON: {match=}")
+        logger.debug(f"Telethon URL {match=}")
        if not match:
            return False

        is_private = match.group(1) == "/c"
        chat = int(match.group(2)) if is_private else match.group(2)
-        post_id = int(match.group(3))
+        is_story = match.group(3) == "/s"
+        post_id = int(match.group(4))

        result = Metadata()

        # NB: not using bot_token since then private channels cannot be archived: self.client.start(bot_token=self.bot_token)
        with self.client.start():
            # with self.client.start(bot_token=self.bot_token):
-            try:
-                post = self.client.get_messages(chat, ids=post_id)
-            except ValueError as e:
-                logger.error(f"Could not fetch telegram {url} possibly it's private: {e}")
-                return False
-            except ChannelInvalidError as e:
-                logger.error(
-                    f"Could not fetch telegram {url}. This error may be fixed if you setup a bot_token in addition to api_id and api_hash (but then private channels will not be archived, we need to update this logic to handle both): {e}"
-                )
-                return False
+            if is_story:
+                try:
+                    stories = self.client(functions.stories.GetStoriesByIDRequest(peer=chat, id=[post_id]))
+                    if not stories.stories:
+                        logger.info("No stories found, possibly it's private or the story has expired.")
+                        return False
+                    story = stories.stories[0]
+                    logger.debug(f"Got story {story.id=} {story.date=} {story.expire_date=}")
+                    result.set_timestamp(story.date).set("views", story.views.to_dict()).set(
+                        "expire_date", story.expire_date
+                    )

-            logger.debug(f"TELETHON GOT POST {post=}")
-            if post is None:
-                return False
+                    # download the story media
+                    filename_dest = os.path.join(self.tmp_dir, f"{chat}_{post_id}", str(story.id))
+                    if filename := self.client.download_media(story.media, filename_dest):
+                        result.add_media(Media(filename))
+                except Exception as e:
+                    logger.error(f"Error fetching story {post_id} from {chat}: {e}")
+                    return False
+            else:
+                try:
+                    post = self.client.get_messages(chat, ids=post_id)
+                except ValueError as e:
+                    logger.error(f"Could not fetch telegram URL, possibly it's private: {e}")
+                    return False
+                except ChannelInvalidError as e:
+                    logger.error(
+                        f"Could not fetch telegram URL. This error may be fixed if you set up a bot_token in addition to api_id and api_hash (but then private channels will not be archived; we need to update this logic to handle both): {e}"
+                    )
+                    return False

-            media_posts = self._get_media_posts_in_group(chat, post)
-            logger.debug(f"got {len(media_posts)=} for {url=}")
+                logger.debug(f"Got post {post=}")
+                if post is None:
+                    return False

-            tmp_dir = self.tmp_dir
+                media_posts = self._get_media_posts_in_group(chat, post)
+                logger.debug(f"Got {len(media_posts)=}")

-            group_id = post.grouped_id if post.grouped_id is not None else post.id
-            title = post.message
-            for mp in media_posts:
-                if len(mp.message) > len(title):
-                    title = mp.message  # save the longest text found (usually only 1)
+                group_id = post.grouped_id if post.grouped_id is not None else post.id
+                title = post.message
+                for mp in media_posts:
+                    if len(mp.message) > len(title):
+                        title = mp.message  # save the longest text found (usually only 1)

-                # media can also be in entities
-                if mp.entities:
-                    other_media_urls = [
-                        e.url
-                        for e in mp.entities
-                        if hasattr(e, "url") and e.url and self._guess_file_type(e.url) in ["video", "image", "audio"]
-                    ]
-                    if len(other_media_urls):
-                        logger.debug(f"Got {len(other_media_urls)} other media urls from {mp.id=}: {other_media_urls}")
-                    for i, om_url in enumerate(other_media_urls):
-                        filename = self.download_from_url(om_url, f"{chat}_{group_id}_{i}")
-                        result.add_media(Media(filename=filename), id=f"{group_id}_{i}")
+                    # media can also be in entities
+                    if mp.entities:
+                        other_media_urls = [
+                            e.url
+                            for e in mp.entities
+                            if hasattr(e, "url")
+                            and e.url
+                            and self._guess_file_type(e.url) in ["video", "image", "audio"]
+                        ]
+                        if len(other_media_urls):
+                            logger.debug(
+                                f"Got {len(other_media_urls)} other media urls from {mp.id=}: {other_media_urls}"
+                            )
+                        for i, om_url in enumerate(other_media_urls):
+                            filename = self.download_from_url(om_url, f"{chat}_{group_id}_{i}")
+                            if not filename:
+                                logger.warning(f"Failed to download media from {om_url}")
+                                continue
+                            result.add_media(Media(filename=filename), id=f"{group_id}_{i}")

-                filename_dest = os.path.join(tmp_dir, f"{chat}_{group_id}", str(mp.id))
-                filename = self.client.download_media(mp.media, filename_dest)
-                if not filename:
-                    logger.debug(f"Empty media found, skipping {str(mp)=}")
-                    continue
-                result.add_media(Media(filename))
+                    filename_dest = os.path.join(self.tmp_dir, f"{chat}_{group_id}", str(mp.id))
+                    filename = self.client.download_media(mp.media, filename_dest)
+                    if not filename:
+                        logger.debug(f"Empty media found, skipping {str(mp)=}")
+                        continue
+                    result.add_media(Media(filename))

-            result.set_title(title).set_timestamp(post.date).set("api_data", post.to_dict())
-            if post.message != title:
-                result.set_content(post.message)
+                result.set_title(title).set_timestamp(post.date).set("api_data", post.to_dict())
+                if post.message != title:
+                    result.set_content(post.message)
        return result.success("telethon")

    def _get_media_posts_in_group(self, chat, original_post, max_amp=10):
--- a/src/auto_archiver/modules/thumbnail_enricher/thumbnail_enricher.py
+++ b/src/auto_archiver/modules/thumbnail_enricher/thumbnail_enricher.py
@@ -9,7 +9,7 @@ and identify important moments without watching the entire video.

 import ffmpeg
 import os
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Enricher
 from auto_archiver.core import Media, Metadata
@@ -27,12 +27,12 @@ class ThumbnailEnricher(Enricher):
        Calculates how many thumbnails to generate and at which timestamps based on the video duration, the number of thumbnails per minute and the max number of thumbnails.
        Thumbnails are equally distributed across the video duration.
        """
-        logger.debug(f"generating thumbnails for {to_enrich.get_url()}")
+        logger.debug("Generating thumbnails")
        for m_id, m in enumerate(to_enrich.media[::]):
            if m.is_video():
                folder = os.path.join(self.tmp_dir, random_str(24))
                os.makedirs(folder, exist_ok=True)
-                logger.debug(f"generating thumbnails for {m.filename}")
+                logger.debug(f"Generating thumbnails for {m.filename}")
                duration = m.get("duration")

                try:
@@ -42,10 +42,10 @@ class ThumbnailEnricher(Enricher):
                    )
                    to_enrich.media[m_id].set("duration", duration)
                except Exception as e:
-                    logger.warning(f"failed to get duration with FFMPEG from {m.filename}: {e}")
+                    logger.warning(f"Failed to get duration with FFMPEG from {m.filename}: {e}")

                if not duration or type(duration) not in [float, int] or duration <= 0:
-                    logger.warning(f"cannot generate thumbnails for {m.filename} without valid duration")
+                    logger.warning(f"Cannot generate thumbnails for {m.filename} without valid duration")
                    continue

                num_thumbs = int(min(max(1, (duration / 60) * self.thumbnails_per_minute), self.max_thumbnails))
--- a/src/auto_archiver/modules/timestamping_enricher/manifest.py
+++ b/src/auto_archiver/modules/timestamping_enricher/manifest.py
@@ -20,7 +20,7 @@
                    # "http://tsa.sinpe.fi.cr/tsaHttp/", # self-signed
                    # "http://tsa.cra.ge/signserver/tsa?workerName=qtsa", # self-signed
                    "http://tss.cnbs.gob.hn/TSS/HttpTspServer",
-                    "http://dss.nowina.lu/pki-factory/tsa/good-tsa",
+                    # "http://dss.nowina.lu/pki-factory/tsa/good-tsa",
                    # "https://freetsa.org/tsr", # self-signed
                ],
            "help": "List of RFC3161 Time Stamp Authorities to use, separate with commas if passed via the command line.",
--- a/src/auto_archiver/modules/timestamping_enricher/timestamping_enricher.py
+++ b/src/auto_archiver/modules/timestamping_enricher/timestamping_enricher.py
@@ -4,12 +4,12 @@ from importlib.metadata import version
 import hashlib

 from slugify import slugify
+from retrying import retry
 import requests
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

-from rfc3161_client import (decode_timestamp_response,TimestampRequestBuilder,TimeStampResponse, VerifierBuilder)
+from rfc3161_client import (decode_timestamp_response, TimestampRequestBuilder, TimeStampResponse, VerifierBuilder)
 from rfc3161_client import VerificationError as Rfc3161VerificationError
-from rfc3161_client.base import HashAlgorithm
 from rfc3161_client.tsp import SignedData
 from cryptography import x509
 from cryptography.hazmat.primitives import serialization
@@ -49,8 +49,7 @@ class TimestampingEnricher(Enricher):
            self.session.close()

    def enrich(self, to_enrich: Metadata) -> None:
-        url = to_enrich.get_url()
-        logger.debug(f"RFC3161 timestamping existing files for {url=}")
+        logger.debug("RFC3161 timestamping existing files")

        # create a new text file with the existing media hashes
        hashes = [
@@ -58,10 +57,9 @@ class TimestampingEnricher(Enricher):
        ]

        if not len(hashes):
-            logger.debug(f"No hashes found in {url=}")
+            logger.debug("No hashes found")
            return

-        
        hashes_fn = os.path.join(self.tmp_dir, "hashes.txt")

        data_to_sign = "\n".join(hashes)
@@ -74,9 +72,9 @@ class TimestampingEnricher(Enricher):
            try:
                message = bytes(data_to_sign, encoding='utf8')

-                logger.debug(f"Timestamping {url=} with {tsa_url=}")
+                logger.debug(f"Timestamping with {tsa_url=}")
                signed: TimeStampResponse = self.sign_data(tsa_url, message)
-                
+
                # fail if there's any issue with the certificates, uses certifi list of trusted CAs or the user-defined `cert_authorities`
                root_cert = self.verify_signed(signed, message)

@@ -92,7 +90,7 @@ class TimestampingEnricher(Enricher):
                timestamp_token_path = self.save_timestamp_token(signed.time_stamp_token(), tsa_url)
                timestamp_tokens.append(Media(filename=timestamp_token_path).set("tsa", tsa_url).set("cert_chain", cert_chain))
            except Exception as e:
-                logger.warning(f"Error while timestamping {url=} with {tsa_url=}: {e}")
+                logger.warning(f"Error while timestamping with {tsa_url=}: {e}")

        if len(timestamp_tokens):
            hashes_media.set("timestamp_authority_files", timestamp_tokens)
@@ -101,9 +99,9 @@ class TimestampingEnricher(Enricher):
            hashes_media.set("cryptography v", version("cryptography"))
            to_enrich.add_media(hashes_media, id="timestamped_hashes")
            to_enrich.set("timestamped", True)
-            logger.info(f"{len(timestamp_tokens)} timestamp tokens created for {url=}")
+            logger.info(f"{len(timestamp_tokens)} timestamp tokens created")
        else:
-            logger.warning(f"No successful timestamps for {url=}")
+            logger.warning("No successful timestamps found")

    def save_timestamp_token(self, timestamp_token: bytes, tsa_url: str) -> str:
        """
@@ -114,7 +112,7 @@ class TimestampingEnricher(Enricher):
            f.write(timestamp_token)
        return tst_path

-    def verify_signed(self, timestamp_response: TimeStampResponse, message: bytes) ->  x509.Certificate:
+    def verify_signed(self, timestamp_response: TimeStampResponse, message: bytes) -> x509.Certificate:
        """
        Verify a Signed Timestamp Response is trusted by a known Certificate Authority.

@@ -137,7 +135,7 @@ class TimestampingEnricher(Enricher):

        if not cert_authorities:
            raise ValueError(f"No trusted roots found in {trusted_root_path}.")
-        
+
        timestamp_certs = self.tst_certs(timestamp_response)
        intermediate_certs = timestamp_certs[1:-1]

@@ -149,7 +147,7 @@ class TimestampingEnricher(Enricher):
            message_hash = hashlib.sha256(message).digest()
        else:
            raise ValueError(f"Unsupported hash algorithm: {hash_algorithm}")
-        
+
        for certificate in cert_authorities:
            builder = VerifierBuilder()
            builder.add_root_certificate(certificate)
@@ -159,7 +157,6 @@ class TimestampingEnricher(Enricher):

            verifier = builder.build()

-            
            try:
                verifier.verify(timestamp_response, message_hash)
                return certificate
@@ -172,23 +169,38 @@ class TimestampingEnricher(Enricher):
        # see https://github.com/sigstore/sigstore-python/blob/99948d5b80525a5a104e904ffea58169dc6e0629/sigstore/_internal/timestamp.py#L84-L121

        timestamp_request = (
-                TimestampRequestBuilder().data(bytes_data).nonce(nonce=True).build()
-            )
-        try:
+            TimestampRequestBuilder().data(bytes_data).nonce(nonce=True).build()
+        )
+
+        @retry(
+            wait_exponential_multiplier=1000,  # retrying waits 2^n * multiplier milliseconds
+            stop_max_attempt_number=2,
+        )
+        def sign_with_retry():
            response = self.session.post(tsa_url, data=timestamp_request.as_bytes(), timeout=10)
            response.raise_for_status()
+            return response
+
+        try:
+            response = sign_with_retry()
        except requests.RequestException as e:
            logger.error(f"Error while sending request to {tsa_url=}: {e}")
            raise

+        @retry(
+            wait_exponential_multiplier=1,
+            stop_max_attempt_number=2,
+        )
+        def decode_with_retry(response):
+            return decode_timestamp_response(response.content)
        # Check that we can parse the response but do not *verify* it
        try:
-            timestamp_response = decode_timestamp_response(response.content)
+            timestamp_response = decode_with_retry(response)
        except ValueError as e:
            logger.error(f"Invalid timestamp response from server {tsa_url}: {e}")
            raise
        return timestamp_response
-    
+
    def tst_certs(self, tsp_response: TimeStampResponse):
        signed_data: SignedData = tsp_response.signed_data
        certs = [x509.load_der_x509_certificate(c) for c in signed_data.certificates]
@@ -197,7 +209,7 @@ class TimestampingEnricher(Enricher):
        if len(certs) == 1:
            return certs

-        while(len(ordered_certs) < len(certs)):
+        while (len(ordered_certs) < len(certs)):
            if len(ordered_certs) == 0:
                for cert in certs:
                    if not [c for c in certs if cert.subject == c.issuer]:
@@ -221,7 +233,7 @@ class TimestampingEnricher(Enricher):

        cert_chain = []
        for i, cert in enumerate(certificates):
-            cert_fn = os.path.join(self.tmp_dir, f"{i+1} – {str(cert.serial_number)[:20]}.crt")
+            cert_fn = os.path.join(self.tmp_dir, f"{i + 1} – {str(cert.serial_number)[:20]}.crt")
            with open(cert_fn, "wb") as f:
                f.write(cert.public_bytes(encoding=serialization.Encoding.PEM))
            cert_chain.append(Media(filename=cert_fn).set("subject", cert.subject.get_attributes_for_oid(x509.NameOID.COMMON_NAME)[0].value))
--- a/src/auto_archiver/modules/twitter_api_extractor/twitter_api_extractor.py
+++ b/src/auto_archiver/modules/twitter_api_extractor/twitter_api_extractor.py
@@ -4,7 +4,7 @@ import re
 import mimetypes
 import requests

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from pytwitter import Api
 from slugify import slugify

@@ -45,10 +45,9 @@ class TwitterApiExtractor(Extractor):
        if "https://t.co/" in url:
            try:
                r = requests.get(url, timeout=30)
-                logger.debug(f"Expanded url {url} to {r.url}")
                url = r.url
-            except Exception:
-                logger.error(f"Failed to expand url {url}")
+            except Exception as e:
+                logger.error(f"Failed to expand Twitter URL: {e}")
        return url

    def download(self, item: Metadata) -> Metadata:
@@ -67,7 +66,7 @@ class TwitterApiExtractor(Extractor):
            return False, False

        username, tweet_id = matches[0]  # only one URL supported
-        logger.debug(f"Found {username=} and {tweet_id=} in {url=}")
+        logger.debug(f"Found {username=} and {tweet_id=}")

        return username, tweet_id

@@ -85,7 +84,7 @@ class TwitterApiExtractor(Extractor):
                media_fields=["type", "duration_ms", "url", "variants"],
                tweet_fields=["attachments", "author_id", "created_at", "entities", "id", "text", "possibly_sensitive"],
            )
-            logger.debug(tweet)
+            logger.debug(f"Got {tweet=}")
        except Exception as e:
            logger.error(f"Could not get tweet: {e}")
            return False
@@ -115,6 +114,9 @@ class TwitterApiExtractor(Extractor):
                logger.info(f"Found media {media}")
                ext = mimetypes.guess_extension(mimetype)
                media.filename = self.download_from_url(media.get("src"), f"{slugify(url)}_{i}{ext}")
+                if not media.filename:
+                    logger.warning(f"Failed to download media from {media.get('src')}")
+                    continue
                result.add_media(media)

        result.set_content(
--- a/src/auto_archiver/modules/wacz_extractor_enricher/wacz_extractor_enricher.py
+++ b/src/auto_archiver/modules/wacz_extractor_enricher/wacz_extractor_enricher.py
@@ -4,7 +4,7 @@ import os
 import shutil
 import subprocess
 from zipfile import ZipFile
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 from warcio.archiveiterator import ArchiveIterator

 from auto_archiver.core import Media, Metadata
@@ -24,8 +24,7 @@ class WaczExtractorEnricher(Enricher, Extractor):
        self.use_docker = os.environ.get("WACZ_ENABLE_DOCKER") or not os.environ.get("RUNNING_IN_DOCKER")
        self.docker_in_docker = os.environ.get("WACZ_ENABLE_DOCKER") and os.environ.get("RUNNING_IN_DOCKER")

-        self.crawl_id = random_str(8)
-        self.cwd_dind = f"/crawls/crawls{self.crawl_id}"
+        self.cwd_dind = f"/crawls/crawls{random_str(8)}"
        self.browsertrix_home_host = os.environ.get("BROWSERTRIX_HOME_HOST")
        self.browsertrix_home_container = os.environ.get("BROWSERTRIX_HOME_CONTAINER") or self.browsertrix_home_host
        # create crawls folder if not exists, so it can be safely removed in cleanup
@@ -51,7 +50,8 @@ class WaczExtractorEnricher(Enricher, Extractor):

        url = to_enrich.get_url()

-        collection = self.crawl_id
+        crawl_id = random_str(8)
+        collection = crawl_id
        browsertrix_home_host = self.browsertrix_home_host or os.path.abspath(self.tmp_dir)
        browsertrix_home_container = self.browsertrix_home_container or browsertrix_home_host

@@ -83,8 +83,10 @@ class WaczExtractorEnricher(Enricher, Extractor):
            # "--blockAds" # note: this has been known to cause issues on cloudflare protected sites
        ]

+        crawl_cwd_dind = os.path.join(self.cwd_dind, crawl_id)
        if self.docker_in_docker:
-            cmd.extend(["--cwd", self.cwd_dind])
+            os.makedirs(crawl_cwd_dind, exist_ok=True)
+            cmd.extend(["--cwd", crawl_cwd_dind])

        if self.auth_for_site(url):
            # there's an auth for this site, but browsertrix only supports username/password auth
@@ -94,7 +96,7 @@ class WaczExtractorEnricher(Enricher, Extractor):

        # call docker if explicitly enabled or we are running on the host (not in docker)
        if self.use_docker:
-            logger.debug(f"generating WACZ in Docker for {url=}")
+            logger.debug("Generating WACZ in Docker")
            logger.debug(f"{browsertrix_home_host=} {browsertrix_home_container=}")
            if self.docker_commands:
                cmd = self.docker_commands + cmd
@@ -109,14 +111,14 @@ class WaczExtractorEnricher(Enricher, Extractor):
                ] + cmd

            if self.profile:
-                profile_file = f"profile-{self.crawl_id}.tar.gz"
+                profile_file = f"profile-{crawl_id}.tar.gz"
                profile_fn = os.path.join(browsertrix_home_container, profile_file)
-                logger.debug(f"copying {self.profile} to {profile_fn}")
+                logger.debug(f"Copying {self.profile} to {profile_fn}")
                shutil.copyfile(self.profile, profile_fn)
                cmd.extend(["--profile", os.path.join("/crawls", profile_file)])

        else:
-            logger.debug(f"generating WACZ without Docker for {url=}")
+            logger.debug("Generating WACZ without Docker")

            if self.profile:
                cmd.extend(["--profile", os.path.join("/app", str(self.profile))])
@@ -137,7 +139,7 @@ class WaczExtractorEnricher(Enricher, Extractor):
            return False

        if self.docker_in_docker:
-            wacz_fn = os.path.join(self.cwd_dind, "collections", collection, f"{collection}.wacz")
+            wacz_fn = os.path.join(crawl_cwd_dind, "collections", collection, f"{collection}.wacz")
        elif self.use_docker:
            wacz_fn = os.path.join(browsertrix_home_container, "collections", collection, f"{collection}.wacz")
        else:
@@ -152,7 +154,7 @@ class WaczExtractorEnricher(Enricher, Extractor):
            self.extract_media_from_wacz(to_enrich, wacz_fn)

        if self.docker_in_docker:
-            jsonl_fn = os.path.join(self.cwd_dind, "collections", collection, "pages", "pages.jsonl")
+            jsonl_fn = os.path.join(crawl_cwd_dind, "collections", collection, "pages", "pages.jsonl")
        elif self.use_docker:
            jsonl_fn = os.path.join(browsertrix_home_container, "collections", collection, "pages", "pages.jsonl")
        else:
--- a/src/auto_archiver/modules/wayback_extractor_enricher/wayback_extractor_enricher.py
+++ b/src/auto_archiver/modules/wayback_extractor_enricher/wayback_extractor_enricher.py
@@ -1,8 +1,8 @@
 import json
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 import time
 import requests
-
+from urllib3.exceptions import MaxRetryError
 from auto_archiver.core import Extractor, Enricher
 from auto_archiver.utils import url as UrlUtil
 from auto_archiver.core import Metadata
@@ -31,21 +31,28 @@ class WaybackExtractorEnricher(Enricher, Extractor):

        url = to_enrich.get_url()
        if UrlUtil.is_auth_wall(url):
-            logger.debug(f"[SKIP] WAYBACK since url is behind AUTH WALL: {url=}")
+            logger.debug("[SKIP] WAYBACK since url is behind AUTH WALL")
            return

-        logger.debug(f"calling wayback for {url=}")
-
        if to_enrich.get("wayback"):
            logger.info(f"Wayback enricher had already been executed: {to_enrich.get('wayback')}")
            return True

+        logger.debug("Calling Wayback")
+
        ia_headers = {"Accept": "application/json", "Authorization": f"LOW {self.key}:{self.secret}"}
        post_data = {"url": url}
        if self.if_not_archived_within:
            post_data["if_not_archived_within"] = self.if_not_archived_within
        # see https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA for more options
-        r = requests.post("https://web.archive.org/save/", headers=ia_headers, data=post_data, proxies=proxies)
+        try:
+            r = requests.post("https://web.archive.org/save/", headers=ia_headers, data=post_data, proxies=proxies)
+        except MaxRetryError as e:
+            logger.warning(
+                f"MaxRetryError during Wayback POST call to /save, this may be due to a high number of calls leading to rate limiting: {e}"
+            )
+            to_enrich.set("wayback", "failed: possible rate limit")
+            return False

        if r.status_code != 200:
            logger.error(em := f"Internet archive failed with status of {r.status_code}: {r.json()}")
@@ -68,7 +75,7 @@ class WaybackExtractorEnricher(Enricher, Extractor):
        attempt = 1
        while not wayback_url and time.time() - start_time <= self.timeout:
            try:
-                logger.debug(f"GETting status for {job_id=} on {url=} ({attempt=})")
+                logger.debug(f"GETting status for {job_id=} ({attempt=})")
                r_status = requests.get(
                    f"https://web.archive.org/save/status/{job_id}", headers=ia_headers, proxies=proxies
                )
@@ -76,16 +83,19 @@ class WaybackExtractorEnricher(Enricher, Extractor):
                if r_status.status_code == 200 and r_json["status"] == "success":
                    wayback_url = f"https://web.archive.org/web/{r_json['timestamp']}/{r_json['original_url']}"
                elif r_status.status_code != 200 or r_json["status"] != "pending":
+                    if r_json.get("status_ext") in ["error:blocked-url", "error:unauthorized"]:
+                        logger.warning("Wayback cannot currently archive the URL, skipping.")
+                        to_enrich.set("wayback", r_json.get("status_ext"))
                    logger.error(f"Wayback failed with {r_json}")
                    return False
            except requests.exceptions.RequestException as e:
-                logger.warning(f"RequestException: fetching status for {url=} due to: {e}")
+                logger.warning(f"RequestException: fetching status due to: {e}")
                break
            except json.decoder.JSONDecodeError:
-                logger.error(f"Expected a JSON from Wayback and got {r.text} for {url=}")
+                logger.error(f"Expected a JSON from Wayback and got {r_status.text}")
                break
            except Exception as e:
-                logger.warning(f"error fetching status for {url=} due to: {e}")
+                logger.warning(f"Error fetching status due to: {e}")
            if not wayback_url:
                attempt += 1
                time.sleep(1)  # TODO: can be improved with exponential backoff
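The polling loop above sleeps a fixed one second between attempts, and its TODO suggests exponential backoff. A minimal sketch of that idea follows. `poll_with_backoff`, its parameters, and the injectable `sleep`/`clock` callables are hypothetical names chosen for illustration; they are not part of auto-archiver or of the Wayback API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_with_backoff(
    check: Callable[[], Optional[T]],
    timeout: float,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Call `check` until it returns a non-None value or `timeout` seconds pass.

    The delay doubles after every unsuccessful attempt, capped at `max_delay`.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    start = clock()
    delay = base_delay
    while clock() - start <= timeout:
        result = check()
        if result is not None:
            return result
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return None
```

In the enricher, `check` could wrap the status request and return the Wayback URL once the job reports success. That wiring is an assumption and is left out here.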
--- a/src/auto_archiver/modules/whisper_enricher/whisper_enricher.py
+++ b/src/auto_archiver/modules/whisper_enricher/whisper_enricher.py
@@ -1,7 +1,7 @@
 import traceback
 import requests
 import time
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger

 from auto_archiver.core import Enricher
 from auto_archiver.core import Metadata, Media
@@ -25,7 +25,7 @@ class WhisperEnricher(Enricher):

    def enrich(self, to_enrich: Metadata) -> None:
        url = to_enrich.get_url()
-        logger.debug(f"WHISPER[{self.action}]: iterating media items for {url=}.")
+        logger.debug(f"WHISPER[{self.action}]: iterating media items")

        job_results = {}
        for i, m in enumerate(to_enrich.media):
@@ -35,7 +35,7 @@ class WhisperEnricher(Enricher):
                try:
                    job_id = self.submit_job(m)
                    job_results[job_id] = False
-                    logger.debug(f"JOB SUBMITTED: {job_id=} for {m.key=}")
+                    logger.debug(f"Job submitted: {job_id=} for {m.key=}")
                    to_enrich.media[i].set("whisper_model", {"job_id": job_id})
                except Exception as e:
                    logger.error(
@@ -72,14 +72,14 @@ class WhisperEnricher(Enricher):
            "type": self.action,
            # "language": "string" # may be a config
        }
-        logger.debug(f"calling API with {payload=}")
+        logger.debug(f"Calling API with {payload=}")
        response = requests.post(
            f"{self.api_endpoint}/jobs", json=payload, headers={"Authorization": f"Bearer {self.api_key}"}
        )
        assert response.status_code == 201, (
            f"calling the whisper api {self.api_endpoint} returned a non-success code: {response.status_code}"
        )
-        logger.debug(response.json())
+        logger.debug(f"Response from whisper API: {response.json()}")
        return response.json()["id"]

    def check_jobs(self, job_results: dict):
@@ -115,7 +115,7 @@ class WhisperEnricher(Enricher):
            assert r_res.status_code == 200, (
                f"Job artifacts did not respond with 200, instead with: {r_res.status_code}"
            )
-            logger.success(r_res.json())
+            logger.info(f"Job {job_id} completed successfully: {r_res.json()}")
            result = {}
            for art_id, artifact in enumerate(r_res.json()):
                subtitle = []
--- a/src/auto_archiver/utils/custom_logger.py
+++ b/src/auto_archiver/utils/custom_logger.py
@@ -0,0 +1,66 @@
+from loguru import logger
+import json
+
+
+def type_serializer(obj):
+    """Fallback function for objects json can't handle."""
+    if isinstance(obj, type):
+        return obj.__name__
+    return str(obj)
+
+
+def extract_location(record, short=False):
+    """Extracts the file name, function name, and line number from the log record."""
+    if short:
+        return f"{record['file'].name}:{record['line']}"
+    return f"{record['file'].name}:{record['function']}:{record['line']}"
+
+
+def extract_log_data(record):
+    subset = {
+        "level": record["level"].name,
+        "time": record["time"].isoformat(timespec="seconds"),
+    }
+    subset["loc"] = extract_location(record)
+
+    # This is where logger.contextualize() parameters can be added to the output
+    for extra_key in ["trace", "url", "worksheet", "row"]:
+        if extra_val := record.get("extra", {}).get(extra_key):
+            subset[extra_key] = extra_val
+
+    subset["message"] = record["message"]
+    if exception := record.get("exception"):
+        subset["exception"] = exception
+    return subset
+
+
+def serialize_for_console(record):
+    subset = extract_log_data(record)
+    subset.pop("message", None)
+    subset.pop("level", None)
+    subset.pop("loc", None)
+    subset.pop("time", None)
+    if not subset:
+        return ""
+    return json.dumps(subset, ensure_ascii=False, default=type_serializer)
+
+
+def serialize(record):
+    return json.dumps(extract_log_data(record), ensure_ascii=False, default=type_serializer)
+
+
+def patching(record):
+    record["extra"]["serialized"] = serialize(record)
+    record["extra"]["serialize_for_console"] = serialize_for_console(record)
+
+
+def format_for_human_readable_console():
+    return (
+        "<green>{time:YYYY-MM-DD HH:mm:ss.SSS}</green> | "
+        "<level>{level: <8}</level> | "
+        "<cyan>{file}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> | "
+        "{extra[serialize_for_console]} <level>{message}</level>"
+    )
+
+
+logger = logger.patch(patching)
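The `default=type_serializer` argument in the new module is what lets `json.dumps` handle values it cannot encode natively. The standalone sketch below shows that fallback using only the standard library. The `Worksheet` class and the `record_extra` dict are invented for illustration, and `type_serializer` is copied from the diff so the snippet runs on its own.

```python
import json
from datetime import date


def type_serializer(obj):
    """Fallback for objects json can't handle (copied from the diff above)."""
    if isinstance(obj, type):
        return obj.__name__
    return str(obj)


class Worksheet:
    """Hypothetical stand-in for a context value that is not JSON-serializable."""

    def __str__(self):
        return "Worksheet<demo>"


# Invented sample of the kind of context values a log record might carry.
record_extra = {
    "url": "https://example.com",
    "row": 3,
    "kind": Worksheet,          # a class: serialized by its __name__
    "sheet": Worksheet(),       # an instance: serialized via str()
    "day": date(2024, 1, 2),    # any other object: serialized via str()
}

line = json.dumps(record_extra, ensure_ascii=False, default=type_serializer)
print(line)
```

Natively supported values (strings, numbers) never reach the fallback; `json.dumps` calls `default` only for objects it cannot encode itself.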
--- a/src/auto_archiver/utils/deletion_detection.py
+++ b/src/auto_archiver/utils/deletion_detection.py
@@ -0,0 +1,273 @@
+"""
+Deletion Detection Utilities
+
+Provides a best-effort detection of deleted, missing, or unavailable content
+across various social media platforms based on presence of expected keywords.
+
+Identifying removed content helps to:
+- Document content that existed but was deleted
+- Track patterns of content removal
+- Preserve metadata about missing content
+"""
+
+from typing import Optional, Dict, List
+from auto_archiver.utils.custom_logger import logger
+from urllib.parse import urlparse
+
+
+class DeletionIndicators:
+    """
+    Platform-specific indicators that content has been deleted or is unavailable, alongside generic indicators.
+    """
+
+    # Twitter/X deletion indicators
+    TWITTER = [
+        "Hmm...this page doesn't exist",
+        "Try searching for something else",
+        "This Tweet is unavailable",
+        "This account doesn't exist",
+        "This Tweet has been deleted",
+        "This account has been suspended",
+        "Sorry, that page doesn't exist",
+        "The Tweet you're looking for isn't available",
+    ]
+
+    # Facebook deletion indicators
+    FACEBOOK = [
+        "This content isn't available",
+        "Sorry, this content isn't available",
+        "This content is no longer available",
+        "The link you followed may be broken",
+        "Page Not Found",
+        "Content Not Found",
+        "This content is no longer on Facebook",
+    ]
+
+    # Instagram deletion indicators
+    INSTAGRAM = [
+        "Sorry, this page isn't available",
+        "The link you followed may be broken",
+        "Media not found or unavailable",
+        "This post is no longer available",
+        "This account is private",
+    ]
+
+    # TikTok deletion indicators
+    TIKTOK = [
+        "Couldn't find this account",
+        "This video is no longer available",
+        "This video is currently unavailable",
+        "Video not found",
+        "This video may have been deleted",
+    ]
+
+    # YouTube deletion indicators
+    YOUTUBE = [
+        "This video isn't available anymore",
+        "Video unavailable",
+        "This video has been removed",
+        "This video is no longer available",
+        "This video is private",
+        "This video has been removed by the uploader",
+        "This video has been deleted",
+    ]
+
+    # Reddit deletion indicators
+    REDDIT = [
+        "this post has been removed",
+        "this comment has been removed",
+        "[removed]",
+        "[deleted]",
+        "page not found",
+        "there doesn't seem to be anything here",
+    ]
+
+    # VK deletion indicators
+    VK = [
+        "Post deleted",
+        "Page not found",
+        "Content unavailable",
+        "Access denied",
+    ]
+
+    # Telegram deletion indicators
+    TELEGRAM = [
+        "Message not found",
+        "Deleted message",
+        "Channel is private",
+    ]
+
+    # Generic indicators (work across platforms)
+    GENERIC = [
+        "has been removed",
+        "no longer available",
+        "content removed",
+        "access denied",
+        "page not found",
+    ]
+
+    @classmethod
+    def all_indicators(cls) -> List[str]:
+        """Returns all deletion indicators from all platforms."""
+        return (
+            cls.TWITTER
+            + cls.FACEBOOK
+            + cls.INSTAGRAM
+            + cls.TIKTOK
+            + cls.YOUTUBE
+            + cls.REDDIT
+            + cls.VK
+            + cls.TELEGRAM
+            + cls.GENERIC
+        )
+
+    @classmethod
+    def for_url(cls, url: str) -> List[str]:
+        """Returns platform-specific indicators based on URL domain."""
+        platform = _extract_platform(url)
+
+        indicators_map = {
+            "twitter": cls.TWITTER + cls.GENERIC,
+            "facebook": cls.FACEBOOK + cls.GENERIC,
+            "instagram": cls.INSTAGRAM + cls.GENERIC,
+            "tiktok": cls.TIKTOK + cls.GENERIC,
+            "youtube": cls.YOUTUBE + cls.GENERIC,
+            "reddit": cls.REDDIT + cls.GENERIC,
+            "vk": cls.VK + cls.GENERIC,
+            "telegram": cls.TELEGRAM + cls.GENERIC,
+        }
+        return indicators_map.get(platform, cls.GENERIC)
+
+
+def detect_deletion(
+    html_content: str = None,
+    page_title: str = None,
+    error_message: str = None,
+    url: str = None,
+    video_data: dict = None,
+) -> Optional[Dict[str, any]]:
+    """
+    Best-effort deletion detection across multiple signals.
+
+    Checks HTML content, page titles, error messages, and video metadata for
+    indicators that content has been deleted or is unavailable.
+
+    Args:
+        html_content: Raw HTML source of the page
+        page_title: Browser page title
+        error_message: Any error message from the extractor
+        url: The URL being archived (for platform-specific detection)
+        video_data: Video metadata from yt-dlp or other extractors
+
+    Returns:
+        Dictionary with deletion details if detected, None otherwise.
+        Format: {
+            "is_deleted": True,
+            "indicator": "specific text that was found",
+            "source": "html_content|page_title|error_message|video_metadata|video_metadata_<key>",
+            "platform": "twitter|facebook|...|unknown"
+        }
+    """
+
+    # Determine indicators to check based on URL
+    if url:
+        indicators = DeletionIndicators.for_url(url)
+        platform = _extract_platform(url)
+    else:
+        indicators = DeletionIndicators.all_indicators()
+        platform = "unknown"
+
+    # Check HTML content
+    if html_content:
+        for indicator in indicators:
+            if indicator.lower() in html_content.lower():
+                logger.info(f"Deletion detected in HTML: '{indicator}' found for {url}")
+                return {"is_deleted": True, "indicator": indicator, "source": "html_content", "platform": platform}
+
+    # Check page title
+    if page_title:
+        for indicator in indicators:
+            if indicator.lower() in page_title.lower():
+                logger.info(f"Deletion detected in page title: '{indicator}' found for {url}")
+                return {"is_deleted": True, "indicator": indicator, "source": "page_title", "platform": platform}
+
+    # Check error messages
+    if error_message:
+        for indicator in indicators:
+            if indicator.lower() in str(error_message).lower():
+                logger.info(f"Deletion detected in error: '{indicator}' found for {url}")
+                return {"is_deleted": True, "indicator": indicator, "source": "error_message", "platform": platform}
+
+    # Check video metadata (from yt-dlp)
+    if video_data:
+        # Check if yt-dlp flagged it as unavailable
+        if video_data.get("availability") in ["unavailable", "private", "deleted"]:
+            logger.info(f"Deletion detected in metadata: availability={video_data.get('availability')}")
+            return {
+                "is_deleted": True,
+                "indicator": f"availability: {video_data.get('availability')}",
+                "source": "video_metadata",
+                "platform": platform,
+            }
+
+        # Check description/title for deletion indicators
+        for key in ["title", "description", "fulltitle"]:
+            if key in video_data:
+                for indicator in indicators:
+                    if indicator.lower() in str(video_data[key]).lower():
+                        logger.info(f"Deletion detected in {key}: '{indicator}'")
+                        return {
+                            "is_deleted": True,
+                            "indicator": indicator,
+                            "source": f"video_metadata_{key}",
+                            "platform": platform,
+                        }
+
+    return None
+
+
+def _extract_platform(url: str) -> str:
+    """Extracts platform name from URL."""
+    parsed = urlparse(url)
+    domain = "." + (parsed.hostname or "")  # leading dot: match whole host labels, not substrings
+
+    if domain.endswith((".twitter.com", ".x.com")):
+        return "twitter"
+    elif domain.endswith((".facebook.com", ".fb.com")):
+        return "facebook"
+    elif domain.endswith(".instagram.com"):
+        return "instagram"
+    elif domain.endswith(".tiktok.com"):
+        return "tiktok"
+    elif domain.endswith((".youtube.com", ".youtu.be")):
+        return "youtube"
+    elif domain.endswith(".reddit.com"):
+        return "reddit"
+    elif domain.endswith(".vk.com"):
+        return "vk"
+    elif domain.endswith(".t.me"):
+        return "telegram"
+        return "telegram"
+    return "unknown"
+
+
+def flag_as_deleted(metadata, deletion_info: Dict[str, object]) -> None:
+    """
+    Flags metadata object as deleted/unavailable.
+    Adds tentative deletion information to the metadata object.
+
+    Args:
+        metadata: Metadata object to update
+        deletion_info: Dictionary from detect_deletion()
+    """
+    metadata.set("deletion_detected", True)
+    metadata.set("deletion_indicator", deletion_info.get("indicator"))
+    metadata.set("deletion_source", deletion_info.get("source"))
+    metadata.set("deletion_platform", deletion_info.get("platform"))
+    metadata.status = "deleted_or_unavailable"
+
+    logger.debug(
+        f"Content marked as deleted/unavailable: "
+        f"platform={deletion_info.get('platform')}, "
+        f"indicator='{deletion_info.get('indicator')}', "
+        f"source={deletion_info.get('source')}"
+    )
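Review note (not part of the patch): the HTML, page-title, and error-message checks in the deletion detector above all follow the same pattern, so they could be driven by one ordered loop. The sketch below is only an illustration under assumptions: `find_indicator` and its `sources` mapping are hypothetical names, and the indicator strings are made up for the example.

```python
from typing import Dict, Iterable, Optional


def find_indicator(
    sources: Dict[str, Optional[str]],
    indicators: Iterable[str],
) -> Optional[Dict[str, str]]:
    """Return the first (indicator, source) match, scanning sources in insertion order."""
    # Pair each indicator with its lowercase form once, up front.
    lowered = [(ind, ind.lower()) for ind in indicators]
    for source, text in sources.items():
        if not text:
            continue  # skip missing or empty sources
        haystack = str(text).lower()  # lowercase once per source, not once per indicator
        for original, needle in lowered:
            if needle in haystack:
                return {"indicator": original, "source": source}
    return None


# Hypothetical inputs: only the page title carries a deletion phrase.
hit = find_indicator(
    {"html_content": None, "page_title": "This Tweet was deleted", "error_message": "404"},
    ["tweet was deleted", "video unavailable"],
)
print(hit)
```

The caller would still need to attach `is_deleted` and `platform` to the result; those are left out here to keep the sketch focused on the scan order.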
--- a/src/auto_archiver/utils/misc.py
+++ b/src/auto_archiver/utils/misc.py
@@ -6,8 +6,7 @@ import uuid
 from datetime import datetime, timezone
 from dateutil.parser import parse as parse_dt

-import requests
-from loguru import logger
+from auto_archiver.utils.custom_logger import logger


 def mkdir_if_not_exists(folder):
@@ -15,18 +14,6 @@ def mkdir_if_not_exists(folder):
        os.makedirs(folder)


-def expand_url(url):
-    # expand short URL links
-    if "https://t.co/" in url:
-        try:
-            r = requests.get(url)
-            logger.debug(f"Expanded url {url} to {r.url}")
-            return r.url
-        except Exception:
-            logger.error(f"Failed to expand url {url}")
-    return url
-
-
 def getattr_or(o: object, prop: str, default=None):
    try:
        res = getattr(o, prop)
@@ -133,6 +120,9 @@ def ydl_entry_to_filename(ydl, entry: dict) -> str:
    directory = os.path.dirname(base_filename)  # '/get/path/to'
    basename = os.path.basename(base_filename)  # 'file'
    for f in os.listdir(directory):
+        # skip incomplete downloads left behind by yt-dlp
+        if f.endswith(".part"):
+            continue
        if (
            f.startswith(basename)
            or (entry_url and os.path.splitext(f)[0] in entry_url)
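Review note (not part of the patch): the `.part` guard added to `ydl_entry_to_filename` can be illustrated in isolation. This is a minimal sketch, assuming a simplified prefix match; `find_downloaded_file` is a hypothetical helper and does not reproduce the URL-based matching of the real function.

```python
import os
import tempfile
from typing import Optional


def find_downloaded_file(directory: str, basename: str) -> Optional[str]:
    """Return the first completed file whose name starts with basename."""
    for name in sorted(os.listdir(directory)):
        if name.endswith(".part"):
            continue  # incomplete download, not usable
        if name.startswith(basename):
            return os.path.join(directory, name)
    return None


with tempfile.TemporaryDirectory() as tmp:
    # The partial file sorts before the finished one, so the guard matters here.
    for name in ("file.f137.mp4.part", "file.mp4"):
        open(os.path.join(tmp, name), "w").close()
    found = find_downloaded_file(tmp, "file")
    print(os.path.basename(found))
```

Without the guard, the partial file would be returned first in this listing because it sorts ahead of the finished download.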
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -9,7 +9,7 @@ from tempfile import TemporaryDirectory
 from typing import Dict, Tuple
 import hashlib

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger
 import pytest
 from auto_archiver.core.metadata import Metadata, Media
 from auto_archiver.core.module import ModuleFactory
--- a/tests/core/__init__.py
+++ b/tests/core/__init__.py
@@ -0,0 +1 @@
+# Core module tests
--- a/tests/core/test_media.py
+++ b/tests/core/test_media.py
@@ -0,0 +1,198 @@
+"""
+Tests for the Media class from auto_archiver.core.media
+"""
+
+import pytest
+from unittest.mock import Mock, patch
+from auto_archiver.core.media import Media
+
+
+class TestMediaBasics:
+    """Test basic Media properties and methods."""
+
+    def test_media_creation_with_filename(self):
+        media = Media(filename="test.mp4")
+        assert media.filename == "test.mp4"
+        assert media.urls == []
+        assert media.properties == {}
+
+    def test_media_key_property(self):
+        media = Media(filename="test.mp4", _key="my_key")
+        assert media.key == "my_key"
+
+    def test_media_set_get_properties(self):
+        media = Media(filename="test.mp4")
+        result = media.set("author", "John Doe")
+        assert result is media  # returns self for chaining
+        assert media.get("author") == "John Doe"
+        assert media.get("nonexistent") is None
+        assert media.get("nonexistent", "default") == "default"
+
+    def test_media_add_url(self):
+        media = Media(filename="test.mp4")
+        media.add_url("https://example.com/test.mp4")
+        assert "https://example.com/test.mp4" in media.urls
+        media.add_url("https://cdn.example.com/test.mp4")
+        assert len(media.urls) == 2
+
+
+class TestMediaMimetype:
+    """Test mimetype detection and handling."""
+
+    @pytest.mark.parametrize(
+        "filename,expected_mimetype",
+        [
+            ("video.mp4", "video/mp4"),
+            ("image.jpg", "image/jpeg"),
+            ("image.png", "image/png"),
+            ("audio.mp3", "audio/mpeg"),
+            ("document.pdf", "application/pdf"),
+            ("text.txt", "text/plain"),
+        ],
+    )
+    def test_mimetype_detection(self, filename, expected_mimetype):
+        media = Media(filename=filename)
+        assert media.mimetype == expected_mimetype
+
+    def test_mimetype_setter(self):
+        media = Media(filename="file.unknown")
+        media.mimetype = "custom/type"
+        assert media.mimetype == "custom/type"
+
+    def test_mimetype_empty_filename(self):
+        media = Media(filename="")
+        assert media.mimetype == ""
+
+
+class TestMediaTypeChecks:
+    """Test media type checking methods."""
+
+    @pytest.mark.parametrize(
+        "filename,is_video,is_audio,is_image",
+        [
+            ("video.mp4", True, False, False),
+            ("video.avi", True, False, False),
+            ("audio.mp3", False, True, False),
+            ("audio.wav", False, True, False),
+            ("image.jpg", False, False, True),
+            ("image.png", False, False, True),
+            ("document.pdf", False, False, False),
+        ],
+    )
+    def test_type_checks(self, filename, is_video, is_audio, is_image):
+        media = Media(filename=filename)
+        assert media.is_video() == is_video
+        assert media.is_audio() == is_audio
+        assert media.is_image() == is_image
+
+
+class TestMediaStore:
+    """Test media storage functionality."""
+
+    def test_store_with_no_storages(self, caplog):
+        media = Media(filename="test.mp4")
+        metadata = Mock()
+        media.store(metadata, storages=[])
+        assert "No storages found" in caplog.text
+
+    def test_store_with_storage(self):
+        media = Media(filename="test.mp4")
+        metadata = Mock()
+        mock_storage = Mock()
+        media.store(metadata, url="https://example.com", storages=[mock_storage])
+        mock_storage.store.assert_called_once()
+
+
+class TestMediaInnerMedia:
+    """Test nested media retrieval."""
+
+    def test_all_inner_media_no_nested(self):
+        media = Media(filename="test.mp4")
+        inner = list(media.all_inner_media(include_self=False))
+        assert len(inner) == 0
+
+        inner_with_self = list(media.all_inner_media(include_self=True))
+        assert len(inner_with_self) == 1
+        assert inner_with_self[0] is media
+
+    def test_all_inner_media_with_nested(self):
+        parent = Media(filename="parent.mp4")
+        child = Media(filename="child.jpg")
+        grandchild = Media(filename="grandchild.png")
+
+        child.set("thumbnail", grandchild)
+        parent.set("preview", child)
+
+        inner = list(parent.all_inner_media(include_self=False))
+        assert len(inner) == 2
+        assert child in inner
+        assert grandchild in inner
+
+    def test_all_inner_media_with_list_property(self):
+        parent = Media(filename="parent.mp4")
+        child1 = Media(filename="frame1.jpg")
+        child2 = Media(filename="frame2.jpg")
+
+        parent.set("frames", [child1, child2])
+
+        inner = list(parent.all_inner_media(include_self=False))
+        assert len(inner) == 2
+        assert child1 in inner
+        assert child2 in inner
+
+
+class TestMediaIsStored:
+    """Test the is_stored method."""
+
+    def test_is_stored_no_urls(self):
+        media = Media(filename="test.mp4")
+        storage = Mock()
+        storage.config = {"steps": {"storages": ["s3", "local"]}}
+        assert media.is_stored(storage) is False
+
+    def test_is_stored_partial_urls(self):
+        media = Media(filename="test.mp4")
+        media.add_url("https://s3.example.com/test.mp4")
+        storage = Mock()
+        storage.config = {"steps": {"storages": ["s3", "local"]}}
+        assert media.is_stored(storage) is False
+
+    def test_is_stored_full_urls(self):
+        media = Media(filename="test.mp4")
+        media.add_url("https://s3.example.com/test.mp4")
+        media.add_url("file:///local/test.mp4")
+        storage = Mock()
+        storage.config = {"steps": {"storages": ["s3", "local"]}}
+        assert media.is_stored(storage) is True
+
+
+class TestMediaValidVideo:
+    """Test video validation functionality."""
+
+    def test_is_valid_video_with_valid_probe(self):
+        media = Media(filename="test.mp4")
+
+        mock_streams = {"streams": [{"duration_ts": 1000}]}
+
+        with patch("ffmpeg.probe", return_value=mock_streams):
+            assert media.is_valid_video() is True
+
+    def test_is_valid_video_with_no_duration(self):
+        media = Media(filename="test.mp4")
+
+        mock_streams = {"streams": [{"duration_ts": 0}]}
+
+        with patch("ffmpeg.probe", return_value=mock_streams):
+            assert media.is_valid_video() is False
+
+    def test_is_valid_video_with_ffmpeg_error(self):
+        media = Media(filename="test.mp4")
+
+        with patch("ffmpeg.probe", side_effect=Exception("ffmpeg error")):
+            with patch("os.path.getsize", return_value=100):
+                # Falls back to file size check, small file
+                assert media.is_valid_video() is False
+
+            with patch("os.path.getsize", return_value=30000):
+                # Falls back to file size check, larger file
+                assert media.is_valid_video() is True
--- a/tests/core/test_validators.py
+++ b/tests/core/test_validators.py
@@ -0,0 +1,98 @@
+"""
+Tests for validators module from auto_archiver.core.validators
+"""
+
+import argparse
+import json
+import pytest
+
+from auto_archiver.core.validators import positive_number, valid_file, json_loader
+
+
+class TestPositiveNumber:
+    """Test the positive_number validator."""
+
+    @pytest.mark.parametrize(
+        "value,expected",
+        [
+            (0, 0),
+            (1, 1),
+            (100, 100),
+            (0.5, 0.5),
+            (999999, 999999),
+        ],
+    )
+    def test_positive_values(self, value, expected):
+        assert positive_number(value) == expected
+
+    @pytest.mark.parametrize(
+        "value",
+        [
+            -1,
+            -100,
+            -0.5,
+            -999999,
+        ],
+    )
+    def test_negative_values_raise_error(self, value):
+        with pytest.raises(argparse.ArgumentTypeError) as exc_info:
+            positive_number(value)
+        assert "not a positive number" in str(exc_info.value)
+
+
+class TestValidFile:
+    """Test the valid_file validator."""
+
+    def test_valid_file_exists(self, tmp_path):
+        test_file = tmp_path / "test.txt"
+        test_file.write_text("test content")
+        result = valid_file(str(test_file))
+        assert result == str(test_file)
+
+    def test_valid_file_not_exists(self):
+        with pytest.raises(argparse.ArgumentTypeError) as exc_info:
+            valid_file("/nonexistent/path/to/file.txt")
+        assert "does not exist" in str(exc_info.value)
+
+    def test_valid_file_directory_not_file(self, tmp_path):
+        # A directory is not a file
+        with pytest.raises(argparse.ArgumentTypeError) as exc_info:
+            valid_file(str(tmp_path))
+        assert "does not exist" in str(exc_info.value)
+
+
+class TestJsonLoader:
+    """Test the json_loader validator."""
+
+    @pytest.mark.parametrize(
+        "json_str,expected",
+        [
+            ('{"key": "value"}', {"key": "value"}),
+            ('{"number": 123}', {"number": 123}),
+            ('{"list": [1, 2, 3]}', {"list": [1, 2, 3]}),
+            ('{"nested": {"inner": "value"}}', {"nested": {"inner": "value"}}),
+            ("[]", []),
+            ("[1, 2, 3]", [1, 2, 3]),
+            ('"string"', "string"),
+            ("123", 123),
+            ("true", True),
+            ("false", False),
+            ("null", None),
+        ],
+    )
+    def test_valid_json(self, json_str, expected):
+        assert json_loader(json_str) == expected
+
+    @pytest.mark.parametrize(
+        "invalid_json",
+        [
+            "{invalid}",
+            "{'single': 'quotes'}",
+            "{missing: quotes}",
+            '{"unclosed": "brace"',
+            "",
+        ],
+    )
+    def test_invalid_json_raises_error(self, invalid_json):
+        with pytest.raises(json.JSONDecodeError):
+            json_loader(invalid_json)
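Review note (not part of the patch): the tests above pin down the validators' observable behaviour without showing their bodies. The sketch below is one set of implementations consistent with those assertions; it is an assumption for illustration, and the code in `auto_archiver.core.validators` may differ. Note that the tests accept `0`, so "positive" here effectively means non-negative.

```python
import argparse
import json
import os


def positive_number(value):
    """Accept zero and above; reject negatives with an argparse-friendly error."""
    if value < 0:
        raise argparse.ArgumentTypeError(f"{value} is not a positive number")
    return value


def valid_file(value):
    """Accept only paths that point at an existing regular file."""
    if not os.path.isfile(value):
        raise argparse.ArgumentTypeError(f"File '{value}' does not exist.")
    return value


def json_loader(value):
    """Parse a JSON string; json.JSONDecodeError propagates on bad input."""
    return json.loads(value)
```

Because `os.path.isfile` is false for directories, a directory path produces the same "does not exist" message as a missing path, which matches `test_valid_file_directory_not_file`.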
--- a/tests/data/test_modules/example_extractor/example_extractor.py
+++ b/tests/data/test_modules/example_extractor/example_extractor.py
@@ -1,6 +1,6 @@
 from auto_archiver.core import Extractor

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger


 class ExampleExtractor(Extractor):
--- a/tests/data/test_modules/example_module/example_module.py
+++ b/tests/data/test_modules/example_module/example_module.py
@@ -1,6 +1,6 @@
 from auto_archiver.core import Extractor, Enricher, Feeder, Database, Storage, Formatter, Metadata

-from loguru import logger
+from auto_archiver.utils.custom_logger import logger


 class ExampleModule(Extractor, Enricher, Feeder, Database, Storage, Formatter):
--- a/tests/databases/test_api_db.py
+++ b/tests/databases/test_api_db.py
@@ -29,7 +29,7 @@ def test_fetch_fail_status(api_db, metadata, mocker):
    mock_get = mocker.patch("auto_archiver.modules.api_db.api_db.requests.get")
    mock_get.return_value.status_code = 400
    mock_get.return_value.json.return_value = {}
-    mock_error = mocker.patch("loguru.logger.error")
+    mock_error = mocker.patch("auto_archiver.utils.custom_logger.logger.error")
    assert api_db.fetch(metadata) is False
    mock_error.assert_called_once_with("AA API FAIL (400): {}")

--- a/tests/databases/test_console_db.py
+++ b/tests/databases/test_console_db.py
@@ -0,0 +1,62 @@
+"""
+Tests for the ConsoleDb module
+"""
+
+import pytest
+
+
+@pytest.fixture
+def console_db(setup_module):
+    return setup_module("console_db")
+
+
+class TestConsoleDb:
+    """Test the ConsoleDb functionality."""
+
+    def test_started_logs_info(self, console_db, make_item, caplog):
+        """Test that started() logs an info message."""
+        item = make_item("https://example.com/test")
+
+        with caplog.at_level("INFO"):
+            console_db.started(item)
+
+        assert "STARTED" in caplog.text
+        assert "example.com" in caplog.text
+
+    def test_failed_logs_error(self, console_db, make_item, caplog):
+        """Test that failed() logs an error message with reason."""
+        item = make_item("https://example.com/test")
+        reason = "Connection timeout"
+
+        with caplog.at_level("ERROR"):
+            console_db.failed(item, reason)
+
+        assert "FAILED" in caplog.text
+        assert "Connection timeout" in caplog.text
+
+    def test_aborted_logs_warning(self, console_db, make_item, caplog):
+        """Test that aborted() logs a warning message."""
+        item = make_item("https://example.com/test")
+
+        with caplog.at_level("WARNING"):
+            console_db.aborted(item)
+
+        assert "ABORTED" in caplog.text
+
+    def test_done_logs_success(self, console_db, make_item, caplog):
+        """Test that done() logs a success message."""
+        item = make_item("https://example.com/test")
+
+        with caplog.at_level("INFO"):
+            console_db.done(item)
+
+        assert "DONE" in caplog.text
+
+    def test_done_cached(self, console_db, make_item, caplog):
+        """Test done() with cached=True (should behave the same)."""
+        item = make_item("https://example.com/test")
+
+        with caplog.at_level("INFO"):
+            console_db.done(item, cached=True)
+
+        assert "DONE" in caplog.text
--- a/tests/databases/test_gsheet_db.py
+++ b/tests/databases/test_gsheet_db.py
@@ -10,7 +10,10 @@ def mock_gworksheet(mocker):
    mock_gworksheet = mocker.MagicMock(spec=GWorksheet)
    mock_gworksheet.col_exists.return_value = True
    mock_gworksheet.get_cell.return_value = ""
-    mock_gworksheet.get_row.return_value = {}
+    mock_gworksheet.wks = mocker.MagicMock()
+    mock_gworksheet.wks.spreadsheet = mocker.MagicMock()
+    mock_gworksheet.wks.spreadsheet.title = "Test Spreadsheet"
+    mock_gworksheet.title = "Test Worksheet"
    return mock_gworksheet


--- a/tests/enrichers/test_ghostarchive_enricher.py
+++ b/tests/enrichers/test_ghostarchive_enricher.py
@@ -0,0 +1,277 @@
+import pytest
+import requests
+import os
+from unittest.mock import MagicMock
+
+from auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher import GhostarchiveEnricher
+
+CI = os.getenv("GITHUB_ACTIONS", "") == "true"
+
+# sample HTML responses for mocking
+SEARCH_HTML_FOUND = """
+<html><body>
+<h1>Archives for https://example.com</h1>
+<table>
+<tr><td><a href="http://ghostarchive.org/archive/Abc12">https://example.com</a></td></tr>
+</table>
+</body></html>
+"""
+
+SEARCH_HTML_NOT_FOUND = """
+<html><body>
+<h1>Archives for https://example.com</h1>
+<p>Page 0 out of 0</p>
+<p>No archives for that site.</p>
+</body></html>
+"""
+
+SAVE_RESPONSE_HTML_WITH_LINK = """
+<html><body>
+<h1>Archive saved</h1>
+<a href="/archive/Xyz99">View archive</a>
+</body></html>
+"""
+
+ENRICHER_CONFIG = {
+    "timeout": 120,
+    "check_existing": True,
+    "proxy_http": None,
+    "proxy_https": None,
+}
+
+
+class TestGhostarchiveEnricher:
+    """Tests for Ghost Archive Enricher"""
+
+    @pytest.fixture(autouse=True)
+    def setup_enricher(self, setup_module):
+        self.enricher: GhostarchiveEnricher = setup_module("ghostarchive_enricher", ENRICHER_CONFIG)
+
+    def test_search_existing_found(self, mocker):
+        """When an existing archive is found, it should be returned."""
+        mock_response = mocker.Mock()
+        mock_response.status_code = 200
+        mock_response.text = SEARCH_HTML_FOUND
+        mocker.patch(
+            "auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.requests.get", return_value=mock_response
+        )
+
+        result = self.enricher._search_existing("https://example.com")
+        assert result == "https://ghostarchive.org/archive/Abc12"
+
+    def test_search_existing_not_found(self, mocker):
+        """When no existing archive is found, None should be returned."""
+        mock_response = mocker.Mock()
+        mock_response.status_code = 200
+        mock_response.text = SEARCH_HTML_NOT_FOUND
+        mocker.patch(
+            "auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.requests.get", return_value=mock_response
+        )
+
+        result = self.enricher._search_existing("https://example.com")
+        assert result is None
+
+    def test_search_existing_request_error(self, mocker):
+        """When search request fails, None should be returned."""
+        mocker.patch(
+            "auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.requests.get",
+            side_effect=requests.exceptions.ConnectionError("connection failed"),
+        )
+
+        result = self.enricher._search_existing("https://example.com")
+        assert result is None
+
+    def test_search_existing_non_200(self, mocker):
+        """When search returns non-200, None should be returned."""
+        mock_response = mocker.Mock()
+        mock_response.status_code = 503
+        mocker.patch(
+            "auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.requests.get", return_value=mock_response
+        )
+
+        result = self.enricher._search_existing("https://example.com")
+        assert result is None
+
+    def test_submit_url_success_redirect(self, mocker):
+        """Successful submission via headless browser should return archive URL."""
+        mock_sb = MagicMock()
+        mock_sb.get_current_url.return_value = "https://ghostarchive.org/archive/NewId1"
+        mock_sb.__enter__ = MagicMock(return_value=mock_sb)
+        mock_sb.__exit__ = MagicMock(return_value=False)
+
+        mocker.patch("auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.SB", return_value=mock_sb)
+
+        result = self.enricher._submit_url("https://example.com")
+        assert result == "https://ghostarchive.org/archive/NewId1"
+        mock_sb.type.assert_called_once()
+        mock_sb.click.assert_called_once()
+
+    def test_submit_url_success_redirect_strips_query(self, mocker):
+        """Redirect URL query params should be stripped."""
+        mock_sb = MagicMock()
+        mock_sb.get_current_url.return_value = "https://ghostarchive.org/archive/NewId1?wr=false"
+        mock_sb.__enter__ = MagicMock(return_value=mock_sb)
+        mock_sb.__exit__ = MagicMock(return_value=False)
+
+        mocker.patch("auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.SB", return_value=mock_sb)
+
+        result = self.enricher._submit_url("https://example.com")
+        assert result == "https://ghostarchive.org/archive/NewId1"
+
+    def test_submit_url_success_html_fallback(self, mocker):
+        """When browser doesn't redirect, should parse page source for archive link."""
+        mock_sb = MagicMock()
+        mock_sb.get_current_url.return_value = "https://ghostarchive.org/archive2"
+        mock_sb.get_page_source.return_value = SAVE_RESPONSE_HTML_WITH_LINK
+        mock_sb.__enter__ = MagicMock(return_value=mock_sb)
+        mock_sb.__exit__ = MagicMock(return_value=False)
+
+        # make timeout=0 so the polling loop exits immediately and falls through to HTML parsing
+        self.enricher.timeout = 0
+        mocker.patch("auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.SB", return_value=mock_sb)
+
+        result = self.enricher._submit_url("https://example.com")
+        assert result == "https://ghostarchive.org/archive/Xyz99"
+
+    def test_submit_url_browser_error(self, mocker):
+        """Browser error during submission should return None."""
+        mocker.patch(
+            "auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.SB",
+            side_effect=Exception("browser failed to start"),
+        )
+
+        result = self.enricher._submit_url("https://example.com")
+        assert result is None
+
+    def test_proxy_configuration(self, mocker):
+        """Proxies should be passed to search requests when configured."""
+        self.enricher.proxy_http = "http://proxy:8080"
+        self.enricher.proxy_https = "https://proxy:8443"
+
+        mock_get = mocker.patch(
+            "auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.requests.get",
+        )
+        mock_response = mocker.Mock()
+        mock_response.status_code = 200
+        mock_response.text = SEARCH_HTML_FOUND
+        mock_get.return_value = mock_response
+
+        result = self.enricher._search_existing("https://example.com")
+
+        call_kwargs = mock_get.call_args
+        assert call_kwargs.kwargs.get("proxies") == {"http": "http://proxy:8080", "https": "https://proxy:8443"}
+        assert result is not None
+
+    def test_parse_archive_url_with_replay_links(self):
+        """Parser should ignore /replay/ links and only return /archive/ links."""
+        html = """
+        <html><body>
+        <a href="/archive/replay/w/id-abc/mp_/https://example.com">replay</a>
+        <a href="/archive/Valid1">valid</a>
+        </body></html>
+        """
+        result = self.enricher._parse_archive_url(html)
+        assert result == "https://ghostarchive.org/archive/Valid1"
+
+    def test_parse_archive_url_no_links(self):
+        """Parser should return None when no archive links found."""
+        html = "<html><body><p>No archive here</p></body></html>"
+        result = self.enricher._parse_archive_url(html)
+        assert result is None
+
+    def test_enrich_sets_ghostarchive_on_metadata(self, mocker, make_item):
+        """enrich() should set 'ghostarchive' key on the metadata object."""
+        mocker.patch.object(self.enricher, "_search_existing", return_value="https://ghostarchive.org/archive/Enr1")
+
+        item = make_item("https://example.com")
+        result = self.enricher.enrich(item)
+
+        assert result is True
+        assert item.get("ghostarchive") == "https://ghostarchive.org/archive/Enr1"
+
+    def test_enrich_skips_if_already_enriched(self, mocker, make_item):
+        """enrich() should skip if ghostarchive key is already set."""
+        mock_search = mocker.patch.object(self.enricher, "_search_existing")
+
+        item = make_item("https://example.com", ghostarchive="https://ghostarchive.org/archive/Old1")
+        result = self.enricher.enrich(item)
+
+        assert result is True
+        mock_search.assert_not_called()
+
+    def test_enrich_returns_false_on_failure(self, mocker, make_item):
+        """enrich() should return False when both search and submit fail."""
+        mocker.patch.object(self.enricher, "_search_existing", return_value=None)
+        mocker.patch.object(self.enricher, "_submit_url", return_value=None)
+
+        item = make_item("https://example.com")
+        result = self.enricher.enrich(item)
+
+        assert result is False
+
+    def test_enrich_skips_auth_wall(self, mocker, make_item):
+        """enrich() should skip URLs behind auth walls."""
+        mocker.patch(
+            "auto_archiver.modules.ghostarchive_enricher.ghostarchive_enricher.UrlUtil.is_auth_wall", return_value=True
+        )
+
+        item = make_item("https://example.com/login")
+        result = self.enricher.enrich(item)
+        assert result is False
+
+    def test_enrich_with_existing_archive(self, mocker, make_item):
+        """enrich() should use existing archive when check_existing is True."""
+        mocker.patch.object(self.enricher, "_search_existing", return_value="https://ghostarchive.org/archive/Exist1")
+        mock_submit = mocker.patch.object(self.enricher, "_submit_url")
+
+        item = make_item("https://example.com")
+        result = self.enricher.enrich(item)
+
+        assert result is True
+        assert item.get("ghostarchive") == "https://ghostarchive.org/archive/Exist1"
+        mock_submit.assert_not_called()
+
+    def test_enrich_submits_when_no_existing(self, mocker, make_item):
+        """enrich() should submit URL when no existing archive found."""
+        mocker.patch.object(self.enricher, "_search_existing", return_value=None)
+        mocker.patch.object(self.enricher, "_submit_url", return_value="https://ghostarchive.org/archive/New42")
+
+        item = make_item("https://example.com")
+        result = self.enricher.enrich(item)
+
+        assert result is True
+        assert item.get("ghostarchive") == "https://ghostarchive.org/archive/New42"
+
+    def test_enrich_skips_check_existing_when_disabled(self, mocker, make_item):
+        """enrich() should skip search when check_existing is False."""
+        self.enricher.check_existing = False
+        mock_search = mocker.patch.object(self.enricher, "_search_existing")
+        mocker.patch.object(self.enricher, "_submit_url", return_value="https://ghostarchive.org/archive/Direct1")
+
+        item = make_item("https://example.com")
+        result = self.enricher.enrich(item)
+
+        assert result is True
+        mock_search.assert_not_called()
+
+    @pytest.mark.download
+    def test_real_search_existing(self, setup_module):
+        """Integration test: search for an existing archive on Ghost Archive."""
+        enricher = setup_module("ghostarchive_enricher", ENRICHER_CONFIG)
+        # example.com is commonly archived
+        result = enricher._search_existing("https://example.com")
+        # we just check it doesn't crash; result may or may not be found
+        assert result is None or result.startswith("https://ghostarchive.org/archive/")
+
+    @pytest.mark.download
+    @pytest.mark.skipif(CI, reason="Avoid submitting a real task on every CI run")
+    def test_real_submit_example_com(self, setup_module, make_item):
+        """Integration test: submit example.com to Ghost Archive and verify enrichment."""
+        enricher = setup_module("ghostarchive_enricher", ENRICHER_CONFIG)
+        item = make_item("https://example.com")
+        result = enricher.enrich(item)
+
+        assert result is True
+        archive_url = item.get("ghostarchive")
+        assert archive_url is not None
+        assert archive_url.startswith("https://ghostarchive.org/archive/")
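Review note (not part of the patch): `test_parse_archive_url_with_replay_links` and `test_search_existing_found` together imply that the parser skips `/archive/replay/...` links and normalises results to `https://ghostarchive.org/archive/<id>`. A minimal sketch of that behaviour follows. It is an assumption, not the enricher's actual code: `parse_archive_url` and `ARCHIVE_HREF` are hypothetical names, the archive id is assumed to be purely alphanumeric, and a regular expression stands in for whatever HTML parsing the module really uses.

```python
import re
from typing import Optional

# Assumed link shape: /archive/<alphanumeric id> with nothing after the id.
# A replay link continues with "/..." after "replay", so it cannot match.
ARCHIVE_HREF = re.compile(
    r'href="(?:https?://ghostarchive\.org)?/archive/([A-Za-z0-9]+)"'
)


def parse_archive_url(html: str) -> Optional[str]:
    """Return the first archive link as an https URL, or None if there is none."""
    match = ARCHIVE_HREF.search(html)
    if match is None:
        return None
    return f"https://ghostarchive.org/archive/{match.group(1)}"


sample = (
    '<a href="/archive/replay/w/id-abc/mp_/https://example.com">replay</a>'
    '<a href="/archive/Valid1">valid</a>'
)
print(parse_archive_url(sample))
```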
--- a/tests/enrichers/test_json_enricher.py
+++ b/tests/enrichers/test_json_enricher.py
@@ -0,0 +1,72 @@
+"""
+Tests for the JsonEnricher module
+"""
+
+import json
+import os
+import pytest
+
+
+@pytest.fixture
+def json_enricher(setup_module):
+    return setup_module("json_enricher")
+
+
+class TestJsonEnricher:
+    """Test the JsonEnricher functionality."""
+
+    def test_enrich_creates_json_file(self, json_enricher, make_item):
+        """Test that enrich creates a metadata.json file."""
+        item = make_item("https://example.com/test")
+        item.set("title", "Test Title")
+        item.set("description", "Test description")
+
+        json_enricher.enrich(item)
+
+        # Check that a media with id 'metadata_json' was added
+        json_media = item.get_media_by_id("metadata_json")
+        assert json_media is not None
+        assert json_media.filename.endswith("metadata.json")
+        assert os.path.exists(json_media.filename)
+
+    def test_enrich_json_content(self, json_enricher, make_item):
+        """Test that the JSON content is correct."""
+        item = make_item("https://example.com/test")
+        item.set("title", "Test Title")
+        item.set("custom_field", "custom_value")
+
+        json_enricher.enrich(item)
+
+        json_media = item.get_media_by_id("metadata_json")
+        with open(json_media.filename, "r", encoding="utf-8") as f:
+            content = json.load(f)
+
+        # The to_dict() returns nested structure: {status, metadata: {...}, media: [...]}
+        assert content["metadata"]["title"] == "Test Title"
+        assert content["metadata"]["custom_field"] == "custom_value"
+        assert content["metadata"]["url"] == "https://example.com/test"
+
+    def test_enrich_handles_special_characters(self, json_enricher, make_item):
+        """Test that special characters are handled correctly."""
+        item = make_item("https://example.com/test")
+        item.set("title", "Test with émojis 🎉 and üñíçödé")
+
+        json_enricher.enrich(item)
+
+        json_media = item.get_media_by_id("metadata_json")
+        with open(json_media.filename, "r", encoding="utf-8") as f:
+            content = json.load(f)
+
+        # Access the nested metadata structure
+        assert "émojis 🎉" in content["metadata"]["title"]
+        assert "üñíçödé" in content["metadata"]["title"]
+
+    def test_enrich_empty_metadata(self, json_enricher, make_item):
+        """Test enriching metadata with minimal content."""
+        item = make_item("https://example.com/minimal")
+
+        json_enricher.enrich(item)
+
+        json_media = item.get_media_by_id("metadata_json")
+        assert json_media is not None
+        assert os.path.exists(json_media.filename)
--- a/tests/enrichers/test_meta_enricher.py
+++ b/tests/enrichers/test_meta_enricher.py
@@ -33,7 +33,6 @@ def test_enrich_skips_empty_metadata(meta_enricher, mock_metadata):
    """Test that enrich() does nothing when Metadata is empty."""
    mock_metadata.is_empty.return_value = True
    meta_enricher.enrich(mock_metadata)
-    mock_metadata.get_url.assert_called_once()


 def test_enrich_file_sizes(meta_enricher, metadata, tmp_path):
--- a/tests/enrichers/test_metadata_enricher.py
+++ b/tests/enrichers/test_metadata_enricher.py
@@ -56,6 +56,19 @@ def test_enrich_sets_metadata(enricher, mocker):
    assert metadata.media == [media1, media2]


+def test_enrich_no_metadata_selection(enricher, mocker):
+    media1 = mocker.Mock(filename="img1.jpg")
+    media2 = mocker.Mock(filename="img2.jpg")
+    metadata = mocker.Mock()
+    metadata.media = [media1, media2]
+    enricher.get_metadata = lambda f: {"key": "value"} if f == "img1.jpg" else {}
+    enricher.look_for_keys = ["no-key"]
+    enricher.enrich(metadata)
+    media1.set.assert_called_once_with("metadata", {})
+    media2.set.assert_not_called()
+    assert metadata.media == [media1, media2]
+
+
 def test_enrich_empty_media(enricher, mocker):
    metadata = mocker.Mock()
    metadata.media = []
@@ -65,13 +78,15 @@ def test_enrich_empty_media(enricher, mocker):

 def test_get_metadata_error_handling(enricher, mocker):
    mocker.patch("subprocess.run", side_effect=Exception("Test error"))
-    mock_log = mocker.patch("loguru.logger.error")
+    mock_log = mocker.patch("auto_archiver.utils.custom_logger.logger.error")
    result = enricher.get_metadata("test.jpg")
    assert result == {}
    assert "Error occurred: " in mock_log.call_args[0][0]


-def test_metadata_pickle(enricher, unpickle, mocker):
+# TODO: test disabled (wrapped in a string literal) until the expected functionality is decided
+"""
+def test_default_metadata_pickle(enricher, unpickle, mocker):
    mock_run = mocker.patch("subprocess.run")
    # Uses pickled values
    mock_run.return_value = unpickle("metadata_enricher_exif.pickle")
@@ -79,6 +94,39 @@ def test_metadata_pickle(enricher, unpickle, mocker):
    expected = unpickle("metadata_enricher_ytshort_expected.pickle")
    enricher.enrich(metadata)
    expected_media = expected.media
+    print(expected_media)
    actual_media = metadata.media
+
    assert len(expected_media) == len(actual_media)
    assert actual_media[0].properties.get("metadata") == expected_media[0].properties.get("metadata")
+"""
+
+
+def test_metadata_pickle_megapixel(enricher, unpickle, mocker):
+    mock_run = mocker.patch("subprocess.run")
+    mock_run.return_value = unpickle("metadata_enricher_exif.pickle")
+    metadata = unpickle("metadata_enricher_ytshort_input.pickle")
+
+    enricher.look_for_keys = ["megapixels"]
+    enricher.enrich(metadata)
+    actual_media = metadata.media
+
+    assert actual_media[0].properties.get("metadata") == {"Megapixels": "0.922"}
+
+
+def test_metadata_specify_datetime_and_megapixels(enricher, unpickle, mocker):
+    mock_run = mocker.patch("subprocess.run")
+    mock_run.return_value = unpickle("metadata_enricher_exif.pickle")
+    metadata = unpickle("metadata_enricher_ytshort_input.pickle")
+
+    enricher.look_for_keys = ["datetime", "megapixels", "image height"]
+    enricher.enrich(metadata)
+    actual_media = metadata.media
+
+    assert actual_media[0].properties.get("metadata") == {
+        "File Modification Date/Time": "2025:02:18 19:42:50+00:00",
+        "File Access Date/Time": "2025:02:18 19:42:50+00:00",
+        "File Inode Change Date/Time": "2025:02:18 19:42:50+00:00",
+        "Megapixels": "0.922",
+        "Image Height": "720",
+    }
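The `look_for_keys` tests above suggest that selection is case-insensitive and ignores punctuation, since `"datetime"` selects `"File Modification Date/Time"`. The enricher's real implementation is not shown in this diff, so the following is only a minimal sketch of one matching rule that would satisfy these assertions. The names `_normalize` and `select_metadata`, and the `"Image Width"` sample entry, are hypothetical.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def select_metadata(raw: dict, look_for_keys: list) -> dict:
    """Keep entries whose normalized key contains any normalized search term.

    Assumption: substring matching on normalized keys; the real enricher may differ.
    """
    wanted = [_normalize(term) for term in look_for_keys]
    return {
        key: value
        for key, value in raw.items()
        if any(term in _normalize(key) for term in wanted)
    }


# Sample data modelled on the values asserted in the tests ("Image Width" is made up).
raw = {
    "File Modification Date/Time": "2025:02:18 19:42:50+00:00",
    "Megapixels": "0.922",
    "Image Height": "720",
    "Image Width": "1280",
}
selected = select_metadata(raw, ["datetime", "megapixels", "image height"])
```

Under this rule a term that matches nothing, such as `"no-key"`, yields an empty dict, which is consistent with `media1.set.assert_called_once_with("metadata", {})` in `test_enrich_no_metadata_selection`.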
--- a/tests/enrichers/test_pdq_hash_enricher.py
+++ b/tests/enrichers/test_pdq_hash_enricher.py
@@ -43,7 +43,7 @@ def test_enrich_skip_non_image(metadata_with_images, mocker):
 def test_enrich_handles_corrupted_image(metadata_with_images, mocker):
    mocker.patch("PIL.Image.open", side_effect=UnidentifiedImageError("Corrupted image"))
    mock_pdq = mocker.patch("pdqhash.compute")
-    mock_logger = mocker.patch("loguru.logger.error")
+    mock_logger = mocker.patch("auto_archiver.utils.custom_logger.logger.error")
    enricher = PdqHashEnricher()
    enricher.enrich(metadata_with_images)

--- a/tests/enrichers/test_thumbnail_enricher.py
+++ b/tests/enrichers/test_thumbnail_enricher.py
@@ -75,12 +75,12 @@ def test_enrich_thumbnail_limits(
 def test_enrich_handles_probe_failure(thumbnail_enricher, metadata_with_video, mocker):
    mocker.patch("ffmpeg.probe", side_effect=Exception("Probe error"))
    mocker.patch("os.makedirs")
-    mock_logger = mocker.patch("loguru.logger.warning")
+    mock_logger = mocker.patch("auto_archiver.utils.custom_logger.logger.warning")
    mocker.patch.object(Media, "is_video", return_value=True)

    thumbnail_enricher.enrich(metadata_with_video)
    # Ensure error was logged
-    mock_logger.assert_called_with("cannot generate thumbnails for video.mp4 without valid duration")
+    mock_logger.assert_called_with("Cannot generate thumbnails for video.mp4 without valid duration")
    # Ensure no thumbnails were created
    thumbnails = metadata_with_video.media[0].get("thumbnails")
    assert thumbnails is None
@@ -128,12 +128,12 @@ def test_enrich_handles_short_video(


 def test_uses_existing_duration_on_exception(thumbnail_enricher, metadata_with_video, mock_ffmpeg_environment, mocker):
-    mock_logger = mocker.patch("loguru.logger.warning")
+    mock_logger = mocker.patch("auto_archiver.utils.custom_logger.logger.warning")
    mock_probe = mocker.patch("ffmpeg.probe", side_effect=Exception("New probe error"))
    metadata_with_video.media[0].set("duration", 3)
    thumbnail_enricher.enrich(metadata_with_video)
    mock_probe.assert_called_once()
-    mock_logger.assert_called_with("failed to get duration with FFMPEG from video.mp4: New probe error")
+    mock_logger.assert_called_with("Failed to get duration with FFMPEG from video.mp4: New probe error")
    assert mock_ffmpeg_environment["mock_output"].run.call_count == 3


--- a/tests/enrichers/test_wacz_enricher.py
+++ b/tests/enrichers/test_wacz_enricher.py
@@ -46,7 +46,7 @@ def test_setup_with_docker(wacz_enricher, mocker):

 def test_already_ran(wacz_enricher, metadata, mocker):
    metadata.add_media(Media("test.wacz"), id="browsertrix")
-    mock_log = mocker.patch("loguru.logger.info")
+    mock_log = mocker.patch("auto_archiver.utils.custom_logger.logger.info")
    assert wacz_enricher.enrich(metadata) is True
    assert "WACZ enricher had already been executed" in mock_log.call_args[0][0]

@@ -73,7 +73,7 @@ def test_download_success(wacz_enricher, mocker) -> None:

 def test_enrich_already_executed(wacz_enricher, mocker) -> None:
    """Test enrich  if already executed."""
-    mock_log = mocker.patch("loguru.logger.info")
+    mock_log = mocker.patch("auto_archiver.utils.custom_logger.logger.info")
    metadata = Metadata().set_url("https://example.com")
    media = Media(filename="some_file.wacz")
    metadata.add_media(media, id="browsertrix")
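Several hunks above change the patch target from `loguru.logger.*` to `auto_archiver.utils.custom_logger.logger.*`. The general idea is to patch the logger object the code under test actually calls. The sketch below illustrates that pattern with only the standard library; the `logger` instance and `enrich_already_ran` are hypothetical stand-ins, not the project's code.

```python
import logging
from unittest import mock

# Stand-in for a project-level logger object (hypothetical name).
logger = logging.getLogger("custom_logger_sketch")


def enrich_already_ran() -> bool:
    """Stand-in for code under test that logs through the shared logger."""
    logger.info("WACZ enricher had already been executed")
    return True


# Patch the method on the exact object the code calls, then inspect the call.
with mock.patch.object(logger, "info") as mock_log:
    result = enrich_already_ran()
    logged_message = mock_log.call_args[0][0]
```

Checking a substring of `call_args[0][0]`, as the tests above do, keeps the assertion stable if the message gains extra detail later.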
--- a/tests/extractors/test_antibot_extractor_enricher.py
+++ b/tests/extractors/test_antibot_extractor_enricher.py
@@ -5,6 +5,9 @@ from auto_archiver.modules.antibot_extractor_enricher.antibot_extractor_enricher
 from .test_extractor_base import TestExtractorBase


+CI = os.getenv("GITHUB_ACTIONS", "") == "true"
+
+
 class DummySB:
    def __init__(self, url="", title="", visible_texts=None, visible_elements=None):
        self._url = url
@@ -50,15 +53,17 @@ class TestAntibotExtractorEnricher(TestExtractorBase):
    }

    @pytest.mark.download
+    @pytest.mark.flaky(reruns=2, reruns_delay=5)
    @pytest.mark.parametrize(
-        "url,in_title,in_text,image_count,video_count",
+        "url,in_title,in_text,image_count,video_count,skip_ci",
        [
            (
                "https://en.wikipedia.org/wiki/Western_barn_owl",
                "western barn owl",
                "Tyto alba",
-                5,
+                3,  # Reduced due to Wikipedia rate limiting (429 errors)
                0,
+                False,
            ),
            (
                "https://www.bellingcat.com/news/2025/04/29/open-sources-show-myanmar-junta-airstrike-damages-despite-post-earthquake-ceasefire/",
@@ -66,6 +71,7 @@ class TestAntibotExtractorEnricher(TestExtractorBase):
                "Bellingcat has geolocated",
                5,
                0,
+                False,
            ),
            (
                "https://www.bellingcat.com/news/2025/03/27/gaza-israel-palestine-shot-killed-injured-destroyed-dangerous-drone-journalists-in-gaza/",
@@ -73,6 +79,7 @@ class TestAntibotExtractorEnricher(TestExtractorBase):
                "continued the work of Gazan journalists",
                5,
                1,
+                False,
            ),
            (
                "https://www.bellingcat.com/about/general-information",
@@ -80,6 +87,7 @@ class TestAntibotExtractorEnricher(TestExtractorBase):
                "Stichting Bellingcat",
                0,  # SVGs are ignored
                0,
+                False,
            ),
            (
                "https://vk.com/wikipedia?from=search&w=wall-36156673_20451",
@@ -87,13 +95,27 @@ class TestAntibotExtractorEnricher(TestExtractorBase):
                "16 сентября 1985 года лейблом EMI Records.",
                5,
                0,
+                False,
+            ),
+            (
+                "https://www.tiktok.com/@tracy_2424/photo/7418200173953830162",
+                "TikTok",
+                "Dito ko lang",
+                1,
+                0,
+                True,
            ),
        ],
    )
-    def test_download_pages_with_media(self, setup_module, make_item, url, in_title, in_text, image_count, video_count):
+    def test_download_pages_with_media(
+        self, setup_module, make_item, url, in_title, in_text, image_count, video_count, skip_ci
+    ):
        """
        Test downloading pages with media.
        """
+        if CI and skip_ci:
+            pytest.skip("Skipping test in CI environment")
+
        self.extractor = setup_module(
            self.extractor_module,
            self.config
@@ -107,6 +129,7 @@ class TestAntibotExtractorEnricher(TestExtractorBase):
        item = make_item(url)
        result = self.extractor.download(item)

+        assert result, f"download() returned {result!r} — Selenium may have failed (e.g., window close timeout)"
        assert result.status == "antibot", "Expected status to be 'antibot'"

        # Check title contains all required words (case-insensitive)
@@ -121,9 +144,9 @@ class TestAntibotExtractorEnricher(TestExtractorBase):
            )

        image_media = [m for m in result.media if m.is_image() and not m.get("id") == "screenshot"]
-        assert len(image_media) == image_count, f"Expected {image_count} image items, got {len(image_media)}"
+        assert len(image_media) >= image_count, f"Expected at least {image_count} image items, got {len(image_media)}"
        video_media = [m for m in result.media if m.is_video()]
-        assert len(video_media) == video_count, f"Expected {video_count} video items, got {len(video_media)}"
+        assert len(video_media) >= video_count, f"Expected at least {video_count} video items, got {len(video_media)}"

        for expected_id in ["screenshot", "pdf", "html_source_code"]:
            assert any(m.get("id") == expected_id for m in result.media), (
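The per-case `skip_ci` flag above combines an environment check with a parametrized boolean. Below is a small sketch of that decision as a pure function, which makes it testable without pytest. `running_in_ci` and `should_skip` are hypothetical helper names; only the `GITHUB_ACTIONS` check is taken from the diff.

```python
import os


def running_in_ci(environ=None) -> bool:
    """Return True when GITHUB_ACTIONS is set to 'true', mirroring the CI constant above."""
    env = os.environ if environ is None else environ
    return env.get("GITHUB_ACTIONS", "") == "true"


def should_skip(skip_ci: bool, environ=None) -> bool:
    """Skip only cases flagged skip_ci, and only when running in CI."""
    return running_in_ci(environ) and skip_ci
```

Inside a test, the result would feed `pytest.skip(...)` exactly as the hunk does with `if CI and skip_ci`.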
--- a/tests/extractors/test_generic_extractor.py
+++ b/tests/extractors/test_generic_extractor.py
@@ -48,8 +48,6 @@ class TestGenericExtractor(TestExtractorBase):
            ("https://www.youtube.com/watch?v=5qap5aO4i9A", ["youtube"]),
            ("https://www.tiktok.com/@funnycats0ftiktok/video/7345101300750748970?lang=en", ["tiktok"]),
            ("https://www.instagram.com/p/CU1J9JYJ9Zz/", ["instagram"]),
-            ("https://www.facebook.com/nytimes/videos/10160796550110716", ["facebook"]),
-            ("https://www.facebook.com/BylineFest/photos/t.100057299682816/927879487315946/", ["facebook"]),
        ],
    )
    def test_suitable_extractors(self, url, suitable_extractors):
@@ -148,6 +146,7 @@ class TestGenericExtractor(TestExtractorBase):
    def test_bluesky_download_video(self, make_item):
        item = make_item("https://bsky.app/profile/bellingcat.com/post/3le2l4gsxlk2i")
        result = self.extractor.download(item)
        assert result is not False
+        assert result.get_url() == "https://bsky.app/profile/bellingcat.com/post/3le2l4gsxlk2i"

    @pytest.mark.skipif(not TEST_TRUTH_SOCIAL, reason="Truth social download tests disabled in environment variables.")
--- a/tests/extractors/test_instagram_api_extractor.py
+++ b/tests/extractors/test_instagram_api_extractor.py
@@ -1,4 +1,5 @@
 from datetime import datetime
+import math

 import pytest

@@ -147,14 +148,14 @@ class TestInstagramAPIExtractor(TestExtractorBase):

        self.extractor.full_profile = True
        mock_call.side_effect = [mock_user_response, mock_story_response]
-        mock_highlights.return_value = None
+        mock_highlights.return_value = 1
        mock_stories.return_value = mock_story_response
-        mock_posts.return_value = None
-        mock_tagged.return_value = None
+        mock_posts.return_value = 2
+        mock_tagged.return_value = 3

        result = self.extractor.download_profile(metadata, "test_user")
        assert result.get("#stories") == len(mock_story_response)
-        mock_posts.assert_called_once_with(result, "123")
+        mock_posts.assert_called_once_with(result, "123", max_to_download=math.inf)
        assert "errors" not in result.metadata

    def test_download_profile_not_found(self, metadata, mocker):
@@ -175,10 +176,10 @@ class TestInstagramAPIExtractor(TestExtractorBase):

        self.extractor.full_profile = True
        mock_call.side_effect = [mock_user_response, Exception("Stories API failed"), Exception("Posts API failed")]
-        mock_highlights.return_value = None
-        mock_tagged.return_value = None
+        mock_highlights.return_value = 1
+        mock_tagged.return_value = 2
        stories_tagged.return_value = None
-        mock_posts.return_value = None
+        mock_posts.return_value = 4
        result = self.extractor.download_profile(metadata, "test_user")

        assert result.is_success()
--- a/Show More
+++ b/Show More
@@ -0,0 +1 @@
+from .ghostarchive_enricher import GhostarchiveEnricher