Compare commits

11 Commits

Author SHA1 Message Date
dependabot[bot]
d8d7c2335f Bump the python group with 14 updates
Bumps the python group with 14 updates:

| Package | From | To |
| --- | --- | --- |
| [pytest-loguru](https://github.com/mcarans/pytest-loguru) | `0.4.0` | `0.4.1` |
| [ruff](https://github.com/astral-sh/ruff) | `0.15.11` | `0.15.20` |
| [sphinxcontrib-mermaid](https://github.com/mgaitan/sphinxcontrib-mermaid) | `1.2.3` | `2.0.2` |
| [telethon](https://telethon.dev) | `1.43.2` | `1.44.0` |
| [google-api-python-client](https://github.com/googleapis/google-api-python-client) | `2.194.0` | `2.198.0` |
| [google-auth-httplib2](https://github.com/googleapis/google-cloud-python) | `0.3.1` | `0.4.0` |
| [google-auth-oauthlib](https://github.com/googleapis/google-cloud-python) | `1.3.1` | `1.4.0` |
| [pillow](https://github.com/python-pillow/Pillow) | `12.2.0` | `12.3.0` |
| [dateparser](https://github.com/scrapinghub/dateparser) | `1.4.0` | `1.4.1` |
| [tqdm](https://github.com/tqdm/tqdm) | `4.67.3` | `4.68.3` |
| [boto3](https://github.com/boto/boto3) | `1.42.94` | `1.43.39` |
| [rich-argparse](https://github.com/hamdanal/rich-argparse) | `1.7.2` | `1.8.0` |
| [cryptography](https://github.com/pyca/cryptography) | `46.0.7` | `49.0.0` |
| [yt-dlp](https://github.com/yt-dlp/yt-dlp) | `2026.3.17` | `2026.6.9` |


Updates `pytest-loguru` from 0.4.0 to 0.4.1
- [Release notes](https://github.com/mcarans/pytest-loguru/releases)
- [Commits](https://github.com/mcarans/pytest-loguru/compare/0.4.0...0.4.1)

Updates `ruff` from 0.15.11 to 0.15.20
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.15.11...0.15.20)

Updates `sphinxcontrib-mermaid` from 1.2.3 to 2.0.2
- [Changelog](https://github.com/mgaitan/sphinxcontrib-mermaid/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mgaitan/sphinxcontrib-mermaid/compare/1.2.3...2.0.2)

Updates `telethon` from 1.43.2 to 1.44.0

Updates `google-api-python-client` from 2.194.0 to 2.198.0
- [Release notes](https://github.com/googleapis/google-api-python-client/releases)
- [Commits](https://github.com/googleapis/google-api-python-client/compare/v2.194.0...v2.198.0)

Updates `google-auth-httplib2` from 0.3.1 to 0.4.0
- [Release notes](https://github.com/googleapis/google-cloud-python/releases)
- [Changelog](https://github.com/googleapis/google-cloud-python/blob/main/CHANGELOG.md)
- [Commits](https://github.com/googleapis/google-cloud-python/compare/google-auth-httplib2-v0.3.1...google-auth-httplib2-v0.4.0)

Updates `google-auth-oauthlib` from 1.3.1 to 1.4.0
- [Release notes](https://github.com/googleapis/google-cloud-python/releases)
- [Changelog](https://github.com/googleapis/google-cloud-python/blob/main/packages/gcp-sphinx-docfx-yaml/CHANGELOG.md)
- [Commits](https://github.com/googleapis/google-cloud-python/compare/google-auth-oauthlib-v1.3.1...google-auth-oauthlib-v1.4.0)

Updates `pillow` from 12.2.0 to 12.3.0
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/12.2.0...12.3.0)

Updates `dateparser` from 1.4.0 to 1.4.1
- [Release notes](https://github.com/scrapinghub/dateparser/releases)
- [Changelog](https://github.com/scrapinghub/dateparser/blob/master/HISTORY.rst)
- [Commits](https://github.com/scrapinghub/dateparser/compare/v1.4.0...v1.4.1)

Updates `tqdm` from 4.67.3 to 4.68.3
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](https://github.com/tqdm/tqdm/compare/v4.67.3...v4.68.3)

Updates `boto3` from 1.42.94 to 1.43.39
- [Release notes](https://github.com/boto/boto3/releases)
- [Commits](https://github.com/boto/boto3/compare/1.42.94...1.43.39)

Updates `rich-argparse` from 1.7.2 to 1.8.0
- [Release notes](https://github.com/hamdanal/rich-argparse/releases)
- [Changelog](https://github.com/hamdanal/rich-argparse/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hamdanal/rich-argparse/compare/v1.7.2...v1.8.0)

Updates `cryptography` from 46.0.7 to 49.0.0
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/46.0.7...49.0.0)

Updates `yt-dlp` from 2026.3.17 to 2026.6.9
- [Release notes](https://github.com/yt-dlp/yt-dlp/releases)
- [Changelog](https://github.com/yt-dlp/yt-dlp/blob/master/Changelog.md)
- [Commits](https://github.com/yt-dlp/yt-dlp/compare/2026.03.17...2026.06.09)

---
updated-dependencies:
- dependency-name: pytest-loguru
  dependency-version: 0.4.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: python
- dependency-name: ruff
  dependency-version: 0.15.20
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: python
- dependency-name: sphinxcontrib-mermaid
  dependency-version: 2.0.2
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: python
- dependency-name: telethon
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: google-api-python-client
  dependency-version: 2.198.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: google-auth-httplib2
  dependency-version: 0.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: google-auth-oauthlib
  dependency-version: 1.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: pillow
  dependency-version: 12.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: dateparser
  dependency-version: 1.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python
- dependency-name: tqdm
  dependency-version: 4.68.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: boto3
  dependency-version: 1.43.39
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: rich-argparse
  dependency-version: 1.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: cryptography
  dependency-version: 49.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python
- dependency-name: yt-dlp
  dependency-version: 2026.6.9
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-07-01 20:25:12 +00:00
Miguel Sozinho Ramalho
afbe4fac50 Merge pull request #430 from bellingcat/dev
bug fixes and maintenance
2026-04-27 15:52:39 +01:00
msramalho
e633be1721 version bump 2026-04-27 12:35:54 +01:00
msramalho
bc06de8e5c fixes incomplete yt-dlp parts download 2026-04-27 12:34:47 +01:00
Miguel Sozinho Ramalho
20fddce3a3 Merge pull request #427 from PeterUpfold/deno-container
Fix missing JS runtime config for bguils_po_token_method
2026-04-24 11:08:28 +01:00
msramalho
6efa439cdb dependencies bump 2026-04-23 17:20:54 +01:00
Miguel Sozinho Ramalho
ef77d1fc86 Merge branch 'main' into dev 2026-04-23 14:21:01 +01:00
msramalho
a57a5ee005 adds an extra check when calling pypi as it's led to uncaught ssl errors 2026-04-23 14:20:07 +01:00
msramalho
2582f567ac removes curl/unzip from dockerfile 2026-04-23 14:04:46 +01:00
msramalho
4e5c1a6218 suggested alternative change to deno install 2026-04-23 14:02:51 +01:00
Peter Upfold
12d9c469b2 Add Deno to Dockerfile 2026-04-13 18:19:23 +01:00
10 changed files with 444 additions and 342 deletions

@@ -4,8 +4,7 @@ ENV RUNNING_IN_DOCKER=1 \
     LANG=C.UTF-8 \
     LC_ALL=C.UTF-8 \
     PYTHONDONTWRITEBYTECODE=1 \
-    PYTHONFAULTHANDLER=1 \
-    PATH="/root/.local/bin:$PATH"
+    PYTHONFAULTHANDLER=1
 ARG TARGETARCH

poetry.lock (generated, 643 changes): diff suppressed because it is too large.

@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
 [project]
 name = "auto-archiver"
-version = "1.2.6"
+version = "1.2.7"
 description = "Automatically archive links to videos, images, and social media content from Google Sheets (and more)."
 requires-python = ">=3.10,<3.13"
@@ -27,20 +27,20 @@ dependencies = [
     "bs4 (>=0.0.0)",
     "loguru (>=0.0.0)",
     "ffmpeg-python (>=0.0.0)",
-    "telethon (>=0.0.0)",
-    "google-api-python-client (>=0.0.0)",
-    "google-auth-httplib2 (>=0.0.0)",
-    "google-auth-oauthlib (>=0.0.0)",
+    "telethon (>=1.44.0)",
+    "google-api-python-client (>=2.198.0)",
+    "google-auth-httplib2 (>=0.4.0)",
+    "google-auth-oauthlib (>=1.4.0)",
     "oauth2client (>=0.0.0)",
     "pdqhash (>=0.0.0)",
-    "pillow (>=0.0.0)",
+    "pillow (>=12.3.0)",
     "python-slugify (>=0.0.0)",
-    "dateparser (>=0.0.0)",
+    "dateparser (>=1.4.1)",
     "python-twitter-v2 (>=0.0.0)",
     "instaloader (>=0.0.0)",
-    "tqdm (>=0.0.0)",
+    "tqdm (>=4.68.3)",
     "jinja2 (>=0.0.0)",
-    "boto3 (>=1.28.0,<2.0.0)",
+    "boto3 (>=1.43.39,<2.0.0)",
     "dataclasses-json (>=0.0.0)",
     "numpy (==2.1.3)",
     "requests[socks] (>=0.0.0)",
@@ -48,13 +48,13 @@ dependencies = [
     "jsonlines (>=0.0.0)",
     "pysubs2 (>=0.0.0)",
     "retrying (>=0.0.0)",
-    "rich-argparse (>=1.6.0,<2.0.0)",
+    "rich-argparse (>=1.8.0,<2.0.0)",
     "ruamel-yaml (>=0.18.10,<0.19.0)",
     "rfc3161-client (>=1.0.5)",
-    "cryptography (>=46.0.3)",
+    "cryptography (>=49.0.0)",
     "opentimestamps (>=0.4.5,<0.5.0)",
     "bgutil-ytdlp-pot-provider (>=1.0.0)",
-    "yt-dlp[curl-cffi,default] (>=2025.5.22)",
+    "yt-dlp[curl-cffi,default] (>=2026.6.9)",
     "secretstorage (>=3.3.3,<4.0.0)",
     "seleniumbase (>=4.36.4,<5.0.0)",
     "pyautogui (>=0.9.54,<0.10.0)",
@@ -64,15 +64,15 @@ dependencies = [
 [tool.poetry.group.dev.dependencies]
 pytest = "^8.3.4"
 autopep8 = "^2.3.1"
-pytest-loguru = "^0.4.0"
+pytest-loguru = "^0.4.1"
 pytest-mock = "^3.14.0"
-ruff = "^0.15.2"
+ruff = "^0.15.20"
 pre-commit = "^4.1.0"

 [tool.poetry.group.docs.dependencies]
 sphinx = "^8.1.3"
 sphinx-autoapi = "^3.4.0"
-sphinxcontrib-mermaid = "^1.0.0"
+sphinxcontrib-mermaid = "^2.0.2"
 sphinx-autobuild = "^2024.10.3"
 sphinx-copybutton = "^0.5.2"
 myst-parser = "^4.0.0"

@@ -11,6 +11,7 @@ Key Functionalities:
 from __future__ import annotations
 import hashlib
+import os
 from typing import Any, List, Union, Dict
 from dataclasses import dataclass, field
 from dataclasses_json import dataclass_json
@@ -186,6 +187,9 @@ class Metadata:
                 continue
             h = m.get("hash")
             if not h:
+                if not os.path.exists(m.filename):
+                    logger.warning(f"Skipping missing media file: {m.filename}")
+                    continue
                 h = calculate_hash_in_chunks(hashlib.sha256(), int(1.6e7), m.filename)
             if len(h) and h in media_hashes:
                 continue
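
The guard added in this hunk can be illustrated outside the project. What follows is a minimal standalone sketch, not the project's `Metadata` implementation: `sha256_of_file` and `dedupe_existing_files` are hypothetical helper names, and plain file paths stand in for media objects. It assumes, as the hunk suggests, that a file missing from disk should be skipped rather than hashed.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=16_000_000):
    """Hash a file in chunks so large media never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_existing_files(paths):
    """Keep the first path per content hash; drop paths that do not exist.

    Hypothetical helper: the real code works on media objects and logs a
    warning for each missing file.
    """
    seen_hashes = set()
    kept = []
    for path in paths:
        if not os.path.exists(path):
            # Missing file: skip it instead of raising FileNotFoundError.
            continue
        file_hash = sha256_of_file(path)
        if file_hash in seen_hashes:
            continue
        seen_hashes.add(file_hash)
        kept.append(path)
    return kept
```

Checking for existence before hashing means one vanished file no longer aborts deduplication for the rest of the list.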

@@ -467,7 +467,11 @@ Here's how that would look: \n\nsteps:\n extractors:\n - [your_extractor_name_
         return self.setup_complete_parser(basic_config, yaml_config, unused_args)

     def check_for_updates(self):
-        response = requests.get("https://pypi.org/pypi/auto-archiver/json").json()
+        try:
+            response = requests.get("https://pypi.org/pypi/auto-archiver/json", timeout=10).json()
+        except Exception as e:
+            logger.debug(f"Unable to check for updates: {e}")
+            return
         latest_version = version.parse(response["info"]["version"])
         current_version = version.parse(__version__)
         # check version compared to current version
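
The pattern in this hunk, treating the update check as best-effort, can be sketched without a network dependency. This is an illustrative sketch under stated assumptions: `newer_version_available` and `parse_simple_version` are hypothetical names, the `fetch_json` callable stands in for the `requests.get(...).json()` call, and the tuple comparison is a simplification of `version.parse` that only handles plain `X.Y.Z` strings.

```python
def parse_simple_version(text):
    """Parse 'X.Y.Z' into a tuple of ints (simplified stand-in for version.parse)."""
    return tuple(int(part) for part in text.split("."))


def newer_version_available(current, fetch_json):
    """Return True/False when the check succeeds, None when the fetch fails.

    `fetch_json` is a hypothetical zero-argument callable returning the
    parsed PyPI JSON payload; any exception it raises is swallowed, which
    mirrors the broad `except Exception` in the hunk.
    """
    try:
        payload = fetch_json()
    except Exception:
        # Network or SSL trouble must not stop the caller from starting up.
        return None
    latest = payload["info"]["version"]
    return parse_simple_version(latest) > parse_simple_version(current)
```

As in the hunk, only the fetch is guarded; a payload without the expected `info.version` key would still raise in this sketch.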

@@ -575,6 +575,8 @@ class GenericExtractor(Extractor):
             "--live-from-start" if self.live_from_start else "--no-live-from-start",
             "--postprocessor-args",
             "ffmpeg:-bitexact",  # ensure bitexact output to avoid mismatching hashes for same video
+            "--js-runtimes",
+            "node",  # yt-dlp defaults to deno-only; node is available in the base image
         ]

         # proxy handling

@@ -120,6 +120,9 @@ def ydl_entry_to_filename(ydl, entry: dict) -> str:
     directory = os.path.dirname(base_filename)  # '/get/path/to'
     basename = os.path.basename(base_filename)  # 'file'
     for f in os.listdir(directory):
+        # skip incomplete downloads left behind by yt-dlp
+        if f.endswith(".part"):
+            continue
         if (
             f.startswith(basename)
             or (entry_url and os.path.splitext(f)[0] in entry_url)
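
The `.part` filter can be shown in isolation. The sketch below is a simplified illustration, not the project's `ydl_entry_to_filename`: `find_downloaded_file` is a hypothetical name, it matches on the basename prefix only (the URL-based match in the hunk is omitted), and it returns `False` when nothing matches, which is the behaviour the accompanying tests expect from the real function.

```python
import os


def find_downloaded_file(directory, basename):
    """Return the first complete file whose name starts with `basename`.

    Files ending in '.part' are treated as incomplete downloads and are
    never returned. Returns False when no complete file matches.
    """
    for name in sorted(os.listdir(directory)):
        if name.endswith(".part"):
            # Incomplete download fragment: ignore it.
            continue
        if name.startswith(basename):
            return os.path.join(directory, name)
    return False
```

Without the filter, a leftover fragment such as `video.f399.mp4.part` also starts with the expected basename, so it could be picked up in place of the finished download.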

@@ -86,6 +86,22 @@ def test_media_management(basic_metadata, media_file):
     assert basic_metadata.get_media_by_id("m1") == media1

+def test_remove_duplicate_skips_missing_files(basic_metadata, media_file, tmp_path):
+    """Missing files should be dropped instead of crashing with FileNotFoundError."""
+    real_file = tmp_path / "exists.txt"
+    real_file.write_text("content")
+    valid = media_file(filename=str(real_file), hash_value="abc")
+    missing = media_file(filename="/nonexistent/path/gone.mp4")
+    basic_metadata.add_media(valid, "valid")
+    basic_metadata.add_media(missing, "missing")
+    assert len(basic_metadata.media) == 2
+    basic_metadata.remove_duplicate_media_by_hash()
+    assert len(basic_metadata.media) == 1
+    assert basic_metadata.get_media_by_id("valid") == valid

 def test_success():
     m = Metadata()
     assert not m.is_success()

@@ -1,5 +1,6 @@
 import pytest
 from argparse import ArgumentParser, ArgumentTypeError
+from requests.exceptions import SSLError
 from auto_archiver.core.orchestrator import ArchivingOrchestrator
 from auto_archiver.version import __version__
 from auto_archiver.core.config import read_yaml, store_yaml
@@ -256,3 +257,34 @@ def test_load_failed_extractor_cleanup(test_args, mocker, caplog):
     assert "Error during setup of modules: Test exception" in caplog.text
     # make sure the 'cleanup' is called
     assert "cleanup" in caplog.text

+def test_check_for_updates_ssl_error(orchestrator, mocker):
+    """check_for_updates should not raise when the HTTP request fails."""
+    mocker.patch(
+        "auto_archiver.core.orchestrator.requests.get",
+        side_effect=SSLError("SSL handshake failed"),
+    )
+    # should not raise
+    orchestrator.check_for_updates()

+def test_check_for_updates_timeout(orchestrator, mocker):
+    """check_for_updates should not raise on connection timeout."""
+    from requests.exceptions import ConnectionError
+    mocker.patch(
+        "auto_archiver.core.orchestrator.requests.get",
+        side_effect=ConnectionError("Connection refused"),
+    )
+    orchestrator.check_for_updates()

+def test_check_for_updates_new_version_available(orchestrator, mocker):
+    """check_for_updates should not raise when a newer version exists."""
+    mocker.patch(
+        "auto_archiver.core.orchestrator.requests.get",
+        return_value=mocker.Mock(json=lambda: {"info": {"version": "99.0.0"}}),
+    )
+    # should complete without error
+    orchestrator.check_for_updates()

@@ -14,6 +14,7 @@ from auto_archiver.utils.misc import (
     calculate_file_hash,
     random_str,
     get_timestamp,
+    ydl_entry_to_filename,
 )
@@ -139,3 +140,47 @@ class TestMiscUtils:
     def test_invalid_timestamp_returns_none(self):
         assert get_timestamp("invalid-date") is None

+class TestYdlEntryToFilename:
+    """Tests for ydl_entry_to_filename, especially .part file filtering."""

+    def _make_mock_ydl(self, prepared_filename):
+        class MockYDL:
+            def prepare_filename(self, entry):
+                return prepared_filename

+        return MockYDL()

+    def test_returns_exact_file_if_exists(self, tmp_path):
+        video = tmp_path / "video.mp4"
+        video.write_bytes(b"data")
+        ydl = self._make_mock_ydl(str(video))
+        assert ydl_entry_to_filename(ydl, {}) == str(video)

+    def test_skips_part_file_returns_complete(self, tmp_path):
+        """Simulates yt-dlp leaving a .part file from a failed format
+        while a complete .webm exists."""
+        (tmp_path / "f5U3IKfoSYs.f399.mp4.part").write_bytes(b"incomplete")
+        webm = tmp_path / "f5U3IKfoSYs.webm"
+        webm.write_bytes(b"complete video")
+        # ydl.prepare_filename returns the expected .mp4 which doesn't exist
+        ydl = self._make_mock_ydl(str(tmp_path / "f5U3IKfoSYs.mp4"))
+        result = ydl_entry_to_filename(ydl, {})
+        assert result == str(webm)
+        assert not result.endswith(".part")

+    def test_skips_part_file_returns_false_if_no_other_match(self, tmp_path):
+        """Only a .part file exists — should return False."""
+        (tmp_path / "video.f399.mp4.part").write_bytes(b"incomplete")
+        ydl = self._make_mock_ydl(str(tmp_path / "video.mp4"))
+        assert ydl_entry_to_filename(ydl, {}) is False

+    def test_returns_false_when_no_files_match(self, tmp_path):
+        (tmp_path / "unrelated.txt").write_bytes(b"data")
+        ydl = self._make_mock_ydl(str(tmp_path / "video.mp4"))
+        assert ydl_entry_to_filename(ydl, {}) is False