Patrick Robertson
4f2b9baa73
refactor youtubedlp archiver to work for all valid websites
1. Extract more metadata
2. Improve thumbnail extraction
3. Setup framework for specific sites to provide more granular metadata processing
2025-01-15 17:46:47 +01:00
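The commit does not show how site-specific processing is wired up. Below is a minimal sketch of one possible design: a registry keyed by yt-dlp's `extractor_key`, plus thumbnail selection from the info dict's `thumbnails` list. The names `SITE_HANDLERS`, `register_site`, `best_thumbnail` and `extract_metadata`, and the choice of fields, are hypothetical and not the archiver's actual code.

```python
from typing import Callable, Optional

# Hypothetical registry: yt-dlp extractor key (e.g. "Youtube") -> function that
# adds site-specific fields on top of the generic metadata.
SITE_HANDLERS: dict[str, Callable[[dict, dict], None]] = {}


def register_site(extractor_key: str):
    """Decorator registering a site-specific metadata handler (hypothetical)."""
    def decorator(func: Callable[[dict, dict], None]):
        SITE_HANDLERS[extractor_key] = func
        return func
    return decorator


def best_thumbnail(info: dict) -> Optional[str]:
    """Pick the best thumbnail from a yt-dlp info dict, falling back to 'thumbnail'."""
    thumbnails = [t for t in info.get("thumbnails") or [] if t.get("url")]
    if thumbnails:
        # Prefer the highest explicit preference, then the largest pixel area.
        best = max(
            thumbnails,
            key=lambda t: (t.get("preference") or 0,
                           (t.get("width") or 0) * (t.get("height") or 0)),
        )
        return best["url"]
    return info.get("thumbnail")


def extract_metadata(info: dict) -> dict:
    """Build generic metadata, then let a registered site handler refine it."""
    metadata = {
        "title": info.get("title"),
        "uploader": info.get("uploader"),
        "timestamp": info.get("timestamp"),
        "thumbnail": best_thumbnail(info),
    }
    handler = SITE_HANDLERS.get(info.get("extractor_key", ""))
    if handler:
        handler(info, metadata)
    # Drop fields the site did not provide.
    return {k: v for k, v in metadata.items() if v is not None}


@register_site("Youtube")
def youtube_metadata(info: dict, metadata: dict) -> None:
    # Illustrative site-specific fields; the real selection is not specified.
    metadata["view_count"] = info.get("view_count")
    metadata["channel_id"] = info.get("channel_id")
```

Unknown sites fall through to the generic fields only, which matches the goal of working for "all valid websites" while letting individual sites opt in to richer metadata.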
Patrick Robertson
c3dd19f309
Sniff filetype of downloaded media and add extension
Also download in chunks (fixes two TODOs)
2025-01-15 17:46:47 +01:00
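A minimal sketch of sniffing a file type from its leading bytes while writing a download in chunks. The magic-byte table and the function names are assumptions for illustration; the commit may well rely on a library rather than a hand-written table.

```python
import os
from typing import Iterable, Optional

# Hypothetical magic-byte table covering a few common media types.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"%PDF-", ".pdf"),
]


def sniff_extension(header: bytes) -> Optional[str]:
    """Guess a file extension from the first bytes of a file."""
    for signature, ext in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return ext
    # ISO base media files carry 'ftyp' at offset 4 (simplified: always .mp4).
    if header[4:8] == b"ftyp":
        return ".mp4"
    # WebP: 'RIFF' + 4-byte size + 'WEBP'.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return ".webp"
    return None


def save_with_sniffed_extension(chunks: Iterable[bytes], dest_without_ext: str) -> str:
    """Write chunks to a temporary file, then rename it with the sniffed extension."""
    tmp_path = dest_without_ext + ".part"
    header = b""
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            if not chunk:
                continue
            # Keep only the first 16 bytes for sniffing; never buffer the whole file.
            if len(header) < 16:
                header += chunk[: 16 - len(header)]
            f.write(chunk)
    final_path = dest_without_ext + (sniff_extension(header) or "")
    os.replace(tmp_path, final_path)
    return final_path
```

With `requests`, the chunks could come from `requests.get(url, stream=True).iter_content(chunk_size=8192)`, so the response body is never held in memory at once.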
Patrick Robertson
eebd040e13
Merge pull request #163 from bellingcat/feat/unittest
CI Unit tests
2025-01-14 17:26:34 +01:00
Patrick Robertson
6f10270baf
Remove unittest and switch to pytest fully
2025-01-14 16:28:39 +01:00
Patrick Robertson
cef4037ad5
Add documentation on running tests to the readme
2025-01-14 11:30:06 +01:00
Patrick Robertson
1b1af2f0b1
Revert change to twitter_archiver
As per discussion at: https://github.com/bellingcat/auto-archiver/pull/165#discussion_r1905930837
2025-01-14 10:30:41 +01:00
Patrick Robertson
8f17a235f3
Switch to ubuntu-22.04 for CI tests
An issue with oscrypto means it currently does not work on 24.04. Ref: https://github.com/wbond/oscrypto/issues/78#issuecomment-2565688091
2025-01-14 10:24:14 +01:00
Patrick Robertson
ab2eb3c7f5
Add dev dependencies to poetry
2025-01-13 20:42:08 +01:00
Patrick Robertson
bdfedfcf61
Merge branch 'main' into feat/unittest
2025-01-13 19:50:47 +01:00
Erin Clark
9cdaea873b
Merge pull request #164 from bellingcat/ec_add_poetry
Migrate to Poetry
2025-01-13 18:49:15 +00:00
erinhmclark
84ee1b422f
Update and restrict versions of Poetry and Python.
2025-01-13 17:42:51 +00:00
Patrick Robertson
b9aea99de8
Prettify pytest output
2025-01-13 18:41:24 +01:00
Patrick Robertson
52f064908e
Add unit test badges to readme
2025-01-13 18:33:22 +01:00
Patrick Robertson
9b596e59d6
Run expensive download tests once per week, on a Monday at 2:35pm
(time is offset from the hour to alleviate high load on GitHub)
2025-01-13 18:33:02 +01:00
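To make the schedule concrete, here is a small sketch computing the next run time, assuming the workflow uses a cron expression like `35 14 * * 1` (Mondays at 14:35). The exact expression is an assumption; GitHub Actions evaluates scheduled cron expressions in UTC.

```python
from datetime import datetime, timedelta

# Assumed schedule: weekly on Monday (weekday 0 in Python) at 14:35 UTC.
RUN_WEEKDAY = 0
RUN_HOUR, RUN_MINUTE = 14, 35


def next_weekly_run(now: datetime) -> datetime:
    """Return the first scheduled run strictly after `now` (naive UTC datetimes)."""
    candidate = now.replace(hour=RUN_HOUR, minute=RUN_MINUTE, second=0, microsecond=0)
    # Move forward to the target weekday within the current week window.
    candidate += timedelta(days=(RUN_WEEKDAY - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

Picking minute 35 rather than 0 follows the commit's reasoning: many scheduled workflows fire on the hour, so an offset minute is less likely to be delayed.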
Patrick Robertson
528b78db85
Flag tombstone tweets for twitter_syndication method
2025-01-13 18:17:24 +01:00
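A minimal sketch of flagging tombstones, assuming the syndication endpoint marks removed or restricted tweets with `"__typename": "TweetTombstone"` (the check Vercel's react-tweet performs). The function name is hypothetical, and stdlib `logging` stands in for whatever logger the archiver actually uses.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def check_syndication_response(tweet_id: str, data: Optional[dict]) -> bool:
    """Return True if the response holds a usable tweet; log and flag otherwise."""
    if not data:
        logger.warning("Empty syndication response for tweet %s", tweet_id)
        return False
    if data.get("__typename") == "TweetTombstone":
        # Tombstones stand in for deleted, withheld or age-restricted tweets.
        logger.warning("Tweet %s is a tombstone and has no archivable content", tweet_id)
        return False
    return True
```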
Patrick Robertson
57eacdc24a
Merge branch 'main' into feat/unittest
2025-01-13 18:06:55 +01:00
Patrick Robertson
bbef80de4c
Add unit tests for html_formatter, csv_db
2025-01-13 17:58:10 +01:00
Patrick Robertson
930d78096a
Merge pull request #162 from bellingcat/small_issues
Fix two small issues
2025-01-13 16:39:59 +01:00
Patrick Robertson
2353f9d6a5
Separate CI for download tests and core tests
2025-01-13 16:27:46 +01:00
Patrick Robertson
63973e2ce7
switch to pytest and pytest-recording
2025-01-13 16:23:20 +01:00
erinhmclark
e9a7f435a3
Add package dist directory to .gitignore
2025-01-13 13:33:23 +00:00
Patrick Robertson
e2bc84ccb9
Merge branch 'main' into feat/unittest
2025-01-13 13:15:13 +01:00
erinhmclark
72a8e76fbb
Update README.md for usage with Poetry.
2025-01-12 20:21:23 +00:00
erinhmclark
c69a5fa1c9
Refactor Dockerfile for multi-stage builds.
...
Combining environment and runtime stages due to Poetry's dependency on source code.
2025-01-12 12:38:12 +00:00
erinhmclark
d80b4b7557
Remove snscrape and Python 3.12 restriction.
2025-01-12 12:15:56 +00:00
erinhmclark
cc490f9c10
Updated Dockerfile (not optimised yet)
2025-01-12 12:15:56 +00:00
erinhmclark
08e83eb94e
Update pyproject.toml configuration for Poetry version 2.0.0.
2025-01-12 12:15:56 +00:00
erinhmclark
dd822b8b44
Update poetry.lock
2025-01-12 12:15:56 +00:00
erinhmclark
4a63ca7753
Update PyPi workflow to read python version from pyproject.toml.
2025-01-12 12:15:56 +00:00
erinhmclark
6d5b0090d9
Pull version from pyproject.toml file.
2025-01-12 12:15:56 +00:00
erinhmclark
26abd6f7ae
Added TODO comment for adding a version restriction.
2025-01-12 12:15:56 +00:00
erinhmclark
dba8f46016
Replaced comments for python-publish.yaml workflow.
2025-01-12 12:15:56 +00:00
erinhmclark
50e8c93477
Updated python-publish.yaml workflow to use Poetry (untested) and cleaned up pipenv files.
2025-01-12 12:15:56 +00:00
erinhmclark
6da837b374
Add note to update dynamic versioning and references to version.
2025-01-12 12:15:56 +00:00
erinhmclark
660ee82c67
Update Dockerfile for poetry.
Note: review the security of the curl installation. It is currently pinned to a known version, but additional checks could be added.
2025-01-12 12:15:56 +00:00
erinhmclark
5490947657
Add packaging to Poetry.
2025-01-12 12:15:56 +00:00
erinhmclark
fd9a6c26ed
Create Poetry environment.
Required addition of transitive package (pyOpenSSL) and version restrictions on cryptography, boto3.
2025-01-12 12:15:56 +00:00
Patrick Robertson
3546d4ad79
Fix 'download_syndication' method for tweet archiving (now requires a token)
Plus add in unit tests for token generation + download syndication
2025-01-12 12:55:00 +01:00
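The commit does not describe the token algorithm. A widely cited formula, used by Vercel's react-tweet, is the base-36 string of `(tweet_id / 1e15) * pi` with zeros and the decimal point removed. The sketch below ports that formula to Python as an assumption; its base-36 conversion only approximates JavaScript's `toString(36)` (the fractional-digit cutoff is a guess), so tokens may not match a browser's byte for byte.

```python
import math

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(value: float, max_fraction_digits: int = 8) -> str:
    """Approximate JavaScript's Number.prototype.toString(36) for a non-negative float."""
    integer_part = int(value)
    fraction = value - integer_part
    int_digits = ""
    while integer_part > 0:
        integer_part, rem = divmod(integer_part, 36)
        int_digits = DIGITS[rem] + int_digits
    frac_digits = ""
    while fraction > 0 and len(frac_digits) < max_fraction_digits:
        fraction *= 36
        digit = int(fraction)
        frac_digits += DIGITS[digit]
        fraction -= digit
    int_digits = int_digits or "0"
    return int_digits + ("." + frac_digits if frac_digits else "")


def syndication_token(tweet_id: str) -> str:
    """Hypothetical port of react-tweet's token: base-36 of (id / 1e15) * pi, minus '0' and '.'."""
    value = (int(tweet_id) / 1e15) * math.pi
    return to_base36(value).replace("0", "").replace(".", "")
```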
Patrick Robertson
c932fb7416
Improved logging when attempting to download an invalid or deleted tweet
Plus: unit tests for non-existent tweet + invalid tweet ID
2025-01-12 12:00:45 +01:00
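The invalid-ID case can be caught before any request is made. A minimal sketch, assuming status URLs of the form `https://x.com/<user>/status/<id>` (or twitter.com) and numeric IDs that fit in 63 bits; the regex and helper names are hypothetical, not taken from the archiver.

```python
import re
from typing import Optional

# Hypothetical pattern for twitter.com / x.com status URLs.
TWEET_URL_RE = re.compile(
    r"^https?://(?:www\.|mobile\.)?(?:twitter|x)\.com/[^/]+/status/(\d+)"
)


def extract_tweet_id(url: str) -> Optional[str]:
    """Return the numeric tweet ID from a status URL, or None if it does not match."""
    match = TWEET_URL_RE.match(url)
    return match.group(1) if match else None


def is_valid_tweet_id(tweet_id: str) -> bool:
    """ASCII digits only, positive, and within the assumed 63-bit ID range."""
    return tweet_id.isascii() and tweet_id.isdigit() and 0 < int(tweet_id) < 2**63
```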
Patrick Robertson
f29950905c
Merge branch 'main' into small_issues
2025-01-12 11:47:55 +01:00
Patrick Robertson
8e99d62c97
Merge pull request #165 from bellingcat/fix/snscrape
Remove snscrape from the twitter_archiver
2025-01-09 11:06:14 +01:00
Patrick Robertson
9dc4eb35de
Switch to pytest and use vcr for request storing
2025-01-08 11:25:13 +01:00
Patrick Robertson
8c044c15f0
Add base test class for archivers with boilerplate code
Plus: create test class for twitter archiver. Currently WIP
2025-01-08 10:38:56 +01:00
Patrick Robertson
ab9335bb7a
Merge branch 'main' into feat/unittest
2025-01-08 10:35:45 +01:00
Patrick Robertson
add83c9650
Remove snscrape from twitter_archiver
1. snscrape twitter downloader no longer works (ref: https://github.com/JustAnotherArchivist/snscrape/issues/1045)
2. snscrape limits python to versions <3.12
2025-01-07 19:40:19 +01:00
Miguel Sozinho Ramalho
a697f0a212
adds an unauthenticated Bluesky archiver (#160)
* adds a TODO for next code iterations
* implements bsky archiver
* adds new archiver to example orchestration file
* Fix downloading media for posts with multiple images
(Images are stored in media/images)
* Setup a basic framework for unit tests
Use 'python -m unittest' from the project root to run
---------
Co-authored-by: Patrick Robertson <robertson.patrick@gmail.com>
2025-01-07 10:28:07 +00:00
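A sketch of collecting every image from a post rather than only the first, assuming post-view JSON as returned by the public AppView (for example `app.bsky.feed.getPostThread`). For record-with-media embeds the images sit under `embed.media.images`, which is plausibly what the "media/images" note refers to. The function name is hypothetical.

```python
from typing import List


def image_urls_from_post(post: dict) -> List[str]:
    """Collect full-size image URLs from a Bluesky post view's embed."""
    embed = post.get("embed") or {}
    # Quote posts with attached media wrap the images view inside 'media'.
    if embed.get("$type") == "app.bsky.embed.recordWithMedia#view":
        embed = embed.get("media") or {}
    if embed.get("$type") != "app.bsky.embed.images#view":
        return []
    # Return all images, not just the first one.
    return [img["fullsize"] for img in embed.get("images", []) if img.get("fullsize")]
```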
Patrick Robertson
bffa3a6254
Merge pull request #159 from bellingcat/print_pdf
Add 'print_pdf' option to the screenshot enricher. Fixes #132
2025-01-06 18:13:38 +01:00
Miguel Sozinho Ramalho
ef471f41e1
adds better debug for wayback failures (#161)
2025-01-06 16:49:11 +00:00
Patrick Robertson
928518cda7
Allow setting cookies for yt-dl (#158)
2025-01-06 16:19:53 +00:00
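yt-dlp's Python API accepts cookies through the `cookiefile` option (a Netscape-format cookies.txt path) or `cookiesfrombrowser` (a tuple of browser, profile, keyring, container). A minimal sketch mapping hypothetical archiver settings onto those options; the parameter names and the file-over-browser precedence are assumptions.

```python
from typing import Optional


def build_ydl_options(cookie_file: Optional[str] = None,
                      cookies_from_browser: Optional[str] = None) -> dict:
    """Translate hypothetical archiver settings into yt-dlp YoutubeDL options."""
    opts = {"quiet": True}
    if cookie_file:
        # Path to a Netscape-format cookies.txt file.
        opts["cookiefile"] = cookie_file
    elif cookies_from_browser:
        # yt-dlp expects (browser, profile, keyring, container); None means default.
        opts["cookiesfrombrowser"] = (cookies_from_browser, None, None, None)
    return opts
```

Usage would look like `with yt_dlp.YoutubeDL(build_ydl_options(cookie_file="cookies.txt")) as ydl: info = ydl.extract_info(url, download=False)`.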
Patrick Robertson
1bd017000e
Add Github CI test workflow
2024-12-31 15:20:33 +01:00