Patrick Robertson
bdfedfcf61
Merge branch 'main' into feat/unittest
2025-01-13 19:50:47 +01:00
Erin Clark
9cdaea873b
Merge pull request #164 from bellingcat/ec_add_poetry
Migrate to Poetry
2025-01-13 18:49:15 +00:00
Patrick Robertson
528b78db85
Flag tombstone tweets for twitter_syndication method
2025-01-13 18:17:24 +01:00
Patrick Robertson
57eacdc24a
Merge branch 'main' into feat/unittest
2025-01-13 18:06:55 +01:00
Patrick Robertson
63973e2ce7
switch to pytest and pytest-recording
2025-01-13 16:23:20 +01:00
erinhmclark
d80b4b7557
Remove snscrape and Python 3.12 restriction.
2025-01-12 12:15:56 +00:00
erinhmclark
6d5b0090d9
Pull version from pyproject.toml file.
2025-01-12 12:15:56 +00:00
erinhmclark
6da837b374
Add note to update dynamic versioning and references to version.
2025-01-12 12:15:56 +00:00
Patrick Robertson
3546d4ad79
Fix 'download_syndication' method for tweet archiving (now requires a token)
Plus add in unit tests for token generation + download syndication
2025-01-12 12:55:00 +01:00
Patrick Robertson
c932fb7416
Improved logging when attempting to download an invalid/deleted tweet
Plus: unit tests for non-existent tweet + invalid tweet ID
2025-01-12 12:00:45 +01:00
Patrick Robertson
f29950905c
Merge branch 'main' into small_issues
2025-01-12 11:47:55 +01:00
Patrick Robertson
add83c9650
Remove snscrape from twitter_archiver
1. snscrape twitter downloader no longer works (ref: https://github.com/JustAnotherArchivist/snscrape/issues/1045)
2. snscrape limits Python to versions <3.12
2025-01-07 19:40:19 +01:00
Miguel Sozinho Ramalho
a697f0a212
adds an unauthenticated Bluesky archiver (#160)
* adds a TODO for next code iterations
* implements bsky archiver
* adds new archiver to example orchestration file
* Fix downloading media for posts with multiple images
  (Images are stored in media/images)
* Set up a basic framework for unit tests
  Use 'python -m unittest' from the project root to run
---------
Co-authored-by: Patrick Robertson <robertson.patrick@gmail.com>
2025-01-07 10:28:07 +00:00
Patrick Robertson
bffa3a6254
Merge pull request #159 from bellingcat/print_pdf
Add 'print_pdf' option to the screenshot enricher. Fixes #132
2025-01-06 18:13:38 +01:00
Miguel Sozinho Ramalho
ef471f41e1
adds better debug for wayback failures (#161)
2025-01-06 16:49:11 +00:00
Patrick Robertson
928518cda7
Allow setting cookies for yt-dl (#158)
2025-01-06 16:19:53 +00:00
Patrick Robertson
0c803f15a5
Fix showing preview images in the .html file when using local storage
Local storage media URLs are prefixed with '/'; previously only http(s) media preview src values were displayed
2024-12-31 09:29:31 +01:00
Patrick Robertson
a46f9997ea
Better logging when there's a timestamp parse error
2024-12-31 09:28:08 +01:00
msramalho
83da9ae089
adds pdf preview support for html formatter
2024-12-23 18:19:26 +00:00
Patrick Robertson
663c8ad93a
Add 'print_pdf' option to the screenshot enricher. Fixes #132
2024-12-20 07:14:03 +01:00
msramalho
e49550163f
adds proxy_server option to wacz
2024-10-06 10:45:34 +06:00
msramalho
e6f5981afc
numpy version downgrade
2024-10-06 10:10:04 +06:00
msramalho
c62bf1a34d
yt-dlp version bump
2024-10-05 17:43:07 +06:00
msramalho
b166d57e61
v0.12.0 bump
2024-08-21 13:34:34 +01:00
msramalho
004143a58a
version bump v0.11.6
2024-07-18 11:27:39 +01:00
msramalho
1e375bd740
version bump
2024-05-14 16:42:15 +01:00
Miguel Sozinho Ramalho
f8824691dd
refactors free twitter archiver strategies (#142)
2024-05-14 16:23:33 +01:00
msramalho
012cc36609
removes deprecated datetime method
2024-05-14 15:54:50 +01:00
Miguel Sozinho Ramalho
7cfe1e39cc
#135 fix cleanup of telethon session files (#139)
* closes #135
* version bump
2024-04-16 12:45:45 +01:00
Jett Chen
cf8691bad7
Add yt-dlp based archiving for TwitterArchiver (#138)
* Add ytdlp archiving capability
* Add type annotation
* version bump
---------
Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2024-04-15 19:54:55 +01:00
R. Miles McCain
f603400d0d
Add direct Atlos integration (#137)
* Add Atlos feeder
* Add Atlos db
* Add Atlos storage
* Fix Atlos storages
* Fix Atlos feeder
* Only include URLs in Atlos feeder once they're processed
* Remove print
* Add Atlos documentation to README
* Formatting fixes
* Don't archive existing material
* avoid KeyError in atlos_db
* version bump
---------
Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2024-04-15 19:25:17 +01:00
msramalho
eb37f0b45b
version bump
2024-04-15 19:02:54 +01:00
msramalho
75497f5773
minor bug fix when using an archiver_enricher in enrichers only
2024-04-15 19:02:40 +01:00
msramalho
9c7824de57
browsertrix docker updates
2024-04-15 19:01:55 +01:00
msramalho
f4827770e6
adds instagram no stories as success, and fix for telethon-based archivers.
2024-03-05 14:49:10 +00:00
msramalho
601572d76e
strip url
2024-02-29 11:54:01 +00:00
msramalho
d21e79a272
general security updates
2024-02-29 11:40:30 +00:00
msramalho
ccf5f857ef
adds configurable limits to instagram/youtube
2024-02-25 15:14:17 +00:00
msramalho
7de317d1b5
avoiding exception
2024-02-23 15:54:33 +00:00
msramalho
70075a1e5e
improving insta archiver
2024-02-23 15:37:28 +00:00
msramalho
5b9bc4919a
version bump
2024-02-23 14:08:23 +00:00
msramalho
f0158ffd9c
adds tagged posts and better parsing
2024-02-23 14:08:17 +00:00
msramalho
bfb35a43a9
adds more details from yt-dlp
2024-02-23 14:08:05 +00:00
msramalho
ef5b39c4f1
dind exception
2024-02-22 18:05:56 +00:00
msramalho
24ceafcb64
missing forward slash
2024-02-22 17:47:13 +00:00
msramalho
9fd4bb56a8
new attempt at dind wacz
2024-02-22 17:24:27 +00:00
msramalho
5324d562ba
cleanup wacz patch
2024-02-21 18:14:30 +00:00
msramalho
5bf0a0206d
version update
2024-02-21 17:26:07 +00:00
msramalho
4941823565
fix growing volume size in wacz_enricher
2024-02-21 17:25:55 +00:00
msramalho
27310c2911
fixes issue with api requests
2024-02-21 12:25:05 +00:00