1546 Commits

Author SHA1 Message Date
msramalho
450065b6fb removes print 2022-03-16 19:56:18 +01:00
msramalho
516db483d6 telethon archiver working for 0,1,1+ media objects 2022-03-16 19:51:02 +01:00
msramalho
c2ae382a4e isolates html page generation logic so it can be reused 2022-03-16 19:50:44 +01:00
msramalho
30787506a1 additional logging 2022-03-16 19:50:29 +01:00
msramalho
0035603bfb telethon-poc 2022-03-15 18:45:53 +01:00
msramalho
3b9b42b854 minor code cleanup 2022-03-15 11:32:39 +01:00
Logan Williams
0304860bce Don't check status for empty URL rows 2022-03-14 11:10:51 +01:00
Logan Williams
aaca6efac1 Merge pull request #19 from bellingcat/screenshots
Merge feature branch
2022-03-14 09:51:57 +01:00
Logan Williams
8d06eae96a Merge pull request #18 from bellingcat/hasing-and-multiple-names
configurable column names + minor improvements
2022-03-14 09:49:32 +01:00
msramalho
07bbf443ca improves documentation 2022-03-13 12:05:09 +01:00
msramalho
4c54926548 offset fix 2022-03-12 20:29:43 +01:00
msramalho
d8d9cf17dc fix offset 2022-03-12 20:25:52 +01:00
msramalho
f121c9dab7 enable tolower 2022-03-12 20:14:16 +01:00
msramalho
67b16064bb offby1 2022-03-12 20:11:38 +01:00
msramalho
ec4ae84487 case-insensitive is a bad idea 2022-03-12 20:06:31 +01:00
msramalho
69483d432c adds logs 2022-03-12 20:04:08 +01:00
msramalho
6e5e7212c2 fixes header offset 2022-03-12 19:56:00 +01:00
msramalho
486c3295b5 log 2022-03-12 19:54:10 +01:00
msramalho
6c5d6f521e implements fresh status retrieval if needed 2022-03-10 19:00:02 +01:00
Logan Williams
d30115935e Merge pull request #16 from bellingcat/screenshots
WIP: screenshots and hashing
2022-03-09 14:59:07 +01:00
msramalho
52333874c9 making column names configurable through the command line 2022-03-09 12:38:04 +01:00
msramalho
077c71f941 fixes index out of range bug 2022-03-09 12:18:06 +01:00
msramalho
ff874fe0d3 simplifies access to google sheets, single get_values 2022-03-09 12:17:51 +01:00
msramalho
544e7578a6 removes duplicate code 2022-03-09 11:46:14 +01:00
msramalho
59027ac477 simplification 2022-03-09 11:44:19 +01:00
msramalho
39ec190e56 adds README instructions for geckodriver 2022-03-09 11:44:05 +01:00
Logan Williams
82ca6792c4 Fix issue with extracting time from Telegram media posts 2022-03-02 14:45:36 +01:00
Logan Williams
aa4b175dea Fix issue with timestamps being converted to user format 2022-02-28 12:54:58 +01:00
Logan Williams
c6b159905b Switch to headless Firefox 2022-02-28 11:45:32 +01:00
Logan Williams
6ebce974f0 WIP: Make timezones more consistent in UTC 2022-02-28 08:42:59 +01:00
Logan Williams
2d50703489 Generate archivers for Telegram posts with images; move generation to function in base_archiver 2022-02-28 08:41:45 +01:00
Logan Williams
63a2847ac9 Add header argument; set up webdriver 2022-02-25 16:09:35 +01:00
Logan Williams
09dc5b5b81 Fix issue with query parameters by using urllib 2022-02-25 15:29:56 +01:00
Logan Williams
6a62c5798c Add Twitter non-video archiver 2022-02-25 13:55:43 +01:00
Logan Williams
1eb17e4de5 Add hash and screenshot methods; switch to more recent ytdl fork 2022-02-25 13:54:40 +01:00
Logan Williams
d76e3bc7ec Merge pull request #13 from bellingcat/refactor-archivers
WIP: Refactor archivers
2022-02-25 08:05:22 +01:00
msramalho
8bce84082a minor updates 2022-02-23 18:32:40 +01:00
msramalho
4bbbdcc7fd minor update 2022-02-23 18:30:06 +01:00
msramalho
214d52d36f improved tmp folder management 2022-02-23 16:43:42 +01:00
msramalho
3cafc444fc creates tmp folder if not exists 2022-02-23 16:32:38 +01:00
msramalho
1d62009c4f creates utils module and moves gworksheet there 2022-02-23 16:24:59 +01:00
msramalho
2601313249 removed archivers.py 2022-02-23 16:13:09 +01:00
msramalho
3096725a2b Merge branch 'refactor-archivers' of https://github.com/bellingcat/auto-archiver into refactor-archivers 2022-02-23 16:12:47 +01:00
msramalho
9a264a7dfe cleanup and docs 2022-02-23 16:07:58 +01:00
msramalho
9550cd509e making code more resilient to exceptions 2022-02-23 13:57:11 +01:00
msramalho
644aa0811c todo 2022-02-23 09:57:44 +01:00
msramalho
374852e740 cleanup 2022-02-23 09:57:04 +01:00
msramalho
2d145802b5 extracted worksheet operations 2022-02-23 09:54:03 +01:00
msramalho
e4603a9423 refactoring storage and bringing changes from origin 2022-02-22 16:03:35 +01:00
Logan Williams
07b5d357b4 Fix bugs in WaybackArchiver, follow redirects sometimes 2022-02-22 08:20:45 +01:00