JustAnotherArchivist
722bfd5f7c
Handle Twitter tombstones
Fixes #63
2020-05-29 22:12:37 +00:00
JustAnotherArchivist
b6cc3180d9
Force TwitterThreadScraper and TwitterListMembersScraper to fetch the old design
v0.3.1
2020-03-04 00:40:49 +00:00
JustAnotherArchivist
613395d1c2
Port TwitterSearchScraper to redesign
Fixes #57
2020-03-04 00:40:49 +00:00
JustAnotherArchivist
82a87b7b5a
Merge pull request #53 from JackDallas/add-more-insta-fields
Add more fields to the instagram scraper
2020-02-09 23:48:59 +00:00
Jack Dallas
9568028bf9
Update changed fields
2020-02-07 11:30:16 +00:00
JustAnotherArchivist
6df351772e
Fix crash in Facebook scraper on link-less entries
2020-02-05 16:15:10 +00:00
JustAnotherArchivist
541173b0c8
Merge pull request #54 from jodizzle/fix/vkontakte-user
Fix vkontakte-user: pagination returns JSON now, and handle some unscrapable profiles.
2020-02-05 14:56:12 +00:00
Jody Leonard
b6772d3778
vkontakte-user: Handle additional un-scrapeable profile case
2019-10-31 16:01:29 -04:00
Jody Leonard
20ea117a2c
Fix vkontakte-user pagination
2019-10-30 22:29:49 -04:00
JackDallas
ff54c350bc
Add more fields to the instagram scraper
2019-08-30 12:43:02 +01:00
JustAnotherArchivist
e6aae35304
Use setuptools_scm for versioning through git tags
v0.3.0
2019-07-01 17:41:18 +00:00
JustAnotherArchivist
b698a201f5
Update scraper list
2019-07-01 16:05:21 +00:00
JustAnotherArchivist
7fe72cf708
Add a note about reporting issues with proper debugging information
2019-07-01 16:01:11 +00:00
JustAnotherArchivist
4651cde447
Refactor CLI logging and add --dump-locals for better debugging
2019-07-01 15:46:10 +00:00
JustAnotherArchivist
c99cc4b5d3
Remove existing logging handlers
2019-07-01 15:42:06 +00:00
JustAnotherArchivist
628074d6fc
Print contents when ignoring a link-less entry
2019-07-01 01:35:00 +00:00
JustAnotherArchivist
64b293bd9e
Add support for media sets
Closes #48
2019-07-01 01:34:17 +00:00
JustAnotherArchivist
180f4dfeb7
Add support for photo.php URLs
Fixes #42
2019-06-30 18:36:39 +00:00
JustAnotherArchivist
6d6e3fa16c
Fix crash on (some?) inexistent groups
2019-06-30 18:36:30 +00:00
JustAnotherArchivist
5f7e6936c1
Add support for Facebook groups
Closes #47
2019-06-30 17:16:09 +00:00
JustAnotherArchivist
e2c05c9e0c
Split common code off into FacebookCommonScraper and refactor odd link detection in preparation of group scraping
2019-06-30 16:28:33 +00:00
JustAnotherArchivist
14e11b28d2
Add support for Twitter lists
Closes #46
2019-06-30 14:39:29 +00:00
JustAnotherArchivist
1a07b3b7e8
Add support for Twitter threads
2019-06-30 02:11:46 +00:00
JustAnotherArchivist
4d8cc7bdb9
Extract outlinks from Facebook
2019-06-27 15:29:05 +00:00
JustAnotherArchivist
eec83f181e
Check HTTP status code before attempting parsing
2019-06-27 15:25:26 +00:00
JustAnotherArchivist
fae7432c64
Log details about failed JSON parsing
2019-06-27 15:25:08 +00:00
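The two entries above (check the HTTP status before parsing, log details when JSON parsing fails) can be illustrated with a minimal sketch. It assumes the response is available as a status code plus a text body; `parse_json_response` and `ScraperError` are hypothetical names, not snscrape's actual code.

```python
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


class ScraperError(Exception):
    """Hypothetical error type used only in this sketch."""


def parse_json_response(status_code: int, body: str) -> Any:
    # Check the status first: an error page is typically HTML, so feeding it to
    # the JSON parser would only produce a misleading decode error.
    if status_code != 200:
        raise ScraperError(f'Got status code {status_code}')
    try:
        return json.loads(body)
    except json.JSONDecodeError as e:
        # Log what is needed to debug the failure before giving up.
        logger.error('Received invalid JSON (%s); body: %r', e, body)
        raise ScraperError('Received invalid JSON') from e
```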
JustAnotherArchivist
757818474d
Add tweet ID and username fields to Tweet items
2019-06-23 11:48:54 +00:00
JustAnotherArchivist
e6c934c0b8
Retrieve as many posts at once as possible for Instagram hashtags
2019-06-21 09:56:12 +00:00
JustAnotherArchivist
d2315feec1
Add support for Instagram locations
2019-06-21 09:55:30 +00:00
JustAnotherArchivist
765ceeeb10
More complete and more readable exception dump
2019-06-18 14:25:38 +00:00
JustAnotherArchivist
731a2e8c8b
Check that Instagram returned valid JSON, take 2
Fixes #22
2019-06-10 15:03:15 +00:00
JustAnotherArchivist
7d1916292c
Twitter: stop recursion based on whether the server returns the same position instead of detecting an empty feed
Fixes #37
2019-06-10 14:38:25 +00:00
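A minimal sketch of the stop condition described in the entry above: keep requesting pages and stop only when the server echoes back the position that was sent, so an empty page in the middle of the feed does not end the scrape. The `fetch_page` callable and its `(items, new_position)` return shape are assumptions for illustration.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes the current position (None for the first
# request) and returns the items on that page plus the position the server reports.
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def paginate(fetch_page: FetchPage) -> Iterator[str]:
    position: Optional[str] = None
    while True:
        items, new_position = fetch_page(position)
        yield from items
        # An empty page is not treated as the end; only an unchanged position is.
        if new_position == position:
            break
        position = new_position
```

For example, with pages `{None: (['a', 'b'], 'p1'), 'p1': ([], 'p2'), 'p2': (['c'], 'p3'), 'p3': ([], 'p3')}`, `list(paginate(lambda p: pages[p]))` continues past the empty `p1` page and returns `['a', 'b', 'c']`.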
JustAnotherArchivist
0d509c4ba0
Check that Instagram returned valid JSON ( fixes #22 )
2019-05-30 15:04:05 +00:00
JustAnotherArchivist
907a003a59
Fix crash when Twitter search produces no results ( fixes #41 )
2019-05-24 11:51:50 +00:00
JustAnotherArchivist
8ada279b57
Add warning if Twitter module gets no results
2019-05-24 11:50:39 +00:00
JustAnotherArchivist
900eae54a6
Ignore branded content link on Facebook silently
2019-05-24 11:49:44 +00:00
JustAnotherArchivist
7989af27b5
Handle tweets by temporarily blocked accounts (which show up in the search results but don't have a date or content)
2019-05-21 22:37:43 +00:00
JustAnotherArchivist
e528ca3f26
Dump locals only for snscrape modules ( closes #39 )
2019-05-18 01:08:49 +00:00
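Taken together, the locals-dump entries (dump to a temporary file on an exception, include the deeper frames, restrict to snscrape modules) could look roughly like the sketch below. It is an illustration built on the standard library's `traceback.walk_tb` and `tempfile`, not the project's implementation; the function name, file prefix, and output layout are assumptions.

```python
import tempfile
import traceback


def dump_locals(exc: BaseException, module_prefix: str = 'snscrape') -> str:
    """Write the locals of each traceback frame whose module name starts with
    module_prefix to a temporary file and return that file's path."""
    with tempfile.NamedTemporaryFile('w', prefix='locals_dump_', suffix='.txt', delete=False) as fp:
        # walk_tb yields (frame, lineno) from the frame that caught the exception
        # down to the one that raised it, so deeper frames are covered too.
        for frame, lineno in traceback.walk_tb(exc.__traceback__):
            module = frame.f_globals.get('__name__', '')
            if not module.startswith(module_prefix):
                continue  # skip standard library and third-party frames
            fp.write(f'Locals from file "{frame.f_code.co_filename}", line {lineno}, in {frame.f_code.co_name}:\n')
            for name, value in frame.f_locals.items():
                fp.write(f'    {name} {type(value).__name__} = {value!r}\n')
            fp.write('\n')
        return fp.name
```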
JustAnotherArchivist
32a427dac3
Fix pagination on Twitter ( fixes #40 )
2019-05-18 01:08:00 +00:00
JustAnotherArchivist
7001983556
Skip timeline entries that don't have a link ( fixes #36 )
2019-05-16 23:17:46 +00:00
JustAnotherArchivist
64438afc92
Work around tweet URLs that don't have a data-expanded-url attribute ( fixes #38 )
2019-05-16 22:51:22 +00:00
JustAnotherArchivist
9e6538556a
Dump also the deeper frames, not just the get_items one
2019-05-16 22:48:35 +00:00
JustAnotherArchivist
9c8bbf051c
Fix order of processing in Twitter module for more useful locals dump output
2019-05-16 22:22:53 +00:00
JustAnotherArchivist
c6a11298ac
Fix missing linebreak in locals dump output
2019-05-16 22:22:21 +00:00
JustAnotherArchivist
02cbf6ddf6
Dump locals to a temporary file in case of an exception
2019-05-16 18:29:30 +00:00
JustAnotherArchivist
3817aa59d4
Add support for extracting links from tweets (including cards)
Both the t.co and the original URLs can be extracted. Note that card links are always t.co since Twitter's HTML does not include the original URL for those.
2019-05-16 16:42:52 +00:00
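A sketch of the link model from the entry above: every link keeps its t.co URL, the original URL is taken from `data-expanded-url` when present (see also the workaround entry for links that lack that attribute), and card links carry only the t.co URL. It assumes the anchors have already been parsed into attribute dicts; `Link` and `extract_links` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional


class Link(NamedTuple):
    tco: str                 # shortened t.co URL taken from the href attribute
    original: Optional[str]  # expanded URL, or None if the HTML does not provide it


def extract_links(anchors: List[Dict[str, str]], card_urls: List[str]) -> List[Link]:
    links = []
    for attrs in anchors:
        # Use data-expanded-url when present; fall back to None instead of failing.
        links.append(Link(tco=attrs['href'], original=attrs.get('data-expanded-url')))
    for url in card_urls:
        # Card links only ever appear as t.co URLs in the HTML.
        links.append(Link(tco=url, original=None))
    return links
```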
JustAnotherArchivist
46a51008f8
Fix Instagram signature calculation
2019-05-16 16:19:51 +00:00
JustAnotherArchivist
f91979eb32
Add --max-position option to twitter-search scraper as a workaround for pagination stopping early ( #37 )
The value needs to be of the format 'TWEET-<seenID>-<newestID>' where <seenID> is the last result that was returned by a previous scrape and <newestID> is the first result returned by the initial scrape.
2019-05-10 17:30:15 +00:00
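The `--max-position` format described above can be built and checked with a small helper. This is a sketch only; the helper names are hypothetical and the IDs in the test are made-up placeholders.

```python
import re
from typing import Tuple

_MAX_POSITION_RE = re.compile(r'TWEET-(\d+)-(\d+)')


def make_max_position(seen_id: int, newest_id: int) -> str:
    # seen_id: last tweet ID returned by the previous (prematurely stopped) scrape
    # newest_id: first tweet ID returned by the initial scrape
    return f'TWEET-{seen_id}-{newest_id}'


def parse_max_position(value: str) -> Tuple[int, int]:
    m = _MAX_POSITION_RE.fullmatch(value)
    if m is None:
        raise ValueError(f'not of the form TWEET-<seenID>-<newestID>: {value!r}')
    return int(m.group(1)), int(m.group(2))
```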
JustAnotherArchivist
85fff319bc
Disable Twitter's spelling correction
src=typd means "this is what was typed in and could be incorrect". src=spxr is "no, I really mean that". src=sprv appears to be an alias of spxr that is no longer used.
2019-05-10 16:43:59 +00:00
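A minimal sketch of the query string this change implies: passing `src=spxr` alongside the query so Twitter takes it literally instead of treating it as a possibly misspelled `src=typd` input. Only the `q` and `src` parameters are shown; the search endpoint and any other parameters are left out and the function name is hypothetical.

```python
from urllib.parse import urlencode


def search_query_string(query: str) -> str:
    # src=spxr: "no, I really mean that", i.e. disable spelling correction.
    return urlencode({'q': query, 'src': 'spxr'})
```

For example, `search_query_string('snscrape test')` gives `'q=snscrape+test&src=spxr'`.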
JustAnotherArchivist
6b145526b7
Update README with new modules
2019-04-21 23:10:32 +02:00