JustAnotherArchivist
1480260e47
Handle Telegram channels without public posts
2020-08-24 17:54:30 +00:00
JustAnotherArchivist
c8d688d39f
Fix crash on Telegram pages without a description
2020-08-24 17:53:50 +00:00
JustAnotherArchivist
9df4352089
Fix crash on VK pages without an info div
2020-08-24 17:42:33 +00:00
JustAnotherArchivist
dd25fd0526
Add support for extracting the entity behind a scrape
...
Closes #11
Backwards incompatibility: snscrape.modules.twitter.Account is now called User. However, this was previously only used on the list member scraper, which has been broken for a while since the list member list is no longer publicly accessible.
For compatibility reasons, the CLI does not output the entity by default; the new option --with-entity enables it.
2020-08-24 01:38:27 +00:00
JustAnotherArchivist
c90fd54b6b
Make datetime.date serialisable
2020-08-24 01:12:38 +00:00
JustAnotherArchivist
9528df48cd
Refactor base URL handling
2020-08-24 01:12:06 +00:00
JustAnotherArchivist
924c35f883
Refactor guest token extraction code
2020-08-22 22:59:43 +00:00
JustAnotherArchivist
588ec415ff
Force TwitterThreadScraper to fetch the old design (take 2)
2020-08-12 17:19:42 +00:00
JustAnotherArchivist
bf229414ba
Add JSONL output format
2020-08-12 15:09:02 +00:00
JustAnotherArchivist
afa819547d
Update README
2020-08-11 22:18:04 +00:00
JustAnotherArchivist
dbcdc159ef
Add support for scraping Facebook page visitor posts aka 'Community'
...
Closes #18
2020-08-11 22:14:27 +00:00
JustAnotherArchivist
30f945897a
Clean Facebook group post URLs
...
Most of the time, the URLs are already clean, but occasionally, Facebook includes tracking parameters (__xts__[0], __tn__)...
2020-08-11 20:48:14 +00:00
JustAnotherArchivist
eee5794ff9
Extract Facebook group posts in chronological order (instead of by last comment)
...
Fixes #66
2020-08-11 20:47:42 +00:00
JustAnotherArchivist
966a6ebd8e
Skip promoted tweets/ads
...
Fixes #67
2020-08-11 20:28:35 +00:00
JustAnotherArchivist
4d3d0fe0d7
Update search API parameter values to the ones currently used on Twitter
2020-08-11 20:26:56 +00:00
JustAnotherArchivist
7b967ff82a
Twitter reverted their guest token change (90f9598e)
v0.3.4
2020-07-08 22:07:18 +00:00
JustAnotherArchivist
90f9598ecc
Adjust to Twitter's new method of handing out guest tokens
...
Fixes #64
v0.3.3
2020-06-24 21:22:58 +00:00
JustAnotherArchivist
7b3c7deb28
Catch login redirects on Instagram
v0.3.2
2020-05-30 00:56:34 +00:00
JustAnotherArchivist
040a11656c
Update README
2020-05-30 00:53:52 +00:00
JustAnotherArchivist
1459245258
Consistently raise ScraperException on fatal errors
2020-05-30 00:53:49 +00:00
JustAnotherArchivist
dbe4c5ce55
Remove Google+ module
...
Google+ was mostly shut down in early 2019. What remained (Google+ for G Suite) was renamed to Google Currents and is for internal communication only (and therefore out of scope for snscrape).
2020-05-30 00:35:06 +00:00
JustAnotherArchivist
80491ecc2c
Remove Gab module
...
Since Gab's move to a fork of Mastodon in July 2019, the module had been broken, and a new module would be better written from scratch as the platform changed entirely.
2020-05-30 00:23:33 +00:00
JustAnotherArchivist
1a71b58101
Add support for Telegram
...
Closes #50
2020-05-29 23:44:01 +00:00
JustAnotherArchivist
0ce37a69d4
Log exception details on crashes
2020-05-29 22:29:23 +00:00
JustAnotherArchivist
722bfd5f7c
Handle Twitter tombstones
...
Fixes #63
2020-05-29 22:12:37 +00:00
JustAnotherArchivist
b6cc3180d9
Force TwitterThreadScraper and TwitterListMembersScraper to fetch the old design
v0.3.1
2020-03-04 00:40:49 +00:00
JustAnotherArchivist
613395d1c2
Port TwitterSearchScraper to redesign
...
Fixes #57
2020-03-04 00:40:49 +00:00
JustAnotherArchivist
82a87b7b5a
Merge pull request #53 from JackDallas/add-more-insta-fields
...
Add more fields to the instagram scraper
2020-02-09 23:48:59 +00:00
Jack Dallas
9568028bf9
Update changed fields
2020-02-07 11:30:16 +00:00
JustAnotherArchivist
6df351772e
Fix crash in Facebook scraper on link-less entries
2020-02-05 16:15:10 +00:00
JustAnotherArchivist
541173b0c8
Merge pull request #54 from jodizzle/fix/vkontakte-user
...
Fix vkontakte-user: pagination returns JSON now, and handle some unscrapable profiles.
2020-02-05 14:56:12 +00:00
Jody Leonard
b6772d3778
vkontakte-user: Handle additional un-scrapeable profile case
2019-10-31 16:01:29 -04:00
Jody Leonard
20ea117a2c
Fix vkontakte-user pagination
2019-10-30 22:29:49 -04:00
JackDallas
ff54c350bc
Add more fields to the instagram scraper
2019-08-30 12:43:02 +01:00
JustAnotherArchivist
e6aae35304
Use setuptools_scm for versioning through git tags
v0.3.0
2019-07-01 17:41:18 +00:00
JustAnotherArchivist
b698a201f5
Update scraper list
2019-07-01 16:05:21 +00:00
JustAnotherArchivist
7fe72cf708
Add a note about reporting issues with proper debugging information
2019-07-01 16:01:11 +00:00
JustAnotherArchivist
4651cde447
Refactor CLI logging and add --dump-locals for better debugging
2019-07-01 15:46:10 +00:00
JustAnotherArchivist
c99cc4b5d3
Remove existing logging handlers
2019-07-01 15:42:06 +00:00
JustAnotherArchivist
628074d6fc
Print contents when ignoring a link-less entry
2019-07-01 01:35:00 +00:00
JustAnotherArchivist
64b293bd9e
Add support for media sets
...
Closes #48
2019-07-01 01:34:17 +00:00
JustAnotherArchivist
180f4dfeb7
Add support for photo.php URLs
...
Fixes #42
2019-06-30 18:36:39 +00:00
JustAnotherArchivist
6d6e3fa16c
Fix crash on (some?) inexistent groups
2019-06-30 18:36:30 +00:00
JustAnotherArchivist
5f7e6936c1
Add support for Facebook groups
...
Closes #47
2019-06-30 17:16:09 +00:00
JustAnotherArchivist
e2c05c9e0c
Split common code off into FacebookCommonScraper and refactor odd link detection in preparation of group scraping
2019-06-30 16:28:33 +00:00
JustAnotherArchivist
14e11b28d2
Add support for Twitter lists
...
Closes #46
2019-06-30 14:39:29 +00:00
JustAnotherArchivist
1a07b3b7e8
Add support for Twitter threads
2019-06-30 02:11:46 +00:00
JustAnotherArchivist
4d8cc7bdb9
Extract outlinks from Facebook
2019-06-27 15:29:05 +00:00
JustAnotherArchivist
eec83f181e
Check HTTP status code before attempting parsing
2019-06-27 15:25:26 +00:00
JustAnotherArchivist
fae7432c64
Log details about failed JSON parsing
2019-06-27 15:25:08 +00:00