snscrape

mirror of https://github.com/bellingcat/snscrape.git synced 2026-06-08 02:28:29 +03:00

Author	SHA1	Message	Date
JustAnotherArchivist	32a427dac3	Fix pagination on Twitter (fixes #40)	2019-05-18 01:08:00 +00:00
JustAnotherArchivist	7001983556	Skip timeline entries that don't have a link (fixes #36)	2019-05-16 23:17:46 +00:00
JustAnotherArchivist	64438afc92	Work around tweet URLs that don't have a data-expanded-url attribute (fixes #38)	2019-05-16 22:51:22 +00:00
JustAnotherArchivist	9e6538556a	Dump also the deeper frames, not just the get_items one	2019-05-16 22:48:35 +00:00
JustAnotherArchivist	9c8bbf051c	Fix order of processing in Twitter module for more useful locals dump output	2019-05-16 22:22:53 +00:00
JustAnotherArchivist	c6a11298ac	Fix missing linebreak in locals dump output	2019-05-16 22:22:21 +00:00
JustAnotherArchivist	02cbf6ddf6	Dump locals to a temporary file in case of an exception	2019-05-16 18:29:30 +00:00
JustAnotherArchivist	3817aa59d4	Add support for extracting links from tweets (including cards). Both the t.co and the original URLs can be extracted. Note that card links are always t.co since Twitter's HTML does not include the original URL for those.	2019-05-16 16:42:52 +00:00
JustAnotherArchivist	46a51008f8	Fix Instagram signature calculation	2019-05-16 16:19:51 +00:00
JustAnotherArchivist	f91979eb32	Add --max-position option to twitter-search scraper as a workaround for pagination stopping early (#37). The value needs to be of the format 'TWEET-<seenID>-<newestID>' where <seenID> is the last result that was returned by a previous scrape and <newestID> is the first result returned by the initial scrape.	2019-05-10 17:30:15 +00:00
JustAnotherArchivist	85fff319bc	Disable Twitter's spelling correction. src=typd means "this is what was typed in and could be incorrect". src=spxr is "no, I really mean that". src=sprv appears to be an alias of spxr that is no longer used.	2019-05-10 16:43:59 +00:00
JustAnotherArchivist	6b145526b7	Update README with new modules	2019-04-21 23:10:32 +02:00
JustAnotherArchivist	abf31764b1	Version 0.2.0 v0.2.0	2019-04-21 23:03:21 +02:00
JustAnotherArchivist	64693f74bb	Update Instagram query hash	2019-04-19 01:47:38 +02:00
JustAnotherArchivist	a7d08ed51c	Remove leftover debugging print	2019-04-19 01:40:29 +02:00
JustAnotherArchivist	f48ca7726e	Add support for Gab	2019-04-19 00:40:43 +02:00
JustAnotherArchivist	78c295f7e0	Add support for VKontakte (fixes #13)	2019-04-18 18:39:21 +02:00
JustAnotherArchivist	a5aca1a14f	Add support for Instagram hashtags (fixes #29)	2019-04-18 16:14:54 +02:00
JustAnotherArchivist	96f7d871c1	Ignore Scraper subclasses which don't set a name	2019-04-18 16:14:26 +02:00
JustAnotherArchivist	b5dfd37949	Support unix timestamps in --since	2019-04-18 16:01:35 +02:00
JustAnotherArchivist	b511397791	Add --since option to return only results newer than a certain date (fixes #19)	2019-04-18 15:12:29 +02:00
JustAnotherArchivist	536fcb3303	Return proper items from scrapers including clean URLs (fixes #9 and #10)	2019-04-18 14:44:21 +02:00
JustAnotherArchivist	f8d812f799	Include permalink.php, events, and notes (fixes #32)	2019-04-18 04:22:47 +02:00
JustAnotherArchivist	c2cebd9166	Accept-Language header to get an English response unconditionally	2019-04-18 03:58:37 +02:00
JustAnotherArchivist	73bc99596f	Treat Twitter responses without a Content-Type header as invalid (fixes #21)	2019-04-18 02:24:35 +02:00
JustAnotherArchivist	8458c12218	Rewrite link extraction on Facebook (fixes #17). Facebook's returned HTML has a large number of inconsistencies; some (most) pages include a <link rel="canonical" /> but some don't, for example. This was at the root of the failing post extraction for some Facebook pages (#17). The previous link extraction technique was also quite poor for other reasons though. The new method uses the relevant CSS classes instead. Despite probably being the result of a CSS minimiser or similar, these seem to be quite stable: they haven't changed in the past two years (but the more readable ones have!).	2019-04-18 02:14:21 +02:00
JustAnotherArchivist	b59c7e8d8f	Merge pull request #28 from peterk/master: Adds socks proxy support (via requests)	2019-03-11 13:32:07 +01:00
Peter Krantz	3ceb849d98	Adds socks proxy support (via requests)	2019-01-10 22:54:42 +01:00
JustAnotherArchivist	f5ee1f7ac5	Merge pull request #26 from ludios/avoid-twitter-bans: twitter: randomize user agent to avoid Twitter's (IP, UA)-keyed bans	2018-12-25 02:19:17 +01:00
Ivan Kozik	1984110f78	twitter: randomize user agent to avoid Twitter's (IP, UA)-keyed bans	2018-12-24 08:03:33 +00:00
JustAnotherArchivist	c5a5dcb92c	snscrape is now on PyPI	2018-10-09 17:26:03 +02:00
JustAnotherArchivist	cfb1c9a2aa	Version 0.1.3 v0.1.3	2018-10-01 03:26:22 +02:00
JustAnotherArchivist	d0d3c8b2a6	Better log output for temporary failures (fixes #2)	2018-10-01 03:24:29 +02:00
JustAnotherArchivist	4d0350e541	Disable "quality filter" on Twitter (fixes #3)	2018-10-01 02:51:33 +02:00
JustAnotherArchivist	d17aa15bcb	Version 0.1.2 v0.1.2	2018-09-11 12:44:07 +02:00
JustAnotherArchivist	d1ef280d6e	Fix snscrape.modules not getting installed	2018-09-11 12:43:10 +02:00
JustAnotherArchivist	2823272e0b	Version 0.1.1 v0.1.1	2018-09-11 12:30:35 +02:00
JustAnotherArchivist	540f557002	Fix typo in setup.py preventing installation	2018-09-11 12:30:21 +02:00
JustAnotherArchivist	5fc60fe978	Version 0.1 v0.1	2018-09-10 22:15:11 +02:00
JustAnotherArchivist	cf36e8be97	Add README, LICENSE, and metadata	2018-09-10 22:15:03 +02:00
JustAnotherArchivist	0350ab0692	Fix Facebook scraper returning strings instead of Items	2018-09-10 19:38:43 +02:00
JustAnotherArchivist	6b6ae3d33b	Rename from socialmediascraper to snscrape	2018-08-21 22:54:14 +02:00
JustAnotherArchivist	9fb3ac6013	Add support for Google+ user profiles	2018-08-21 18:58:43 +02:00
JustAnotherArchivist	897f5bebe6	Add support for POST requests	2018-08-21 18:58:09 +02:00
JustAnotherArchivist	e28a2cdb4b	Fix Instagram again. __a=1 is no longer supported, so we need to extract the JSON from the HTML page instead. There is now an X-Instagram-GIS header that needs to be set correctly.	2018-08-21 18:55:40 +02:00
JustAnotherArchivist	5a084af85c	Fix Instagram. Instagram dropped the max_id parameter, so it is no longer possible to iterate over the posts so easily. Switch to GraphQL instead, which is what's used in the browser as well.	2018-08-21 18:50:00 +02:00
JustAnotherArchivist	14831d4137	Add support for Facebook user profiles	2018-08-21 18:48:34 +02:00
JustAnotherArchivist	6d54655a7f	Add support for Instagram user profiles	2018-08-21 18:47:44 +02:00
JustAnotherArchivist	3ab69a1a0f	Merge Twitter user and hashtag into one, and add support for generic Twitter search scrapes	2018-08-21 18:46:34 +02:00
JustAnotherArchivist	d03c82d413	Support nested inheritance from socialmediascraper.base.Scraper	2018-08-21 18:44:15 +02:00

355 Commits