Commit Graph

  • 7fdc8bcb53 Randomise user agent when the guest token can't be found JustAnotherArchivist 2021-12-14 20:04:46 +00:00
  • 4b3c6aefe7 Add default values to user and tweet scrapers for a more intuitive usage JustAnotherArchivist 2021-12-12 04:57:16 +00:00
  • 525cd71225 Retry guest token retrieval JustAnotherArchivist 2021-12-12 00:10:59 +00:00
  • 72abff9e5c Reuse guest tokens across scrapes JustAnotherArchivist 2021-12-11 23:18:42 +00:00
  • bcaa477b3d Update list of scrapers JustAnotherArchivist 2021-12-08 08:29:02 +00:00
  • 66d4c99f82 Remove dev version notice JustAnotherArchivist 2021-12-08 08:25:21 +00:00
  • 0ac50f1383 Add README to package metadata JustAnotherArchivist 2021-12-08 08:18:25 +00:00
  • c2257ad16e Add Python 3.10 classifier JustAnotherArchivist 2021-12-08 08:15:05 +00:00
  • 58f654405f Add --citation JustAnotherArchivist 2021-12-08 07:51:28 +00:00
  • 35fb61a327 Fix crash on dumping scopes which have a variable pointing to a dataclass JustAnotherArchivist 2021-11-24 03:39:06 +00:00
  • a6b6f3faaa Throw an error on empty arguments JustAnotherArchivist 2021-10-10 17:43:27 +00:00
  • 5e829e2541 Refactor class instantiation to remove the need to repeat 'retries' everywhere JustAnotherArchivist 2021-09-30 09:58:10 +00:00
  • d4567da23c Improve list of scrapers on --help output JustAnotherArchivist 2021-09-30 09:35:17 +00:00
  • e5e0da25a0 Remove unused imports JustAnotherArchivist 2021-09-30 09:24:18 +00:00
  • 821326bcfb Fix a few f-strings JustAnotherArchivist 2021-09-30 09:23:56 +00:00
  • 4bf9ef239c Restructure usage section JustAnotherArchivist 2021-09-30 09:18:43 +00:00
  • e382891642 Fix Twitter trends not having a str representation JustAnotherArchivist 2021-09-21 21:40:50 +00:00
  • e5f4389464 Add Twitter trend scraper JustAnotherArchivist 2021-09-21 21:28:41 +00:00
  • d91f971f51 Refactor user label implementation and add support for bot accounts JustAnotherArchivist 2021-09-21 19:39:40 +00:00
  • 67e8295293 Merge pull request #280 from edsu/master JustAnotherArchivist 2021-09-19 03:35:49 +00:00
  • 5fc2562642 Add user label support on entity retrieval JustAnotherArchivist 2021-09-19 03:32:35 +00:00
  • 2825bd0a73 Remove accidental empty line JustAnotherArchivist 2021-09-19 03:31:56 +00:00
  • 9831f2a4a0 missing ext Ed Summers 2021-09-16 13:31:47 -04:00
  • a11eef6b06 User label url Ed Summers 2021-09-16 13:04:57 -04:00
  • 3fb731ade1 User Labels Ed Summers 2021-09-16 08:06:05 -04:00
  • c76f1637ce Handle 403s from Twitter search JustAnotherArchivist 2021-08-30 23:29:20 +00:00
  • ed117e8891 Log response status code and redirects JustAnotherArchivist 2021-08-29 18:26:00 +00:00
  • f9a3fafb3f Fix --cursor on twitter-search JustAnotherArchivist 2021-08-01 20:59:16 +00:00
  • 660b8c7a0a Retry empty result sets from Twitter as a workaround for random early stops JustAnotherArchivist 2021-07-18 23:59:52 +00:00
  • 0c22608dc7 Extract video view count JustAnotherArchivist 2021-07-01 17:58:45 +00:00
  • 2bb706feda Dump request and response attributes of RequestExceptions JustAnotherArchivist 2021-06-30 21:44:02 +00:00
  • 5e6bc4ec50 Fix type of content field (may be None on text-less posts) JustAnotherArchivist 2021-05-27 00:33:12 +00:00
  • 57d0aaafc1 Remove dirtyUrl which does not appear to be used anymore by Instagram JustAnotherArchivist 2021-05-27 00:32:03 +00:00
  • 157e4d4265 Fix default value of username field JustAnotherArchivist 2021-05-27 00:29:33 +00:00
  • 54588e9c42 Add support for fetching top instead of live/chronological tweets JustAnotherArchivist 2021-05-23 03:24:30 +00:00
  • 9e7274f3d7 Clean up params dict construction JustAnotherArchivist 2021-05-23 03:24:11 +00:00
  • ac4e335bdb Clean up duplicated default values JustAnotherArchivist 2021-05-23 03:03:32 +00:00
  • 1d255de48d Add hashtags and cashtags JustAnotherArchivist 2021-05-23 02:51:38 +00:00
  • 9c1dcd37f9 Add Tweet.{inReplyToTweetId,inReplyToUser} JustAnotherArchivist 2021-05-23 02:44:40 +00:00
  • f8dac183d0 Fix type of User.id JustAnotherArchivist 2021-05-23 02:43:53 +00:00
  • 45d1fa27de Add twitter-tweet scraper for retrieving tweets by ID, including scroll and recursion modes JustAnotherArchivist 2021-05-23 02:12:13 +00:00
  • 98b798b0e5 Remove obsolete twitter-thread scraper JustAnotherArchivist 2021-05-22 22:37:21 +00:00
  • f18b64e7da Add support for scraping Twitter users by ID JustAnotherArchivist 2021-05-22 21:17:14 +00:00
  • 460be9d581 Add _type attribute on all JSON objects, remove separate attribute on Twitter media JustAnotherArchivist 2021-05-22 18:14:54 +00:00
  • 97c8caea48 Set Accept-Language header on API requests to English JustAnotherArchivist 2021-04-20 01:50:14 +00:00
  • a34f93076a Merge pull request #218 from NoeCampos22/Place_Data JustAnotherArchivist 2021-04-20 01:45:22 +00:00
  • 8f1c470061 Tweet.place to Place dataclass NoeCampos22 2021-04-19 15:13:33 -05:00
  • dbf2a2f689 Get more data from the place NoeCampos22 2021-04-19 12:01:14 -05:00
  • 39a34a57ac Handle API endpoints that don't include geolocation data (e.g. twitter-profile scraper) JustAnotherArchivist 2021-04-13 20:15:42 +00:00
  • f44b39705a Fix coordinate extraction from place bounding boxes JustAnotherArchivist 2021-04-06 20:53:05 +00:00
  • f64ce217b7 Merge pull request #209 from Lukpier/master JustAnotherArchivist 2021-04-06 16:19:33 +00:00
  • cdf87f4b8f Retrieve tweet location Luca Pierri 2021-04-04 22:11:30 +02:00
  • 47fbc2a84d Add note on features exclusive to the dev version JustAnotherArchivist 2021-02-24 19:39:45 +00:00
  • 5cd3b7d7cc Fix crash on rare weird 503 responses from Twitter without content JustAnotherArchivist 2021-01-26 22:39:02 +00:00
  • 0121fa51c2 Fix crash on users with a broken URL in the profile description JustAnotherArchivist 2021-01-26 18:33:34 +00:00
  • 892941b609 Fix crash on reposts of hidden profiles JustAnotherArchivist 2020-12-13 23:22:17 +00:00
  • e3022628b6 Fix crash on photo reposts JustAnotherArchivist 2020-12-13 22:46:28 +00:00
  • fdc33d0dba Include properties in JSON representation JustAnotherArchivist 2020-11-05 05:55:26 +00:00
  • 6d6411cc24 Fix KeyError on entity for inexistent Twitter accounts JustAnotherArchivist 2020-11-03 23:21:28 +00:00
  • 61a1ecffc5 Merge pull request #141 from gitshrl/twitter/split-source-url-label JustAnotherArchivist 2020-10-27 18:44:10 +00:00
  • d2dce37fa0 add the original tweet source sahrul 2020-10-27 13:21:21 +07:00
  • d65f0434da split source into url and label sahrul 2020-10-26 16:46:10 +07:00
  • 7499384110 Merge pull request #131 from gitshrl/facebook/fix-group-pagination JustAnotherArchivist 2020-10-21 15:08:50 +00:00
  • 7a0f68b7ec fix pagination for facebook group scraper sahrul 2020-10-21 21:30:00 +07:00
  • 1a219fd2b6 Merge pull request #129 from gitshrl/facebook/fix-group-scraper JustAnotherArchivist 2020-10-21 14:03:59 +00:00
  • 6fb98dae12 update base url for facebook group scraper sahrul 2020-10-21 19:57:02 +07:00
  • 8c2c0fa47a Remove workaround for http://bugs.python.org/issue16308 as snscrape requires 3.8+ now anyway JustAnotherArchivist 2020-10-18 20:25:54 +00:00
  • 58c8365c33 Add test extra requirements JustAnotherArchivist 2020-10-18 20:03:29 +00:00
  • 2c11ec38fa Replace requests.models with plain requests JustAnotherArchivist 2020-10-18 02:35:55 +00:00
  • fe5e23502d collections.deque support and other minor improvements to snscrape._cli._repr JustAnotherArchivist 2020-10-18 02:10:35 +00:00
  • 644cd1d2fb Add support for various further complicated types to snscrape._cli._repr JustAnotherArchivist 2020-10-18 01:42:45 +00:00
  • 5ccfab6314 Add .gitignore JustAnotherArchivist 2020-10-18 01:14:04 +00:00
  • bf895ea5b1 Minor README cleanup JustAnotherArchivist 2020-10-17 21:21:20 +00:00
  • e956e2562b Replace pkg_resources with importlib.metadata JustAnotherArchivist 2020-10-17 21:16:45 +00:00
  • defe874bf4 Fix date extraction on VK JustAnotherArchivist 2020-10-17 02:22:15 +00:00
  • 3f8935ee4d Fix crash on video reposts JustAnotherArchivist 2020-10-17 02:20:40 +00:00
  • cd12500dbf Fix date extraction on quoted posts JustAnotherArchivist 2020-10-17 02:13:27 +00:00
  • 5dc61d50ac Add support for outlinks, photos, videos, and quoted posts on VK JustAnotherArchivist 2020-10-17 00:07:26 +00:00
  • 11a82e110a Remove obsolete comment JustAnotherArchivist 2020-10-16 18:37:51 +00:00
  • 16ebe8bf48 Introduce dedicated IntWithGranularity type and deprecate the direct *Granularity fields JustAnotherArchivist 2020-10-16 18:20:47 +00:00
  • 1bbe25647a Refactor deprecated properties JustAnotherArchivist 2020-10-16 18:11:52 +00:00
  • e22b461563 Add Python 3.9 classifier JustAnotherArchivist 2020-10-16 01:27:17 +00:00
  • c4a5715e18 Fix Facebook user and community scrapers JustAnotherArchivist 2020-10-16 01:20:50 +00:00
  • 5cb64faa72 Formally deprecate the already deprecated item attributes JustAnotherArchivist 2020-10-16 00:55:55 +00:00
  • 0f78aa45fc Refactor --format handling to avoid conversion to dict JustAnotherArchivist 2020-10-16 00:55:14 +00:00
  • 179112a310 Fix --format JustAnotherArchivist 2020-10-16 00:27:13 +00:00
  • 4ce9ed4eb3 Add --progress option that prints a status update every 100 results and at the end JustAnotherArchivist 2020-10-16 00:00:43 +00:00
  • 11414cb68f Rename cli module to make it clear that it is considered private API JustAnotherArchivist 2020-10-15 23:47:07 +00:00
  • bd53e729a0 Replace named tuples with dataclasses and move JSON conversion logic to the base classes JustAnotherArchivist 2020-10-15 23:41:30 +00:00
  • ffd9289edc Reduce the logging level of retryable retrieval errors from WARNING to INFO JustAnotherArchivist 2020-10-11 22:29:27 +00:00
  • b1a7b9607f Skip individual Telegram photo/video links JustAnotherArchivist 2020-10-07 01:27:26 +00:00
  • 119e53d07c Fix Telegram post URL extraction JustAnotherArchivist 2020-10-07 01:15:51 +00:00
  • c3e2e12369 Deprecate outlinksss JustAnotherArchivist 2020-10-01 22:00:26 +00:00
  • a70b361176 Use more assignment expressions where appropriate JustAnotherArchivist 2020-10-01 21:41:44 +00:00
  • 8b68f1a8af Fix link previews for pure-image previews JustAnotherArchivist 2020-10-01 18:56:55 +00:00
  • c72bf3174f Use assignment expressions for cleaner code JustAnotherArchivist 2020-10-01 18:54:57 +00:00
  • 472cef2382 Add support for link previews JustAnotherArchivist 2020-10-01 18:51:14 +00:00
  • b1d8475a03 Fix link extraction on Telegram JustAnotherArchivist 2020-10-01 18:29:08 +00:00
  • 3d3faf80bf Add python_requires to make it even clearer that 3.8+ is required JustAnotherArchivist 2020-09-26 16:32:00 +00:00
  • bbb372284b Bump Python version in README JustAnotherArchivist 2020-09-26 15:56:55 +00:00