JustAnotherArchivist
a34f93076a
Merge pull request #218 from NoeCampos22/Place_Data
Extract more information on Twitter places
2021-04-20 01:45:22 +00:00
NoeCampos22
8f1c470061
Tweet.place to Place dataclass
2021-04-19 15:13:33 -05:00
NoeCampos22
dbf2a2f689
Get more data from the place
Data such as the country, the place type, and the single place name are now also returned in the JSON.
2021-04-19 12:01:14 -05:00
JustAnotherArchivist
39a34a57ac
Handle API endpoints that don't include geolocation data (e.g. twitter-profile scraper)
Fixes #215
2021-04-13 20:15:42 +00:00
JustAnotherArchivist
f44b39705a
Fix coordinate extraction from place bounding boxes
2021-04-06 20:53:05 +00:00
JustAnotherArchivist
f64ce217b7
Merge pull request #209 from Lukpier/master
Add tweet location (place full name & geo coordinates) where available
2021-04-06 16:19:33 +00:00
Luca Pierri
cdf87f4b8f
Retrieve tweet location
2021-04-06 16:08:34 +00:00
JustAnotherArchivist
47fbc2a84d
Add note on features exclusive to the dev version
Cf. #195
2021-02-24 19:39:45 +00:00
JustAnotherArchivist
5cd3b7d7cc
Fix crash on rare weird 503 responses from Twitter without content
2021-01-26 22:39:02 +00:00
JustAnotherArchivist
0121fa51c2
Fix crash on users with a broken URL in the profile description
2021-01-26 18:33:34 +00:00
JustAnotherArchivist
892941b609
Fix crash on reposts of hidden profiles
2020-12-13 23:22:17 +00:00
JustAnotherArchivist
e3022628b6
Fix crash on photo reposts
2020-12-13 22:46:28 +00:00
JustAnotherArchivist
fdc33d0dba
Include properties in JSON representation
This fixes the lack of the profile URL on Twitter users because it's generated using the username rather than set explicitly as a field.
2020-11-05 05:55:26 +00:00
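A minimal sketch of the idea behind this commit, assuming a simplified item class: the `User` class, its `to_json` method, and the URL pattern below are hypothetical illustrations, not snscrape's actual code. It shows how a value computed by a property can be added to the JSON output alongside the declared dataclass fields.

```python
import dataclasses
import json


@dataclasses.dataclass
class User:
    # Hypothetical, simplified item; not snscrape's actual User class.
    username: str

    @property
    def url(self) -> str:
        # Derived from the username rather than stored as a field.
        return f'https://twitter.com/{self.username}'

    def to_json(self) -> str:
        # Start from the declared dataclass fields ...
        out = dataclasses.asdict(self)
        # ... then add every property defined directly on this class.
        for name, attr in vars(type(self)).items():
            if isinstance(attr, property):
                out[name] = getattr(self, name)
        return json.dumps(out)


user_json = User('example').to_json()
```

This sketch only inspects the class itself; properties inherited from base classes would need a walk over the method resolution order.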
JustAnotherArchivist
6d6411cc24
Fix KeyError on entity for nonexistent Twitter accounts
2020-11-03 23:21:28 +00:00
JustAnotherArchivist
61a1ecffc5
Merge pull request #141 from gitshrl/twitter/split-source-url-label
Split tweet source into URL and label
2020-10-27 18:44:10 +00:00
sahrul
d2dce37fa0
add the original tweet source
2020-10-27 13:21:21 +07:00
sahrul
d65f0434da
split source into url and label
2020-10-26 16:46:10 +07:00
JustAnotherArchivist
7499384110
Merge pull request #131 from gitshrl/facebook/fix-group-pagination
Fix pagination error for Facebook group scraper
2020-10-21 15:08:50 +00:00
sahrul
7a0f68b7ec
fix pagination for facebook group scraper
2020-10-21 21:30:00 +07:00
JustAnotherArchivist
1a219fd2b6
Merge pull request #129 from gitshrl/facebook/fix-group-scraper
Update base URL for Facebook group scraper
2020-10-21 14:03:59 +00:00
sahrul
6fb98dae12
update base url for facebook group scraper
2020-10-21 19:57:02 +07:00
JustAnotherArchivist
8c2c0fa47a
Remove workaround for http://bugs.python.org/issue16308 as snscrape requires 3.8+ now anyway
2020-10-18 20:25:54 +00:00
JustAnotherArchivist
58c8365c33
Add test extra requirements
2020-10-18 20:03:29 +00:00
JustAnotherArchivist
2c11ec38fa
Replace requests.models with plain requests
requests.models is all but undocumented, and the three types needed here are all in the requests namespace as well.
2020-10-18 02:35:55 +00:00
JustAnotherArchivist
fe5e23502d
collections.deque support and other minor improvements to snscrape._cli._repr
2020-10-18 02:12:09 +00:00
JustAnotherArchivist
644cd1d2fb
Add support for various further complicated types to snscrape._cli._repr
2020-10-18 01:42:45 +00:00
JustAnotherArchivist
5ccfab6314
Add .gitignore
2020-10-18 01:14:04 +00:00
JustAnotherArchivist
bf895ea5b1
Minor README cleanup
2020-10-17 21:21:20 +00:00
JustAnotherArchivist
e956e2562b
Replace pkg_resources with importlib.metadata
2020-10-17 21:16:45 +00:00
JustAnotherArchivist
defe874bf4
Fix date extraction on VK
Only the most recent posts have the nice timestamp property...
2020-10-17 02:22:15 +00:00
JustAnotherArchivist
3f8935ee4d
Fix crash on video reposts
2020-10-17 02:20:40 +00:00
JustAnotherArchivist
cd12500dbf
Fix date extraction on quoted posts
2020-10-17 02:13:27 +00:00
JustAnotherArchivist
5dc61d50ac
Add support for outlinks, photos, videos, and quoted posts on VK
2020-10-17 00:07:26 +00:00
JustAnotherArchivist
11a82e110a
Remove obsolete comment
Cf. f296f9d2
2020-10-16 18:37:51 +00:00
JustAnotherArchivist
16ebe8bf48
Introduce dedicated IntWithGranularity type and deprecate the direct *Granularity fields
2020-10-16 18:20:47 +00:00
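One way such a type could look, as a rough sketch only: the commit names `IntWithGranularity` but does not show its definition, so the constructor signature and the `granularity` attribute below are assumptions. The idea illustrated is an integer that also carries how coarse the reported number is, instead of a separate `*Granularity` field next to it.

```python
class IntWithGranularity(int):
    """Assumed shape: an int that remembers the precision of its source value."""

    def __new__(cls, value: int, granularity: int):
        obj = super().__new__(cls, value)
        # Extra attribute stored on the int instance (assumption, see lead-in).
        obj.granularity = granularity
        return obj


# A count displayed as "1.2K" is only known to the nearest hundred.
likes = IntWithGranularity(1200, 100)
```

The value still behaves like a plain `int` in comparisons and arithmetic, although results of arithmetic are ordinary integers without the extra attribute.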
JustAnotherArchivist
1bbe25647a
Refactor deprecated properties
2020-10-16 18:11:52 +00:00
JustAnotherArchivist
e22b461563
Add Python 3.9 classifier
2020-10-16 01:27:17 +00:00
JustAnotherArchivist
c4a5715e18
Fix Facebook user and community scrapers
Facebook is redirecting the previous user agent to the mobile site; use current Firefox ESR instead.
2020-10-16 01:20:50 +00:00
JustAnotherArchivist
5cb64faa72
Formally deprecate the already deprecated item attributes
2020-10-16 00:55:55 +00:00
JustAnotherArchivist
0f78aa45fc
Refactor --format handling to avoid conversion to dict
2020-10-16 00:55:14 +00:00
JustAnotherArchivist
179112a310
Fix --format
Broken by the switch to dataclasses in bd53e729
2020-10-16 00:27:13 +00:00
JustAnotherArchivist
4ce9ed4eb3
Add --progress option that prints a status update every 100 results and at the end
Closes #116
2020-10-16 00:00:43 +00:00
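The behaviour described for --progress (a status update every 100 results and one at the end) can be sketched as a generator wrapper. The function name `with_progress`, its parameters, and the message wording are hypothetical; the actual CLI code may be structured differently.

```python
import io
import sys
from typing import Iterable, Iterator, TextIO, TypeVar

T = TypeVar('T')


def with_progress(items: Iterable[T], every: int = 100, stream: TextIO = sys.stderr) -> Iterator[T]:
    """Yield items unchanged while reporting the running count to `stream`."""
    count = 0
    for count, item in enumerate(items, start=1):
        yield item
        if count % every == 0:
            print(f'{count} results so far', file=stream)
    # Final update once the underlying iterable is exhausted.
    print(f'Finished, {count} results', file=stream)


# Usage: capture the status lines in a buffer instead of stderr.
buffer = io.StringIO()
results = list(with_progress(range(250), stream=buffer))
```

Writing the updates to a separate stream keeps them out of the result output, so piping the results elsewhere is unaffected.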
JustAnotherArchivist
11414cb68f
Rename cli module to make it clear that it is considered private API
2020-10-15 23:47:07 +00:00
JustAnotherArchivist
bd53e729a0
Replace named tuples with dataclasses and move JSON conversion logic to the base classes
Named tuples were never really adequate for this since the order aspect of them doesn't make sense.
Further, named tuples don't support multiple inheritance. This meant that the objects returned by get_items() were not actually Items, for example. Since Python 3.9, such named tuples cannot be created anymore.
Fixes #111
2020-10-15 23:44:28 +00:00
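A small illustration of the point about inheritance, under stated assumptions: `Item` and `Post` below are hypothetical stand-ins, not the library's real classes. A dataclass can derive from a common base class, so instances really are items, and field order carries no meaning for users of the object.

```python
import dataclasses


class Item:
    """Hypothetical stand-in for a common base class of scraped items."""


@dataclasses.dataclass
class Post(Item):
    # Illustrative fields only.
    url: str
    content: str


post = Post(url='https://example.org/1', content='hello')
is_item = isinstance(post, Item)
as_dict = dataclasses.asdict(post)
```

Fields are accessed by name (`post.url`) rather than by position, which matches how such records are normally used.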
JustAnotherArchivist
ffd9289edc
Reduce the logging level of retryable retrieval errors from WARNING to INFO
There is no real need to report these as WARNINGs as snscrape tries and in most cases manages to recover. Without --verbose, snscrape's output can be confusing (see #76). If the retries fail as well, snscrape will still log that as an ERROR and crash loudly.
2020-10-11 22:29:27 +00:00
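The logging policy described here can be sketched roughly as follows. The function `fetch_with_retries`, the logger name, the retry count, and the choice of `ConnectionError` as the retryable error are all assumptions made for illustration; only the INFO-then-ERROR pattern comes from the commit message.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar('T')
logger = logging.getLogger('retry_sketch')  # hypothetical logger name


def fetch_with_retries(fetch: Callable[[], T], retries: int = 3) -> T:
    """Call `fetch` up to `retries` times, logging recoverable failures at INFO."""
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except ConnectionError as exc:
            # Recoverable for now: the loop tries again, so INFO is enough.
            logger.info('Attempt %d/%d failed: %s', attempt, retries, exc)
    # All attempts failed: report loudly and abort.
    logger.error('Giving up after %d attempts', retries)
    raise RuntimeError(f'retrieval failed after {retries} attempts')
```

With the default logging configuration, the INFO messages stay hidden and only the final ERROR is shown if every attempt fails.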
JustAnotherArchivist
b1a7b9607f
Skip individual Telegram photo/video links
2020-10-07 01:27:26 +00:00
JustAnotherArchivist
119e53d07c
Fix Telegram post URL extraction
2020-10-07 01:15:51 +00:00
JustAnotherArchivist
c3e2e12369
Deprecate outlinksss
2020-10-01 22:00:26 +00:00
JustAnotherArchivist
a70b361176
Use more assignment expressions where appropriate
2020-10-01 21:45:25 +00:00
JustAnotherArchivist
8b68f1a8af
Fix link previews for pure-image previews
... and any other preview that doesn't have all the things for some reason.
2020-10-01 18:56:55 +00:00