mirror of
https://github.com/bellingcat/vk-url-scraper.git
synced 2026-06-12 21:38:36 +03:00
adds command line interface
35
README.md
@@ -1,11 +1,37 @@
# vk-url-scraper
Library to scrape data and especially media links (videos and photos) from vk.com URLs.

You can use it via the [command line](#command-line-usage) or as a [python library](#python-library-usage).

## Quick usage API
`pip install vk-url-scraper` to install.
## Installation
You can install the most recent release from [pypi](https://pypi.org/project/vk-url-scraper/) via `pip install vk-url-scraper`.

To use the library you will need a valid username/password combination for vk.com.

## Command line usage
```bash
# run this to learn more about the parameters
vk_url_scraper --help

# scrape a URL and get the JSON result in the console
vk_url_scraper --username "username here" --password "password here" --urls https://vk.com/wall12345_6789
# OR
vk_url_scraper -u "username here" -p "password here" --urls https://vk.com/wall12345_6789
# you can also have multiple URLs
vk_url_scraper -u "username here" -p "password here" --urls https://vk.com/wall12345_6789 https://vk.com/photo-12345_6789 https://vk.com/video12345_6789

# save the JSON output into a file
vk_url_scraper -u "username here" -p "password here" --urls https://vk.com/wall12345_6789 > output.json

# download any photos or videos found in these URLs
# this will use or create an output/ folder and dump the files there
vk_url_scraper -u "username here" -p "password here" --download --urls https://vk.com/wall12345_6789
# or
vk_url_scraper -u "username here" -p "password here" -d --urls https://vk.com/wall12345_6789
```
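Once the JSON output has been redirected to a file, it can be post-processed with a few lines of Python. The following is a minimal sketch, assuming the saved output is a JSON list of objects that each may carry a `text` field (the README's own library example reads `res[0]["text"]`). The helper name `load_texts` is hypothetical and not part of the package.

```python
import json
from pathlib import Path


def load_texts(path):
    """Return the "text" field of every item in a saved JSON output file.

    Assumption: the file holds a JSON list of objects. Items without a
    "text" key are skipped instead of raising KeyError.
    """
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    return [item["text"] for item in items if "text" in item]
```

For example, `load_texts("output.json")` would return the post texts from the file written by the redirect shown above, provided the output has the assumed shape.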

## Python library usage
```python
from vk_url_scraper import VkScraper

@@ -41,6 +67,8 @@ print(res[0]["text"]) # eg: -> to get the text from code
see [docs] for all available functions.

### TODO
* scrape album links
* scrape profile links
* docs online from sphinx

## Development
@@ -54,6 +82,9 @@ see [docs] for all available functions.
3. To test: `pytest .` (`pytest -v --color=yes --doctest-modules tests/ vk_url_scraper/` to use verbose output, colors, and test docstring examples)
4. `make docs` to generate Sphinx docs -> edit [conf.py](docs/source/conf.py) if needed

To test the command line interface available in [__main__.py](vk_url_scraper/__main__.py), pass the `-m` option to python like so: `python -m vk_url_scraper -u "" -p "" --urls ...`
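A command line entry point of this kind is commonly built with `argparse`. The following is a minimal sketch and not the project's actual `__main__.py`: it assumes only the flags shown in the command line usage examples (`-u/--username`, `-p/--password`, `--urls`, `-d/--download`). The function name `build_parser` and the choice to mark options as required are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the README examples."""
    parser = argparse.ArgumentParser(prog="vk_url_scraper")
    # Assumption: credentials are mandatory, since the README says a valid
    # username/password combination is needed.
    parser.add_argument("-u", "--username", required=True, help="vk.com username")
    parser.add_argument("-p", "--password", required=True, help="vk.com password")
    # nargs="+" collects one or more URLs into a list.
    parser.add_argument("--urls", nargs="+", required=True, help="one or more vk.com URLs")
    # store_true makes --download a boolean switch that defaults to False.
    parser.add_argument("-d", "--download", action="store_true", help="download media found")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-u", "user", "-p", "secret", "--urls", "https://vk.com/wall12345_6789", "-d"]
)
print(args.urls, args.download)
```

Running a package with `python -m` executes its `__main__.py`, which is why the `-m` option is needed when testing the interface from a source checkout.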

## Releasing new version
1. edit [version.py](vk_url_scraper/version.py) with proper versioning
2. run `./scripts/release.sh` to create a tag and push, alternatively