mirror of
https://github.com/bellingcat/vk-url-scraper.git
synced 2026-06-11 21:08:37 +03:00
# vk-url-scraper

Library to scrape data, especially media links (videos and photos), from vk.com URLs.

# TODO

* Publish the Sphinx-generated docs online

## Quick usage

`pip install vk-url-scraper` to install.

```python
from vk_url_scraper import VkScraper

vks = VkScraper("username", "password")

# scrape any "photo" URL
res = vks.scrape("https://vk.com/photo1_278184324?rev=1")

# scrape any "wall" URL
res = vks.scrape("https://vk.com/wall-1_398461")

# scrape any "video" URL
res = vks.scrape("https://vk.com/video-6596301_145810025")
print(res[0]["text"])  # e.g. to get the text of the first result
```
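
The three example URLs share a `<kind><owner_id>_<item_id>` shape. As an illustration only, the sketch below pulls those parts out of a URL with a regular expression. The names `parse_vk_url` and `VkItem` are hypothetical and not part of `vk-url-scraper`, and the pattern is inferred from the example URLs rather than from the library's own parsing, which may differ.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from the example URLs only:
# a kind ("photo", "wall", "video"), an owner id that may be negative, "_", an item id.
_VK_ITEM = re.compile(r"(photo|wall|video)(-?\d+)_(\d+)")


class VkItem(NamedTuple):
    kind: str      # "photo", "wall" or "video"
    owner_id: int  # may be negative, as in the wall and video examples
    item_id: int


def parse_vk_url(url: str) -> Optional[VkItem]:
    """Return the kind and ids found in a vk.com URL, or None if nothing matches."""
    match = _VK_ITEM.search(url)
    if match is None:
        return None
    kind, owner_id, item_id = match.groups()
    return VkItem(kind, int(owner_id), int(item_id))


item = parse_vk_url("https://vk.com/video-6596301_145810025")
print(item)
```

A check like this can be handy for filtering a list of URLs before passing them to `vks.scrape`, but it is not required for using the library.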

```python
# Every scrape* function returns a list of dicts like this one
# (the values below are descriptions of each field, not real data)
{
    "id": "wall_id",
    "text": "text in this post",
    "datetime": "utc datetime of post",
    "attachments": {
        # each key is present only if a photo, video, or link exists
        "photo": ["list of urls with max quality"],
        "video": ["list of urls with max quality"],
        "link": ["list of urls with max quality"],
    },
    "payload": "original JSON response converted to dict which you can parse for more data",
}
```
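
As a minimal sketch of working with that structure, the helper below merges the attachment URLs of several results into one dict keyed by attachment type. `collect_media_urls` is a hypothetical helper, not a library function, and the sample data is made up to match the documented shape. Treating `datetime` as a `datetime` object is an assumption.

```python
from datetime import datetime, timezone
from typing import Dict, List


def collect_media_urls(results: List[dict]) -> Dict[str, List[str]]:
    """Merge the attachment URLs of every result into one dict keyed by type."""
    merged: Dict[str, List[str]] = {"photo": [], "video": [], "link": []}
    for item in results:
        # "attachments" only holds the keys that exist for that post
        for kind, urls in item.get("attachments", {}).items():
            merged.setdefault(kind, []).extend(urls)
    return merged


# Made-up sample in the documented shape (not real scraper output).
# Assumption: "datetime" is a timezone-aware datetime object.
sample = [
    {
        "id": "wall-1_1",
        "text": "first post",
        "datetime": datetime(2022, 1, 1, tzinfo=timezone.utc),
        "attachments": {"photo": ["https://example.com/a.jpg"]},
        "payload": {},
    },
    {
        "id": "wall-1_2",
        "text": "second post",
        "datetime": datetime(2022, 1, 2, tzinfo=timezone.utc),
        "attachments": {
            "video": ["https://example.com/b.mp4"],
            "link": ["https://example.com"],
        },
        "payload": {},
    },
]

media = collect_media_urls(sample)
print(media["photo"])
```

With real data, `sample` would be replaced by the list returned from `vks.scrape(...)`.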

See [docs] for all available functions.

### Development

1. Set up the environment with `pip install -r requirements` or `pipenv install -r requirements`
2. To run all checks, use `make run-checks` (this also fixes style), or run them individually:
   1. To fix style: `black .` and `isort .` -> `flake8 .` to validate lint
   2. To do type checking: `mypy .`
   3. To test: `pytest .` (`pytest -v --color=yes --doctest-modules tests/ vk_url_scraper/` to use verbose output, colors, and test the docstring examples)
3. `make docs` to generate the Sphinx docs -> edit [conf.py](docs/source/conf.py) if needed

### Releasing new version

1. Edit [version.py](vk_url_scraper/version.py) with the new version number
2. `git tag vx.y.z` to tag the version
3. `git push origin vx.y.z` -> this will trigger the workflow and publish the project on [pypi](https://pypi.org/project/vk-url-scraper/)
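
To avoid pushing a mistyped tag, the tag name can be derived from the version string instead of typed by hand. The sketch below is a hypothetical helper, not part of the project. It assumes versions are plain `x.y.z` numbers, as the `vx.y.z` tag format suggests, and it does not read `version.py` itself.

```python
import re

# Assumption: versions are three dot-separated numbers, e.g. "1.2.3".
_VERSION = re.compile(r"\d+\.\d+\.\d+")


def release_tag(version: str) -> str:
    """Return the git tag name for a version string by prefixing it with "v"."""
    if _VERSION.fullmatch(version) is None:
        raise ValueError(f"expected a version like '1.2.3', got {version!r}")
    return f"v{version}"


print(release_tag("1.2.3"))
```

The returned string is what would be passed to `git tag` and `git push origin` in steps 2 and 3.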