Mirror of https://github.com/bellingcat/vk-url-scraper.git (synced 2026-06-07 19:08:38 +03:00, commit c4a13334286385ace60155d1ed9a5d354fc5f8fd)
vk-url-scraper
Library to scrape data and especially media links (videos and photos) from vk.com URLs.
Quick usage API
Install with: pip install vk-url-scraper
from vk_url_scraper import VkScraper
vks = VkScraper("username", "password")
# scrape any "photo" URL
res = vks.scrape("https://vk.com/photo1_278184324?rev=1")
# scrape any "wall" URL
res = vks.scrape("https://vk.com/wall-1_398461")
# scrape any "video" URL
res = vks.scrape("https://vk.com/video-6596301_145810025")
print(res[0]["text"])  # e.g. print the text of the first scraped item
# Every scrape* function returns a list of dicts like:
{
    "id": "wall_id",
    "text": "text in this post",
    "datetime": ...,  # UTC datetime of the post
    "attachments": {
        # present if the post has photos, videos, or links
        "photo": [...],  # list of URLs at max quality
        "video": [...],  # list of URLs at max quality
        "link": [...],  # list of URLs
    },
    "payload": {...},  # original JSON response converted to a dict; parse it for more data
}
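Because every scrape* call returns a list of dicts in the shape above, collecting the media links is a plain loop over the "attachments" key. The sketch below is a hypothetical helper (collect_media_urls is not part of vk-url-scraper); it assumes only the documented layout, treats a missing attachment key as an empty list, and uses made-up sample data in place of a real scrape.

```python
from typing import Any, Dict, List, Sequence


def collect_media_urls(
    results: List[Dict[str, Any]],
    kinds: Sequence[str] = ("photo", "video"),
) -> Dict[str, List[str]]:
    """Gather attachment URLs of the given kinds from scrape() results.

    Hypothetical helper, not part of vk-url-scraper: it relies only on the
    documented result layout, where "attachments" holds lists of URLs and
    a key may be absent when the post has no media of that kind.
    """
    collected: Dict[str, List[str]] = {kind: [] for kind in kinds}
    for item in results:
        attachments = item.get("attachments") or {}
        for kind in kinds:
            # A missing key means this post has no attachment of that kind
            collected[kind].extend(attachments.get(kind, []))
    return collected


# Made-up sample shaped like the documented output; with a logged-in
# scraper you would pass vks.scrape(url) instead.
sample = [
    {"id": "wall_id", "text": "first post", "attachments": {"photo": ["https://example.com/a.jpg"]}},
    {"id": "wall_id", "text": "second post", "attachments": {"video": ["https://example.com/v.mp4"]}},
]
print(collect_media_urls(sample))
# {'photo': ['https://example.com/a.jpg'], 'video': ['https://example.com/v.mp4']}
```

Passing kinds=("link",) collects link attachments the same way.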
See the [docs] for all available functions.
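The quick-usage example passes scrape() three kinds of URL: photo, wall, and video, each written as the kind followed by two numeric IDs joined by an underscore. If you want to check URLs before scraping them, a regular expression over that shape is enough. This is a hypothetical pre-filter inferred only from the example URLs; it is not how vk-url-scraper parses URLs internally, and VK URLs in any other shape will simply return None.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern inferred from the example URLs above:
# https://vk.com/<kind><owner id, may be negative>_<item id>[?query or #fragment]
VK_URL_RE = re.compile(r"^https?://vk\.com/(photo|wall|video)(-?\d+)_(\d+)(?:[?#].*)?$")


def classify_vk_url(url: str) -> Optional[Tuple[str, int, int]]:
    """Return (kind, owner_id, item_id) for photo/wall/video URLs, or None."""
    match = VK_URL_RE.match(url)
    if match is None:
        return None
    kind, owner_id, item_id = match.groups()
    return kind, int(owner_id), int(item_id)


for url in [
    "https://vk.com/photo1_278184324?rev=1",
    "https://vk.com/wall-1_398461",
    "https://vk.com/video-6596301_145810025",
    "https://vk.com/some_page",
]:
    print(url, "->", classify_vk_url(url))
```

Only URLs that return a tuple would then be handed to vks.scrape.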
TODO
- publish the Sphinx docs online
Development
(more info in CONTRIBUTING.md)
- Set up the dev environment: pip install -r dev-requirements (or pipenv install -r dev-requirements)
- Set up the environment: pip install -r requirements (or pipenv install -r requirements)
- Run all checks (also fixes style): make run-checks
- Or run the checks individually:
  - Fix style: black . and isort .
  - Validate lint: flake8 .
  - Type checking: mypy .
  - Tests: pytest .
  - Verbose, colored tests that also run docstring examples: pytest -v --color=yes --doctest-modules tests/ vk_url_scraper/
- Generate the Sphinx docs: make docs (edit config.py if needed)
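make run-checks is the one-shot way to run everything; if make is unavailable, the individual steps listed above can be chained from Python instead. The sketch below is an assumption, not the project's Makefile: the CHECKS list simply copies the commands in this section, run_checks is a hypothetical name, and the runner parameter exists so the sequence can be tried without actually invoking the tools.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumed copy of the individual steps listed above; the real
# `make run-checks` target may use different commands or flags.
CHECKS: List[List[str]] = [
    ["black", "."],
    ["isort", "."],
    ["flake8", "."],
    ["mypy", "."],
    ["pytest", "-v", "--color=yes", "--doctest-modules", "tests/", "vk_url_scraper/"],
]


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run each check in order and return the first non-zero exit code, else 0.

    With the default runner, a tool that is not installed raises FileNotFoundError.
    """
    for cmd in checks:
        print("$", " ".join(cmd))
        result = runner(list(cmd))
        if result.returncode != 0:
            # Stop at the first failing check, like a chain of `&&` in a shell
            return result.returncode
    return 0
```

Call run_checks() from the repository root with the dev requirements installed.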
Releasing a new version
- Edit version.py to set the new version number
- Run ./scripts/release.sh to create a tag and push it. Alternatively, tag the version with git tag vx.y.z and push it with git push origin vx.y.z. Pushing the tag triggers the release workflow, which publishes the project on PyPI.
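The manual alternative to ./scripts/release.sh comes down to two git commands around a vx.y.z tag. A minimal Python wrapper is sketched below under stated assumptions: the version follows plain x.y.z numbering, the tag is that version prefixed with v, and the function receives the version string rather than reading version.py (whose contents are not shown here). release_commands and tag_and_push are hypothetical names, not part of the repository.

```python
import re
import subprocess
from typing import Callable, List

# Assumption: versions use plain x.y.z numbering, matching the vx.y.z tags above
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def release_commands(version: str) -> List[List[str]]:
    """Build the git commands that create the vx.y.z tag and push it to origin."""
    if VERSION_RE.fullmatch(version) is None:
        raise ValueError(f"expected a version like 1.2.3, got {version!r}")
    tag = f"v{version}"
    return [["git", "tag", tag], ["git", "push", "origin", tag]]


def tag_and_push(
    version: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> None:
    """Run the commands in order; check=True raises before pushing if tagging fails."""
    for cmd in release_commands(version):
        runner(cmd, check=True)


print(release_commands("1.2.3"))
# [['git', 'tag', 'v1.2.3'], ['git', 'push', 'origin', 'v1.2.3']]
```

Validating the version string before touching git means a typo fails locally instead of producing a stray tag on the remote.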
Fixing a failed release
If for some reason the GitHub Actions release workflow failed with an error that needs to be fixed, you'll have to delete both the tag and the corresponding release from GitHub. After you've pushed a fix, delete your local tags and re-fetch them from the remote with
git tag -l | xargs git tag -d && git fetch -t
Then repeat the steps above.