mirror of
https://github.com/bellingcat/vk-url-scraper.git
synced 2026-06-08 03:18:37 +03:00
187cfa83c88d8b9c6db074c8e9086f913b0bc9bf
vk-url-scraper
Library to scrape data and especially media links (videos and photos) from vk.com URLs.
TODO
- docs online from sphinx
Quick usage
Run `pip install vk-url-scraper` to install.
from vk_url_scraper import VkScraper
vks = VkScraper("username", "password")
# scrape any "photo" URL
res = vks.scrape("https://vk.com/photo1_278184324?rev=1")
# scrape any "wall" URL
res = vks.scrape("https://vk.com/wall-1_398461")
# scrape any "video" URL
res = vks.scrape("https://vk.com/video-6596301_145810025")
print(res[0]["text"])  # e.g. get the text of the first result
# Every scrape* function returns a list of dicts like
{
"id": "wall_id",
"text": "text in this post" ,
"datetime": utc datetime of post,
"attachments": {
# if photo, video, link exists
"photo": [list of urls with max quality],
"video": [list of urls with max quality],
"link": [list of urls with max quality],
},
"payload": "original JSON response converted to dict which you can parse for more data"
}
see [docs] for all available functions.
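As a minimal sketch of working with that return value, the snippet below flattens the media URLs out of a list of results. The sample data is hypothetical and only mirrors the structure documented above. It assumes an attachment type may be absent from a result, as the "if photo, video, link exists" comment suggests. `collect_media_urls` is an illustrative helper, not part of the library.

```python
from datetime import datetime, timezone


def collect_media_urls(results, kinds=("photo", "video")):
    """Return all attachment URLs of the given kinds, in result order."""
    urls = []
    for item in results:
        attachments = item.get("attachments", {})
        for kind in kinds:
            # Assumption: a kind is simply missing when the post has none.
            urls.extend(attachments.get(kind, []))
    return urls


# Hypothetical data shaped like the documented return value.
sample_results = [
    {
        "id": "wall-1_1",
        "text": "text in this post",
        "datetime": datetime(2022, 1, 1, tzinfo=timezone.utc),
        "attachments": {
            "photo": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
            "link": ["https://example.com/article"],
        },
        "payload": {},
    },
    {
        "id": "video-1_2",
        "text": "",
        "datetime": datetime(2022, 1, 2, tzinfo=timezone.utc),
        "attachments": {"video": ["https://example.com/c.mp4"]},
        "payload": {},
    },
]

print(collect_media_urls(sample_results))
```

With real data, `sample_results` would be replaced by the list returned from `vks.scrape(...)`.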
Development
- Set up the environment with `pip install -r requirements` or `pipenv install -r requirements`
- To run all checks, use `make run-checks` (fixes style), or run them individually:
  - To fix style: `black .` and `isort .` -> `flake8 .` to validate lint
  - To do type checking: `mypy .`
  - To test: `pytest .` (`pytest -v --color=yes --doctest-modules tests/ vk_url_scraper/` to use verbose output, colors, and test docstring examples)
- `make docs` to generate Sphinx docs -> edit config.py if needed
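For illustration, the individual checks listed above could also be chained from Python, stopping at the first failure. This is only a sketch: it is not what `make run-checks` does internally (the Makefile is not shown here), and `run_checks` and `fake_runner` are hypothetical names. The demonstration uses a stand-in runner so it does not require the tools to be installed.

```python
import subprocess

# Commands taken from the list above, in the order they are described.
CHECK_COMMANDS = [
    ["black", "."],
    ["isort", "."],
    ["flake8", "."],
    ["mypy", "."],
    ["pytest", "."],
]


def run_checks(commands=CHECK_COMMANDS, runner=subprocess.call):
    """Run each command in turn; return the first failing command, or None."""
    for command in commands:
        # The runner returns an exit code; non-zero is treated as failure.
        if runner(command) != 0:
            return command
    return None


def fake_runner(command):
    """Stand-in runner for a dry run: pretends that only flake8 fails."""
    return 1 if command[0] == "flake8" else 0


print(run_checks(runner=fake_runner))  # -> ['flake8', '.']
```

Calling `run_checks()` without a `runner` argument would execute the real tools through `subprocess.call`, which assumes they are installed and on the PATH.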
Releasing new version
- edit version.py with proper versioning
- `git tag vx.y.z` to tag version
- `git push origin vx.y.z` -> this will trigger workflow and put project on PyPI
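Before tagging, it can help to confirm that the tag agrees with the version string. The sketch below assumes the `vx.y.z` pattern above means a `v` followed by three dot-separated numbers. `tag_matches_version` is a hypothetical helper, and nothing is assumed about how version.py stores the version, so the version is passed in as a plain string.

```python
import re

# Assumed tag format: "v" followed by MAJOR.MINOR.PATCH, e.g. "v1.2.3".
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def tag_matches_version(tag, version):
    """Return True if tag is a well-formed vX.Y.Z tag for the given version."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return False
    return ".".join(match.groups()) == version


print(tag_matches_version("v1.2.3", "1.2.3"))  # -> True
print(tag_matches_version("1.2.3", "1.2.3"))   # -> False (missing "v" prefix)
```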