bellingcat/vk-url-scraper

mirror of https://github.com/bellingcat/vk-url-scraper.git synced 2026-06-08 03:18:37 +03:00

msramalho 187cfa83c8 docs

2022-06-18 00:11:24 +02:00

vk-url-scraper

Library to scrape data and especially media links (videos and photos) from vk.com URLs.

TODO

Publish the Sphinx docs online.

Quick usage

Install with pip install vk-url-scraper.

from vk_url_scraper import VkScraper

vks = VkScraper("username", "password")

# scrape any "photo" URL
res = vks.scrape("https://vk.com/photo1_278184324?rev=1")

# scrape any "wall" URL
res = vks.scrape("https://vk.com/wall-1_398461")

# scrape any "video" URL
res = vks.scrape("https://vk.com/video-6596301_145810025")
print(res[0]["text"])  # e.g. print the text of the first result

# Every scrape* function returns a list of dicts shaped like:
# {
#     "id": "wall_id",
#     "text": "text in this post",
#     "datetime": <UTC datetime of the post>,
#     "attachments": {
#         # present only if a photo, video, or link exists
#         "photo": [<list of URLs at max quality>],
#         "video": [<list of URLs at max quality>],
#         "link": [<list of URLs>],
#     },
#     "payload": <original JSON response converted to a dict, which you can parse for more data>,
# }

See the [docs] for all available functions.
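Since every scrape call returns the same list-of-dicts shape, post-processing can be written once and reused. The following is a minimal sketch, not part of the library: collect_media_urls is a hypothetical helper, and the sample data is hand-written to match the documented shape rather than real scraper output.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def collect_media_urls(results: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group every attachment URL by kind across a list of scrape results."""
    collected: Dict[str, List[str]] = {"photo": [], "video": [], "link": []}
    for item in results:
        # Tolerate results where "attachments" or one of its keys is missing.
        for kind, urls in item.get("attachments", {}).items():
            collected.setdefault(kind, []).extend(urls)
    return collected


# Hand-written sample in the documented shape (NOT real scraper output).
sample = [
    {
        "id": "wall-1_1",
        "text": "first post",
        "datetime": datetime(2022, 6, 17, 12, 0, tzinfo=timezone.utc),
        "attachments": {"photo": ["https://example.com/a.jpg"]},
        "payload": {},
    },
    {
        "id": "wall-1_2",
        "text": "second post",
        "datetime": datetime(2022, 6, 18, 9, 30, tzinfo=timezone.utc),
        "attachments": {
            "photo": ["https://example.com/b.jpg"],
            "video": ["https://example.com/c.mp4"],
        },
        "payload": {},
    },
]

media = collect_media_urls(sample)
# The "datetime" field makes it easy to pick the most recent post.
newest = max(sample, key=lambda item: item["datetime"])
print(media["photo"])
print(newest["id"])
```

With real data, the list returned by vks.scrape(...) would take the place of sample.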

Development

Set up the environment with pip install -r requirements.txt or pipenv install -r requirements.txt.
To run all checks, use make run-checks (this also fixes style), or run them individually:
1. To fix style: black . and isort ., then flake8 . to validate lint
2. To do type checking: mypy .
3. To test: pytest . (or pytest -v --color=yes --doctest-modules tests/ vk_url_scraper/ for verbose output, colors, and testing docstring examples)
Run make docs to generate the Sphinx docs; edit config.py if needed.
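The individual checks above can also be chained from Python, stopping at the first failure. This is only a sketch of that idea, not the contents of the Makefile: run_checks and its runner parameter are hypothetical names, and the command list simply repeats the steps listed in this section.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Commands copied from the steps in this section, in the same order.
CHECKS: List[List[str]] = [
    ["black", "."],
    ["isort", "."],
    ["flake8", "."],
    ["mypy", "."],
    ["pytest", "-v", "--color=yes", "--doctest-modules", "tests/", "vk_url_scraper/"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> List[Tuple[str, int]]:
    """Run each command in order and stop at the first non-zero exit code."""
    report: List[Tuple[str, int]] = []
    for cmd in checks:
        code = runner(cmd)
        report.append((" ".join(cmd), code))
        if code != 0:
            break
    return report


def fake_runner(cmd: Sequence[str]) -> int:
    """Stand-in runner for a dry run: pretends that only mypy fails."""
    return 1 if cmd[0] == "mypy" else 0


# Dry run: nothing is executed, the fake runner decides each exit code.
report = run_checks(runner=fake_runner)
for name, code in report:
    print("ok  " if code == 0 else "FAIL", name)
```

Calling run_checks() with no arguments would run the real tools, which requires them to be installed in the active environment.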

Releasing new version

Edit version.py with the new version number.
Run git tag vx.y.z to tag the version.
Run git push origin vx.y.z; this triggers the workflow that publishes the project to PyPI.
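Because the tag name has to follow the vx.y.z pattern, it can help to derive it from the version string instead of typing it by hand. A minimal sketch, assuming plain MAJOR.MINOR.PATCH versions with no pre-release suffix; tag_for_version is a hypothetical helper and does not read version.py itself.

```python
import re

# Assumption: versions are plain MAJOR.MINOR.PATCH, with no pre-release suffix.
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def tag_for_version(version: str) -> str:
    """Return the git tag name (vx.y.z) for a version string (x.y.z)."""
    if not _VERSION_RE.fullmatch(version):
        raise ValueError(f"expected x.y.z, got {version!r}")
    return f"v{version}"


print(tag_for_version("0.1.2"))
```

The returned string is what would be passed to git tag and git push origin.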