Compare commits

9 Commits

Author | SHA1 | Message | Date
msramalho | 5d30d18b7b | Bump version to v0.3.29 for release | 2024-07-16 16:05:59 +01:00
msramalho | b2d462441e | fixing issues with upstream vk api | 2024-07-16 16:05:35 +01:00
msramalho | 73f17407c0 | reverting library dependencies | 2024-01-23 18:09:56 +00:00
msramalho | 95d249f5d0 | min py to 3.10 | 2024-01-23 13:01:38 +00:00
msramalho | ccb8c1f5c7 | min python to 3.8 | 2024-01-23 12:50:55 +00:00
msramalho | e525ff24b1 | lint | 2024-01-23 12:45:45 +00:00
msramalho | 699b4ebdd8 | fix lib dependencies in pypi version | 2024-01-23 12:41:25 +00:00
msramalho | 8d1a86a7fa | fix captcha processing | 2024-01-23 12:41:14 +00:00
msramalho | b01dbe6299 | fix vk_api dependency changes | 2024-01-23 11:56:49 +00:00
13 changed files with 1668 additions and 1291 deletions

@@ -30,7 +30,7 @@ jobs:
 strategy:
 fail-fast: false
 matrix:
-python: ['3.7', '3.10']
+python: ['3.10']
 task: # --show-capture=no on purpose, -s for captchas
 - name: Test
 run: |

@@ -7,7 +7,7 @@ sphinx:
 build:
 os: "ubuntu-22.04"
 tools:
-python: "3.8"
+python: "3.10"
 python:
 install:

@@ -13,4 +13,4 @@ run-checks :
 black .
 flake8 .
 mypy .
-CUDA_VISIBLE_DEVICES='' pytest -v --color=yes --doctest-modules tests/ vk_url_scraper/
+CUDA_VISIBLE_DEVICES='' pytest -v --color=yes .

@@ -4,7 +4,6 @@ verify_ssl = true
 name = "pypi"
 [packages]
-vk-api = ">=11.9.9"
 yt-dlp = ">=2023.2.17"
 flake8 = "*"
 mypy = ">=0.961"
@@ -30,6 +29,9 @@ pycryptodomex = ">=3.17"
 requests = ">=2.28.2"
 urllib3 = ">=1.26.14"
 websockets = ">=10.4"
+# vk-api = {ref = "77b5a0d51a6bbf54d59554332f28a488615fbd6c", git = "git+https://github.com/python273/vk_api.git"}
+# vk-api = "*"
+vk-api = {ref = "b99dac0ec2f832a6c4b20bde49869e7229ce4742", git = "git+https://github.com/python273/vk_api.git"}
 [dev-packages]
 sphinx-copybutton = "==0.5.0"

Pipfile.lock (generated, 2797 changes): file diff suppressed because it is too large

@@ -20,7 +20,7 @@ To use the library you will need a valid username/password combination for vk.co
 vk_url_scraper --help
 # scrape a URL and get the JSON result in the console
-vk_url_scraper -username "username here" --password "password here" --urls https://vk.com/wall12345_6789
+vk_url_scraper --username "username here" --password "password here" --urls https://vk.com/wall12345_6789
 # OR
 vk_url_scraper -u "username here" -p "password here" --urls https://vk.com/wall12345_6789
 # you can also have multiple urls
@@ -89,7 +89,7 @@ see [docs] for all available functions.
 2. To run all checks to `make run-checks` (fixes style) or individually
 1. To fix style: `black .` and `isort .` -> `flake8 .` to validate lint
 2. To do type checking: `mypy .`
-3. To test: `pytest .` (`pytest -v --color=yes --doctest-modules tests/ vk_url_scraper/` to user verbose, colors, and test docstring examples)
+3. To test: `pytest .` (`pytest -v --color=yes --doctest-modules tests/ vk_url_scraper/` to use verbose, colors, and test docstring examples)
 3. `make docs` to generate shpynx docs -> edit [config.py](docs/source/conf.py) if needed
 To test the command line interface available in [__main__.py](__vk_url_scraper/__main__.py) you need to pass the `-m` option to python like so: `python -m vk_url_scraper -u "" -p "" --urls ...`
@@ -97,10 +97,11 @@ To test the command line interface available in [__main__.py](__vk_url_scraper/_
 ## Releasing new version
 1. edit [version.py](vk_url_scraper/version.py) with proper versioning
-2. run `./scripts/release.sh` to create a tag and push, alternatively
+2. make sure to run `pipenv run pip freeze > requirements.txt` if you manage libs with pipenv
+3. run `./scripts/release.sh` to create a tag and push, alternatively
 1. `git tag vx.y.z` to tag version
 2. `git push origin vx.y.z` -> this will trigger workflow and put project on [pypi](https://pypi.org/project/vk-url-scraper/)
-3. go to https://readthedocs.org/ to deploy new docs version (if webhook is not setup)
+4. go to https://readthedocs.org/ to deploy new docs version (if webhook is not setup)
 ### Fixing a failed release

@@ -1,7 +1,7 @@
 Installation
 ============
-**vk-url-scraper** supports Python >= 3.7.
+**vk-url-scraper** supports Python >= 3.10.
 ## Installing with `pip`
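
Note on the hunk above: the supported floor moves from Python 3.7 to 3.10. A minimal sketch of how a calling script could check the interpreter against that floor at runtime. The helper name `meets_minimum` is hypothetical and is not part of vk-url-scraper; only the `(3, 10)` value comes from the diff.

```python
import sys

# Minimum supported version after this change (from the diff: Python >= 3.10).
MINIMUM_PYTHON = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Hypothetical helper: True if (major, minor, ...) satisfies the minimum."""
    # Compare only major and minor so the patch level is ignored.
    return tuple(version_info[:2]) >= minimum


if not meets_minimum():
    print("vk-url-scraper needs Python >= 3.10; found %d.%d" % sys.version_info[:2])
```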

@@ -1,19 +1,103 @@
-#
-# These requirements were autogenerated by pipenv
-# To regenerate from the project's Pipfile, run:
-#
-# pipenv lock --requirements
-#
-
-# -i https://pypi.org/simple
-brotli>=1.0.9; platform_python_implementation >= 'CPython'
-certifi>=2022.12.7; python_version >= '3.6'
-charset-normalizer>=3.0.1; python_version >= '3.6'
-idna>=3.4; python_version >= '3.5'
-mutagen>=1.46.0; python_version >= '3.7'
-pycryptodomex>=3.17; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
-requests>=2.28.2; python_version >= '3.7' and python_version < '4'
-urllib3>=1.26.14; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4, 3.5'
-vk-api>=11.9.9
-websockets>=10.4; python_version >= '3.7'
-yt-dlp>=2023.2.17
+aiohttp==3.9.1
+aiosignal==1.3.1
+alabaster==0.7.16
+anyio==4.4.0
+async-timeout==4.0.3
+attrs==23.2.0
+Babel==2.15.0
+backports.tarfile==1.2.0
+beautifulsoup4==4.13.0b2
+black==24.4.2
+bleach==6.0.0
+Brotli==1.1.0
+certifi==2024.7.4
+cffi==1.17.0rc1
+charset-normalizer==3.3.2
+click==8.1.7
+colorama==0.4.6
+commonmark==0.9.1
+coverage==7.6.0
+cryptography==42.0.8
+docutils==0.18.1
+exceptiongroup==1.2.2
+flake8==7.1.0
+frozenlist==1.4.1
+furo==2023.3.27
+h11==0.14.0
+idna==3.7
+imagesize==1.4.1
+importlib_metadata==8.0.0
+iniconfig==2.0.0
+isort==6.0.0b2
+jaraco.classes==3.4.0
+jaraco.context==5.3.0
+jaraco.functools==4.0.1
+jeepney==0.8.0
+Jinja2==3.1.4
+keyring==25.2.1
+livereload==2.6.3
+markdown-it-py==2.2.0
+MarkupSafe==2.1.5
+mccabe==0.7.0
+mdit-py-plugins==0.3.5
+mdurl==0.1.2
+more-itertools==10.3.0
+multidict==6.0.4
+mutagen==1.47.0
+mypy==1.10.1
+mypy-extensions==1.0.0
+myst-parser==0.18.1
+nh3==0.2.18
+packaging==24.1
+pathspec==0.12.1
+pkginfo==1.10.0
+platformdirs==4.2.2
+pluggy==1.5.0
+py==1.11.0
+pycodestyle==2.12.0
+pycparser==2.22
+pycryptodomex==3.20.0
+pyflakes==3.2.0
+Pygments==2.18.0
+pyparsing==3.0.9
+pytest==8.2.2
+pytest-cov==5.0.0
+pytest-sphinx==0.6.3
+python-dotenv==1.0.1
+pytz==2022.1
+PyYAML==6.0.2rc1
+readme_renderer==43.0
+requests==2.32.3
+requests-toolbelt==1.0.0
+rfc3986==2.0.0
+rich==13.7.1
+SecretStorage==3.3.3
+six==1.16.0
+sniffio==1.3.1
+snowballstemmer==2.2.0
+soupsieve==2.5
+Sphinx==5.0.2
+sphinx-autobuild==2024.4.16
+sphinx-autodoc-typehints==1.19.1
+sphinx-basic-ng==1.0.0b2
+sphinx-copybutton==0.5.2
+sphinxcontrib-applehelp==1.0.8
+sphinxcontrib-devhelp==1.0.6
+sphinxcontrib-htmlhelp==2.0.5
+sphinxcontrib-jsmath==1.0.1
+sphinxcontrib-qthelp==1.0.7
+sphinxcontrib-serializinghtml==1.1.10
+starlette==0.37.2
+tomli==2.0.1
+tornado==6.4
+twine==5.1.1
+typing_extensions==4.12.2
+urllib3==2.2.2
+uvicorn==0.30.1
+vk-api @ git+https://github.com/python273/vk_api.git@b99dac0ec2f832a6c4b20bde49869e7229ce4742
+watchfiles==0.22.0
+webencodings==0.5.1
+websockets==12.0
+yarl==1.9.4
+yt-dlp==2024.7.15.232803.dev0
+zipp==3.19.2

@@ -57,7 +57,7 @@ setup(
 package_data={"vk_url_scraper": ["py.typed"]},
 install_requires=read_requirements("requirements.txt"),
 extras_require={"dev": read_requirements("dev-requirements.txt")},
-python_requires=">=3.7",
+python_requires=">=3.10",
 entry_points={
 "console_scripts": [
 "vk_url_scraper=vk_url_scraper.__main__:main",

@@ -81,7 +81,7 @@ def test_scrape_wall_url_with_photos():
 == "Хабаровск\nАллея героев\nПомолимся об укокоении воинов:\nАлександра, Игоря, Эдуарда, \nДионисия, Евгения, Александра, Артемия, Иннокентия, Андрея."
 )
 assert str(res[0]["datetime"]) == str(datetime.datetime(2022, 6, 15, 10, 37, 24))
-assert len(res[0]["payload"]) == 17
+assert len(res[0]["payload"]) == 18
 assert len(res[0]["attachments"].keys()) == 1
 assert list(res[0]["attachments"].keys()) == ["photo"]
 assert len(res[0]["attachments"]["photo"]) == 9
@@ -93,7 +93,7 @@ def test_scrape_wall_url_with_photos_inner_videos_and_links_with_inner_photos():
 assert res[0]["id"] == "wall-17315087_74182"
 assert res[0]["text"] == ""
 assert str(res[0]["datetime"]) == str(datetime.datetime(2022, 3, 24, 11, 1, 9))
-assert len(res[0]["payload"]) == 17
+assert len(res[0]["payload"]) == 18
 assert len(res[0]["attachments"].keys()) == 3
 for k in ["photo", "link", "video"]:
 assert k in list(res[0]["attachments"].keys())
@@ -128,7 +128,7 @@ def test_scrape_photo_only():
 == "Делимся расписанием конкурса [https://vk.com/wall-1_399468|«Код Петербурга»]. Все важные этапы — на одной схеме \n\nЕсли участвуете, обязательно сохраните себе. Так будет удобнее планировать работу над проектом, и вы точно не упустите лучший момент для отправки сервиса на модерацию."
 )
 assert str(res[0]["datetime"]) == str(datetime.datetime(2022, 6, 7, 9, 43))
-assert len(res[0]["payload"]) == 15
+assert len(res[0]["payload"]) == 16
 assert len(res[0]["attachments"].keys()) == 1
 assert list(res[0]["attachments"].keys()) == ["photo"]
 assert len(res[0]["attachments"]["photo"]) == 1
@@ -139,7 +139,6 @@ def test_scrape_video_only():
 assert len(res) == 1
 assert res[0]["id"] == "video38556806_456251917"
 assert str(res[0]["datetime"]) == str(datetime.datetime(2022, 3, 24, 5, 42, 38))
-assert len(res[0]["payload"]) == 34
 assert len(res[0]["attachments"].keys()) == 1
 assert list(res[0]["attachments"].keys()) == ["video"]

@@ -59,7 +59,7 @@ class VkScraper:
 password : str
 Matching password on vk.com
 token : str
-Access token received after authenticating, can be found in the vl_config.v2.json file
+Access token received after authenticating, can be found in the vk_config.v2.json file
 session_file : str
 File name where the VK session is saved so future logins are easier, this will not be created if token is passed
 captcha_handler : func
@@ -339,7 +339,9 @@ class VkScraper:
 filename = os.path.join(destination, f"{r['id']}_{i}.%(ext)s")
 ydl = yt_dlp.YoutubeDL(
 {
-"format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
+"format": (
+"bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best"
+),
 "merge_output_format": "mp4",
 "retries": 5,
 "noplaylist": True,

@@ -17,7 +17,7 @@ def captcha_handler(captcha):
 key = input(
 f"CAPTCHA DETECTED, please solve it and input the solution. url= {captcha.get_url()} :"
 ).strip()
-return captcha.try_again(key)
+return captcha.try_again(key.strip())
 @contextmanager
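
Note on the hunk above: `key` is already the result of `.strip()` on the `input(...)` call, so the added `key.strip()` should be a no-op (`str.strip` is idempotent). A minimal sketch that exercises the handler without contacting vk.com. `FakeCaptcha` and the `read_input` parameter are hypothetical stand-ins introduced only for this illustration; `get_url()` and `try_again()` are the two calls the diff itself uses.

```python
class FakeCaptcha:
    """Hypothetical stand-in for the captcha object handed to the handler."""

    def __init__(self, url):
        self._url = url
        self.submitted = None

    def get_url(self):
        return self._url

    def try_again(self, key):
        # Record the submitted key instead of sending it anywhere.
        self.submitted = key
        return key


def captcha_handler(captcha, read_input=input):
    """Ask for the captcha solution and resubmit it with whitespace removed."""
    key = read_input(
        f"CAPTCHA DETECTED, please solve it and input the solution. url= {captcha.get_url()} :"
    ).strip()
    # key is already stripped here; a second .strip() would not change it.
    return captcha.try_again(key)


captcha = FakeCaptcha("https://example.com/captcha.png")
captcha_handler(captcha, read_input=lambda prompt: "  abc123 \n")
print(repr(captcha.submitted))
```

Injecting `read_input` keeps the sketch testable; the handler in the diff calls `input` directly.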

View File

@@ -2,7 +2,7 @@ _MAJOR = "0"
 _MINOR = "3"
 # On main and in a nightly release the patch should be one ahead of the last
 # released build.
-_PATCH = "26"
+_PATCH = "29"
 # This is mainly for nightly builds which have the suffix ".dev$DATE". See
 # https://semver.org/#is-v123-a-semantic-version for the semantics.
 _SUFFIX = ""
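
Note on the hunk above: a minimal sketch of how these four constants would combine into the release string and the `vx.y.z` tag named in the README steps. The `VERSION` and `TAG` names and the exact format string are assumptions, since the rest of version.py is not shown in this diff; the result matches the "Bump version to v0.3.29" commit message.

```python
_MAJOR = "0"
_MINOR = "3"
_PATCH = "29"
_SUFFIX = ""

# Assumed composition: MAJOR.MINOR.PATCH followed by an optional suffix
# (for example ".dev$DATE" on nightly builds, per the comment in the file).
VERSION = "{0}.{1}.{2}{3}".format(_MAJOR, _MINOR, _PATCH, _SUFFIX)

# Tag format from the README release steps: `git tag vx.y.z`.
TAG = f"v{VERSION}"

print(VERSION, TAG)
```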