mirror of
https://github.com/bellingcat/auto-archiver.git
synced 2026-06-11 20:58:29 +03:00
Compare commits
12 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | ab03e48708 | | |
| | f56cd6891b | | |
| | 9e03d745d8 | | |
| | 7badf89c28 | | |
| | d59530c8e7 | | |
| | 0ec5451f66 | | |
| | 99e9ac2465 | | |
| | 42162c5e3f | | |
| | 3afe519176 | | |
| | f13349bacf | | |
| | 92c79ed994 | | |
| | 2643b8e717 | | |
1
.gitignore
vendored
@@ -34,4 +34,5 @@ docs/_build/
|
|||||||
docs/source/autoapi/
|
docs/source/autoapi/
|
||||||
docs/source/modules/autogen/
|
docs/source/modules/autogen/
|
||||||
scripts/settings_page.html
|
scripts/settings_page.html
|
||||||
|
scripts/settings/src/schema.json
|
||||||
.vite
|
.vite
|
||||||
|
|||||||
@@ -21,7 +21,7 @@ build:
|
|||||||
# generate the config editor page. Schema then HTML
|
# generate the config editor page. Schema then HTML
|
||||||
- VIRTUAL_ENV=$READTHEDOCS_VIRTUALENV_PATH poetry run python scripts/generate_settings_schema.py
|
- VIRTUAL_ENV=$READTHEDOCS_VIRTUALENV_PATH poetry run python scripts/generate_settings_schema.py
|
||||||
# install node dependencies and build the settings
|
# install node dependencies and build the settings
|
||||||
- cd scripts/settings && npm install && npm run build && yes | cp dist/index.html ../../docs/source/installation/settings_base.html && cd ../..
|
- cd scripts/settings && npm install && npm run build && yes | cp -v dist/index.html ../../docs/source/installation/settings.html && cd ../..
|
||||||
|
|
||||||
|
|
||||||
sphinx:
|
sphinx:
|
||||||
|
|||||||
@@ -29,7 +29,7 @@ View the [Installation Guide](https://auto-archiver.readthedocs.io/en/latest/ins
|
|||||||
|
|
||||||
To get started quickly using Docker:
|
To get started quickly using Docker:
|
||||||
|
|
||||||
`docker pull bellingcat/auto-archiver && docker run --rm -v secrets:/app/secrets bellingcat/auto-archiver --config secrets/orchestration.yaml`
|
`docker pull bellingcat/auto-archiver && docker run -it --rm -v secrets:/app/secrets bellingcat/auto-archiver --config secrets/orchestration.yaml`
|
||||||
|
|
||||||
Or pip:
|
Or pip:
|
||||||
|
|
||||||
|
|||||||
@@ -36,3 +36,12 @@ open docs/_build/html/index.html
|
|||||||
sphinx-autobuild docs/source docs/_build/html
|
sphinx-autobuild docs/source docs/_build/html
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### Managing Readthedocs (RTD) Versions
|
||||||
|
|
||||||
|
Version management is done at [https://app.readthedocs.org/projects/auto-archiver/](https://app.readthedocs.org/projects/auto-archiver/)
|
||||||
|
(login required). Once logged in, you can create new versions, delete old versions or change visibility of versions. More info on
|
||||||
|
[RTD](https://docs.readthedocs.com/platform/stable/versions.html).
|
||||||
|
|
||||||
|
Currently, the Auto Archiver project is set up to automatically create a new docs version for each `vX.Y.Z` release. For more on this,
|
||||||
|
see the RTD [instructions on automation](https://docs.readthedocs.com/platform/stable/guides/automation-rules.html) or edit the existing automation rule in the project settings.
|
||||||
@@ -86,7 +86,7 @@ gsheet_feeder_db:
|
|||||||
|
|
||||||
You can also pass these settings directly on the command line without having to edit the file. Here's an example of how to do that (using docker):
|
You can also pass these settings directly on the command line without having to edit the file. Here's an example of how to do that (using docker):
|
||||||
|
|
||||||
`docker run --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver:dockerize --gsheet_feeder_db.sheet "My Awesome Sheet 2"`.
|
`docker run -it --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver:dockerize --gsheet_feeder_db.sheet "My Awesome Sheet 2"`.
|
||||||
|
|
||||||
Here, the sheet name has been overridden/specified in the command line invocation.
|
Here, the sheet name has been overridden/specified in the command line invocation.
|
||||||
|
|
||||||
|
|||||||
60
docs/source/installation/faq.md
Normal file
@@ -0,0 +1,60 @@
|
|||||||
|
# Frequently Asked Questions
|
||||||
|
|
||||||
|
|
||||||
|
### Q: What websites does the Auto Archiver support?
|
||||||
|
**A:** The Auto Archiver works for a large variety of sites. Firstly, the Auto Archiver can download
|
||||||
|
and archive any video website supported by YT-DLP, a powerful video-downloading tool ([full list of
|
||||||
|
sites here](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)). Aside from these sites,
|
||||||
|
there are various different 'Extractors' for specific websites. See the full list of extractors that
|
||||||
|
are available on the [extractors](../modules/extractor.md) page. Some sites supported include:
|
||||||
|
|
||||||
|
* Twitter
|
||||||
|
* Instagram
|
||||||
|
* Telegram
|
||||||
|
* VKontakte
|
||||||
|
* TikTok
|
||||||
|
* Bluesky
|
||||||
|
|
||||||
|
```{note} What websites the Auto Archiver can archive depends on what extractors you have enabled in
|
||||||
|
your configuration. See [configuration](./configurations.md) for more info.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Q: Does the Auto Archiver only work for social media posts?
|
||||||
|
**A:** No, the Auto Archiver can archive any web page on the internet, not just social media posts.
|
||||||
|
However, for social media posts Auto Archiver can extract more relevant/useful information (such as
|
||||||
|
post comments, likes, author etc.) which may not be available for a generic website. If you are looking
|
||||||
|
to more generally archive webpages, then you should make sure to enable the [](../modules/autogen/extractor/wacz_extractor_enricher.md)
|
||||||
|
and the [](../modules/autogen/extractor/wayback_extractor_enricher.md).
|
||||||
|
|
||||||
|
### Q: What kind of data is stored for each webpage that's archived?
|
||||||
|
**A:** This depends on the website archived, but more generally, for social media posts any videos and photos in
|
||||||
|
the post will be archived. For video sites, the video will be downloaded separately. For most of these sites, additional
|
||||||
|
metadata such as published date, uploader/author and ratings/comments will also be saved. Additionally, further data can be
|
||||||
|
saved depending on the enrichers that you have enabled. Some other types of data saved are timestamps if you have the
|
||||||
|
[](../modules/autogen/enricher/timestamping_enricher.md) or [](../modules/autogen/enricher/opentimestamps_enricher.md) enabled,
|
||||||
|
screenshots of the web page with the [](../modules/autogen/enricher/screenshot_enricher.md), and for videos, thumbnails of the
|
||||||
|
video with the [](../modules/autogen/enricher/thumbnail_enricher.md). You can also store things like hashes (SHA256, or pdq hashes)
|
||||||
|
with the various hash enrichers.
|
||||||
|
|
||||||
|
### Q: Where is my data stored?
|
||||||
|
**A:** With the default configuration, data is stored on your local computer in the `local_storage` folder. You can adjust these settings by
|
||||||
|
changing the [storage modules](../modules/storage.md) you have enabled. For example, you could choose to store your data in an S3 bucket or
|
||||||
|
on Google Drive.
|
||||||
|
|
||||||
|
```{note}
|
||||||
|
You can choose to store your data in multiple places, for example your local drive **and** an S3 bucket for redundancy.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Q: What should I do if something doesn't work?
|
||||||
|
**A:** First, read through the log files to see if you can find a specific reason why something isn't working. Learn more about logging
|
||||||
|
and how to enable debug logging in the [Logging Howto](../how_to/logging.md).
|
||||||
|
|
||||||
|
If you cannot find an answer in the logs, then try searching this documentation or existing / closed issues on the [GitHub Issue Tracker](https://github.com/bellingcat/auto-archiver/issues?q=is%3Aissue%20). If you still cannot find an answer, then consider opening an issue on the GitHub Issue Tracker or asking in the Bellingcat Discord
|
||||||
|
'Auto Archiver' group.
|
||||||
|
|
||||||
|
#### Common reasons why archiving might not work:
|
||||||
|
|
||||||
|
* The website may have temporarily adjusted its settings - sometimes sites like Telegram or Twitter adjust their scraping protection settings. Often,
|
||||||
|
waiting a day or two and then trying again can work.
|
||||||
|
* The site requires you to be logged in - you could try using cookies or authentication to bypass any blocks. See [](../installation/authentication.md) for more information.
|
||||||
|
* The website you're trying to archive has changed its settings/structure. Make sure you're using the latest version of Auto Archiver and try again.
|
||||||
@@ -1,5 +1,11 @@
|
|||||||
# Installation
|
# Installation
|
||||||
|
|
||||||
|
```{toctree}
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
upgrading.md
|
||||||
|
```
|
||||||
|
|
||||||
There are 3 main ways to use the auto-archiver. We recommend the 'docker' method for most uses. This installs all the requirements in one command.
|
There are 3 main ways to use the auto-archiver. We recommend the 'docker' method for most uses. This installs all the requirements in one command.
|
||||||
|
|
||||||
1. Easiest (recommended): [via docker](#installing-with-docker)
|
1. Easiest (recommended): [via docker](#installing-with-docker)
|
||||||
|
|||||||
File diff suppressed because one or more lines are too long
@@ -1,7 +1,6 @@
|
|||||||
# Getting Started
|
# Getting Started
|
||||||
|
|
||||||
```{toctree}
|
```{toctree}
|
||||||
:maxdepth: 1
|
|
||||||
:hidden:
|
:hidden:
|
||||||
|
|
||||||
installation.md
|
installation.md
|
||||||
@@ -9,6 +8,7 @@ configurations.md
|
|||||||
config_editor.md
|
config_editor.md
|
||||||
authentication.md
|
authentication.md
|
||||||
requirements.md
|
requirements.md
|
||||||
|
faq.md
|
||||||
config_cheatsheet.md
|
config_cheatsheet.md
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -27,17 +27,18 @@ The way you run the Auto Archiver depends on how you installed it (docker instal
|
|||||||
If you installed Auto Archiver using docker, open up your terminal, and copy-paste / type the following command:
|
If you installed Auto Archiver using docker, open up your terminal, and copy-paste / type the following command:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
docker run --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver
|
docker run -it --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver
|
||||||
```
|
```
|
||||||
|
|
||||||
breaking this command down:
|
breaking this command down:
|
||||||
1. `docker run` tells docker to start a new container (an instance of the image)
|
1. `docker run` tells docker to start a new container (an instance of the image)
|
||||||
2. `--rm` makes sure this container is removed after execution (less garbage locally)
|
2. `-it` tells docker to run in 'interactive mode' so that we get nice colour logs
|
||||||
3. `-v $PWD/secrets:/app/secrets` - your secrets folder with settings
|
3. `--rm` makes sure this container is removed after execution (less garbage locally)
|
||||||
|
4. `-v $PWD/secrets:/app/secrets` - your secrets folder with settings
|
||||||
1. `-v` is a volume flag which means a folder that you have on your computer will be connected to a folder inside the docker container
|
1. `-v` is a volume flag which means a folder that you have on your computer will be connected to a folder inside the docker container
|
||||||
2. `$PWD/secrets` points to a `secrets/` folder in your current working directory (where your console points to), we use this folder as a best practice to hold all the secrets/tokens/passwords/... you use
|
2. `$PWD/secrets` points to a `secrets/` folder in your current working directory (where your console points to), we use this folder as a best practice to hold all the secrets/tokens/passwords/... you use
|
||||||
3. `/app/secrets` points to the path inside the docker container where this folder can be found
|
3. `/app/secrets` points to the path inside the docker container where this folder can be found
|
||||||
4. `-v $PWD/local_archive:/app/local_archive` - (optional) if you use local_storage
|
5. `-v $PWD/local_archive:/app/local_archive` - (optional) if you use local_storage
|
||||||
1. `-v` same as above, this is a volume instruction
|
1. `-v` same as above, this is a volume instruction
|
||||||
2. `$PWD/local_archive` is a folder `local_archive/` in case you want to archive locally and have the files accessible outside docker
|
2. `$PWD/local_archive` is a folder `local_archive/` in case you want to archive locally and have the files accessible outside docker
|
||||||
3. `/app/local_archive` is a folder inside docker that you can reference in your orchestration.yml file
|
3. `/app/local_archive` is a folder inside docker that you can reference in your orchestration.yml file
|
||||||
@@ -48,14 +49,14 @@ The invocations below will run the auto-archiver Docker image using a configurat
|
|||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Have auto-archiver run with the default settings, generating a settings file in ./secrets/orchestration.yaml
|
# Have auto-archiver run with the default settings, generating a settings file in ./secrets/orchestration.yaml
|
||||||
docker run --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver
|
docker run -it --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver
|
||||||
|
|
||||||
# uses the same configuration, but with the `gsheet_feeder`, a header on row 2 and with some different column names
|
# uses the same configuration, but with the `gsheet_feeder`, a header on row 2 and with some different column names
|
||||||
# Note this expects you to have followed the [Google Sheets setup](how_to/google_sheets.md) and added your service_account.json to the `secrets/` folder
|
# Note this expects you to have followed the [Google Sheets setup](how_to/google_sheets.md) and added your service_account.json to the `secrets/` folder
|
||||||
# notice that columns is a dictionary so you need to pass it as JSON and it will override only the values provided
|
# notice that columns is a dictionary so you need to pass it as JSON and it will override only the values provided
|
||||||
docker run --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver --feeders=gsheet_feeder --gsheet_feeder.sheet="use it on another sheets doc" --gsheet_feeder.header=2 --gsheet_feeder.columns='{"url": "link"}'
|
docker run -it --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver --feeders=gsheet_feeder --gsheet_feeder.sheet="use it on another sheets doc" --gsheet_feeder.header=2 --gsheet_feeder.columns='{"url": "link"}'
|
||||||
# Runs auto-archiver for the first time, but in 'full' mode, enabling all modules to get a full settings file
|
# Runs auto-archiver for the first time, but in 'full' mode, enabling all modules to get a full settings file
|
||||||
docker run --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver --mode full
|
docker run -it --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver --mode full
|
||||||
```
|
```
|
||||||
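The comment that `columns` "will override only the values provided" can be pictured as a shallow dictionary merge. The following is a minimal sketch under that assumption, not the Auto Archiver implementation; the default column names are hypothetical and are not taken from the `gsheet_feeder` module.

```python
import json

# Hypothetical defaults; the real gsheet_feeder defaults may differ.
default_columns = {"url": "url", "status": "status", "folder": "folder"}

# The value passed on the command line, e.g. --gsheet_feeder.columns='{"url": "link"}'
cli_value = '{"url": "link"}'

# Parse the JSON string and overlay it on the defaults:
# keys present in the override win, all other defaults are kept.
merged = {**default_columns, **json.loads(cli_value)}
print(merged)
```

Only the `url` entry changes here; the other (hypothetical) column names keep their default values.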
|
|
||||||
------------
|
------------
|
||||||
|
|||||||
30
docs/source/installation/upgrading.md
Normal file
@@ -0,0 +1,30 @@
|
|||||||
|
|
||||||
|
# Upgrading
|
||||||
|
|
||||||
|
If an update is available, then you will see a message in the logs when you
|
||||||
|
run Auto Archiver. Here's what those logs look like:
|
||||||
|
|
||||||
|
```{code} bash
|
||||||
|
********* IMPORTANT: UPDATE AVAILABLE ********
|
||||||
|
A new version of auto-archiver is available (v0.13.6, you have 0.13.4)
|
||||||
|
Make sure to update to the latest version using: `pip install --upgrade auto-archiver`
|
||||||
|
```
|
||||||
|
|
||||||
|
Upgrading Auto Archiver depends on the way you installed it.
|
||||||
|
|
||||||
|
## Docker
|
||||||
|
|
||||||
|
To upgrade using docker, update the docker image with:
|
||||||
|
|
||||||
|
```
|
||||||
|
docker pull bellingcat/auto-archiver:latest
|
||||||
|
```
|
||||||
|
|
||||||
|
## Pip
|
||||||
|
|
||||||
|
To upgrade the pip package, use:
|
||||||
|
|
||||||
|
```
|
||||||
|
pip install --upgrade auto-archiver
|
||||||
|
```
|
||||||
|
|
||||||
@@ -59,4 +59,5 @@ output_schema = {
|
|||||||
current_file_dir = os.path.dirname(os.path.abspath(__file__))
|
current_file_dir = os.path.dirname(os.path.abspath(__file__))
|
||||||
output_file = os.path.join(current_file_dir, "settings/src/schema.json")
|
output_file = os.path.join(current_file_dir, "settings/src/schema.json")
|
||||||
with open(output_file, "w") as file:
|
with open(output_file, "w") as file:
|
||||||
|
print(f"Writing schema to {output_file}")
|
||||||
json.dump(output_schema, file, indent=4, cls=SchemaEncoder)
|
json.dump(output_schema, file, indent=4, cls=SchemaEncoder)
|
||||||
|
|||||||
34
scripts/settings/package-lock.json
generated
@@ -12,7 +12,7 @@
|
|||||||
"@dnd-kit/sortable": "^10.0.0",
|
"@dnd-kit/sortable": "^10.0.0",
|
||||||
"@emotion/react": "latest",
|
"@emotion/react": "latest",
|
||||||
"@emotion/styled": "latest",
|
"@emotion/styled": "latest",
|
||||||
"@mui/icons-material": "latest",
|
"@mui/icons-material": "^6.4.7",
|
||||||
"@mui/material": "latest",
|
"@mui/material": "latest",
|
||||||
"react": "19.0.0",
|
"react": "19.0.0",
|
||||||
"react-dom": "19.0.0",
|
"react-dom": "19.0.0",
|
||||||
@@ -997,9 +997,9 @@
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/@mui/core-downloads-tracker": {
|
"node_modules/@mui/core-downloads-tracker": {
|
||||||
"version": "6.4.6",
|
"version": "6.4.7",
|
||||||
"resolved": "https://registry.npmjs.org/@mui/core-downloads-tracker/-/core-downloads-tracker-6.4.6.tgz",
|
"resolved": "https://registry.npmjs.org/@mui/core-downloads-tracker/-/core-downloads-tracker-6.4.7.tgz",
|
||||||
"integrity": "sha512-rho5Q4IscbrVmK9rCrLTJmjLjfH6m/NcqKr/mchvck0EIXlyYUB9+Z0oVmkt/+Mben43LMRYBH8q/Uzxj/c4Vw==",
|
"integrity": "sha512-XjJrKFNt9zAKvcnoIIBquXyFyhfrHYuttqMsoDS7lM7VwufYG4fAPw4kINjBFg++fqXM2BNAuWR9J7XVIuKIKg==",
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"funding": {
|
"funding": {
|
||||||
"type": "opencollective",
|
"type": "opencollective",
|
||||||
@@ -1007,9 +1007,9 @@
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/@mui/icons-material": {
|
"node_modules/@mui/icons-material": {
|
||||||
"version": "6.4.6",
|
"version": "6.4.7",
|
||||||
"resolved": "https://registry.npmjs.org/@mui/icons-material/-/icons-material-6.4.6.tgz",
|
"resolved": "https://registry.npmjs.org/@mui/icons-material/-/icons-material-6.4.7.tgz",
|
||||||
"integrity": "sha512-rGJBvIQQbQAlyKYljHQ8wAQS/K2/uYwvemcpygnAmCizmCI4zSF9HQPuiG8Ql4YLZ6V/uKjA3WHIYmF/8sV+pQ==",
|
"integrity": "sha512-Rk8cs9ufQoLBw582Rdqq7fnSXXZTqhYRbpe1Y5SAz9lJKZP3CIdrj0PfG8HJLGw1hrsHFN/rkkm70IDzhJsG1g==",
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@babel/runtime": "^7.26.0"
|
"@babel/runtime": "^7.26.0"
|
||||||
@@ -1022,7 +1022,7 @@
|
|||||||
"url": "https://opencollective.com/mui-org"
|
"url": "https://opencollective.com/mui-org"
|
||||||
},
|
},
|
||||||
"peerDependencies": {
|
"peerDependencies": {
|
||||||
"@mui/material": "^6.4.6",
|
"@mui/material": "^6.4.7",
|
||||||
"@types/react": "^17.0.0 || ^18.0.0 || ^19.0.0",
|
"@types/react": "^17.0.0 || ^18.0.0 || ^19.0.0",
|
||||||
"react": "^17.0.0 || ^18.0.0 || ^19.0.0"
|
"react": "^17.0.0 || ^18.0.0 || ^19.0.0"
|
||||||
},
|
},
|
||||||
@@ -1033,14 +1033,14 @@
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/@mui/material": {
|
"node_modules/@mui/material": {
|
||||||
"version": "6.4.6",
|
"version": "6.4.7",
|
||||||
"resolved": "https://registry.npmjs.org/@mui/material/-/material-6.4.6.tgz",
|
"resolved": "https://registry.npmjs.org/@mui/material/-/material-6.4.7.tgz",
|
||||||
"integrity": "sha512-6UyAju+DBOdMogfYmLiT3Nu7RgliorimNBny1pN/acOjc+THNFVE7hlxLyn3RDONoZJNDi/8vO4AQQr6dLAXqA==",
|
"integrity": "sha512-K65StXUeGAtFJ4ikvHKtmDCO5Ab7g0FZUu2J5VpoKD+O6Y3CjLYzRi+TMlI3kaL4CL158+FccMoOd/eaddmeRQ==",
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@babel/runtime": "^7.26.0",
|
"@babel/runtime": "^7.26.0",
|
||||||
"@mui/core-downloads-tracker": "^6.4.6",
|
"@mui/core-downloads-tracker": "^6.4.7",
|
||||||
"@mui/system": "^6.4.6",
|
"@mui/system": "^6.4.7",
|
||||||
"@mui/types": "^7.2.21",
|
"@mui/types": "^7.2.21",
|
||||||
"@mui/utils": "^6.4.6",
|
"@mui/utils": "^6.4.6",
|
||||||
"@popperjs/core": "^2.11.8",
|
"@popperjs/core": "^2.11.8",
|
||||||
@@ -1061,7 +1061,7 @@
|
|||||||
"peerDependencies": {
|
"peerDependencies": {
|
||||||
"@emotion/react": "^11.5.0",
|
"@emotion/react": "^11.5.0",
|
||||||
"@emotion/styled": "^11.3.0",
|
"@emotion/styled": "^11.3.0",
|
||||||
"@mui/material-pigment-css": "^6.4.6",
|
"@mui/material-pigment-css": "^6.4.7",
|
||||||
"@types/react": "^17.0.0 || ^18.0.0 || ^19.0.0",
|
"@types/react": "^17.0.0 || ^18.0.0 || ^19.0.0",
|
||||||
"react": "^17.0.0 || ^18.0.0 || ^19.0.0",
|
"react": "^17.0.0 || ^18.0.0 || ^19.0.0",
|
||||||
"react-dom": "^17.0.0 || ^18.0.0 || ^19.0.0"
|
"react-dom": "^17.0.0 || ^18.0.0 || ^19.0.0"
|
||||||
@@ -1143,9 +1143,9 @@
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/@mui/system": {
|
"node_modules/@mui/system": {
|
||||||
"version": "6.4.6",
|
"version": "6.4.7",
|
||||||
"resolved": "https://registry.npmjs.org/@mui/system/-/system-6.4.6.tgz",
|
"resolved": "https://registry.npmjs.org/@mui/system/-/system-6.4.7.tgz",
|
||||||
"integrity": "sha512-FQjWwPec7pMTtB/jw5f9eyLynKFZ6/Ej9vhm5kGdtmts1z5b7Vyn3Rz6kasfYm1j2TfrfGnSXRvvtwVWxjpz6g==",
|
"integrity": "sha512-7wwc4++Ak6tGIooEVA9AY7FhH2p9fvBMORT4vNLMAysH3Yus/9B9RYMbrn3ANgsOyvT3Z7nE+SP8/+3FimQmcg==",
|
||||||
"license": "MIT",
|
"license": "MIT",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@babel/runtime": "^7.26.0",
|
"@babel/runtime": "^7.26.0",
|
||||||
|
|||||||
@@ -13,7 +13,7 @@
|
|||||||
"@dnd-kit/sortable": "^10.0.0",
|
"@dnd-kit/sortable": "^10.0.0",
|
||||||
"@emotion/react": "latest",
|
"@emotion/react": "latest",
|
||||||
"@emotion/styled": "latest",
|
"@emotion/styled": "latest",
|
||||||
"@mui/icons-material": "latest",
|
"@mui/icons-material": "^6.4.7",
|
||||||
"@mui/material": "latest",
|
"@mui/material": "latest",
|
||||||
"react": "19.0.0",
|
"react": "19.0.0",
|
||||||
"react-dom": "19.0.0",
|
"react-dom": "19.0.0",
|
||||||
|
|||||||
@@ -4,7 +4,7 @@ import Container from '@mui/material/Container';
|
|||||||
import Typography from '@mui/material/Typography';
|
import Typography from '@mui/material/Typography';
|
||||||
import Box from '@mui/material/Box';
|
import Box from '@mui/material/Box';
|
||||||
import FileUploadIcon from '@mui/icons-material/FileUpload';
|
import FileUploadIcon from '@mui/icons-material/FileUpload';
|
||||||
//
|
|
||||||
import {
|
import {
|
||||||
DndContext,
|
DndContext,
|
||||||
closestCenter,
|
closestCenter,
|
||||||
@@ -204,7 +204,7 @@ function ModuleTypes({ stepType, setEnabledModules, enabledModules, configValues
|
|||||||
{stepType}
|
{stepType}
|
||||||
</Typography>
|
</Typography>
|
||||||
<Typography variant="body1" >
|
<Typography variant="body1" >
|
||||||
Select the <a href="<a href={`https://auto-archiver.readthedocs.io/en/latest/modules/${stepType.slice(0,-1)}.html`}" target="_blank">{stepType}</a> you wish to enable. Drag to reorder.
|
Select the <a href={`https://auto-archiver.readthedocs.io/en/latest/modules/${stepType.slice(0,-1)}.html`} target="_blank">{stepType}</a> you wish to enable. Drag to reorder.
|
||||||
</Typography>
|
</Typography>
|
||||||
</Box>
|
</Box>
|
||||||
{showError ? <Typography variant="body1" color="error" >Only one {stepType.slice(0,-1)} can be enabled at a time.</Typography> : null}
|
{showError ? <Typography variant="body1" color="error" >Only one {stepType.slice(0,-1)} can be enabled at a time.</Typography> : null}
|
||||||
|
|||||||
File diff suppressed because it is too large
@@ -6,7 +6,7 @@ import { viteSingleFile } from "vite-plugin-singlefile"
|
|||||||
export default defineConfig({
|
export default defineConfig({
|
||||||
plugins: [react(), viteSingleFile()],
|
plugins: [react(), viteSingleFile()],
|
||||||
build: {
|
build: {
|
||||||
minify: false,
|
// minify: false,
|
||||||
sourcemap: true,
|
// sourcemap: true,
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|||||||
@@ -8,6 +8,7 @@ flexible setup in various environments.
|
|||||||
import argparse
|
import argparse
|
||||||
from ruamel.yaml import YAML, CommentedMap
|
from ruamel.yaml import YAML, CommentedMap
|
||||||
import json
|
import json
|
||||||
|
import os
|
||||||
|
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
|
||||||
@@ -230,6 +231,10 @@ def read_yaml(yaml_filename: str) -> CommentedMap:
|
|||||||
def store_yaml(config: CommentedMap, yaml_filename: str) -> None:
|
def store_yaml(config: CommentedMap, yaml_filename: str) -> None:
|
||||||
config_to_save = deepcopy(config)
|
config_to_save = deepcopy(config)
|
||||||
|
|
||||||
|
## if the save path is the default location (secrets) then create the 'secrets' folder
|
||||||
|
if os.path.dirname(yaml_filename) == "secrets":
|
||||||
|
os.makedirs("secrets", exist_ok=True)
|
||||||
|
|
||||||
auth_dict = config_to_save.get("authentication", {})
|
auth_dict = config_to_save.get("authentication", {})
|
||||||
if auth_dict and auth_dict.get("load_from_file"):
|
if auth_dict and auth_dict.get("load_from_file"):
|
||||||
# remove all other values from the config, don't want to store it in the config file
|
# remove all other values from the config, don't want to store it in the config file
|
||||||
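One detail worth checking in the `store_yaml` hunk: the comparison `os.path.dirname(yaml_filename) == "secrets"` only matches the exact relative form, so a path such as `./secrets/orchestration.yaml` would not trigger the folder creation. The sketch below shows that behaviour and one possible generalisation. `ensure_parent_dir` is a hypothetical helper, not part of the commit.

```python
import os
import tempfile

# dirname keeps the path exactly as written, so these two differ.
print(os.path.dirname("secrets/orchestration.yaml"))    # secrets
print(os.path.dirname("./secrets/orchestration.yaml"))  # ./secrets


def ensure_parent_dir(yaml_filename: str) -> None:
    """Hypothetical helper: create the parent folder of a config path if it has one."""
    parent = os.path.dirname(yaml_filename)
    if parent:  # "" means the file lives in the current directory
        os.makedirs(parent, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "secrets", "orchestration.yaml")
    ensure_parent_dir(target)
    ensure_parent_dir(target)  # second call is a no-op thanks to exist_ok=True
    print(os.path.isdir(os.path.dirname(target)))
```

Whether creating arbitrary parent folders is desirable is a design decision; the committed code deliberately limits itself to the default `secrets` location.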
|
|||||||
@@ -112,7 +112,7 @@ class ArchivingOrchestrator:
|
|||||||
def check_steps(self, config):
|
def check_steps(self, config):
|
||||||
for module_type in MODULE_TYPES:
|
for module_type in MODULE_TYPES:
|
||||||
if not config["steps"].get(f"{module_type}s", []):
|
if not config["steps"].get(f"{module_type}s", []):
|
||||||
if module_type == "feeder" or module_type == "formatter" and config["steps"].get(f"{module_type}"):
|
if (module_type == "feeder" or module_type == "formatter") and config["steps"].get(f"{module_type}"):
|
||||||
raise SetupError(
|
raise SetupError(
|
||||||
f"It appears you have '{module_type}' set under 'steps' in your configuration file, but as of version 0.13.0 of Auto Archiver, you must use '{module_type}s'. Change this in your configuration file and try again. \
|
f"It appears you have '{module_type}' set under 'steps' in your configuration file, but as of version 0.13.0 of Auto Archiver, you must use '{module_type}s'. Change this in your configuration file and try again. \
|
||||||
Here's how that would look: \n\nsteps:\n {module_type}s:\n - [your_{module_type}_name_here]\n {'extractors:...' if module_type == 'feeder' else '...'}\n"
|
Here's how that would look: \n\nsteps:\n {module_type}s:\n - [your_{module_type}_name_here]\n {'extractors:...' if module_type == 'feeder' else '...'}\n"
|
||||||
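The added parentheses in `check_steps` matter because Python's `and` binds more tightly than `or`. A minimal sketch, independent of the Auto Archiver codebase, with a hypothetical empty `steps` dictionary standing in for `config["steps"]`:

```python
# Hypothetical stand-in for config["steps"]: no legacy singular keys are set.
steps = {}


def old_check(module_type: str) -> bool:
    # Parsed as: module_type == "feeder" or (module_type == "formatter" and ...)
    return bool(module_type == "feeder" or module_type == "formatter" and steps.get(module_type))


def new_check(module_type: str) -> bool:
    # Both module types now also require the legacy singular key to be present.
    return bool((module_type == "feeder" or module_type == "formatter") and steps.get(module_type))


print(old_check("feeder"))  # True
print(new_check("feeder"))  # False
```

With the old expression, a missing `feeders` entry took the "legacy key" error branch even when no singular `feeder` key was present; the parenthesised form only does so when that key exists.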
@@ -377,7 +377,8 @@ Here's how that would look: \n\nsteps:\n extractors:\n - [your_extractor_name_
|
|||||||
try:
|
try:
|
||||||
loaded_module: BaseModule = self.module_factory.get_module(module, self.config)
|
loaded_module: BaseModule = self.module_factory.get_module(module, self.config)
|
||||||
except (KeyboardInterrupt, Exception) as e:
|
except (KeyboardInterrupt, Exception) as e:
|
||||||
logger.error(f"Error during setup of modules: {e}\n{traceback.format_exc()}")
|
if not isinstance(e, KeyboardInterrupt) and not isinstance(e, SetupError):
|
||||||
|
logger.error(f"Error during setup of modules: {e}\n{traceback.format_exc()}")
|
||||||
if loaded_module and module_type == "extractor":
|
if loaded_module and module_type == "extractor":
|
||||||
loaded_module.cleanup()
|
loaded_module.cleanup()
|
||||||
raise e
|
raise e
|
||||||
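The effect of the new `isinstance` guard can be sketched in isolation: expected setup problems and user interrupts are re-raised without a traceback being logged, while unexpected errors are still logged. In this sketch `SetupError` is a local stand-in (its real base class is not shown in this diff), and a list named `logged` replaces the `loguru` logger.

```python
import traceback


class SetupError(Exception):
    """Stand-in for auto_archiver.core.consts.SetupError (base class assumed)."""


logged = []  # replaces logger.error for the purposes of this sketch


def load(factory):
    try:
        return factory()
    except (KeyboardInterrupt, Exception) as e:
        # Only unexpected errors get a logged traceback.
        if not isinstance(e, (KeyboardInterrupt, SetupError)):
            logged.append(f"Error during setup of modules: {e}\n{traceback.format_exc()}")
        raise


def bad_config():
    raise SetupError("No URLs provided")


def crash():
    raise RuntimeError("boom")


for factory in (bad_config, crash):
    try:
        load(factory)
    except Exception as e:
        print(type(e).__name__, "logged entries:", len(logged))
```

This pairs with the `CLIFeeder` change from `ValueError` to `SetupError`: a missing URL then surfaces as a short message instead of a full traceback.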
|
|||||||
@@ -2,13 +2,14 @@ from loguru import logger
|
|||||||
|
|
||||||
from auto_archiver.core.feeder import Feeder
|
from auto_archiver.core.feeder import Feeder
|
||||||
from auto_archiver.core.metadata import Metadata
|
from auto_archiver.core.metadata import Metadata
|
||||||
|
from auto_archiver.core.consts import SetupError
|
||||||
|
|
||||||
|
|
||||||
class CLIFeeder(Feeder):
|
class CLIFeeder(Feeder):
|
||||||
def setup(self) -> None:
|
def setup(self) -> None:
|
||||||
self.urls = self.config["urls"]
|
self.urls = self.config["urls"]
|
||||||
if not self.urls:
|
if not self.urls:
|
||||||
raise ValueError(
|
raise SetupError(
|
||||||
"No URLs provided. Please provide at least one URL via the command line, or set up an alternative feeder. Use --help for more information."
|
"No URLs provided. Please provide at least one URL via the command line, or set up an alternative feeder. Use --help for more information."
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|||||||
@@ -15,6 +15,9 @@ supported by `yt-dlp`, such as YouTube, Facebook, and others. It provides functi
|
|||||||
for retrieving videos, subtitles, comments, and other metadata, and it integrates with
|
for retrieving videos, subtitles, comments, and other metadata, and it integrates with
|
||||||
the broader archiving framework.
|
the broader archiving framework.
|
||||||
|
|
||||||
|
For a full list of video platforms supported by `yt-dlp`, see the
|
||||||
|
[official documentation](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
|
||||||
|
|
||||||
### Features
|
### Features
|
||||||
- Supports downloading videos and playlists.
|
- Supports downloading videos and playlists.
|
||||||
- Retrieves metadata like titles, descriptions, upload dates, and durations.
|
- Retrieves metadata like titles, descriptions, upload dates, and durations.
|
||||||
|
|||||||
@@ -49,7 +49,7 @@ class CookieSettingDriver(webdriver.Firefox):
|
|||||||
self.driver.add_cookie({"name": name, "value": value})
|
self.driver.add_cookie({"name": name, "value": value})
|
||||||
elif self.cookiejar:
|
elif self.cookiejar:
|
||||||
domain = urlparse(url).netloc
|
domain = urlparse(url).netloc
|
||||||
regex = re.compile(f"(www)?\.?{domain}$")
|
regex = re.compile(f"(www)?.?{domain}$")
|
||||||
for cookie in self.cookiejar:
|
for cookie in self.cookiejar:
|
||||||
if regex.match(cookie.domain):
|
if regex.match(cookie.domain):
|
||||||
try:
|
try:
|
||||||
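One thing to check in the `CookieSettingDriver` hunk: dropping the backslash avoids the invalid-escape warning but also changes the pattern, since an unescaped `.` matches any character. The sketch below compares the committed pattern with a possible alternative using a raw f-string and `re.escape`. The alternative is a suggestion, not what the commit does, and the domains are made up for illustration.

```python
import re

domain = "example.com"  # hypothetical domain for illustration

# Pattern as committed: "." matches any character.
loose = re.compile(f"(www)?.?{domain}$")
# Possible alternative: the raw f-string keeps the literal dot,
# and re.escape guards the dots inside the domain itself.
strict = re.compile(rf"(www)?\.?{re.escape(domain)}$")

for cookie_domain in ("www.example.com", ".example.com", "xexample.com", "examplexcom"):
    print(cookie_domain, bool(loose.match(cookie_domain)), bool(strict.match(cookie_domain)))
```

Both patterns accept `www.example.com` and `.example.com`, but only the loose one also accepts `xexample.com` and `examplexcom`.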
|
|||||||