From d776be8a817683c4e2b5db4bb2756eda6001f373 Mon Sep 17 00:00:00 2001
From: Patrick Robertson
Date: Wed, 12 Feb 2025 11:41:54 +0000
Subject: [PATCH 1/5] Fix links to docs

---
 CONTRIBUTING.md | 4 ++--
 README.md       | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 81977b5..7e71f82 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -25,7 +25,7 @@ If you’d like to fix a bug or improve existing code:

 If you want to add a new module to Auto Archiver:

-1. Ensure your module follows the existing [coding style and project structure](https://auto-archiver.readthedocs.io/en/development/creating_modules.html).
+1. Ensure your module follows the existing [coding style and project structure](https://auto-archiver.readthedocs.io/en/latest/development/creating_modules.html).
 2. Write clear documentation explaining what your module does and how to use it.
 3. Ideally, include unit tests for your module!
 4. Follow the steps in Section 2 to submit a pull request.
@@ -42,7 +42,7 @@ If you have any questions about how the source code works or need help using Aut

 We welcome contributions to the documentation!

-📖 Please read [Contributing to the Auto Archiver Documentation](https://auto-archiver.readthedocs.io/en/development/docs.html) to learn how you can help improve the project's documentation.
+📖 Please read [Contributing to the Auto Archiver Documentation](https://auto-archiver.readthedocs.io/en/latest/development/docs.html) to learn how you can help improve the project's documentation.

 ------------------

diff --git a/README.md b/README.md
index 76ee789..2ef3aea 100644
--- a/README.md
+++ b/README.md
@@ -33,5 +33,5 @@ Or pip:

 ## Contributing

-We welcome contributions to the Auto Archiver project! See the [Contributing Guide](https://auto-archiver.readthedocs.io/en/contributing.html) for how to get involved!
+We welcome contributions to the Auto Archiver project! See the [Contributing Guide](https://auto-archiver.readthedocs.io/en/latest/contributing.html) for how to get involved!

From 17f13db56ca9b5b3cb825bcacae2c3f6b3eb3d44 Mon Sep 17 00:00:00 2001
From: Patrick Robertson
Date: Wed, 12 Feb 2025 11:45:09 +0000
Subject: [PATCH 2/5] Make that code block a shell

---
 docs/source/development/docs.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/development/docs.md b/docs/source/development/docs.md
index e21f954..bb389ff 100644
--- a/docs/source/development/docs.md
+++ b/docs/source/development/docs.md
@@ -18,7 +18,7 @@ poetry install
 **Create the documentation:**

 - Build the documentation:
-```
+```shell
 # Using makefile (Linux/macOS):
 make -C docs html

From 86254bdd4ece973ffca431419dd25a158e206814 Mon Sep 17 00:00:00 2001
From: Patrick Robertson
Date: Wed, 12 Feb 2025 11:48:01 +0000
Subject: [PATCH 3/5] Fix link in how to

---
 docs/source/how_to.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/how_to.md b/docs/source/how_to.md
index e8e5d9b..5cef626 100644
--- a/docs/source/how_to.md
+++ b/docs/source/how_to.md
@@ -3,7 +3,7 @@
 ## How to use Google Sheets to load and store archive information

 The `--gsheet_feeder.sheet` property is the name of the Google Sheet to check for URLs. This sheet must have been shared with the Google Service account used by `gspread`.
-This sheet must also have specific columns (case-insensitive) in the `header` as specified in [gsheet_feeder.__manifest__.py](src/auto_archiver/modules/gsheet_feeder/__manifest__.py). The default names of these columns and their purpose is:
+This sheet must also have specific columns (case-insensitive) in the `header` - see the [Gsheet Feeder Docs](modules/autogen/feeder/gsheet_feeder.md) for more info. The default names of these columns and their purpose is:

 Inputs:

From 70f155dfce2c51623a0c52f86587207310018162 Mon Sep 17 00:00:00 2001
From: Patrick Robertson
Date: Wed, 12 Feb 2025 11:48:51 +0000
Subject: [PATCH 4/5] add more of the USPs to the readme

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2ef3aea..20ad1de 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@

-Auto Archiver is a Python tool to automatically archive content on the web. It takes URLs from different sources (e.g. a CSV file, Google Sheets, command line etc.) and archives the content of each one. It can archive social media posts, videos, images and webpages. Content can enriched, then saved either locally or remotely (S3 bucket, Google Drive). The status of the archiving process can be appended to a CSV report, or if using Google Sheets – back to the original sheet.
+Auto Archiver is a Python tool to automatically archive content on the web in a secure and verifiable way. It takes URLs from different sources (e.g. a CSV file, Google Sheets, command line etc.) and archives the content of each one. It can archive social media posts, videos, images and webpages. Content can enriched, then saved either locally or remotely (S3 bucket, Google Drive). The status of the archiving process can be appended to a CSV report, or if using Google Sheets – back to the original sheet.
 **[See the Auto Arciver documentation for more information.](https://auto-archiver.readthedocs.io/en/latest/)**

From da267f20d769c1360b49de0c3326b0b6dd996546 Mon Sep 17 00:00:00 2001
From: erinhmclark
Date: Wed, 12 Feb 2025 11:54:40 +0000
Subject: [PATCH 5/5] Update screenshot refs

---
 docs/source/how_to.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/source/how_to.md b/docs/source/how_to.md
index 5cef626..bf3b9fc 100644
--- a/docs/source/how_to.md
+++ b/docs/source/how_to.md
@@ -26,22 +26,22 @@ Outputs:

 For example, this is a spreadsheet configured with all of the columns for the auto archiver and a few URLs to archive. (Note that the column names are not case sensitive.)

-![A screenshot of a Google Spreadsheet with column headers defined as above, and several Youtube and Twitter URLs in the "Link" column](docs/demo-before.png)
+![A screenshot of a Google Spreadsheet with column headers defined as above, and several Youtube and Twitter URLs in the "Link" column](../demo-before.png)

 Now the auto archiver can be invoked, with this command in this example: `docker run --rm -v $PWD/secrets:/app/secrets -v $PWD/local_archive:/app/local_archive bellingcat/auto-archiver:dockerize --config secrets/orchestration-global.yaml --gsheet_feeder.sheet "Auto archive test 2023-2"`. Note that the sheet name has been overridden/specified in the command line invocation.

 When the auto archiver starts running, it updates the "Archive status" column.

-![A screenshot of a Google Spreadsheet with column headers defined as above, and several Youtube and Twitter URLs in the "Link" column. The auto archiver has added "archive in progress" to one of the status columns.](docs/demo-progress.png)
+![A screenshot of a Google Spreadsheet with column headers defined as above, and several Youtube and Twitter URLs in the "Link" column. The auto archiver has added "archive in progress" to one of the status columns.](../demo-progress.png)

 The links are downloaded and archived, and the spreadsheet is updated to the following:

-![A screenshot of a Google Spreadsheet with videos archived and metadata added per the description of the columns above.](docs/demo-after.png)
+![A screenshot of a Google Spreadsheet with videos archived and metadata added per the description of the columns above.](../demo-after.png)

 Note that the first row is skipped, as it is assumed to be a header row (`--gsheet_feeder.header=1` and you can change it if you use more rows above). Rows with an empty URL column, or a non-empty archive column are also skipped. All sheets in the document will be checked.

 The "archive location" link contains the path of the archived file, in local storage, S3, or in Google Drive.

-![The archive result for a link in the demo sheet.](docs/demo-archive.png)
+![The archive result for a link in the demo sheet.](../demo-archive.png)

 ---