Use a script to auto-generate documentation for the core modules from the manifest file

2026-06-11 04:38:29 +03:00 · 2025-02-10 22:51:04 +00:00
parent 824728739a
commit 2650cd8fb2
19 changed files with 216 additions and 53 deletions
--- a/docs/source/modules/database.md
+++ b/docs/source/modules/database.md
@@ -0,0 +1,8 @@
+# Database Modules
+
+Database modules store the status and results of the extraction and enrichment processes. They are responsible for creating and managing an entry for each item that has been processed.
+
+The default (enabled) databases are the CSV Database and the Console Database.
+
+```{include} autogen/database.md
+```
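The database description above says each processed item gets its own entry. Below is a minimal Python sketch of that idea, written to a CSV stream. It is not the auto-archiver's CSV Database implementation. `write_entries`, the `FIELDS` column list and the status strings are hypothetical, chosen only to illustrate one row per item.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column layout; the real CSV Database may record different fields.
FIELDS = ["url", "status", "processed_at"]


def write_entries(items, stream):
    """Write a header, then one CSV row per processed (url, status) pair."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for url, status in items:
        writer.writerow({
            "url": url,
            "status": status,
            # UTC timestamp of when the entry was written
            "processed_at": datetime.now(timezone.utc).isoformat(),
        })


buffer = io.StringIO()
write_entries(
    [("https://example.com/a", "archived"), ("https://example.com/b", "failed")],
    buffer,
)
print(buffer.getvalue())
```

Writing to an in-memory `io.StringIO` keeps the sketch self-contained. A real database module would write to a file or another persistent backend.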
--- a/docs/source/modules/enricher.md
+++ b/docs/source/modules/enricher.md
@@ -0,0 +1,7 @@
+# Enricher Modules
+
+Enricher modules add information to the items that have been extracted. Common enrichment tasks include adding metadata such as a hash of the item, a screenshot of the webpage at the time of extraction, or general metadata like the date and time the item was extracted.
+
+
+```{include} autogen/enricher.md
+```
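As an illustration of the hashing enrichment mentioned above, here is a minimal Python sketch. It assumes an item is a plain dict with a `filename` key. That layout, the `enrich` function and the `sha256:` prefix are hypothetical and do not come from auto-archiver's enricher API. Only `hashlib` and the chunked-read pattern are standard Python.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks so large videos need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def enrich(item):
    """Hypothetical enricher: add a content hash and a UTC timestamp to an item's metadata dict."""
    item["hash"] = "sha256:" + hash_file(item["filename"])  # "sha256:" prefix is an illustrative choice
    item["enriched_at"] = datetime.now(timezone.utc).isoformat()
    return item


# Create a small throwaway file to enrich, then clean it up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
item = enrich({"filename": tmp.name})
os.remove(tmp.name)
print(item["hash"])  # sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```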
--- a/docs/source/modules/extractor.md
+++ b/docs/source/modules/extractor.md
@@ -0,0 +1,11 @@
+# Extractor Modules
+
+Extractor modules extract the content of a given URL. Typically, one extractor works for one website or platform (e.g. a Telegram extractor or an Instagram extractor); however, several general-purpose extractors work across a wide range of websites.
+
+Extractors that are able to extract content from a wide range of websites include:
+1. Generic Extractor: uses the powerful yt-dlp library to parse videos and images on websites.
+2. Wayback Machine Extractor: sends pages to the Wayback Machine for archiving, and stores the link.
+3. WACZ Extractor: runs a web browser to 'browse' the URL and save a copy of the page in WACZ format. 
+
+```{include} autogen/extractor.md
+```
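One way to picture the split between platform-specific and general-purpose extractors is a lookup by hostname with a general-purpose fallback. The Python sketch below assumes extractors are tried in the order returned. The registry, the domain list and every extractor name here are hypothetical and are not auto-archiver's actual identifiers or selection logic.

```python
from urllib.parse import urlparse

# Hypothetical registry: these names and domains are illustrative only,
# not the auto-archiver's actual extractor identifiers.
PLATFORM_EXTRACTORS = {
    "t.me": "telegram_extractor",
    "instagram.com": "instagram_extractor",
}
# General-purpose extractors, tried after any platform-specific match.
GENERIC_EXTRACTORS = ["generic_extractor", "wayback_extractor", "wacz_extractor"]


def choose_extractors(url):
    """Return the extractors to try for a URL, platform-specific first."""
    host = urlparse(url).hostname or ""  # hostname is already lower-cased
    if host.startswith("www."):
        host = host[len("www."):]
    specific = PLATFORM_EXTRACTORS.get(host)
    return ([specific] if specific else []) + GENERIC_EXTRACTORS


print(choose_extractors("https://www.instagram.com/p/abc/"))
# ['instagram_extractor', 'generic_extractor', 'wayback_extractor', 'wacz_extractor']
print(choose_extractors("https://example.org/article"))
```

A URL with no platform match still gets the general-purpose extractors, which mirrors the role the list above gives them.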
--- a/docs/source/modules/feeder.md
+++ b/docs/source/modules/feeder.md
@@ -0,0 +1,8 @@
+# Feeder Modules
+
+Feeder modules are used to feed URLs into the `auto-archiver` for processing. Feeders can take these URLs from a variety of sources, such as a file, a database, or the command line.
+
+The default feeder is the command line feeder, which allows you to input URLs directly into the `auto-archiver` from the command line.
+
+```{include} autogen/feeder.md
+```
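To show what a command-line feeder does, here is a minimal Python sketch built on `argparse`. It assumes URLs arrive as positional arguments and are yielded one at a time to the rest of the pipeline. `feed_urls` and the argument layout are hypothetical. The real `auto-archiver` command-line options may differ.

```python
import argparse


def feed_urls(argv=None):
    """Hypothetical command-line feeder: yield each URL passed as a positional argument."""
    parser = argparse.ArgumentParser(description="Feed URLs into an archiving pipeline (sketch).")
    parser.add_argument("urls", nargs="+", help="one or more URLs to archive")
    args = parser.parse_args(argv)  # argv=None reads sys.argv[1:]
    for url in args.urls:
        yield url.strip()


# Simulates: python feeder_sketch.py https://example.com/a https://example.com/b
for url in feed_urls(["https://example.com/a", "https://example.com/b"]):
    print(url)
```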
--- a/docs/source/modules/formatter.md
+++ b/docs/source/modules/formatter.md
@@ -0,0 +1,6 @@
+# Formatter Modules
+
+Formatter modules present the data extracted from a URL in a specific format. Currently, the most widely used formatter is the HTML formatter, which renders the data as an easily viewable HTML page.
+
+```{include} autogen/formatter.md
+```
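A minimal Python sketch of an HTML formatter follows. It assumes an item's metadata is a flat dict and renders it as a single table. The page layout and the `format_html` function are hypothetical and do not come from the auto-archiver's HTML formatter templates. The one design point worth keeping is escaping, because scraped titles and URLs can contain markup.

```python
from html import escape


def format_html(url, metadata):
    """Hypothetical HTML formatter: render one item's metadata as a minimal HTML page."""
    rows = "\n".join(
        # escape() guards against markup in scraped titles or URLs
        f"    <tr><th>{escape(str(key))}</th><td>{escape(str(value))}</td></tr>"
        for key, value in metadata.items()
    )
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        f"  <h1>{escape(url)}</h1>\n"
        f"  <table>\n{rows}\n  </table>\n"
        "</body></html>\n"
    )


page = format_html("https://example.com/a?x=1&y=2", {"title": "<Example>", "status": "archived"})
print(page)
```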
--- a/docs/source/modules/storage.md
+++ b/docs/source/modules/storage.md
@@ -0,0 +1,8 @@
+# Storage Modules
+
+Storage modules are used to store the data extracted from a URL in a persistent location. This can be on your local hard disk, or on a remote server (e.g. S3 or Google Drive).
+
+By default, downloaded files (e.g. images, videos) are stored in a local directory.
+
+```{include} autogen/storage.md
+```
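The default local-directory storage can be sketched in a few lines of Python. The sketch assumes files are written under a base directory using a relative name. `store_local` and its path-traversal check are hypothetical and are not the auto-archiver's local storage module.

```python
import tempfile
from pathlib import Path


def store_local(data, filename, base_dir):
    """Hypothetical local storage: write downloaded bytes under base_dir and return the file path."""
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    # Refuse names such as "../x" that would escape the storage directory.
    if base not in target.parents:
        raise ValueError(f"{filename!r} resolves outside {base}")
    target.parent.mkdir(parents=True, exist_ok=True)  # create nested folders as needed
    target.write_bytes(data)
    return target


with tempfile.TemporaryDirectory() as tmpdir:
    saved = store_local(b"example bytes", "example.com/image.png", tmpdir)
    print(saved.relative_to(Path(tmpdir).resolve()))  # example.com/image.png (on POSIX)
```

Resolving both paths before comparing them means names like `../escape.bin` are rejected instead of silently writing outside the storage directory.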