mirror of
https://github.com/bellingcat/auto-archiver.git
synced 2026-06-11 04:38:29 +03:00
Use a script to auto-generate documentation for the core modules from the manifest file
8
docs/source/modules/database.md
Normal file
@@ -0,0 +1,8 @@
# Database Modules

Database modules store the status and results of the extraction and enrichment processes. They are responsible for creating and managing entries for each item that has been processed.

The default (enabled) databases are the CSV Database and the Console Database.

```{include} autogen/database.md
```
7
docs/source/modules/enricher.md
Normal file
@@ -0,0 +1,7 @@
# Enricher Modules

Enricher modules add further information to the items that have been extracted. Common enrichment tasks include adding metadata such as the hash of the item, a screenshot of the webpage at the time of extraction, or the date and time the item was extracted.


```{include} autogen/enricher.md
```
11
docs/source/modules/extractor.md
Normal file
@@ -0,0 +1,11 @@
# Extractor Modules

Extractor modules are used to extract the content of a given URL. Typically, one extractor works for one website or platform (e.g. a Telegram extractor or an Instagram extractor). However, several general-purpose extractors work across a wide range of websites.

Extractors that are able to extract content from a wide range of websites include:
1. Generic Extractor: parses videos and images on sites using the powerful yt-dlp library.
2. Wayback Machine Extractor: sends pages to the Wayback Machine for archiving, and stores the link.
3. WACZ Extractor: runs a web browser to 'browse' the URL and save a copy of the page in WACZ format.

```{include} autogen/extractor.md
```
8
docs/source/modules/feeder.md
Normal file
@@ -0,0 +1,8 @@
# Feeder Modules

Feeder modules are used to feed URLs into the `auto-archiver` for processing. Feeders can take these URLs from a variety of sources, such as a file, a database, or the command line.

The default feeder is the command line feeder, which allows you to input URLs directly into the `auto-archiver` from the command line.

```{include} autogen/feeder.md
```
6
docs/source/modules/formatter.md
Normal file
@@ -0,0 +1,6 @@
# Formatter Modules

Formatter modules are used to format the data extracted from a URL into a specific format. Currently, the most widely used formatter is the HTML formatter, which formats the data into an easily viewable HTML page.

```{include} autogen/formatter.md
```
8
docs/source/modules/storage.md
Normal file
@@ -0,0 +1,8 @@
# Storage Modules

Storage modules are used to store the data extracted from a URL in a persistent location. This can be on your local hard disk, or on a remote server (e.g. S3 or Google Drive).

The default is to store the files downloaded (e.g. images, videos) in a local directory.

```{include} autogen/storage.md
```