Mirror of https://github.com/bellingcat/auto-archiver.git, synced 2026-06-07 19:08:30 +03:00
updates docs to reflect new general approach extractor
````diff
@@ -4,8 +4,9 @@ Extractor modules are used to extract the content of a given URL. Typically, one
 
 Extractors that are able to extract content from a wide range of websites include:
 1. Generic Extractor: parses videos and images on sites using the powerful yt-dlp library.
-2. Wayback Machine Extractor: sends pages to the Wayback machine for archiving, and stores the link.
-3. WACZ Extractor: runs a web browser to 'browse' the URL and save a copy of the page in WACZ format.
+2. Antibot Extractor: uses a headless browser to bypass bot detection and extract content.
+3. WACZ Extractor: runs a web browser to 'browse' the URL and save a copy of the page in WACZ format.
+4. Wayback Machine Extractor: sends pages to the Wayback machine for archiving, and stores the archived link.
 
 ```{include} autogen/extractor.md
 ```
````
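The updated list places the new Antibot Extractor second, between the Generic Extractor and the WACZ/Wayback extractors. The sketch below illustrates why ordering matters, under an assumption that is not stated in this diff: that extractors are tried in the listed order and the first one to return content wins. Everything in it (`Extractor`, `run_extractors`, the toy `generic`/`antibot`/`wacz`/`wayback` functions and their fake behaviour) is hypothetical and is not auto-archiver's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical extractor signature: takes a URL, returns archived content
# (a plain string here) or None if it could not handle the URL.
ExtractFn = Callable[[str], Optional[str]]


@dataclass
class Extractor:
    name: str
    extract: ExtractFn


def run_extractors(url: str, extractors: list[Extractor]) -> Optional[tuple[str, str]]:
    """Try each extractor in list order; return (name, result) from the first success."""
    for extractor in extractors:
        try:
            result = extractor.extract(url)
        except Exception as exc:  # assumed: one failing extractor should not stop the rest
            print(f"{extractor.name} failed on {url}: {exc}")
            continue
        if result is not None:
            return extractor.name, result
    return None  # no extractor produced anything


# Toy extractors in the order of the updated list; their behaviour is fake.
def generic(url: str) -> Optional[str]:
    # Pretend yt-dlp only recognises URLs containing "video".
    return f"media from {url}" if "video" in url else None


def antibot(url: str) -> Optional[str]:
    # Pretend the headless browser only succeeds on https pages.
    return f"page captured from {url}" if url.startswith("https://") else None


def wacz(url: str) -> Optional[str]:
    # Pretend browsing and WACZ packaging always succeed.
    return f"WACZ of {url}"


def wayback(url: str) -> Optional[str]:
    # Pretend the page was submitted and this is the stored link.
    return f"https://web.archive.org/web/{url}"


EXTRACTORS = [
    Extractor("generic", generic),
    Extractor("antibot", antibot),
    Extractor("wacz", wacz),
    Extractor("wayback", wayback),
]

if __name__ == "__main__":
    print(run_extractors("https://example.com/video/1", EXTRACTORS))
    print(run_extractors("http://example.com/article", EXTRACTORS))
```

With this assumed first-success rule, a URL the Generic Extractor rejects falls through to the Antibot Extractor before the WACZ or Wayback extractors are consulted, so inserting Antibot at position 2 changes which extractor handles such pages.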