From 0d1447117c004358a6111142e99a3adce09d4070 Mon Sep 17 00:00:00 2001
From: msramalho <19508417+msramalho@users.noreply.github.com>
Date: Sat, 5 Jul 2025 15:56:13 +0100
Subject: [PATCH] updates docs to reflect new general approach extractor

---
 docs/source/modules/extractor.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/source/modules/extractor.md b/docs/source/modules/extractor.md
index e6375db..096124e 100644
--- a/docs/source/modules/extractor.md
+++ b/docs/source/modules/extractor.md
@@ -4,8 +4,9 @@ Extractor modules are used to extract the content of a given URL. Typically, one
 
 Extractors that are able to extract content from a wide range of websites include:
 1. Generic Extractor: parses videos and images on sites using the powerful yt-dlp library.
-2. Wayback Machine Extractor: sends pages to the Wayback machine for archiving, and stores the link.
-3. WACZ Extractor: runs a web browser to 'browse' the URL and save a copy of the page in WACZ format.
+2. Antibot Extractor: uses a headless browser to bypass bot detection and extract content.
+3. WACZ Extractor: runs a web browser to 'browse' the URL and save a copy of the page in WACZ format.
+4. Wayback Machine Extractor: sends pages to the Wayback machine for archiving, and stores the archived link.
 
 ```{include} autogen/extractor.md
 ```