cisticola

mirror of https://github.com/bellingcat/cisticola.git synced 2026-06-08 03:18:34 +03:00

Author	SHA1	Message	Date
Logan Williams	d20db5f828	Catch exceptions in get_posts so that archiving continues despite errors	2022-03-31 20:27:18 +02:00
Logan Williams	16aad4ef2c	TelegramTelethonScraper: Using the username is fine.	2022-03-31 16:50:20 +02:00
Logan Williams	94cf6c3d84	TelegramTelethonScraper: Use channel_id when channel has been previously encountered	2022-03-31 16:37:54 +02:00
Logan Williams	061af984ee	Merge pull request #20 from bellingcat/separate-media-archiving WIP: Separate media archiving and CLI	2022-03-31 16:28:30 +02:00
Logan Williams	7f87b03de5	Add option to clear registered scrapers, necessary for tests	2022-03-31 16:17:35 +02:00
Logan Williams	c8d1b96e3f	Fix bug in handling retweets without media	2022-03-31 15:51:17 +02:00
Logan Williams	a5cffa615f	Fix Twitter profile scraper, catch exceptions in controller	2022-03-31 15:37:58 +02:00
Logan Williams	2dc9213d64	Use new RawChannelInfo class	2022-03-31 15:17:25 +02:00
Logan Williams	61c99d33f6	Add Postgres support with psycopg2	2022-03-31 08:15:53 +02:00
Logan Williams	cff1953d21	Initial CLI tool	2022-03-31 08:15:11 +02:00
Logan Williams	1c1ff7fb6f	Fix bug with Telethon scraper and certain media; add media_archived flag to TwitterScraper	2022-03-31 08:15:09 +02:00
Logan Williams	19056a1d9a	Merge pull request #23 from bellingcat/profile Added methods for retrieving channel profile metadata, refactored Gab scraper to use gabber	2022-03-31 08:13:17 +02:00
Tristan Lee	b7871b060d	added capability to scrape Gab group posts	2022-03-30 09:11:07 -05:00
Tristan Lee	1f99e52436	refactored Gab scraper to use gabber instead of garc	2022-03-30 08:05:10 -05:00
Tristan Lee	b805d50132	made tests work, fixed several issues with Rumble scraper	2022-03-29 16:09:51 -05:00
Tristan Lee	67d1abf024	added methods for extracting channel profile metadata, and tests	2022-03-28 21:11:34 -05:00
Tristan Lee	ea40ea2640	merged main	2022-03-28 20:22:34 -05:00
Tristan Lee	5d6473e946	Merge pull request #19 from bellingcat/separate-media-archiving Separate media archiving	2022-03-28 20:20:57 -05:00
Tristan Lee	16870d7daa	implemented methods for extracting profile metadata (still need to test)	2022-03-28 20:16:59 -05:00
Logan Williams	a80dbddbbc	Add snscrape delayed media archiving support; add explicit bool	2022-03-28 11:42:15 +02:00
Tristan Lee	d68cbd207a	Merge pull request #17 from bellingcat/channel-db Add Channel object to ORM, store in DB	2022-03-24 13:07:03 -05:00
Logan Williams	63fdae9f1b	Implement media archiving after the initial scrape for Twitter and Telethon	2022-03-24 16:52:11 +01:00
Logan Williams	65edde6d20	Fix bug after merge	2022-03-22 11:56:28 +01:00
Logan Williams	2a3b5c8200	Merge branch 'main' into channel-db	2022-03-22 11:49:07 +01:00
Logan Williams	fa516da763	Rename TransformedResult to the clearer Post	2022-03-22 11:41:55 +01:00
Logan Williams	c0a094eefa	Load channels from google sheet in test.py	2022-03-22 11:37:47 +01:00
Logan Williams	571b019137	Fix tests for Twitter transformer	2022-03-22 11:33:27 +01:00
Logan Williams	806f07f458	Add functions for scraping based on Channel database	2022-03-22 11:26:46 +01:00
Logan Williams	885b4687ce	Add ORM for Channel class; update foreign key relations; add platform_id to TransformedResult	2022-03-22 11:25:52 +01:00
Logan Williams	d5bf3629c2	Merge pull request #16 from bellingcat/docs Docs	2022-03-16 15:20:51 +01:00
Tristan Lee	93554b19e9	fixed typo	2022-03-15 13:05:41 -05:00
Tristan Lee	d68d76c0ab	added missing docstrings, created Makefile target for sphinx-apidoc, added quickstart page for installation and configuration instructions	2022-03-15 12:40:18 -05:00
Tristan Lee	ee9a8c10dd	merged main into branch	2022-03-15 09:16:11 -05:00
Tristan Lee	e287fd03d9	merged scraper into main and fixed minor merge conflict	2022-03-15 09:12:12 -05:00
Tristan Lee	a3c859ec79	added more docstrings and comments	2022-03-14 19:38:33 -05:00
Tristan Lee	c3eab2f176	merged main	2022-03-14 18:19:57 -05:00
Tristan Lee	e4cf9daf73	added docstrings, improved Sphinx docs	2022-03-14 18:04:27 -05:00
Tristan Lee	db03cbf141	Merge pull request #13 from bellingcat/transformer Merged Transformer branch into main, including example of Transformer instance for Twitter and associated test	2022-03-14 11:13:57 -05:00
Tristan Lee	750f0cc887	added scraper for Instagram	2022-03-14 10:28:10 -05:00
Logan Williams	fe0d762df0	Add Transformer and ETLController docstrings	2022-03-14 14:02:57 +01:00
Logan Williams	fd4b617743	Add TwitterTransformer test	2022-03-14 13:39:10 +01:00
Tristan Lee	965bf1e2dc	added youtube scraper, moved from official youtube-dl repo to using yt-dlp because download speed for youtube videos is much better	2022-03-11 17:19:52 -06:00
Tristan Lee	821c39004b	incorporated vkontakte scraper	2022-03-10 22:32:39 -06:00
Tristan Lee	3d919316a9	added Bitchute scraper, minor change to Bitchute scraper to correctly extract author name and id	2022-03-10 13:03:01 -06:00
Tristan Lee	5783206ad8	implemented method to reset database, to enable the 'controller' fixture scope to be shared across the whole package, which will enable the transformer tests to be run without re-running the scrapers	2022-03-10 10:20:49 -06:00
Logan Williams	fa5037d67c	Implement transformer for TwitterScraper that handles media; implement image OCR and EXIF extraction	2022-03-10 15:34:24 +01:00
Tristan Lee	6cf3b8842d	renamed 'archive_media' and 'media' to avoid name collision, changed scope of test fixture controller to 'function' so that db is fresh for each executed test	2022-03-09 13:19:35 -06:00
Tristan Lee	739e1d8484	added capability of running scraper without archiving media, and implemented prototype Telethon scraper for Telegram	2022-03-09 12:12:01 -06:00
Tristan Lee	506fb54a53	added wrapper for requests that retries after encountering exception	2022-03-07 13:28:33 -06:00
Logan Williams	253a9bea49	Merge pull request #3 from bellingcat/media Expanding media archiving, implementing Odysee scraper	2022-03-07 11:47:08 +01:00

82 Commits