Logan Williams
|
8ee20a239c
|
Merge branch 'main' into initial-release
|
2022-04-03 11:35:12 +02:00 |
|
Tristan Lee
|
90c99aec00
|
ensured that Gettr username is lowercase for API requests to work correctly
|
2022-04-02 22:36:25 -05:00 |
|
Tristan Lee
|
b0a52e5ad7
|
handled case where Rumble video has no view information displayed
|
2022-04-02 21:26:29 -05:00 |
|
Logan Williams
|
01bbabe0cb
|
Fix issues with new datetime-based 'media_archived' column
|
2022-04-02 18:45:08 +00:00 |
|
Logan Williams
|
63633617d2
|
Configure with Telethon and VK only
|
2022-04-02 18:34:14 +00:00 |
|
Logan Williams
|
0099558c68
|
Merge pull request #26 from bellingcat/deferred-media-archiving
Implemented deferred media archiving for all scrapers
|
2022-04-02 14:15:35 +02:00 |
|
Tristan Lee
|
0bab20e371
|
ensured that all channels are added to the database before being scraped, preventing channel.platform_id from being null
|
2022-04-01 17:03:02 -05:00 |
|
Tristan Lee
|
8ecb904249
|
merged main
|
2022-04-01 02:05:25 -05:00 |
|
Tristan Lee
|
282f33eff3
|
implemented deferred media archiving for all scrapers, with tests. Refactored the archiving methods of the Instagram and Gettr scrapers so they can use the default archiving method
|
2022-04-01 01:30:49 -05:00 |
|
Logan Williams
|
d20db5f828
|
Catch exceptions in get_posts so that archiving continues despite errors
|
2022-03-31 20:27:18 +02:00 |
|
Logan Williams
|
16aad4ef2c
|
TelegramTelethonScraper: Using the username is fine.
|
2022-03-31 16:50:20 +02:00 |
|
Logan Williams
|
94cf6c3d84
|
TelegramTelethonScraper: Use channel_id when channel has been previously encountered
|
2022-03-31 16:37:54 +02:00 |
|
Logan Williams
|
061af984ee
|
Merge pull request #20 from bellingcat/separate-media-archiving
WIP: Separate media archiving and CLI
|
2022-03-31 16:28:30 +02:00 |
|
Logan Williams
|
7f87b03de5
|
Add option to clear registered scrapers, necessary for tests
|
2022-03-31 16:17:35 +02:00 |
|
Logan Williams
|
c8d1b96e3f
|
Fix bug in handling retweets without media
|
2022-03-31 15:51:17 +02:00 |
|
Logan Williams
|
a5cffa615f
|
Fix Twitter profile scraper, catch exceptions in controller
|
2022-03-31 15:37:58 +02:00 |
|
Logan Williams
|
2dc9213d64
|
Use new RawChannelInfo class
|
2022-03-31 15:17:25 +02:00 |
|
Logan Williams
|
61c99d33f6
|
Add Postgres support with psycopg2
|
2022-03-31 08:15:53 +02:00 |
|
Logan Williams
|
cff1953d21
|
Initial CLI tool
|
2022-03-31 08:15:11 +02:00 |
|
Logan Williams
|
1c1ff7fb6f
|
Fix bug with Telethon scraper and certain media; add media_archived flag to TwitterScraper
|
2022-03-31 08:15:09 +02:00 |
|
Logan Williams
|
19056a1d9a
|
Merge pull request #23 from bellingcat/profile
Added methods for retrieving channel profile metadata, refactored Gab scraper to use gabber
|
2022-03-31 08:13:17 +02:00 |
|
Tristan Lee
|
b7871b060d
|
added capability to scrape Gab group posts
|
2022-03-30 09:11:07 -05:00 |
|
Tristan Lee
|
1f99e52436
|
refactored Gab scraper to use gabber instead of garc
|
2022-03-30 08:05:10 -05:00 |
|
Tristan Lee
|
b805d50132
|
made tests work, fixed several issues with Rumble scraper
|
2022-03-29 16:09:51 -05:00 |
|
Tristan Lee
|
67d1abf024
|
added methods for extracting channel profile metadata, and tests
|
2022-03-28 21:11:34 -05:00 |
|
Tristan Lee
|
ea40ea2640
|
merged main
|
2022-03-28 20:22:34 -05:00 |
|
Tristan Lee
|
5d6473e946
|
Merge pull request #19 from bellingcat/separate-media-archiving
Separate media archiving
|
2022-03-28 20:20:57 -05:00 |
|
Tristan Lee
|
16870d7daa
|
implemented methods for extracting profile metadata (still need to test)
|
2022-03-28 20:16:59 -05:00 |
|
Logan Williams
|
a80dbddbbc
|
Add snscrape delayed media archiving support; add explicit bool
|
2022-03-28 11:42:15 +02:00 |
|
Tristan Lee
|
d68cbd207a
|
Merge pull request #17 from bellingcat/channel-db
Add Channel object to ORM, store in DB
|
2022-03-24 13:07:03 -05:00 |
|
Logan Williams
|
63fdae9f1b
|
Implement media archiving after the initial scrape for Twitter and Telethon
|
2022-03-24 16:52:11 +01:00 |
|
Logan Williams
|
65edde6d20
|
Fix bug after merge
|
2022-03-22 11:56:28 +01:00 |
|
Logan Williams
|
2a3b5c8200
|
Merge branch 'main' into channel-db
|
2022-03-22 11:49:07 +01:00 |
|
Logan Williams
|
fa516da763
|
Rename TransformedResult to the clearer Post
|
2022-03-22 11:41:55 +01:00 |
|
Logan Williams
|
c0a094eefa
|
Load channels from Google Sheet in test.py
|
2022-03-22 11:37:47 +01:00 |
|
Logan Williams
|
571b019137
|
Fix tests for Twitter transformer
|
2022-03-22 11:33:27 +01:00 |
|
Logan Williams
|
806f07f458
|
Add functions for scraping based on Channel database
|
2022-03-22 11:26:46 +01:00 |
|
Logan Williams
|
885b4687ce
|
Add ORM for Channel class; update foreign key relations; add platform_id to TransformedResult
|
2022-03-22 11:25:52 +01:00 |
|
Logan Williams
|
d5bf3629c2
|
Merge pull request #16 from bellingcat/docs
Docs
|
2022-03-16 15:20:51 +01:00 |
|
Tristan Lee
|
93554b19e9
|
fixed typo
|
2022-03-15 13:05:41 -05:00 |
|
Tristan Lee
|
d68d76c0ab
|
added missing docstrings, created Makefile target for sphinx-apidoc, added quickstart page for installation and configuration instructions
|
2022-03-15 12:40:18 -05:00 |
|
Tristan Lee
|
ee9a8c10dd
|
merged main into branch
|
2022-03-15 09:16:11 -05:00 |
|
Tristan Lee
|
e287fd03d9
|
merged scraper into main and fixed minor merge conflict
|
2022-03-15 09:12:12 -05:00 |
|
Tristan Lee
|
a3c859ec79
|
added more docstrings and comments
|
2022-03-14 19:38:33 -05:00 |
|
Tristan Lee
|
c3eab2f176
|
merged main
|
2022-03-14 18:19:57 -05:00 |
|
Tristan Lee
|
e4cf9daf73
|
added docstrings, improved Sphinx docs
|
2022-03-14 18:04:27 -05:00 |
|
Tristan Lee
|
db03cbf141
|
Merge pull request #13 from bellingcat/transformer
Merged Transformer branch into main, including example of Transformer instance for Twitter and associated test
|
2022-03-14 11:13:57 -05:00 |
|
Tristan Lee
|
750f0cc887
|
added scraper for Instagram
|
2022-03-14 10:28:10 -05:00 |
|
Logan Williams
|
fe0d762df0
|
Add Transformer and ETLController docstrings
|
2022-03-14 14:02:57 +01:00 |
|
Logan Williams
|
fd4b617743
|
Add TwitterTransformer test
|
2022-03-14 13:39:10 +01:00 |
|