82 Commits

Author SHA1 Message Date
Tristan Lee
89d89521fa added headed argument to more robustly handle issues with scraper's headless mode 2023-09-19 16:30:13 -05:00
Tristan Lee
e548b6fca9 Update README.md (fixed typo) 2023-09-15 16:15:19 -05:00
Tristan Lee
0b273cb7bd Updated README.md to include Playwright installation command 2023-09-15 16:14:58 -05:00
Tristan Lee
b916512bde removed auth module and authorization, since msToken isn't actually required to run scraper 2023-09-11 21:43:33 -05:00
Tristan Lee
92861e0e5d configured verbosity argument with logging level 2023-09-11 21:29:37 -05:00
Tristan Lee
6fa1e5026c made downloading more robust against transient and permanent errors, fixed issue where media file URLs weren't being updated after scraping 2023-09-09 00:42:56 -05:00
Tristan Lee
1f4b956ce9 made scraping more robust against transient playwright exceptions, set order of hashtags to scrape based on file modified time 2023-09-07 11:18:22 -05:00
Tristan Lee
6a56c354e1 Update README.md 2023-09-06 13:17:27 -05:00
Tristan Lee
10821e30f2 preparing for publishing (removed pipenv commands from workflow, added Contributing section on README, added functionality to pin dependency versions with requirements.txt) 2023-09-06 09:51:31 -05:00
Tristan Lee
8c32a3cf16 updated README, made yt-dlp downloading more robust against errors, changed name of videos folder to media (since images and audio files are also downloaded now) 2023-09-04 13:51:28 -05:00
Tristan Lee
cf575e6cf6 updated README and added authorization 2023-09-01 18:33:32 -05:00
Tristan Lee
a7bd023c21 simplified downloading logic (methods for keeping track of files less necessary since scraping can be done in Python), added functionality to use yt-dlp to download videos, added functionality to download TikTok image galleries 2023-09-01 17:05:13 -05:00
Miguel Sozinho Ramalho
06b4a74c7d Update README.md 2023-03-16 09:48:50 +00:00
msramalho
e1ac3b5057 fixing not founds 2023-03-13 10:08:35 +00:00
msramalho
ad9cac8cdd readme fix 2023-02-13 16:54:20 +00:00
msramalho
980a27ff96 pypi fixes 2023-02-13 16:48:26 +00:00
Richard Mwewa
99467f0e91 Update README.md 2023-01-19 03:35:34 +02:00
johannawild
a0f4320635 Update README.md 2022-05-16 13:14:40 +02:00
johannawild
c3d9b415c6 Update README.md 2022-05-16 13:13:57 +02:00
johannawild
db08aacab5 Update README.md 2022-05-16 13:12:54 +02:00
johannawild
99a0b16d66 Update README.md 2022-05-16 13:11:50 +02:00
johannawild
26b4bcc00d Update README.md 2022-05-10 16:34:24 +02:00
johannawild
5866763adc Update README.md 2022-05-10 16:34:05 +02:00
johannawild
2c345fa27a Update README.md 2022-05-10 16:33:38 +02:00
johannawild
161699d2b9 Update README.md 2022-05-06 11:24:42 +02:00
Tristan Lee
f377408960 updated README with new hashtag_frequencies table 2022-05-06 02:57:56 -05:00
Tristan Lee
0cb9d4b1b9 made docstrings more consistent, changed argument of hashtag_frequencies script to use the hashtag rather than the post_id file for the hashtag, to make it easier to use 2022-05-06 01:49:55 -05:00
Tristan Lee
af5bcc9433 fixed typo in Windows venv activation command 2022-05-05 02:58:42 -05:00
Tristan Lee
cd883eeeb1 minor fixes in the README and LICENSE 2022-05-05 02:39:23 -05:00
Tristan Lee
64354f6099 Updated plot figure in README 2022-05-05 02:32:32 -05:00
Tristan Lee
14c52e5d75 simplified logging, used warnings.warn and calling exceptions rather than logging them, various code cleanups and clarifications 2022-05-05 02:23:50 -05:00
johannawild
34a7c432a3 Merge branch 'main' into tristan_edits 2022-05-04 17:00:41 +02:00
johannawild
0126f36107 Update README.md 2022-05-04 16:55:37 +02:00
johannawild
b3a8fd6a9e Update README.md 2022-05-04 16:53:48 +02:00
johannawild
858835c881 Update README.md 2022-05-04 16:52:22 +02:00
Tristan Lee
52d37d9ff8 merged changes 2022-05-04 01:31:39 -05:00
johannawild
234b763f49 Update README.md 2022-05-04 00:44:04 +02:00
johannawild
ed15e3b6d7 Update README.md 2022-05-04 00:42:27 +02:00
johannawild
fa2f113b42 Update README.md 2022-05-04 00:41:42 +02:00
johannawild
0b15617b5c Update README.md 2022-05-04 00:41:13 +02:00
johannawild
304293046a Update README.md 2022-05-04 00:40:47 +02:00
johannawild
24e5828ec9 Update README.md 2022-05-04 00:40:10 +02:00
johannawild
137be84305 Update README.md 2022-05-04 00:39:49 +02:00
johannawild
3e0dd154d8 Update README.md 2022-05-04 00:38:27 +02:00
johannawild
ccd185aa87 Update README.md 2022-05-04 00:37:49 +02:00
johannawild
dbf6eb595e Update README.md 2022-05-04 00:36:38 +02:00
johannawild
83d25faf31 Update README.md 2022-05-04 00:34:06 +02:00
johannawild
6348142773 Update README.md 2022-05-04 00:33:25 +02:00
johannawild
6d5ca21103 Update README.md 2022-05-04 00:32:22 +02:00
johannawild
81905dd570 Update README.md 2022-05-04 00:30:29 +02:00