simplified downloading logic (methods for keeping track of files less necessary since scraping can be done in Python), added functionality to use yt-dlp to download videos, added functionality to download TikTok image galleries

Tristan Lee
2023-09-01 17:05:13 -05:00
parent 06b4a74c7d
commit a7bd023c21
17 changed files with 364 additions and 1244 deletions

@@ -1,16 +1,12 @@
# TikTok hashtag analysis toolset
> IMPORTANT NOTE: this tool relies on [drawrowfly/tiktok-scraper](https://github.com/drawrowfly/tiktok-scraper) which seems to be broken at time of writing and without updates for some time with several open issues ([796](https://github.com/drawrowfly/tiktok-scraper/issues/796) [#799](https://github.com/drawrowfly/tiktok-scraper/issues/799)) that need to be fixed before this library can work smoothly :/
The tool helps to download posts and videos from TikTok for a given set of hashtags over a period of time. Users can create a growing database of posts for specific hashtags which can then be used for further hashtag analysis. It uses the [tiktok-scraper](https://github.com/drawrowfly/tiktok-scraper) Node package to download the posts and videos.
The tool helps to download posts and videos from TikTok for a given set of hashtags over a period of time. Users can create a growing database of posts for specific hashtags which can then be used for further hashtag analysis. It uses the [TikTokApi](https://github.com/davidteather/TikTok-Api) Python package to download the posts and uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to download the videos.
[![PyPI version](https://badge.fury.io/py/tiktok-hashtag-analysis.svg)](https://badge.fury.io/py/tiktok-hashtag-analysis)
## Pre-requisites
1. Make sure you have Python 3.6 or a later version installed
2. And, you need to have node version 16. On Mac, do `brew install node` followed by `npm install -g n` and then `n 16`
4. Download and install TikTok scraper: https://github.com/drawrowfly/tiktok-scraper
5. Install the tool with pip: `pip install tiktok-hashtag-analysis`
1. Make sure you have Python 3.9 or a later version installed
2. Install the tool with pip: `pip install tiktok-hashtag-analysis`
1. or directly from the repo version: `pip install git+https://github.com/bellingcat/tiktok-hashtag-analysis`
You should now be ready to start using it.
@@ -19,27 +15,23 @@ You should now be ready to start using it.
## About the tool
### Command-line arguments
```
tiktok-hashtag-analysis --help
usage: tiktok-hashtag-analysis [-h] [-t [T ...]] [-f F] [-p] [-v] [-ht HASHTAG] [-n NUMBER] [-plt] [-d] {download,frequencies}
usage: tiktok-hashtag-analysis [-h] [--file FILE] [-d] [--number NUMBER] [-p] [-t] [--output-dir OUTPUT_DIR] [--log LOG] [hashtags ...]
Analyze hashtags within posts scraped from TikTok.
positional arguments:
{download,frequencies}
command to initialize
hashtags List of hashtags to scrape
options:
optional arguments:
-h, --help show this help message and exit
-t [T ...] List of hashtags to scrape (module: run_downloader)
-f F File name containing list of hashtags to scrape (module: run_downloader)
-p Download post data (module: run_downloader)
-v Download video files (module: run_downloader)
-ht HASHTAG, --hashtag HASHTAG
The hashtag of scraped posts to analyze (module: hashtag_frequencies)
-n NUMBER, --number NUMBER
The number of top n occurrences (module: hashtag_frequencies)
-plt, --plot Plot the occurrences (module: hashtag_frequencies)
-d, --print List top n hashtags (module: hashtag_frequencies)
--file FILE File name containing list of hashtags to scrape
-d, --download Download video files corresponding to scraped posts
--number NUMBER The number of co-occurring hashtags to analyze
-p, --plot Plot the most common co-occurring hashtags
-t, --table Print a table of the most common co-occurring hashtags
--output-dir OUTPUT_DIR
Directory to save scraped data and visualizations to
--log LOG File to write logs to
```
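For readers who want to see how the new flat interface fits together, here is a minimal `argparse` sketch that reproduces the flags listed in the updated help text. It is an illustration only, not the project's actual parser: the function name `build_parser`, the `int` type for `--number`, and the default values (argparse's own defaults) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the updated --help output."""
    parser = argparse.ArgumentParser(
        prog="tiktok-hashtag-analysis",
        description="Analyze hashtags within posts scraped from TikTok.",
    )
    # Zero or more hashtags given directly on the command line.
    parser.add_argument("hashtags", nargs="*", help="List of hashtags to scrape")
    parser.add_argument(
        "--file", help="File name containing list of hashtags to scrape"
    )
    parser.add_argument(
        "-d",
        "--download",
        action="store_true",
        help="Download video files corresponding to scraped posts",
    )
    # Assumption: the count is parsed as an integer.
    parser.add_argument(
        "--number", type=int, help="The number of co-occurring hashtags to analyze"
    )
    parser.add_argument(
        "-p",
        "--plot",
        action="store_true",
        help="Plot the most common co-occurring hashtags",
    )
    parser.add_argument(
        "-t",
        "--table",
        action="store_true",
        help="Print a table of the most common co-occurring hashtags",
    )
    parser.add_argument(
        "--output-dir", help="Directory to save scraped data and visualizations to"
    )
    parser.add_argument("--log", help="File to write logs to")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(
    ["london", "paris", "newyork", "--number", "20", "--table"]
)
print(args.hashtags, args.number, args.table)
# prints: ['london', 'paris', 'newyork'] 20 True
```

Under these assumptions, hashtags are plain positional arguments, so no `download` or `frequencies` subcommand is needed to select a mode.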
### Structure of output data
@@ -67,9 +59,9 @@ The `data` folder contains all the downloaded data as shown in the tree diagram
## How to use
### Post downloading
Running the `tiktok-hashtag-analysis download` command with the following options will scrape posts containing the hashtags `#london`, `#paris`, or `#newyork`:
Running the `tiktok-hashtag-analysis` command with the following options will scrape posts containing the hashtags `#london`, `#paris`, or `#newyork`:
tiktok-hashtag-analysis download -t london paris newyork -p
tiktok-hashtag-analysis london paris newyork
and will produce an output similar to the following log:
@@ -100,7 +92,7 @@ Assume we want to analyze the 20 most frequently occurring hashtags in the downl
- The results can be plotted and saved as a PNG file by executing the following command:
`tiktok-hashtag-analysis frequencies london 20 -p`
`tiktok-hashtag-analysis frequencies --hashtag london --number 20 --plot`
which will produce a figure similar to that shown below:
<p align="center">
@@ -111,7 +103,7 @@ Assume we want to analyze the 20 most frequently occurring hashtags in the downl
- The results can be displayed in tabular form by executing the following command:
`tiktok-hashtag-analysis frequencies london 20 -d`
`tiktok-hashtag-analysis frequencies --hashtag london --number 20 --print`
which will produce a terminal output similar to the following:
```