# TikTok hashtag analysis toolset

This tool downloads posts and videos from TikTok for a given set of hashtags over a period of time. Users can build a growing database of posts for specific hashtags, which can then be used for further hashtag analysis. It uses the [TikTokApi](https://github.com/davidteather/TikTok-Api) Python package to download the posts and [yt-dlp](https://github.com/yt-dlp/yt-dlp) to download the videos.

## Pre-requisites

1. Make sure you have Python 3.9 or a later version installed.
2. Install the tool with pip: `pip install tiktok-hashtag-analysis`
   1. or install it directly from the repo version: `pip install git+https://github.com/bellingcat/tiktok-hashtag-analysis`

You should now be ready to start using it.

The scraper this tool uses requires an `msToken` taken from the TikTok website in your browser. The first time you run the tool, it will ask for this token. [This video](https://github.com/bellingcat/tiktok-hashtag-analysis/assets/18430739/b9d40957-c59e-4b6d-a843-13d210f89055) shows how to retrieve the token through your browser's "Developer Tools" and how to enter its value in the tool's command-line interface.

## About the tool

### Command-line arguments

```
usage: tiktok-hashtag-analysis [-h] [--file FILE] [-d] [--number NUMBER] [-p] [-t]
                               [--output-dir OUTPUT_DIR] [--config CONFIG] [--log LOG]
                               [hashtags ...]

Analyze hashtags within posts scraped from TikTok.

positional arguments:
  hashtags              List of hashtags to scrape

optional arguments:
  -h, --help            show this help message and exit
  --file FILE           File name containing list of hashtags to scrape
  -d, --download        Download video files corresponding to scraped posts
  --number NUMBER       The number of co-occurring hashtags to analyze
  -p, --plot            Plot the most common co-occurring hashtags
  -t, --table           Print a table of the most common co-occurring hashtags
  --output-dir OUTPUT_DIR
                        Directory to save scraped data and visualizations to
  --config CONFIG       File name of configuration file to store TikTok credentials to
  --log LOG             File to write logs to
```

### Structure of output data

```
$ tree ../data
../data
├── london
│   ├── plots
│   ├── posts.json
│   └── media
├── newyork
│   ├── plots
│   ├── posts.json
│   └── media
└── paris
    ├── plots
    ├── posts.json
    └── media
```

The `data` folder contains all the downloaded data as shown in the tree diagram above.

- Each hashtag has a folder with two subfolders, `plots` and `media`, that store plots of the most common co-occurring hashtags and media downloaded from the posts. The posts are stored in the `posts.json` file, and downloaded media is stored as `.mp4` files (for videos) or audio and image files (for image galleries) in the `media` folder.
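
To get a quick overview of what has been collected so far, the folder layout described above can be walked with a few lines of Python. This is a minimal sketch, not part of the tool: the helper names `summarize_data_dir` and `count_files` are hypothetical, and it assumes that each `posts.json` holds a JSON array with one object per post, which the README does not state explicitly.

```python
import json
from pathlib import Path


def count_files(folder):
    """Count regular files directly inside `folder` (0 if it does not exist)."""
    if not folder.is_dir():
        return 0
    return sum(1 for entry in folder.iterdir() if entry.is_file())


def summarize_data_dir(data_dir):
    """Summarize each hashtag folder found under `data_dir`.

    Returns a dict mapping the hashtag (folder name) to the number of
    posts, media files, and plots stored for it.
    """
    summary = {}
    for hashtag_dir in sorted(Path(data_dir).iterdir()):
        if not hashtag_dir.is_dir():
            continue  # skip stray files next to the hashtag folders

        posts_file = hashtag_dir / "posts.json"
        if posts_file.is_file():
            # Assumption: posts.json is a JSON array, one object per post.
            posts = json.loads(posts_file.read_text(encoding="utf-8"))
        else:
            posts = []

        summary[hashtag_dir.name] = {
            "posts": len(posts),
            "media_files": count_files(hashtag_dir / "media"),
            "plots": count_files(hashtag_dir / "plots"),
        }
    return summary
```

Calling `summarize_data_dir("../data")` on the layout shown in the tree diagram would return one entry each for `london`, `newyork`, and `paris`.
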
## How to use

### Post downloading

Running the `tiktok-hashtag-analysis` command with the following options will scrape posts that contain the hashtags `#london`, `#paris`, or `#newyork`:

    tiktok-hashtag-analysis london paris newyork

and will produce an output similar to the following log:

    $ tiktok-hashtag-analysis london paris newyork
    Hashtags to scrape: ['london', 'paris', 'newyork']
    Scraped 963 posts containing the hashtag 'london'
    Scraped 961 posts containing the hashtag 'paris'
    Scraped 940 posts containing the hashtag 'newyork'
    Successfully scraped 2864 total entries

- The list of hashtags to scrape is specified as a positional argument

### Video downloading

Running the `tiktok-hashtag-analysis` command with the following options will scrape trending posts containing the hashtag `#london` and download their video files:

`tiktok-hashtag-analysis london --download`

- The `--download` flag specifies that video files for scraped posts should be downloaded

Note that video downloading takes considerable time and bandwidth, so we recommend scraping one hashtag at a time when using the `--download` flag.

## Analyzing results

### Most common co-occurring hashtags

In addition to scraping data and downloading media, the `tiktok-hashtag-analysis` command can also analyze the frequencies of the most common co-occurring hashtags in a given set of posts. Assume we want to analyze the 20 most frequently co-occurring hashtags in the downloaded posts of the `#london` hashtag.

- The results can be plotted and saved as a PNG file by executing the following command: `tiktok-hashtag-analysis london --number 20 --plot` which will produce a figure similar to that shown below: