Merge pull request #13 from bellingcat/dev

Dev
Update main.py
2026-06-12 05:28:29 +03:00 · 2023-08-25 23:13:15 +02:00 · 2023-08-25 23:12:06 +02:00 · 2023-08-25 23:11:18 +02:00 · 2023-08-25 23:10:29 +02:00 · 2023-08-25 23:09:36 +02:00
17 changed files with 383 additions and 239 deletions
--- a/README.md
+++ b/README.md
@@ -1,20 +1,18 @@
 # RPST (Reddit Post Scraping Tool)
 Given a subreddit name and a keyword, RPST will return all posts from a specified listing (default is 'top') that contain the provided keyword.
-[![Upload Python Package](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/python-publish.yml/badge.svg)](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/python-publish.yml) [![CodeQL](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/codeql.yml/badge.svg)](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/codeql.yml) ![.Net](https://img.shields.io/badge/.NET-5C2D91?style=flat&logo=.net&logoColor=white) ![Python](https://img.shields.io/badge/python-3670A0?style=flat&logo=python&logoColor=ffdd54)
+[![Upload Python Package](https://github.com/bellingcat/reddit-post-scraping-tool/actions/workflows/python-publish.yml/badge.svg)](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/python-publish.yml) [![CodeQL](https://github.com/bellingcat/reddit-post-scraping-tool/actions/workflows/codeql.yml/badge.svg)](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/codeql.yml) ![.Net](https://img.shields.io/badge/.NET-5C2D91?style=flat&logo=.net&logoColor=white) ![Python](https://img.shields.io/badge/python-3670A0?style=flat&logo=python&logoColor=ffdd54)
 ![2023-08-09_04-05](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/d8917a35-3eac-44ce-aa96-1f9685095254)
 ![2023-08-09_04-05_1](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/d2fe7269-91d4-49ad-87fb-44282c5637a7)
 ***
 # ✅ Features
 ## GUI
- [x] Dark mode (Right-click)
+- [x] Dark mode (*Right-click*)
- [x] Saves results to a JSON (Right-click)
+- [x] Saves results to a JSON file (*Right-click*)
 - [x] Logs errors to a file 
 ## CLI
- [x] Saves results to a JSON (-j/--json)
+- [x] Saves results to JSON (*specifiy* `--json`)
- [x] Automatically checks for new updates. Notifies user if update were found.
+- [x] Saves results to CSV (*specify* `--csv`)
 - [x] Automatically checks for new updates, and notifies user if updates were found.
 # 📃 TODO
 ## GUI
@@ -23,11 +21,8 @@ Given a subreddit name and a keyword, RPST will return all posts from a specifie
 - [ ] Make it save results to a CSV file
 # 📖 Wiki
-[Refer to the Wiki](https://github.com/rly0nheart/reddit-post-scraping-tool/wiki) for installation instructions, in addition to all other documentation.
+[Refer to the Wiki](https://github.com/bellingcat/reddit-post-scraping-tool/wiki) for installation instructions, in addition to all other documentation.
 ***
 <a href="https://www.buymeacoffee.com/_rly0nheart"><img src="https://img.buymeacoffee.com/button-api/?text=Buy me a coffee&emoji=&slug=_rly0nheart&button_colour=40DCA5&font_colour=ffffff&font_family=Comic&outline_colour=000000&coffee_colour=FFDD00" /></a>
-# 😁 Donations
+![me](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/21e0bb33-7a84-45d6-92ba-00e40891ba31)
 If you like `RPST` and would like to show support, you can Buy A Coffee for the developer using the button below
 <a href="https://www.buymeacoffee.com/_rly0nheart" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/default-orange.png" alt="Buy Me A Coffee" height="41" width="174"></a>
 Your support will be much appreciated😊
--- a/GUI/RPST/README.md
+++ b/GUI/RPST/README.md
@@ -1,20 +1,18 @@
 # RPST (Reddit Post Scraping Tool)
 Given a subreddit name and a keyword, RPST will return all posts from a specified listing (default is 'top') that contain the provided keyword.
-[![Upload Python Package](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/python-publish.yml/badge.svg)](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/python-publish.yml) [![CodeQL](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/codeql.yml/badge.svg)](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/codeql.yml) ![.Net](https://img.shields.io/badge/.NET-5C2D91?style=flat&logo=.net&logoColor=white) ![Python](https://img.shields.io/badge/python-3670A0?style=flat&logo=python&logoColor=ffdd54)
+[![Upload Python Package](https://github.com/bellingcat/reddit-post-scraping-tool/actions/workflows/python-publish.yml/badge.svg)](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/python-publish.yml) [![CodeQL](https://github.com/bellingcat/reddit-post-scraping-tool/actions/workflows/codeql.yml/badge.svg)](https://github.com/rly0nheart/reddit-post-scraping-tool/actions/workflows/codeql.yml) ![.Net](https://img.shields.io/badge/.NET-5C2D91?style=flat&logo=.net&logoColor=white) ![Python](https://img.shields.io/badge/python-3670A0?style=flat&logo=python&logoColor=ffdd54)
 ![2023-08-09_04-05](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/d8917a35-3eac-44ce-aa96-1f9685095254)
 ![2023-08-09_04-05_1](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/d2fe7269-91d4-49ad-87fb-44282c5637a7)
 ***
 # ✅ Features
 ## GUI
- [x] Dark mode (Right-click)
+- [x] Dark mode (*Right-click*)
- [x] Saves results to a JSON (Right-click)
+- [x] Saves results to a JSON file (*Right-click*)
 - [x] Logs errors to a file 
 ## CLI
- [x] Saves results to a JSON (-j/--json)
+- [x] Saves results to JSON (*specifiy* `--json`)
- [x] Automatically checks for new updates. Notifies user if update were found.
+- [x] Saves results to CSV (*specify* `--csv`)
 - [x] Automatically checks for new updates, and notifies user if updates were found.
 # 📃 TODO
 ## GUI
@@ -22,8 +20,21 @@ Given a subreddit name and a keyword, RPST will return all posts from a specifie
 - [x] Add manual dark mode option, that will be persistent in all sessions
 - [ ] Make it save results to a CSV file
 # Images & Screenshots
 ## GUI
 * ![2023-08-09_04-05](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/d8917a35-3eac-44ce-aa96-1f9685095254)
 * ![2023-08-09_04-05_1](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/d2fe7269-91d4-49ad-87fb-44282c5637a7)
 ## CLI
 * ![2023-08-25_15-39](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/4bca09b3-271f-452d-81a7-39c9986539f2)
 * ![2023-08-25_15-30](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/2b39bdfa-87d0-4038-90cd-14e7d3b6a84b)
 * ![2023-08-25_15-35](https://github.com/bellingcat/reddit-post-scraping-tool/assets/74001397/47ba23ad-8d32-49c5-8c16-34a903fbc581)
 # 📖 Wiki
-[Refer to the Wiki](https://github.com/rly0nheart/reddit-post-scraping-tool/wiki) for installation instructions, in addition to all other documentation.
+[Refer to the Wiki](https://github.com/bellingcat/reddit-post-scraping-tool/wiki) for installation instructions, in addition to all other documentation.
 # 😁 Donations
 If you like `RPST` and would like to show support, you can Buy A Coffee for the developer using the button below
--- a/GUI/RPST/RPST.vbproj
+++ b/GUI/RPST/RPST.vbproj
@@ -13,11 +13,11 @@
    <PackageProjectUrl>https://github.com/bellingcat/reddit-post-scraping-tool</PackageProjectUrl>
    <PackageReadmeFile>README.md</PackageReadmeFile>
    <RepositoryUrl>https://github.com/bellingcat/reddit-post-scraping-tool</RepositoryUrl>
-    <AssemblyVersion>1.6.0.0</AssemblyVersion>
+    <AssemblyVersion>1.7.0.1</AssemblyVersion>
-    <FileVersion>1.6.0.0</FileVersion>
+    <FileVersion>1.7.0.1</FileVersion>
    <PackageLicenseFile>LICENSE</PackageLicenseFile>
    <PackageRequireLicenseAcceptance>True</PackageRequireLicenseAcceptance>
-    <Version>1.6.0</Version>
+    <Version>1.7.0</Version>
    <PackageTags>reddit;scraper;reddit-scraper;osint</PackageTags>
    <PackageReleaseNotes></PackageReleaseNotes>
    <AnalysisLevel>6.0-recommended</AnalysisLevel>
@@ -39,7 +39,7 @@
  </ItemGroup>
  <ItemGroup>
-    <PackageReference Include="Newtonsoft.Json" Version="13.0.2" />
+    <PackageReference Include="Newtonsoft.Json" Version="13.0.3" />
  </ItemGroup>
  <ItemGroup>
@@ -78,4 +78,4 @@
    </None>
  </ItemGroup>
-</Project>
+</Project>
--- a/images/2023-08-08_07-04.png
+++ b/images/2023-08-08_07-04.png
--- a/images/2023-08-08_07-04_1.png
+++ b/images/2023-08-08_07-04_1.png
--- a/images/2023-08-08_07-12.png
+++ b/images/2023-08-08_07-12.png
--- a/images/2023-08-08_07-12_1.png
+++ b/images/2023-08-08_07-12_1.png
--- a/images/2023-08-25_15-30.png
+++ b/images/2023-08-25_15-30.png
--- a/images/2023-08-25_15-31.png
+++ b/images/2023-08-25_15-31.png
--- a/images/2023-08-25_15-35.png
+++ b/images/2023-08-25_15-35.png
--- a/images/2023-08-25_15-39.png
+++ b/images/2023-08-25_15-39.png
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -7,18 +7,18 @@ packages = ["rpst"]
 [project]
 name = "reddit-post-scraping-tool"
-version = "1.6.0.0"
+version = "1.7.0.1"
 description = "Given a subreddit name and a keyword, RPST returns all top (by default) posts that contain the specified keyword."
 readme = "README.md"
 requires-python = ">=3.8"
 license = {file = "LICENSE"}
-keywords = ["osint", "reddit-crawler", "reddit-scraping", "reddit"]
+keywords = ["reddit-crawler", "reddit-scraping", "reddit", "reddit-api"]
 authors = [{name = "Richard Mwewa", email = "rly0nheart@duck.com"}]
 classifiers = [
  "Development Status :: 5 - Production/Stable",
  "Programming Language :: Python :: 3",
  "Programming Language :: Visual Basic",
-  "Intended Audience :: Information Technology",
+  "Intended Audience :: End Users/Desktop",
  "License :: OSI Approved :: MIT License",
  "Operating System :: OS Independent",
  "Natural Language :: English"
@@ -26,6 +26,7 @@ classifiers = [
 dependencies = [
    "rich",
    "glyphoji",
    "requests",
 ]
@@ -35,4 +36,4 @@ documentation = "https://github.com/bellingcat/reddit-post-scraping-tool/wiki"
 repository = "https://github.com/bellingcat/reddit-post-scraping-tool.git"
 [project.scripts]
-rpst = "rpst.__main:run"
+rpst = "rpst.main:run"
--- a/rpst/init.py
+++ b/rpst/init.py
@@ -0,0 +1 @@
--- a/rpst/__rpst_.py
+++ b/rpst/__rpst_.py
@@ -1,201 +0,0 @@
 import json
 import logging
 import argparse
 import requests
 from rich.tree import Tree
 from rich import print as xprint
 from rich.markdown import Markdown
 from rich.logging import RichHandler
 def write_post_data(post_data: dict, filename: str):
    """
    Writes post data to a specified JSON file.
    :param post_data: A dictionary containing post data.
    :param filename: The name of the file to which post data will be written.
    """
    # Write the data to a JSON file
    with open(filename + ".json", 'a') as file:
        file.write(json.dumps(post_data))
        file.write('\n')  # write a newline to separate posts
    log.info(f"Post data written to '{file.name}'")
 def check_updates(version_tag: str):
    """
    This function checks if there's a new release of a project on GitHub. If there is, it logs an
    information message and prints the release notes.
    :param version_tag: A string representing the current version of the project.
    """
    # Make a GET request to the GitHub API to get the latest release of the project
    response = requests.get("https://api.github.com/repos/bellingcat/reddit-post-scraping-tool/releases/latest").json()
    # Check if the latest release's tag matches the current version tag
    if response['tag_name'] != version_tag:
        # If not, convert the release notes from Markdown to HTML
        raw_release_notes = response['body']
        markdown_release_notes = Markdown(raw_release_notes)
        # Log an info message about the new release
        log.info(
            f"A new release of RPST is available ({response['tag_name']}). "
            f"Run 'pip install --upgrade reddit-post-scraping-tool' to get the updates."
        )
        # Print the release notes
        xprint(markdown_release_notes)
 def format_post_data(post: dict, keyword: str, output: bool):
    """
    This function extracts relevant data from a Reddit post and displays it in a tree structure,
    followed by the post's selftext.
    :param post: A dictionary containing the data of a Reddit post.
    :param keyword: The keyword that is used to find posts, in his case gets uses as the filename.
    :param output: If specified, all found posts will be written to a json file.
    """
    # Define the data to extract from the post
    post_data = {
        'Author': post['data']['author'],
        'ID': post['data']['id'],
        'Subreddit': post["data"]["subreddit_name_prefixed"],
        'Visibility': post['data']['subreddit_type'],
        'Thumbnail': post["data"]["thumbnail"],
        'NSFW': post['data']['over_18'],
        'Gilded': post['data']['gilded'],
        'Upvotes': post["data"]["ups"],
        'Upvote ratio': post["data"]["upvote_ratio"],
        'Downvotes': post["data"]["downs"],
        'Awards': post["data"]["total_awards_received"],
        'Top award': post['data']['top_awarded_type'],
        'Is crosspostable?': post['data']['is_crosspostable'],
        'Score': post["data"]["score"],
        'Category': post['data']['category'],
        'Domain': post["data"]["domain"],
        'Created': post['data']['created'],
        'Approved at': post['data']['approved_at_utc'],
        'Approved by': post['data']['approved_by'],
    }
    if output:
        write_post_data(filename=keyword, post_data=post_data)
    # Create a tree structure with the post's title as the root
    post_tree = Tree("\n" + post['data']['title'])
    # Add each piece of extracted data as a branch of the tree
    for post_key, post_value in post_data.items():
        post_tree.add(f"{post_key}: {post_value}")
    # Print the tree structure
    xprint(post_tree)
    # Print the post's selftext
    print(post['data']['selftext'] + "\n")
 def get_posts(arguments: argparse):
    """
    Scrapes a given subreddit for posts that contain a specified keyword.
    The search is limited by the number of posts and timeframe specified. The results are either
    printed to the console or saved to a specified file, based on the 'output' argument.
    :param arguments: Namespace object from argparse.
    Expected Object Attributes
    --------------------------
        - keyword: The keyword to search for in the posts.
        - subreddit: The subreddit to scrape.
        - listing: The type of posts to scrape. This could be 'hot', 'new', etc.
        - timeframe: The timeframe from which to scrape posts. This could be 'day', 'week', etc.
        - limit: The maximum number of posts to scrape.
        - json: If specified, all found posts will be written to a json file.
    Also logs the number of posts in which the keyword was found.
    """
    keyword = arguments.keyword
    subreddit = arguments.subreddit
    listing = arguments.listing
    timeframe = arguments.timeframe
    limit = arguments.limit
    json_output = arguments.json
    # Start a new session
    session = requests.session()
    # Set the User-Agent to mimic a Safari browser on a Mac
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, '
                                     'like Gecko) Version/14.1.1 Safari/605.1.15'}
    # Send a GET request to the specified subreddit and listing,
    # limiting the response by the specified limit and timeframe
    response = session.get(f'https://reddit.com/r/{subreddit}/{listing}'
                           f'.json?limit={limit}&t={timeframe}').json()
    # Initialize a counter for the number of posts found that contain the keyword
    found_posts = 0
    # Loop through each post in the response
    for post in response['data']['children']:
        # If the keyword is found in the post's selftext or title, increment the counter and process the post
        if keyword.lower() in post['data']['selftext'] or keyword.lower() in post['data']['title']:
            found_posts += 1
            format_post_data(post=post, keyword=keyword, output=json_output)
    # Log the number of posts in which the keyword was found
    log.info(f"Keyword ('{keyword}') was found in {found_posts}/{len(response['data']['children'])} "
             f"{listing} posts from r/{subreddit}.")
 def create_parser():
    """
    Creates and configures an argument parser for the command line arguments.
    :return: A configured argparse.ArgumentParser object ready to parse the command line arguments.
    """
    parser = argparse.ArgumentParser(
        description='RPST: Reddit Post Scraping Tool  —by Richard Mwewa | https://about.me/rly0nheart',
        epilog='Given a subreddit name and a keyword, '
               'RPST returns all top (by default) posts that contain the specified keyword.'
    )
    parser.add_argument('-k', '--keyword', help='The keyword to search for in the posts.', required=True)
    parser.add_argument('-s', '--subreddit', help='The subreddit to scrape.', required=True)
    parser.add_argument(
        '-c', '--limit',
        help='The maximum number of posts to scrape (1-100). (default: %(default)s)',
        default=10,
        type=int,
        choices=range(1, 101)  # This enforces that the limit must be between 1 and 100 inclusive.
    )
    parser.add_argument(
        '-l', '--listing',
        default='top',
        const='top',
        nargs='?',
        choices=['controversial', 'hot', 'best', 'new', 'rising'],
        help='The type of posts to scrape (default: %(default)s)'
    )
    parser.add_argument(
        '-t', '--timeframe',
        default='all',
        const='all',
        nargs='?',
        choices=['hour', 'day', 'week', 'month', 'year', 'all'],
        help='The timeframe from which to scrape posts (default: %(default)s)'
    )
    parser.add_argument(
        '-j', '--json',
        help='Write all found posts to a json file.',
        action='store_true'
    )
    return parser
 logging.basicConfig(level="NOTSET", format="%(message)s",
                    handlers=[RichHandler(markup=True, log_time_format='[%H:%M:%S%p]')])
 log = logging.getLogger("rich")
--- a/rpst/__main.py
+++ b/rpst/__main.py
@@ -1,5 +1,7 @@
 from datetime import datetime
-from rpst.__rpst import log, get_posts, check_updates, create_parser
+
 from .rpst import get_posts
 from .utils import create_parser, set_loglevel, check_updates
 def run():
@@ -10,20 +12,22 @@ def run():
    # Create a parser and parse the command line arguments
    parser = create_parser()
-    arguments = parser.parse_args()
+    args = parser.parse_args()
    log = set_loglevel(args=args)
    # Record the start time
    start_time = datetime.now()
    try:
        # Check for updates
-        check_updates(version_tag="1.6.0.0")
+        check_updates(version_tag="1.7.0.1")
        # Get posts with the provided/parsed arguments
-        get_posts(arguments=arguments)
+        get_posts(args=args)
    except KeyboardInterrupt:
        log.warning("User interruption detected.")
    except Exception as e:
        log.error(f"An error occurred: {e}")
    finally:
-        log.info(f'Finished in {datetime.now() - start_time} seconds.')
+        log.info(f"Finished in {datetime.now() - start_time} seconds.")
--- a/rpst/rpst.py
+++ b/rpst/rpst.py
@@ -0,0 +1,131 @@
 import argparse
 from datetime import datetime
 import requests
 from glyphoji import glyph
 from rich.tree import Tree
 from rich import print as xprint
 from .utils import convert_timestamp_to_datetime, write_post_data
 def create_post_branch(post: dict, keyword: str, tree: Tree, args: argparse) -> Tree:
    """
    This function extracts relevant data from a Reddit post and adds it in a tree branch structure,
    followed by the post's selftext.
    :param post: A dictionary containing the data of a Reddit post.
    :param keyword: The keyword that is used to find posts, in his case gets uses as the filename.
    :param tree: Tree where the post branch will be added.
    :param args: A namespace object from argparse.
    :returns: The main tree with added post branches.
    """
    # Define the data to extract from the post.
    post_data = {
        # "Author": post["data"]["author"],
        f"{glyph.id_button} ID": post["data"]["id"],
        f"{glyph.people_hugging} Subreddit": post["data"]["subreddit_name_prefixed"],
        f"{glyph.face_with_peeking_eye} Visibility": post["data"]["subreddit_type"],
        f"{glyph.framed_picture} Thumbnail": post["data"]["thumbnail"],
        f"{glyph.white_question_mark} Gilded": post["data"]["gilded"],
        f"{glyph.up_arrow} Upvotes": post["data"]["ups"],
        f"{glyph.chart_increasing} Upvote ratio": post["data"]["upvote_ratio"],
        f"{glyph.down_arrow} Downvotes": post["data"]["downs"],
        f"{glyph.trophy} Awards": post["data"]["total_awards_received"],
        f"{glyph.trophy} Top award": post["data"]["top_awarded_type"],
        f"{glyph.no_one_under_eighteen} Is NSFW?": post["data"]["over_18"],
        f"{glyph.left_arrow_curving_right} Is crosspostable?": post["data"][
            "is_crosspostable"
        ],
        f"{glyph.bar_chart} Score": post["data"]["score"],
        f"{glyph.card_file_box} Category": post["data"]["category"],
        f"{glyph.globe_with_meridians} Domain": post["data"]["domain"],
        f"{glyph.calendar} Posted on": convert_timestamp_to_datetime(
            post["data"]["created"]
        ),
        f"{glyph.calendar} Approved at": post["data"]["approved_at_utc"],
        f"{glyph.bust_in_silhouette} Approved by": post["data"]["approved_by"],
    }
    # Add the post's branch to the main tree.
    post_branch = tree.add(f"{glyph.page_with_curl} {post['data']['title']}")
    # Add each piece of extracted data as a branch of the post_branch.
    for post_key, post_value in post_data.items():
        post_branch.add(f"{post_key}: {post_value}", style="dim")
    # This ensures that the post's selftext is also added to the written json/csv file.
    post_data[f"{glyph.clipboard} Text"] = post["data"]["selftext"]
    write_post_data(
        filename=keyword, post_data=post_data, tree_branch=post_branch, args=args
    )
    post_branch.add(post["data"]["selftext"], style="italic")
    return tree
 def get_posts(args: argparse):
    """
    Scrapes a given subreddit for posts that contain a specified keyword.
    The search is limited by the number of posts and timeframe specified.
    :param args: Namespace object from argparse.
    Expected Object Attributes
    --------------------------
        - keyword: The keyword to search for in the posts.
        - subreddit: The subreddit to scrape.
        - listing: The type of posts to scrape. This could be 'hot', 'new', etc.
        - timeframe: The timeframe from which to scrape posts. This could be 'day', 'week', etc.
        - limit: The maximum number of posts to scrape.
        - json: If specified, all found posts will be written to a json file.
    """
    keyword = args.keyword
    subreddit = args.subreddit
    listing = args.listing
    timeframe = args.timeframe
    limit = args.limit
    # Create main result tree.
    main_tree = Tree(
        f"[bold]{glyph.calendar} {datetime.now()}[/]", guide_style="bold bright_blue"
    )
    # Start a new session
    session = requests.session()
    # Set the User-Agent to mimic a Safari browser on a Mac.
    session.headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, "
        "like Gecko) Version/14.1.1 Safari/605.1.15"
    }
    # Send a GET request to the specified subreddit and listing,
    # limiting the response by the specified limit and timeframe.
    response = session.get(
        f"https://reddit.com/r/{subreddit}/{listing}"
        f".json?limit={limit}&t={timeframe}"
    ).json()
    # Initialize a counter for the number of posts found that contain the keyword.
    found_posts = 0
    # Loop through each post in the response
    for post_index, post in enumerate(response["data"]["children"], start=1):
        # If the keyword is found in the post's selftext or title, increment the counter and process the post.
        if (
            keyword.lower() in post["data"]["selftext"]
            or keyword.lower() in post["data"]["title"]
        ):
            # Create a branch for found post(s) and show post index and post author as the title
            found_tree = main_tree.add(
                f"{glyph.bust_in_silhouette} #{post_index} by [bold]@{post['data']['author']}[/]"
            )
            found_posts += 1
            create_post_branch(post=post, keyword=keyword, tree=found_tree, args=args)
    # Log the number of posts in which the keyword was found
    main_tree.add(
        f"{glyph.check_mark_button} Keyword ('{keyword}') was found in "
        f"{found_posts}/{len(response['data']['children'])} {listing} posts from r/{subreddit}."
    )
    xprint(main_tree)
--- a/rpst/utils.py
+++ b/rpst/utils.py
@@ -0,0 +1,202 @@
 import os
 import csv
 import json
 import logging
 import argparse
 from datetime import datetime
 import requests
 from glyphoji import glyph
 from rich.tree import Tree
 from rich import print as xprint
 from rich.markdown import Markdown
 from rich.logging import RichHandler
 def convert_timestamp_to_datetime(timestamp: int) -> str:
    """
    Converts a Unix timestamp to a formatted datetime string.
    :param timestamp: The Unix timestamp to be converted.
    :return: A formatted datetime string in the format "dd MMMM yyyy, hh:mm:ssAM/PM".
    """
    utc_from_timestamp = datetime.utcfromtimestamp(timestamp)
    datetime_object = utc_from_timestamp.strftime("%d %B %Y, %I:%M:%S%p")
    return datetime_object
 def create_parser():
    """
    Creates and configures an argument parser for the command line arguments.
    :return: A configured argparse.ArgumentParser object ready to parse the command line arguments.
    """
    parser = argparse.ArgumentParser(
        description="RPST (Reddit Post Scraping Tool)  —by Richard Mwewa | https://about.me/rly0nheart",
        epilog="Given a subreddit name and a keyword, "
        "RPST returns all top (by default) posts that contain the specified keyword.",
    )
    parser.add_argument(
        "-k", "--keyword", help="The keyword to search for in the posts.", required=True
    )
    parser.add_argument(
        "-s", "--subreddit", help="The subreddit to scrape.", required=True
    )
    parser.add_argument(
        "-c",
        "--limit",
        help="The maximum number of posts to scrape (1-100). (default: %(default)s)",
        default=10,
        type=int,
        choices=range(
            1, 101
        ),  # This enforces that the limit must be between 1 and 100 inclusive.
    )
    parser.add_argument(
        "-l",
        "--listing",
        default="top",
        const="top",
        nargs="?",
        choices=["controversial", "hot", "best", "new", "rising"],
        help="The type of posts to scrape (default: %(default)s)",
    )
    parser.add_argument(
        "-t",
        "--timeframe",
        default="all",
        const="all",
        nargs="?",
        choices=["hour", "day", "week", "month", "year", "all"],
        help="The timeframe from which to scrape posts (default: %(default)s)",
    )
    parser.add_argument(
        "--json",
        help="Write all found posts to a json file.",
        action="store_true",
    )
    parser.add_argument(
        "--csv",
        help="Write all found posts to a csv file.",
        action="store_true",
    )
    parser.add_argument(
        "-d",
        "--debug",
        help="run rpst in debug mode (show network logs)",
        action="store_true",
    )
    return parser
 def check_updates(version_tag: str):
    """
    This function checks if there's a new release of a project on GitHub. If there is, it logs an
    information message and prints the release notes.
    :param version_tag: A string representing the current version of the project.
    """
    # Make a GET request to the GitHub API to get the latest release of the project.
    response = requests.get(
        "https://api.github.com/repos/bellingcat/reddit-post-scraping-tool/releases/latest"
    ).json()
    # Check if the latest release's tag matches the current version tag.
    if response["tag_name"] != version_tag:
        # If not, convert the release notes from Markdown to HTML.
        raw_release_notes = response["body"]
        # Log an info message about the new release.
        xprint(
            f"{glyph.up_arrow} A new release of RPST is available ({response['tag_name']}). "
            f"Run 'pip install --upgrade reddit-post-scraping-tool' to get the updates."
        )
        # Print the release notes.
        xprint(Markdown(raw_release_notes))
 def set_loglevel(args: argparse) -> logging.getLogger:
    """
    Configures the logging level based on the provided arguments.
    If `args.debug` is True, the logging level is set to "NOTSET," allowing all log messages to be displayed.
    Otherwise, the logging level is set to "INFO," and only informational and higher-severity messages are displayed.
    The function also configures a RichHandler for formatting the log messages,
     including a specific time format and hiding the log level.
    :param args: A namespace object from argparse containing the debugging option (args.debug).
    :return: A logger object associated with the name "rich."
    """
    if args.debug:
        logging.basicConfig(
            level="NOTSET",
            format="%(message)s",
            handlers=[
                RichHandler(
                    markup=True, log_time_format="[%H:%M:%S%p]", show_level=False
                )
            ],
        )
    else:
        logging.basicConfig(
            level="INFO",
            format="%(message)s",
            handlers=[
                RichHandler(
                    markup=True, log_time_format="[%H:%M:%S%p]", show_level=False
                )
            ],
        )
    return logging.getLogger("rich")
 def write_post_data(post_data: dict, filename: str, args, tree_branch: Tree):
    """
    Writes post data to a specified JSON or CSV file based on the args provided, and updates
    the provided tree with the status.
    :param post_data: A dictionary containing post data.
    :param filename: The name of the file to which post data will be written.
    :param args: A namespace object from argparse containing the output format options (args.json or args.csv).
    :param tree_branch: A rich Tree object to which status information will be added.
    """
    home_directory = os.path.expanduser("~")
    if args.json:
        json_file_path = os.path.join(home_directory, f"{filename}.json")
        with open(json_file_path, "a", encoding="utf-8") as file:
            file.write(json.dumps(post_data, ensure_ascii=False))
            file.write("\n")  # Separate posts with newline
        tree_branch.add(
            f"{glyph.page_facing_up} JSON data successfully written/appended to file: "
            f"[italic][link file://{json_file_path}]{json_file_path}[/]"
        )
    else:
        tree_branch.add(
            f"{glyph.cross_mark_button} JSON data writing operation was skipped. No changes made."
        )
    if args.csv:
        csv_file_path = os.path.join(home_directory, f"{filename}.csv")
        with open(csv_file_path, "a", newline="", encoding="utf-8") as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=post_data.keys())
            # Write headers if file is empty
            if csvfile.tell() == 0:
                writer.writeheader()
            writer.writerow(post_data)
        tree_branch.add(
            f"{glyph.page_facing_up} CSV data successfully written/appended to file: "
            f"[italic][link file://{csv_file_path}]{csv_file_path}[/]"
        )
    else:
        tree_branch.add(
            f"{glyph.cross_mark_button} CSV data writing operation was skipped. No changes made."
        )
Author	SHA1	Message	Date
Richard Mwewa	7198e0be90	Merge pull request #13 from bellingcat/dev Dev	2023-08-25 23:13:15 +02:00
Richard Mwewa	f9f0ed5085	Update main.py	2023-08-25 23:12:06 +02:00
Richard Mwewa	b0c53a1511	Update RPST.vbproj	2023-08-25 23:11:18 +02:00
Richard Mwewa	54b9abc4b6	Update pyproject.toml	2023-08-25 23:10:29 +02:00
Richard Mwewa	4ae14ff02e	Delete .nomedia	2023-08-25 23:09:36 +02:00
Richard Mwewa	376eeab243	Add files via upload	2023-08-25 23:09:17 +02:00
Richard Mwewa	6d427849d5	Create .nomedia	2023-08-25 23:04:27 +02:00
Richard Mwewa	ad8cb63541	Update README.md	2023-08-25 23:03:34 +02:00
Richard Mwewa	57f8c24cee	Update rpst.py	2023-08-25 17:09:42 +02:00
Richard Mwewa	750967c322	Merge pull request #12 from bellingcat/dev Dev	2023-08-25 15:53:49 +02:00
Richard Mwewa	cfef86cbe3	Update README.md	2023-08-25 15:53:23 +02:00
Richard Mwewa	2a2696403d	Update README.md	2023-08-25 15:52:49 +02:00
Richard Mwewa	b0a8d75d8c	Update README.md	2023-08-25 15:49:07 +02:00
Richard Mwewa	b31c38f5cc	Update README.md	2023-08-25 15:48:19 +02:00
Richard Mwewa	b5b7df868e	Update README.md	2023-08-25 15:44:13 +02:00
Richard Mwewa	566f558720	Update utils.py	2023-08-25 15:27:18 +02:00
Richard Mwewa	c3e5ce6441	Update rpst.py	2023-08-25 15:26:42 +02:00
Richard Mwewa	7c164938c9	Update RPST.vbproj	2023-08-25 15:06:31 +02:00
Richard Mwewa	b08c4a147b	Create utils.py	2023-08-25 15:04:37 +02:00
Richard Mwewa	8f259b7a40	Update pyproject.toml 1.7.0.0	2023-08-25 14:54:31 +02:00
Richard Mwewa	f117c99cc7	Update and rename __main.py to main.py 1.7.0.0	2023-08-25 14:52:27 +02:00
Richard Mwewa	3a9a87e67c	Update and rename __rpst.py to rpst.py 1.7.0.0	2023-08-25 14:51:16 +02:00
Richard Mwewa	cce254e976	Update pyproject.toml	2023-08-14 02:51:49 +02:00
Richard Mwewa	418b2acc4c	Update README.md	2023-08-12 05:25:07 +02:00
Richard Mwewa	d26699cc1f	Update README.md	2023-08-12 05:24:35 +02:00
Richard Mwewa	9efb1cea4a	Merge pull request #10 from bellingcat/dev Dev	2023-08-12 05:19:16 +02:00
Richard Mwewa	ba6eeb38a6	Add files via upload	2023-08-12 05:09:57 +02:00
Richard Mwewa	2053c0f0bc	Update pyproject.toml	2023-08-12 05:05:23 +02:00
Richard Mwewa	8bef73001c	Update __main.py	2023-08-12 05:04:56 +02:00
Richard Mwewa	c9d9628326	Update __rpst.py Saved posts will also include the selftext.	2023-08-12 05:04:24 +02:00
Richard Mwewa	2f1619b4c5	Merge pull request #9 from bellingcat/dev Dev	2023-08-12 03:54:42 +02:00
Richard Mwewa	33db66dbc3	Add files via upload	2023-08-12 03:53:33 +02:00
Richard Mwewa	bbbdab906d	Update pyproject.toml	2023-08-12 03:49:05 +02:00
Richard Mwewa	74264224a5	Update __main.py	2023-08-12 03:47:28 +02:00
Richard Mwewa	ce75d40f76	Update __rpst.py Changed post ouput format	2023-08-12 03:46:23 +02:00
Richard Mwewa	406e34c4bb	Update __rpst.py	2023-08-09 22:50:33 +02:00
Richard Mwewa	38140ea2be	Merge pull request #8 from bellingcat/dev Update and rename __rpst_.py to __rpst.py	2023-08-09 22:39:33 +02:00
Richard Mwewa	4c3d3a688f	Update and rename __rpst_.py to __rpst.py Yep, I suck	2023-08-09 22:38:48 +02:00
Richard Mwewa	a03b649904	Merge pull request #7 from bellingcat/dev Create __init__.py	2023-08-09 22:36:41 +02:00
Richard Mwewa	aa3b506a96	Create __init__.py	2023-08-09 22:36:20 +02:00