diff --git a/docs/images/cisticola_logo.svg b/docs/images/cisticola_logo.svg
index f570be8..92c0ca5 100644
--- a/docs/images/cisticola_logo.svg
+++ b/docs/images/cisticola_logo.svg
@@ -7,7 +7,7 @@
viewBox="0 0 51.688999 11.797"
version="1.1"
id="svg5"
- inkscape:version="1.1.2 (76b9e6a115, 2022-02-25)"
+ inkscape:version="1.2.2 (1:1.2.2+202305151915+b0a8486541)"
sodipodi:docname="cisticola_logo.svg"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
@@ -28,14 +28,16 @@
fit-margin-right="0"
fit-margin-bottom="0"
inkscape:zoom="2.0838024"
- inkscape:cx="52.548168"
+ inkscape:cx="141.56813"
inkscape:cy="115.65396"
inkscape:window-width="1920"
- inkscape:window-height="999"
+ inkscape:window-height="1005"
inkscape:window-x="0"
inkscape:window-y="0"
inkscape:window-maximized="1"
- inkscape:current-layer="layer4" />
+ inkscape:current-layer="layer3"
+ inkscape:showpageshadow="2"
+ inkscape:deskcolor="#d1d1d1" />
+
diff --git a/docs/source/about.rst b/docs/source/about.rst
index a233fe0..233d829 100644
--- a/docs/source/about.rst
+++ b/docs/source/about.rst
@@ -8,20 +8,37 @@ Definitions
- *Platform*: a social media website, for example Telegram, YouTube, or Rumble.
- *Channel*: an account or group on a platform, for example Twitter users, Telegram private chat groups, YouTube channels, and Gab groups.
- *Post*: a single item created by a channel, for example a Telegram message, a Tweet, or a YouTube video. Posts can contain one or more media attachments.
-- *Media*: a file uploaded to a platform by a channel as part of a post.
+- *Media*: a file uploaded to a platform by a channel as part of a post. Media are often images or video, but can also include audio or, on some platforms, arbitrary file types (such as PDFs).
Components
----------
-Cisticola has many components
+Cisticola has many components, including:
-- :py:mod:`cisticola.base`: contains Object Relational Mapping (ORM) dataclasses that imperatively map to pre-defined SQL tables
-- :py:mod:`cisticola.scraper`: contains platform-specific modules for scraping raw data from platforms. For example, the :py:mod:`cisticola.scraper.bitchute` module extracts raw data from Bitchute.
-- :py:mod:`cisticola.transformer`: contains platform-specific modules for converting raw data into a standardized, cross-platform format.
+- The :py:mod:`cisticola.base` module contains Object Relational Mapping (ORM) dataclasses that imperatively map to pre-defined SQL tables.
+- The :py:mod:`cisticola.scraper` subpackage contains platform-specific modules for scraping raw data from platforms. For example, the :py:mod:`cisticola.scraper.bitchute` module extracts raw data from Bitchute.
+- The :py:mod:`cisticola.transformer` subpackage contains platform-specific modules for converting raw data into a standardized, cross-platform format.
The data extracted by scrapers varies by platform, but typically includes media files attached to posts.
Separating the "scraping" and "transforming" steps is useful because it ensures that no data is thrown away during the transformation. Some fields in the raw data may not be included in the transformed format, but could prove useful in the future.
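
The separation described above can be illustrated with a minimal sketch. The ``RawPost`` and ``Post`` dataclasses, the ``transform`` function, and the payload field names below are hypothetical and are not the actual classes in :py:mod:`cisticola.base`; the sketch only assumes that the raw payload is stored verbatim alongside the standardized record.

```python
import json
from dataclasses import dataclass


@dataclass
class RawPost:
    """Hypothetical stand-in for a scraped record; the payload is kept verbatim."""
    platform: str
    raw_data: str  # JSON exactly as returned by a (hypothetical) scraper


@dataclass
class Post:
    """Hypothetical standardized record holding only cross-platform fields."""
    platform: str
    content: str
    author: str


def transform(raw: RawPost) -> Post:
    """Pick the cross-platform fields out of the raw payload without modifying it."""
    payload = json.loads(raw.raw_data)
    return Post(platform=raw.platform, content=payload["text"], author=payload["author"])


raw = RawPost(
    platform="Telegram",
    raw_data=json.dumps({"text": "hello", "author": "example_channel", "views": 120}),
)
post = transform(raw)

# "views" is not part of Post, but it is still recoverable from the raw record.
views = json.loads(raw.raw_data)["views"]
```

Because the raw record is never modified, a later version of the transformer could start extracting ``views`` without re-scraping the platform.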
+Tables
+------
+The database Cisticola uses to archive and store data consists of six tables. Each table's name, its ORM mapping in :py:mod:`cisticola.base`, and a brief description are listed below:
+
+- ``channels`` (:py:class:`cisticola.base.Channel`): User-specified information about a channel
+- ``raw_posts`` (:py:class:`cisticola.base.ScraperResult`): Minimally processed information scraped from a post
+- ``posts`` (:py:class:`cisticola.base.Post`): Processed information about a post
+- ``raw_channel_info`` (:py:class:`cisticola.base.RawChannelInfo`): Minimally processed information scraped from a channel
+- ``channel_info`` (:py:class:`cisticola.base.ChannelInfo`): Processed information about a channel
+- ``media`` (:py:class:`cisticola.base.Media`): Processed information about a media file attached to a post
+
+The diagram below shows all columns in each table and their data types, with certain shared primary and foreign key columns colored differently to distinguish them.
+
+.. image:: ../images/database_schema.svg
+ :target: _images/database_schema.svg
+ :width: 100%
+
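The table list above can also be read as pairs of raw and processed tables. The sketch below restates the table-to-class mapping from the list as a plain dictionary and pairs each ``raw_*`` table with its processed counterpart. The helper ``raw_processed_pairs`` is hypothetical, and treating the ``raw_`` prefix as a naming convention is an assumption drawn from the table names, not something the package is known to enforce.

```python
# Table name -> ORM class name in cisticola.base, copied from the list above.
TABLE_TO_CLASS = {
    "channels": "Channel",
    "raw_posts": "ScraperResult",
    "posts": "Post",
    "raw_channel_info": "RawChannelInfo",
    "channel_info": "ChannelInfo",
    "media": "Media",
}


def raw_processed_pairs(tables):
    """Pair each raw_* table with its processed counterpart, if one exists."""
    prefix = "raw_"
    return {
        name: name[len(prefix):]
        for name in tables
        if name.startswith(prefix) and name[len(prefix):] in tables
    }


pairs = raw_processed_pairs(TABLE_TO_CLASS)
```

Under that assumption, ``raw_posts`` pairs with ``posts`` and ``raw_channel_info`` pairs with ``channel_info``, mirroring the scraper and transformer steps.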
TODO
- Add diagram
- Describe common workflow and steps
diff --git a/docs/source/conf.py b/docs/source/conf.py
index c291fb8..fc3607f 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -57,4 +57,4 @@ autodoc_default_options = {'exclude-members': '_sa_class_manager'}
html_favicon = '../images/favicon.ico'
html_logo = '../images/cisticola_logo.svg'
-html_theme_options = {'style_nav_header_background': '#000000'}
\ No newline at end of file
+html_theme_options = {'style_nav_header_background': '#292a2b'}
\ No newline at end of file
diff --git a/docs/source/deployment.rst b/docs/source/deployment.rst
new file mode 100644
index 0000000..91e0c56
--- /dev/null
+++ b/docs/source/deployment.rst
@@ -0,0 +1,16 @@
+Deployment
+==========
+
+.. warning::
+
+    We are working on making Cisticola easier to install, configure, and use. If you're confused by these steps, don't worry: the process will become more accessible.
+
+Docker
+------
+The easiest way to deploy Cisticola is to use Docker. Docker works much like a virtual machine running inside your computer: it isolates everything and makes installation simple.
+
+1. Install `Docker `_
+
+Manual Installation
+-------------------
+TODO
\ No newline at end of file
diff --git a/docs/source/developer_guide.rst b/docs/source/developer_guide.rst
new file mode 100644
index 0000000..2fb512f
--- /dev/null
+++ b/docs/source/developer_guide.rst
@@ -0,0 +1,23 @@
+Developer Guide
+===============
+
+Installation
+------------
+
+To install the necessary dependencies for building the documentation and running unit tests, run the following command from the package root directory:
+
+.. code-block::
+
+ pipenv install --dev
+
+Documentation
+-------------
+If changes are made to the package structure or additional modules are created, you can update the Sphinx source ``cisticola.*.rst`` files by running the following command from the ``docs/`` directory:
+
+.. code-block::
+
+ pipenv run make apidoc
+
+Formatting
+----------
+Cisticola uses `black `_ to format source code.
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
index be9e5a3..e3dfcdf 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -6,4 +6,6 @@ Welcome to Cisticola's documentation!
about
quickstart
+ deployment
+ developer_guide
cisticola
\ No newline at end of file
diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
index f6ca747..5d8621e 100644
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -16,35 +16,10 @@ and then install the dependencies using the following command from the package r
pipenv install
-To install the necessary dependencies for building the documentation and running unit tests, run the following command from the package root directory:
-
-.. code-block::
-
- pipenv install --dev
-
Environment Variables
---------------------
-Three of the scrapers in *cisticola* (:py:mod:`~cisticola.scraper.gab.GabScraper`, :py:mod:`~cisticola.scraper.instagram.InstagramScraper`, and :py:mod:`~cisticola.scraper.telegram_telethon.TelegramTelethonScraper`) require platform credentials to work correctly.
-
-Gab
-"""
-
-The Gab credentials can be configured by running the following command from the root directory:
-
-.. code-block::
-
- pipenv run garc configure
-
-which will direct you to provide the username and password for your Gab account.
-
-Instagram
-"""""""""
-
-The Instagram credentials can be configured by setting the following environment variables, either in the project's ``.env`` file or in the system's environment:
-
-- ``INSTAGRAM_USERNAME``: username of your Instagram account
-- ``INSTAGRAM_PASSWORD``: password of your Instagram account
+One of the scrapers in *cisticola* (:py:class:`~cisticola.scraper.telegram_telethon.TelegramTelethonScraper`) requires platform credentials to work correctly.
Telegram Telethon
"""""""""""""""""
@@ -57,6 +32,12 @@ The Telegram credentials can be configured by setting the following environment
If you do not already have a Telegram application, you can create one by following the instructions on `this page`_.
+To initialize a Telegram session, run the following script from the package root directory on the command line:
+
+.. code-block::
+
+    pipenv run python telethon_session_init.py
+
Documentation
-------------
@@ -86,11 +67,7 @@ To see the logging output from a test run, add the ``--capture=no`` flag to the
Examples
--------
-An example of a *cisticola* ingest file ``russian_telegram_ingest.py`` is included in the package root directory, showing how the list of channels to scrape is defined, and how the :py:mod:`~cisticola.scraper.base.ScraperController` and :py:mod:`~cisticola.transformer.base.Transformer` classes are used. To run the ingest script, run the following command from the package root directory:
-
-.. code-block::
-
- pipenv run python russian_telegram_ingest.py
+The script ``app.py``, included in the package root directory, shows how the list of channels to scrape is defined and how the :py:class:`~cisticola.scraper.base.ScraperController` and :py:class:`~cisticola.transformer.base.Transformer` classes are used.
.. _pipenv: https://pipenv.pypa.io/en/latest/
.. _Sphinx: https://www.sphinx-doc.org/en/master/
diff --git a/pytest.ini b/pytest.ini
index ae2a8b6..f3545f6 100644
--- a/pytest.ini
+++ b/pytest.ini
@@ -11,10 +11,6 @@ addopts =
--cov-report html:reports/coverage
--html='reports/tests.html'
--self-contained-html
-markers =
- profile: marks tests for only extracting channel metadata (deselect with '-m "not profile"')
- media: marks tests for archiving all media attachments (deselect with '-m "not media"')
- unarchived: marks tests for archiving all unarchived media attachments (deselect with '-m "not unarchived"')
filterwarnings =
ignore:the imp module is deprecated:DeprecationWarning
ignore:The localize method is no longer necessary, as this time zone supports the fold attribute