Migrate to uv, add tests, and fix location-independent paths

This commit is contained in:
seangreaves
2026-02-21 11:35:29 +00:00
parent acbb957c14
commit 60eed23cb5
27 changed files with 3593 additions and 208 deletions

3
.env.example Normal file

@@ -0,0 +1,3 @@
# Companies House API key.
# Get one at: https://developer.company-information.service.gov.uk/how-to-create-an-application
COMPANIES_HOUSE_API_KEY=your_api_key_here
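
Note on the `.env` format: the file above is plain `KEY=value` lines with `#` comments. The commit relies on python-dotenv's `load_dotenv()` to read it. The sketch below is only a rough stdlib illustration of the idea, assuming unquoted values; `parse_env_text` is a hypothetical helper and not part of sugartrail or python-dotenv.

```python
def parse_env_text(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments.

    Hypothetical helper for illustration only. python-dotenv (used by the
    commit) also handles quoting, `export` prefixes and interpolation.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' sign
            values[key.strip()] = value.strip()
    return values


example = """\
# Companies House API key.
COMPANIES_HOUSE_API_KEY=your_api_key_here
"""
print(parse_env_text(example))
```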

3
.gitignore vendored

@@ -58,3 +58,6 @@ notebooks/investigations
.ipynb_checkpoints
*.DS_Store
# Environment / secrets
.env

1
.python-version Normal file

@@ -0,0 +1 @@
3.10


@@ -1 +0,0 @@
web: voila --port=$PORT --no-browser --debug --show_tracebacks=True --Voila.ip=0.0.0.0 dashboard/Sugartrail.ipynb --VoilaConfiguration.file_whitelist="['.*']"


@@ -11,27 +11,69 @@ Sugartrail is a network analysis and visualisation tool developed to make it eas
## Requirements
You will require an API key from Companies House to get data. First you will need to create a live application to get an API key which you can do by following the [Companies House guide](https://developer.company-information.service.gov.uk/how-to-create-an-application).
You will need a Companies House API key. Create a live application to get one by following the [Companies House guide](https://developer.company-information.service.gov.uk/how-to-create-an-application).
## Installation
1. Make sure you have Conda installed
This project uses [uv](https://docs.astral.sh/uv/) for package management.
2. Download the tool's repository using the command:
1. Install uv if you don't have it:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
2. Clone the repository:
```bash
git clone https://github.com/ribenamaplesyrup/sugartrail.git
cd sugartrail
```
3. Navigate to the main directory and run:
3. Install dependencies (uv will manage the Python version automatically):
```bash
conda create -n candystore python=3.10
conda activate candystore
pip install -e .
uv sync
```
4. For a quickstart run `python -m voila --no-browser --debug dashboard/Sugartrail.ipynb --VoilaConfiguration.file_allowlist="['.*']"` and navigate to the url printed in your terminal where Voilà is running at (no-code). For a more detailed explanation of the tool's capabilities, run `jupyter notebook notebooks` and open either `quickstart.ipynb` or `001_getting_started.ipynb`.
## API Key Setup
Copy the example env file and add your key:
```bash
cp .env.example .env
```
Then open `.env` and replace `your_api_key_here` with your actual Companies House API key:
```
COMPANIES_HOUSE_API_KEY=your_api_key_here
```
The key is loaded automatically when you import the package. You can also set it at runtime in a notebook or script:
```python
import sugartrail
sugartrail.api.basic_auth.username = "your_api_key_here"
```
## Running
For a quickstart, launch the no-code dashboard:
```bash
uv run python -m voila --no-browser --debug dashboard/Sugartrail.ipynb --VoilaConfiguration.file_allowlist="['.*']"
```
Then navigate to the URL printed in your terminal.
For notebook tutorials:
```bash
uv run jupyter notebook notebooks
```
Open either `quickstart.ipynb` or `001_getting_started.ipynb`.
## Examples & Tutorials


@@ -1,9 +0,0 @@
notebook>=7.2
voila>=0.5
pandas>=2.2
ipywidgets>=8.1
tqdm
ratelimit
pyvis>=0.3
ipyleaflet>=0.19
regex


@@ -25,7 +25,15 @@
"source": [
"### Prerequisites\n",
"\n",
"Sugartrail uses the [Companies House Public Data API](https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/reference) to gather data on connected companies, persons and addresses. To access this API you will need a key which you can aquire by registering a [user account](https://developer.company-information.service.gov.uk/get-started/). Once you've aquired the key, insert it below as the string value of `api.basic_auth.username`:"
"Sugartrail uses the [Companies House Public Data API](https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/reference) to gather data on connected companies, persons and addresses. To access this API you will need a key which you can acquire by registering a [user account](https://developer.company-information.service.gov.uk/get-started/).\n",
"\n",
"The recommended way to set your key is to add it to a `.env` file in the project root:\n",
"\n",
"```\n",
"COMPANIES_HOUSE_API_KEY=your_key_here\n",
"```\n",
"\n",
"It will be loaded automatically when you import `sugartrail`. Alternatively you can set it manually in the cell below."
]
},
{
@@ -37,7 +45,19 @@
"source": [
"import sugartrail\n",
"from ipywidgets import VBox, HBox\n",
"import pandas as pd"
"import pandas as pd\n",
"from pathlib import Path\n",
"from IPython.display import Image, display\n",
"\n",
"def _find_project_root():\n",
" p = Path.cwd()\n",
" while p != p.parent:\n",
" if (p / 'pyproject.toml').exists():\n",
" return p\n",
" p = p.parent\n",
" return Path.cwd()\n",
"\n",
"_PROJECT_ROOT = _find_project_root()"
]
},
{
@@ -47,7 +67,10 @@
"metadata": {},
"outputs": [],
"source": [
"sugartrail.api.basic_auth.username = \"\""
"# API key is loaded automatically from .env (COMPANIES_HOUSE_API_KEY).\n",
"# To set it manually instead: sugartrail.api.basic_auth.username = \"your_key\"\n",
"if not sugartrail.api.basic_auth.username:\n",
" print(\"No API key found — add COMPANIES_HOUSE_API_KEY to your .env file\")"
]
},
{
@@ -100,11 +123,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "f73b17d8",
"metadata": {},
"outputs": [],
"source": [
"![title](../assets/images/spy.png)"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / 'spy.png')))"
]
},
{
@@ -116,11 +141,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "e21f3c98",
"metadata": {},
"outputs": [],
"source": [
"![title](../assets/images/scrooge.png)"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / 'scrooge.png')))"
]
},
{
@@ -493,9 +520,7 @@
"cell_type": "code",
"execution_count": null,
"id": "7256c5f9",
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [],
"source": [
"map_data, path_table = sugartrail.mapvis.build_map(network) \n",
@@ -520,11 +545,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "f6674e52",
"metadata": {},
"outputs": [],
"source": [
"<img src=\"../assets/images/kingdom_table.png\" alt=\"Drawing\" style=\"width: 700px;\"/>\n"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / 'kingdom_table.png'), width=700))"
]
},
{
@@ -598,7 +625,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.18"
}
},
"nbformat": 4,

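Note on the notebook helper: `_find_project_root` walks up from the current working directory until it finds `pyproject.toml`, so image paths resolve wherever Jupyter was launched. A standalone sketch of the same idea follows. The `start` and `marker` parameters are additions for illustration (the notebook version always starts at `Path.cwd()`), and unlike the notebook loop this variant also checks the filesystem root itself.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None, marker: str = "pyproject.toml") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found.

    Falls back to the starting directory when no marker exists, mirroring
    the fallback in the notebook helper.
    """
    origin = (start or Path.cwd()).resolve()
    current = origin
    while True:
        if (current / marker).exists():
            return current
        if current == current.parent:  # reached the filesystem root
            return origin
        current = current.parent


# Demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "pyproject.toml").touch()
    nested = root / "notebooks" / "investigations"
    nested.mkdir(parents=True)
    print(find_project_root(nested) == root)
```

The silent fallback keeps the notebook running outside a checkout, at the cost of a less obvious failure when an image is missing.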

@@ -87,7 +87,19 @@
"source": [
"import sugartrail\n",
"import pandas as pd\n",
"from ipywidgets import HTML, Widget, Layout, Output, VBox, HBox, Textarea"
"from ipywidgets import HTML, Widget, Layout, Output, VBox, HBox, Textarea\n",
"from pathlib import Path\n",
"from IPython.display import Image, display\n",
"\n",
"def _find_project_root():\n",
" p = Path.cwd()\n",
" while p != p.parent:\n",
" if (p / 'pyproject.toml').exists():\n",
" return p\n",
" p = p.parent\n",
" return Path.cwd()\n",
"\n",
"_PROJECT_ROOT = _find_project_root()"
]
},
{
@@ -105,7 +117,10 @@
"metadata": {},
"outputs": [],
"source": [
"sugartrail.api.basic_auth.username = \"\""
"# API key is loaded automatically from .env (COMPANIES_HOUSE_API_KEY).\n",
"# To set it manually instead: sugartrail.api.basic_auth.username = \"your_key\"\n",
"if not sugartrail.api.basic_auth.username:\n",
" print(\"No API key found — add COMPANIES_HOUSE_API_KEY to your .env file\")"
]
},
{
@@ -117,14 +132,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "a8e5dbe1",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/western_crown.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 470-482 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / 'western_crown.png')))"
]
},
{
@@ -200,9 +214,7 @@
"cell_type": "code",
"execution_count": null,
"id": "7bdde00f",
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [],
"source": [
"# generate map (may take half a minute!)\n",
@@ -233,14 +245,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "439ba049",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/537.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 537 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '537.png')))"
]
},
{
@@ -258,14 +269,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "67b89126",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/524.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 524 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '524.png')))"
]
},
{
@@ -284,14 +294,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "004ff136",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/470.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\">470-482 Oxford Street</figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '470.png')))"
]
},
{
@@ -304,14 +313,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "2143ce03",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/447.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\">447 Oxford Street</figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '447.png')))"
]
},
{
@@ -324,14 +332,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "1b74fcca",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/407.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 407-409 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '407.png')))"
]
},
{
@@ -344,14 +351,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "a3a6e274",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/269.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 267-269 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '269.png')))"
]
},
{
@@ -364,14 +370,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "54301d43",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/263.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 263-265 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '263.png')))"
]
},
{
@@ -393,14 +398,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "a5883ee7",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/240.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 240-242 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '240.png')))"
]
},
{
@@ -419,14 +423,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "af81028b",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/158.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 158 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '158.png')))"
]
},
{
@@ -440,14 +443,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "6c3bf19e",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/146.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 146-148 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '146.png')))"
]
},
{
@@ -471,14 +473,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "fa1727b0",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/142.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 142 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '142.png')))"
]
},
{
@@ -492,14 +493,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "047afd96",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/41.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 41 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '41.png')))"
]
},
{
@@ -512,14 +512,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "5bd0e6f3",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/37.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 37-39 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '37.png')))"
]
},
{
@@ -549,14 +548,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "a3f9a2d7",
"metadata": {},
"outputs": [],
"source": [
"<figure>\n",
"<img src=\"../assets/images/4.png\" style=\"width:100%\">\n",
"<figcaption align = \"center\"> 4 Oxford Street </figcaption>\n",
"</figure>"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / '4.png')))"
]
},
{
@@ -587,6 +585,14 @@
"- analyse hotspots for registering new companies over time to see if there are emerging popular locations, in other words where is the new Oxford Street?\n",
"- analyse other types of companies connected to souvenir and candy shops (money exchanges, security firms etc.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "878e2d1c-4f19-4f1d-9972-b152b006a22b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -605,7 +611,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.15"
"version": "3.10.18"
}
},
"nbformat": 4,


@@ -23,10 +23,22 @@
"metadata": {},
"outputs": [],
"source": [
"import sugartrail \n",
"import sugartrail\n",
"import pandas as pd\n",
"import zipfile\n",
"import os"
"import os\n",
"from pathlib import Path\n",
"from IPython.display import Image, display\n",
"\n",
"def _find_project_root():\n",
" p = Path.cwd()\n",
" while p != p.parent:\n",
" if (p / 'pyproject.toml').exists():\n",
" return p\n",
" p = p.parent\n",
" return Path.cwd()\n",
"\n",
"_PROJECT_ROOT = _find_project_root()"
]
},
{
@@ -36,7 +48,10 @@
"metadata": {},
"outputs": [],
"source": [
"sugartrail.api.basic_auth.username = \"\""
"# API key is loaded automatically from .env (COMPANIES_HOUSE_API_KEY).\n",
"# To set it manually instead: sugartrail.api.basic_auth.username = \"your_key\"\n",
"if not sugartrail.api.basic_auth.username:\n",
" print(\"No API key found — add COMPANIES_HOUSE_API_KEY to your .env file\")"
]
},
{
@@ -161,11 +176,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "e8644d6b",
"metadata": {},
"outputs": [],
"source": [
"![title](../assets/images/regent_storefront.jpeg)"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / 'regent_storefront.jpeg')))"
]
},
{
@@ -177,11 +194,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "11b08c79",
"metadata": {},
"outputs": [],
"source": [
"![title](../assets/images/exclusive.png)"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / 'exclusive.png')))"
]
},
{
@@ -193,11 +212,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "be5e4352",
"metadata": {},
"outputs": [],
"source": [
"![title](../assets/images/review.png)"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / 'review.png')))"
]
},
{
@@ -328,11 +349,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "c307994f",
"metadata": {},
"outputs": [],
"source": [
"![title](../assets/images/regent.png)"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / 'regent.png')))"
]
},
{
@@ -383,11 +406,13 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "03b64f03",
"metadata": {},
"outputs": [],
"source": [
"![title](../assets/images/shelton.png)"
"display(Image(str(_PROJECT_ROOT / 'assets' / 'images' / 'shelton.png')))"
]
},
{
@@ -512,14 +537,6 @@
"source": [
"len(shelton_street_network.company_ids)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9207a89",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -538,7 +555,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.18"
}
},
"nbformat": 4,


@@ -36,7 +36,10 @@
"metadata": {},
"outputs": [],
"source": [
"sugartrail.api.basic_auth.username = \"\""
"# API key is loaded automatically from .env (COMPANIES_HOUSE_API_KEY).\n",
"# To set it manually instead: sugartrail.api.basic_auth.username = \"your_key\"\n",
"if not sugartrail.api.basic_auth.username:\n",
" print(\"No API key found — add COMPANIES_HOUSE_API_KEY to your .env file\")"
]
},
{
@@ -180,14 +183,6 @@
"source": [
"Reading both paths tells us how Zahawi & Zahawi connect to Gorgeous Connections. Zahawi & Zahawi has Nadhim Zahawi as an officer who has YOUGOV PLC as an appointment which has Benjamin Elliot as an officer who is also an officer of Gorgeous Services Limited."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1cc6c344",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -206,7 +201,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.18"
}
},
"nbformat": 4,


@@ -26,7 +26,10 @@
"metadata": {},
"outputs": [],
"source": [
"sugartrail.api.basic_auth.username = \"\""
"# API key is loaded automatically from .env (COMPANIES_HOUSE_API_KEY).\n",
"# To set it manually instead: sugartrail.api.basic_auth.username = \"your_key\"\n",
"if not sugartrail.api.basic_auth.username:\n",
" print(\"No API key found — add COMPANIES_HOUSE_API_KEY to your .env file\")"
]
},
{
@@ -137,6 +140,14 @@
"source": [
"sugartrail.graphvis.visualise_connections(s_path_network, f'{sugartrail.const.data_path}visualisations')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92a8b69f-20b5-4c48-a6cd-778dbd68975d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -155,7 +166,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.18"
}
},
"nbformat": 4,


@@ -44,9 +44,7 @@
"id": "89b0082a",
"metadata": {},
"outputs": [],
"source": [
"sugartrail.api.basic_auth.username = \"\""
]
"source": "# API key is loaded automatically from .env (COMPANIES_HOUSE_API_KEY).\n# To set it manually instead: sugartrail.api.basic_auth.username = \"your_key\"\nif not sugartrail.api.basic_auth.username:\n print(\"No API key found — add COMPANIES_HOUSE_API_KEY to your .env file\")"
},
{
"cell_type": "markdown",
@@ -166,4 +164,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}


@@ -1,3 +1,33 @@
[project]
name = "sugartrail"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = [
"notebook>=7.2",
"voila>=0.5",
"pandas>=2.2",
"ipywidgets>=8.1",
"tqdm",
"ratelimit",
"pyvis>=0.3",
"ipyleaflet>=0.19",
"regex",
"requests",
"python-dotenv",
]
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
[tool.setuptools.packages.find]
exclude = ["notebooks*", "dashboard*", "assets*", "test*"]
[dependency-groups]
dev = ["pytest"]
[tool.pytest.ini_options]
pythonpath = ["."]
markers = [
"integration: tests that hit the real Companies House API (require COMPANIES_HOUSE_API_KEY)",
]


@@ -1,5 +0,0 @@
[build]
builder = "nixpacks"
buildCommand = """
pip install -e .
"""


@@ -1 +0,0 @@
python-3.10.4


@@ -1,11 +0,0 @@
from setuptools import setup, find_packages
with open("config/requirements.txt") as requirement_file:
requirements = requirement_file.read().split()
setup(
name="sugartrail",
version="1.0.0",
install_requires=requirements,
packages=find_packages(exclude=["notebooks", "dashboard", "assets"]),
)


@@ -3,12 +3,17 @@ import time
import os
import functools
from ratelimit import limits, sleep_and_retry
from dotenv import load_dotenv
load_dotenv()
access_token = ""
username = ""
password = ""
size = "5000"
basic_auth = requests.auth.HTTPBasicAuth(username, password)
# Reads COMPANIES_HOUSE_API_KEY from environment / .env file automatically.
# You can also set it at runtime: sugartrail.api.basic_auth.username = "your_key"
basic_auth = requests.auth.HTTPBasicAuth(
os.environ.get("COMPANIES_HOUSE_API_KEY", ""), ""
)
def auth(func):
"""Checks if user has set API Key."""

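Note on the auth change: the API key is passed as the HTTP Basic username with an empty password, as in the `HTTPBasicAuth(key, "")` call above. The sketch below shows roughly what that produces on the wire, using only the standard library. `basic_auth_header` is a hypothetical helper for illustration and is not how sugartrail or requests is implemented internally.

```python
import base64
import os


def basic_auth_header(api_key: str, password: str = "") -> str:
    """Build an HTTP Basic Authorization header value.

    The key is the username and the password is left empty, matching the
    HTTPBasicAuth(key, "") call in the diff.
    """
    token = base64.b64encode(f"{api_key}:{password}".encode("latin-1"))
    return "Basic " + token.decode("ascii")


# Same lookup as the diff: an empty string when the variable is unset.
api_key = os.environ.get("COMPANIES_HOUSE_API_KEY", "")
if api_key:
    print("API key configured")
else:
    print("No API key found")
```

Because the default is an empty string, a truthiness check on the username is enough to detect a missing key, which is what the notebook cells in this commit do.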

@@ -1,5 +1,7 @@
import sugartrail
from pathlib import Path
data_path = "../assets/"
networks_path = "../assets/networks/"
vis_path = "../assets/visualisations/"
_PROJECT_ROOT = Path(__file__).resolve().parent.parent
data_path = str(_PROJECT_ROOT / 'assets') + '/'
networks_path = str(_PROJECT_ROOT / 'assets' / 'networks') + '/'
vis_path = str(_PROJECT_ROOT / 'assets' / 'visualisations') + '/'
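
Note on the trailing `'/'`: the new constants stay plain strings ending in a slash, presumably because existing callers concatenate onto them, for example `f'{sugartrail.const.data_path}visualisations'` in one of the notebooks. A small sketch of that contract follows. `build_asset_paths` and the `/srv/sugartrail` root are made up for illustration.

```python
from pathlib import Path


def build_asset_paths(project_root: Path) -> dict:
    """Return asset directories as strings that end with a trailing '/'."""
    assets = project_root / "assets"
    return {
        "data_path": str(assets) + "/",
        "networks_path": str(assets / "networks") + "/",
        "vis_path": str(assets / "visualisations") + "/",
    }


paths = build_asset_paths(Path("/srv/sugartrail"))  # hypothetical root
# Callers concatenate directly, so the trailing slash matters.
print(f"{paths['data_path']}visualisations")
```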


@@ -1,5 +1,7 @@
import networkx as nx
from pathlib import Path
from pyvis.network import Network
from IPython.display import HTML
def visualise_connections(network:list, viz_path):
"""Generates a pyvis force-directed graph visualisation from a list of nodes
@@ -57,5 +59,9 @@ def visualise_connections(network:list, viz_path):
}
"""
nt.set_options(physics_options)
# Display
return nt.show(f'{viz_path}/graph.html')
# Save to file and display inline (avoids IFrame browser URL issues)
html_path = Path(viz_path) / 'graph.html'
html_path.parent.mkdir(parents=True, exist_ok=True)
html_content = nt.generate_html()
html_path.write_text(html_content)
return HTML(html_content)
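
Note on the save step: the new code creates the output directory before writing, so a fresh checkout without `assets/visualisations` no longer fails. A stdlib-only sketch of that step is below. `save_html` is a hypothetical helper, and the explicit UTF-8 encoding is an assumption of this sketch rather than something taken from the commit.

```python
import tempfile
from pathlib import Path


def save_html(html_content: str, viz_path: str, filename: str = "graph.html") -> Path:
    """Write HTML to viz_path/filename, creating missing directories first."""
    html_path = Path(viz_path) / filename
    html_path.parent.mkdir(parents=True, exist_ok=True)
    # Explicit UTF-8 is an assumption made for this sketch.
    html_path.write_text(html_content, encoding="utf-8")
    return html_path


with tempfile.TemporaryDirectory() as tmp:
    target = save_html("<p>graph placeholder</p>", str(Path(tmp) / "visualisations"))
    print(target.name, target.exists())
```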


@@ -39,9 +39,13 @@ def process_address_changes(address_changes):
"""Attempt retrieval of 'new_address' value if Companies House record is
incomplete."""
for i in reversed(range(1,len(address_changes['items']))):
if 'new_address' not in address_changes['items'][i]['description_values'].keys():
if 'old_address' in address_changes['items'][i-1]['description_values'].keys():
address_changes['items'][i]['description_values']['new_address'] = address_changes['items'][i-1]['description_values']['old_address']
item = address_changes['items'][i]
prev_item = address_changes['items'][i-1]
if 'description_values' not in item:
continue
if 'new_address' not in item['description_values']:
if 'description_values' in prev_item and 'old_address' in prev_item['description_values']:
item['description_values']['new_address'] = prev_item['description_values']['old_address']
return address_changes
def build_address_history(company_id):
@@ -74,7 +78,7 @@ def build_address_history(company_id):
entry["lat"] = ""
entry["lon"] = ""
entry["company_number"] = str(company_id)
if 'old_address' in change['description_values']:
if 'description_values' in change and 'old_address' in change['description_values']:
entry["address"] = change['description_values']['old_address']
else:
entry["address"] = ""
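
Note on the guard clauses: some filing-history items apparently arrive without a `description_values` key, which previously raised `KeyError`. The sketch below reproduces the new backfilling logic on made-up records so the behaviour can be checked in isolation. `backfill_new_addresses` is an illustrative name and the sample data is invented.

```python
def backfill_new_addresses(address_changes: dict) -> dict:
    """Fill a missing 'new_address' from the previous item's 'old_address'.

    Items without 'description_values' are skipped instead of raising KeyError.
    """
    items = address_changes["items"]
    for i in reversed(range(1, len(items))):
        item, prev_item = items[i], items[i - 1]
        if "description_values" not in item:
            continue
        values = item["description_values"]
        prev_values = prev_item.get("description_values", {})
        if "new_address" not in values and "old_address" in prev_values:
            values["new_address"] = prev_values["old_address"]
    return address_changes


# Made-up records for illustration only.
sample = {
    "items": [
        {"description_values": {"old_address": "1 Example Road"}},
        {"description_values": {"old_address": "2 Sample Lane"}},
        {"description": "record with no description_values"},
    ]
}
result = backfill_new_addresses(sample)
print(result["items"][1]["description_values"])
```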

38
test/conftest.py Normal file
View File

@@ -0,0 +1,38 @@
import pytest
import sugartrail.api


def pytest_addoption(parser):
    parser.addoption(
        "--integration",
        action="store_true",
        default=False,
        help="Run integration tests against the real Companies House API",
    )


def pytest_collection_modifyitems(config, items):
    if not config.getoption("--integration"):
        skip = pytest.mark.skip(reason="pass --integration to run API tests")
        for item in items:
            if "integration" in item.keywords:
                item.add_marker(skip)


@pytest.fixture
def no_auth():
    """Ensure basic_auth has no username for tests that check unauthenticated behaviour.
    Restores the original value after the test."""
    original = sugartrail.api.basic_auth.username
    sugartrail.api.basic_auth.username = ""
    yield
    sugartrail.api.basic_auth.username = original


@pytest.fixture
def mock_auth():
    """Set a fake API key so auth checks pass without real credentials."""
    original = sugartrail.api.basic_auth.username
    sugartrail.api.basic_auth.username = "fake_api_key"
    yield
    sugartrail.api.basic_auth.username = original
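
Note on the fixtures: `no_auth` and `mock_auth` both save the current username, override it, and restore it after the test. The same save/restore pattern can be sketched without pytest as a context manager. `override_username` and the `SimpleNamespace` stand-in are hypothetical and only illustrate the pattern.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_username(auth, value):
    """Temporarily set auth.username, restoring the original afterwards."""
    original = auth.username
    auth.username = value
    try:
        yield auth
    finally:
        # Runs even if the body raises, so state never leaks between tests.
        auth.username = original


# Stand-in for sugartrail.api.basic_auth; the value is a placeholder.
fake_auth = SimpleNamespace(username="key_from_env")
with override_username(fake_auth, ""):
    print(repr(fake_auth.username))  # value inside the block
print(repr(fake_auth.username))  # value after the block
```

This matters here because a developer with a real key in `.env` would otherwise see the "Authentication required" tests fail.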


@@ -0,0 +1,114 @@
"""Integration tests for the Companies House API.

These tests make real network requests and require a valid API key in .env.
Run them with:

    uv run pytest test/test_api_integration.py -v --integration

Without the --integration flag they are skipped (see test/conftest.py), so
they are excluded from the default test run.
"""
import pytest
import sugartrail.api as api

# ---------------------------------------------------------------------------
# Stable test fixtures from the example notebooks
# ---------------------------------------------------------------------------

COMPANY_ID = "11951034"  # DOMAIN FOUNDATION (quickstart example)
COMPANY_ID_2 = "11004735"  # KINGDOM OF SWEETS LTD (getting_started example)
OFFICER_ID = "W806gf93kuLBqHdMWnoaBuG08m8"  # Alexis MARCO


@pytest.fixture(autouse=True)
def require_api_key():
    """Skip every test in this file if no API key is configured."""
    if not api.basic_auth.username:
        pytest.skip("No API key found — set COMPANIES_HOUSE_API_KEY in .env")


# ---------------------------------------------------------------------------
# Auth
# ---------------------------------------------------------------------------

@pytest.mark.integration
def test_api_auth_succeeds():
    assert api.test() is True


# ---------------------------------------------------------------------------
# Company endpoints
# ---------------------------------------------------------------------------

@pytest.mark.integration
def test_get_company_returns_expected_name():
    result = api.get_company(COMPANY_ID)
    assert result is not None
    assert result["company_number"] == COMPANY_ID
    assert "company_name" in result


@pytest.mark.integration
def test_get_company_officers_returns_list():
    result = api.get_company_officers(COMPANY_ID)
    assert result is not None
    assert "items" in result
    assert isinstance(result["items"], list)
    assert len(result["items"]) > 0


@pytest.mark.integration
def test_get_psc_returns_result():
    result = api.get_psc(COMPANY_ID)
    assert result is not None
    assert "items" in result


@pytest.mark.integration
def test_get_address_changes_returns_result():
    result = api.get_address_changes(COMPANY_ID)
    assert result is not None
    assert "items" in result


@pytest.mark.integration
def test_get_company_unknown_id_returns_none():
    result = api.get_company("00000000")
    assert result is None


# ---------------------------------------------------------------------------
# Officer endpoints
# ---------------------------------------------------------------------------

@pytest.mark.integration
def test_get_appointments_returns_list():
    result = api.get_appointments(OFFICER_ID)
    assert result is not None
    assert "items" in result
    assert isinstance(result["items"], list)
    assert len(result["items"]) > 0


@pytest.mark.integration
def test_get_correspondance_address_returns_result():
    result = api.get_correspondance_address(OFFICER_ID)
    assert result is not None
    assert "items" in result


@pytest.mark.integration
def test_get_duplicate_officers_returns_list_or_none():
    result = api.get_duplicate_officers(OFFICER_ID)
    # May return None or a list depending on whether duplicates exist
    assert result is None or isinstance(result, list)


# ---------------------------------------------------------------------------
# Address endpoints
# ---------------------------------------------------------------------------

@pytest.mark.integration
def test_get_companies_at_address_returns_result():
    # Use a postcode known to have registered companies
    result = api.get_companies_at_address("EC1A 1BB")
    assert result is not None
    assert "items" in result


@pytest.mark.integration
def test_get_officers_at_address_returns_list():
    result = api.get_officers_at_address("EC1A 1BB")
    # Returns a filtered list (may be empty for some addresses)
    assert result is None or isinstance(result, list)


@@ -1,61 +1,142 @@
import sugartrail
import pytest
import json
from pathlib import Path
import sugartrail
# test 1: network initialised without auth and without arguments:
# Use an absolute path so tests pass regardless of working directory.
# quickstart_a.json is a pre-generated file committed to the repo.
NETWORK_FILE = str(Path(__file__).parent.parent / 'assets/networks/quickstart_a.json')
def test_init_without_arguments(capsys):
NETWORK_FILE = str(Path(__file__).parent.parent / 'assets/networks/quickstart_a.json')
# ---------------------------------------------------------------------------
# Initialisation without arguments
# ---------------------------------------------------------------------------
def test_init_without_arguments(no_auth, capsys):
sugartrail.base.Network()
captured = capsys.readouterr()
assert captured.out == 'No input provided. Please provide either officer_id, company_id, address or file as input.\n'
# test 2: network initialised without auth and with arguments prints auth requirement:
def test_init_officer_without_auth(capsys):
sugartrail.base.Network(officer_id = '_')
# ---------------------------------------------------------------------------
# Initialisation without auth prints requirement message
# ---------------------------------------------------------------------------
def test_init_officer_without_auth(no_auth, capsys):
sugartrail.base.Network(officer_id='_')
captured = capsys.readouterr()
assert captured.out == 'Authentication required\n'
def test_init_company_without_auth(capsys):
sugartrail.base.Network(company_id = '_')
def test_init_company_without_auth(no_auth, capsys):
sugartrail.base.Network(company_id='_')
captured = capsys.readouterr()
assert captured.out == 'Authentication required\n'
def test_init_address_without_auth(capsys):
sugartrail.base.Network(address = '_')
def test_init_address_without_auth(no_auth, capsys):
sugartrail.base.Network(address='_')
captured = capsys.readouterr()
assert captured.out == 'Authentication required\n'
# test 3: network initialised without auth and with arguments remains stateless:
def test_empty_officer_without_auth(capsys):
network = sugartrail.base.Network(officer_id = '_')
assert network._officer_id == None
# ---------------------------------------------------------------------------
# State is unchanged when auth fails
# ---------------------------------------------------------------------------
def test_empty_company_without_auth(capsys):
network = sugartrail.base.Network(company_id = '_')
assert network._company_id == None
def test_empty_officer_without_auth(no_auth, capsys):
network = sugartrail.base.Network(officer_id='_')
assert network._officer_id is None
def test_empty_address_without_auth(capsys):
network = sugartrail.base.Network(address = '_')
assert network._address == None
def test_empty_company_without_auth(no_auth, capsys):
network = sugartrail.base.Network(company_id='_')
assert network._company_id is None
# test 4: network initialised with 'file' arg without auth loads network:
def test_empty_address_without_auth(no_auth, capsys):
network = sugartrail.base.Network(address='_')
assert network._address is None
# ---------------------------------------------------------------------------
# Loading a network from file (no auth required)
# ---------------------------------------------------------------------------
def test_file_init_without_auth():
network = sugartrail.base.Network(file ='./assets/networks/domain_corp_network.json')
with open('./assets/networks/domain_corp_network.json') as f:
network = sugartrail.base.Network(file=NETWORK_FILE)
with open(NETWORK_FILE) as f:
network_json = json.load(f)
for key in network.__dict__.keys():
if key not in sugartrail.base.Network._unserialisable_attributes:
assert network.__dict__[key] == network_json[key]
# test 5: network loads network from file without auth:
def test_file_load_without_auth():
def test_file_load_without_auth(capsys):
network = sugartrail.base.Network()
network.load('./assets/networks/domain_corp_network.json')
with open('./assets/networks/domain_corp_network.json') as f:
capsys.readouterr() # discard "No input provided" message
network.load(NETWORK_FILE)
with open(NETWORK_FILE) as f:
network_json = json.load(f)
for key in network.__dict__.keys():
if key not in sugartrail.base.Network._unserialisable_attributes:
assert network.__dict__[key] == network_json[key]
# ---------------------------------------------------------------------------
# find_path: pure graph traversal, no API calls needed
# ---------------------------------------------------------------------------


@pytest.fixture
def two_node_network(no_auth, capsys):
    """Network with a seed company connected to one officer."""
    network = sugartrail.base.Network()
    capsys.readouterr()
    network.graph = {
        'CO001': {
            'title': 'Test Corp',
            'depth': 0,
            'node_type': 'Company',
            'arcs': [],
        },
        'OFF001': {
            'title': 'John Smith',
            'depth': 1,
            'node_type': 'Person',
            'arcs': [{'arc_type': 'Officer', 'start_node': 'CO001'}],
        },
    }
    network.n = 1
    return network


def test_find_path_from_seed_returns_single_node(two_node_network):
    path = two_node_network.find_path('CO001')
    assert len(path) == 1
    assert path[0]['id'] == 'CO001'
    assert path[0]['node_type'] == 'Company'


def test_find_path_to_connected_node_includes_both(two_node_network):
    path = two_node_network.find_path('OFF001')
    ids = [item['id'] for item in path]
    assert 'CO001' in ids
    assert 'OFF001' in ids


def test_find_path_is_ordered_from_seed(two_node_network):
    path = two_node_network.find_path('OFF001')
    # Path should start at depth 0 (the seed)
    assert path[0]['depth'] == 0


# ---------------------------------------------------------------------------
# Graph property views
# ---------------------------------------------------------------------------


def test_company_ids_property(two_node_network):
    companies = two_node_network.company_ids
    assert len(companies) == 1
    assert companies[0]['company_id'] == 'CO001'


def test_officer_ids_property(two_node_network):
    officers = two_node_network.officer_ids
    assert len(officers) == 1
    assert officers[0]['officer_id'] == 'OFF001'
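
The `find_path` tests above only pin down the observable behaviour: the path starts at the depth-0 seed and ends at the requested node. As a rough illustration of a traversal that would satisfy them, here is a minimal sketch. `trace_to_seed` is a hypothetical helper, not sugartrail's `find_path`; it assumes each non-seed node records its parent in `arcs[...]['start_node']` and it follows only the first arc.

```python
def trace_to_seed(graph, node_id):
    """Walk 'start_node' links back to the seed, then return the path seed-first."""
    path = []
    seen = set()  # guards against cycles in malformed graphs
    current = node_id
    while current is not None and current not in seen:
        seen.add(current)
        node = graph[current]
        path.append({
            'id': current,
            'depth': node['depth'],
            'node_type': node['node_type'],
        })
        # Follow the first incoming arc only; a seed node has no arcs.
        current = node['arcs'][0]['start_node'] if node['arcs'] else None
    path.reverse()
    return path


# Same shape as the two_node_network fixture's graph.
graph = {
    'CO001': {'title': 'Test Corp', 'depth': 0, 'node_type': 'Company', 'arcs': []},
    'OFF001': {
        'title': 'John Smith',
        'depth': 1,
        'node_type': 'Person',
        'arcs': [{'arc_type': 'Officer', 'start_node': 'CO001'}],
    },
}
path = trace_to_seed(graph, 'OFF001')
```

The real method may follow every arc and attach link metadata; the sketch only shows why no API calls are needed for this kind of test.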

266
test/test_hop.py Normal file

@@ -0,0 +1,266 @@
"""Tests for hop logic using mocked API responses — no real network calls."""
import pytest
from unittest.mock import patch

import sugartrail
import sugartrail.hop


# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------


@pytest.fixture
def empty_network(no_auth, capsys):
    """A bare Network with no seed, safe to build upon in tests."""
    network = sugartrail.base.Network()
    capsys.readouterr()  # discard the "No input provided" message
    return network


@pytest.fixture
def company_network(empty_network):
    """Network seeded with a single company node at depth 0."""
    empty_network.graph = {
        'CO001': {
            'depth': 0,
            'title': 'Test Corp',
            'node_type': 'Company',
            'arcs': [],
        }
    }
    empty_network._company_id = 'CO001'
    empty_network.n = 0
    return empty_network


@pytest.fixture
def officer_network(empty_network):
    """Network seeded with a single officer node at depth 0."""
    empty_network.graph = {
        'OFF001': {
            'depth': 0,
            'title': 'John Smith',
            'node_type': 'Person',
            'arcs': [],
        }
    }
    empty_network._officer_id = 'OFF001'
    empty_network.n = 0
    return empty_network


@pytest.fixture
def address_network(empty_network):
    """Network seeded with a single address node at depth 0."""
    addr = '1 High Street London EC1A 1BB'
    empty_network.graph = {
        addr: {
            'depth': 0,
            'title': addr,
            'node_type': 'Address',
            'arcs': [],
        }
    }
    empty_network._address = addr
    empty_network.n = 0
    return empty_network
# ---------------------------------------------------------------------------
# search_company_id
# ---------------------------------------------------------------------------

MOCK_COMPANY_OFFICERS = {
    'items': [
        {
            'name': 'SMITH, JOHN',
            'links': {'officer': {'appointments': '/officers/OFF001/appointments'}},
        }
    ]
}

MOCK_APPOINTMENTS = {
    'items': [{'name': 'JOHN SMITH'}]
}


def test_search_company_adds_officer(company_network):
    hop = sugartrail.hop.Hop()
    hop.get_company_address_history = False
    hop.get_psc_correspondance_address = False
    with patch('sugartrail.api.get_company_officers', return_value=MOCK_COMPANY_OFFICERS), \
         patch('sugartrail.api.get_appointments', return_value=MOCK_APPOINTMENTS):
        hop.search_company_id(company_network, 'CO001')
    assert 'OFF001' in company_network.graph
    node = company_network.graph['OFF001']
    assert node['node_type'] == 'Person'
    assert node['depth'] == 1
    assert any(arc['arc_type'] == 'Officer' for arc in node['arcs'])


def test_search_company_does_not_duplicate_officer(company_network):
    """Calling search_company_id twice should not duplicate arcs."""
    hop = sugartrail.hop.Hop()
    hop.get_company_address_history = False
    hop.get_psc_correspondance_address = False
    with patch('sugartrail.api.get_company_officers', return_value=MOCK_COMPANY_OFFICERS), \
         patch('sugartrail.api.get_appointments', return_value=MOCK_APPOINTMENTS):
        hop.search_company_id(company_network, 'CO001')
        hop.search_company_id(company_network, 'CO001')
    assert len(company_network.graph['OFF001']['arcs']) == 1


def test_search_company_no_officers(company_network):
    hop = sugartrail.hop.Hop()
    hop.get_company_address_history = False
    hop.get_psc_correspondance_address = False
    with patch('sugartrail.api.get_company_officers', return_value=None):
        hop.search_company_id(company_network, 'CO001')
    # Only the seed company should be in the graph
    assert list(company_network.graph.keys()) == ['CO001']


def test_search_company_respects_maxsize(company_network):
    """search_company_id still adds officers; maxsize is enforced later in search_officer_id."""
    large_response = {'items': [MOCK_COMPANY_OFFICERS['items'][0]] * 100}
    hop = sugartrail.hop.Hop()
    hop.get_company_address_history = False
    hop.get_psc_correspondance_address = False
    hop.officer_appointments_maxsize = 5  # not directly used here, but via search_officer_id
    with patch('sugartrail.api.get_company_officers', return_value=large_response), \
         patch('sugartrail.api.get_appointments', return_value=MOCK_APPOINTMENTS):
        hop.search_company_id(company_network, 'CO001')
    # Officers are still added by search_company_id (maxsize applies in search_officer_id)
    assert 'OFF001' in company_network.graph
# ---------------------------------------------------------------------------
# search_officer_id
# ---------------------------------------------------------------------------

MOCK_OFFICER_APPOINTMENTS = {
    'items': [
        {'appointed_to': {'company_number': 'CO002', 'company_name': 'Another Corp'}}
    ]
}

MOCK_CORRESPONDENCE_ADDRESS = {
    'items': [
        {'address': {'address_line_1': '1 High Street', 'locality': 'London', 'postal_code': 'EC1A 1BB'}}
    ]
}


def test_search_officer_adds_company(officer_network):
    hop = sugartrail.hop.Hop()
    hop.get_officer_correspondance_address = False
    hop.get_officer_duplicates = False
    with patch('sugartrail.api.get_appointments', return_value=MOCK_OFFICER_APPOINTMENTS):
        hop.search_officer_id(officer_network, 'OFF001')
    assert 'CO002' in officer_network.graph
    node = officer_network.graph['CO002']
    assert node['node_type'] == 'Company'
    assert node['depth'] == 1
    assert any(arc['arc_type'] == 'Appointment' for arc in node['arcs'])


def test_search_officer_adds_correspondence_address(officer_network):
    hop = sugartrail.hop.Hop()
    hop.get_officer_duplicates = False
    with patch('sugartrail.api.get_appointments', return_value=MOCK_OFFICER_APPOINTMENTS), \
         patch('sugartrail.api.get_correspondance_address', return_value=MOCK_CORRESPONDENCE_ADDRESS):
        hop.search_officer_id(officer_network, 'OFF001')
    address_nodes = [k for k, v in officer_network.graph.items() if v['node_type'] == 'Address']
    assert len(address_nodes) == 1
    assert any(arc['arc_type'] == 'Officer Corresponance Address'
               for arc in officer_network.graph[address_nodes[0]]['arcs'])


def test_search_officer_skips_when_appointments_exceed_maxsize(officer_network):
    large_appts = {'items': [MOCK_OFFICER_APPOINTMENTS['items'][0]] * 100}
    hop = sugartrail.hop.Hop()
    hop.get_officer_correspondance_address = False
    hop.get_officer_duplicates = False
    hop.officer_appointments_maxsize = 5
    with patch('sugartrail.api.get_appointments', return_value=large_appts):
        hop.search_officer_id(officer_network, 'OFF001')
    # No companies added; entity recorded as oversized
    assert 'CO002' not in officer_network.graph
    assert len(officer_network.maxsize_entities) == 1
    assert officer_network.maxsize_entities[0]['node'] == 'OFF001'
# ---------------------------------------------------------------------------
# search_address
# ---------------------------------------------------------------------------

MOCK_COMPANIES_AT_ADDRESS = {
    'items': [
        {'company_number': 'CO003', 'company_name': 'Street Corp'}
    ]
}

MOCK_OFFICERS_AT_ADDRESS = [
    {
        'title': 'Jane Doe',
        'links': {'self': '/officers/OFF002'},
    }
]


def test_search_address_adds_company(address_network):
    addr = list(address_network.graph.keys())[0]
    hop = sugartrail.hop.Hop()
    hop.get_officers_at_address = False
    with patch('sugartrail.api.get_companies_at_address', return_value=MOCK_COMPANIES_AT_ADDRESS):
        hop.search_address(address_network, addr, None)
    assert 'CO003' in address_network.graph
    node = address_network.graph['CO003']
    assert node['node_type'] == 'Company'
    assert node['depth'] == 1
    assert any(arc['arc_type'] == 'Company at Address' for arc in node['arcs'])


def test_search_address_adds_officer(address_network):
    addr = list(address_network.graph.keys())[0]
    hop = sugartrail.hop.Hop()
    hop.get_companies_at_address = False
    with patch('sugartrail.api.get_officers_at_address', return_value=MOCK_OFFICERS_AT_ADDRESS):
        hop.search_address(address_network, addr, None)
    assert 'OFF002' in address_network.graph
    node = address_network.graph['OFF002']
    assert node['node_type'] == 'Person'
    assert node['depth'] == 1


def test_search_address_skips_companies_when_maxsize_exceeded(address_network):
    addr = list(address_network.graph.keys())[0]
    large_response = {'items': [MOCK_COMPANIES_AT_ADDRESS['items'][0]] * 100}
    hop = sugartrail.hop.Hop()
    hop.get_officers_at_address = False
    hop.companies_at_address_maxsize = 5
    with patch('sugartrail.api.get_companies_at_address', return_value=large_response):
        hop.search_address(address_network, addr, None)
    assert 'CO003' not in address_network.graph
    assert len(address_network.maxsize_entities) == 1
    assert address_network.maxsize_entities[0]['type'] == 'Address'
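
The hop tests rely on one pattern throughout: patch the API function, feed in a canned response, and assert on the resulting graph. The sketch below reproduces that pattern in isolation so it runs without sugartrail installed. `fake_api` and `add_companies` are hypothetical stand-ins, not sugartrail's `api` module or `Hop.search_officer_id`, and the maxsize guard is an assumption about how such a limit could behave.

```python
from types import SimpleNamespace
from unittest.mock import patch

# Hypothetical stand-in for an API module; the default returns no appointments.
fake_api = SimpleNamespace(get_appointments=lambda officer_id: {'items': []})


def add_companies(graph, officer_id, maxsize, skipped):
    """Toy hop: add appointed companies unless the result set is too large."""
    result = fake_api.get_appointments(officer_id)
    items = (result or {}).get('items', [])
    if len(items) > maxsize:
        # Record the oversized entity instead of expanding it.
        skipped.append({'node': officer_id, 'size': len(items)})
        return
    for item in items:
        number = item['appointed_to']['company_number']
        graph.setdefault(number, {'node_type': 'Company', 'arcs': []})


appointment = {'appointed_to': {'company_number': 'CO002'}}

# Oversized response: nothing is added and the officer is recorded as skipped.
big_graph, big_skipped = {}, []
with patch.object(fake_api, 'get_appointments',
                  return_value={'items': [appointment] * 100}):
    add_companies(big_graph, 'OFF001', maxsize=5, skipped=big_skipped)

# Small response: the company is added once.
small_graph, small_skipped = {}, []
with patch.object(fake_api, 'get_appointments',
                  return_value={'items': [appointment]}):
    add_companies(small_graph, 'OFF001', maxsize=5, skipped=small_skipped)
```

Because `patch.object` restores the original attribute when the `with` block exits, each scenario starts from the same unpatched state, which is why the tests above can reuse the same mocks freely.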

140
test/test_processing.py Normal file

@@ -0,0 +1,140 @@
import sugartrail.processing as processing


# Minimal path entries used across tests
def make_entry(title, depth, node_type, id, link_type="", link=""):
    return {
        'title': title,
        'depth': depth,
        'node_type': node_type,
        'id': id,
        'link_type': link_type,
        'link': link,
    }


# --- condense_path ---

def test_condense_path_single_entry():
    path = [make_entry('Test Corp', 0, 'Company', 'CO001')]
    result = processing.condense_path(path)
    assert len(result) == 1
    assert result[0]['id'] == 'CO001'
    assert result[0]['link'] == ['']


def test_condense_path_two_distinct_entries():
    path = [
        make_entry('Test Corp', 0, 'Company', 'CO001'),
        make_entry('John Smith', 1, 'Person', 'OFF001', 'Officer', 'CO001'),
    ]
    result = processing.condense_path(path)
    assert len(result) == 2


def test_condense_path_deduplicates_identical_entries():
    entry = make_entry('Test Corp', 0, 'Company', 'CO001')
    path = [entry.copy(), entry.copy()]
    result = processing.condense_path(path)
    assert len(result) == 1


# --- asciiify_path ---

def test_asciiify_path_adds_node_index():
    path = [make_entry('Test Corp', 0, 'Company', 'CO001')]
    # condense_path must run first to set link as a list
    path = processing.condense_path(path)
    result = processing.asciiify_path(path)
    assert 'node_index' in result[0]
    assert result[0]['node_index'] == 'a'


def test_asciiify_path_link_becomes_string():
    path = [
        make_entry('Test Corp', 0, 'Company', 'CO001'),
        make_entry('John Smith', 1, 'Person', 'OFF001', 'Officer', 'CO001'),
    ]
    path = processing.condense_path(path)
    result = processing.asciiify_path(path)
    # link field should now be a comma-separated string, not a list
    for item in result:
        assert isinstance(item['link'], str)
# --- process_address_changes ---

def test_process_address_changes_fills_missing_new_address():
    data = {
        'items': [
            {
                'description_values': {'new_address': '2 New St', 'old_address': '1 Old St'},
                'date': '2020-01-01',
            },
            {
                # missing new_address: should be filled from item[0]'s old_address
                'description_values': {'old_address': '0 Older St'},
                'date': '2019-01-01',
            },
        ]
    }
    result = processing.process_address_changes(data)
    assert result['items'][1]['description_values']['new_address'] == '1 Old St'


def test_process_address_changes_leaves_existing_new_address_intact():
    data = {
        'items': [
            {
                'description_values': {'new_address': '2 New St', 'old_address': '1 Old St'},
                'date': '2020-01-01',
            },
            {
                'description_values': {'new_address': 'Already Set', 'old_address': '0 Older St'},
                'date': '2019-01-01',
            },
        ]
    }
    result = processing.process_address_changes(data)
    assert result['items'][1]['description_values']['new_address'] == 'Already Set'


def test_process_address_changes_single_item():
    data = {'items': [{'description_values': {'new_address': 'Only St'}, 'date': '2020-01-01'}]}
    result = processing.process_address_changes(data)
    assert result['items'][0]['description_values']['new_address'] == 'Only St'


def test_process_address_changes_missing_description_values():
    # Some Companies House responses omit 'description_values' entirely; this should not raise KeyError
    data = {
        'items': [
            {'description_values': {'new_address': '2 New St', 'old_address': '1 Old St'}, 'date': '2020-01-01'},
            {'date': '2019-01-01'},  # no description_values at all
        ]
    }
    result = processing.process_address_changes(data)
    assert 'description_values' not in result['items'][1]
# --- build_address_history (mocked API) ---

from unittest.mock import patch

MOCK_COMPANY_INFO = {
    'date_of_creation': '2018-01-01',
    'registered_office_address': {'address_line_1': '1 High St', 'locality': 'London', 'postal_code': 'EC1A 1BB'},
}


def test_build_address_history_item_missing_description_values():
    """Filing history items without 'description_values' should not raise KeyError."""
    address_changes = {
        'items': [
            # normal item
            {'description_values': {'old_address': '0 Old St'}, 'date': '2019-06-01'},
            # item with no description_values at all
            {'date': '2018-01-01'},
        ]
    }
    with patch('sugartrail.api.get_company', return_value=MOCK_COMPANY_INFO), \
         patch('sugartrail.api.get_address_changes', return_value=address_changes):
        result = processing.build_address_history('CO001')
    assert result is not None
    addresses = [entry['address'] for entry in result]
    # The item missing description_values should produce an empty address, not crash
    assert '' in addresses
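
The `process_address_changes` tests describe a back-fill rule: when an older filing lacks `new_address`, it inherits the `old_address` of the next newer filing, and items without `description_values` are left alone. A minimal sketch of that rule follows. `fill_missing_new_addresses` is a hypothetical name, and working on a deep copy is an assumption; the real function may well mutate its input.

```python
import copy


def fill_missing_new_addresses(data):
    """Back-fill 'new_address' on older items from the newer item's 'old_address'."""
    result = copy.deepcopy(data)  # assumption: leave the caller's data untouched
    items = result.get('items', [])
    # Items are assumed to be ordered newest first, as in the tests above.
    for newer, older in zip(items, items[1:]):
        newer_values = newer.get('description_values')
        older_values = older.get('description_values')
        if newer_values is None or older_values is None:
            continue  # nothing to copy from, or nowhere to copy to
        if 'new_address' not in older_values and 'old_address' in newer_values:
            older_values['new_address'] = newer_values['old_address']
    return result


data = {
    'items': [
        {'description_values': {'new_address': '2 New St', 'old_address': '1 Old St'},
         'date': '2020-01-01'},
        {'description_values': {'old_address': '0 Older St'}, 'date': '2019-01-01'},
        {'date': '2018-01-01'},  # no description_values at all
    ]
}
filled = fill_missing_new_addresses(data)
```

Using `.get('description_values')` instead of indexing is what keeps the third item from raising `KeyError`, which is the failure mode the last two tests guard against.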

88
test/test_utils.py Normal file

@@ -0,0 +1,88 @@
import sugartrail.utils as utils


# --- normalise_address ---

def test_normalise_address_all_keys():
    addr = {
        'premises': '1',
        'address_line_1': 'High Street',
        'locality': 'London',
        'postal_code': 'EC1A 1BB',
        'country': 'England',
    }
    assert utils.normalise_address(addr) == "1 High Street London EC1A 1BB England"


def test_normalise_address_partial_keys():
    addr = {'address_line_1': 'High Street', 'postal_code': 'EC1A 1BB'}
    assert utils.normalise_address(addr) == "High Street EC1A 1BB"


def test_normalise_address_empty():
    assert utils.normalise_address({}) == ""


def test_normalise_address_ignores_unknown_keys():
    addr = {'address_line_1': 'High Street', 'fax': '0800 000 000'}
    assert utils.normalise_address(addr) == "High Street"


# --- normalise_name ---

def test_normalise_name_moves_first_word_to_end():
    # "SMITH JOHN" → pop SMITH → ['JOHN', 'SMITH'] → "JOHN SMITH"
    assert utils.normalise_name("SMITH JOHN") == "JOHN SMITH"


def test_normalise_name_single_word():
    # Only one word: pop it and re-append → same result
    assert utils.normalise_name("SMITH") == "SMITH"


def test_normalise_name_removes_commas():
    # Commas are stripped before splitting
    result = utils.normalise_name("SMITH,JOHN")
    assert "SMITH" in result
    assert "JOHN" in result


# --- infer_postcode ---

def test_infer_postcode_standard_format():
    assert utils.infer_postcode("1 High Street EC1A 1BB London") == "EC1A 1BB"


def test_infer_postcode_short_format():
    assert utils.infer_postcode("Some Road W1A 0AX") == "W1A 0AX"


def test_infer_postcode_no_match():
    assert utils.infer_postcode("1 High Street London") is None


def test_infer_postcode_empty_string():
    assert utils.infer_postcode("") is None


# --- ensure_json_extension ---

def test_ensure_json_extension_adds_extension():
    assert utils.ensure_json_extension("myfile") == "myfile.json"


def test_ensure_json_extension_keeps_existing():
    assert utils.ensure_json_extension("myfile.json") == "myfile.json"


def test_ensure_json_extension_replaces_other():
    # os.path.splitext strips the extension, then .json is appended
    assert utils.ensure_json_extension("myfile.txt") == "myfile.json"


# --- flatten ---

def test_flatten_nested_dict():
    d = {"a": {"b": 1, "c": 2}, "d": 3}
    assert utils.flatten(d) == {"a.b": 1, "a.c": 2, "d": 3}


def test_flatten_already_flat():
    d = {"a": 1, "b": 2}
    assert utils.flatten(d) == {"a": 1, "b": 2}


def test_flatten_deeply_nested():
    d = {"a": {"b": {"c": 42}}}
    assert utils.flatten(d) == {"a.b.c": 42}


def test_flatten_empty():
    assert utils.flatten({}) == {}
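
The `flatten` tests specify dot-joined keys for nested dictionaries. One way to meet that contract is a short recursive helper, sketched below. `flatten_sketch` and its `parent_key` and `sep` parameters are illustrative assumptions and may not match the signature or edge-case handling of `sugartrail.utils.flatten`; in particular, keeping an empty nested dict as a plain value is a choice made here, not something the tests cover.

```python
def flatten_sketch(nested, parent_key="", sep="."):
    """Flatten nested dicts into a single level with sep-joined keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts, carrying the key prefix along.
            flat.update(flatten_sketch(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested_result = flatten_sketch({"a": {"b": 1, "c": 2}, "d": 3})
deep_result = flatten_sketch({"a": {"b": {"c": 42}}})
```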

2535
uv.lock generated Normal file

File diff suppressed because it is too large