added network viz tutorial

seangreaves
2023-10-02 17:51:31 +01:00
parent 0a43de5c4b
commit d6c8c8a930
7 changed files with 711 additions and 11 deletions

@@ -1,10 +1,11 @@
# Sugartrail
![title](assets/images/sweetstreet.png)
![title](assets/images/brexit_network.png)
Sugartrail is a network analysis and visualisation tool developed to make it easier and faster for researchers to explore connections between companies, officers and addresses within Companies House. The tool can be used for the following use-cases:
- Get all companies, officers and addresses connected to a company within n degrees of separation, based upon user-defined connection criteria. If two companies are connected, get the path of companies, officers and addresses connecting those companies.
- Check if two companies are connected, and if so get the path to show how they are connected.
## Requirements
@@ -13,7 +14,7 @@ You will require an API key from Companies House to get data. First you will nee
## No-Install Usage
A hosted demo of the Sugartrail dashboard can be accessed [here](https://stark-island-99644.herokuapp.com/) (the page might take a few seconds to load). This demo times out after 30 minutes, so it may not be suitable for building large networks at present.
## Demo
@@ -41,7 +42,7 @@ jupyter nbextension enable --py --sys-prefix ipyleaflet
4. For a quickstart, run `voila --no-browser --debug --Voila.ip=0.0.0.0 dashboard/Sugartrail.ipynb --VoilaConfiguration.file_whitelist="['.*']"` and navigate to the URL printed in the terminal where Voilà is running (no code required). For a more detailed explanation of the tool's capabilities, run `jupyter notebook notebooks` and open either `quickstart.ipynb` or `001_getting_started.ipynb`.
## Examples & Tutorials
Tutorial | Title | Description | Format
------------- | ------------- | ------------- | -------------
@@ -50,4 +51,5 @@ Tutorial | Title | Description | Format
002 | [Candy Connections](https://github.com/ribenamaplesyrup/sugartrail/blob/main/notebooks/002_candy_connections.ipynb) | Explore how many of Oxford Street's souvenir and candy shops are connected through a single company 🇺🇸🇬🇧🍬 | Jupyter Notebook
003 | [Virtual Offices](https://github.com/ribenamaplesyrup/sugartrail/blob/main/notebooks/003_virtual_offices.ipynb) | Explore addresses (such as virtual offices) with thousands of companies registered. This tutorial also compares two different methods of retrieving data from Companies House: the Companies House API and the Companies House Data Product download. | Jupyter Notebook
004 | [Connection Check](https://github.com/ribenamaplesyrup/sugartrail/blob/main/notebooks/004_connection_check.ipynb) | Investigate if two different companies are connected and if so how. | Jupyter Notebook
005 | [Connection Visualise](https://github.com/ribenamaplesyrup/sugartrail/blob/main/notebooks/005_connection_visualise.ipynb) | Visualise how 7 networks interconnect. | Jupyter Notebook
_ | [Sugartrail Dashboard](https://stark-island-99644.herokuapp.com/) | Get companies, officers and addresses connected to a selected company and visualise the results within a basic interface. | Voila Dashboard

Binary file not shown (434 KiB).

File diff suppressed because one or more lines are too long

@@ -1,19 +1,32 @@
aiohttp==3.8.5
aiosignal==1.3.1
anyio==3.6.2
appnope==0.1.3
argcomplete==3.1.1
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-timeout==4.0.3
attrs==22.2.0
Babel==2.11.0
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
bokeh==3.2.2
branca==0.6.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.6
cloudpickle==2.2.1
colorcet==3.0.1
comm==0.1.2
contourpy==1.1.1
cycler==0.11.0
dask==2023.9.2
datashader==0.15.2
datashape==0.5.2
dateparser==1.1.6
debugpy==1.6.5
decorator==5.1.1
@@ -22,8 +35,14 @@ entrypoints==0.4
exceptiongroup==1.1.0
executing==1.2.0
fastjsonschema==2.16.2
fonttools==4.42.1
fqdn==1.5.1
frozenlist==1.4.0
fsspec==2023.9.1
graphviz==0.20.1
holoviews==1.17.1
idna==3.4
importlib-metadata==6.8.0
iniconfig==2.0.0
ipykernel==6.19.4
ipyleaflet==0.17.2
@@ -34,34 +53,52 @@ isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
json5==0.9.11
jsonpickle==3.0.2
jsonpointer==2.3
jsonschema==4.17.3
jupyter-events==0.5.0
jupyter-server==1.23.4
jupyter_client==7.4.1
jupyter_core==5.1.2
jupyter_server_terminals==0.4.3
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.5
jupyterlab_server==2.18.0
kiwisolver==1.4.5
linkify-it-py==2.0.2
llvmlite==0.41.0
locket==1.0.0
Markdown==3.4.4
markdown-it-py==3.0.0
MarkupSafe==2.1.1
matplotlib==3.8.0
matplotlib-inline==0.1.6
mdit-py-plugins==0.4.0
mdurl==0.1.2
mistune==2.0.4
multidict==6.0.4
multipledispatch==1.0.0
nbclassic==0.4.8
nbclient==0.7.2
nbconvert==7.2.7
nbformat==5.7.1
nest-asyncio==1.5.6
networkx==3.1
notebook==6.5.2
notebook_shim==0.2.2
numba==0.58.0
numpy==1.24.1
packaging==22.0
pandas==1.5.2
pandocfilters==1.5.0
panel==1.2.3
param==1.13.0
parso==0.8.3
partd==1.4.0
pexpect==4.8.0
pickleshare==0.7.5
pip==22.3.1
Pillow==10.0.1
pipx==1.2.0
platformdirs==2.6.2
pluggy==1.0.0
prometheus-client==0.15.0
@@ -70,13 +107,17 @@ psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
pyct==0.5.0
Pygments==2.14.0
pyparsing==3.1.1
pyrsistent==0.19.3
pytest==7.2.1
python-dateutil==2.8.2
python-json-logger==2.0.4
pytz==2022.7
pytz-deprecation-shim==0.1.0.post0
pyvis==0.3.2
pyviz_comms==3.0.0
PyYAML==6.0
pyzmq==24.0.1
ratelimit==2.2.1
@@ -84,29 +125,35 @@ regex==2022.10.31
requests==2.28.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
scipy==1.11.2
Send2Trash==1.8.0
setuptools==65.6.3
six==1.16.0
sniffio==1.3.0
soupsieve==2.3.2.post1
stack-data==0.6.2
sugartrail==1.0.0
terminado==0.17.1
tinycss2==1.2.1
tomli==2.0.1
toolz==0.12.0
tornado==6.2
tqdm==4.66.1
traitlets==5.8.0
traittypes==0.2.1
typing_extensions==4.8.0
tzdata==2022.7
tzlocal==4.2
uc-micro-py==1.0.2
uri-template==1.2.0
urllib3==1.26.13
userpath==1.9.0
voila==0.4.0
wcwidth==0.2.5
webcolors==1.12
webencodings==0.5.1
websocket-client==1.4.2
websockets==10.4
wheel==0.38.4
widgetsnbextension==4.0.5
xarray==2023.8.0
xyzservices==2022.9.0
yarl==1.9.2
zipp==3.17.0

@@ -0,0 +1,203 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "177cb892",
"metadata": {},
"source": [
"*In this tutorial we will visualise connections between 7 entities.*"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d4c815ff",
"metadata": {},
"outputs": [],
"source": [
"import sugartrail\n",
"from tqdm import tqdm\n",
"sugartrail.api.basic_auth.username = \"\""
]
},
{
"cell_type": "markdown",
"id": "71030bd7",
"metadata": {},
"source": [
"We will explore connections between several persons associated with donations and lobbying efforts within the UK Conservative Party:"
]
},
{
"cell_type": "markdown",
"id": "bff72de9",
"metadata": {},
"source": [
"- O. PATERSON ([uhmCAOx6PDrXSxKDXJSD1Vv2prc](https://find-and-update.company-information.service.gov.uk/officers/uhmCAOx6PDrXSxKDXJSD1Vv2prc/appointments))\n",
"- M. AMERSI ([3wTyHYmLN5-J6XiTww5SL0iL3fI](https://find-and-update.company-information.service.gov.uk/officers/3wTyHYmLN5-J6XiTww5SL0iL3fI/appointments))\n",
"- A. Temerko ([lBdRiCfTDhMcaLwOU6393XUfPDg](https://find-and-update.company-information.service.gov.uk/officers/lBdRiCfTDhMcaLwOU6393XUfPDg/appointments))\n",
"- A. BAMFORD ([KwkjxuswE9qwWKLU0ndEaau9cq0](https://find-and-update.company-information.service.gov.uk/officers/KwkjxuswE9qwWKLU0ndEaau9cq0/appointments))\n",
"- B. ELLIOT ([g8BmvnpH8blqT87i93sgJeowx7I](https://find-and-update.company-information.service.gov.uk/officers/g8BmvnpH8blqT87i93sgJeowx7I/appointments))\n",
"- L. CHERNUKHIN ([D-2pqWTW2QY0ooHbL5O7soMwTRc](https://find-and-update.company-information.service.gov.uk/officers/D-2pqWTW2QY0ooHbL5O7soMwTRc/appointments))\n",
"- P. CRUDDAS ([WtiEW0LL-mMmPaRLrQSCjsWBpXY](https://find-and-update.company-information.service.gov.uk/officers/WtiEW0LL-mMmPaRLrQSCjsWBpXY/appointments))"
]
},
{
"cell_type": "markdown",
"id": "b59db34a",
"metadata": {},
"source": [
"To do this, let's create a list of dictionaries holding the id of each entity. These networks are initialised from officers, but addresses or companies could be included using the keys 'address' and 'company_id':"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5fec2142",
"metadata": {},
"outputs": [],
"source": [
"entities = [{\"officer_id\": \"uhmCAOx6PDrXSxKDXJSD1Vv2prc\"},\n",
" {\"officer_id\": \"3wTyHYmLN5-J6XiTww5SL0iL3fI\"},\n",
" {\"officer_id\": \"lBdRiCfTDhMcaLwOU6393XUfPDg\"},\n",
" {\"officer_id\": \"KwkjxuswE9qwWKLU0ndEaau9cq0\"},\n",
" {\"officer_id\": \"g8BmvnpH8blqT87i93sgJeowx7I\"},\n",
" {\"officer_id\": \"D-2pqWTW2QY0ooHbL5O7soMwTRc\"},\n",
" {\"officer_id\": \"WtiEW0LL-mMmPaRLrQSCjsWBpXY\"}]"
]
},
{
"cell_type": "markdown",
"id": "8cbee60b",
"metadata": {},
"source": [
"Let's look for connections between the entities by building a network 3 degrees deep for each one. You can build the networks from scratch (Option 1) or load the pre-downloaded networks by uncommenting the code below (Option 2):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "16453850",
"metadata": {},
"outputs": [],
"source": [
"# Option 1: Build networks from scratch\n",
"entity_graphs = []\n",
"for entity in entities:\n",
"    if list(entity.keys())[0] == \"officer_id\":\n",
"        entity_graphs.append(sugartrail.base.Network(officer_id=entity[\"officer_id\"]))\n",
"    elif list(entity.keys())[0] == \"address\":\n",
"        entity_graphs.append(sugartrail.base.Network(address=entity[\"address\"]))\n",
"    elif list(entity.keys())[0] == \"company_id\":\n",
"        entity_graphs.append(sugartrail.base.Network(company_id=entity[\"company_id\"]))\n",
"\n",
"for entity in tqdm(entity_graphs):\n",
"    entity.hop.officer_appointments_maxsize = 20\n",
"    entity.hop.officers_at_address_maxsize = 20\n",
"    entity.hop.companies_at_address_maxsize = 20\n",
"    entity.perform_hop(3)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "40439af2",
"metadata": {},
"outputs": [],
"source": [
"## Option 2: Load networks\n",
"# entity_graphs = sugartrail.processing.load_multiple_networks(f'{sugartrail.const.data_path}/networks/multinode/')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7503bc29",
"metadata": {},
"outputs": [],
"source": [
"s_path_network = sugartrail.processing.find_multi_network_connections(entity_graphs)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c204f503",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"../assets/visualisations/graph.html\n"
]
},
{
"data": {
"text/html": [
"\n",
" <iframe\n",
" width=\"100%\"\n",
" height=\"600px\"\n",
" src=\"../assets/visualisations/graph.html\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" \n",
" ></iframe>\n",
" "
],
"text/plain": [
"<IPython.lib.display.IFrame at 0x7ff0e0c6fe20>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sugartrail.processing.visualise_connections(s_path_network, f'{sugartrail.const.data_path}/visualisations')"
]
},
{
"cell_type": "markdown",
"id": "280758b4",
"metadata": {},
"source": [
"Let's now save all our networks:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d49c0ce7",
"metadata": {},
"outputs": [],
"source": [
"for entity in entity_graphs:\n",
"    entity.save(f'multinode/{list(entity.graph.keys())[0]}_network.json')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
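Each entity dictionary in the notebook above carries a single key. The following is a minimal sketch of checking that shape before any networks are built, assuming the three keys named in the tutorial (`officer_id`, `address`, `company_id`) are the only ones supported. `SUPPORTED_KEYS` and `network_kwargs` are hypothetical names and not part of sugartrail, and the company number is a placeholder rather than a real record.

```python
# Hypothetical helper, not part of sugartrail: validates an entity dict
# and returns it as keyword arguments for a network constructor.
SUPPORTED_KEYS = ("officer_id", "address", "company_id")


def network_kwargs(entity: dict) -> dict:
    """Return the single supported key/value pair held by an entity dict."""
    keys = [key for key in entity if key in SUPPORTED_KEYS]
    if len(entity) != 1 or len(keys) != 1:
        raise ValueError(
            f"expected exactly one of {SUPPORTED_KEYS}, got {sorted(entity)}"
        )
    return {keys[0]: entity[keys[0]]}


entities = [
    {"officer_id": "uhmCAOx6PDrXSxKDXJSD1Vv2prc"},
    {"company_id": "00000000"},  # placeholder value for illustration only
]
kwargs_list = [network_kwargs(entity) for entity in entities]

# With sugartrail installed and an API key set, the networks could then be
# built in one step, as the notebook does key by key:
# entity_graphs = [sugartrail.base.Network(**kwargs) for kwargs in kwargs_list]
```

Failing early on a malformed entity avoids silently skipping it, which is what the `if`/`elif` chain in the notebook would do for an unrecognised key.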

sugartrail/const.py Normal file
@@ -0,0 +1,3 @@
import sugartrail
data_path = "../assets"

@@ -1,10 +1,16 @@
from sugartrail import api, base
import requests
import urllib
import regex as re
import collections
import os
import IPython
from string import ascii_letters as alc
import networkx as nx
from pyvis.network import Network
import warnings
warnings.filterwarnings('ignore')
def flatten(d, parent_key='', sep='.'):
    """Flatten nested dictionary."""
@@ -205,3 +211,101 @@ def normalise_address(address_dict):
address_list.append(address_dict[key])
address_string = ' '.join(address_list)
return address_string
def load_multiple_networks(networks_dir):
    """Load every network JSON file in a directory into a list."""
    entity_graphs = []
    for filename in os.listdir(networks_dir):
        if filename.endswith('.json'):
            # os.path.join avoids a doubled slash when networks_dir ends with '/'
            network = base.Network(file=os.path.join(networks_dir, filename))
            entity_graphs.append(network)
    return entity_graphs
def find_multi_network_connections(networks: list):
    """Find the shortest paths connecting each pair of networks in a list,
    returning the nodes that lie on those paths."""
    s_path_network = []
    for i in range(len(networks)):
        for j in range(i + 1, len(networks)):
            # Nodes present in both networks, scored by their combined depth
            # from the two network origins.
            connections = [(x, networks[i].graph[x]['depth'] + networks[j].graph[x]['depth'])
                           for x in networks[j].graph.keys()
                           if x and x in networks[i].graph]
            if not connections:
                # This pair of networks shares no nodes, so there is no path.
                continue
            # Keep only the shared nodes with the smallest combined depth.
            min_depth = min(depth for _, depth in connections)
            shortest = [x for x, depth in connections if depth == min_depth]
            for connection in shortest:
                for entity_graph in [networks[i], networks[j]]:
                    path = list(entity_graph.find_path(connection))
                    for node in path:
                        network_node = {'title': node['title'],
                                        'node_type': node['node_type'],
                                        'id': node['id'],
                                        'link_type': node['link_type'],
                                        'link': "",
                                        'depth': node['depth']}
                        if node['link']:
                            # A node may link to several others; emit one record per link.
                            for link in [x.strip() for x in node['link'].split(',')]:
                                new_node = network_node.copy()
                                new_node['link'] = next((item['id'] for item in path if item["node_index"] == link), None)
                                if new_node not in s_path_network:
                                    s_path_network.append(new_node)
                        else:
                            new_node = network_node.copy()
                            s_path_network.append(new_node)
    return s_path_network
def visualise_connections(network: list, viz_path):
    """Generate a pyvis force-directed graph visualisation from a list of nodes
    showing how they connect. The resulting HTML file is saved to viz_path.
    """
    G = nx.DiGraph()
    # Add nodes and edges to the graph
    for item in network:
        node_id = item['id']
        G.add_node(node_id, label=item['title'], type=item['node_type'], depth=item['depth'])
        # Add edges based on link_type and link
        if item['link']:
            G.add_edge(node_id, item['link'], type=item['link_type'])
    # Create a pyvis network using the new graph
    nt = Network(notebook=True, cdn_resources='in_line')
    nt.from_nx(G)
    # Map node_type to corresponding emoji URL
    emoji_urls = {
        "Person": "https://emoji.beeimg.com/👤/240/apple",
        "Company": "https://emoji.beeimg.com/💰/240/apple",
        "Address": "https://emoji.beeimg.com/🏢/240/apple"
    }
    # Update nodes to use the image based on node_type
    for node in nt.nodes:
        node_type = node["type"]  # Get the type from the node
        node["size"] = 30
        if node_type in emoji_urls:
            node["image"] = emoji_urls[node_type]
            # Check if node is an origin node (depth 0)
            if node["depth"] == 0:
                node["image"] = "https://emoji.beeimg.com/🌐/240/apple"
                node["color"] = "white"
                node["shape"] = "circularImage"
            else:
                node["shape"] = "image"
    for edge in nt.edges:
        edge["color"] = "black"
    # Enable physics
    nt.toggle_physics(True)
    physics_options = """
    {
      "physics": {
        "solver": "barnesHut",
        "barnesHut": {
          "gravitationalConstant": -10000,
          "centralGravity": 0.3,
          "springLength": 100,
          "springConstant": 0.05,
          "damping": 0.09,
          "avoidOverlap": 0.5
        },
        "minVelocity": 0.75,
        "maxVelocity": 5
      }
    }
    """
    nt.set_options(physics_options)
    # Display
    return nt.show(f'{viz_path}/graph.html')
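The pairwise selection step inside `find_multi_network_connections` can be illustrated in isolation. The sketch below is a simplified stand-in rather than the library code: it assumes each network is reduced to a plain dictionary mapping node id to depth from that network's origin, and `shortest_connections` plus the toy node ids are hypothetical names chosen for illustration.

```python
def shortest_connections(depths_a: dict, depths_b: dict) -> list:
    """Return the node ids shared by both networks that have the smallest
    combined depth (hops from origin A plus hops from origin B)."""
    shared = [node for node in depths_b if node and node in depths_a]
    if not shared:
        # The two networks never meet, so there is no connecting path.
        return []
    combined = {node: depths_a[node] + depths_b[node] for node in shared}
    best = min(combined.values())
    return [node for node in shared if combined[node] == best]


# Toy depth maps: node id -> hops from each network's origin.
net_a = {"officer_A": 0, "company_1": 1, "address_X": 2, "company_2": 3}
net_b = {"officer_B": 0, "company_3": 1, "address_X": 1, "company_2": 1}

print(shortest_connections(net_a, net_b))  # ['address_X']
```

Here `address_X` has a combined depth of 3 and `company_2` a combined depth of 4, so only `address_X` is kept. When several shared nodes tie for the minimum, all of them are returned, which mirrors how the function above can emit more than one shortest path per pair.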