mirror of https://github.com/bellingcat/sugartrail.git
synced 2026-06-11 13:08:30 +03:00
added tutorial connecting two companies
.gitignore (vendored, 1 line changed)
@@ -51,5 +51,6 @@ docs/_build/

 # Testing notebook
 notebooks/testing.ipynb
+notebooks/investigations

 *.DS_Store
notebooks/004_connection_check.ipynb (Normal file, 363 lines added)
@@ -0,0 +1,363 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b7641405",
"metadata": {},
"source": [
"*In this tutorial we will investigate two separate companies and check whether they are connected.*"
]
},
{
"cell_type": "markdown",
"id": "e39bd44d",
"metadata": {},
"source": [
"Sometimes we want to know whether two companies are connected. We can check this by building a network for each company and comparing the two networks for any common officers, addresses or companies.\n",
"\n",
"Let's test this approach with two example companies, Zahawi & Zahawi Ltd (07285998) and Gorgeous Services Limited (05714521):"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "53435932",
"metadata": {},
"outputs": [],
"source": [
"import sugartrail\n",
"import pandas as pd\n",
"sugartrail.api.basic_auth.username = \"\""
]
},
{
"cell_type": "markdown",
"id": "489a4141",
"metadata": {},
"source": [
"Create a network for Zahawi & Zahawi, setting some size limits to reduce the number of potentially irrelevant connections:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "300cecde",
"metadata": {},
"outputs": [],
"source": [
"zahawi_connections = sugartrail.base.Network(company_id='07285998')\n",
"zahawi_connections.hop.officer_appointments_maxsize = 20\n",
"zahawi_connections.hop.officers_at_address_maxsize = 20\n",
"zahawi_connections.hop.companies_at_address_maxsize = 20"
]
},
{
"cell_type": "markdown",
"id": "bf8ddb84",
"metadata": {},
"source": [
"Create a second network for Gorgeous Services:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9480e020",
"metadata": {},
"outputs": [],
"source": [
"gorgeous_connections = sugartrail.base.Network(company_id='05714521')\n",
"gorgeous_connections.hop.officer_appointments_maxsize = 20\n",
"gorgeous_connections.hop.officers_at_address_maxsize = 20\n",
"gorgeous_connections.hop.companies_at_address_maxsize = 20"
]
},
{
"cell_type": "markdown",
"id": "fd678b28",
"metadata": {},
"source": [
"We can now pass both networks to `sugartrail.processing.find_network_connections`, which returns the nodes the two networks have in common. Besides the two networks, it takes an optional `max_depth` (default 5): the maximum number of hops to build for each network. It expands both networks one hop at a time and stops as soon as a common node is found or `max_depth` hops have been completed."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b4036e3d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1/5 hops completed.\n",
"2/5 hops completed.\n",
"3/5 hops completed.\n",
"Found connection(s)!\n"
]
}
],
"source": [
"connections = sugartrail.processing.find_network_connections(zahawi_connections, gorgeous_connections)"
]
},
{
"cell_type": "markdown",
"id": "bac64a8e",
"metadata": {},
"source": [
"A connection was found. The long string of characters tells us it is an officer ID:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "be034584",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['g8BmvnpH8blqT87i93sgJeowx7I']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"connections"
]
},
{
"cell_type": "markdown",
"id": "6cd89faa",
"metadata": {},
"source": [
"We can now trace the path from Zahawi & Zahawi to this connection:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "9544095a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>title</th>\n",
" <th>depth</th>\n",
" <th>node_type</th>\n",
" <th>id</th>\n",
" <th>link_type</th>\n",
" <th>link</th>\n",
" <th>node_index</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ZAHAWI & ZAHAWI LTD</td>\n",
" <td>0</td>\n",
" <td>Company</td>\n",
" <td>07285998</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Nadhim ZAHAWI</td>\n",
" <td>1</td>\n",
" <td>Person</td>\n",
" <td>tKup8kXPh3-jx_5Bs-BkF5XCyPM</td>\n",
" <td>Officer</td>\n",
" <td>a</td>\n",
" <td>b</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>YOUGOV PLC</td>\n",
" <td>2</td>\n",
" <td>Company</td>\n",
" <td>03607311</td>\n",
" <td>Appointment</td>\n",
" <td>b</td>\n",
" <td>c</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Benjamin William ELLIOT</td>\n",
" <td>3</td>\n",
" <td>Person</td>\n",
" <td>g8BmvnpH8blqT87i93sgJeowx7I</td>\n",
" <td>Officer</td>\n",
" <td>c</td>\n",
" <td>d</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" title depth node_type id \\\n",
"0 ZAHAWI & ZAHAWI LTD 0 Company 07285998 \n",
"1 Nadhim ZAHAWI 1 Person tKup8kXPh3-jx_5Bs-BkF5XCyPM \n",
"2 YOUGOV PLC 2 Company 03607311 \n",
"3 Benjamin William ELLIOT 3 Person g8BmvnpH8blqT87i93sgJeowx7I \n",
"\n",
" link_type link node_index \n",
"0 a \n",
"1 Officer a b \n",
"2 Appointment b c \n",
"3 Officer c d "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(zahawi_connections.find_path('g8BmvnpH8blqT87i93sgJeowx7I'))"
]
},
{
"cell_type": "markdown",
"id": "613910a7",
"metadata": {},
"source": [
"...and the path from Gorgeous Services to the same officer:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f810b714",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>title</th>\n",
" <th>depth</th>\n",
" <th>node_type</th>\n",
" <th>id</th>\n",
" <th>link_type</th>\n",
" <th>link</th>\n",
" <th>node_index</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>GORGEOUS SERVICES LIMITED</td>\n",
" <td>0</td>\n",
" <td>Company</td>\n",
" <td>05714521</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Benjamin William ELLIOT</td>\n",
" <td>1</td>\n",
" <td>Person</td>\n",
" <td>g8BmvnpH8blqT87i93sgJeowx7I</td>\n",
" <td>Officer</td>\n",
" <td>a</td>\n",
" <td>b</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" title depth node_type id \\\n",
"0 GORGEOUS SERVICES LIMITED 0 Company 05714521 \n",
"1 Benjamin William ELLIOT 1 Person g8BmvnpH8blqT87i93sgJeowx7I \n",
"\n",
" link_type link node_index \n",
"0 a \n",
"1 Officer a b "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(gorgeous_connections.find_path('g8BmvnpH8blqT87i93sgJeowx7I'))"
]
},
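{
"cell_type": "markdown",
"id": "3e6ffa85",
"metadata": {},
"source": [
"Reading both paths together shows how Zahawi & Zahawi connects to Gorgeous Services: Nadhim Zahawi, an officer of Zahawi & Zahawi, has an appointment at YouGov PLC, which also lists Benjamin William Elliot as an officer, and Elliot is an officer of Gorgeous Services Limited."
]
}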
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
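Reading the two paths together means following the first path down to the shared officer and then the second path back up to its root company. A minimal sketch, assuming `find_path` returns a list of dicts ordered from the root company (depth 0) to the target node, as the tables above suggest; `join_paths` is a hypothetical helper, not part of sugartrail, and the sample paths are copied from the notebook output.

```python
def join_paths(first_path, second_path):
    """Join two root-to-connector paths into a single chain of titles.

    Hypothetical helper. Assumes both paths are ordered from depth 0
    to the connector node and end at the same node id.
    """
    if not first_path or not second_path:
        raise ValueError("both paths must be non-empty")
    if first_path[-1]["id"] != second_path[-1]["id"]:
        raise ValueError("paths do not end at the same node")
    # Walk the first path forwards, then the second path backwards,
    # skipping its last entry so the shared connector appears only once.
    forward = [node["title"] for node in first_path]
    backward = [node["title"] for node in reversed(second_path[:-1])]
    return forward + backward


# Sample data copied from the two find_path tables in the notebook.
zahawi_path = [
    {"title": "ZAHAWI & ZAHAWI LTD", "id": "07285998"},
    {"title": "Nadhim ZAHAWI", "id": "tKup8kXPh3-jx_5Bs-BkF5XCyPM"},
    {"title": "YOUGOV PLC", "id": "03607311"},
    {"title": "Benjamin William ELLIOT", "id": "g8BmvnpH8blqT87i93sgJeowx7I"},
]
gorgeous_path = [
    {"title": "GORGEOUS SERVICES LIMITED", "id": "05714521"},
    {"title": "Benjamin William ELLIOT", "id": "g8BmvnpH8blqT87i93sgJeowx7I"},
]

chain = join_paths(zahawi_path, gorgeous_path)
print(" -> ".join(chain))
```

Checking that both paths end at the same `id` guards against joining paths that were traced to different connector nodes.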
@@ -221,9 +221,9 @@ class Network:
 f.close

 def load(self, filename):
-"""Loads network stored in JSON format from '../assets/networks/'."""
+"""Loads network stored in JSON format."""
 if filename:
-f = open(f'../assets/networks/{filename}')
+f = open(f'{filename}')
 network_data = json.load(f)
 self.graph = network_data['graph']
 self.company_records = network_data['company_records']
@@ -372,7 +372,7 @@ class Network:
 path = sugartrail.processing.asciiify_path(path)
 return path

-def perform_hop(self, hops, company_data=None):
+def perform_hop(self, hops, company_data=None, print_progress=True):
 """Gets companies, officers and addresses within n-degrees of separation
 from current nodes, where n is the number of hops."""
 hop_history = []
@@ -396,26 +396,29 @@ class Network:
 if address not in self.processed_addresses:
 self.hop.search_address(self, address, company_data)
 self.processed_addresses.append(address)
-IPython.display.clear_output(wait=True)
-print("Hop number: " + str(hop+1))
-print("Processed " + str(i+1) + "/" + str(len(selected_addresses)) + " addresses.")
+if print_progress:
+IPython.display.clear_output(wait=True)
+print("Hop number: " + str(hop+1))
+print("Processed " + str(i+1) + "/" + str(len(selected_addresses)) + " addresses.")
 for j,company in enumerate(selected_companies):
 if company not in self.processed_companies:
 self.hop.search_company_id(self,company)
 self.processed_companies.append(company)
-IPython.display.clear_output(wait=True)
-print("Hop number: " + str(hop+1))
-print("Processed " + str(len(selected_addresses)) + "/" + str(len(selected_addresses)) + " addresses.")
-print("Processed " + str(j+1) + "/" + str(len(selected_companies)) + " companies.")
+if print_progress:
+IPython.display.clear_output(wait=True)
+print("Hop number: " + str(hop+1))
+print("Processed " + str(len(selected_addresses)) + "/" + str(len(selected_addresses)) + " addresses.")
+print("Processed " + str(j+1) + "/" + str(len(selected_companies)) + " companies.")
 for k,officer in enumerate(selected_officers):
 if officer not in self.processed_officers:
 self.hop.search_officer_id(self,officer)
 self.processed_officers.append(officer)
-IPython.display.clear_output(wait=True)
-print("Hop number: " + str(hop+1))
-print("Processed " + str(len(selected_addresses)) + "/" + str(len(selected_addresses)) + " addresses.")
-print("Processed " + str(len(selected_companies)) + "/" + str(len(selected_companies)) + " companies.")
-print("Processed " + str(k+1) + "/" + str(len(selected_officers)) + " officers.")
+if print_progress:
+IPython.display.clear_output(wait=True)
+print("Hop number: " + str(hop+1))
+print("Processed " + str(len(selected_addresses)) + "/" + str(len(selected_addresses)) + " addresses.")
+print("Processed " + str(len(selected_companies)) + "/" + str(len(selected_companies)) + " companies.")
+print("Processed " + str(k+1) + "/" + str(len(selected_officers)) + " officers.")
 self.maxsize_entities = [i for n, i in enumerate(self.maxsize_entities) if i not in self.maxsize_entities[n + 1:]]
 self.processed_officers, self.processed_companies, self.processed_addresses = [],[],[]
 self.n += 1
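The first `Network` hunk changes `load` so that it opens whatever path it is given instead of a file under `../assets/networks/`. Its context line `f.close` is written without parentheses, which only looks the method up and never calls it. A minimal standalone sketch of the load step that closes the file with a `with` block, assuming a saved network is a JSON object with the `graph` and `company_records` keys that `load` reads; `load_network_data` and the example file are hypothetical.

```python
import json
import os
import tempfile


def load_network_data(path):
    """Read a saved network from an arbitrary JSON file path.

    Hypothetical helper: assumes the file holds a JSON object with
    'graph' and 'company_records' keys, as Network.load reads them.
    """
    # The with-block closes the file even if json.load raises.
    with open(path, encoding="utf-8") as f:
        network_data = json.load(f)
    return network_data["graph"], network_data["company_records"]


# Usage: write a tiny network file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_network.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"graph": {"07285998": {"depth": 0, "arcs": []}},
                   "company_records": []}, f)
    graph, company_records = load_network_data(path)

print(list(graph))
```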
@@ -68,23 +68,24 @@ class Hop:
 # get company address history
 address_history = sugartrail.processing.build_address_history(company_id)
 # network.address_history.extend(address_history)
-for address in address_history:
-if 'address' in address:
-network.address_history.append(address)
-new_address = address['address']
-if new_address not in network.graph:
-network.graph[new_address] = {
-'depth': network.n+1,
-'title': new_address,
-'node_type': "Address",
-'arcs': []
+if address_history:
+for address in address_history:
+if 'address' in address:
+network.address_history.append(address)
+new_address = address['address']
+if new_address not in network.graph:
+network.graph[new_address] = {
+'depth': network.n+1,
+'title': new_address,
+'node_type': "Address",
+'arcs': []
 }
-arc = {
-'arc_type': "Historic Address",
-'start_node': company_id
-}
+arc = {
+'arc_type': "Historic Address",
+'start_node': company_id
+}
-if arc not in network.graph[new_address]['arcs'] and network.graph[new_address]['depth'] == network.n+1:
-network.graph[new_address]['arcs'].append(arc)
+if arc not in network.graph[new_address]['arcs'] and network.graph[new_address]['depth'] == network.n+1:
+network.graph[new_address]['arcs'].append(arc)

 def search_officer_id(self, network, officer_id):
 """Gets officers, companies and addresses connected to input officer
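In the `Hop` hunk the address-handling block is re-indented under a new `if address_history:` check. The guard presumably protects against `build_address_history` returning `None`, since iterating over `None` raises `TypeError`. The block itself adds each historic address as a node one level deeper than the current depth and links it to the company with a `Historic Address` arc, skipping duplicate arcs. A plain-dict sketch of that logic, assuming the graph is a dict keyed by node ID as the hunk suggests; `add_historic_addresses` and the sample address are hypothetical, and the `network.address_history` bookkeeping is left out.

```python
def add_historic_addresses(graph, depth, company_id, address_history):
    """Sketch of the address-history step using plain dicts (hypothetical).

    graph: dict mapping node ID -> {'depth', 'title', 'node_type', 'arcs'}
    depth: current network depth (network.n in the hunk)
    address_history: list of dicts, or None (assumed possible)
    """
    if not address_history:  # mirrors the new `if address_history:` guard
        return graph
    for record in address_history:
        if "address" not in record:
            continue
        new_address = record["address"]
        # Create the address node one level below the current depth.
        if new_address not in graph:
            graph[new_address] = {
                "depth": depth + 1,
                "title": new_address,
                "node_type": "Address",
                "arcs": [],
            }
        arc = {"arc_type": "Historic Address", "start_node": company_id}
        node = graph[new_address]
        # Only link nodes created at this depth, and never duplicate an arc.
        if arc not in node["arcs"] and node["depth"] == depth + 1:
            node["arcs"].append(arc)
    return graph


graph = {"07285998": {"depth": 0, "title": "ZAHAWI & ZAHAWI LTD",
                      "node_type": "Company", "arcs": []}}
# Hypothetical address history: a repeated address and a record without one.
history = [{"address": "1 Example Street"},
           {"address": "1 Example Street"},
           {"note": "no address"}]
add_historic_addresses(graph, 0, "07285998", history)
add_historic_addresses(graph, 0, "07285998", None)  # no-op instead of TypeError
print(graph["1 Example Street"]["arcs"])
```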
@@ -110,6 +110,21 @@ def process_address_changes(address_changes):
 address_changes['items'][i]['description_values']['new_address'] = address_changes['items'][i-1]['description_values']['old_address']
 return address_changes

+def find_network_connections(first_network, second_network, max_depth=5):
+"""Returns a list of nodes found in both networks, or None if there are none within max_depth hops."""
+hops = 0
+while hops < max_depth:
+first_network.perform_hop(1, print_progress=False)
+second_network.perform_hop(1, print_progress=False)
+hops += 1
+print(str(hops) + "/" + str(max_depth) + " hops completed.")
+connectors = [x for x in list(filter(first_network.graph.__contains__, second_network.graph.keys())) if x]
+if connectors:
+print("Found connection(s)!")
+return connectors
+print("No connections found.")
+return
+
 def build_address_history(company_id):
 """Returns a list of dicts containing historic addresses for input company
 (company_id)."""
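`find_network_connections` grows both networks in lockstep, one hop each per round, and after every round looks for node IDs (company numbers, officer IDs or addresses) present in both graphs. Its list comprehension is effectively a key intersection that keeps the second graph's order and drops empty keys. A self-contained sketch with toy networks, written as a `for` loop, assuming only that each network exposes a `graph` dict and a `perform_hop` method; `ToyNetwork` and its directed adjacency map are hypothetical stand-ins for `sugartrail.base.Network`, with node IDs taken from the notebook paths.

```python
class ToyNetwork:
    """Hypothetical stand-in for sugartrail's Network: expands one hop at a
    time over a fixed adjacency map instead of calling an API."""

    def __init__(self, root, adjacency):
        self.adjacency = adjacency
        self.graph = {root: {"depth": 0}}
        self.n = 0

    def perform_hop(self, hops, print_progress=True):
        # print_progress is accepted for signature compatibility; the toy never prints.
        for _ in range(hops):
            frontier = [node for node, data in self.graph.items() if data["depth"] == self.n]
            for node in frontier:
                for neighbour in self.adjacency.get(node, []):
                    self.graph.setdefault(neighbour, {"depth": self.n + 1})
            self.n += 1


def find_network_connections(first_network, second_network, max_depth=5):
    """Hop both networks in lockstep; return shared node IDs, or None."""
    for hops in range(1, max_depth + 1):
        first_network.perform_hop(1, print_progress=False)
        second_network.perform_hop(1, print_progress=False)
        print(f"{hops}/{max_depth} hops completed.")
        # Keys of the second graph that also appear in the first, skipping empty keys.
        connectors = [node for node in second_network.graph
                      if node and node in first_network.graph]
        if connectors:
            print("Found connection(s)!")
            return connectors
    print("No connections found.")
    return None


adjacency = {
    "07285998": ["tKup8kXPh3-jx_5Bs-BkF5XCyPM"],
    "tKup8kXPh3-jx_5Bs-BkF5XCyPM": ["03607311"],
    "03607311": ["g8BmvnpH8blqT87i93sgJeowx7I"],
    "05714521": ["g8BmvnpH8blqT87i93sgJeowx7I"],
}
first = ToyNetwork("07285998", adjacency)
second = ToyNetwork("05714521", adjacency)
connections = find_network_connections(first, second)
print(connections)
```

Because both toy networks grow one level per round, a node at distance d1 from the first root and d2 from the second is found in round max(d1, d2); here the shared officer is three hops from 07285998 and one from 05714521, so the search stops after three rounds.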