geoclustering
📍 command-line tool for clustering geolocations.
Features
- Uses DBSCAN or OPTICS to perform clustering.
- Outputs clustering results as json, txt and geojson.
- Creates a kepler.gl visualization of clusters.
Clustering Method
A cluster is created when at least a minimum number of points (--size) are each within a given distance (--distance) of at least one other point in the cluster.
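The rule above can be sketched in plain Python. This is a simplified illustration of the rule as worded here, not the tool's DBSCAN or OPTICS implementation; the function names `haversine_km` and `naive_clusters` and the mean Earth radius of 6371 km are assumptions made for the example.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: assumed mean Earth radius

def naive_clusters(points, distance_km, size):
    """Group points linked by hops of at most distance_km; keep groups with >= size points."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Every unvisited point close enough to the current one joins the group.
            near = [j for j in unvisited
                    if haversine_km(points[current], points[j]) <= distance_km]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        if len(group) >= size:
            clusters.append(sorted(group))
    return sorted(clusters)

# Three points a few dozen metres apart in Berlin, plus one far away in Paris.
points = [(52.5200, 13.4050), (52.5205, 13.4055), (52.5210, 13.4060), (48.8566, 2.3522)]
result = naive_clusters(points, distance_km=0.1, size=3)
```

Here the first and third points are more than 0.1 km apart, but both are within 0.1 km of the second, so all three end up in one group. The Paris point has no neighbour and is dropped.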
Install
Clone the repository:
git clone https://github.com/bellingcat/geoclustering
cd geoclustering
Install keplergl build dependencies:
# macos
brew install proj gdal
Install project with pip:
pip install .
Usage
Usage: geoclustering [OPTIONS] FILENAME
Tool to cluster geolocations. A cluster is created when a certain number of
points (--size) each are within a given distance (--distance) of at least
one other point in the cluster. Input is supplied as a csv file. At a
minimum, each row needs to have a 'lat' and a 'lon' column. Other rows are
reflected to the output.
Options:
  -d, --distance FLOAT            (in km) Max. distance between two points in
                                  a cluster. [required]
  -s, --size INTEGER              Min. number of points in a cluster.
                                  [required]
  -o, --output PATH               Output directory for results. Default:
                                  ./output
  -a, --algorithm [dbscan|optics]
                                  Clustering algorithm to be used. `optics`
                                  produces tighter clusters but is slower.
                                  Default: dbscan
  --open                          Open the generated visualization in the
                                  default browser automatically.
  --debug                         Print debug output.
  --help                          Show this message and exit.
Input
Input is supplied as a .csv file. The only required columns are lat and lon; all other columns are copied to the output.
id,name,lat,lon
1,Bonnibelle Mathwen,40.1324085,64.4911086
...
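Before running the tool, it can help to check that a file has the required columns. The following is a minimal sketch using only the Python standard library; `read_points` is a hypothetical helper for this example and not part of geoclustering.

```python
import csv
import io

REQUIRED_COLUMNS = ("lat", "lon")

def read_points(stream):
    """Read CSV rows as dicts, checking for lat/lon and converting them to floats."""
    reader = csv.DictReader(stream)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = []
    for row in reader:
        row["lat"] = float(row["lat"])
        row["lon"] = float(row["lon"])
        rows.append(row)  # all other columns are kept unchanged
    return rows

# The sample row from above, supplied as an in-memory file.
sample = io.StringIO("id,name,lat,lon\n1,Bonnibelle Mathwen,40.1324085,64.4911086\n")
rows = read_points(sample)
```

For a file on disk, pass `open(path, newline="")` instead of the `StringIO` object.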
Output
If at least one cluster is found, the tool writes a folder containing json, geojson, txt and kepler.gl html files.
JSON
Encodes an array of clusters, each containing an array of points.
[
  {
    "cluster_id": 0,
    "points": [
      {
        "id": 9,
        "name": "Rosanna Foggo",
        "lat": -6.2074293,
        "lon": 106.8915948
      }
    ]
  }
]
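A structure like this can be read with Python's `json` module. The sketch below parses the sample shown above from an inline string, since the name of the generated file is not stated here; `cluster_sizes` is a hypothetical helper.

```python
import json

def cluster_sizes(clusters):
    """Map each cluster_id to the number of points in that cluster."""
    return {cluster["cluster_id"]: len(cluster["points"]) for cluster in clusters}

# The sample JSON output from above, inlined for the example.
sample = """
[
  {
    "cluster_id": 0,
    "points": [
      {"id": 9, "name": "Rosanna Foggo", "lat": -6.2074293, "lon": 106.8915948}
    ]
  }
]
"""
clusters = json.loads(sample)
sizes = cluster_sizes(clusters)
```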
GeoJSON
Encodes a single FeatureCollection, containing all points as Feature objects.
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          106.891595,
          -6.207429
        ]
      },
      "properties": {
        "id": 9,
        "name": "Rosanna Foggo",
        "cluster_id": 0
      }
    }
  ]
}
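Note that GeoJSON stores coordinates as [longitude, latitude], the reverse of the lat, lon column order in the input. A minimal sketch for regrouping the features by cluster, assuming the structure shown above (`points_by_cluster` is a hypothetical helper):

```python
import json
from collections import defaultdict

def points_by_cluster(feature_collection):
    """Group (lat, lon) tuples by the cluster_id stored in each feature's properties."""
    grouped = defaultdict(list)
    for feature in feature_collection["features"]:
        lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order: lon first
        grouped[feature["properties"]["cluster_id"]].append((lat, lon))
    return dict(grouped)

# The sample GeoJSON output from above, inlined for the example.
sample = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [106.891595, -6.207429]},
      "properties": {"id": 9, "name": "Rosanna Foggo", "cluster_id": 0}
    }
  ]
}
"""
grouped = points_by_cluster(json.loads(sample))
```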
txt
Encodes clusters as blocks separated by a newline, where each line in a cluster block contains one point.
Cluster 0
id 9, name Rosanna Foggo, lat -6.2074293, lon 106.8915948
// ...
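The relationship between the JSON and txt layouts can be sketched as follows. This only reproduces the sample block shown above from one cluster object; it is an illustration and not the tool's actual writer, and `format_cluster` is a hypothetical name.

```python
def format_cluster(cluster):
    """Render one cluster object (as in the JSON output) as a txt-style block."""
    lines = [f"Cluster {cluster['cluster_id']}"]
    for point in cluster["points"]:
        # One line per point: "key value" pairs joined by commas.
        lines.append(", ".join(f"{key} {value}" for key, value in point.items()))
    return "\n".join(lines)

cluster = {
    "cluster_id": 0,
    "points": [
        {"id": 9, "name": "Rosanna Foggo", "lat": -6.2074293, "lon": 106.8915948}
    ],
}
block = format_cluster(cluster)
```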
