Changed the bounding box of the network, added a variable to the JSON data to set the label threshold, and switched to British spelling conventions

Tristan Lee
2022-12-16 01:13:35 -06:00
parent 8c4dd6f87c
commit 9f39ceee7b
6 changed files with 23 additions and 21 deletions

@@ -1,22 +1,22 @@
# GESARA Named Entity Network Visualization
# GESARA Named Entity Network Visualisation
This project generates an [interactive visualization](https://bellingcat.github.io/gesara-entity-viz/) of [named entities](https://spacy.io/usage/linguistic-features#named-entities) in English-language posts archived in a database of Telegram channels that have posted about the GESARA conspiracy theory.
This project generates an [interactive visualisation](https://bellingcat.github.io/gesara-entity-viz/) of [named entities](https://spacy.io/usage/linguistic-features#named-entities) in English-language posts archived in a database of Telegram channels that have posted about the GESARA conspiracy theory.
This visualization was developed by Bellingcat based on an excellent [Sigma.js demo](https://github.com/jacomyal/sigma.js/tree/main/demo), and uses [react-sigma-v2](https://github.com/sim51/react-sigma-v2) to interface sigma.js with React.
This visualisation was developed by Bellingcat based on an excellent [Sigma.js demo](https://github.com/jacomyal/sigma.js/tree/main/demo), and uses [react-sigma-v2](https://github.com/sim51/react-sigma-v2) to interface sigma.js with React.
You can view the live visualization [here](https://bellingcat.github.io/gesara-entity-viz/). With GitHub pages configured, after making changes to the `main` branch, you need th run the command `npm run deploy` for the latest changes to be reflected in the live visualization.
You can view the live visualisation [here](https://bellingcat.github.io/gesara-entity-viz/). With GitHub pages configured, after making changes to the `main` branch, you need to run the command `npm run deploy` for the latest changes to be reflected in the live visualisation.
## Python Scripts
In the `scripts/` subdirectory, you can run Python scripts that were used to generate the network and visualization:
In the `scripts/` subdirectory, you can run Python scripts that were used to generate the network and visualisation:
### `generate_network.py`
Extracts the data from a PostgreSQL database, cleans the entity data, generates a NetworkX graph, prunes the edges using the [Marginal Likelihood Filter](https://github.com/naviddianati/GraphPruning), and exports the pruned graph.
### `generate_visualization.py`
### `generate_visualisation.py`
After visualizing the network using [Gephi](https://gephi.org/) (using the Force Atlas 2 algorithm, with the "LinLog mode" and "Prevent Overlap" options enabled, and exporting as the file `entity_network_layout.graphml`), this script converts the node, edge, and cluster data into a format readable by this sigma.js project.
After visualising the network using [Gephi](https://gephi.org/) (using the Force Atlas 2 algorithm, with the "LinLog mode" and "Prevent Overlap" options enabled, and exporting as the file `entity_network_layout.graphml`), this script converts the node, edge, and cluster data into a format readable by this sigma.js project.
## NPM Scripts

File diff suppressed because one or more lines are too long

@@ -12,7 +12,7 @@ COLORS = colorcet.glasbey_dark
OUTPUT_JSON = "../public/dataset_entities.json"
NODE_SCALING = 0.5
NODE_SCALING = 0.35
# GraphML file generated by Gephi
INPUT_GRAPHML = "data/entity_network_layout.graphml"
CLUSTERS = [
@@ -46,7 +46,9 @@ CLUSTERS = [
{"key": "37", "clusterLabel": "Payment platforms"},
{"key": "42", "clusterLabel": "Vote audit"},
]
BOUNDING_BOX = {"x": [-300, 400], "y": [-600, 150]}
BOUNDING_BOX = {"x": [100, 200], "y": [-370, -50]}
LABEL_THRESHOLD = 15
if __name__ == "__main__":
@@ -84,6 +86,7 @@ if __name__ == "__main__":
]
+ [{"key": "100", "clusterLabel": "Other", "color": "#999999"}],
"bbox": BOUNDING_BOX,
"labelThreshold": LABEL_THRESHOLD,
}
with open(OUTPUT_JSON, "w") as f:
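
The export step in this hunk can be sketched as follows. This is a minimal, self-contained illustration of how the `bbox` and `labelThreshold` keys could end up in the dataset JSON. The function name `build_dataset` and the empty node, edge, and cluster lists are hypothetical placeholders, not part of the script; only the two constants' values are taken from the diff.

```python
import json

# Values taken from the diff above.
BOUNDING_BOX = {"x": [100, 200], "y": [-370, -50]}
LABEL_THRESHOLD = 15


def build_dataset(nodes, edges, clusters):
    """Hypothetical helper: assemble the dictionary that is serialised to JSON."""
    return {
        "nodes": nodes,
        "edges": edges,
        "clusters": clusters,
        "bbox": BOUNDING_BOX,
        # Read by the front end instead of a hard-coded label threshold.
        "labelThreshold": LABEL_THRESHOLD,
    }


# Placeholder inputs; the real script derives these from the GraphML file.
dataset = build_dataset(nodes=[], edges=[], clusters=[])
serialised = json.dumps(dataset)
```

Keeping the threshold next to `NODE_SCALING` in the script means both values can be tuned together when the layout changes, without touching the TypeScript sources.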

@@ -19,7 +19,8 @@ export interface Dataset {
nodes: NodeData[];
edges: [string, string][];
clusters: Cluster[];
bbox: {'x': Extent, 'y': Extent}
bbox: {'x': Extent, 'y': Extent},
labelThreshold: number
}
export interface FiltersState {

@@ -14,18 +14,17 @@ const DescriptionPanel: FC = () => {
}
>
<p>
This visualisation represents a <i>network</i> of{" "}
This interactive visualisation represents a <i>network</i> of{" "}
<a target="_blank" rel="noreferrer" href="https://spacy.io/usage/linguistic-features#named-entities">
named entities
</a> in English-language posts archived in a database of Telegram channels that have posted about the GESARA conspiracy theory. Each{" "}
<i>node</i> represents an entity, <i>edges</i> between nodes indicate that one or more posts contain both entities
.
<i>node</i> represents an entity, <i>edges</i> between nodes indicate that one or more posts contain both entities.
</p>
<p>
This kind of visualization shows the ecosystem of the people, organizations, and ideas these conspiracy Telegram channels talk about, as well as the connections between them.
This kind of visualisation shows the ecosystem of the people, organisations, and ideas these conspiracy Telegram channels talk about, as well as the connections between them.
</p>
<p>
Some social media channels were identified by researchers from{" "}
Some Telegram channels were identified by researchers from{" "}
<a target="_blank" rel="noreferrer" href="https://www.bellingcat.com/">
Bellingcat
</a>{" "}and{" "}
@@ -59,14 +58,13 @@ const DescriptionPanel: FC = () => {
.
</p>
<p>
The network was visualized using{" "}
The network was visualised using{" "}
<a target="_blank" rel="noreferrer" href="https://gephi.org/">
Gephi
</a>. Node sizes are related to the number of channels the entity was posted about in the database.
Nodes are colored based a{" "}
</a>. The radius of each node is proportional to the number of channels in the database whose posts mention the entity. Nodes are coloured based on a{" "}
<a target="_blank" rel="noreferrer" href="https://arxiv.org/abs/0803.0476">
community detection algorithm
</a>.
</a>. Edges are weighted by the number of posts that mention both entities.
For visualisation purposes, edges were pruned using the{" "}
<a target="_blank" rel="noreferrer" href="https://github.com/naviddianati/GraphPruning">
Marginal Likelihood Filter

@@ -53,7 +53,7 @@ const Root: FC = () => {
defaultNodeType: "image",
labelDensity: 0.07,
labelGridCellSize: 60,
labelRenderedSizeThreshold: 10,
labelRenderedSizeThreshold: dataset.labelThreshold,
labelFont: "Lato, sans-serif",
zIndex: true,
}}
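
Because the front end now reads `labelRenderedSizeThreshold` from `dataset.labelThreshold`, a dataset file generated before this commit would lack that key. A small check on the Python side, sketched below, could catch this before deployment. The function `check_dataset` is hypothetical, and it assumes that `Extent` is a two-element `[min, max]` pair, which the diff does not show.

```python
from numbers import Real


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def check_dataset(dataset: dict) -> list[str]:
    """Hypothetical helper: return a list of problems found in a dataset dict."""
    problems = []

    if not _is_number(dataset.get("labelThreshold")):
        problems.append("labelThreshold must be a number")

    bbox = dataset.get("bbox")
    if not isinstance(bbox, dict):
        bbox = {}
    for axis in ("x", "y"):
        extent = bbox.get(axis)
        # Assumption: an Extent is a [min, max] pair with min <= max.
        valid = (
            isinstance(extent, list)
            and len(extent) == 2
            and all(_is_number(v) for v in extent)
            and extent[0] <= extent[1]
        )
        if not valid:
            problems.append(f"bbox.{axis} must be a [min, max] pair")

    return problems
```

An empty list means the dictionary has the two fields this commit relies on; the check deliberately ignores nodes, edges, and clusters.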