personas/personas/_shared/community-skills/bellingcat-osint-toolkit/references/bellingcat-own-repos.md
salvacybersec 700122807d feat(bellingcat-osint-toolkit): add references/bellingcat-own-repos.md
Comprehensive reference for the 46 active non-fork repos in the
github.com/bellingcat org — tools Bellingcat ships as code (vs the
external tools they curate, which the existing 12 category refs cover).

Sections:
- Power tools (auto-archiver, octosuite, telegram-phone-number-checker,
  snscrape, vk-url-scraper, whisperbox-transcribe, EDGAR) with install
  commands + key invocations
- Geolocation toolbox (ShadowFinder, instagram-location-search,
  osm-search, geoclustering, search-grid-generator, ColourHighlighter,
  rgb-viz)
- Satellite / Earth Engine (sar-interference-tracker, cloud-free-subregion,
  Multispectral Imagery Explorer, umbra-open-data-tracker, ee_forest_area_tracker)
- Social-media scrapers (TikTok, Reddit, YouTube, Odysee, GETTR, Facebook,
  cisticola coordinator)
- People search (name-variant-search, alias-generator)
- Telegram (phone-checker, group-joiner, gesara-entity-viz)
- Companies / finance (EDGAR, sugartrail)
- Aircraft tracking (adsb-history)
- Image triage (smart-image-sorter via HuggingFace)
- Web-history forensics (wayback-google-analytics, uniform-timezone)
- Conflict tracking (ukraine-timemap, iran-conflict-damage-proxy-map,
  vis-tj-kg-map-2022)
- Research methodologies (RS4OSINT, open-source-research-notebooks,
  open-questions, quitobaquito, twitter-geocode-searches)
- Council / government records (CouncilSearcher)
- Persona affinity quick-pivot table for all 13 personas

Each entry has stars, language, use case, persona affinity, and (where
useful) the exact install + first-use commands. SKILL.md updated to
reference the new file in the layout tree and "when to load" table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 01:20:36 +03:00

Bellingcat Toolkit — Own Repos (Tools They Built)

The bellingcat-osint-toolkit skill's main catalog (data/all-tools.csv + 12 category refs) lists tools Bellingcat curates. This reference covers tools Bellingcat built and ships as code — 46 active non-fork repos across the GitHub org, sorted by use case.

Source: https://github.com/orgs/bellingcat/repositories. Updated: 2026-05-02. Re-pull with: curl -s "https://api.github.com/orgs/bellingcat/repos?per_page=100"

Power tools — install and use

auto-archiver — bulk web/social-media preservation

1073★ Python. Personas: scribe, oracle, herald, sentinel.

Multi-source archiver: pulls URLs from CSV / Google Sheets / CLI, archives videos, images, social-media posts, webpages, and writes status back to the source spreadsheet. Storage backends: local, S3, Google Drive.

# Pip
pip install auto-archiver
auto-archiver --help

# Docker (preferred for heavy enrichers)
docker pull bellingcat/auto-archiver
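# Bind-mount the local ./secrets directory (a bare "secrets" would create an empty named volume)
docker run -it --rm -v "$PWD/secrets":/app/secrets bellingcat/auto-archiver \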
  --config secrets/orchestration.yaml

Companions:

  • auto-archiver-api (13★) — REST API to manage users / sheets / URLs and dispatch workers
  • auto-archiver-extension (3★) — browser extension front-end
  • auto-archiver-setup-tool (11★) — Vue front-end for the API

Docs: https://auto-archiver.readthedocs.io/.

When to reach for it: an investigation needs durable preservation of dozens-to-thousands of URLs (Telegram channels going dark, breaking-news videos before takedown, court-evidence chain).

octosuite — GitHub OSINT CLI + Python lib

1892★ Python. Personas: oracle, sentinel, neo.

Terminal toolkit for GitHub data analysis. CLI + interactive TUI + Python library all from the same package.

pip install octosuite

# CLI
octosuite user torvalds                        # profile
octosuite user torvalds --repos --per-page 50  # all repos
octosuite user torvalds --followers --json
octosuite repo torvalds/linux --commits
octosuite repo torvalds/linux --stargazers --export ./data
octosuite org github --members --json
octosuite search "supply chain attack" --repos
octosuite -t                                   # interactive TUI

# Library
import octosuite
user = octosuite.User("torvalds")
exists, profile = user.exists()
if exists:
    repos = user.repos(page=1, per_page=100)

When to reach for it: profiling a threat-actor's GitHub footprint, finding unpublished commits in an org, supply-chain audit on a maintainer, triangulating an alias across GH events.

telegram-phone-number-checker — phone → Telegram correlation

1695★ Python. Personas: oracle, wraith, frodo.

Given a phone number (or a batch), check whether it is bound to a Telegram account. A useful people-search pivot when a lead's full number, including country code, is known.

pip install telegram-phone-number-checker
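# Single number (comma-separate several numbers to check more than one)
telegram-phone-number-checker --phone-numbers +12025550101
# Batch: one number per line in the file
telegram-phone-number-checker --phone-numbers-file numbers.txt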

Requires Telegram API credentials (api_id + api_hash from https://my.telegram.org). Rate-limited; use moderately to avoid bans.
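
Because every lookup counts against the rate limit, it can help to normalize and de-duplicate leads before a batch run. The following is a minimal sketch, assuming leads arrive as free-form strings that already carry a country code. `normalize_numbers` is a hypothetical helper and is not part of telegram-phone-number-checker.

```python
import re


def normalize_numbers(raw_numbers):
    """Return unique numbers in +<digits> form, preserving first-seen order.

    Hypothetical helper: entries without a leading '+' (country code unknown)
    or with an implausible digit count are skipped rather than guessed at.
    """
    seen = set()
    cleaned = []
    for raw in raw_numbers:
        candidate = raw.strip()
        if not candidate.startswith("+"):
            continue  # no country code; do not guess one
        digits = re.sub(r"\D", "", candidate)
        # E.164 allows at most 15 digits; 7 is an arbitrary lower sanity bound.
        if not 7 <= len(digits) <= 15:
            continue
        number = "+" + digits
        if number not in seen:
            seen.add(number)
            cleaned.append(number)
    return cleaned


leads = ["+1 (202) 555-0101", "+12025550101", "2025550102", "+44 20 7946 0958"]
print(",".join(normalize_numbers(leads)))
```

The comma-joined output can be pasted into a single invocation or written one number per line for a batch file, so duplicates and malformed entries never reach Telegram.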

snscrape — multi-platform social network scraper

346★ Python. Personas: oracle, frodo, ghost.

Twitter (deprecated), Mastodon, Telegram, Reddit, Facebook, VK, Instagram, Weibo, etc. Bellingcat maintains a fork; many platform modules broke after upstream changes, so check the repo status before relying on it.

pip install snscrape
snscrape twitter-user elonmusk
# Global options such as --max-results go before the scraper name
snscrape --max-results 100 telegram-channel durov

vk-url-scraper — VKontakte (Russian social) scraping

53★ Python. Personas: oracle, ghost, frodo (russia).

pip install vk-url-scraper
vk-url-scraper --help

Library API also available. Useful for VK posts, photos, geotagged content, group enumeration.

whisperbox-transcribe — Whisper audio/video transcription API

67★ Python. Personas: scribe, herald, oracle.

Deploy Whisper as a service. Drop a video/audio URL, get transcript + translation. Useful when an investigation accumulates hours of foreign-language broadcast / Telegram voice notes.

git clone https://github.com/bellingcat/whisperbox-transcribe
cd whisperbox-transcribe
docker compose up -d
curl -X POST -F "url=https://..." http://localhost:8000/jobs

EDGAR — SEC corporate-data Python lib

203★ Python. Personas: ledger, frodo.

Programmatic interface to SEC EDGAR — public filings, financials, ownership.

pip install edgar-tool
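# The installed command is edgar-tool, not edgar
edgar-tool --help
edgar-tool text_search "Apple Inc" --start_date "2024-01-01" --end_date "2024-12-31" --output "results.csv"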

Used for sanction-screening, insider trading patterns, beneficial-ownership chains, ESG.

Geolocation toolbox

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| ShadowFinder | 570 | Python | Map locations where a shadow of given length could occur at date/time | oracle, frodo, centurion |
| instagram-location-search | 679 | Python | Find Instagram location IDs near (lat, lon) | oracle, frodo |
| osm-search | 207 | Vue | OpenStreetMap proximity search UI | oracle, frodo |
| geoclustering | 45 | Python | Cluster a list of (lat,lon) points; CLI | oracle, frodo, marshal |
| search-grid-generator | 13 | Vue | Quickly generate KML search grids for area-of-interest mapping | oracle, marshal |
| ColourHighlighter | 4 | TypeScript | WebGL color filters / LUTs for screen-share geolocation | oracle (geo-analyst) |
| rgb-viz | 4 | JavaScript | Interactive viz of an image's R/G/B channels | oracle (forensics) |

pip install shadowfinder
# shadowfinder find <object height> <shadow length> <date> <time, UTC>
shadowfinder find 1 1.5 2024-03-15 14:00:00

pip install bellingcat-instagram-location-search
ig-location-search --lat 40.7128 --lon -74.0060
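
The geometry behind a shadow-length search is plain trigonometry: an object of height h casting a shadow of length s implies a solar elevation angle of atan(h / s), and the candidate locations are those where the sun sits at that elevation at the given date and time. The sketch below covers only that first step. It is not ShadowFinder's code, and `sun_elevation_deg` is a hypothetical name.

```python
import math


def sun_elevation_deg(object_height, shadow_length):
    """Solar elevation angle (degrees) implied by an object and its shadow.

    Both lengths must be in the same unit and measured on level ground.
    """
    if object_height <= 0 or shadow_length <= 0:
        raise ValueError("height and shadow length must be positive")
    return math.degrees(math.atan2(object_height, shadow_length))


# A shadow 1.5x the object's height implies a sun roughly 33.7 degrees up.
print(round(sun_elevation_deg(1.0, 1.5), 1))
```

Only the ratio of height to shadow length matters, so pixel measurements from a photo work as well as metres, provided the object is vertical and the ground is flat.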

Satellite / Earth Engine

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| sar-interference-tracker | 556 | JavaScript | GEE script to detect SAR satellite radar interference | warden, marshal, centurion |
| cloud-free-subregion | 59 | JavaScript | GEE app: find cloud-free Sentinel-2 imagery for an AOI | oracle, marshal |
| Multispectral-Satellite-Imagery-Explorer | 13 | JavaScript | GEE app to explore Landsat-8 multispectral bands | oracle, marshal |
| umbra-open-data-tracker | 33 | Python | Monitor Umbra SAR open-data catalogue, emit KML coverage | warden, marshal |
| ee_forest_area_tracker | 4 | (?) | Forest-area tracking via Earth Engine | oracle, scholar |

GEE scripts: copy-paste into https://code.earthengine.google.com/.

Social-media scrapers (live status varies — verify before relying)

| Repo | Stars | Lang | Platform / use | Personas |
|---|---|---|---|---|
| tiktok-hashtag-analysis | 358 | Python | Analyze hashtag co-occurrence + post stats | oracle, herald |
| tiktok-timestamp | 58 | HTML | Tiny client-side TikTok video timestamp retriever | oracle |
| polyphemus | 18 | Python | Odysee (alt-tech video) scraper | oracle, ghost |
| gogettr | 13 | Python | GETTR public API client for archival | oracle, ghost |
| facebook-downloader | 40 | Python | Public FB video downloader | oracle |
| reddit-post-scraping-tool | 92 | Python | Subreddit + keyword → top posts containing keyword | oracle, ghost |
| youtube-comment-scraper | 27 | Python | Scrape YT comments, find users commenting on N videos | oracle |
| cisticola | 20 | Python | Coordinator for multiple scrapers + DB layer | oracle (heavy) |

People search / aliases

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| name-variant-search | 50 | JavaScript | Generate search variations of a human name | oracle, wraith |
| alias-generator | 22 | JavaScript | Node module: likely aliases for a given name | oracle, wraith |

npm install -g @bellingcat/alias-generator
alias-generator "John Smith"   # produces J. Smith, Smith John, etc.

Use both in tandem: feed the name through name-variant-search for cultural/transliteration variants, then pipe each variant through your people-search stack (Sherlock, WhatsMyName, etc.).
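
To make the idea of name variants concrete, here is a rough Python sketch of the kind of reordering and initialing described above. It is not the algorithm either repo uses; `name_variants` and the specific patterns are illustrative assumptions, and it ignores transliteration entirely.

```python
def name_variants(full_name):
    """Generate simple ordering/initial variants of a multi-part name."""
    parts = full_name.split()
    if len(parts) < 2:
        # Nothing to reorder for a single token (or an empty string).
        return [full_name.strip()] if full_name.strip() else []
    first, last = parts[0], parts[-1]
    candidates = [
        " ".join(parts),              # John Smith
        f"{last} {first}",            # Smith John
        f"{last}, {first}",           # Smith, John
        f"{first[0]}. {last}",        # J. Smith
        f"{first[0]}{last}".lower(),  # jsmith (handle-style)
        f"{first}.{last}".lower(),    # john.smith (handle-style)
    ]
    # dict.fromkeys de-duplicates while keeping insertion order.
    return list(dict.fromkeys(candidates))


for variant in name_variants("John Smith"):
    print(variant)
```

Each printed variant can then be fed to the people-search stack as a separate query.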

Telegram-specific

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| telegram-phone-number-checker | 1695 | Python | Phone → Telegram presence check | oracle, wraith, frodo |
| telegram-group-joiner | 55 | (web) | Auto-join public/private Telegram groups | oracle, ghost |
| gesara-entity-viz | 4 | Python | Entity viz over a GESARA-conspiracy Telegram corpus | ghost, herald |

Pair with this repo's own telegram skill (custom WAHA scraper) for operational-scale Telegram archival.

Companies / finance

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| EDGAR | 203 | Python | SEC EDGAR Python lib (filings, ownership, financials) | ledger, frodo |
| sugartrail | 76 | HTML | UK Companies House network viz: companies, officers, addresses | ledger |

sugartrail is browser-based; deploy locally for big networks. Pair with OpenCorporates / OpenSanctions in references/companies-and-finance.md.

Aircraft / transport intel

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| adsb-history | 72 | Vue | Collect & query ADS-B aircraft history by region/altitude/type | warden, echo, frodo |

git clone https://github.com/bellingcat/adsb-history
cd adsb-history
docker compose up -d
# Then visit http://localhost:5173 for the Vue front-end

Backfill investigations on private-jet movements, military transport patterns, surveillance flights.

Image / media triage

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| smart-image-sorter | 62 | Jupyter Notebook | Zero-shot image classification via HuggingFace open-source models | oracle, sentinel |

Use case: triage thousands of OSINT-collected images by content (e.g. "weapon", "uniformed personnel", "vehicle"), then deep-dive the hits.

Web-history forensics

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| wayback-google-analytics | 234 | Python | Scrape current AND historic Google Analytics tags from sites | oracle, sentinel |
| uniform-timezone | 33 | Browser | Standardize timestamps across social-media UIs | scribe, oracle |

wayback-google-analytics is gold for de-anonymizing networks of related sites: GA tag IDs reused across domains often link sister-sites that hide ownership.
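
The linking step can be illustrated with a short sketch: extract GA identifiers from page source and keep only those that appear on more than one domain. This is a simplified stand-in and not the tool's implementation. The regular expression, the `shared_ga_ids` helper, and the `.example` domains are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Universal Analytics (UA-1234567-1) and GA4 (G-XXXXXXXXXX) style identifiers.
GA_ID_PATTERN = re.compile(r"\b(UA-\d{4,10}-\d{1,4}|G-[A-Z0-9]{6,12})\b")


def shared_ga_ids(pages):
    """Map each GA ID seen on two or more domains to those domains.

    `pages` is a dict of {domain: html_source}; fetching current or
    archived HTML is out of scope for this sketch.
    """
    domains_by_id = defaultdict(set)
    for domain, html in pages.items():
        for ga_id in GA_ID_PATTERN.findall(html):
            domains_by_id[ga_id].add(domain)
    return {
        ga_id: sorted(domains)
        for ga_id, domains in domains_by_id.items()
        if len(domains) > 1
    }


sample = {
    "site-a.example": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "site-b.example": "<script>gtag('config', 'UA-1234567-1');</script>",
    "site-c.example": "<script>gtag('config', 'G-ABCDE12345');</script>",
}
print(shared_ga_ids(sample))
```

A shared ID is a lead, not proof of common ownership: tags get copied along with site templates, so corroborate any link with other evidence.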

Conflict / civilian-harm tracking

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| ukraine-timemap | 287 | JavaScript | TimeMap instance for Civilian Harm in Ukraine | centurion, frodo, marshal |
| iran-conflict-damage-proxy-map | 6 | JavaScript | Iran conflict damage tracking | centurion, frodo, marshal |
| vis-tj-kg-map-2022 | 3 | (?) | Tajikistan-Kyrgyzstan border-clash interactive map | centurion, frodo |

These are public TimeMap front-ends backing Bellingcat's published investigations. They also serve as reference architectures for building your own conflict tracker: fork and adapt.

Specialized / research methodologies

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| RS4OSINT | 45 | TeX | Guide to Remote Sensing for OSINT (PDF + LaTeX source) | oracle, marshal, scholar |
| open-source-research-notebooks | 298 | Jupyter Notebook | Tutorial notebooks for command-line + code OSINT investigations | scholar, oracle |
| open-questions | 360 | Jupyter Notebook | Difficult research projects waiting for contributors | scholar, all-osint |
| o9a-product-scripts | 8 | Python | Scripts from Order of Nine Angles investigation | wraith (HUMINT) |
| quitobaquito | 14 | Python | Hydrology change methodology with GEE | scholar, oracle |
| twitter-geocode-searches | 26 | Python | Methodology for geofenced Twitter search | oracle |
| coronavirus-aid-data | 5 | Python | Data for Covid-19 relief-fund analysis article | ledger |
| who-killed-abelardo | 4 | (web) | Audio map visualization (single-investigation viz) | wraith, herald |
| avoc | 59 | CSS | 2024 Tech Fellowship working repo | (browse before reach) |

open-source-research-notebooks is the best entry point for a researcher new to Bellingcat methodology — it teaches the toolkit-via-Jupyter workflow.

Council / government records

| Repo | Stars | Lang | Use | Personas |
|---|---|---|---|---|
| CouncilSearcher | 13 | Python | Find verbatim quotes from UK + Ireland council meetings | scribe, frodo |

Niche but powerful for any UK municipal-level investigation. Drop a search term, get back transcript-grounded matches.

Infrastructure / supporting

| Repo | Stars | Notes |
|---|---|---|
| toolkit | 539 | The GitBook / curated-tools repo (this skill's source) |
| hackathon-submission-template | 11 | Template for Bellingcat Global Hackathon |
| bcat-discord-bot | 5 | Bellingcat's own Discord bot |
| challenge-framework | 5 | Static-site challenge framework |
| google-apps-script | 31 | Handy GAS snippets |
| datasheet-server | 32 | CSV → dynamic API server |
| 4-year-anniversary-network | 2 | Anniversary visualization |

Persona affinity quick-pivot

| Persona | Top repos to know |
|---|---|
| Oracle | octosuite, telegram-phone-number-checker, auto-archiver, snscrape, ShadowFinder, instagram-location-search, osm-search, name-variant-search, alias-generator, smart-image-sorter, wayback-google-analytics |
| Frodo | telegram-phone-number-checker, vk-url-scraper, ShadowFinder, instagram-location-search, EDGAR, ukraine-timemap, snscrape |
| Wraith | telegram-phone-number-checker, name-variant-search, alias-generator, o9a-product-scripts |
| Sentinel | octosuite, wayback-google-analytics, auto-archiver, smart-image-sorter |
| Scribe | auto-archiver, whisperbox-transcribe, uniform-timezone, CouncilSearcher |
| Ledger | EDGAR, sugartrail, coronavirus-aid-data |
| Centurion | sar-interference-tracker, ukraine-timemap, iran-conflict-damage-proxy-map, search-grid-generator, geoclustering |
| Marshal | sar-interference-tracker, umbra-open-data-tracker, cloud-free-subregion, search-grid-generator, ukraine-timemap |
| Warden | sar-interference-tracker, umbra-open-data-tracker, adsb-history |
| Echo | adsb-history (movement signatures), snscrape (signals correlation) |
| Herald | tiktok-hashtag-analysis, gesara-entity-viz, auto-archiver, whisperbox-transcribe |
| Ghost | gesara-entity-viz, polyphemus, gogettr, snscrape, telegram-group-joiner |
| Scholar | open-source-research-notebooks, RS4OSINT, open-questions, quitobaquito |

Install patterns

Most Python repos follow:

pip install <repo-name>
<repo-name> --help
# OR
git clone https://github.com/bellingcat/<repo-name>
cd <repo-name> && pip install -e .

Earth Engine repos: open https://code.earthengine.google.com/ and paste the script. Some require enabling specific imagery collections.

Vue/JavaScript apps: npm install && npm run dev for local; docker compose up if a docker-compose.yml is present.

Update freshness

Run periodically:

curl -s "https://api.github.com/orgs/bellingcat/repos?per_page=100&sort=updated" \
  | jq -r '.[] | select(.fork==false and .archived==false)
              | "\(.stargazers_count)\t\(.language // "?")\t\(.name)\t\(.description // "")"' \
  | sort -rn -k1 \
  | head -50

Watch the org page directly: https://github.com/orgs/bellingcat/repositories?type=all.
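
Where jq is not available, the same filter-and-rank step can be sketched in Python. The function below takes the already-decoded JSON list from the API and uses the same fields as the jq filter above (`fork`, `archived`, `stargazers_count`, `language`, `name`, `description`). `summarize_repos` is a hypothetical name and the sample records are made up, so the snippet runs offline.

```python
def summarize_repos(repos, limit=50):
    """Drop forks and archived repos, then rank the rest by star count.

    `repos` is the decoded JSON list returned by the GitHub
    /orgs/<org>/repos endpoint (a list of dicts).
    """
    active = [r for r in repos if not r.get("fork") and not r.get("archived")]
    active.sort(key=lambda r: r.get("stargazers_count", 0), reverse=True)
    return [
        (
            r.get("stargazers_count", 0),
            r.get("language") or "?",   # mirrors jq's `.language // "?"`
            r["name"],
            r.get("description") or "",
        )
        for r in active[:limit]
    ]


# Made-up sample records using the API's field names.
sample_repos = [
    {"name": "repo-a", "stargazers_count": 5, "language": None,
     "fork": False, "archived": False, "description": None},
    {"name": "repo-b", "stargazers_count": 40, "language": "Python",
     "fork": False, "archived": False, "description": "demo"},
    {"name": "repo-c", "stargazers_count": 99, "language": "Go",
     "fork": True, "archived": False, "description": "a fork"},
    {"name": "repo-d", "stargazers_count": 70, "language": "Vue",
     "fork": False, "archived": True, "description": "archived"},
]
for stars, lang, name, desc in summarize_repos(sample_repos):
    print(f"{stars}\t{lang}\t{name}\t{desc}")
```

To run it against live data, decode the API response with `json.loads` and pass the resulting list in. Note that a single request returns at most 100 repos.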

Pitfalls

  • Bellingcat's social-media scrapers (snscrape, vk-url-scraper, etc.) break frequently after platform API changes. Always run --version and read recent issues before relying on output for an investigation.
  • auto-archiver enrichers (Wayback, video DL, Telegram) each have their own auth + rate limits. The full pipeline is heavy — start with one enricher to validate flow before scaling.
  • telegram-phone-number-checker requires a Telegram developer account (api_id/api_hash). Excessive use will rate-limit or ban the account used; rotate.
  • GEE scripts need a Google account with Earth Engine access (free for research/non-profit). Quotas apply on heavy queries.
  • Several repos are unlicensed or have ambiguous licenses — for derivative work check the LICENSE file. Bellingcat's official repos are predominantly MIT.
  • Archived repos (12 of 62) are NOT included here — those are read-only historical references, not actively maintained.