feat(bellingcat-osint-toolkit): add references/bellingcat-own-repos.md

Comprehensive reference for the 46 active non-fork repos in the
github.com/bellingcat org — tools Bellingcat ships as code (vs the
external tools they curate, which the existing 12 category refs cover).

Sections:
- Power tools (auto-archiver, octosuite, telegram-phone-number-checker,
  snscrape, vk-url-scraper, whisperbox-transcribe, EDGAR) with install
  commands + key invocations
- Geolocation toolbox (ShadowFinder, instagram-location-search,
  osm-search, geoclustering, search-grid-generator, ColourHighlighter,
  rgb-viz)
- Satellite / Earth Engine (sar-interference-tracker, cloud-free-subregion,
  Multispectral Imagery Explorer, umbra-open-data-tracker, ee_forest_area_tracker)
- Social-media scrapers (TikTok, Reddit, YouTube, Odysee, GETTR, Facebook,
  cisticola coordinator)
- People search (name-variant-search, alias-generator)
- Telegram (phone-checker, group-joiner, gesara-entity-viz)
- Companies / finance (EDGAR, sugartrail)
- Aircraft tracking (adsb-history)
- Image triage (smart-image-sorter via HuggingFace)
- Web-history forensics (wayback-google-analytics, uniform-timezone)
- Conflict tracking (ukraine-timemap, iran-conflict-damage-proxy-map,
  vis-tj-kg-map-2022)
- Research methodologies (RS4OSINT, open-source-research-notebooks,
  open-questions, quitobaquito, twitter-geocode-searches)
- Council / government records (CouncilSearcher)
- Persona affinity quick-pivot table for all 13 personas

Each entry has stars, language, use case, persona affinity, and (where
useful) the exact install + first-use commands. SKILL.md updated to
reference the new file in the layout tree and "when to load" table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
salvacybersec
2026-05-02 01:20:36 +03:00
parent 266969acb7
commit 700122807d
2 changed files with 391 additions and 12 deletions


@@ -30,18 +30,22 @@ bellingcat-osint-toolkit/
│ ├── refresh.sh pull fresh CSV from upstream nightly release
│ └── regenerate-references.py rebuild references/*.md tables from CSV
└── references/
├── archiving.md 8 tools
├── companies-and-finance.md 26 tools
├── conflict.md 6 tools
├── data-org-and-analysis.md 11 tools
├── environment-and-wildlife.md 24 tools
├── geolocation.md 9 tools
├── image-video.md 35 tools
├── maps-and-satellites.md 83 tools
├── people.md 33 tools
├── social-media.md 63 tools
├── transport.md 27 tools
└── websites.md 17 tools
├── archiving.md 8 tools (curated externals)
├── companies-and-finance.md 26 tools (curated externals)
├── conflict.md 6 tools (curated externals)
├── data-org-and-analysis.md 11 tools (curated externals)
├── environment-and-wildlife.md 24 tools (curated externals)
├── geolocation.md 9 tools (curated externals)
├── image-video.md 35 tools (curated externals)
├── maps-and-satellites.md 83 tools (curated externals)
├── people.md 33 tools (curated externals)
├── social-media.md 63 tools (curated externals)
├── transport.md 27 tools (curated externals)
├── websites.md 17 tools (curated externals)
└── bellingcat-own-repos.md 46 active repos Bellingcat ships
(octosuite, auto-archiver, EDGAR,
ShadowFinder, telegram-phone-checker,
sar-interference-tracker, etc.)
```
For ad-hoc queries the agent can grep the CSV directly:
@@ -73,6 +77,7 @@ bash scripts/refresh.sh && python3 scripts/regenerate-references.py
| Wildlife trafficking, environmental crime, terrain | `references/environment-and-wildlife.md` | 24 |
| Preserve a webpage, video, social post | `references/archiving.md` | 8 |
| Clean / merge / publish data; build the investigation file | `references/data-org-and-analysis.md` | 11 |
| Bellingcat's OWN open-source tools (octosuite, auto-archiver, EDGAR, ShadowFinder, etc.) | `references/bellingcat-own-repos.md` | 46 active repos |
## Persona affinity


@@ -0,0 +1,374 @@
# Bellingcat Toolkit — Own Repos (Tools They Built)
The `bellingcat-osint-toolkit` skill's main catalog (`data/all-tools.csv` + 12
category refs) lists tools Bellingcat **curates**. This reference covers
tools Bellingcat **built and ships** as code — 46 active non-fork repos
across the GitHub org, sorted by use case.
> Source: <https://github.com/orgs/bellingcat/repositories>
> Updated: 2026-05-02. Re-pull with
> `curl -s "https://api.github.com/orgs/bellingcat/repos?per_page=100"`.
## Power tools — install and use
### auto-archiver — bulk web/social-media preservation
1073★ Python. **Personas: scribe, oracle, herald, sentinel.**
Multi-source archiver: pulls URLs from CSV / Google Sheets / CLI, archives
videos, images, social-media posts, webpages, and writes status back to
the source spreadsheet. Storage backends: local, S3, Google Drive.
```bash
# Pip
pip install auto-archiver
auto-archiver --help
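# Docker (preferred for heavy enrichers)
docker pull bellingcat/auto-archiver
# Bind-mount the local ./secrets directory; a bare "secrets:" would create an empty named volume
docker run -it --rm -v "$PWD/secrets:/app/secrets" bellingcat/auto-archiver \
  --config secrets/orchestration.yaml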
```
Companions:
- `auto-archiver-api` (13★) — REST API to manage users / sheets / URLs and dispatch workers
- `auto-archiver-extension` (3★) — browser extension front-end
- `auto-archiver-setup-tool` (11★) — Vue front-end for the API
Docs: <https://auto-archiver.readthedocs.io/>.
When to reach for it: an investigation needs durable preservation of
dozens-to-thousands of URLs (Telegram channels going dark, breaking-news
videos before takedown, court-evidence chain).
### octosuite — GitHub OSINT CLI + Python lib
1892★ Python. **Personas: oracle, sentinel, neo.**
Terminal toolkit for GitHub data analysis. CLI + interactive TUI + Python
library all from the same package.
```bash
pip install octosuite
# CLI
octosuite user torvalds # profile
octosuite user torvalds --repos --per-page 50 # all repos
octosuite user torvalds --followers --json
octosuite repo torvalds/linux --commits
octosuite repo torvalds/linux --stargazers --export ./data
octosuite org github --members --json
octosuite search "supply chain attack" --repos
octosuite -t # interactive TUI
```
```python
# Library
import octosuite

user = octosuite.User("torvalds")
exists, profile = user.exists()
if exists:
    repos = user.repos(page=1, per_page=100)
```
When to reach for it: profiling a threat-actor's GitHub footprint, finding
unpublished commits in an org, supply-chain audit on a maintainer,
triangulating an alias across GH events.
### telegram-phone-number-checker — phone → Telegram correlation
1695★ Python. **Personas: oracle, wraith, frodo.**
Given a phone number (or batch), check whether it's bound to a Telegram
account. A useful people-search pivot when a lead's full number, including
country code, is known.
```bash
pip install telegram-phone-number-checker
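# Flag names as documented in the project README; confirm with --help for your installed version
telegram-phone-number-checker --phone-numbers +12025550101
telegram-phone-number-checker --phone-numbers +12025550101,+12025550102 --output results.json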
```
Requires Telegram API credentials (api_id + api_hash from
<https://my.telegram.org>). Rate-limited; use moderately to avoid bans.
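Because lookups are rate-limited, it can pay to clean and de-duplicate a lead list before spending any quota. A minimal sketch, assuming leads arrive as free-form strings and only numbers with an explicit country code are worth checking. `normalize_numbers` and `batches` are hypothetical helpers, not part of telegram-phone-number-checker.

```python
import re
from typing import Iterable, Iterator


def normalize_numbers(raw: Iterable[str]) -> list[str]:
    """Strip formatting, keep only '+<digits>' entries, drop duplicates in order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for entry in raw:
        # Remove spaces, dashes, parentheses; keep digits and the leading '+'.
        candidate = re.sub(r"[^\d+]", "", entry.strip())
        # Require an explicit country code: '+' followed by 8 to 15 digits.
        if not re.fullmatch(r"\+\d{8,15}", candidate):
            continue
        if candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned


def batches(numbers: list[str], size: int) -> Iterator[list[str]]:
    """Yield fixed-size chunks so each run against the API stays small."""
    for start in range(0, len(numbers), size):
        yield numbers[start:start + size]
```

Each batch can then be joined with commas and passed to the CLI, with a pause between runs.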
### snscrape — multi-platform social network scraper
346★ Python. **Personas: oracle, frodo, ghost.**
Twitter (deprecated), Mastodon, Telegram, Reddit, Facebook, VK, Instagram,
Weibo, etc. Bellingcat maintains a fork; many scrapers broke after
platform-side changes, so check repo status before relying on it.
```bash
pip install snscrape
snscrape twitter-user elonmusk
# Global options such as --max-results go before the scraper name
snscrape --max-results 100 telegram-channel durov
```
### vk-url-scraper — VKontakte (Russian social) scraping
53★ Python. **Personas: oracle, ghost, frodo (russia).**
```bash
pip install vk-url-scraper
vk-url-scraper --help
```
Library API also available. Useful for VK posts, photos, geotagged
content, group enumeration.
### whisperbox-transcribe — Whisper audio/video transcription API
67★ Python. **Personas: scribe, herald, oracle.**
Deploy Whisper as a service. Drop a video/audio URL, get transcript +
translation. Useful when an investigation accumulates hours of foreign-
language broadcast / Telegram voice notes.
```bash
git clone https://github.com/bellingcat/whisperbox-transcribe
cd whisperbox-transcribe
docker compose up -d
curl -X POST -F "url=https://..." http://localhost:8000/jobs
```
### EDGAR — SEC corporate-data Python lib
203★ Python. **Personas: ledger, frodo.**
Programmatic interface to SEC EDGAR — public filings, financials, ownership.
```bash
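pip install edgar-tool
edgar-tool --help
# Full-text search of filings; subcommand per the project README, confirm with --help
edgar-tool text_search "beneficial owner" --output results.csv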
```
Used for sanction-screening, insider trading patterns, beneficial-ownership
chains, ESG.
## Geolocation toolbox
| Repo | ★ | Lang | Use | Personas |
| --------------------------------- | ---- | ---------- | ---------------------------------------------------------------- | ---------------------- |
| `ShadowFinder` | 570 | Python | Map locations where a shadow of given length could occur at date/time | oracle, frodo, centurion |
| `instagram-location-search` | 679 | Python | Find Instagram location IDs near (lat, lon) | oracle, frodo |
| `osm-search` | 207 | Vue | OpenStreetMap proximity search UI | oracle, frodo |
| `geoclustering` | 45 | Python | Cluster a list of (lat,lon) points; CLI | oracle, frodo, marshal |
| `search-grid-generator` | 13 | Vue | Quickly generate KML search grids for area-of-interest mapping | oracle, marshal |
| `ColourHighlighter` | 4 | TypeScript | WebGL color filters / LUTs for screen-share geolocation | oracle (geo-analyst) |
| `rgb-viz` | 4 | JavaScript | Interactive viz of an image's R/G/B channels | oracle (forensics) |
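Clustering (lat, lon) points comes down to a distance function plus a grouping rule. A minimal sketch, assuming great-circle (haversine) distance and a simple single-linkage rule with a fixed radius. This illustrates the idea only and is not the algorithm `geoclustering` itself implements; `cluster_points` and `max_km` are hypothetical names.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_points(points: list[tuple[float, float]], max_km: float) -> list[list[tuple[float, float]]]:
    """Single-linkage grouping: a point joins every cluster it lies within max_km of."""
    clusters: list[list[tuple[float, float]]] = []
    for p in points:
        touching, rest = [], []
        for cluster in clusters:
            near = any(haversine_km(p, q) <= max_km for q in cluster)
            (touching if near else rest).append(cluster)
        # Merge the new point with all clusters it touches into one cluster.
        merged = [p] + [q for cluster in touching for q in cluster]
        clusters = rest + [merged]
    return clusters
```

The pairwise scan is quadratic, which is fine for a few thousand sightings; larger sets would want a spatial index.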
```bash
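# Package names and arguments per each project's README; confirm with --help
pip install shadowfinder
shadowfinder find 10 5 2024-03-15 14:00:00   # object height, shadow length, date, time (UTC)
pip install instagram-location-search
# IG_COOKIE is a placeholder for your own logged-in session cookie
instagram-location-search --cookie "$IG_COOKIE" --lat 40.7128 --lng -74.0060 --map map.html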
```
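The geometry behind shadow-based geolocation is a single right triangle: the ratio of object height to shadow length fixes the sun's elevation angle, and a tool can then map where the sun stood at that elevation at the given date and time. A minimal sketch of the trigonometry step only, assuming flat ground and a vertical object; the function names are hypothetical and this is not ShadowFinder's code.

```python
import math


def sun_elevation_deg(object_height: float, shadow_length: float) -> float:
    """Sun elevation angle (degrees) implied by an object and its shadow."""
    if object_height <= 0 or shadow_length <= 0:
        raise ValueError("height and shadow length must be positive")
    return math.degrees(math.atan2(object_height, shadow_length))


def expected_shadow_length(object_height: float, elevation_deg: float) -> float:
    """Shadow length cast by an object when the sun is at the given elevation."""
    if not 0 < elevation_deg < 90:
        raise ValueError("elevation must be strictly between 0 and 90 degrees")
    return object_height / math.tan(math.radians(elevation_deg))
```

Only the ratio matters, so height and shadow can be measured in pixels from the image as long as both use the same unit and the view is not strongly foreshortened.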
## Satellite / Earth Engine
| Repo | ★ | Lang | Use | Personas |
| ------------------------------------------ | --- | ---------- | --------------------------------------------------------- | --------------------- |
| `sar-interference-tracker` | 556 | JavaScript | GEE script to detect SAR satellite radar interference | warden, marshal, centurion |
| `cloud-free-subregion` | 59 | JavaScript | GEE app — find cloud-free Sentinel-2 imagery for an AOI | oracle, marshal |
| `Multispectral-Satellite-Imagery-Explorer` | 13 | JavaScript | GEE app to explore Landsat-8 multispectral bands | oracle, marshal |
| `umbra-open-data-tracker` | 33 | Python | Monitor Umbra SAR open-data catalogue, emit KML coverage | warden, marshal |
| `ee_forest_area_tracker` | 4 | (?) | Forest-area tracking via Earth Engine | oracle, scholar |
GEE scripts: copy-paste into <https://code.earthengine.google.com/>.
## Social-media scrapers (live status varies — verify before relying)
| Repo | ★ | Lang | Platform / use | Personas |
| --------------------------------- | --- | ---------- | ------------------------------------------------------- | ---------------- |
| `tiktok-hashtag-analysis` | 358 | Python | Analyze hashtag co-occurrence + post stats | oracle, herald |
| `tiktok-timestamp` | 58 | HTML | Tiny client-side TikTok video timestamp retriever | oracle |
| `polyphemus` | 18 | Python | Odysee (alt-tech video) scraper | oracle, ghost |
| `gogettr` | 13 | Python | GETTR public API client for archival | oracle, ghost |
| `facebook-downloader` | 40 | Python | Public FB video downloader | oracle |
| `reddit-post-scraping-tool` | 92 | Python | Subreddit + keyword → top posts containing keyword | oracle, ghost |
| `youtube-comment-scraper` | 27 | Python | Scrape YT comments, find users commenting on N videos | oracle |
| `cisticola` | 20 | Python | Coordinator for multiple scrapers + DB layer | oracle (heavy) |
## People search / aliases
| Repo | ★ | Lang | Use | Personas |
| --------------------- | --- | ---------- | ---------------------------------------------------- | -------------- |
| `name-variant-search` | 50 | JavaScript | Generate search variations of a human name | oracle, wraith |
| `alias-generator` | 22 | JavaScript | Node module — likely aliases for a given name | oracle, wraith |
```bash
npm install -g @bellingcat/alias-generator
alias-generator "John Smith" # produces J. Smith, Smith John, etc.
```
Use both in tandem: feed the name through `name-variant-search` for
cultural/transliteration variants, then pipe each variant through your
people-search stack (Sherlock, WhatsMyName, etc.).
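To make the variant step concrete, here is a minimal sketch of mechanical name permutations of the kind mentioned above ("J. Smith", "Smith John"). It assumes a simple "first ... last" Western name order and ignores transliteration, so it is an illustration and not the logic of either repo; `name_variants` is a hypothetical helper.

```python
def name_variants(full_name: str) -> list[str]:
    """Return ordered, de-duplicated search variants for a 'First [Middle] Last' name."""
    parts = full_name.split()
    if not parts:
        return []
    if len(parts) == 1:
        return [parts[0]]
    first, last = parts[0], parts[-1]
    candidates = [
        f"{first} {last}",            # John Smith
        f"{last} {first}",            # Smith John
        f"{last}, {first}",           # Smith, John
        f"{first[0]}. {last}",        # J. Smith
        f"{first[0]}{last}".lower(),  # jsmith (common handle shape)
        f"{first}.{last}".lower(),    # john.smith
        f"{first}{last}".lower(),     # johnsmith
    ]
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys(candidates))
```

Each returned string can be fed to a username or people-search tool in turn.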
## Telegram-specific
| Repo | ★ | Lang | Use | Personas |
| ------------------------------- | --- | ------ | ----------------------------------------------------- | ------------------------ |
| `telegram-phone-number-checker` | 1695| Python | Phone → Telegram presence check | oracle, wraith, frodo |
| `telegram-group-joiner` | 55 | (web) | Auto-join public/private Telegram groups | oracle, ghost |
| `gesara-entity-viz` | 4 | Python | Entity viz over a GESARA-conspiracy Telegram corpus | ghost, herald |
Pair with this repo's own `telegram` skill (custom WAHA scraper) for
operational-scale Telegram archival.
## Companies / finance
| Repo | ★ | Lang | Use | Personas |
| ----------- | --- | ------ | ---------------------------------------------------------------------- | -------------- |
| `EDGAR` | 203 | Python | SEC EDGAR Python lib (filings, ownership, financials) | ledger, frodo |
| `sugartrail`| 76 | HTML | UK Companies House network viz — companies, officers, addresses | ledger |
`sugartrail` is browser-based; deploy locally for big networks. Pair with
OpenCorporates / OpenSanctions in `references/companies-and-finance.md`.
## Aircraft / transport intel
| Repo | ★ | Lang | Use | Personas |
| -------------- | -- | ---- | ---------------------------------------------------------------- | --------------------- |
| `adsb-history` | 72 | Vue | Collect & query ADS-B aircraft history by region/altitude/type | warden, echo, frodo |
```bash
git clone https://github.com/bellingcat/adsb-history
cd adsb-history
docker compose up -d
# Then visit http://localhost:5173 for the Vue front-end
```
Backfill investigations on private-jet movements, military transport
patterns, surveillance flights.
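Querying by region and altitude reduces to a bounding-box and ceiling filter over position records. A minimal sketch, assuming each record carries an ICAO hex address, latitude, longitude, and altitude in feet. The `Position` fields and function names are hypothetical and do not reflect the adsb-history schema; the box must not cross the antimeridian.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Position:
    icao: str    # 24-bit ICAO address as hex text
    lat: float   # degrees
    lon: float   # degrees
    alt_ft: int  # altitude in feet


def in_region(pos: Position, south: float, west: float, north: float, east: float,
              max_alt_ft: Optional[int] = None) -> bool:
    """True if the position lies inside the box and at or below the altitude ceiling."""
    inside = south <= pos.lat <= north and west <= pos.lon <= east
    low_enough = max_alt_ft is None or pos.alt_ft <= max_alt_ft
    return inside and low_enough


def aircraft_seen(positions: Iterable[Position], **region) -> set[str]:
    """Distinct aircraft with at least one matching position."""
    return {p.icao for p in positions if in_region(p, **region)}
```

A low `max_alt_ft` is a quick way to separate aircraft operating locally from overflying traffic.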
## Image / media triage
| Repo | ★ | Lang | Use | Personas |
| --------------------- | -- | ---------------- | -------------------------------------------------------------------- | --------------------- |
| `smart-image-sorter` | 62 | Jupyter Notebook | Zero-shot image classification via HuggingFace open-source models | oracle, sentinel |
Use case: triage thousands of OSINT-collected images by content (e.g.
"weapon", "uniformed personnel", "vehicle"), then deep-dive the hits.
## Web-history forensics
| Repo | ★ | Lang | Use | Personas |
| -------------------------- | --- | -------- | ------------------------------------------------------------ | ---------------- |
| `wayback-google-analytics` | 234 | Python | Scrape current AND historic Google Analytics tags from sites | oracle, sentinel |
| `uniform-timezone` | 33 | Browser | Standardize timestamps across social-media UIs | scribe, oracle |
`wayback-google-analytics` is gold for de-anonymizing networks of related
sites: GA tag IDs reused across domains often link sister-sites that
hide ownership.
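The linking step can be sketched in a few lines: pull tag IDs out of each page's HTML, then keep the IDs that appear on more than one domain. This is a minimal illustration under stated assumptions, not the tool's implementation. The regex covers common `UA-`, `G-`, and `GTM-` shapes only, and Universal Analytics IDs are compared by account (the property suffix `-1`, `-2` is dropped) since sister sites often sit under one account. All function names are hypothetical.

```python
import re
from collections import defaultdict

# UA-<account>-<property> (Universal Analytics), G-... (GA4), GTM-... (Tag Manager)
TAG_PATTERN = re.compile(
    r"\b(UA-\d{4,10}(?:-\d{1,4})?|G-[A-Z0-9]{6,12}|GTM-[A-Z0-9]{4,8})\b"
)


def account_key(tag: str) -> str:
    """Drop the property suffix from UA IDs so UA-123456-1 and UA-123456-2 compare equal."""
    match = re.fullmatch(r"(UA-\d+)-\d+", tag)
    return match.group(1) if match else tag


def extract_tags(html: str) -> set[str]:
    """All normalised tag IDs found in one page's source."""
    return {account_key(tag) for tag in TAG_PATTERN.findall(html)}


def shared_tags(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each tag ID to the domains embedding it, keeping IDs seen on 2+ domains."""
    by_tag: dict[str, set[str]] = defaultdict(set)
    for domain, html in pages.items():
        for tag in extract_tags(html):
            by_tag[tag].add(domain)
    return {tag: domains for tag, domains in by_tag.items() if len(domains) > 1}
```

A shared ID is a lead, not proof: agencies and site templates can reuse one tag across unrelated clients.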
## Conflict / civilian-harm tracking
| Repo | ★ | Lang | Use | Personas |
| --------------------------------- | --- | ---------- | ------------------------------------------------------------------ | -------------------- |
| `ukraine-timemap` | 287 | JavaScript | TimeMap instance for Civilian Harm in Ukraine | centurion, frodo, marshal |
| `iran-conflict-damage-proxy-map` | 6 | JavaScript | Iran conflict damage tracking | centurion, frodo, marshal |
| `vis-tj-kg-map-2022` | 3 | (?) | Tajikistan-Kyrgyzstan border-clash interactive map | centurion, frodo |
These are public TimeMap front-ends backing Bellingcat's published
investigations. Reference architectures for building your own conflict
trackers — fork + adapt.
## Specialized / research methodologies
| Repo | ★ | Lang | Use | Personas |
| --------------------------------- | --- | ---------------- | ---------------------------------------------------------------- | -------------------- |
| `RS4OSINT` | 45 | TeX | Guide to Remote Sensing for OSINT (PDF + LaTeX source) | oracle, marshal, scholar |
| `open-source-research-notebooks` | 298 | Jupyter Notebook | Tutorial notebooks for command-line + code OSINT investigations | scholar, oracle |
| `open-questions` | 360 | Jupyter Notebook | Difficult research projects waiting for contributors | scholar, all-osint |
| `o9a-product-scripts` | 8 | Python | Scripts from Order of Nine Angles investigation | wraith (HUMINT) |
| `quitobaquito` | 14 | Python | Hydrology change methodology with GEE | scholar, oracle |
| `twitter-geocode-searches` | 26 | Python | Methodology for geofenced Twitter search | oracle |
| `coronavirus-aid-data` | 5 | Python | Data for Covid-19 relief-fund analysis article | ledger |
| `who-killed-abelardo` | 4 | (web) | Audio map visualization (single-investigation viz) | wraith, herald |
| `avoc` | 59 | CSS | 2024 Tech Fellowship working repo | (browse before reach)|
`open-source-research-notebooks` is the best entry point for a researcher
new to Bellingcat methodology — it teaches the toolkit-via-Jupyter workflow.
## Council / government records
| Repo | ★ | Lang | Use | Personas |
| ---------------- | --- | ------ | -------------------------------------------------------------------- | ------------- |
| `CouncilSearcher`| 13 | Python | Find verbatim quotes from UK + Ireland council meetings | scribe, frodo |
Niche but powerful for any UK municipal-level investigation. Drop a
search term, get back transcript-grounded matches.
## Infrastructure / supporting
| Repo | ★ | Notes |
| ------------------------------- | -- | ---------------------------------------------- |
| `toolkit` | 539| The GitBook / curated-tools repo (this skill's source) |
| `hackathon-submission-template` | 11 | Template for Bellingcat Global Hackathon |
| `bcat-discord-bot` | 5 | Bellingcat's own Discord bot |
| `challenge-framework` | 5 | Static-site challenge framework |
| `google-apps-script` | 31 | Handy GAS snippets |
| `datasheet-server` | 32 | CSV → dynamic API server |
| `4-year-anniversary-network` | 2 | Anniversary visualization |
## Persona affinity quick-pivot
| Persona | Top repos to know |
| ----------- | ---------------------------------------------------------------------- |
| **Oracle** | octosuite, telegram-phone-number-checker, auto-archiver, snscrape, ShadowFinder, instagram-location-search, osm-search, name-variant-search, alias-generator, smart-image-sorter, wayback-google-analytics |
| **Frodo** | telegram-phone-number-checker, vk-url-scraper, ShadowFinder, instagram-location-search, EDGAR, ukraine-timemap, snscrape |
| **Wraith** | telegram-phone-number-checker, name-variant-search, alias-generator, o9a-product-scripts |
| **Sentinel**| octosuite, wayback-google-analytics, auto-archiver, smart-image-sorter |
| **Scribe** | auto-archiver, whisperbox-transcribe, uniform-timezone, CouncilSearcher |
| **Ledger** | EDGAR, sugartrail, coronavirus-aid-data |
| **Centurion** | sar-interference-tracker, ukraine-timemap, iran-conflict-damage-proxy-map, search-grid-generator, geoclustering |
| **Marshal** | sar-interference-tracker, umbra-open-data-tracker, cloud-free-subregion, search-grid-generator, ukraine-timemap |
| **Warden** | sar-interference-tracker, umbra-open-data-tracker, adsb-history |
| **Echo** | adsb-history (movement signatures), snscrape (signals correlation) |
| **Herald** | tiktok-hashtag-analysis, gesara-entity-viz, auto-archiver, whisperbox-transcribe |
| **Ghost** | gesara-entity-viz, polyphemus, gogettr, snscrape, telegram-group-joiner |
| **Scholar** | open-source-research-notebooks, RS4OSINT, open-questions, quitobaquito |
## Install patterns
Most Python repos follow:
```bash
pip install <repo-name>
<repo-name> --help
# OR
git clone https://github.com/bellingcat/<repo-name>
cd <repo-name> && pip install -e .
```
Earth Engine repos: open <https://code.earthengine.google.com/> and paste
the script. Some require enabling specific imagery collections.
Vue/JavaScript apps: `npm install && npm run dev` for local use, or
`docker compose up` if a `docker-compose.yml` is present.
## Update freshness
Run periodically:
```bash
curl -s "https://api.github.com/orgs/bellingcat/repos?per_page=100&sort=updated" \
| jq -r '.[] | select(.fork==false and .archived==false)
| "\(.stargazers_count)\t\(.language // "?")\t\(.name)\t\(.description // "")"' \
| sort -rn -k1 \
| head -50
```
Watch the org page directly: <https://github.com/orgs/bellingcat/repositories?type=all>.
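The same filter is easy to run from Python when the result feeds a regeneration script. A minimal sketch, assuming the input is the already-parsed JSON list returned by the endpoint above; the field names (`fork`, `archived`, `stargazers_count`, `language`, `name`) are the ones the jq filter uses, and `active_repos` is a hypothetical helper.

```python
def active_repos(repos: list[dict]) -> list[tuple[int, str, str]]:
    """Return (stars, language, name) for non-fork, non-archived repos, most-starred first."""
    rows = [
        (repo["stargazers_count"], repo.get("language") or "?", repo["name"])
        for repo in repos
        if not repo["fork"] and not repo["archived"]
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)
```

With `per_page=100` a single request covers the org only while it has at most 100 repos; beyond that the API paginates and the script would need to follow the next page.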
## Pitfalls
- Bellingcat's social-media scrapers (`snscrape`, `vk-url-scraper`, etc.)
break frequently after platform API changes. Always check the installed
version (`pip show <package>`) and read recent issues before relying on
output for an investigation.
- `auto-archiver` enrichers (Wayback, video DL, Telegram) each have their
own auth + rate limits. The full pipeline is heavy — start with one
enricher to validate flow before scaling.
- `telegram-phone-number-checker` requires a Telegram developer account
(`api_id`/`api_hash`). Excessive use will rate-limit or ban the account
used; rotate.
- GEE scripts need a Google account with Earth Engine access (free for
research/non-profit). Quotas apply on heavy queries.
- Several repos are unlicensed or have ambiguous licenses — for derivative
work check the LICENSE file. Bellingcat's official repos are
predominantly MIT.
- Archived repos (12 of 62) are NOT included here — those are read-only
historical references, not actively maintained.