diff --git a/personas/_shared/community-skills/bellingcat-osint-toolkit/SKILL.md b/personas/_shared/community-skills/bellingcat-osint-toolkit/SKILL.md index e9357ba..1c3aa16 100644 --- a/personas/_shared/community-skills/bellingcat-osint-toolkit/SKILL.md +++ b/personas/_shared/community-skills/bellingcat-osint-toolkit/SKILL.md @@ -30,18 +30,22 @@ bellingcat-osint-toolkit/ │ ├── refresh.sh pull fresh CSV from upstream nightly release │ └── regenerate-references.py rebuild references/*.md tables from CSV └── references/ - ├── archiving.md 8 tools - ├── companies-and-finance.md 26 tools - ├── conflict.md 6 tools - ├── data-org-and-analysis.md 11 tools - ├── environment-and-wildlife.md 24 tools - ├── geolocation.md 9 tools - ├── image-video.md 35 tools - ├── maps-and-satellites.md 83 tools - ├── people.md 33 tools - ├── social-media.md 63 tools - ├── transport.md 27 tools - └── websites.md 17 tools + ├── archiving.md 8 tools (curated externals) + ├── companies-and-finance.md 26 tools (curated externals) + ├── conflict.md 6 tools (curated externals) + ├── data-org-and-analysis.md 11 tools (curated externals) + ├── environment-and-wildlife.md 24 tools (curated externals) + ├── geolocation.md 9 tools (curated externals) + ├── image-video.md 35 tools (curated externals) + ├── maps-and-satellites.md 83 tools (curated externals) + ├── people.md 33 tools (curated externals) + ├── social-media.md 63 tools (curated externals) + ├── transport.md 27 tools (curated externals) + ├── websites.md 17 tools (curated externals) + └── bellingcat-own-repos.md 46 active repos Bellingcat ships + (octosuite, auto-archiver, EDGAR, + ShadowFinder, telegram-phone-checker, + sar-interference-tracker, etc.) 
``` For ad-hoc queries the agent can grep the CSV directly: @@ -73,6 +77,7 @@ bash scripts/refresh.sh && python3 scripts/regenerate-references.py | Wildlife trafficking, environmental crime, terrain | `references/environment-and-wildlife.md` | 24 | | Preserve a webpage, video, social post | `references/archiving.md` | 8 | | Clean / merge / publish data; build the investigation file | `references/data-org-and-analysis.md` | 11 | +| Bellingcat's OWN open-source tools (octosuite, auto-archiver, EDGAR, ShadowFinder, etc.) | `references/bellingcat-own-repos.md` | 46 active repos | ## Persona affinity diff --git a/personas/_shared/community-skills/bellingcat-osint-toolkit/references/bellingcat-own-repos.md b/personas/_shared/community-skills/bellingcat-osint-toolkit/references/bellingcat-own-repos.md new file mode 100644 index 0000000..3c1d260 --- /dev/null +++ b/personas/_shared/community-skills/bellingcat-osint-toolkit/references/bellingcat-own-repos.md @@ -0,0 +1,374 @@ +# Bellingcat Toolkit — Own Repos (Tools They Built) + +The `bellingcat-osint-toolkit` skill's main catalog (`data/all-tools.csv` + 12 +category refs) lists tools Bellingcat **curates**. This reference covers +tools Bellingcat **built and ships** as code — 46 active non-fork repos +across the GitHub org, sorted by use case. + +> Source: +> Updated: 2026-05-02. Re-pull with +> `curl -s "https://api.github.com/orgs/bellingcat/repos?per_page=100"`. + +## Power tools — install and use + +### auto-archiver — bulk web/social-media preservation +1073★ Python. **Personas: scribe, oracle, herald, sentinel.** + +Multi-source archiver: pulls URLs from CSV / Google Sheets / CLI, archives +videos, images, social-media posts, webpages, and writes status back to +the source spreadsheet. Storage backends: local, S3, Google Drive. 
+ +```bash +# Pip +pip install auto-archiver +auto-archiver --help + +# Docker (preferred for heavy enrichers) +docker pull bellingcat/auto-archiver +docker run -it --rm -v secrets:/app/secrets bellingcat/auto-archiver \ + --config secrets/orchestration.yaml +``` + +Companions: +- `auto-archiver-api` (13★) — REST API to manage users / sheets / URLs and dispatch workers +- `auto-archiver-extension` (3★) — browser extension front-end +- `auto-archiver-setup-tool` (11★) — Vue front-end for the API + +Docs: . + +When to reach for it: an investigation needs durable preservation of +dozens-to-thousands of URLs (Telegram channels going dark, breaking-news +videos before takedown, court-evidence chain). + +### octosuite — GitHub OSINT CLI + Python lib +1892★ Python. **Personas: oracle, sentinel, neo.** + +Terminal toolkit for GitHub data analysis. CLI + interactive TUI + Python +library all from the same package. + +```bash +pip install octosuite + +# CLI +octosuite user torvalds # profile +octosuite user torvalds --repos --per-page 50 # all repos +octosuite user torvalds --followers --json +octosuite repo torvalds/linux --commits +octosuite repo torvalds/linux --stargazers --export ./data +octosuite org github --members --json +octosuite search "supply chain attack" --repos +octosuite -t # interactive TUI + +# Library +import octosuite +user = octosuite.User("torvalds") +exists, profile = user.exists() +if exists: + repos = user.repos(page=1, per_page=100) +``` + +When to reach for it: profiling a threat-actor's GitHub footprint, finding +unpublished commits in an org, supply-chain audit on a maintainer, +triangulating an alias across GH events. + +### telegram-phone-number-checker — phone → Telegram correlation +1695★ Python. **Personas: oracle, wraith, frodo.** + +Given a phone number (or batch), check whether it's bound to a Telegram +account. Pivot for people-search on +country-code-known leads. 
+ +```bash +pip install telegram-phone-number-checker +# flags as documented in the project README; confirm with --help +telegram-phone-number-checker --phone-numbers +12025550101 +telegram-phone-number-checker --phone-numbers-file numbers.txt +``` + +Requires Telegram API credentials (api_id + api_hash from +). Rate-limited; query sparingly to avoid bans. + +### snscrape — multi-platform social network scraper +346★ Python. **Personas: oracle, frodo, ghost.** + +Twitter (deprecated), Mastodon, Telegram, Reddit, Facebook, VK, Instagram, +WeChat, etc. Bellingcat maintains a fork — many platforms broke after +upstream changes; check repo status before relying on it. + +```bash +pip install snscrape +snscrape twitter-user elonmusk +# --max-results is a global option and goes before the scraper name +snscrape --max-results 100 telegram-channel durov +``` + +### vk-url-scraper — VKontakte (Russian social) scraping +53★ Python. **Personas: oracle, ghost, frodo (russia).** + +```bash +pip install vk-url-scraper +vk-url-scraper --help +``` + +Library API also available. Useful for VK posts, photos, geotagged +content, group enumeration. + +### whisperbox-transcribe — Whisper audio/video transcription API +67★ Python. **Personas: scribe, herald, oracle.** + +Deploy Whisper as a service. Drop a video/audio URL, get transcript + +translation. Useful when an investigation accumulates hours of foreign- +language broadcast / Telegram voice notes. + +```bash +git clone https://github.com/bellingcat/whisperbox-transcribe +cd whisperbox-transcribe +docker compose up -d +curl -X POST -F "url=https://..." http://localhost:8000/jobs +``` + +### EDGAR — SEC corporate-data Python lib +203★ Python. **Personas: ledger, frodo.** + +Programmatic interface to SEC EDGAR — public filings, financials, ownership. + +```bash +pip install edgar-tool +# the installed command is edgar-tool; confirm subcommands and flags with --help +edgar-tool --help +edgar-tool text_search "Apple Inc" --start_date "2024-01-01" --end_date "2024-12-31" --output results.csv +``` + +Used for sanction-screening, insider trading patterns, beneficial-ownership +chains, ESG.
+ +## Geolocation toolbox + +| Repo | ★ | Lang | Use | Personas | +| --------------------------------- | ---- | ---------- | ---------------------------------------------------------------- | ---------------------- | +| `ShadowFinder` | 570 | Python | Map locations where a shadow of given length could occur at date/time | oracle, frodo, centurion | +| `instagram-location-search` | 679 | Python | Find Instagram location IDs near (lat, lon) | oracle, frodo | +| `osm-search` | 207 | Vue | OpenStreetMap proximity search UI | oracle, frodo | +| `geoclustering` | 45 | Python | Cluster a list of (lat,lon) points; CLI | oracle, frodo, marshal | +| `search-grid-generator` | 13 | Vue | Quickly generate KML search grids for area-of-interest mapping | oracle, marshal | +| `ColourHighlighter` | 4 | TypeScript | WebGL color filters / LUTs for screen-share geolocation | oracle (geo-analyst) | +| `rgb-viz` | 4 | JavaScript | Interactive viz of an image's R/G/B channels | oracle (forensics) | + +```bash +pip install bellingcat-shadowfinder +shadowfinder 1.5 --datetime "2024-03-15T14:00:00" --output map.html + +pip install bellingcat-instagram-location-search +ig-location-search --lat 40.7128 --lon -74.0060 +``` + +## Satellite / Earth Engine + +| Repo | ★ | Lang | Use | Personas | +| ------------------------------------------ | --- | ---------- | --------------------------------------------------------- | --------------------- | +| `sar-interference-tracker` | 556 | JavaScript | GEE script to detect SAR satellite radar interference | warden, marshal, centurion | +| `cloud-free-subregion` | 59 | JavaScript | GEE app — find cloud-free Sentinel-2 imagery for an AOI | oracle, marshal | +| `Multispectral-Satellite-Imagery-Explorer` | 13 | JavaScript | GEE app to explore Landsat-8 multispectral bands | oracle, marshal | +| `umbra-open-data-tracker` | 33 | Python | Monitor Umbra SAR open-data catalogue, emit KML coverage | warden, marshal | +| `ee_forest_area_tracker` | 4 | (?) 
| Forest-area tracking via Earth Engine | oracle, scholar | + +GEE scripts: copy-paste into . + +## Social-media scrapers (live status varies — verify before relying) + +| Repo | ★ | Lang | Platform / use | Personas | +| --------------------------------- | --- | ---------- | ------------------------------------------------------- | ---------------- | +| `tiktok-hashtag-analysis` | 358 | Python | Analyze hashtag co-occurrence + post stats | oracle, herald | +| `tiktok-timestamp` | 58 | HTML | Tiny client-side TikTok video timestamp retriever | oracle | +| `polyphemus` | 18 | Python | Odysee (alt-tech video) scraper | oracle, ghost | +| `gogettr` | 13 | Python | GETTR public API client for archival | oracle, ghost | +| `facebook-downloader` | 40 | Python | Public FB video downloader | oracle | +| `reddit-post-scraping-tool` | 92 | Python | Subreddit + keyword → top posts containing keyword | oracle, ghost | +| `youtube-comment-scraper` | 27 | Python | Scrape YT comments, find users commenting on N videos | oracle | +| `cisticola` | 20 | Python | Coordinator for multiple scrapers + DB layer | oracle (heavy) | + +## People search / aliases + +| Repo | ★ | Lang | Use | Personas | +| --------------------- | --- | ---------- | ---------------------------------------------------- | -------------- | +| `name-variant-search` | 50 | JavaScript | Generate search variations of a human name | oracle, wraith | +| `alias-generator` | 22 | JavaScript | Node module — likely aliases for a given name | oracle, wraith | + +```bash +npm install -g @bellingcat/alias-generator +alias-generator "John Smith" # produces J. Smith, Smith John, etc. +``` + +Use both in tandem: feed the name through `name-variant-search` for +cultural/transliteration variants, then pipe each variant through your +people-search stack (Sherlock, WhatsMyName, etc.). 
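The tandem workflow above amounts to expanding one name into several query strings before searching. The sketch below illustrates the idea in Python. `simple_name_variants` is a hypothetical helper written for this note, not the algorithm used by `name-variant-search` or `alias-generator`, and it covers only basic reorderings and initial forms.

```python
def simple_name_variants(full_name: str) -> list[str]:
    """Return a few common orderings of a 'First [Middle] Last' name.

    Hypothetical illustration only; not the alias-generator algorithm.
    """
    parts = full_name.split()
    if not parts:
        return []
    if len(parts) == 1:
        # A single token has nothing to reorder or abbreviate.
        return [parts[0]]

    first, last = parts[0], parts[-1]
    candidates = [
        " ".join(parts),         # John Smith
        f"{last} {first}",       # Smith John
        f"{last}, {first}",      # Smith, John
        f"{first[0]}. {last}",   # J. Smith
        f"{first} {last[0]}.",   # John S.
    ]

    # Deduplicate while preserving the order above.
    seen: set[str] = set()
    variants: list[str] = []
    for candidate in candidates:
        if candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants


if __name__ == "__main__":
    for variant in simple_name_variants("John Smith"):
        print(variant)
```

Each returned string can then be passed to whichever people-search tool is in use. Transliteration and cultural variants are out of scope for this sketch and are better left to the dedicated repos.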
+ +## Telegram-specific + +| Repo | ★ | Lang | Use | Personas | +| ------------------------------- | --- | ------ | ----------------------------------------------------- | ------------------------ | +| `telegram-phone-number-checker` | 1695| Python | Phone → Telegram presence check | oracle, wraith, frodo | +| `telegram-group-joiner` | 55 | (web) | Auto-join public/private Telegram groups | oracle, ghost | +| `gesara-entity-viz` | 4 | Python | Entity viz over a GESARA-conspiracy Telegram corpus | ghost, herald | + +Pair with this repo's own `telegram` skill (custom WAHA scraper) for +operational-scale Telegram archival. + +## Companies / finance + +| Repo | ★ | Lang | Use | Personas | +| ----------- | --- | ------ | ---------------------------------------------------------------------- | -------------- | +| `EDGAR` | 203 | Python | SEC EDGAR Python lib (filings, ownership, financials) | ledger, frodo | +| `sugartrail`| 76 | HTML | UK Companies House network viz — companies, officers, addresses | ledger | + +`sugartrail` is browser-based; deploy locally for big networks. Pair with +OpenCorporates / OpenSanctions in `references/companies-and-finance.md`. + +## Aircraft / transport intel + +| Repo | ★ | Lang | Use | Personas | +| -------------- | -- | ---- | ---------------------------------------------------------------- | --------------------- | +| `adsb-history` | 72 | Vue | Collect & query ADS-B aircraft history by region/altitude/type | warden, echo, frodo | + +```bash +git clone https://github.com/bellingcat/adsb-history +docker compose up -d +# Then visit http://localhost:5173 for the Vue front-end +``` + +Backfill investigations on private-jet movements, military transport +patterns, surveillance flights. 
+ +## Image / media triage + +| Repo | ★ | Lang | Use | Personas | +| --------------------- | -- | ---------------- | -------------------------------------------------------------------- | --------------------- | +| `smart-image-sorter` | 62 | Jupyter Notebook | Zero-shot image classification via HuggingFace open-source models | oracle, sentinel | + +Use case: triage thousands of OSINT-collected images by content (e.g. +"weapon", "uniformed personnel", "vehicle"), then deep-dive the hits. + +## Web-history forensics + +| Repo | ★ | Lang | Use | Personas | +| -------------------------- | --- | -------- | ------------------------------------------------------------ | ---------------- | +| `wayback-google-analytics` | 234 | Python | Scrape current AND historic Google Analytics tags from sites | oracle, sentinel | +| `uniform-timezone` | 33 | Browser | Standardize timestamps across social-media UIs | scribe, oracle | + +`wayback-google-analytics` is gold for de-anonymizing networks of related +sites: GA tag IDs reused across domains often link sister-sites that +hide ownership. + +## Conflict / civilian-harm tracking + +| Repo | ★ | Lang | Use | Personas | +| --------------------------------- | --- | ---------- | ------------------------------------------------------------------ | -------------------- | +| `ukraine-timemap` | 287 | JavaScript | TimeMap instance for Civilian Harm in Ukraine | centurion, frodo, marshal | +| `iran-conflict-damage-proxy-map` | 6 | JavaScript | Iran conflict damage tracking | centurion, frodo, marshal | +| `vis-tj-kg-map-2022` | 3 | (?) | Tajikistan-Kyrgyzstan border-clash interactive map | centurion, frodo | + +These are public TimeMap front-ends backing Bellingcat's published +investigations. Reference architectures for building your own conflict +trackers — fork + adapt. 
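The Google Analytics pivot described under "Web-history forensics" above comes down to grouping domains by the tag IDs observed on them and keeping the IDs that appear on more than one domain. A minimal Python sketch of that grouping step follows. The `(domain, tag_id)` input shape and the name `shared_tag_clusters` are assumptions made for illustration; this is not the output format of `wayback-google-analytics`.

```python
from collections import defaultdict
from typing import Iterable


def shared_tag_clusters(
    observations: Iterable[tuple[str, str]],
) -> dict[str, list[str]]:
    """Map each analytics tag ID to the domains it was seen on.

    Only tags observed on two or more distinct domains are returned,
    since a tag seen on a single site links nothing.
    Hypothetical helper; the input shape is an assumption.
    """
    domains_by_tag: dict[str, set[str]] = defaultdict(set)
    for domain, tag_id in observations:
        # Normalise case and whitespace so 'ua-1-1' and 'UA-1-1' match.
        domains_by_tag[tag_id.strip().upper()].add(domain.strip().lower())

    return {
        tag: sorted(domains)
        for tag, domains in domains_by_tag.items()
        if len(domains) >= 2
    }


if __name__ == "__main__":
    sample = [
        ("site-a.example", "UA-1-1"),
        ("site-b.example", "ua-1-1"),
        ("site-c.example", "G-ABC"),
    ]
    print(shared_tag_clusters(sample))
```

A shared tag is a lead, not proof of common ownership: tags get copied along with site templates, so corroborate each cluster with other signals before reporting it.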
+ +## Specialized / research methodologies + +| Repo | ★ | Lang | Use | Personas | +| --------------------------------- | --- | ---------------- | ---------------------------------------------------------------- | -------------------- | +| `RS4OSINT` | 45 | TeX | Guide to Remote Sensing for OSINT (PDF + LaTeX source) | oracle, marshal, scholar | +| `open-source-research-notebooks` | 298 | Jupyter Notebook | Tutorial notebooks for command-line + code OSINT investigations | scholar, oracle | +| `open-questions` | 360 | Jupyter Notebook | Difficult research projects waiting for contributors | scholar, all-osint | +| `o9a-product-scripts` | 8 | Python | Scripts from Order of Nine Angles investigation | wraith (HUMINT) | +| `quitobaquito` | 14 | Python | Hydrology change methodology with GEE | scholar, oracle | +| `twitter-geocode-searches` | 26 | Python | Methodology for geofenced Twitter search | oracle | +| `coronavirus-aid-data` | 5 | Python | Data for Covid-19 relief-fund analysis article | ledger | +| `who-killed-abelardo` | 4 | (web) | Audio map visualization (single-investigation viz) | wraith, herald | +| `avoc` | 59 | CSS | 2024 Tech Fellowship working repo | (browse before reach)| + +`open-source-research-notebooks` is the best entry point for a researcher +new to Bellingcat methodology — it teaches the toolkit-via-Jupyter workflow. + +## Council / government records + +| Repo | ★ | Lang | Use | Personas | +| ---------------- | --- | ------ | -------------------------------------------------------------------- | ------------- | +| `CouncilSearcher`| 13 | Python | Find verbatim quotes from UK + Ireland council meetings | scribe, frodo | + +Niche but powerful for any UK municipal-level investigation. Drop a +search term, get back transcript-grounded matches. 
+ +## Infrastructure / supporting + +| Repo | ★ | Notes | +| ------------------------------- | -- | ---------------------------------------------- | +| `toolkit` | 539| The GitBook / curated-tools repo (this skill's source) | +| `hackathon-submission-template` | 11 | Template for Bellingcat Global Hackathon | +| `bcat-discord-bot` | 5 | Bellingcat's own Discord bot | +| `challenge-framework` | 5 | Static-site challenge framework | +| `google-apps-script` | 31 | Handy GAS snippets | +| `datasheet-server` | 32 | CSV → dynamic API server | +| `4-year-anniversary-network` | 2 | Anniversary visualization | + +## Persona affinity quick-pivot + +| Persona | Top repos to know | +| ----------- | ---------------------------------------------------------------------- | +| **Oracle** | octosuite, telegram-phone-number-checker, auto-archiver, snscrape, ShadowFinder, instagram-location-search, osm-search, name-variant-search, alias-generator, smart-image-sorter, wayback-google-analytics | +| **Frodo** | telegram-phone-number-checker, vk-url-scraper, ShadowFinder, instagram-location-search, EDGAR, ukraine-timemap, snscrape | +| **Wraith** | telegram-phone-number-checker, name-variant-search, alias-generator, o9a-product-scripts | +| **Sentinel**| octosuite, wayback-google-analytics, auto-archiver, smart-image-sorter | +| **Scribe** | auto-archiver, whisperbox-transcribe, uniform-timezone, CouncilSearcher | +| **Ledger** | EDGAR, sugartrail, coronavirus-aid-data | +| **Centurion** | sar-interference-tracker, ukraine-timemap, iran-conflict-damage-proxy-map, search-grid-generator, geoclustering | +| **Marshal** | sar-interference-tracker, umbra-open-data-tracker, cloud-free-subregion, search-grid-generator, ukraine-timemap | +| **Warden** | sar-interference-tracker, umbra-open-data-tracker, adsb-history | +| **Echo** | adsb-history (movement signatures), snscrape (signals correlation) | +| **Herald** | tiktok-hashtag-analysis, gesara-entity-viz, auto-archiver, whisperbox-transcribe 
| +| **Ghost** | gesara-entity-viz, polyphemus, gogettr, snscrape, telegram-group-joiner | +| **Scholar** | open-source-research-notebooks, RS4OSINT, open-questions, quitobaquito | + +## Install patterns + +Most Python repos follow: +```bash +pip install + --help +# OR +git clone https://github.com/bellingcat/ +cd && pip install -e . +``` + +Earth Engine repos: open and paste +the script. Some require enabling specific imagery collections. + +Vue/JavaScript apps: `npm install && npm run dev` for local; `docker +compose up` if a `docker-compose.yml` is present. + +## Update freshness + +Run periodically: + +```bash +curl -s "https://api.github.com/orgs/bellingcat/repos?per_page=100&sort=updated" \ + | jq -r '.[] | select(.fork==false and .archived==false) + | "\(.stargazers_count)\t\(.language // "?")\t\(.name)\t\(.description // "")"' \ + | sort -rn -k1 \ + | head -50 +``` + +Watch the org page directly: . + +## Pitfalls + +- Bellingcat's social-media scrapers (`snscrape`, `vk-url-scraper`, etc.) + break frequently after platform API changes. Always run `--version` and + read recent issues before relying on output for an investigation. +- `auto-archiver` enrichers (Wayback, video DL, Telegram) each have their + own auth + rate limits. The full pipeline is heavy — start with one + enricher to validate flow before scaling. +- `telegram-phone-number-checker` requires a Telegram developer account + (`api_id`/`api_hash`). Excessive use will rate-limit or ban the account + used; rotate. +- GEE scripts need a Google account with Earth Engine access (free for + research/non-profit). Quotas apply on heavy queries. +- Several repos are unlicensed or have ambiguous licenses — for derivative + work check the LICENSE file. Bellingcat's official repos are + predominantly MIT. +- Archived repos (12 of 62) are NOT included here — those are read-only + historical references, not actively maintained.
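
The `jq` filter under "Update freshness" can be mirrored in Python when the result needs to feed further scripting. This is a minimal sketch, assuming the API response has already been parsed into a list of dicts carrying the same fields the `jq` expression reads (`fork`, `archived`, `stargazers_count`, `language`, `name`, `description`). `active_repos` is a hypothetical helper name.

```python
import json
import sys


def active_repos(repos: list[dict], limit: int = 50) -> list[tuple[int, str, str, str]]:
    """Keep non-fork, non-archived repos, sorted by stars (descending).

    Returns (stars, language, name, description) tuples, matching the
    columns emitted by the jq expression in "Update freshness".
    """
    kept = [r for r in repos if not r.get("fork") and not r.get("archived")]
    kept.sort(key=lambda r: r.get("stargazers_count", 0), reverse=True)
    return [
        (
            r.get("stargazers_count", 0),
            r.get("language") or "?",     # same fallback as `// "?"` in jq
            r["name"],
            r.get("description") or "",
        )
        for r in kept[:limit]
    ]


if __name__ == "__main__":
    # Expects the JSON array from the GitHub API on stdin.
    for stars, language, name, description in active_repos(json.load(sys.stdin)):
        print(f"{stars}\t{language}\t{name}\t{description}")
```

Saved under a name of your choosing, the script can take the output of the same `curl` command on stdin in place of the `jq | sort | head` pipeline.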