Compare commits
2 Commits: 0b308ed8be...00dc88bf5f
| Author | SHA1 | Date |
|---|---|---|
| | 00dc88bf5f | |
| | 3126dadd19 | |
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 ## What This Is

-A platform-agnostic system prompt library for LLM agents. 29 personas across 10 domains, 111 variants, 59,712 words. Includes 796 shared skills, 58 brand design systems, 23 company agents, 168 AD/red team attack docs (InternalAllTheThings), and auto-install to 7 platforms (Claude, Antigravity, Gemini, OpenClaw, OpenCode, Paperclip, raw).
+A platform-agnostic system prompt library for LLM agents. 29 personas across 10 domains, 111 variants, 59,712 words. Includes 796 shared skills + 20 Feynman research-workflow skills, 58 brand design systems, 23 company agents, 168 AD/red team attack docs (InternalAllTheThings), and auto-install to 7 platforms (Claude, Antigravity, Gemini, OpenClaw, OpenCode, Paperclip, raw).

 ## Build

@@ -37,6 +37,7 @@ Optional: `cp config.example.yaml config.yaml` for dynamic variable injection. B
 - `skills/` — 42 shared skills from OpenClaw/kali-claw (SKILL.md + references per skill)
 - `paperclip-skills/` — 52 skills from paperclip-docs (ceo-advisor, coding-agent, security-review, etc.)
 - `community-skills/` — 703 skills from skills.sh marketplace (shadcn, vercel, olla, expo, etc.) (shadcn, vercel, marketing, expo, obsidian, impeccable, browser-use, stitch, firecrawl, github, neon, azure, etc.)
+- `feynman-skills/` — 20 research-workflow skills adapted from Feynman (deep-research, literature-review, paper-code-audit, peer-review, paper-writing, replication, source-comparison, summarize, alpha-research, eli5, autoresearch, docker/modal/runpod compute, session-log, session-search, jobs, watch, preview, contributing). Cross-platform: Claude Code + OpenCode. Subagent refs (`researcher`/`reviewer`/`writer`/`verifier`) mapped to host `Task`/`task` tool. Mapped to Scholar/Forge/Oracle personas.
 - `design-md/` — 58 brand DESIGN.md files (Stripe, Claude, Linear, Apple, Vercel, etc.)
 - `ui-ux-pro-max/` — BM25 search engine + 14 CSV data files (67 styles, 161 products, 57 fonts)
 - `paperclip-agents/` — 23 company agents (Odin/CEO, Thor/CTO, Freya/CMO, Frigg/COO + 19 team members)
@@ -57,10 +58,11 @@ Optional: `cp config.example.yaml config.yaml` for dynamic variable injection. B
|
||||
|
||||
```bash
|
||||
python3 build.py --install claude # 111 slash commands → ~/.claude/commands/
|
||||
python3 build.py --install claude-skills # shared skills → ~/.claude/skills/ (default: skills,paperclip-skills,feynman-skills)
|
||||
python3 build.py --install antigravity # personas → ~/.config/antigravity/personas/
|
||||
python3 build.py --install gemini # Gems → generated/_gems/
|
||||
python3 build.py --install openclaw # IDENTITY.md + 29 personas → generated/_openclaw/
|
||||
python3 build.py --install opencode # 29 agents + 1530 skills → ~/.config/opencode/{agents,skills}/
|
||||
python3 build.py --install opencode # 29 agents + skills → ~/.config/opencode/{agents,skills}/
|
||||
python3 build.py --install paperclip # 52 agents + 73 skills → generated/_paperclip/
|
||||
python3 build.py --install all # all platforms at once
|
||||
```
|
||||
|
||||
@@ -84,17 +84,17 @@ cat generated/sentinel/apt-profiling.yaml # YAML with metadata
|
||||
| **Specter** | Malware Analyst / Reverse Engineer | Cerrah | general, firmware | — |
|
||||
| **Bastion** | Blue Team / DFIR | Muhafız | general, forensics, threat-hunting, incident-commander | senior-secops, sys-guard-linux-remediator, pcap-analyzer |
|
||||
| **Vortex** | Network Ops / Traffic Analysis | Telsizci | general, cloud-ad | nmap-recon, pcap-analyzer, dns-networking |
|
||||
| **Sentinel** | CTI / Threat Intelligence | İzci | general, apt-profiling, mitre-attack, darknet, **c2-hunting** | seithar-intel, gov-cybersecurity, pentest-c2-operator |
|
||||
| **Sentinel** | CTI / Threat Intelligence | İzci | general, apt-profiling, mitre-attack, darknet, **c2-hunting** | seithar-intel, gov-cybersecurity, pentest-c2-operator, telegram |
|
||||
|
||||
### Intelligence (5 personas, 29 variants)
|
||||
|
||||
| Codename | Role | Hitap | Variants | Skills |
|
||||
|----------|------|-------|----------|--------|
|
||||
| **Frodo** | Strategic Intelligence Analyst | Müsteşar | general, middle-east, russia, iran, africa, china, pakistan, india, nato-alliance, nuclear, energy-geopolitics, turkey, salva | freshrss, freshrss-reader, seithar-intel, war-intel-monitor, news-crawler, dellight-intelligence-ops, dellight-strategic-intelligence |
|
||||
| **Oracle** | OSINT & Digital Intelligence | Kaşif | general, crypto-osint, **source-verification**, salva | osint-investigator, stealth-browser, deep-scraper, crawl-for-ai, image-ocr, mistral-ocr, freshrss +2 |
|
||||
| **Frodo** | Strategic Intelligence Analyst | Müsteşar | general, middle-east, russia, iran, africa, china, pakistan, india, nato-alliance, nuclear, energy-geopolitics, turkey, salva | freshrss, freshrss-reader, seithar-intel, war-intel-monitor, news-crawler, dellight-intelligence-ops, dellight-strategic-intelligence, telegram |
|
||||
| **Oracle** | OSINT & Digital Intelligence | Kaşif | general, crypto-osint, **source-verification**, salva | osint-investigator, stealth-browser, deep-scraper, crawl-for-ai, image-ocr, mistral-ocr, freshrss, telegram +2 |
|
||||
| **Ghost** | PSYOP & Information Warfare | Propagandist | general, cognitive-warfare, russian-info-war, salva | social-trust-manipulation-detector |
|
||||
| **Wraith** | HUMINT & Counter-Intelligence | Mahrem | general, source-validation, case-studies, salva | — |
|
||||
| **Echo** | SIGINT / COMINT / ELINT | Kulakçı | general, nsa-sigint, electronic-order-of-battle, salva | dellight-intelligence-ops |
|
||||
| **Echo** | SIGINT / COMINT / ELINT | Kulakçı | general, nsa-sigint, electronic-order-of-battle, salva | dellight-intelligence-ops, telegram |
|
||||
|
||||
### Military & Strategy (4 personas, 24 variants)
|
||||
|
||||
|
||||
223
build.py
@@ -258,9 +258,18 @@ def build_persona(
     # Inject mapped skills for this persona
     if skills_index:
         mapped_skills = []
-        for skill_name, skill_info in skills_index.get("skills", {}).items():
-            if persona_name in skill_info.get("personas", []):
-                mapped_skills.append(skill_name)
+        for bucket in (
+            "skills",
+            "paperclip_skills",
+            "community_skills",
+            "feynman_skills",
+        ):
+            for skill_name, skill_info in skills_index.get(bucket, {}).items():
+                if not isinstance(skill_info, dict):
+                    continue
+                if persona_name in skill_info.get("personas", []):
+                    if skill_name not in mapped_skills:
+                        mapped_skills.append(skill_name)
         # Also check config-based custom mapping
         skill_map = skills_index.get("_skill_persona_map", {})
         for skill_name, persona_list in skill_map.items():
||||
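The bucket loop in this hunk can be read in isolation. Below is a minimal sketch of the same lookup, assuming the index shape shown in the diff (a dict of buckets, each mapping a skill name to a dict with a `personas` list). The function name `mapped_skills_for` and the sample index are hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Bucket names follow the keys shown in the diff.
BUCKETS = ("skills", "paperclip_skills", "community_skills", "feynman_skills")


def mapped_skills_for(persona_name: str, skills_index: Dict) -> List[str]:
    """Collect skills whose "personas" list names this persona, across all buckets."""
    mapped: List[str] = []
    for bucket in BUCKETS:
        for skill_name, skill_info in skills_index.get(bucket, {}).items():
            # Entries that are not dicts (for example a bare True) carry no
            # persona data, so they are skipped.
            if not isinstance(skill_info, dict):
                continue
            if persona_name in skill_info.get("personas", []) and skill_name not in mapped:
                mapped.append(skill_name)
    return mapped


# Hypothetical index used only for illustration.
example_index = {
    "skills": {"nmap-recon": {"personas": ["neo", "vortex"]}},
    "paperclip_skills": {"legacy-entry": True},
    "feynman_skills": {
        "deep-research": {"personas": ["oracle", "scholar"]},
        "nmap-recon": {"personas": ["vortex"]},
    },
}
```

The `isinstance` guard matters because the removed lines further down in this diff show that paperclip and community entries used to be stored as `True`. The membership check keeps a skill that appears in two buckets from being listed twice.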
@@ -302,6 +311,8 @@ def build_persona(


 DEFAULT_SKILL_PERSONA_MAP = {
+    # Browser automation for every persona
+    "browser-use": ["*"],
     # Cybersecurity skills → personas
     "pentest": ["neo"],
     "nmap-recon": ["neo", "vortex"],
@@ -332,6 +343,7 @@ DEFAULT_SKILL_PERSONA_MAP = {
     "news-crawler": ["frodo", "herald"],
     "dellight-intelligence-ops": ["frodo", "echo"],
     "dellight-strategic-intelligence": ["frodo"],
+    "telegram": ["frodo", "oracle", "sentinel", "echo"],
     "agent-intelligence-network-scan": ["oracle"],
     "social-trust-manipulation-detector": ["ghost"],
     # Infrastructure skills → personas
@@ -345,6 +357,8 @@ DEFAULT_SKILL_PERSONA_MAP = {
     # Web scraping → personas
     "deep-scraper": ["oracle"],
     "crawl-for-ai": ["oracle", "herald"],
+    # Historical / archival research → personas
+    "ekos-gazete-search": ["scribe", "scholar", "oracle", "frodo", "chronos", "centurion", "wraith"],
 }


@@ -387,7 +401,10 @@ def parse_skill_frontmatter(skill_md: Path) -> dict:
     fm_match = re.match(r"^---\n(.*?)\n---\n", content, re.DOTALL)
     if not fm_match:
         return {}
-    parsed = yaml.safe_load(fm_match.group(1))
+    try:
+        parsed = yaml.safe_load(fm_match.group(1))
+    except yaml.YAMLError:
+        return {}
     return parsed if isinstance(parsed, dict) else {}


@@ -466,11 +483,39 @@ def infer_personas_from_skill_metadata(skill_name: str, metadata: dict) -> list:
         "ot": ["centurion", "bastion", "sentinel"],
         "scada": ["centurion", "bastion", "sentinel"],
         "ics": ["centurion", "bastion", "sentinel"],
+        # Research / academic workflows (Feynman skills)
+        "research": ["scholar", "oracle"],
+        "paper": ["scholar", "oracle"],
+        "arxiv": ["scholar", "oracle"],
+        "replication": ["scholar", "forge"],
+        "peer review": ["scholar"],
+        "literature review": ["scholar"],
+        "experiment": ["forge", "scholar"],
+        "citation": ["scholar", "oracle"],
     }
     for keyword, mapped_personas in keyword_map.items():
         if keyword in blob:
             personas.update(mapped_personas)

+    # Feynman research-workflow skills map to scholar (primary) + forge/oracle.
+    FEYNMAN_SKILLS = {
+        "alpha-research", "autoresearch", "contributing", "deep-research",
+        "docker", "eli5", "jobs", "literature-review", "modal-compute",
+        "paper-code-audit", "paper-writing", "peer-review", "preview",
+        "replication", "runpod-compute", "session-log", "session-search",
+        "source-comparison", "summarize", "watch",
+    }
+    if name in FEYNMAN_SKILLS:
+        if name in {"deep-research", "literature-review", "source-comparison",
+                    "paper-code-audit", "peer-review", "paper-writing",
+                    "replication", "alpha-research", "eli5", "summarize"}:
+            personas.update(["scholar", "oracle"])
+        if name in {"autoresearch", "docker", "modal-compute", "runpod-compute",
+                    "replication"}:
+            personas.add("forge")
+        if name in {"watch", "session-search", "jobs"}:
+            personas.add("oracle")
+
     # Conservative fallback for unmapped cybersecurity skills
     if not personas and "cyber" in domain:
         personas.update(["bastion"])
||||
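The keyword pass in this hunk is a plain substring test against a text blob. The hunk does not show how `blob` is built, so the sketch below assumes it is the lowercased skill name joined with the description and tags. `KEYWORD_MAP` is a small subset of the real map and `infer_personas` is a hypothetical name.

```python
from typing import Dict, List, Set

# A small, illustrative subset of the keyword map from the diff.
KEYWORD_MAP: Dict[str, List[str]] = {
    "research": ["scholar", "oracle"],
    "replication": ["scholar", "forge"],
    "experiment": ["forge", "scholar"],
}


def infer_personas(skill_name: str, metadata: dict) -> List[str]:
    """Return the sorted personas whose keywords occur in the skill's text."""
    # Assumption: the searchable blob is name + description + tags, lowercased.
    tags = metadata.get("tags") or []
    parts = [skill_name, str(metadata.get("description", "")), *map(str, tags)]
    blob = " ".join(parts).lower()

    personas: Set[str] = set()
    for keyword, mapped in KEYWORD_MAP.items():
        # Substring match: "research" also matches "autoresearch".
        if keyword in blob:
            personas.update(mapped)
    return sorted(personas)
```

Because the match is on substrings, a short keyword such as `paper` or `research` may also fire on unrelated skills whose text happens to contain it. That may be why the hunk adds the explicit `FEYNMAN_SKILLS` name check as a second, exact mechanism.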
@@ -482,13 +527,18 @@ def infer_personas_from_skill_metadata(skill_name: str, metadata: dict) -> list:
 def load_skill_persona_map(config: dict) -> dict:
     """Load skill→persona mapping from config.yaml or use defaults."""
     custom = config.get("skill_persona_map", {})
-    merged = {
-        k: [p for p in v if p in VALID_PERSONAS]
-        for k, v in DEFAULT_SKILL_PERSONA_MAP.items()
-    }
+    merged = {}
+    for skill, personas in DEFAULT_SKILL_PERSONA_MAP.items():
+        if "*" in personas:
+            merged[skill] = sorted(VALID_PERSONAS)
+        else:
+            merged[skill] = [p for p in personas if p in VALID_PERSONAS]
     for skill, personas in custom.items():
         if isinstance(personas, list):
-            merged[skill] = [p for p in personas if p in VALID_PERSONAS]
+            if "*" in personas:
+                merged[skill] = sorted(VALID_PERSONAS)
+            else:
+                merged[skill] = [p for p in personas if p in VALID_PERSONAS]
     return merged


||||
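The wildcard handling appears twice in this hunk, once for defaults and once for custom config. A minimal sketch of the same behavior with the expansion factored into one helper follows. The `VALID_PERSONAS` set here is a hypothetical stand-in for the real one, and `expand_personas` / `merge_skill_maps` are illustrative names, not functions in `build.py`.

```python
from typing import Dict, List

# Hypothetical stand-in for the VALID_PERSONAS set referenced in the diff.
VALID_PERSONAS = {"neo", "vortex", "oracle", "frodo"}


def expand_personas(personas: List[str]) -> List[str]:
    """Expand "*" to every valid persona; otherwise drop unknown names."""
    if "*" in personas:
        return sorted(VALID_PERSONAS)
    return [p for p in personas if p in VALID_PERSONAS]


def merge_skill_maps(
    defaults: Dict[str, List[str]], custom: Dict[str, object]
) -> Dict[str, List[str]]:
    """Apply defaults first, then let list-valued custom entries override them."""
    merged = {skill: expand_personas(personas) for skill, personas in defaults.items()}
    for skill, personas in custom.items():
        if isinstance(personas, list):  # non-list config values are ignored
            merged[skill] = expand_personas(personas)
    return merged
```

A custom entry replaces the default list for that skill rather than extending it, which matches the assignment in the hunk.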
@@ -497,12 +547,17 @@ def search_skills(shared_dir: Path, query: str):
     query_terms = query.lower().split()
     results = []

-    for skills_subdir in ["skills", "paperclip-skills", "community-skills"]:
+    for skills_subdir in [
+        "skills",
+        "paperclip-skills",
+        "community-skills",
+        "feynman-skills",
+    ]:
         skills_path = shared_dir / skills_subdir
         if not skills_path.exists():
             continue
         for skill_dir in sorted(skills_path.iterdir()):
-            if not skill_dir.is_dir():
+            if not skill_dir.is_dir() or skill_dir.name.startswith("_"):
                 continue
             skill_md = skill_dir / "SKILL.md"
             if not skill_md.exists():
||||
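The directory walk in this hunk, including the new underscore filter, can be sketched as a small generator. This assumes the `<shared>/<source>/<skill>/SKILL.md` layout implied by the diff. `iter_skill_dirs` and `demo` are hypothetical names, and the temporary tree exists only to exercise the filter.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List

SKILL_SOURCES = ["skills", "paperclip-skills", "community-skills", "feynman-skills"]


def iter_skill_dirs(shared_dir: Path) -> Iterator[Path]:
    """Yield skill directories that contain a SKILL.md, skipping "_" entries."""
    for subdir in SKILL_SOURCES:
        root = shared_dir / subdir
        if not root.exists():
            continue
        for skill_dir in sorted(root.iterdir()):
            # Plain files and underscore-prefixed entries are not skills.
            if not skill_dir.is_dir() or skill_dir.name.startswith("_"):
                continue
            if (skill_dir / "SKILL.md").exists():
                yield skill_dir


def demo() -> List[str]:
    """Build a throwaway tree and return the skill names that survive the filter."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "feynman-skills"
        for name in ("deep-research", "_templates", "notes-only"):
            (root / name).mkdir(parents=True)
        (root / "deep-research" / "SKILL.md").write_text("# demo", encoding="utf-8")
        (root / "_templates" / "SKILL.md").write_text("# demo", encoding="utf-8")
        (root / "_platform-mapping.md").write_text("mapping", encoding="utf-8")
        return [p.name for p in iter_skill_dirs(Path(tmp))]
```

In the demo, `_templates` is skipped despite having a `SKILL.md`, `notes-only` is skipped because it has none, and `_platform-mapping.md` is skipped because it is a file.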
@@ -623,12 +678,13 @@ def run_tests(personas_dir: Path, target: str = None):


 def build_skills_index(shared_dir: Path, config: dict = None) -> dict:
-    """Index all shared skills from _shared/{skills,paperclip-skills,community-skills}/."""
+    """Index all shared skills from _shared/{skills,paperclip-skills,community-skills,feynman-skills}/."""
     skill_map = load_skill_persona_map(config or {})
     index = {
         "skills": {},
         "paperclip_skills": {},
         "community_skills": {},
+        "feynman_skills": {},
         "design_brands": [],
         "ui_ux_styles": 0,
         "_skill_persona_map": skill_map,
@@ -680,7 +736,35 @@ def build_skills_index(shared_dir: Path, config: dict = None) -> dict:
                 continue
             skill_md = skill_dir / "SKILL.md"
             if skill_md.exists():
-                index["paperclip_skills"][skill_dir.name] = True
+                skill_meta = parse_skill_frontmatter(skill_md)
+                inferred_personas = infer_personas_from_skill_metadata(
+                    skill_dir.name, skill_meta
+                )
+                configured_personas = skill_map.get(skill_dir.name, [])
+                merged_personas = sorted(
+                    set(configured_personas).union(inferred_personas)
+                )
+                content = skill_md.read_text(encoding="utf-8")
+                first_line = ""
+                for line in content.split("\n"):
+                    line = line.strip()
+                    if line and not line.startswith(
+                        ("---", "#", "name:", "description:")
+                    ):
+                        first_line = line[:120]
+                        break
+                index["paperclip_skills"][skill_dir.name] = {
+                    "personas": merged_personas,
+                    "summary": first_line,
+                    "domain": str(skill_meta.get("domain", "")),
+                    "subdomain": str(skill_meta.get("subdomain", "")),
+                    "tags": skill_meta.get("tags", []),
+                    "mapped_by": {
+                        "explicit": configured_personas,
+                        "inferred": inferred_personas,
+                    },
+                    "has_references": (skill_dir / "references").is_dir(),
+                }

     # Index community-skills
     cskills_dir = shared_dir / "community-skills"
@@ -690,7 +774,76 @@ def build_skills_index(shared_dir: Path, config: dict = None) -> dict:
                 continue
             skill_md = skill_dir / "SKILL.md"
             if skill_md.exists():
-                index["community_skills"][skill_dir.name] = True
+                skill_meta = parse_skill_frontmatter(skill_md)
+                inferred_personas = infer_personas_from_skill_metadata(
+                    skill_dir.name, skill_meta
+                )
+                configured_personas = skill_map.get(skill_dir.name, [])
+                merged_personas = sorted(
+                    set(configured_personas).union(inferred_personas)
+                )
+                content = skill_md.read_text(encoding="utf-8")
+                first_line = ""
+                for line in content.split("\n"):
+                    line = line.strip()
+                    if line and not line.startswith(
+                        ("---", "#", "name:", "description:")
+                    ):
+                        first_line = line[:120]
+                        break
+                index["community_skills"][skill_dir.name] = {
+                    "personas": merged_personas,
+                    "summary": first_line,
+                    "domain": str(skill_meta.get("domain", "")),
+                    "subdomain": str(skill_meta.get("subdomain", "")),
+                    "tags": skill_meta.get("tags", []),
+                    "mapped_by": {
+                        "explicit": configured_personas,
+                        "inferred": inferred_personas,
+                    },
+                    "has_references": (skill_dir / "references").is_dir(),
+                }
+
+    # Index feynman-skills (research workflows adapted from Feynman).
+    # Use the same persona-aware indexing as shared skills so mapped skills
+    # flow into Scholar / Forge / Oracle persona JSON outputs.
+    fskills_dir = shared_dir / "feynman-skills"
+    if fskills_dir.exists():
+        for skill_dir in sorted(fskills_dir.iterdir()):
+            if not skill_dir.is_dir() or skill_dir.name.startswith("_"):
+                continue
+            skill_md = skill_dir / "SKILL.md"
+            if not skill_md.exists():
+                continue
+            skill_meta = parse_skill_frontmatter(skill_md)
+            inferred_personas = infer_personas_from_skill_metadata(
+                skill_dir.name, skill_meta
+            )
+            configured_personas = skill_map.get(skill_dir.name, [])
+            merged_personas = sorted(
+                set(configured_personas).union(inferred_personas)
+            )
+            content = skill_md.read_text(encoding="utf-8")
+            first_line = ""
+            for line in content.split("\n"):
+                line = line.strip()
+                if line and not line.startswith(
+                    ("---", "#", "name:", "description:")
+                ):
+                    first_line = line[:120]
+                    break
+            index["feynman_skills"][skill_dir.name] = {
+                "personas": merged_personas,
+                "summary": first_line,
+                "domain": str(skill_meta.get("domain", "")),
+                "subdomain": str(skill_meta.get("subdomain", "")),
+                "tags": skill_meta.get("tags", []),
+                "mapped_by": {
+                    "explicit": configured_personas,
+                    "inferred": inferred_personas,
+                },
+                "has_references": (skill_dir / "references").is_dir(),
+            }

     # Index design brands
     design_dir = shared_dir / "design-md"
||||
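The same entry-building block is added three times in the hunks above (paperclip, community, feynman). As a reading aid, here is a sketch of that logic factored into two helpers. `first_summary_line` and `make_index_entry` are hypothetical names and do not exist in `build.py`. The field names and the 120-character cap are taken from the diff.

```python
from typing import Dict, List

# Line prefixes the diff skips when looking for a summary line.
SKIP_PREFIXES = ("---", "#", "name:", "description:")


def first_summary_line(content: str, limit: int = 120) -> str:
    """Return the first non-empty line that is not a skipped prefix, truncated."""
    for line in content.split("\n"):
        line = line.strip()
        if line and not line.startswith(SKIP_PREFIXES):
            return line[:limit]
    return ""


def make_index_entry(
    content: str,
    meta: dict,
    explicit: List[str],
    inferred: List[str],
    has_references: bool,
) -> Dict:
    """Build one index entry with the same fields as the diff."""
    return {
        "personas": sorted(set(explicit).union(inferred)),
        "summary": first_summary_line(content),
        "domain": str(meta.get("domain", "")),
        "subdomain": str(meta.get("subdomain", "")),
        "tags": meta.get("tags", []),
        "mapped_by": {"explicit": explicit, "inferred": inferred},
        "has_references": has_references,
    }


SAMPLE = "---\nname: demo\ndescription: Example skill\n---\n# Demo\n\nRuns a literature review.\n"
entry = make_index_entry(SAMPLE, {"domain": "research"}, ["scholar"], ["oracle", "scholar"], False)
```

One behavior worth knowing when reading the summaries: the prefix filter only skips `name:` and `description:`, so any other front-matter key (for example `tags:`) would be picked up as the summary line if it comes before the body text.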
@@ -880,6 +1033,7 @@ def build_catalog(
|
||||
f" Skills: {len(si.get('skills', {}))} shared + "
|
||||
f"{len(si.get('paperclip_skills', {}))} paperclip + "
|
||||
f"{len(si.get('community_skills', {}))} community + "
|
||||
f"{len(si.get('feynman_skills', {}))} feynman + "
|
||||
f"{len(si.get('design_brands', []))} design brands + "
|
||||
f"{si.get('ui_ux_styles', 0)} UI/UX data files"
|
||||
)
|
||||
@@ -1193,6 +1347,18 @@ def install_claude_skills(

         per_source[source] = count

+    # Feynman-skill SKILL.md files reference `../_platform-mapping.md`. Emit the
+    # sibling at ~/.claude/skills/_platform-mapping.md so relative refs resolve.
+    if "feynman-skills" in sources:
+        pmap_src = shared_dir / "feynman-skills" / "_platform-mapping.md"
+        if pmap_src.exists():
+            if dry_run:
+                print(f" [dry-run] would emit {skills_dir}/_platform-mapping.md")
+            else:
+                (skills_dir / "_platform-mapping.md").write_text(
+                    pmap_src.read_text(encoding="utf-8"), encoding="utf-8"
+                )
+
     mode = "[dry-run] " if dry_run else ""
     print(f" {mode}Claude skills — per source: "
           + ", ".join(f"{k}={v}" for k, v in per_source.items()))
||||
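The sibling-file step in this hunk is small enough to sketch on its own. The version below assumes the same source path as the diff and adds a return value so the outcome can be checked. `emit_platform_mapping` and `demo` are hypothetical names, and creating the target directory is an assumption: the original presumably relies on the install step having created it already.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def emit_platform_mapping(shared_dir: Path, skills_dir: Path, dry_run: bool = False) -> bool:
    """Copy _platform-mapping.md next to the installed skills; True if written."""
    src = shared_dir / "feynman-skills" / "_platform-mapping.md"
    if not src.exists():
        return False
    dest = skills_dir / "_platform-mapping.md"
    if dry_run:
        print(f"[dry-run] would emit {dest}")
        return False
    # Assumption: create the target directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(src.read_text(encoding="utf-8"), encoding="utf-8")
    return True


def demo() -> Tuple[bool, bool, bool, str]:
    """Run a dry run, then a real run, inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        shared, target = Path(tmp) / "_shared", Path(tmp) / "target"
        (shared / "feynman-skills").mkdir(parents=True)
        (shared / "feynman-skills" / "_platform-mapping.md").write_text(
            "mapping", encoding="utf-8"
        )
        dry = emit_platform_mapping(shared, target, dry_run=True)
        exists_after_dry = (target / "_platform-mapping.md").exists()
        wrote = emit_platform_mapping(shared, target)
        text = (target / "_platform-mapping.md").read_text(encoding="utf-8")
        return dry, exists_after_dry, wrote, text
```

The file lands one level above each installed skill directory, which is what lets a `../_platform-mapping.md` reference inside a `SKILL.md` resolve.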
@@ -1299,6 +1465,11 @@ def _classify_skill_topic(name: str, fm: dict) -> str:
|
||||
return "security-general"
|
||||
|
||||
NAME_PATTERNS = [
|
||||
# Feynman research-workflow skills — keep before generic patterns so they
|
||||
# win over broader matches. All map into buckets already in the default.
|
||||
("ai-llm-dev", r"^(alpha-research|autoresearch|deep-research|literature-review|paper-code-audit|paper-writing|peer-review|replication|source-comparison|summarize|eli5)$"),
|
||||
("cloud-infra", r"^(modal-compute|runpod-compute)$"),
|
||||
("ops-sysadmin", r"^(session-log|session-search|preview|watch|jobs|contributing)$"),
|
||||
("coding-frontend", r"^(react|nextjs|next-|angular|vue-|svelte|tailwind|shadcn|vercel|expo|remotion|frontend|ui-ux|accessibility|canvas-|stitch|framer)"),
|
||||
("coding-backend", r"^(python|java-|csharp|dotnet|aspnet|kotlin|swift|rust-|golang|go-|ruby-|php-|nodejs|node-|bash-|cli-|bazel|async-|architecting-|aspire-)"),
|
||||
("coding-tools", r"^(commit|changelog|debug-|refactor|test-driven|tdd|bdd|git-|github-|gitlab-|bats|copilot|codeql|code-review|linting|formatting|add-|adr-|agent-browser|mcp-)"),
|
||||
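The comment in this hunk relies on list order: the first pattern that matches decides the topic. The hunk does not show how `NAME_PATTERNS` is consumed, so the sketch below assumes a first-match-wins loop with `re.search`. The `docs-generic` entry is a hypothetical broad pattern added only to show why the specific Feynman entry has to come first.

```python
import re
from typing import List, Tuple

PATTERNS: List[Tuple[str, str]] = [
    # Specific, fully anchored names come first.
    ("ai-llm-dev", r"^(deep-research|literature-review|paper-writing|peer-review)$"),
    ("cloud-infra", r"^(modal-compute|runpod-compute)$"),
    # Hypothetical broad prefix pattern that would also match "paper-writing".
    ("docs-generic", r"^(paper|deep)"),
    ("coding-tools", r"^(commit|changelog|debug-|git-)"),
]


def classify(name: str, default: str = "uncategorized") -> str:
    """Return the topic of the first pattern that matches the skill name."""
    for topic, pattern in PATTERNS:
        if re.search(pattern, name):
            return topic
    return default
```

With this ordering `paper-writing` resolves to `ai-llm-dev`, while a name such as `paper-digest`, which only the broad pattern matches, falls through to `docs-generic`. Moving the broad entry to the top would capture both.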
@@ -1556,12 +1727,17 @@ def install_opencode(
|
||||
_shutil.rmtree(existing)
|
||||
|
||||
if shared_dir:
|
||||
for skills_subdir in ["skills", "paperclip-skills", "community-skills"]:
|
||||
for skills_subdir in [
|
||||
"skills",
|
||||
"paperclip-skills",
|
||||
"community-skills",
|
||||
"feynman-skills",
|
||||
]:
|
||||
src_root = shared_dir / skills_subdir
|
||||
if not src_root.exists():
|
||||
continue
|
||||
for skill_dir in src_root.iterdir():
|
||||
if not skill_dir.is_dir():
|
||||
if not skill_dir.is_dir() or skill_dir.name.startswith("_"):
|
||||
continue
|
||||
src_skill = skill_dir / "SKILL.md"
|
||||
if not src_skill.exists():
|
||||
@@ -1599,6 +1775,15 @@ def install_opencode(
|
||||
)
|
||||
skill_count += 1
|
||||
|
||||
# Feynman-skill SKILL.md files reference `../_platform-mapping.md`. Emit the
|
||||
# sibling so the relative reference resolves inside ~/.config/opencode/skills/.
|
||||
if shared_dir:
|
||||
pmap_src = shared_dir / "feynman-skills" / "_platform-mapping.md"
|
||||
if pmap_src.exists():
|
||||
(skills_dir / "_platform-mapping.md").write_text(
|
||||
pmap_src.read_text(encoding="utf-8"), encoding="utf-8"
|
||||
)
|
||||
|
||||
print(
|
||||
f" OpenCode: {agent_count} agents installed to {agents_dir}"
|
||||
)
|
||||
@@ -1932,10 +2117,10 @@ def main():
     # --- claude-skills filters --------------------------------------------
     parser.add_argument(
         "--skill-sources",
-        default="skills,paperclip-skills",
+        default="skills,paperclip-skills,feynman-skills",
         help="Comma-separated list of _shared/<dir> sources for claude-skills "
-             "(available: skills,paperclip-skills,community-skills). "
-             "Default: skills,paperclip-skills",
+             "(available: skills,paperclip-skills,community-skills,feynman-skills). "
+             "Default: skills,paperclip-skills,feynman-skills",
     )
     parser.add_argument(
         "--skill-subdomains",
||||
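For reference, this is roughly how a comma-separated option like `--skill-sources` turns into a list. The flag name and its new default come from the diff. The splitting and the validation against the available sources are assumptions, since the hunk does not show how `main()` consumes the value, and `parse_skill_sources` is a hypothetical helper.

```python
import argparse
from typing import List

AVAILABLE_SOURCES = ("skills", "paperclip-skills", "community-skills", "feynman-skills")


def parse_skill_sources(argv: List[str]) -> List[str]:
    """Parse --skill-sources into a list of source directory names."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--skill-sources",
        default="skills,paperclip-skills,feynman-skills",
    )
    args = parser.parse_args(argv)
    # Split on commas and drop empty or whitespace-only items.
    sources = [s.strip() for s in args.skill_sources.split(",") if s.strip()]
    # Assumption: unknown names are rejected; the real script may ignore them.
    unknown = [s for s in sources if s not in AVAILABLE_SOURCES]
    if unknown:
        parser.error(f"unknown skill source(s): {', '.join(unknown)}")
    return sources
```

With the new default, a plain `--install claude-skills` run picks up `feynman-skills`, while `community-skills` still has to be requested explicitly.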
79
personas/_shared/feynman-skills/AGENTS.md
Normal file
@@ -0,0 +1,79 @@
|
||||
# Agents
|
||||
|
||||
`AGENTS.md` is the repo-level contract for agents working in this repository.
|
||||
|
||||
Pi subagent behavior does **not** live here. The source of truth for bundled Pi subagents is `.feynman/agents/*.md`, which the runtime syncs into the Pi agent directory. If you need to change how `researcher`, `reviewer`, `writer`, or `verifier` behave, edit the corresponding file in `.feynman/agents/` instead of duplicating those prompts here.
|
||||
|
||||
## Pi subagents
|
||||
|
||||
Feynman ships four bundled research subagents:
|
||||
|
||||
- `researcher`
|
||||
- `reviewer`
|
||||
- `writer`
|
||||
- `verifier`
|
||||
|
||||
They are defined in `.feynman/agents/` and invoked via the Pi `subagent` tool.
|
||||
|
||||
## What belongs here
|
||||
|
||||
Keep this file focused on cross-agent repo conventions:
|
||||
|
||||
- output locations and file naming expectations
|
||||
- workspace-level continuity expectations for long-running work
|
||||
- provenance and verification requirements
|
||||
- handoff rules between the lead agent and subagents
|
||||
|
||||
Do **not** restate per-agent prompt text here unless there is a repo-wide constraint that applies to all agents.
|
||||
|
||||
## Output conventions
|
||||
|
||||
- Research outputs go in `outputs/`.
|
||||
- Paper-style drafts go in `papers/`.
|
||||
- Session logs go in `notes/`.
|
||||
- The workspace-level lab notebook lives at `CHANGELOG.md`.
|
||||
- Plan artifacts for long-running workflows go in `outputs/.plans/`.
|
||||
- Intermediate research artifacts are written to disk by subagents and read by the lead agent. They are not returned inline unless the user explicitly asks for them.
|
||||
- Long-running workflows should treat the plan artifact as an externalized working memory, not a static outline. Keep task status and verification state there as the run evolves.
|
||||
- Long-running or resumable workflows should also treat `CHANGELOG.md` as the chronological lab notebook: what changed, what failed, what was verified, and what should happen next.
|
||||
- Do not create or update `CHANGELOG.md` for trivial one-shot tasks.
|
||||
|
||||
## File naming
|
||||
|
||||
Every workflow that produces artifacts must derive a short **slug** from the topic (lowercase, hyphens, no filler words, ≤5 words — e.g. `cloud-sandbox-pricing`). All files in a single run use that slug as a prefix:
|
||||
|
||||
- Plan: `outputs/.plans/<slug>.md`
|
||||
- Intermediate research: `<slug>-research-web.md`, `<slug>-research-papers.md`, etc.
|
||||
- Draft: `outputs/.drafts/<slug>-draft.md`
|
||||
- Cited brief: `<slug>-brief.md`
|
||||
- Verification: `<slug>-verification.md`
|
||||
- Final output: `outputs/<slug>.md` or `papers/<slug>.md`
|
||||
- Provenance: `<slug>.provenance.md` (next to the final output)
|
||||
|
||||
Never use generic names like `research.md`, `draft.md`, `brief.md`, or `summary.md`. Concurrent runs must not collide.
|
||||
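The naming rules above can be sketched as two small helpers. The slug constraints (lowercase, hyphens, no filler words, at most five words) and the path templates come from the text. The filler-word list is hypothetical, since the document does not enumerate one, and `make_slug` / `artifact_paths` are illustrative names.

```python
import re
from typing import Dict

# Hypothetical filler-word list; the source does not enumerate one.
FILLER_WORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "on", "with"}


def make_slug(topic: str, max_words: int = 5) -> str:
    """Lowercase the topic, drop filler words, keep at most max_words, join with hyphens."""
    words = re.findall(r"[a-z0-9]+", topic.lower())
    kept = [w for w in words if w not in FILLER_WORDS]
    return "-".join(kept[:max_words])


def artifact_paths(slug: str, final_dir: str = "outputs") -> Dict[str, str]:
    """Return slug-prefixed paths; final_dir is "outputs" or "papers"."""
    return {
        "plan": f"outputs/.plans/{slug}.md",
        "draft": f"outputs/.drafts/{slug}-draft.md",
        "final": f"{final_dir}/{slug}.md",
        # The provenance sidecar sits next to the final output.
        "provenance": f"{final_dir}/{slug}.provenance.md",
    }
```

Deriving every path from one slug is what keeps concurrent runs from colliding: two runs on different topics never write to the same file name.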
|
||||
## Workspace changelog
|
||||
|
||||
- `CHANGELOG.md` is a lab notebook, not release notes.
|
||||
- Read `CHANGELOG.md` before resuming substantial work when it exists.
|
||||
- Append concise entries after meaningful progress, failed approaches, major verification results, or new blockers.
|
||||
- Each entry should identify the active slug or objective and end with the next recommended step.
|
||||
- Mark verification state honestly with labels such as `verified`, `unverified`, `blocked`, or `inferred` only when they match the underlying evidence.
|
||||
|
||||
## Provenance and verification
|
||||
|
||||
- Every output from `/deepresearch` and `/lit` must include a `.provenance.md` sidecar.
|
||||
- Provenance sidecars should record source accounting and verification status.
|
||||
- Source verification and citation cleanup belong in the `verifier` stage, not in ad hoc edits after delivery.
|
||||
- Verification passes should happen before delivery when the workflow calls for them.
|
||||
- If a workflow uses the words `verified`, `confirmed`, or `checked`, the underlying artifact should record what was actually checked and how.
|
||||
- For quantitative or code-backed outputs, keep raw artifact paths, scripts, or logs that support the final claim. Do not rely on polished summaries alone.
|
||||
- Never smooth over missing checks. Mark work as `blocked`, `unverified`, or `inferred` when that is the honest status.
|
||||
|
||||
## Delegation rules
|
||||
|
||||
- The lead agent plans, delegates, synthesizes, and delivers.
|
||||
- Use subagents when the work is meaningfully decomposable; do not spawn them for trivial work.
|
||||
- Prefer file-based handoffs over dumping large intermediate results back into parent context.
|
||||
- The lead agent is responsible for reconciling task completion. Subagents may not silently skip assigned tasks; skipped or merged tasks must be recorded in the plan artifact.
|
||||
- For critical claims, require at least one adversarial verification pass after synthesis. Fix fatal issues before delivery or surface them explicitly.
|
||||
115
personas/_shared/feynman-skills/CONTRIBUTING.md
Normal file
@@ -0,0 +1,115 @@
|
||||
# Contributing to Feynman
|
||||
|
||||
Feynman is a research-first CLI built on Pi and alphaXiv. This guide is for humans and agents contributing code, prompts, skills, docs, installers, or workflow behavior to the repository.
|
||||
|
||||
## Quick Links
|
||||
|
||||
- GitHub: https://github.com/getcompanion-ai/feynman
|
||||
- Docs: https://feynman.is/docs
|
||||
- Repo agent contract: [AGENTS.md](AGENTS.md)
|
||||
- Issues: https://github.com/getcompanion-ai/feynman/issues
|
||||
|
||||
## What Goes Where
|
||||
|
||||
- CLI/runtime code: `src/`
|
||||
- Bundled prompt templates: `prompts/`
|
||||
- Bundled Pi skills: `skills/`
|
||||
- Bundled Pi subagent prompts: `.feynman/agents/`
|
||||
- Docs site: `website/`
|
||||
- Build/release scripts: `scripts/`
|
||||
- Generated research artifacts: `outputs/`, `papers/`, `notes/`
|
||||
|
||||
If you need to change how bundled subagents behave, edit `.feynman/agents/*.md`. Do not duplicate that behavior in `AGENTS.md`.
|
||||
|
||||
## Before You Open a PR
|
||||
|
||||
1. Start from the latest `main`.
|
||||
2. Use Node.js `22.x` for local development. The supported runtime range is Node.js `20.19.0` through `24.x`; `.nvmrc` pins the preferred local version while `package.json`, `website/package.json`, and the runtime version guard define the broader supported range.
|
||||
3. Install dependencies from the repo root:
|
||||
|
||||
```bash
|
||||
nvm use || nvm install
|
||||
npm install
|
||||
```
|
||||
|
||||
4. Run the required checks before asking for review:
|
||||
|
||||
```bash
|
||||
npm test
|
||||
npm run typecheck
|
||||
npm run build
|
||||
```
|
||||
|
||||
5. If you changed the docs site, also validate the website:
|
||||
|
||||
```bash
|
||||
cd website
|
||||
npm install
|
||||
npm run build
|
||||
```
|
||||
|
||||
6. Keep the PR focused. Do not mix unrelated cleanup with the real change.
|
||||
7. Add or update tests when behavior changes.
|
||||
8. Update docs, prompts, or skills when the user-facing workflow changes.
|
||||
|
||||
## Contribution Rules
|
||||
|
||||
- Bugs, docs fixes, installer fixes, and focused workflow improvements are good PRs.
|
||||
- Large feature changes should start with an issue or a concrete implementation discussion before code lands.
|
||||
- Avoid refactor-only PRs unless they are necessary to unblock a real fix or requested by a maintainer.
|
||||
- Do not silently change release behavior, installer behavior, or runtime defaults without documenting the reason in the PR.
|
||||
- Use American English in docs, comments, prompts, UI copy, and examples.
|
||||
- Do not add bundled prompts, skills, or docs whose primary purpose is to market, endorse, or funnel users toward a third-party product or service. Product integrations must be justified by user-facing utility and written in neutral language.
|
||||
|
||||
## Repo-Specific Checks
|
||||
|
||||
### Prompt and skill changes
|
||||
|
||||
- New workflows usually live in `prompts/*.md`.
|
||||
- New reusable capabilities usually live in `skills/<name>/SKILL.md`.
|
||||
- Keep skill files concise. Put detailed operational rules in the prompt or in focused reference files only when needed.
|
||||
- If a new workflow should be invokable from the CLI, make sure its prompt frontmatter includes the correct metadata and that the command works through the normal prompt discovery path.
|
||||
|
||||
### Agent and artifact conventions
|
||||
|
||||
- `AGENTS.md` is the repo-level contract for workspace conventions, handoffs, provenance, and output naming.
|
||||
- Long-running research flows should write plan artifacts to `outputs/.plans/` and use `CHANGELOG.md` as a lab notebook when the work is substantial.
|
||||
- Do not update `CHANGELOG.md` for trivial one-shot changes.
|
||||
|
||||
### Release and versioning discipline
|
||||
|
||||
- The curl installer and release docs point users at tagged releases, not arbitrary commits on `main`.
|
||||
- If you ship user-visible fixes after a tag, do not leave the repo in a state where `main` and the latest release advertise the same version string while containing different behavior.
|
||||
- When changing release-sensitive behavior, check the version story across:
|
||||
- `.nvmrc`
|
||||
- `package.json`
|
||||
- `website/package.json`
|
||||
- `scripts/check-node-version.mjs`
|
||||
- install docs in `README.md` and `website/src/content/docs/getting-started/installation.md`
|
||||
|
||||
## AI-Assisted Contributions
|
||||
|
||||
AI-assisted PRs are fine. The contributor is still responsible for the diff.
|
||||
|
||||
- Understand the code you are submitting.
|
||||
- Run the local checks yourself instead of assuming generated code is correct.
|
||||
- Include enough context in the PR description for a reviewer to understand the change quickly.
|
||||
- If an agent updated prompts or skills, verify the instructions match the actual repo behavior.
|
||||
|
||||
## Review Expectations
|
||||
|
||||
- Explain what changed and why.
|
||||
- Call out tradeoffs, follow-up work, and anything intentionally not handled.
|
||||
- Include screenshots for UI changes.
|
||||
- Resolve review comments you addressed before requesting review again.
|
||||
|
||||
## Good First Areas
|
||||
|
||||
Useful contributions usually land in one of these areas:
|
||||
|
||||
- installation and upgrade reliability
|
||||
- research workflow quality
|
||||
- model/provider setup ergonomics
|
||||
- docs clarity
|
||||
- preview and export stability
|
||||
- packaging and release hygiene
|
||||
52
personas/_shared/feynman-skills/README.md
Normal file
@@ -0,0 +1,52 @@
|
||||
# Feynman Skills (Platform-Agnostic Port)
|
||||
|
||||
Adapted from the [Feynman](https://feynman.is) research skills pack (v0.2.34) for use with **Claude Code**, **OpenCode**, and any Anthropic-spec skill runtime.
|
||||
|
||||
## Source
|
||||
|
||||
- Upstream: `~/.codex/skills/feynman/` (Feynman CLI installer)
|
||||
- Upstream license/model: research workflows produced by the Feynman project
|
||||
- This copy is re-adapted to remove Feynman-runtime coupling so skills work standalone.
|
||||
|
||||
## What changed from upstream
|
||||
|
||||
| Upstream element | This port |
|
||||
|---|---|
|
||||
| `/deepresearch`, `/lit`, `/review`, `/draft`, `/audit`, `/replicate`, `/compare`, `/watch`, `/log`, `/jobs` slash commands | Inline procedure inside `SKILL.md` or in `references/<name>.md`. Invoke the skill by name, not by slash command. |
|
||||
| `researcher` / `reviewer` / `writer` / `verifier` bundled subagents | Mapped to **Claude Code `Task` tool** (`subagent_type: scholar / oracle / general-purpose`) and **OpenCode `task` tool** (with a `scholar` / `oracle` / `forge` agent). |
|
||||
| `pi-autoresearch`, `pi-schedule-prompt`, `pi-charts`, `pi-processes` | Replaced with platform equivalents (Claude `ScheduleWakeup` / cron, Mermaid, bash) or marked optional. |
|
||||
| `~/.feynman/sessions/` transcripts | Generalized to per-platform session paths (documented in `session-search/`). |
|
||||
| `/preview` command | Bash fallbacks (`xdg-open`, `open`, `pandoc`). |
|
||||
| `../prompts/<name>.md` sibling references | Inlined into each SKILL.md, or moved to `references/<name>.md` inside the skill so the skill is portable. |
|
||||
|
||||
## Output conventions (carried over from upstream `AGENTS.md`)
|
||||
|
||||
- Research outputs → `outputs/`
|
||||
- Paper-style drafts → `papers/`
|
||||
- Session logs → `notes/`
|
||||
- Workspace lab notebook → `CHANGELOG.md`
|
||||
- Plan artifacts → `outputs/.plans/`
|
||||
- Intermediate research → `<slug>-research-*.md` on disk, not returned inline
|
||||
- Slug rule: every workflow derives a short hyphenated slug (`≤5 words`) and prefixes all artifacts with it — concurrent runs must not collide
|
||||
|
||||
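The slug rule can be sketched as a small shell function. The name `slugify` and the filler-word list are assumptions made for illustration; the five-word cap comes from the rule above.

```bash
# Hypothetical helper: derive a short artifact slug from a topic string.
# The filler-word list is an assumption; adjust it to taste.
slugify() (
  max_words=5
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9\n' ' ' \
    | awk -v max="$max_words" '
        BEGIN {
          split("a an the of for and or to in on with", f, " ")
          for (i in f) filler[f[i]] = 1
        }
        {
          n = 0; out = ""
          # Keep the first <max> non-filler words, joined by hyphens.
          for (i = 1; i <= NF && n < max; i++) {
            if (($i) in filler) continue
            out = out (n ? "-" : "") $i
            n++
          }
          print out
        }'
)
```

For example, `slugify "The Scaling Laws of Transformer Models"` prints `scaling-laws-transformer-models`. Collision handling between concurrent runs with the same topic is left to the caller.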
## Platform-tool mapping reference
|
||||
|
||||
See `_platform-mapping.md` — the canonical mapping used by every SKILL.md in this directory.
|
||||
|
||||
## Skill list (19)
|
||||
|
||||
Research workflows:
|
||||
- `deep-research`, `literature-review`, `source-comparison`, `paper-code-audit`, `peer-review`, `paper-writing`, `replication`, `autoresearch`
|
||||
|
||||
Paper utilities:
|
||||
- `alpha-research`, `eli5`
|
||||
|
||||
Compute environments:
|
||||
- `docker`, `modal-compute`, `runpod-compute`
|
||||
|
||||
Session / project:
|
||||
- `session-log`, `session-search`, `jobs`, `watch`, `preview`
|
||||
|
||||
Self-referential:
|
||||
- `contributing` (for contributing to the upstream Feynman repo)
|
||||
96
personas/_shared/feynman-skills/_platform-mapping.md
Normal file
@@ -0,0 +1,96 @@
|
||||
# Platform Mapping Reference
|
||||
|
||||
Every Feynman-skill SKILL.md in this directory refers to the abstractions defined here. Both runtimes implement the same conceptual operations under different tool names.
|
||||
|
||||
## Subagent delegation
|
||||
|
||||
Feynman ships four bundled research roles: `researcher`, `reviewer`, `writer`, `verifier`. Outside of Feynman, dispatch them via the host platform's generic delegation primitive.
|
||||
|
||||
| Role | Claude Code | OpenCode |
|
||||
|---|---|---|
|
||||
| `researcher` (evidence gathering, source hunting) | `Task` tool with `subagent_type: scholar` (if installed from personas repo) or `general-purpose` | `task` tool invoking `scholar` agent (if installed) or `general` |
|
||||
| `reviewer` (adversarial review of a cited draft) | `Task` tool with `subagent_type: general-purpose`, prompt: *"review this artifact for FATAL/MAJOR/MINOR issues — no rewrites, only findings"* | `task` tool invoking a `reviewer`-prompted general agent |
|
||||
| `writer` (synthesize notes into a polished draft) | `Task` tool with `subagent_type: forge` (personas) or `general-purpose` | `task` tool invoking `forge` or `general` |
|
||||
| `verifier` (URL / citation verification) | `Task` tool with `subagent_type: oracle` (personas) or `general-purpose` + `WebFetch` permission | `task` tool invoking `oracle` or `general` with `webfetch` allowed |
|
||||
|
||||
The lead agent (the skill caller) plans, dispatches, synthesizes, and delivers. Subagents write artifacts to disk and the lead reads them. Never dump large intermediate results back into parent context.
|
||||
|
||||
## Scheduling recurring work
|
||||
|
||||
| Need | Claude Code | OpenCode |
|
||||
|---|---|---|
|
||||
| Wake up later in same session | `ScheduleWakeup` | (no direct equivalent — use cron) |
|
||||
| Cron-style recurring agent runs | `CronCreate` / `CronList` / `CronDelete` | system `cron` invoking `opencode run` |
|
||||
| Interactive loop with self-pacing | `/loop` slash command | Manual re-invocation |
|
||||
|
||||
## Charts and diagrams
|
||||
|
||||
`pi-charts` is a Feynman-runtime helper. Outside Feynman:
|
||||
|
||||
- **Quantitative comparisons**: output Mermaid bar/line/pie charts, or write CSV and ask the user to render. Do not invent charts.
|
||||
- **Architecture / pipeline diagrams**: Mermaid `graph TD` / `flowchart`.
|
||||
- **Every figure** needs a provenance-bearing caption naming the source.
|
||||
|
||||
## Preview / render
|
||||
|
||||
`/preview` is Feynman-specific. Fallbacks:
|
||||
|
||||
```bash
|
||||
# macOS
|
||||
open <file.md> # opens in default app
|
||||
open <file.pdf>
|
||||
|
||||
# Linux
|
||||
xdg-open <file.md>
|
||||
xdg-open <file.pdf>
|
||||
|
||||
# PDF export (cross-platform)
|
||||
pandoc <file.md> -o <file.pdf>
|
||||
```
|
||||
|
||||
## Session history search
|
||||
|
||||
Feynman stores transcripts at `~/.feynman/sessions/*.jsonl`. Other runtimes:
|
||||
|
||||
| Platform | Session store |
|
||||
|---|---|
|
||||
| Claude Code | `~/.claude/projects/<hash-of-cwd>/*.jsonl` |
|
||||
| OpenCode | `~/.local/share/opencode/session/` |
|
||||
|
||||
Search with `grep -ril "<query>" <store>` or the platform's session-search command.
|
||||
|
||||
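A hedged sketch of the `grep` search wrapped so it can cover several stores at once. The function name `search_sessions` is hypothetical, and the store paths passed to it are whatever applies on the host. The query is treated as a regular expression; add `-F` to `grep` for literal matching.

```bash
# Hypothetical wrapper: list transcript files that mention a query.
search_sessions() (
  query=$1; shift
  for store in "$@"; do
    # Skip stores that do not exist on this machine.
    [ -d "$store" ] || continue
    # -r recurse, -i ignore case, -l print matching file names only.
    grep -ril -- "$query" "$store" || true
  done
)
```

Usage: `search_sessions "scaling laws" ~/.claude/projects ~/.local/share/opencode/session`.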
## Paper search
|
||||
|
||||
`alpha` CLI (alphaXiv-backed) is platform-agnostic — install via `pip install alpha-hub` or the upstream installer. When `alpha` is unavailable, fall back to:
|
||||
|
||||
- `WebSearch` + arXiv abstract page fetch (Claude Code)
|
||||
- `webfetch` + arXiv (OpenCode)
|
||||
- Semantic Scholar API / OpenAlex API for programmatic paper search
|
||||
|
||||
## Cross-session persistence (plans, memory)
|
||||
|
||||
Feynman has a `memory_remember` tool that lets a workflow stash a plan or artifact under a stable key (e.g. `deepresearch.<slug>.plan`) so a later session can recover it. Outside Feynman:
|
||||
|
||||
| Platform | Equivalent |
|
||||
|---|---|
|
||||
| Claude Code | `auto-memory` system (`~/.claude/projects/<hash>/memory/`): write `<slug>-plan.md` and add a one-line pointer to `MEMORY.md` |
|
||||
| OpenCode | No first-party equivalent; use filesystem (`outputs/.plans/<slug>.md` is already the canonical location) |
|
||||
| Any runtime | Filesystem is the lowest-common-denominator — `outputs/.plans/<slug>.md` survives session boundaries |
|
||||
|
||||
Rule: always write the plan to disk first (`outputs/.plans/<slug>.md`). If platform-native memory exists, also mirror a pointer there. Never rely on memory alone.
|
||||
|
||||
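The disk-first rule could look like the sketch below. `save_plan` and the `MEMORY_DIR` variable are hypothetical names; neither runtime sets `MEMORY_DIR` for you, so point it at the platform memory directory yourself if one exists.

```bash
# Hypothetical helper: persist a plan read from stdin, then optionally
# leave a pointer in a platform memory directory (MEMORY_DIR, assumed).
save_plan() (
  slug=$1
  plan_file="outputs/.plans/$slug.md"
  mkdir -p outputs/.plans || exit 1
  # Disk first: stdin becomes the canonical plan file.
  cat > "$plan_file" || exit 1
  # Mirror only a pointer, and only when a memory directory is configured.
  if [ -n "${MEMORY_DIR:-}" ] && [ -d "$MEMORY_DIR" ]; then
    printf '%s %s plan: %s\n' '-' "$slug" "$plan_file" >> "$MEMORY_DIR/MEMORY.md"
  fi
  printf '%s\n' "$plan_file"
)
```

Usage: `save_plan my-topic < draft-plan.md`. The memory entry holds a path, not the plan body, so the file under `outputs/.plans/` stays the single source of truth.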
## Background process inspection
|
||||
|
||||
Outside Feynman, replace the `process` tool with standard shell commands:
|
||||
|
||||
```bash
|
||||
# running processes
|
||||
ps auxf
|
||||
pgrep -fa <pattern>
|
||||
|
||||
# cron / systemd-timers
|
||||
crontab -l
|
||||
systemctl --user list-timers
|
||||
```
|
||||
|
||||
Claude Code also has `Monitor` for streaming background-command output.
|
||||
53
personas/_shared/feynman-skills/alpha-research/SKILL.md
Normal file
@@ -0,0 +1,53 @@
|
||||
---
|
||||
name: alpha-research
|
||||
description: Search, read, and query research papers via the `alpha` CLI (alphaXiv-backed). Use when the user asks about academic papers, wants to find research on a topic, needs to read a specific paper, ask questions about a paper, inspect a paper's code repository, or manage paper annotations.
|
||||
allowed-tools: Bash(alpha:*)
|
||||
---
|
||||
|
||||
# Alpha Research CLI
|
||||
|
||||
Use the `alpha` CLI via bash for all paper research operations. Platform-agnostic — works in Claude Code, OpenCode, or any shell.
|
||||
|
||||
## Install
|
||||
|
||||
```bash
|
||||
pip install alpha-hub
|
||||
alpha login # authenticate with alphaXiv
|
||||
alpha status # verify auth
|
||||
```
|
||||
|
||||
If `alpha` is unavailable, fall back to `WebSearch` (Claude Code) / `webfetch` (OpenCode) against `arxiv.org` or `semanticscholar.org`.
|
||||
|
||||
## Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `alpha search "<query>"` | Search papers. Prefer `--mode semantic` by default; use `--mode keyword` only for exact-term lookup and `--mode agentic` for broader retrieval. |
|
||||
| `alpha get <arxiv-id-or-url>` | Fetch paper content and any local annotation |
|
||||
| `alpha get --full-text <arxiv-id>` | Get raw full text instead of AI report |
|
||||
| `alpha ask <arxiv-id> "<question>"` | Ask a question about a paper's PDF |
|
||||
| `alpha code <github-url> [path]` | Read files from a paper's GitHub repo. Use `/` for overview |
|
||||
| `alpha annotate <paper-id> "<note>"` | Save a persistent annotation on a paper |
|
||||
| `alpha annotate --clear <paper-id>` | Remove an annotation |
|
||||
| `alpha annotate --list` | List all annotations |
|
||||
|
||||
## Examples
|
||||
|
||||
```bash
|
||||
alpha search "transformer scaling laws"
|
||||
alpha search --mode agentic "efficient attention mechanisms for long context"
|
||||
alpha get 2106.09685
|
||||
alpha ask 2106.09685 "What optimizer did they use?"
|
||||
alpha code https://github.com/karpathy/nanoGPT src/model.py
|
||||
alpha annotate 2106.09685 "Key paper on LoRA — revisit for adapter comparison"
|
||||
```
|
||||
|
||||
## When to use
|
||||
|
||||
- Academic paper search, reading, Q&A → `alpha`
|
||||
- Current topics (products, releases, docs) → web search tools
|
||||
- Mixed topics → combine both
|
||||
|
||||
## PDF fetch warning
|
||||
|
||||
`alpha get --full-text` can crash on malformed PDFs. Prefer metadata / abstracts / HTML for routine work; only pull full text when the user asks for a deep read.
|
||||
95
personas/_shared/feynman-skills/autoresearch/SKILL.md
Normal file
@@ -0,0 +1,95 @@
|
||||
---
|
||||
name: autoresearch
|
||||
description: Autonomous experiment loop that tries ideas, measures results, keeps what works, and discards what doesn't. Use when the user asks to optimize a metric, run an experiment loop, improve performance iteratively, or automate benchmarking.
|
||||
allowed-tools: Bash(git:*), Bash(docker:*), Bash(modal:*), Bash(runpodctl:*)
|
||||
---
|
||||
|
||||
# Autoresearch — Autonomous Optimization Loop
|
||||
|
||||
Run an iterative optimize-measure-commit loop against a user-chosen metric. The loop edits code, runs a benchmark, keeps commits that improve the metric, and reverts the rest.
|
||||
|
||||
> **Upstream note.** Feynman ships `pi-autoresearch` with the `init_experiment` / `run_experiment` / `log_experiment` tools. This skill documents the same loop as a generic procedure so it works in Claude Code and OpenCode without those tools. When `pi-autoresearch` is present, delegate to it; otherwise implement the loop with git + bash.
|
||||
|
||||
## Step 1 — Gather requirements
|
||||
|
||||
If `autoresearch.md` and `autoresearch.jsonl` already exist in the workspace, ask the user whether to **resume** or **start fresh**. If `CHANGELOG.md` exists, read the most recent relevant entries before resuming.
|
||||
|
||||
Otherwise, collect from the user before doing anything else:
|
||||
|
||||
- **What to optimize** — test speed, bundle size, training loss, build time, etc.
|
||||
- **Benchmark command** — the exact shell command that produces the metric
|
||||
- **Metric** — name, unit, and direction (lower-is-better or higher-is-better)
|
||||
- **Files in scope** — which files the loop is allowed to modify
|
||||
- **Max iterations** — default 20
|
||||
|
||||
## Step 2 — Pick an environment
|
||||
|
||||
Ask the user where to run iterations:
|
||||
|
||||
- **Local** — current working directory
|
||||
- **New git branch** — create a branch so `main` stays clean
|
||||
- **Virtual environment** — isolated venv/conda first
|
||||
- **Docker** — run iterations inside a container (see `docker` skill)
|
||||
- **Modal** — serverless GPU; stateless burst (see `modal-compute` skill)
|
||||
- **RunPod** — persistent GPU pod with SSH (see `runpod-compute` skill)
|
||||
|
||||
Do not proceed without a clear answer.
|
||||
|
||||
## Step 3 — Confirm
|
||||
|
||||
Before starting the loop, present the full plan:
|
||||
|
||||
```
|
||||
Optimization target: <metric> (<direction>)
|
||||
Benchmark command: <command>
|
||||
Files in scope: <files>
|
||||
Environment: <chosen environment>
|
||||
Max iterations: <N>
|
||||
```
|
||||
|
||||
Wait for explicit approval. No silent starts.
|
||||
|
||||
## Step 4 — Run the loop
|
||||
|
||||
Initialize session files:
|
||||
|
||||
- `autoresearch.md` — human-readable running log (one section per iteration)
|
||||
- `autoresearch.sh` — the benchmark command, committed so it's reproducible
|
||||
- `autoresearch.jsonl` — one JSON record per iteration: `{iter, diff_ref, metric_value, kept, duration_s, notes}`
|
||||
|
||||
Run the **baseline** once and record the metric. This is iteration 0.
|
||||
|
||||
Then loop until `max_iterations` or user interruption:
|
||||
|
||||
1. **Propose a change** — pick one hypothesis (fewer deps, tighter loop, different algo, different config) based on what you've learned from prior iterations. State it in one sentence before editing.
|
||||
2. **Edit** the files in scope.
|
||||
3. **Commit** with a descriptive message (`autoresearch: iter N — <hypothesis>`).
|
||||
4. **Run the benchmark** (`bash autoresearch.sh`), capture output and wall-clock time.
|
||||
5. **Decide**:
|
||||
- If the metric improved (per direction): **keep** the commit.
|
||||
- Otherwise: `git revert` the commit, or `git reset --hard HEAD~1` if the commit is still local (not pushed or shared).
|
||||
6. **Log** the iteration to `autoresearch.md` + `autoresearch.jsonl`.
|
||||
7. After meaningful milestones, append a concise entry to `CHANGELOG.md` summarizing what changed, the metric movement, and the next hypothesis.
|
||||
|
||||
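The decide and log steps of the loop can be sketched with two small helpers. Both names (`improved`, `log_iteration`) are hypothetical, and the `lower` / `higher` direction strings are an assumed encoding of lower-is-better and higher-is-better. The JSON fields follow the record shape listed for `autoresearch.jsonl`.

```bash
# improved DIRECTION BEST NEW -> exit 0 when NEW beats BEST.
# DIRECTION is "lower" or "higher"; awk handles the float comparison.
improved() {
  awk -v d="$1" -v best="$2" -v new="$3" 'BEGIN {
    if (d == "lower")  exit !(new + 0 < best + 0)
    if (d == "higher") exit !(new + 0 > best + 0)
    exit 2   # unknown direction
  }'
}

# log_iteration ITER DIFF_REF METRIC KEPT DURATION_S NOTES
# Appends one record to autoresearch.jsonl. NOTES must not contain
# double quotes or backslashes; this sketch does no JSON escaping.
log_iteration() {
  printf '{"iter":%s,"diff_ref":"%s","metric_value":%s,"kept":%s,"duration_s":%s,"notes":"%s"}\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" >> autoresearch.jsonl
}

# Usage inside the loop (not executed here):
#   if improved lower "$best" "$metric"; then best=$metric; kept=true
#   else git revert --no-edit HEAD; kept=false; fi
#   log_iteration "$i" "$(git rev-parse --short HEAD)" "$metric" "$kept" "$secs" "$hypothesis"
```

A tie counts as "not improved" in this sketch, so an iteration that leaves the metric unchanged is reverted.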
## Step 5 — When to stop
|
||||
|
||||
- `max_iterations` reached
|
||||
- The metric plateaus for 3+ iterations
|
||||
- The user interrupts
|
||||
- You run out of clearly-motivated hypotheses (don't flail)
|
||||
|
||||
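One possible reading of "plateaus for 3+ iterations" is "no new best value in the last three iterations". The sketch below implements that interpretation only; the helper name `plateaued` and the `lower` / `higher` direction strings are assumptions.

```bash
# plateaued DIRECTION WINDOW BASELINE METRIC...
# Exit 0 when the last WINDOW metrics produced no new best value.
plateaued() (
  direction=$1; window=$2; shift 2
  printf '%s\n' "$@" | awk -v d="$direction" -v w="$window" '
    NR == 1 { best = $1 + 0; stale = 0; next }   # baseline (iteration 0)
    {
      v = $1 + 0
      if ((d == "lower" && v < best) || (d == "higher" && v > best)) {
        best = v; stale = 0      # new best resets the counter
      } else {
        stale++                  # no improvement this iteration
      }
    }
    END { exit !(stale >= w) }'
)
```

Usage: `plateaued lower 3 10 9 9.1 9 9.2 && echo "stop: plateau"`.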
## Step 6 — Final report
|
||||
|
||||
When the loop stops, write a short summary: starting metric, ending metric, which hypotheses helped, which didn't, and what the next direction would be. Save to `outputs/<slug>-autoresearch-summary.md`.
|
||||
|
||||
## Subcommands (Feynman parity)
|
||||
|
||||
- `autoresearch <text>` — start or resume the loop
|
||||
- `autoresearch off` — stop the loop, keep data
|
||||
- `autoresearch clear` — delete all state and start fresh
|
||||
|
||||
## Key invariants
|
||||
|
||||
- **No silent rewrites.** Every iteration's metric movement must be traceable to a commit.
|
||||
- **No invented results.** If the benchmark fails, log the failure as iteration data; don't pretend it succeeded.
|
||||
- **No config drift.** The benchmark command must be stable across iterations — if it needs to change, that's a new session.
|
||||
45
personas/_shared/feynman-skills/contributing/SKILL.md
Normal file
@@ -0,0 +1,45 @@
|
||||
---
|
||||
name: contributing
|
||||
description: Contribute changes to the upstream Feynman repository. Use when the task is to add features, fix bugs, update prompts or skills, change install or release behavior, improve docs, or prepare a focused PR against the Feynman project itself.
|
||||
---
|
||||
|
||||
# Contributing to Upstream Feynman
|
||||
|
||||
> **Scope note.** This skill applies only when you are actively working inside a clone of the upstream Feynman repository (https://feynman.is). If you are using these skills in another runtime (Claude Code, OpenCode, your own research workspace), skip this skill — the contributing targets below don't exist here.
|
||||
|
||||
Read `CONTRIBUTING.md` and `AGENTS.md` at the Feynman repo root before making changes. Those two files are the source of truth; this skill is the short index.
|
||||
|
||||
## When this applies
|
||||
|
||||
- CLI or runtime changes in `src/`
|
||||
- prompt changes in `prompts/`
|
||||
- bundled skill changes in `skills/`
|
||||
- subagent behavior changes in `.feynman/agents/`
|
||||
- install, packaging, or release changes in `scripts/`, `README.md`, or website docs
|
||||
|
||||
## Minimum local checks before claiming a change is done
|
||||
|
||||
```bash
|
||||
npm test
|
||||
npm run typecheck
|
||||
npm run build
|
||||
```
|
||||
|
||||
If the docs site changed, also validate `website/`.
|
||||
|
||||
## Release-sensitive changes
|
||||
|
||||
When changing release-sensitive behavior, verify that these stay aligned:
|
||||
|
||||
- `.nvmrc`
|
||||
- package `engines`
|
||||
- runtime guards (version checks in CLI entry points)
|
||||
- install docs (`README.md`, install script)
|
||||
|
||||
Changes that touch any of these should go in one atomic PR with a CHANGELOG entry.
|
||||
|
||||
## PR discipline
|
||||
|
||||
- One topic per PR. Don't bundle a prompt fix with a runtime refactor.
|
||||
- Tests for any new public behavior.
|
||||
- Update `prompts/` and the corresponding `skills/<name>/SKILL.md` together — a prompt change without a skill pointer update leaves callers out of sync.
|
||||
208
personas/_shared/feynman-skills/deep-research/SKILL.md
Normal file
@@ -0,0 +1,208 @@
|
||||
---
|
||||
name: deep-research
|
||||
description: Run a thorough, source-heavy investigation on any topic. Use when the user asks for deep research, a comprehensive analysis, an in-depth report, or a multi-source investigation. Produces a cited research brief with provenance tracking.
|
||||
---
|
||||
|
||||
# Deep Research
|
||||
|
||||
Execute a source-heavy investigation on a topic and produce a durable, cited brief with a provenance sidecar. This is an **execution skill**, not an explainer — your first actions should be tool calls that create directories and write the plan artifact.
|
||||
|
||||
## Do not
|
||||
|
||||
- Do not answer by describing the protocol.
|
||||
- Do not restate or summarize these instructions in chat.
|
||||
- Do not stop after planning — continue immediately through gathering and drafting.
|
||||
- Do not ask the user for confirmation unless they explicitly requested plan review.
|
||||
- Do not end with chat-only output. Every run leaves artifacts on disk.
|
||||
|
||||
## Subagent mapping
|
||||
|
||||
See `../_platform-mapping.md`. Roles used: `researcher` (evidence gathering), `verifier` (URL + citation verification), `reviewer` (adversarial review). Dispatch via `Task` tool (Claude Code) or `task` tool (OpenCode).
|
||||
|
||||
## Required artifacts
|
||||
|
||||
Derive a short slug from the topic (lowercase, hyphenated, no filler words, ≤5 words).
|
||||
|
||||
Every run must leave these files on disk:
|
||||
|
||||
- `outputs/.plans/<slug>.md`
|
||||
- `outputs/.drafts/<slug>-draft.md`
|
||||
- `outputs/.drafts/<slug>-cited.md`
|
||||
- `outputs/<slug>.md` (or `papers/<slug>.md` for paper-style briefs)
|
||||
- `outputs/<slug>.provenance.md` (or `papers/<slug>.provenance.md`)
|
||||
|
||||
If any capability fails, continue in **degraded mode** and still write a blocked/partial final output and provenance sidecar. Set `Verification: BLOCKED` when verification could not complete. Never end with only an explanation in chat.
|
||||
|
||||
## Step 1 — Plan
|
||||
|
||||
Create `outputs/.plans/<slug>.md` immediately. Required sections:
|
||||
|
||||
- **Key questions** — what the brief must answer
|
||||
- **Evidence needed** — for each question, what kind of source satisfies it
|
||||
- **Scale decision** — direct-search OR subagent-delegated (see Step 2)
|
||||
- **Task ledger** — one row per sub-question; status `pending | done | blocked | superseded`
|
||||
- **Verification log** — critical claims that will need citation verification
|
||||
- **Decision log** — key calls made during the run
|
||||
|
||||
Make the scale decision before assigning owners. For a narrow "what is X" explainer, the plan must use lead-owned direct search only — do not allocate researcher subagents in the task ledger.
|
||||
|
||||
After writing the plan, continue immediately. Do not pause for approval.
|
||||
|
||||
**Optional cross-session persistence.** If the runtime has a memory primitive (Claude Code `auto-memory`, a `memory_remember` tool, or equivalent), also mirror the plan there under key `deepresearch.<slug>.plan`. If no such primitive exists, continue — the disk file is the canonical copy. See `../_platform-mapping.md`.
|
||||
|
||||
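A minimal scaffold for the plan artifact, as a sketch. `init_plan` is a hypothetical name, the ledger columns are an assumed layout, and the choice to leave an existing plan untouched is an assumption about resume behavior rather than something the upstream skill specifies.

```bash
# Hypothetical scaffold for Step 1: create the directories and a plan
# skeleton with the required sections. Prints the plan path.
init_plan() (
  slug=$1
  mkdir -p outputs/.plans outputs/.drafts || exit 1
  plan="outputs/.plans/$slug.md"
  # Assumption: an existing plan belongs to an earlier run; keep it.
  if [ -e "$plan" ]; then
    printf '%s\n' "$plan"
    exit 0
  fi
  cat > "$plan" <<EOF
# Plan: $slug

## Key questions

## Evidence needed

## Scale decision

## Task ledger

| Task | Owner | Status |
|---|---|---|

## Verification log

## Decision log
EOF
  printf '%s\n' "$plan"
)
```

Usage: `plan=$(init_plan "$slug")`, then fill in the sections before gathering evidence.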
## Step 2 — Scale
|
||||
|
||||
**Use direct search for:**
|
||||
|
||||
- Single fact or narrow question, including "what is X" explainers
|
||||
- Work you can answer with 3–10 tool calls
|
||||
|
||||
For "what is X" explainer topics, **do not spawn researcher subagents** unless the user explicitly asks for comprehensive coverage, current landscape, benchmarks, or production deployment. Don't inflate a simple explainer into a multi-agent survey.
|
||||
|
||||
**Use subagents only when decomposition clearly helps:**
|
||||
|
||||
- Direct comparison of 2–3 items: 2 `researcher` subagents
|
||||
- Broad survey or multi-faceted topic: 3–4 `researcher` subagents
|
||||
- Complex multi-domain research: 4–6 `researcher` subagents
|
||||
|
||||
## Step 3 — Gather evidence
|
||||
|
||||
**PDF warning.** Avoid crash-prone PDF parsing. Do not fetch `.pdf` URLs unless the user explicitly asked for PDF extraction. Prefer paper metadata, abstracts, HTML pages, official docs, and web snippets. If only a PDF exists, cite its URL from search metadata and mark full-text parsing as blocked.
|
||||
|
||||
**If direct search was chosen:**
|
||||
|
||||
- Skip subagent spawning entirely.
|
||||
- Search and fetch sources yourself using `WebSearch` / `WebFetch` (Claude Code) or equivalents.
|
||||
- Use **at least 3 distinct queries**, covering definition/history, mechanism/formula, and current usage/comparison (when relevant).
|
||||
- Record the exact search terms used in `<slug>-research-direct.md`.
|
||||
- Write notes to `<slug>-research-direct.md`.
|
||||
- Continue to synthesis.
|
||||
|
||||
**If subagents were chosen:**
|
||||
|
||||
- Write a per-researcher brief first: `outputs/.plans/<slug>-T1.md`, `outputs/.plans/<slug>-T2.md`, etc.
|
||||
- Keep the subagent dispatch payload small and valid — no multi-paragraph instructions inside the JSON.
|
||||
- Always set `failFast: false` if your runtime exposes it.
|
||||
- Do not name exact tool commands in subagent tasks unless those tool names are visible in the current tool set.
|
||||
- Prefer broad guidance like "use paper search and web search"; if a PDF parser or paper fetch fails, the researcher must continue from metadata, abstracts, and web sources and mark PDF parsing as blocked.
|
||||
|
||||
Example Claude Code dispatch shape (conceptual — adapt to the tool's actual schema):
|
||||
|
||||
```
|
||||
Task(
|
||||
subagent_type="scholar",
|
||||
description="research-web",
|
||||
prompt="Read outputs/.plans/<slug>-T1.md and write <slug>-research-web.md"
|
||||
)
|
||||
Task(
|
||||
subagent_type="scholar",
|
||||
description="research-papers",
|
||||
prompt="Read outputs/.plans/<slug>-T2.md and write <slug>-research-papers.md"
|
||||
)
|
||||
```
|
||||
|
||||
Dispatch independent researchers in parallel (single message, multiple tool-use blocks).
|
||||
|
||||
After evidence gathering, update the plan ledger and verification log. If research failed, record exactly what failed and proceed with a blocked/partial draft.
|
||||
|
||||
## Step 4 — Draft
|
||||
|
||||
**Write the report yourself. Do not delegate synthesis.**
|
||||
|
||||
Save to `outputs/.drafts/<slug>-draft.md`.
|
||||
|
||||
Include:
|
||||
|
||||
- Executive summary
|
||||
- Findings organized by question/theme
|
||||
- Evidence-backed caveats and disagreements
|
||||
- Open questions
|
||||
- No invented sources, results, figures, benchmarks, images, charts, or tables
|
||||
|
||||
**Pre-citation sweep of the draft:**
|
||||
|
||||
- Every critical claim, number, figure, table, or benchmark must map to a source URL, research note, raw artifact path, or command/script output.
|
||||
- Remove or downgrade unsupported claims.
|
||||
- Mark inferences explicitly as inferences.
|
||||
|
||||
## Step 5 — Cite
|
||||
|
||||
**If direct search was chosen (no researcher subagents):**
|
||||
|
||||
- Do citation yourself.
|
||||
- Verify reachable HTML/doc URLs with `WebFetch` or equivalent.
|
||||
- Copy or rewrite the draft to `outputs/.drafts/<slug>-cited.md` with inline citations and a Sources section.
|
||||
- Do not spawn a `verifier` subagent for direct-search runs.
|
||||
|
||||
**If researcher subagents were used:**
|
||||
|
||||
Run the `verifier` subagent after the draft exists. This is mandatory and must complete before any reviewer runs. Do not run verifier and reviewer in parallel.
|
||||
|
||||
Task shape (conceptual):
|
||||
|
||||
```
|
||||
Task(
|
||||
subagent_type="oracle", # or general-purpose with WebFetch
|
||||
description="verify-citations",
|
||||
prompt="Add inline citations to outputs/.drafts/<slug>-draft.md using the research files as source material. Verify every URL is reachable. Write the complete cited brief to outputs/.drafts/<slug>-cited.md."
|
||||
)
|
||||
```
|
||||
|
||||
After the verifier returns, confirm on disk that `outputs/.drafts/<slug>-cited.md` exists. If the verifier wrote elsewhere, find and move the cited file into place.
|
||||
|
||||
## Step 6 — Review
|
||||
|
||||
**If direct search was chosen (no researcher subagents):**
|
||||
|
||||
- Review the cited draft yourself.
|
||||
- Write `<slug>-verification.md` with FATAL / MAJOR / MINOR findings and the checks performed.
|
||||
- Fix FATAL issues before delivery.
|
||||
- Do not spawn a `reviewer` subagent for direct-search runs.
|
||||
|
||||
**If researcher subagents were used:**
|
||||
|
||||
Only after `outputs/.drafts/<slug>-cited.md` exists, run the `reviewer` subagent against it.
|
||||
|
||||
Task shape (conceptual):
|
||||
|
||||
```
|
||||
Task(
|
||||
subagent_type="general-purpose",
|
||||
description="review-cited-draft",
|
||||
prompt="Verify outputs/.drafts/<slug>-cited.md. Flag unsupported claims, logical gaps, single-source critical claims, and overstated confidence. This is a verification pass, not a peer review. Write to <slug>-verification.md."
|
||||
)
|
||||
```
|
||||
|
||||
If the reviewer flags FATAL issues, fix them before delivery and run one more review pass. Note MAJOR issues in Open Questions. Accept MINOR issues.
|
||||
|
||||
**Applying reviewer fixes:** use small localized edits for 1–3 simple corrections. For section rewrites, table rewrites, or more than 3 substantive fixes, read the cited draft and write a corrected full file to `outputs/.drafts/<slug>-revised.md` rather than issuing one giant multi-replacement edit.
|
||||
|
||||
The final candidate is `outputs/.drafts/<slug>-revised.md` if it exists; otherwise `outputs/.drafts/<slug>-cited.md`.
|
||||
|
||||
## Step 7 — Deliver
|
||||
|
||||
Copy the final candidate to:
|
||||
|
||||
- `papers/<slug>.md` for paper-style drafts
|
||||
- `outputs/<slug>.md` for everything else
|
||||
|
||||
Write the provenance sidecar next to it:
|
||||
|
||||
```markdown
|
||||
# Provenance: <topic>
|
||||
|
||||
- **Date:** YYYY-MM-DD
|
||||
- **Rounds:** <number of research rounds>
|
||||
- **Sources consulted:** <count and/or list>
|
||||
- **Sources accepted:** <count and/or list>
|
||||
- **Sources rejected:** <dead, unverifiable, or removed>
|
||||
- **Verification:** PASS | PASS WITH NOTES | BLOCKED
|
||||
- **Plan:** outputs/.plans/<slug>.md
|
||||
- **Research files:** <list>
|
||||
```
|
||||
|
||||
Before responding, verify on disk that all required artifacts exist. If verification could not be completed, set `Verification: BLOCKED` or `PASS WITH NOTES` and list the missing checks.
|
||||
|
||||
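The on-disk check before responding might be sketched like this. `missing_artifacts` is a hypothetical helper; it treats an empty file as missing, which is an assumption layered on top of the required-artifacts list.

```bash
# Hypothetical pre-delivery check: print every required artifact that is
# missing (or empty) for a slug, and exit non-zero if any are.
# $2 is the delivery directory: "outputs" (default) or "papers".
missing_artifacts() (
  slug=$1; dest=${2:-outputs}; status=0
  for f in \
    "outputs/.plans/$slug.md" \
    "outputs/.drafts/$slug-draft.md" \
    "outputs/.drafts/$slug-cited.md" \
    "$dest/$slug.md" \
    "$dest/$slug.provenance.md"
  do
    if [ ! -s "$f" ]; then
      printf '%s\n' "$f"
      status=1
    fi
  done
  exit "$status"
)
```

Usage: `missing_artifacts "$slug" papers || echo "Verification: BLOCKED"`. Any path it prints belongs in the list of missing checks in the provenance sidecar.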
## Final response
|
||||
|
||||
Keep it brief: link the final file, the provenance file, and any blocked checks. Do not restate the report.
|
||||
85
personas/_shared/feynman-skills/docker/SKILL.md
Normal file
@@ -0,0 +1,85 @@
|
||||
---
|
||||
name: docker
|
||||
description: Execute research code inside isolated Docker containers for safe replication, experiments, and benchmarks. Use when the user selects Docker as the execution environment or asks to run code safely, in isolation, or in a sandbox.
|
||||
allowed-tools: Bash(docker:*)
|
||||
---
|
||||
|
||||
# Docker Sandbox
|
||||
|
||||
Run research code inside Docker containers while the host stays clean. The container gets the project files, runs the commands, and results sync back. Works identically in Claude Code and OpenCode.
|
||||
|
||||
## When to use
|
||||
|
||||
- User selects "Docker Sandbox" as the execution environment in `replication` or `autoresearch`
|
||||
- Running untrusted code from a paper's repository
|
||||
- Experiments that install packages or modify system state
|
||||
- Any time the user asks to run something "safely" or "isolated"
|
||||
|
||||
## How it works
|
||||
|
||||
1. Build or pull an appropriate base image for the research code
|
||||
2. Mount the project directory into the container
|
||||
3. Run experiment commands inside the container
|
||||
4. Results write back to the mounted directory
|
||||
|
||||
## Running commands in a container
|
||||
|
||||
For Python research code (most common):
|
||||
|
||||
```bash
|
||||
docker run --rm -v "$(pwd)":/workspace -w /workspace python:3.11 bash -c "
|
||||
pip install -r requirements.txt &&
|
||||
python train.py
|
||||
"
|
||||
```
|
||||
|
||||
For projects with a Dockerfile:
|
||||
|
||||
```bash
|
||||
docker build -t feynman-experiment .
|
||||
docker run --rm -v "$(pwd)/results":/workspace/results feynman-experiment
|
||||
```
|
||||
|
||||
For GPU workloads (requires NVIDIA Container Toolkit):
|
||||
|
||||
```bash
|
||||
docker run --rm --gpus all -v "$(pwd)":/workspace -w /workspace pytorch/pytorch:latest bash -c "
|
||||
pip install -r requirements.txt &&
|
||||
python train.py
|
||||
"
|
||||
```
|
||||
|
||||
## Choosing the base image
|
||||
|
||||
| Research type | Base image |
|
||||
| --- | --- |
|
||||
| Python ML/DL | `pytorch/pytorch:latest` or `tensorflow/tensorflow:latest-gpu` |
|
||||
| Python general | `python:3.11` |
|
||||
| Node.js | `node:20` |
|
||||
| R / statistics | `rocker/r-ver:4` |
|
||||
| Julia | `julia:1.10` |
|
||||
| Multi-language | `ubuntu:24.04` with manual installs |
|
||||
|
||||
## Persistent containers
|
||||
|
||||
For iterative experiments (like `autoresearch`), create a named container instead of a throwaway `--rm` one. Choose a descriptive name based on the experiment:
|
||||
|
||||
```bash
|
||||
docker create --name <name> -v "$(pwd)":/workspace -w /workspace python:3.11 tail -f /dev/null
|
||||
docker start <name>
|
||||
docker exec <name> bash -c "pip install -r requirements.txt"
|
||||
docker exec <name> bash -c "python train.py"
|
||||
```
|
||||
|
||||
This preserves installed packages across iterations. Clean up with:
|
||||
|
||||
```bash
|
||||
docker stop <name> && docker rm <name>
|
||||
```
|
||||
|
||||
## Notes
|
||||
|
||||
- Files written under the mounted `/workspace` appear on the host immediately; the bind mount shares the directory, so there is no separate sync step
|
||||
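- Containers are network-enabled by default. Add `--network none` to cut network access; install dependencies beforehand (for example in the image build), because `pip install` cannot reach the package index without a network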
|
||||
- For GPU access, Docker must be configured with the NVIDIA Container Toolkit
|
||||
- Check availability: `command -v docker`
|
||||
34
personas/_shared/feynman-skills/eli5/SKILL.md
Normal file
@@ -0,0 +1,34 @@
|
||||
---
|
||||
name: eli5
|
||||
description: Explain research, papers, or technical ideas in plain English with minimal jargon, concrete analogies, and clear takeaways. Use when the user says "ELI5 this", asks for a simple explanation of a paper or research result, wants jargon removed, or asks what something technically dense actually means.
|
||||
---
|
||||
|
||||
# ELI5 — Explain Like I'm Five
|
||||
|
||||
Use the `alpha-research` skill first when the user names a specific paper, arXiv id, DOI, or paper URL.
|
||||
|
||||
If the user gives only a topic, identify 1–3 representative papers and anchor the explanation around the clearest or most important one.
|
||||
|
||||
## Output structure
|
||||
|
||||
- **One-Sentence Summary** — the idea in one sentence, no jargon
|
||||
- **Big Idea** — the insight that matters, in plain language
|
||||
- **How It Works** — mechanism, step by step, with one good analogy
|
||||
- **Why It Matters** — concrete consequence for the reader
|
||||
- **What To Be Skeptical Of** — limitations the paper itself flags, and common misreadings
|
||||
- **If You Remember 3 Things** — three sentences, each ≤15 words
|
||||
|
||||
## Guidelines
|
||||
|
||||
- Use short sentences and concrete words.
|
||||
- Define jargon immediately or remove it.
|
||||
- Prefer one good analogy over several weak ones.
|
||||
- Separate what the paper actually shows from speculation or interpretation.
|
||||
- Keep the explanation inline in the conversation unless the user explicitly asks to save it as an artifact.
|
||||
- Do not invent results, benchmarks, or history. If you are unsure, say so instead of smoothing it over.
|
||||
|
||||
## When to save to disk
|
||||
|
||||
Only when the user asks. Otherwise inline is fine — ELI5 is a reading aid, not an artifact.
|
||||
|
||||
If saving: `outputs/<slug>-eli5.md` where `<slug>` is a short hyphenated version of the paper/topic name.
|
||||
79
personas/_shared/feynman-skills/jobs/SKILL.md
Normal file
@@ -0,0 +1,79 @@
|
||||
---
|
||||
name: jobs
|
||||
description: Inspect active background research work including running processes, scheduled follow-ups, and pending tasks. Use when the user asks what's running, checks on background work, or wants to see scheduled jobs.
|
||||
allowed-tools: Bash(ps:*), Bash(pgrep:*), Bash(crontab:*), Bash(systemctl:*)
|
||||
---
|
||||
|
||||
# Jobs
|
||||
|
||||
Inspect active background work — running processes, scheduled follow-ups, and managed subagent tasks. This is an operational status skill, not a workflow launcher.
|
||||
|
||||
## What to inspect
|
||||
|
||||
Summarize the following categories. Skip any that are empty — don't pad the output.
|
||||
|
||||
### 1. Active foreground/background processes
|
||||
|
||||
```bash
# background jobs of the current shell session only
jobs
# POSIX ps options; GNU-only `--user` and `cmd` fail on macOS
ps -u "$(whoami)" -o pid,etime,args | head -30
pgrep -fa "python|node|modal|runpodctl|docker" || true
```
|
||||
|
||||
### 2. Scheduled / recurring work
|
||||
|
||||
Claude Code:
|
||||
- `CronList` — if the `schedule` skill is active, it lists registered triggers.
|
||||
- Any `ScheduleWakeup` calls pending in the current session.
|
||||
|
||||
OpenCode:
|
||||
- System `cron` invoking `opencode run`:
|
||||
```bash
|
||||
crontab -l 2>/dev/null | grep -i opencode || true
|
||||
```
|
||||
- systemd user timers:
|
||||
```bash
|
||||
systemctl --user list-timers --all 2>/dev/null || true
|
||||
```
|
||||
|
||||
### 3. Running containers / remote pods
|
||||
|
||||
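```bash
# run each listing only if the CLI is installed; stay quiet otherwise
command -v docker >/dev/null 2>&1 && docker ps || true
command -v runpodctl >/dev/null 2>&1 && runpodctl get pod || true
command -v modal >/dev/null 2>&1 && modal app list || true
```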
|
||||
|
||||
### 4. Managed subagent tasks (Claude Code only)
|
||||
|
||||
Use `TaskList` to see any `Task`-tool subagents still in flight.
|
||||
|
||||
## Summary format
|
||||
|
||||
```markdown
|
||||
# Active work
|
||||
|
||||
## Processes
|
||||
- PID 12345 — `python train.py` — running 00:23:11
|
||||
|
||||
## Scheduled
|
||||
- cron: `0 9 * * * opencode run "/watch attention papers"` — next fire tomorrow 09:00
|
||||
- ScheduleWakeup pending: fire in 12 min — "checking long bun build"
|
||||
|
||||
## Remote compute
|
||||
- Modal: app `experiment` running, 1 A100
|
||||
- RunPod: pod `xxx` stopped, volume retained
|
||||
|
||||
## Failures needing attention
|
||||
- (none)
|
||||
|
||||
## Next command
|
||||
- To inspect the long-running training: `tail -f ~/logs/train.log`
|
||||
```
|
||||
|
||||
## What NOT to do
|
||||
|
||||
- Don't kill processes without confirming with the user.
|
||||
- Don't return massive `ps` dumps — filter to relevant processes.
|
||||
- If nothing is running, say "nothing active" — don't invent jobs.
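Filtering a `ps` listing down to the relevant processes, as the rules above ask, might look like the following sketch. The keyword pattern mirrors the `pgrep` pattern used in section 1; the function names and the expected input shape (a header line followed by one process per line) are illustrative assumptions.

```python
import re

# Same keywords as the pgrep pattern in section 1.
RELEVANT = re.compile(r"python|node|modal|runpodctl|docker")


def relevant_processes(ps_output, limit=30):
    """Keep the header plus only those ps lines that mention a research tool."""
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    kept = [row for row in rows if RELEVANT.search(row)]
    return [header] + kept[:limit] if kept else []


def summarize(ps_output):
    """Return the filtered listing, or the literal 'nothing active'."""
    rows = relevant_processes(ps_output)
    return "\n".join(rows) if rows else "nothing active"
```

Returning the fixed string `nothing active` for an empty result keeps the summary honest when no matching process exists.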
|
||||
108
personas/_shared/feynman-skills/literature-review/SKILL.md
Normal file
@@ -0,0 +1,108 @@
|
||||
---
|
||||
name: literature-review
|
||||
description: Run a literature review using paper search and primary-source synthesis. Use when the user asks for a lit review, paper survey, state of the art, or academic landscape summary on a research topic.
|
||||
---
|
||||
|
||||
# Literature Review
|
||||
|
||||
Produce a grounded, source-cited survey of the state of the art on a topic. Output is a Markdown artifact in `outputs/` with a `.provenance.md` sidecar.
|
||||
|
||||
Derive a short slug from the topic (lowercase, hyphens, ≤5 words). All files in this run use the slug as a prefix.
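One way to apply the slug rule (lowercase, hyphens, at most five words) is sketched below. The function name `derive_slug` is hypothetical, and the choice to drop punctuation and non-ASCII characters is an assumption; the skill itself only states the three constraints.

```python
import re


def derive_slug(topic, max_words=5):
    """Lowercase, hyphen-separated slug capped at `max_words` words."""
    # Treat every run of ASCII letters or digits as one word; drop the rest.
    words = re.findall(r"[a-z0-9]+", topic.lower())
    return "-".join(words[:max_words])


print(derive_slug("Sparse Mixture-of-Experts Routing in LLMs"))
# prints: sparse-mixture-of-experts-routing
```

Because hyphenated terms are split into separate words, a title such as "Mixture-of-Experts" uses three of the five available words.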
|
||||
|
||||
## Subagent mapping
|
||||
|
||||
See `../_platform-mapping.md`. In short:
|
||||
|
||||
| Upstream role | Claude Code | OpenCode |
|
||||
|---|---|---|
|
||||
| `researcher` | `Task` tool, `subagent_type: scholar` or `general-purpose` | `task` tool, `scholar` or `general` agent |
|
||||
| `verifier` | `Task` tool, `subagent_type: oracle` or `general-purpose` (with WebFetch) | `task` tool, `oracle` or `general` with `webfetch: allow` |
|
||||
| `reviewer` | `Task` tool, `subagent_type: general-purpose` with review-only prompt | `task` tool, general agent with review-only prompt |
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Plan
|
||||
|
||||
Write `outputs/.plans/<slug>.md`. Include:
|
||||
|
||||
- Key questions the review must answer
|
||||
- Source types to search (arXiv papers, web, repos, conference proceedings)
|
||||
- Time period (e.g. "last 3 years" or explicit year range)
|
||||
- Expected section structure
|
||||
- Task ledger (one row per sub-question; status = `pending` / `done` / `blocked` / `superseded`)
|
||||
- Verification log (critical claims that will need citation verification)
|
||||
|
||||
Summarize the plan to the user in 2–3 sentences. Continue immediately unless the user asks for plan review.
|
||||
|
||||
### 2. Gather
|
||||
|
||||
**Narrow topic (2–3 obvious angles):** search directly yourself — no subagent needed. Use the `alpha-research` skill for paper search.
|
||||
|
||||
**Wide topic (broad survey):** dispatch 2–4 `researcher` subagents in parallel. Give each a brief written to `outputs/.plans/<slug>-T<N>.md` describing exactly what that subagent should cover. Each researcher writes its notes to `<slug>-research-<topic>.md` on disk and does not return them inline.
|
||||
|
||||
Rules:
|
||||
- No silent skipping — if a researcher can't cover an assigned question, it must mark the ledger entry `blocked` with the reason.
|
||||
- No PDF parsing unless the user asked for it. Prefer metadata, abstracts, HTML docs.
|
||||
- At least 3 distinct queries when researching directly, covering definition/history, mechanism, and current usage.
|
||||
|
||||
### 3. Synthesize
|
||||
|
||||
You (the lead) write the draft, not a subagent. Save to `outputs/.drafts/<slug>-draft.md`.
|
||||
|
||||
Separate clearly:
|
||||
|
||||
- **Consensus** — claims multiple sources agree on
|
||||
- **Disagreements** — explicitly name the split and who sits where
|
||||
- **Open questions** — what the field hasn't settled
|
||||
|
||||
Before handing to the verifier, sweep every strong claim against your verification log. Downgrade anything inferred or single-source-critical.
|
||||
|
||||
### 4. Cite
|
||||
|
||||
Dispatch the `verifier` subagent against `outputs/.drafts/<slug>-draft.md`. Task:
|
||||
|
||||
> Add inline citations to every claim using the research files as source material. Verify each URL is reachable. Write the complete cited brief to `outputs/.drafts/<slug>-cited.md`.
|
||||
|
||||
After the verifier returns, confirm on disk that `outputs/.drafts/<slug>-cited.md` exists. If the verifier wrote elsewhere, move the file into place.
|
||||
|
||||
### 5. Review
|
||||
|
||||
Dispatch the `reviewer` subagent against the cited draft. Task:
|
||||
|
||||
> Check `outputs/.drafts/<slug>-cited.md` for: unsupported claims, logical gaps, zombie sections, single-source critical findings, overstated confidence. Categorize findings as FATAL / MAJOR / MINOR. Write to `<slug>-verification.md`.
|
||||
|
||||
- Fix all FATAL issues before delivery. If you fix FATALs, run one more review pass.
|
||||
- Note MAJOR issues under "Open Questions" in the final draft.
|
||||
- Accept MINOR issues.
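The severity rules above amount to a small triage step, sketched here in Python. The input shape (a list of `(severity, description)` pairs) and the function name `triage` are assumptions; the skill does not prescribe a machine-readable format for the reviewer's findings.

```python
from collections import defaultdict

SEVERITIES = ("FATAL", "MAJOR", "MINOR")


def triage(findings):
    """Group reviewer findings by severity and decide the next step."""
    grouped = defaultdict(list)
    for severity, description in findings:
        if severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {severity}")
        grouped[severity].append(description)
    return {
        "must_fix": grouped["FATAL"],        # fix before delivery
        "open_questions": grouped["MAJOR"],  # note under "Open Questions"
        "accepted": grouped["MINOR"],        # no action required
        # Fixing any FATAL triggers one more review pass.
        "needs_second_review": bool(grouped["FATAL"]),
    }
```

Raising on an unknown severity, rather than silently dropping the finding, follows the same "no silent skipping" spirit as the gathering rules.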
|
||||
|
||||
When applying reviewer fixes: small localized edits for 1–3 corrections; for larger rewrites, write `outputs/.drafts/<slug>-revised.md` instead of making one giant edit call.
|
||||
|
||||
### 6. Deliver
|
||||
|
||||
Copy the final candidate (`<slug>-revised.md` if it exists, else `<slug>-cited.md`) to `outputs/<slug>.md`.
|
||||
|
||||
Write `outputs/<slug>.provenance.md` next to it:
|
||||
|
||||
```markdown
|
||||
# Provenance: <topic>
|
||||
|
||||
- **Date:** YYYY-MM-DD
|
||||
- **Rounds:** <number of research rounds>
|
||||
- **Sources consulted:** <count>
|
||||
- **Sources accepted:** <count>
|
||||
- **Sources rejected:** <dead / unverifiable / removed>
|
||||
- **Verification:** PASS | PASS WITH NOTES | BLOCKED
|
||||
- **Plan:** outputs/.plans/<slug>.md
|
||||
- **Research files:** <list>
|
||||
```
|
||||
|
||||
Before responding, verify on disk that both files exist. Do not stop at an intermediate cited draft.
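The delivery step (pick the revised draft if it exists, otherwise the cited one, then confirm both deliverables are on disk) could be sketched as follows. The function name `deliver` is hypothetical, and it assumes the revised and cited drafts both live in `outputs/.drafts/` as described in the earlier steps. It does not write the provenance sidecar; it only reports whether the file is present.

```python
import shutil
from pathlib import Path


def deliver(slug, outputs="outputs"):
    """Copy the final candidate into place and return any missing deliverables."""
    out = Path(outputs)
    drafts = out / ".drafts"
    revised = drafts / f"{slug}-revised.md"
    cited = drafts / f"{slug}-cited.md"
    # Prefer the revised draft when reviewer fixes produced one.
    candidate = revised if revised.exists() else cited
    if not candidate.exists():
        raise FileNotFoundError(f"no cited or revised draft for {slug!r}")
    final = out / f"{slug}.md"
    shutil.copyfile(candidate, final)
    expected = [final, out / f"{slug}.provenance.md"]
    return [str(path) for path in expected if not path.exists()]
```

An empty return value means both files exist; anything else names the file that still has to be written before responding.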
|
||||
|
||||
## Charts and diagrams
|
||||
|
||||
- **Quantitative comparison across papers** — Mermaid bar chart, or CSV in `outputs/.notes/<slug>-data.csv`. No invented numbers.
|
||||
- **Taxonomies / method pipelines** — Mermaid `graph TD`. Every figure needs a provenance-bearing caption.
|
||||
|
||||
## Final response
|
||||
|
||||
Brief: link the final artifact, the provenance sidecar, and list any blocked checks.
|
||||
64
personas/_shared/feynman-skills/modal-compute/SKILL.md
Normal file
@@ -0,0 +1,64 @@
|
||||
---
|
||||
name: modal-compute
|
||||
description: Run GPU workloads on Modal's serverless infrastructure. Use when the user needs remote GPU compute for training, inference, benchmarks, or batch processing and Modal CLI is available.
|
||||
allowed-tools: Bash(modal:*)
|
||||
---
|
||||
|
||||
# Modal Compute
|
||||
|
||||
Use the `modal` CLI for serverless GPU workloads. No pod lifecycle to manage — write a decorated Python script and run it. Works identically in Claude Code and OpenCode.
|
||||
|
||||
## Setup
|
||||
|
||||
```bash
|
||||
pip install modal
|
||||
modal setup # one-time auth
|
||||
```
|
||||
|
||||
Check availability: `command -v modal`.
|
||||
|
||||
## Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `modal run script.py` | Run a script on Modal (ephemeral) |
|
||||
| `modal run --detach script.py` | Run detached (background) |
|
||||
| `modal deploy script.py` | Deploy persistently |
|
||||
| `modal serve script.py` | Serve with hot-reload (dev) |
|
||||
| `modal shell --gpu a100` | Interactive shell with GPU |
|
||||
| `modal app list` | List apps that are deployed, running, or recently stopped |
|
||||
|
||||
## GPU types
|
||||
|
||||
`T4`, `L4`, `A10G`, `L40S`, `A100`, `A100-80GB`, `H100`, `H200`, `B200`
|
||||
|
||||
Multi-GPU: `"H100:4"` for 4x H100s.
|
||||
|
||||
## Script pattern
|
||||
|
||||
```python
import modal

app = modal.App("experiment")
image = modal.Image.debian_slim(python_version="3.11").pip_install("torch==2.8.0")


@app.function(gpu="A100", image=image, timeout=600)
def train():
    # Imported inside the function so torch is only required in the remote image
    import torch

    # training code here


@app.local_entrypoint()
def main():
    # Runs train() on Modal rather than on the local machine
    train.remote()
```
|
||||
|
||||
## When to use
|
||||
|
||||
- Stateless burst GPU jobs (training, inference, benchmarks)
|
||||
- No persistent state needed between runs
|
||||
- Fast iteration — no pod provisioning delay
|
||||
|
||||
## When NOT to use
|
||||
|
||||
- Long-running experiments needing persistent SSH and filesystem state → use `runpod-compute` instead
|
||||
- Multi-step pipelines that stream intermediate files between stages on the same host
|
||||
98
personas/_shared/feynman-skills/paper-code-audit/SKILL.md
Normal file
@@ -0,0 +1,98 @@
|
||||
---
|
||||
name: paper-code-audit
|
||||
description: Compare a paper's claims against its public codebase. Use when the user asks to audit a paper, check code-claim consistency, verify reproducibility of a specific paper, or find mismatches between a paper and its implementation.
|
||||
---
|
||||
|
||||
# Paper-Code Audit
|
||||
|
||||
Compare a paper's claimed methods, defaults, metrics, and data handling against the actual code. Surface mismatches, omissions, and reproduction risks.
|
||||
|
||||
Derive a slug from the audit target (lowercase, hyphens, ≤5 words).
|
||||
|
||||
## Subagent mapping
|
||||
|
||||
See `../_platform-mapping.md`. Dispatch `researcher` for evidence gathering, `verifier` for citation/URL verification.
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Plan
|
||||
|
||||
Write `outputs/.plans/<slug>.md` with:
|
||||
|
||||
- **Paper** — title, arXiv id or DOI, and the specific version being audited
|
||||
- **Repo** — canonical URL (be explicit about fork vs upstream)
|
||||
- **Claims to check** — numbered list of specific claims (e.g. "claim 3: final layer uses GELU activation")
|
||||
- **Verification approach** — per-claim, how will you check it (grep source, run a specific script, diff configs)
|
||||
|
||||
Summarize the plan briefly, then continue immediately unless the user asked for plan review.
|
||||
|
||||
### 2. Gather evidence
|
||||
|
||||
**Non-trivial audits:** dispatch a `researcher` subagent to pull implementation details from paper sections and linked code. Use the `alpha-research` skill's `alpha code` command (or equivalent repo browsing) to read source files.
|
||||
|
||||
**Small audits (single claim, ≤3 files):** the lead agent gathers directly.
|
||||
|
||||
For each claim, record:
|
||||
|
||||
- The paper section or figure where the claim is made
|
||||
- The code location (file:line or function name) where it should be implemented
|
||||
- What the code actually does
|
||||
|
||||
### 3. Compare
|
||||
|
||||
Organize findings under these buckets:
|
||||
|
||||
| Bucket | Meaning |
|
||||
|---|---|
|
||||
| **MATCH** | Code matches the claim faithfully |
|
||||
| **MISMATCH** | Code contradicts the paper |
|
||||
| **OMITTED** | Claim is in paper but code doesn't implement it |
|
||||
| **UNDOCUMENTED** | Code does something material that isn't in the paper |
|
||||
| **AMBIGUOUS** | Paper's description is too vague to verify against code |
|
||||
| **MISSING CODE** | The referenced module/experiment is not in the public repo |
|
||||
|
||||
### 4. Cite
|
||||
|
||||
For non-trivial audits, dispatch `verifier` against the draft to verify every URL (paper links, repo links, commit hashes) and add inline citations where missing.
|
||||
|
||||
### 5. Deliver
|
||||
|
||||
Save exactly one audit artifact to `outputs/<slug>-audit.md`:
|
||||
|
||||
```markdown
|
||||
# Audit: <paper title>
|
||||
|
||||
**Paper:** <link> (<version/date>)
|
||||
**Repo:** <link> (<commit hash used for audit>)
|
||||
**Date:** YYYY-MM-DD
|
||||
|
||||
## Summary
|
||||
- Claims checked: <N>
|
||||
- MATCH: <n> | MISMATCH: <n> | OMITTED: <n> | UNDOCUMENTED: <n> | AMBIGUOUS: <n> | MISSING CODE: <n>
|
||||
|
||||
## Findings
|
||||
|
||||
### <claim 1>
|
||||
- **Paper says:** <quote or summary> (<section>)
|
||||
- **Code does:** <what you found> (<file:line>)
|
||||
- **Verdict:** MATCH / MISMATCH / OMITTED / …
|
||||
- **Impact on reproducibility:** <brief>
|
||||
|
||||
### <claim 2>
|
||||
...
|
||||
|
||||
## Reproduction risks
|
||||
- <risks ordered by severity>
|
||||
|
||||
## Sources
|
||||
- <paper URL>
|
||||
- <repo URL at audit commit>
|
||||
```
|
||||
|
||||
End with a `Sources` section containing paper and repository URLs pinned to the version audited (commit hash, not `main`).
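The two bullets of the `Summary` section in the template above can be derived mechanically from the per-claim verdicts. The sketch below assumes the verdicts are collected as a plain list of bucket names; the function name `summary_lines` is hypothetical.

```python
from collections import Counter

BUCKETS = ("MATCH", "MISMATCH", "OMITTED", "UNDOCUMENTED", "AMBIGUOUS", "MISSING CODE")


def summary_lines(verdicts):
    """Render the two Summary bullets from a list of per-claim verdicts."""
    unknown = set(verdicts) - set(BUCKETS)
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    counts = Counter(verdicts)
    # Counter returns 0 for buckets that never occurred.
    tally = " | ".join(f"{bucket}: {counts[bucket]}" for bucket in BUCKETS)
    return [f"- Claims checked: {len(verdicts)}", f"- {tally}"]
```

Counting from the findings rather than typing the totals by hand keeps the summary consistent with the per-claim sections.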
|
||||
|
||||
## What NOT to do
|
||||
|
||||
- Don't run the code unless the user explicitly asked for an execution audit. Reading is often enough to find the mismatch.
|
||||
- Don't generalize from `src/models/transformer.py` to "the method" without checking the experiment scripts actually call it.
|
||||
- Don't grade papers. The audit reports what is and isn't in the code; it doesn't pass judgement on the research.
|
||||
100
personas/_shared/feynman-skills/paper-writing/SKILL.md
Normal file
@@ -0,0 +1,100 @@
|
||||
---
|
||||
name: paper-writing
|
||||
description: Turn research findings into a polished paper-style draft with sections, equations, and citations. Use when the user asks to write a paper, draft a report, write up findings, or produce a technical document from collected research.
|
||||
---
|
||||
|
||||
# Paper Writing
|
||||
|
||||
Turn collected research notes into a polished paper-style draft with explicit claims, source-backed evidence, and clean Markdown+LaTeX formatting.
|
||||
|
||||
Derive a slug from the topic (lowercase, hyphens, ≤5 words). All files in this run use the slug as a prefix.
|
||||
|
||||
## Subagent mapping
|
||||
|
||||
See `../_platform-mapping.md`. Use `writer` (drafting) and `verifier` (citation + URL verification).
|
||||
|
||||
## Prerequisites
|
||||
|
||||
This skill assumes research notes already exist — from `deep-research`, `literature-review`, `source-comparison`, or the user's own notes. If research hasn't happened yet, run the appropriate research skill first.
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Outline
|
||||
|
||||
Write `outputs/.plans/<slug>.md` with:
|
||||
|
||||
- **Proposed title**
|
||||
- **Section structure** — title, abstract, problem statement, related work, method/synthesis, evidence/experiments, limitations, conclusion
|
||||
- **Key claims** — numbered list of the strongest claims the paper will make
|
||||
- **Source material** — which research notes or raw artifacts each claim will draw from
|
||||
- **Verification log** — a row per critical claim, figure, and calculation; populated during drafting
|
||||
|
||||
Briefly summarize the outline to the user, then continue immediately unless they asked for outline review.
|
||||
|
||||
### 2. Draft
|
||||
|
||||
**Option A — subagent-driven (preferred when the notes are dense and the outline is solid):** dispatch a `writer` subagent with the outline and note paths. Task:
|
||||
|
||||
> Write a paper-style draft from the outline in `outputs/.plans/<slug>.md` and the notes at `<slug>-research-*.md`. Save to `outputs/.drafts/<slug>-draft.md`. Include all sections from the outline. Use LaTeX where equations materially help. Do not invent results.
|
||||
|
||||
**Option B — lead-agent drafting:** write the draft yourself, saving to `outputs/.drafts/<slug>-draft.md`.
|
||||
|
||||
Section requirements (minimum):
|
||||
|
||||
- **Title** — descriptive, not clickbait
|
||||
- **Abstract** — ≤200 words; problem, approach, headline result
|
||||
- **Problem statement** — what's unsolved, why it matters
|
||||
- **Related work** — honest positioning; no straw-manning
|
||||
- **Method / synthesis** — clean exposition of what you're claiming
|
||||
- **Evidence / experiments** — source-backed results; no invented tables
|
||||
- **Limitations** — explicit, not buried
|
||||
- **Conclusion** — what's established, what's next
|
||||
|
||||
### 3. Guardrails while drafting
|
||||
|
||||
- **No invented results.** If evidence is missing, leave a placeholder (`[TODO: verify claim against benchmark X]`) or describe the experiment you'd need to run, rather than fabricating.
|
||||
- **Every number, figure, and benchmark** must map to a source URL, research note, or script output.
|
||||
- **Tentative results** — mark them explicitly ("preliminary evidence suggests…" rather than "we show…").
|
||||
- **Charts** — Mermaid or source-backed CSV, never invented.
|
||||
|
||||
### 4. Self-sweep before handoff
|
||||
|
||||
Before calling the verifier, sweep the draft:
|
||||
|
||||
- Does every strong claim have a traceable source?
|
||||
- Is every figure caption provenance-bearing (names the source)?
|
||||
- Are tentative claims marked as tentative?
|
||||
- Are unsupported numerics removed or marked TODO?
|
||||
|
||||
If the sweep finds issues, fix them yourself — don't push the problem onto the verifier.
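Part of this sweep can be automated. The sketch below lists every `[TODO: ...]` placeholder left in a draft, using the placeholder format from the guardrails. It only covers that one check; tracing claims to sources and checking figure captions still need a manual read. The function name `find_todos` is hypothetical.

```python
import re

# Matches placeholders such as [TODO: verify claim against benchmark X]
TODO = re.compile(r"\[TODO:[^\]]*\]")


def find_todos(draft_text):
    """Return (line_number, placeholder) pairs for every TODO marker."""
    hits = []
    for number, line in enumerate(draft_text.splitlines(), start=1):
        for match in TODO.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

An empty result does not prove the draft is clean; it only shows that no explicitly marked gaps remain.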
|
||||
|
||||
### 5. Cite
|
||||
|
||||
Dispatch `verifier` subagent to add inline citations and verify every URL:
|
||||
|
||||
> Add inline citations to `outputs/.drafts/<slug>-draft.md` using the research files as source material. Verify each URL is reachable. Write the complete cited draft to `outputs/.drafts/<slug>-cited.md`.
|
||||
|
||||
### 6. Deliver
|
||||
|
||||
Save the final draft (the cited version from step 5) to `papers/<slug>.md`. Write `papers/<slug>.provenance.md` with:
|
||||
|
||||
```markdown
|
||||
# Provenance: <title>
|
||||
|
||||
- **Date:** YYYY-MM-DD
|
||||
- **Outline:** outputs/.plans/<slug>.md
|
||||
- **Research files:** <list>
|
||||
- **Draft:** outputs/.drafts/<slug>-draft.md
|
||||
- **Cited draft:** outputs/.drafts/<slug>-cited.md
|
||||
- **Verification:** PASS | PASS WITH NOTES | BLOCKED
|
||||
- **Sources:** <count accepted / count rejected>
|
||||
```
|
||||
|
||||
End the paper with a `Sources` appendix listing direct URLs for all primary references.
|
||||
|
||||
## Format conventions
|
||||
|
||||
- Markdown with embedded LaTeX (`$\ldots$` inline, `$$\ldots$$` display). Equations only where they materially help comprehension.
|
||||
- Mermaid `graph TD` / `flowchart` for architectures, pipelines, and taxonomies.
|
||||
- Tables in standard Markdown; for complex tables, emit an HTML `<table>` block or a separate CSV.
|
||||
- Figure captions always name the source.
|
||||
99
personas/_shared/feynman-skills/peer-review/SKILL.md
Normal file
@@ -0,0 +1,99 @@
|
||||
---
|
||||
name: peer-review
|
||||
description: Simulate a tough but constructive peer review of an AI research artifact. Use when the user asks for a review, critique, feedback on a paper or draft, or wants to identify weaknesses before submission.
|
||||
---
|
||||
|
||||
# Peer Review
|
||||
|
||||
Simulate a rigorous AI-research peer review with likely objections, severity scoring, and a concrete revision plan. Output a structured review artifact in `outputs/`.
|
||||
|
||||
Derive a slug from the artifact name (lowercase, hyphens, ≤5 words).
|
||||
|
||||
## Subagent mapping
|
||||
|
||||
See `../_platform-mapping.md`. Use `researcher` to gather evidence on the artifact; a second `reviewer`-style subagent (or the lead agent with a reviewer prompt) writes the actual review.
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Plan
|
||||
|
||||
Briefly outline:
|
||||
|
||||
- What will be reviewed (paper PDF, cited draft, code repo, or all of the above)
|
||||
- Review criteria — novelty, empirical rigor, baselines, reproducibility, clarity, related-work coverage
|
||||
- Verification-specific checks for claims, figures, and reported metrics
|
||||
|
||||
Summarize to the user in 2–3 sentences, then continue immediately unless they asked for plan review.
|
||||
|
||||
### 2. Gather evidence (for non-trivial artifacts)
|
||||
|
||||
Dispatch a `researcher` subagent to:
|
||||
|
||||
- Read the paper / draft
|
||||
- Inspect the code repo (via `alpha-research` `alpha code` or equivalent)
|
||||
- Cross-check cited work for misrepresentation
|
||||
- Look at any linked experimental artifacts (logs, tables, commit history)
|
||||
|
||||
Output: `<slug>-research.md` on disk.
|
||||
|
||||
For small/simple artifacts where a full research pass is overkill, the lead agent reads directly and skips this step.
|
||||
|
||||
### 3. Write the review
|
||||
|
||||
Either dispatch a `reviewer` subagent with `<slug>-research.md` as input, or write the review yourself.
|
||||
|
||||
Required structure for `outputs/<slug>-review.md`:
|
||||
|
||||
```markdown
|
||||
# Review: <title>
|
||||
|
||||
**Artifact:** <link or path>
|
||||
**Reviewer:** <lead agent / subagent>
|
||||
**Date:** YYYY-MM-DD
|
||||
|
||||
## Summary (one paragraph)
|
||||
<what the artifact claims and what it actually delivers>
|
||||
|
||||
## Strengths
|
||||
- bulleted, concrete, with section references
|
||||
|
||||
## Objections (severity-scored)
|
||||
|
||||
### FATAL
|
||||
- <issue> — why it breaks the paper's core claim, what would need to change
|
||||
|
||||
### MAJOR
|
||||
- <issue> — substantive flaw that should be addressed before submission
|
||||
|
||||
### MINOR
|
||||
- <issue> — would improve the paper but isn't blocking
|
||||
|
||||
## Reproducibility check
|
||||
- Are baselines reported with seeds / runs / variance? <y/n + evidence>
|
||||
- Is the code available and runnable? <y/n + evidence>
|
||||
- Are datasets cited and accessible? <y/n + evidence>
|
||||
|
||||
## Related-work coverage
|
||||
- Work the paper engages with adequately
|
||||
- Work the paper ignores or misrepresents (name specific references)
|
||||
|
||||
## Revision plan
|
||||
A numbered list of concrete edits, in priority order, that would address the FATAL + MAJOR objections.
|
||||
|
||||
## Sources
|
||||
- <every external reference touched during the review>
|
||||
```
|
||||
|
||||
### 4. Second pass (only if FATALs were fixed)
|
||||
|
||||
If the first review found FATALs and the author fixes them, run one verification-style pass before final delivery. This pass checks that the fix actually addresses the original objection — no "we updated the introduction" cop-outs for a methodology flaw.
|
||||
|
||||
### 5. Deliver
|
||||
|
||||
Save exactly one review artifact to `outputs/<slug>-review.md`. End with a `Sources` section with direct URLs for every inspected external source.
|
||||
|
||||
## What a peer review is NOT
|
||||
|
||||
- Not a summary. The review assumes the reader has read the artifact.
|
||||
- Not a rewrite suggestion. Flag problems; don't draft the fix.
|
||||
- Not a hit piece. Every objection should be actionable and specific.
|
||||
62
personas/_shared/feynman-skills/preview/SKILL.md
Normal file
@@ -0,0 +1,62 @@
|
||||
---
|
||||
name: preview
|
||||
description: Preview Markdown, LaTeX, PDF, or code artifacts in the browser or as PDF. Use when the user wants to review a written artifact, export a report, or view a rendered document.
|
||||
allowed-tools: Bash(open:*), Bash(xdg-open:*), Bash(pandoc:*)
|
||||
---
|
||||
|
||||
# Preview
|
||||
|
||||
Render and open artifacts produced by the research workflows. This is a thin wrapper over OS-native openers and `pandoc`.
|
||||
|
||||
> **Upstream note.** Feynman ships a `/preview` slash command. Outside Feynman, fall back to the bash commands below — both Claude Code and OpenCode can execute them.
|
||||
|
||||
## Open a file in the default app
|
||||
|
||||
macOS:
|
||||
|
||||
```bash
|
||||
open <file.md>
|
||||
open <file.pdf>
|
||||
open <file.html>
|
||||
```
|
||||
|
||||
Linux:
|
||||
|
||||
```bash
|
||||
xdg-open <file.md>
|
||||
xdg-open <file.pdf>
|
||||
xdg-open <file.html>
|
||||
```
|
||||
|
||||
The default app is whatever the OS has registered for the extension — usually a Markdown viewer, Preview/Evince, or a browser.
|
||||
|
||||
## Export Markdown to PDF
|
||||
|
||||
`pandoc` is the standard cross-platform renderer:
|
||||
|
||||
```bash
pandoc outputs/<slug>.md -o outputs/<slug>.pdf \
  --pdf-engine=xelatex \
  --variable geometry:margin=1in \
  --toc
```
|
||||
|
||||
For papers with LaTeX equations, prefer `--pdf-engine=xelatex` or `--pdf-engine=lualatex`. If LaTeX is not installed, `--pdf-engine=weasyprint` is a lightweight alternative that renders HTML+CSS to PDF, though equations will not be typeset as cleanly as with a LaTeX engine.
|
||||
|
||||
## Export to HTML
|
||||
|
||||
```bash
|
||||
pandoc outputs/<slug>.md -o outputs/<slug>.html --standalone --mathjax
|
||||
```
|
||||
|
||||
## When to use
|
||||
|
||||
- User asks to "preview", "render", "export", or "view" a written artifact
|
||||
- Before delivering a paper or brief, sanity-check rendering
|
||||
- Converting `.md` outputs to PDF for sharing outside the repo
|
||||
|
||||
## What to pass back to the user
|
||||
|
||||
- The absolute path to the rendered file
|
||||
- Whether the render succeeded or had LaTeX/pandoc warnings
|
||||
- If the user wanted it opened, confirm the OS opener returned exit 0
|
||||
118
personas/_shared/feynman-skills/replication/SKILL.md
Normal file
@@ -0,0 +1,118 @@
|
||||
---
|
||||
name: replication
|
||||
description: Plan or execute a replication of a paper, claim, or benchmark. Use when the user asks to replicate results, reproduce an experiment, verify a claim empirically, or build a replication package.
|
||||
---
|
||||
|
||||
# Replication
|
||||
|
||||
Plan — and optionally execute — a replication of a paper, claim, or benchmark. Always confirm the execution environment with the user before running any code.
|
||||
|
||||
Derive a slug from the paper or claim (lowercase, hyphens, ≤5 words).
|
||||
|
||||
## Subagent mapping
|
||||
|
||||
See `../_platform-mapping.md`. Use `researcher` to extract implementation details from the paper and repo.
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Extract
|
||||
|
||||
Dispatch a `researcher` subagent (or read directly for small papers) to pull implementation details from the target paper and any linked code:
|
||||
|
||||
- Algorithm / architecture specifics
|
||||
- Hyperparameters, config defaults, random seeds
|
||||
- Dataset — name, source, preprocessing
|
||||
- Training regime — epochs, batch size, optimizer
|
||||
- Metrics — exact definitions, evaluation splits
|
||||
- Hardware — what they used, what you'll need
|
||||
|
||||
If `CHANGELOG.md` exists in the workspace, read the most recent relevant entries before planning or resuming.
|
||||
|
||||
### 2. Plan
|
||||
|
||||
Write `outputs/.plans/<slug>.md` with three explicit columns:
|
||||
|
||||
- **Verified** — details you confirmed from paper + code
|
||||
- **Inferred** — details the paper/code implied but didn't state directly
|
||||
- **Missing** — details the paper/code doesn't specify
|
||||
|
||||
Also include:
|
||||
|
||||
- **Check oracles** — the specific measurements that will decide whether the replication succeeded (e.g. "top-1 accuracy within ±0.5% of reported 78.2%")
|
||||
- **Scope cut** — what's in (core claim) and what's out (ablations, follow-on experiments)
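
One way to make a check oracle mechanical is sketched below. It assumes the "±0.5%" in the example means absolute percentage points; the `Oracle` class is a hypothetical helper, not something the skill prescribes:

```python
from dataclasses import dataclass


@dataclass
class Oracle:
    """A single pass/fail measurement decided before any code runs."""

    name: str
    target: float
    tolerance: float  # assumption: absolute tolerance, same units as target

    def check(self, observed: float) -> bool:
        return abs(observed - self.target) <= self.tolerance


oracle = Oracle("top-1 accuracy (%)", target=78.2, tolerance=0.5)
for observed in (77.9, 77.6):
    verdict = "PASS" if oracle.check(observed) else "FAIL"
    print(f"{oracle.name}: TARGET {oracle.target} / OBSERVED {observed} / {verdict}")
```

Writing the tolerance down in the plan, before execution, is what keeps "numbers look close" from turning into a pass after the fact.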
|
||||
|
||||
### 3. Environment — ask before running
|
||||
|
||||
Before executing anything, ask the user where to run:
|
||||
|
||||
- **Local** — current working directory
|
||||
- **Virtual environment** — isolated venv/conda
|
||||
- **Docker** — isolated container (see `docker` skill)
|
||||
- **Modal** — serverless GPU (see `modal-compute` skill). Best for burst jobs without persistent state. Requires `modal` CLI.
|
||||
- **RunPod** — persistent GPU pod with SSH (see `runpod-compute` skill). Best for long-running experiments. Requires `runpodctl` and `RUNPOD_API_KEY`.
|
||||
- **Plan only** — produce the replication plan without executing
|
||||
|
||||
Do not install packages, run training, or execute experiments without an explicit answer.
|
||||
|
||||
### 4. Execute (if a runtime was chosen)
|
||||
|
||||
Implement and run the replication steps in the chosen environment. Save:
|
||||
|
||||
- **Scripts** — checked-in `.py` / `.sh` files in `experiments/<slug>/`
|
||||
- **Configs** — exact configs used, checked in
|
||||
- **Raw outputs** — logs, metrics, predictions, checkpoints (or at least checksums) in a reproducible layout
|
||||
- **Results summary** — `outputs/<slug>-results.md` comparing your numbers to the paper's
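
Where checkpoints are too large to check in, a checksum manifest is one option. This is a sketch under stated assumptions: the `SHA256SUMS` file name and both function names are hypothetical, and the layout simply mirrors the output of `sha256sum`:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_manifest(root: Path, manifest_name: str = "SHA256SUMS") -> Path:
    """Write one '<hash>  <relative path>' line per file under `root`."""
    manifest = root / manifest_name
    lines = [
        f"{sha256_of(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != manifest_name  # skip the manifest itself
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

For example, `write_manifest(Path("experiments/<slug>/outputs"))` would record the state of that directory at the end of a run.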
|
||||
|
||||
Do not call the outcome "replicated" unless the planned oracles actually passed. If they didn't, write up what you observed and what diverged.
|
||||
|
||||
### 5. Log
|
||||
|
||||
For multi-step or resumable replication work, append concise entries to `CHANGELOG.md` after:
|
||||
|
||||
- Meaningful progress
|
||||
- Failed attempts
|
||||
- Major verification outcomes
|
||||
- Before stopping for the session
|
||||
|
||||
Each entry: active objective, what changed, what was checked, next step.
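
The four fields can be captured with a small helper. The skill does not fix an entry layout, so the date heading and bullet labels below are assumptions, as are the function names:

```python
from datetime import date
from pathlib import Path
from typing import Optional


def changelog_entry(objective: str, changed: str, checked: str, next_step: str,
                    day: Optional[date] = None) -> str:
    """Format one entry; the heading and bullet layout are illustrative only."""
    day = day or date.today()
    return (
        f"## {day.isoformat()}\n"
        f"- Objective: {objective}\n"
        f"- Changed: {changed}\n"
        f"- Checked: {checked}\n"
        f"- Next step: {next_step}\n"
    )


def append_entry(changelog: Path, entry: str) -> None:
    """Append rather than rewrite, so earlier entries are never lost."""
    with changelog.open("a", encoding="utf-8") as f:
        f.write("\n" + entry)
```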
|
||||
|
||||
### 6. Report
|
||||
|
||||
Save the final replication write-up to `outputs/<slug>-replication.md`:
|
||||
|
||||
```markdown
|
||||
# Replication: <paper title>
|
||||
|
||||
**Paper:** <link> (<version>)
|
||||
**Claim replicated:** <specific claim>
|
||||
**Date:** YYYY-MM-DD
|
||||
**Environment:** <chosen runtime>
|
||||
|
||||
## Oracles
|
||||
- <oracle 1>: TARGET <x> — OBSERVED <y> — PASS / FAIL
|
||||
- <oracle 2>: ...
|
||||
|
||||
## What matched
|
||||
- <list>
|
||||
|
||||
## What diverged
|
||||
- <list, with severity>
|
||||
|
||||
## Plausible causes
|
||||
- <what could explain divergences>
|
||||
|
||||
## Reproducibility grade
|
||||
- FULL | PARTIAL | FAILED | BLOCKED
|
||||
|
||||
## Sources
|
||||
- <paper URL>
|
||||
- <repo URL at commit used>
|
||||
```
|
||||
|
||||
End with a `Sources` section containing paper and repository URLs pinned to the commit used for replication.
|
||||
|
||||
## Invariants
|
||||
|
||||
- **Confirm runtime before executing.** No silent installs or training runs.
|
||||
- **Don't claim replication without oracle checks.** "Numbers look close" isn't a check; "top-1 within ±0.5%" is.
|
||||
- **Log failures.** A failed replication with a written-up reason is more valuable than a hand-wavy "seems to work".
|
||||
59
personas/_shared/feynman-skills/runpod-compute/SKILL.md
Normal file
@@ -0,0 +1,59 @@
|
||||
---
|
||||
name: runpod-compute
|
||||
description: Provision and manage GPU pods on RunPod for long-running experiments. Use when the user needs persistent GPU compute with SSH access, large datasets, or multi-step experiments.
|
||||
allowed-tools: Bash(runpodctl:*), Bash(ssh:*), Bash(scp:*)
|
||||
---
|
||||
|
||||
# RunPod Compute
|
||||
|
||||
Use `runpodctl` CLI for persistent GPU pods with SSH access. Works identically in Claude Code and OpenCode.
|
||||
|
||||
## Setup
|
||||
|
||||
```bash
|
||||
# macOS
|
||||
brew install runpod/runpodctl/runpodctl
|
||||
|
||||
# Linux: download from https://github.com/runpod/runpodctl/releases

# Both platforms: store the API key
runpodctl config --apiKey="$RUNPOD_API_KEY"
|
||||
```
|
||||
|
||||
Check availability: `command -v runpodctl`.
|
||||
|
||||
## Commands
|
||||
|
||||
| Command | Description |
|---------|-------------|
| `runpodctl create pod --gpuType "NVIDIA A100 80GB PCIe" --imageName "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04" --name experiment` | Create a pod |
| `runpodctl get pod` | List all pods |
| `runpodctl stop pod <id>` | Stop (preserves volume) |
| `runpodctl start pod <id>` | Resume a stopped pod |
| `runpodctl remove pod <id>` | Terminate and delete |
| `runpodctl gpu list` | List available GPU types and prices |
| `runpodctl send <file>` | Send a file to or from a pod (prints a one-time code) |
| `runpodctl receive <code>` | Receive a file using the code printed by `send` |
|
||||
|
||||
## SSH access
|
||||
|
||||
```bash
|
||||
ssh root@<IP> -p <PORT> -i ~/.ssh/id_ed25519
|
||||
```
|
||||
|
||||
Get connection details from `runpodctl get pod <id>`. Pods must expose port `22/tcp`.
|
||||
|
||||
## GPU types
|
||||
|
||||
`NVIDIA GeForce RTX 4090`, `NVIDIA RTX A6000`, `NVIDIA A40`, `NVIDIA A100 80GB PCIe`, `NVIDIA H100 80GB HBM3`
|
||||
|
||||
## When to use
|
||||
|
||||
- Long-running experiments needing persistent state
|
||||
- Large dataset processing
|
||||
- Multi-step work with SSH access between iterations
|
||||
- When the experiment writes and reads many intermediate files on the same host
|
||||
|
||||
## Lifecycle discipline
|
||||
|
||||
- Always stop or remove pods after experiments — running pods bill by the minute.
|
||||
- Use `runpodctl stop pod` to preserve the volume for resume, `remove pod` to release everything.
|
||||
- For reproducibility, snapshot the pod image before destroying.
|
||||
65
personas/_shared/feynman-skills/session-log/SKILL.md
Normal file
@@ -0,0 +1,65 @@
|
||||
---
|
||||
name: session-log
|
||||
description: Write a durable session log capturing completed work, findings, open questions, and next steps. Use when the user asks to log progress, save session notes, write up what was done, or create a research diary entry.
|
||||
---
|
||||
|
||||
# Session Log
|
||||
|
||||
Write a durable, readable log of the current research session. This is a lab-notebook entry, not a task tracker.
|
||||
|
||||
## Output location
|
||||
|
||||
`notes/session-logs/<YYYY-MM-DD>-<slug>.md`
|
||||
|
||||
Where `<slug>` is 1–5 hyphenated words describing the session's focus (e.g. `scaling-laws-comparison`, `nanogpt-replication`).
|
||||
|
||||
If more than one session happens in a day, append a suffix: `2026-04-18-nanogpt-replication-2.md`.
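
The naming rule can be sketched as follows, assuming the second session of a day gets `-2`, the third `-3`, and so on. `session_log_path` is a hypothetical helper name:

```python
from datetime import date
from pathlib import Path


def session_log_path(slug: str, day: date,
                     root: Path = Path("notes/session-logs")) -> Path:
    """Return the first unused log path for this date and slug."""
    base = f"{day.isoformat()}-{slug}"
    candidate = root / f"{base}.md"
    n = 2  # assumption: numbering of same-day sessions starts at 2
    while candidate.exists():
        candidate = root / f"{base}-{n}.md"
        n += 1
    return candidate


print(session_log_path("nanogpt-replication", date(2026, 4, 18)))
```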
|
||||
|
||||
## Required sections
|
||||
|
||||
```markdown
|
||||
# Session: <topic>
|
||||
|
||||
**Date:** YYYY-MM-DD
|
||||
**Duration:** <approx. hh:mm or start/end>
|
||||
**Slug:** <slug>
|
||||
|
||||
## What was done
|
||||
- bulleted list of concrete actions: files written, experiments run, papers read
|
||||
- cite the artifacts by path (e.g. `outputs/attention-scaling.md`, `papers/my-draft.md`)
|
||||
|
||||
## Key findings
|
||||
- strongest claims, results, or decisions from the session
|
||||
- mark each as `verified`, `unverified`, `inferred`, or `blocked` — match the honesty of the underlying evidence
|
||||
|
||||
## Open questions
|
||||
- things you wanted to settle but couldn't
|
||||
- each question should be concrete enough to hand off to another session
|
||||
|
||||
## Next steps
|
||||
- one or two concrete actions the next session should take
|
||||
- name the artifact or command to resume from
|
||||
|
||||
## Sources
|
||||
- direct URLs for any external claim that matters
|
||||
```
|
||||
|
||||
## Tie-in to `CHANGELOG.md`
|
||||
|
||||
If the workspace has a `CHANGELOG.md` (repo-level lab notebook), the session log and the changelog are complementary:
|
||||
|
||||
- Session log = the full narrative of one sitting
|
||||
- CHANGELOG.md entry = a 2–4 line summary pointing at the session log
|
||||
|
||||
Append a changelog entry after writing the session log, referencing it by path.
|
||||
|
||||
## When to write
|
||||
|
||||
- At the end of any substantive research session
|
||||
- Before switching projects or stepping away for more than a day
|
||||
- When the user explicitly asks to "log progress" or "save session notes"
|
||||
|
||||
## When to skip
|
||||
|
||||
- Trivial one-question lookups that produced no artifacts
|
||||
- Pure clarification exchanges with no research output
|
||||
68
personas/_shared/feynman-skills/session-search/SKILL.md
Normal file
@@ -0,0 +1,68 @@
|
||||
---
|
||||
name: session-search
|
||||
description: Search past session transcripts to recover prior work, conversations, and research context. Use when the user references something from a previous session, asks "what did we do before", or when you suspect relevant past context exists.
|
||||
allowed-tools: Bash(grep:*), Bash(rg:*), Bash(ls:*)
|
||||
---
|
||||
|
||||
# Session Search
|
||||
|
||||
Recover context from prior sessions by searching transcript stores. The session store path depends on the runtime.
|
||||
|
||||
## Session store locations
|
||||
|
||||
| Runtime | Transcript path |
|---|---|
| Claude Code | `~/.claude/projects/<hash-of-cwd>/*.jsonl` |
| OpenCode | `~/.local/share/opencode/session/` |
| Codex (Feynman host) | `~/.feynman/sessions/*.jsonl` |
|
||||
|
||||
Transcripts are typically JSONL — one JSON record per line with `type` (`session`, `message`, `tool_use`, `model_change`) and `message.content` fields.
|
||||
|
||||
## Direct search (works everywhere)
|
||||
|
||||
```bash
|
||||
# keyword search across all transcripts
|
||||
rg -l "scaling laws" ~/.claude/projects/
|
||||
rg -l "scaling laws" ~/.local/share/opencode/session/
|
||||
|
||||
# fallback when ripgrep is not installed
|
||||
grep -ril "scaling laws" ~/.claude/projects/
|
||||
```
|
||||
|
||||
For structured queries against JSONL (e.g. "find all user messages about X"):
|
||||
|
||||
```bash
|
||||
rg --json "query" ~/.claude/projects/ | jq 'select(.type == "match") | .data.path.text'
|
||||
```
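
When the query needs to filter on record fields rather than raw text, a short script may be easier than chaining `rg` and `jq`. The sketch below assumes the record shape described above (a `type` field and a `message.content` field); real transcripts may differ by runtime and version, and `find_messages` is a hypothetical name:

```python
import json
from pathlib import Path
from typing import Iterator


def find_messages(root: Path, query: str,
                  record_type: str = "message") -> Iterator[tuple[Path, int, str]]:
    """Yield (file, line number, snippet) for matching records under `root`."""
    needle = query.lower()
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # tolerate truncated or non-JSON lines
                if not isinstance(record, dict) or record.get("type") != record_type:
                    continue
                message = record.get("message")
                content = message.get("content") if isinstance(message, dict) else None
                # Content may be a string or a structured list; flatten for matching.
                text = content if isinstance(content, str) else json.dumps(content, ensure_ascii=False)
                if needle in text.lower():
                    yield path, lineno, text[:200]  # short snippet, not the full record
```

Returning a 200-character snippet instead of the full record keeps the search itself from flooding context with transcript text.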
|
||||
|
||||
## Claude Code native
|
||||
|
||||
If the `session-logs` community skill is installed, it provides richer search over the current project's transcripts. Prefer it when available.
|
||||
|
||||
## OpenCode native
|
||||
|
||||
OpenCode stores sessions under `~/.local/share/opencode/session/<session-id>/`. List available sessions:
|
||||
|
||||
```bash
|
||||
ls -lt ~/.local/share/opencode/session/ | head
|
||||
```
|
||||
|
||||
Resume a past session with `opencode run -c <session-id>` (see OpenCode docs — subject to version).
|
||||
|
||||
## What to look for
|
||||
|
||||
- User messages referencing the same topic, paper, or codebase
|
||||
- Assistant outputs (artifacts) that were saved to `outputs/`, `papers/`, or `notes/`
|
||||
- Plan files (`outputs/.plans/<slug>.md`) that may still be valid
|
||||
- Failed approaches — often more informative than the successful ones
|
||||
|
||||
## When to use
|
||||
|
||||
- User says "we talked about X before" or "remember the report on Y"
|
||||
- Before starting research on a topic that feels familiar
|
||||
- When resuming a paused workflow mid-project
|
||||
|
||||
## What NOT to do
|
||||
|
||||
- Do not inline large transcript dumps back into context — read and summarize, don't paste.
|
||||
- Do not invent past conversations. If search returns nothing, say so instead of confabulating.
|
||||
72
personas/_shared/feynman-skills/source-comparison/SKILL.md
Normal file
@@ -0,0 +1,72 @@
|
||||
---
|
||||
name: source-comparison
|
||||
description: Compare multiple sources on a topic and produce a grounded comparison matrix. Use when the user asks to compare papers, tools, approaches, frameworks, or claims across multiple sources.
|
||||
---
|
||||
|
||||
# Source Comparison
|
||||
|
||||
Build a side-by-side comparison matrix grounded in primary sources, distinguishing agreement, disagreement, and uncertainty.
|
||||
|
||||
Derive a short slug from the comparison topic (lowercase, hyphens, ≤5 words). All files use this prefix.
|
||||
|
||||
## Subagent mapping
|
||||
|
||||
See `../_platform-mapping.md`. Use `researcher` for evidence gathering, `verifier` for citation verification.
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Plan
|
||||
|
||||
Write `outputs/.plans/<slug>.md`:
|
||||
|
||||
- **Items being compared** — name them exactly (paper A vs paper B, tool X vs tool Y, etc.)
|
||||
- **Dimensions of comparison** — 4–8 concrete axes (claim, evidence type, methodology, benchmark, caveats, confidence, etc.)
|
||||
- **Source expectations** — what primary source is required for each cell
|
||||
- **Output structure** — matrix columns and rows
|
||||
|
||||
Briefly summarize the plan to the user, then continue immediately unless they asked for plan review.
|
||||
|
||||
### 2. Gather
|
||||
|
||||
- **Narrow comparison (2–3 items, clear primary sources):** lead agent gathers directly.
|
||||
- **Broad comparison (5+ items, or wide surface area):** dispatch `researcher` subagents, one per item or per dimension, each writing to `<slug>-research-<item>.md`.
|
||||
|
||||
Require primary sources. Reject blog posts that themselves cite unverifiable claims.
|
||||
|
||||
### 3. Build the matrix
|
||||
|
||||
One row per item, one column per dimension. Every cell must be traceable to a source.
|
||||
|
||||
Markdown table pattern:
|
||||
|
||||
```markdown
|
||||
| Item | Key claim | Evidence type | Methodology | Caveats | Confidence |
|------|-----------|---------------|-------------|---------|------------|
| ... | ... | ... | ... | ... | ... |
|
||||
```
|
||||
|
||||
Follow the matrix with:
|
||||
|
||||
- **Agreement section** — dimensions where items converge
|
||||
- **Disagreement section** — dimensions where items diverge, with the specific split named
|
||||
- **Uncertainty section** — dimensions where primary sources don't settle it
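
The matrix step can be sketched as a small renderer that refuses to emit a row with an unsourced cell. It assumes each cell's text already carries its own citation marker; `render_matrix` is a hypothetical helper:

```python
def render_matrix(dimensions: list[str], rows: dict[str, dict[str, str]]) -> str:
    """Render one row per item and one column per dimension as a Markdown table."""
    header = ["Item", *dimensions]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for item, cells in rows.items():
        # Every cell must be traceable to a source, so an empty cell is an error.
        missing = [d for d in dimensions if not cells.get(d)]
        if missing:
            raise ValueError(f"{item}: no sourced value for {missing}")
        lines.append("| " + " | ".join([item, *(cells[d] for d in dimensions)]) + " |")
    return "\n".join(lines)


print(render_matrix(
    ["Key claim", "Confidence"],
    {"Paper A": {"Key claim": "X improves Y [1]", "Confidence": "high"}},
))
```

Failing loudly on a missing cell pushes the gap into the Uncertainty section instead of leaving it as a silent blank.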
|
||||
|
||||
### 4. Charts
|
||||
|
||||
- Quantitative dimensions → Mermaid bar chart, or CSV in `outputs/.notes/<slug>-data.csv`. No invented numbers.
|
||||
- Method/architecture differences → Mermaid `graph TD`.
|
||||
|
||||
### 5. Cite
|
||||
|
||||
Dispatch `verifier` subagent to add inline citations and verify every URL in the comparison draft. Output to `outputs/.drafts/<slug>-cited.md`.
|
||||
|
||||
### 6. Deliver
|
||||
|
||||
Save final comparison to `outputs/<slug>-comparison.md`. End with a `Sources` section containing direct URLs for every source used.
|
||||
|
||||
Write `outputs/<slug>-comparison.provenance.md` with the same format as other research artifacts (see `literature-review` skill for the template).
|
||||
|
||||
## What NOT to compare
|
||||
|
||||
- Versions of the same thing that only differ in config — that's a benchmark, not a source comparison.
|
||||
- Items without public primary sources — say so in the plan and stop.
|
||||
154
personas/_shared/feynman-skills/summarize/SKILL.md
Normal file
@@ -0,0 +1,154 @@
|
||||
---
|
||||
name: summarize
|
||||
description: Summarize any URL, local file, or PDF using the RLM pattern — source stored on disk, never injected raw into context. Use when the user asks to summarize a long document, paper, webpage, or PDF that might exceed safe context-window limits.
|
||||
allowed-tools: Bash(curl:*), Bash(pdftotext:*), Bash(python3:*)
|
||||
---
|
||||
|
||||
# Summarize (RLM Pattern)
|
||||
|
||||
Summarize a URL, local file, or PDF without injecting the full document into context. The source stays on disk as an external variable; only bounded windows enter context.
|
||||
|
||||
Derive a short slug from the source filename or URL domain (lowercase, hyphens, ≤5 words — e.g. `attention-is-all-you-need`). All files use this prefix.
|
||||
|
||||
## Why the RLM pattern
|
||||
|
||||
Standard summarization injects the full document into context. Above ~15k tokens, early content degrades as the window fills (context rot). This workflow keeps the document on disk and reads only bounded windows — context pressure is proportional to the window size, not the document size.
|
||||
|
||||
Tier 1 (<8k chars) is a deliberate exception: direct injection is safe at ~2k tokens and windowed reading would add unnecessary friction.
|
||||
|
||||
## Step 1 — Fetch, validate, measure
|
||||
|
||||
Run all guards before any tier logic. A failure here is cheap; a failure mid-Tier-3 is not.
|
||||
|
||||
- **GitHub repo URL** (`https://github.com/owner/repo` — exactly 4 slashes): fetch the raw README instead. Try `https://raw.githubusercontent.com/{owner}/{repo}/main/README.md`, then `/master/README.md`. A repo HTML page is not the document the user wants to summarize.
|
||||
- **Remote URL:** fetch to disk: `curl -sL -o outputs/.notes/<slug>-raw.txt <url>`. Do NOT use a fetch tool whose return value enters context directly — that bypasses the RLM principle.
|
||||
- **Local file or PDF:** copy or extract to `outputs/.notes/<slug>-raw.txt`. For PDFs, extract text via `pdftotext <file> outputs/.notes/<slug>-raw.txt` (or equivalent) before measuring.
|
||||
- **Empty or failed fetch:** if the file is <50 bytes after fetching, stop and surface the error — do not proceed to tier selection.
|
||||
- **Binary content:** if the file is >1 KB but contains <100 readable text characters, stop and tell the user the content appears binary or unextracted.
|
||||
- **Existing output:** if `outputs/<slug>-summary.md` already exists, ask whether to overwrite or use a different slug. Do not proceed until confirmed.
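
The GitHub guard in the list above can be sketched as a pure function that only builds candidate URLs and leaves fetching to `curl`. `readme_candidates` is a hypothetical name, and unlike the strict "exactly 4 slashes" rule this version also accepts a trailing slash:

```python
from urllib.parse import urlparse


def readme_candidates(url: str) -> list[str]:
    """Return raw README URLs to try, in order, or [] if `url` is not a repo root."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # A repo root has exactly two path segments: owner and repo.
    if parsed.netloc.lower() != "github.com" or len(parts) != 2:
        return []
    owner, repo = parts
    return [
        f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md"
        for branch in ("main", "master")
    ]


print(readme_candidates("https://github.com/owner/repo"))
```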
|
||||
|
||||
Measure decoded text characters (not bytes — UTF-8 multi-byte chars would overcount). Log: `[summarize] source=<source> slug=<slug> chars=<count>`.
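
A sketch of the size guards and the character count follows. It assumes "readable text characters" means printable characters after UTF-8 decoding, which is one possible reading; `measure_source` and its error messages are hypothetical:

```python
from pathlib import Path


def measure_source(path: Path) -> int:
    """Apply the empty/binary guards, then return the decoded character count."""
    raw = path.read_bytes()
    if len(raw) < 50:
        raise ValueError(f"{path}: only {len(raw)} bytes, fetch likely failed")
    text = raw.decode("utf-8", errors="replace")
    # Assumption: "readable" = printable and not the U+FFFD replacement character.
    readable = sum(1 for ch in text if ch.isprintable() and ch != "\ufffd")
    if len(raw) > 1024 and readable < 100:
        raise ValueError(f"{path}: content appears binary or unextracted")
    return len(text)  # characters, not bytes
```

Counting `len(text)` after decoding is what makes the tier thresholds independent of how many bytes each character takes.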
|
||||
|
||||
## Step 2 — Choose tier
|
||||
|
||||
| Chars | Tier | Strategy |
|---|---|---|
| <8 000 | 1 | Direct read: full content enters context (safe at ~2k tokens) |
| 8 000 – 60 000 | 2 | RLM-lite: windowed bash extraction, progressive notes to disk |
| >60 000 | 3 | Full RLM: bash chunking + parallel researcher subagents |
|
||||
|
||||
Log: `[summarize] tier=<N> chars=<count>`.
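
The tier table translates directly into a threshold function. `choose_tier` is a hypothetical name; the boundaries are taken from the table, with both 8 000 and 60 000 falling in Tier 2:

```python
def choose_tier(chars: int) -> int:
    """Map a decoded character count to the summarization tier."""
    if chars < 8_000:
        return 1  # direct read
    if chars <= 60_000:
        return 2  # windowed read
    return 3      # parallel chunks


for count in (2_500, 8_000, 75_000):
    print(f"[summarize] tier={choose_tier(count)} chars={count}")
```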
|
||||
|
||||
## Tier 1 — Direct read
|
||||
|
||||
Read `outputs/.notes/<slug>-raw.txt` in full. Summarize directly using the output format below. Write to `outputs/<slug>-summary.md`.
|
||||
|
||||
## Tier 2 — RLM-lite windowed read
|
||||
|
||||
The document stays on disk. Extract 6 000-char windows via bash/python:
|
||||
|
||||
```python
|
||||
# The Read tool uses line offsets, not char offsets, so exact char-boundary
# windowing needs bash/python. Slice the decoded string instead of calling
# f.seek(): in text mode, seek() takes opaque byte-based positions, not char counts.
with open("outputs/.notes/<slug>-raw.txt", encoding="utf-8") as f:
    text = f.read()
window = text[n * 6000 : (n + 1) * 6000]  # n = zero-based window index
|
||||
```
|
||||
|
||||
For each window:
|
||||
|
||||
1. Extract key claims and evidence.
|
||||
2. **Append to `outputs/.notes/<slug>-notes.md` before reading the next window.** This is the checkpoint: if the session is interrupted, processed windows survive.
|
||||
3. Log: `[summarize] window <N>/<total> done`.
|
||||
|
||||
After all windows, synthesize `outputs/.notes/<slug>-notes.md` into `outputs/<slug>-summary.md`.
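
The checkpointing loop can be sketched as below. The `extract` callback is a hypothetical stand-in for the model's own "extract key claims and evidence" step, and the notes heading format is an assumption:

```python
from pathlib import Path
from typing import Callable

WINDOW = 6000  # characters per window, as in Tier 2


def summarize_windows(raw_path: Path, notes_path: Path,
                      extract: Callable[[str], str]) -> int:
    """Process the source window by window, appending notes after each one."""
    text = raw_path.read_text(encoding="utf-8")
    total = (len(text) + WINDOW - 1) // WINDOW  # ceiling division
    for n in range(total):
        window = text[n * WINDOW : (n + 1) * WINDOW]
        notes = extract(window)
        # Checkpoint: append before touching the next window, so an
        # interrupted session keeps everything processed so far.
        with notes_path.open("a", encoding="utf-8") as f:
            f.write(f"## Window {n + 1}/{total}\n{notes}\n\n")
        print(f"[summarize] window {n + 1}/{total} done")
    return total
```

The script holds the full text in process memory, which is fine: the constraint is on what enters the model's context, and only one window does at a time.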
|
||||
|
||||
## Tier 3 — Full RLM parallel chunks
|
||||
|
||||
Each chunk gets a fresh researcher subagent context window, so context rot is avoided: no subagent reads more than 6 000 chars of source text.
|
||||
|
||||
**Why 500-char overlap:** academic documents contain multi-sentence arguments that span chunk boundaries. A 500-char overlap (~80 words) means a cross-boundary claim shorter than the overlap appears in full in at least one of the two adjacent chunks.
|
||||
|
||||
### 3a. Chunk the document
|
||||
|
||||
```python
|
||||
import os

os.makedirs("outputs/.notes", exist_ok=True)

with open("outputs/.notes/<slug>-raw.txt", encoding="utf-8") as f:
    text = f.read()

chunk_size, overlap = 6000, 500
chunks, i = [], 0
while i < len(text):
    chunks.append(text[i : i + chunk_size])
    i += chunk_size - overlap

for n, chunk in enumerate(chunks):
    # Zero-pad so files sort correctly (chunk-002 before chunk-010)
    with open(f"outputs/.notes/<slug>-chunk-{n:03d}.txt", "w", encoding="utf-8") as f:
        f.write(chunk)

print(f"[summarize] chunks={len(chunks)} chunk_size={chunk_size} overlap={overlap}")
|
||||
```
|
||||
|
||||
### 3b. Announce before spawning
|
||||
|
||||
Briefly summarize: "Source is ~<chars> chars → <N> chunks → <N> researcher subagents. This may take several minutes." Then continue automatically. Do not ask for confirmation or wait for a proceed response unless the user explicitly requested review before launching.
|
||||
|
||||
### 3c. Dispatch researcher subagents
|
||||
|
||||
Dispatch one subagent per chunk (see `../_platform-mapping.md` for role mapping). Each subagent's prompt:
|
||||
|
||||
> Read ONLY `outputs/.notes/<slug>-chunk-NNN.txt`. Extract:
|
||||
> (1) key claims
|
||||
> (2) methodology or technical approach
|
||||
> (3) cited evidence
|
||||
>
|
||||
> Do NOT use web search or fetch external URLs — this is single-source summarization. If a claim appears to start or end mid-sentence at the file boundary, mark it `BOUNDARY PARTIAL`. Write to `outputs/.notes/<slug>-summary-chunk-NNN.md`.
|
||||
|
||||
Use `failFast: false` / equivalent so one chunk failure doesn't kill the batch. Cap concurrency at ~4 to avoid rate limits.
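
On a host without a built-in batch primitive, the same two constraints (no fail-fast, capped concurrency) could be approximated with `asyncio`. In this sketch `run_subagent` is a hypothetical coroutine standing in for whatever the host uses to dispatch one researcher subagent:

```python
import asyncio
from typing import Awaitable, Callable


async def dispatch_all(chunk_ids: list[int],
                       run_subagent: Callable[[int], Awaitable[str]],
                       max_concurrency: int = 4) -> dict[int, object]:
    """Run one subagent per chunk; failures are returned, not raised."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(chunk_id: int) -> str:
        async with semaphore:  # at most `max_concurrency` subagents in flight
            return await run_subagent(chunk_id)

    # return_exceptions=True plays the role of failFast: false: one failed
    # chunk yields an exception object instead of cancelling the batch.
    results = await asyncio.gather(*(guarded(c) for c in chunk_ids),
                                   return_exceptions=True)
    return dict(zip(chunk_ids, results))
```

Chunks whose result is an exception are the ones to list later as coverage gaps.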
|
||||
|
||||
### 3d. Aggregate
|
||||
|
||||
After all subagents return, verify every expected `outputs/.notes/<slug>-summary-chunk-NNN.md` exists. Note any missing chunk indices — they appear in the **Coverage gaps** section of the output. Do not abort on partial coverage; a partial summary with gaps noted is more useful than none.
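
The existence check can be sketched as below, assuming the zero-padded file names from step 3a and character-based chunk offsets. Both helper names are hypothetical:

```python
from pathlib import Path


def missing_chunk_summaries(notes_dir: Path, slug: str, n_chunks: int) -> list[int]:
    """Return the chunk indices whose summary file was never written."""
    return [
        n for n in range(n_chunks)
        if not (notes_dir / f"{slug}-summary-chunk-{n:03d}.md").is_file()
    ]


def approx_char_range(index: int, chunk_size: int = 6000,
                      overlap: int = 500) -> tuple[int, int]:
    """Approximate [start, end) character offsets covered by one chunk."""
    start = index * (chunk_size - overlap)
    return start, start + chunk_size  # the last chunk may end earlier
```

Reporting the offsets alongside the indices tells the reader which part of the source the summary does not cover.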
|
||||
|
||||
When synthesizing:
|
||||
|
||||
- **Deduplicate** — a claim in multiple chunks is one claim; keep the most complete formulation.
|
||||
- **Resolve boundary conflicts** — for adjacent-chunk contradictions, prefer the version with more supporting context.
|
||||
- **Remove `BOUNDARY PARTIAL` markers** where a complete version exists in a neighbouring chunk.
|
||||
|
||||
Write the final synthesis to `outputs/<slug>-summary.md`.
|
||||
|
||||
## Output format
|
||||
|
||||
All tiers produce the same artifact at `outputs/<slug>-summary.md`:
|
||||
|
||||
```markdown
|
||||
# Summary: <document title or source filename>
|
||||
|
||||
**Source:** <URL or file path>
|
||||
**Date:** YYYY-MM-DD
|
||||
**Tier:** 1 | 2 (N windows) | 3 (N chunks)
|
||||
|
||||
## Key Claims
|
||||
<3–7 most important assertions, each as a bullet>
|
||||
|
||||
## Methodology
|
||||
<approach, dataset, evaluation, baselines — omit for non-research documents>
|
||||
|
||||
## Limitations
|
||||
<what the source explicitly flags as weak, incomplete, or out of scope>
|
||||
|
||||
## Verdict
|
||||
<one paragraph: what this document establishes, its credibility, who should read it>
|
||||
|
||||
## Sources
|
||||
1. <title or filename> — <URL or file path>
|
||||
|
||||
## Coverage gaps
|
||||
<only for Tier 3 with missing chunks: list missing indices and approximate character ranges>
|
||||
```
|
||||
|
||||
Before stopping, verify on disk that `outputs/<slug>-summary.md` exists. Sources contains only the single source confirmed reachable in Step 1. No verifier subagent is needed — there are no URLs constructed from memory to verify.
|
||||
70
personas/_shared/feynman-skills/watch/SKILL.md
Normal file
@@ -0,0 +1,70 @@
|
||||
---
|
||||
name: watch
|
||||
description: Set up a recurring research watch on a topic, company, paper area, or product surface. Use when the user asks to monitor a field, track new papers, watch for updates, or set up alerts on a research area.
|
||||
---
|
||||
|
||||
# Watch
|
||||
|
||||
Establish a recurring or deferred research watch. The watch has two parts: a **baseline sweep** so future checks have something to diff against, and a **scheduled follow-up** that runs the same sweep later.
|
||||
|
||||
## Workflow
|
||||
|
||||
Derive a short slug from the watch topic (lowercase, hyphens, ≤5 words).
|
||||
|
||||
### 1. Plan
|
||||
|
||||
Write `outputs/.plans/<slug>.md` with:
|
||||
|
||||
- **What to monitor** — sources, keywords, specific repos/sites
|
||||
- **Signals that matter** — e.g. new arXiv papers, new GitHub releases, new benchmark entries
|
||||
- **What counts as a meaningful change** — filter out noise (typo edits, reformatting)
|
||||
- **Check frequency** — daily, weekly, monthly
|
||||
- **How results will be compared** — diff against previous baseline, or append to a log
|
||||
|
||||
Briefly summarize the plan to the user. Continue immediately unless the user asks for plan review.
|
||||
|
||||
### 2. Baseline sweep
|
||||
|
||||
Run the initial research pass now so the watch has a starting point. Use the `deep-research` or `literature-review` skill procedures as appropriate for the topic.
|
||||
|
||||
Save the baseline to `outputs/<slug>-baseline.md`.
|
||||
|
||||
End the baseline with a `Sources` section listing direct URLs — these are the surfaces the watch will re-check.
|
||||
|
||||
### 3. Schedule the follow-up
|
||||
|
||||
Don't merely promise to check later — register an actual schedule. Use the **scheduling** facility available in the host runtime:
|
||||
|
||||
| Runtime | Scheduling primitive |
|---|---|
| Claude Code | `CronCreate` via the `schedule` skill, or system `cron` invoking `claude -p "<prompt>"` |
| OpenCode | system `cron` invoking `opencode run "<prompt>"` |
| In-session self-wake (Claude only) | `ScheduleWakeup` with delay in seconds |
|
||||
|
||||
Example cron entry (weekly watch):
|
||||
|
||||
```cron
|
||||
0 9 * * 1 cd /path/to/workspace && opencode run "Re-run the <slug> watch; compare against outputs/<slug>-baseline.md; write diff to outputs/<slug>-watch-$(date +\%Y\%m\%d).md"
|
||||
```
|
||||
|
||||
### 4. Compare on each follow-up run
|
||||
|
||||
When the scheduled run fires:
|
||||
|
||||
1. Re-run the baseline sweep.
2. Diff against the most recent previous output (`outputs/<slug>-baseline.md` or the most recent `outputs/<slug>-watch-*.md`).
3. Write `outputs/<slug>-watch-<YYYYMMDD>.md` with:
   - New items found
   - Items that changed materially
   - Items that disappeared (rare but meaningful, e.g. paper retracted, repo deleted)
4. If nothing meaningfully changed, write a one-line entry noting that.
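
A rough first pass at the comparison can be made on the `Sources` URLs alone. This sketch only detects new and disappeared URLs; judging whether an item "changed materially" still needs a read of the content. The regex and both function names are assumptions:

```python
import re

# Assumption: URLs end at whitespace or a closing bracket/parenthesis.
URL_RE = re.compile(r"https?://[^\s)>\]]+")


def extract_urls(markdown: str) -> set[str]:
    """Collect URLs, trimming trailing sentence punctuation."""
    return {u.rstrip(".,;") for u in URL_RE.findall(markdown)}


def diff_sources(previous: str, current: str) -> dict[str, list[str]]:
    """Compare two watch outputs by the set of URLs they cite."""
    old, new = extract_urls(previous), extract_urls(current)
    return {"new": sorted(new - old), "disappeared": sorted(old - new)}


previous = "- https://a.example/1\n- https://a.example/2\n"
current = "- https://a.example/2\n- https://a.example/3.\n"
print(diff_sources(previous, current))
```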
|
||||
|
||||
### 5. Stop conditions
|
||||
|
||||
Tell the user explicitly how to stop the watch — e.g. `crontab -e` and remove the line, or `CronDelete <id>`. Watches that run forever are noise generators.
|
||||
|
||||
## Output artifacts
|
||||
|
||||
- `outputs/.plans/<slug>.md` — watch plan
|
||||
- `outputs/<slug>-baseline.md` — initial sweep
|
||||
- `outputs/<slug>-watch-<date>.md` — each follow-up run
|
||||
121
personas/_shared/paperclip-skills/browser-use/SKILL.md
Normal file
@@ -0,0 +1,121 @@
|
||||
---
|
||||
name: browser-use
|
||||
description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, or extract information from web pages.
|
||||
license: MIT
|
||||
metadata:
|
||||
author: browser-use
|
||||
version: "1.1.0"
|
||||
domain: engineering
|
||||
subdomain: browser-automation
|
||||
triggers: browser-use, browser automation, web scraping, form filling, screenshot, cloud browser, playwright cdp, session replay, workspace files, profile sync
|
||||
role: engineer
|
||||
scope: implementation
|
||||
---
|
||||
|
||||
# Browser Use
|
||||
|
||||
Use Browser Use Cloud SDK and API to run browser agents and raw browser sessions.
|
||||
|
||||
## When To Use
|
||||
|
||||
- Navigate websites and extract structured data
|
||||
- Fill forms and execute multi-step workflows
|
||||
- Stream live browser actions and agent messages
|
||||
- Reuse sessions, profiles, and workspaces across tasks
|
||||
- Connect Playwright/Puppeteer via CDP to cloud browsers
|
||||
|
||||
## Install
|
||||
|
||||
```bash
|
||||
pip install browser-use-sdk
|
||||
export BROWSER_USE_API_KEY=your_key
|
||||
```
|
||||
|
||||
TypeScript:
|
||||
|
||||
```bash
|
||||
npm install browser-use-sdk
|
||||
```
|
||||
|
||||
## Quick Start (v3 SDK)
|
||||
|
||||
```python
|
||||
from browser_use_sdk.v3 import AsyncBrowserUse
|
||||
|
||||
client = AsyncBrowserUse()
|
||||
result = await client.run("List the top 20 posts on Hacker News today with their points")
|
||||
print(result.output)
|
||||
```
|
||||
|
||||
```typescript
|
||||
import { BrowserUse } from "browser-use-sdk/v3";
|
||||
|
||||
const client = new BrowserUse();
|
||||
const result = await client.run("List the top 20 posts on Hacker News today with their points");
|
||||
console.log(result.output);
|
||||
```
|
||||
|
||||
## Core Patterns
|
||||
|
||||
- `run()` for one-shot tasks: auto create + poll + return output.
|
||||
- `sessions.create()` + `session_id` for follow-up tasks with shared browser state.
|
||||
- `workspaces.*` for file upload/download workflows.
|
||||
- `profiles.*` for login persistence and recurring automation.
|
||||
- `browsers.create()` for raw CDP control (Playwright/Puppeteer).
|
||||
|
||||
### Follow-up task pattern
|
||||
|
||||
```python
|
||||
session = await client.sessions.create()
|
||||
await client.run("Go to amazon.com and open first laptop", session_id=session.id)
|
||||
await client.run("Extract customer reviews", session_id=session.id)
|
||||
await client.sessions.stop(session.id)
|
||||
```
|
||||
|
||||
### Structured output
|
||||
|
||||
- Python: pass `output_schema` (Pydantic).
|
||||
- TypeScript: pass `schema` (Zod v4 required).
|
||||
|
||||
### Stream messages
|
||||
|
||||
- Iterate over `client.run(...)` to receive live messages.
|
||||
- `run.result` is valid only after iteration completes.
|
||||
|
||||
### Deterministic rerun (cache-script)
|
||||
|
||||
- Use `@{{...}}` placeholders in task plus `workspace_id`.
|
||||
- First run builds script, next runs can execute without LLM.
|
||||
- `cache_script`: `None` (auto), `True` (force), `False` (disable).
|
||||
|
||||
## Agent vs Browser

- Agent mode: `client.run(...)`, `client.sessions.*`.
- Browser mode: `client.browsers.create(...)` returns `cdp_url` + `live_url`.
- Use browser mode when you need custom CDP automation with Playwright/Puppeteer.

## Authentication and Persistence

- API key env: `BROWSER_USE_API_KEY`.
- Header for direct API calls: `X-Browser-Use-API-Key: <key>`.
- For user-specific state: create one profile per user and reuse `profile_id`.

## Operations Checklist

- Always stop sessions/browsers when done to avoid idle charges.
- Always stop profiled sessions so that cookies/localStorage are persisted correctly.
- Sessions idle-timeout after 15 minutes; max duration is 4 hours.
- Recording links are presigned and expire quickly (about 1 hour).

## Common Gotchas

- If a streaming loop is interrupted early, cancel with `sessions.stop(..., strategy="task")` before sending another task.
- TypeScript structured output fails with Zod v3; use Zod v4.
- Selenium remote CDP support is limited; prefer Playwright/Puppeteer for cloud CDP.
- Deleting a workspace is permanent.

## Reference

- Full LLM-optimized docs: `https://docs.browser-use.com/llms-full.txt`
- Quick index: `https://docs.browser-use.com/llms.txt`
- API key: `https://cloud.browser-use.com/settings?tab=api-keys&new=1`

60
personas/_shared/skills/ekos-gazete-search/README.md
Normal file
@@ -0,0 +1,60 @@
# ekos-gazete-search

Claude Code skill: systematic, topic-based search of the İstanbul Üniversitesi EKOS newspaper archive (1928-1942, 53 newspapers, 581,106 OCR'd pages).

## Quick start

```bash
cd ~/.claude/skills/ekos-gazete-search
python3 -m venv .venv && source .venv/bin/activate
pip install -r scripts/requirements.txt

# 1) Build the manifest (~5 min, one-time)
python scripts/01_build_manifest.py

# 2) Start the Kırım scan with the priority windows
python scripts/02_search_pdfs.py \
  --keywords keywords/kirim.yaml \
  --priority-only \
  --workers 4

# 3) Generate the Obsidian report
python scripts/03_render_report.py --topic Kirim
```

## Structure

```
.
├── SKILL.md                  # instructions for Claude
├── README.md                 # this file
├── keywords/
│   ├── _template.yaml        # template for a new topic
│   └── kirim.yaml            # Kırım (Khanate, Tatar, diaspora, Soviet)
├── scripts/
│   ├── 01_build_manifest.py  # fetch the 53 newspaper pages → manifest CSV
│   ├── 02_search_pdfs.py     # download PDF + pdftotext + fuzzy regex → JSONL
│   ├── 03_render_report.py   # JSONL → Obsidian markdown
│   ├── lib/fuzzy.py          # OCR-tolerant Turkish regex engine
│   └── requirements.txt
├── manifests/                # generated CSVs
└── hits/                     # generated JSONL hit files
```

## New topic

```bash
cp keywords/_template.yaml keywords/filistin.yaml
# Edit: canonical, aliases, proper_nouns, disambiguators, priority_windows
python scripts/02_search_pdfs.py --keywords keywords/filistin.yaml --out hits/filistin.jsonl
python scripts/03_render_report.py --hits hits/filistin.jsonl --topic Filistin
```

## Limits

- **Bandwidth:** 581k pages × ~14 MB per PDF ≈ 8+ TB. The skill downloads each PDF, extracts its text layer, and deletes the file if there is no hit. It does NOT build a full mirror.
- **Throttle:** default 0.25 s/request with 4 workers, about 3 pages/s, out of courtesy to the library.
- **OCR:** 2014 vintage; Turkish diacritics are mangled. The fuzzy regex compensates for this but is not 100% reliable.
- **Coverage:** 1928-1942. **The 1944 Crimean deportation (Kırım sürgünü) is NOT in this archive.**
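
The bandwidth figure in the first bullet can be checked with quick arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 1,000,000 MB) and treating ~14 MB as an average per download; the function name `total_terabytes` is hypothetical.

```python
# Back-of-the-envelope check of the bandwidth estimate.
# Assumptions: 14 MB is an average size per download; 1 TB = 1_000_000 MB.
PAGES = 581_106      # OCR'd pages, from the archive description
AVG_PDF_MB = 14      # approximate size per PDF, from the bullet above


def total_terabytes(pages: int, avg_mb: float) -> float:
    """Return the total transfer volume in decimal terabytes."""
    return pages * avg_mb / 1_000_000


print(f"{total_terabytes(PAGES, AVG_PDF_MB):.1f} TB")
```

The result lands just above 8 TB, which is why the skill streams and deletes PDFs instead of mirroring the archive.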

For details, see `SKILL.md` and the [vault mapping note](/home/salva/Obsidian/6-Geopolitics/Russia/03.%20HISTORICAL%20CONTEXT/EKOS-Gazete-Arsivi-Haritalama.md).

169
personas/_shared/skills/ekos-gazete-search/SKILL.md
Normal file
@@ -0,0 +1,169 @@
---
name: ekos-gazete-search
description: "Systematic, topic-based search of the İstanbul Üniversitesi EKOS newspaper archive (1928-1942, 53 newspapers, 581k OCR'd pages). Turkish-OCR-tolerant fuzzy regex, priority time windows, Obsidian report generation. Parametric for Kırım, Filistin, Holodomor, or any other topic."
domain: intelligence
subdomain: archival-research
tags:
  - archive
  - foia
  - ottoman-press
  - turkish-press
  - historical-research
  - ocr
  - pdf
  - newspaper
  - early-republic
  - crimea
  - kirim
  - diaspora
personas:
  - scribe
  - scholar
  - oracle
  - frodo
---

# EKOS Newspaper Search Skill

## When is it invoked?

When the user says one of the following:
- "EKOS arşivinde X tara/ara" (scan/search for X in the EKOS archive)
- "İstanbul Üniversitesi gazete arşivinde X haberlerini bul" (find news about X in the İstanbul Üniversitesi newspaper archive)
- "1928-1942 Türk basınında X" (X in the Turkish press, 1928-1942)
- "nek.istanbul.edu.tr gazetelerinde tarama" (scan the newspapers on nek.istanbul.edu.tr)
- A request to rerun with an existing keyword set (Kırım, Filistin, etc.)

## Architecture overview

```
SLUG (53 newspapers) → manifest.csv → fuzzy search → hits.jsonl → Obsidian report
```

Three stages, three separate scripts:
1. **`scripts/01_build_manifest.py`**: fetches the 53 newspaper pages and writes every PDF URL to `manifests/ekos_master.csv`. Run once, then cached.
2. **`scripts/02_search_pdfs.py`**: iterates over the manifest; for each PDF it downloads the file, extracts the text with `pdftotext`, searches it with the fuzzy regex, writes hits to `hits/<topic>.jsonl`, and deletes the PDF.
3. **`scripts/03_render_report.py`**: renders the JSONL to markdown under `6-Geopolitics/Russia/03. HISTORICAL CONTEXT/` as a master report plus per-year reports.

## Prerequisites

```bash
# System packages (may already be present on Kali Linux)
which pdftotext pdfinfo curl   # poppler-utils

# Python venv (CLAUDE.md rule: install into a venv, not system-wide)
cd /home/salva/.claude/skills/ekos-gazete-search
python3 -m venv .venv
source .venv/bin/activate
pip install requests pyyaml beautifulsoup4
```

## Typical workflow

### A) First run: build the manifest

```bash
cd /home/salva/.claude/skills/ekos-gazete-search
source .venv/bin/activate
python scripts/01_build_manifest.py
# → manifests/ekos_master.csv (one-time, ~5 min)
```

### B) Search: Kırım, starting with the priority windows

```bash
# Strategy B: 1932-33, 1936-37, 1941-42 first
python scripts/02_search_pdfs.py \
  --keywords keywords/kirim.yaml \
  --priority-only \
  --workers 4 \
  --out hits/kirim.jsonl

# Then all remaining years
python scripts/02_search_pdfs.py \
  --keywords keywords/kirim.yaml \
  --workers 4 \
  --out hits/kirim.jsonl
```

### C) POC mode: test with little data (the example restricts the run to `cumhuriyet`)

```bash
python scripts/02_search_pdfs.py \
  --keywords keywords/kirim.yaml \
  --slug cumhuriyet \
  --year-from 1932 --year-to 1933 \
  --limit 50 \
  --out hits/kirim_poc.jsonl
```

### D) Render the report

```bash
python scripts/03_render_report.py \
  --hits hits/kirim.jsonl \
  --topic Kirim \
  --keywords keywords/kirim.yaml
# → 6-Geopolitics/Russia/03. HISTORICAL CONTEXT/EKOS-Kirim-Bulgular.md (master)
# → EKOS-Kirim-1932.md, EKOS-Kirim-1933.md, ... (per year)
```

## Adding a new topic

1. Create `keywords/<topic>.yaml`, using `keywords/_template.yaml` as the template.
2. Fill in the wordlist: `canonical`, `aliases`, `proper_nouns` (personal names), `disambiguators` (false-positive filters).
3. Define `priority_windows`: the years in which the topic is concentrated.
4. Run: `python scripts/02_search_pdfs.py --keywords keywords/<topic>.yaml --out hits/<topic>.jsonl`
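
Step 3 can be illustrated with a small sketch of how a `priority_windows` entry might be matched against an issue date. The function name `in_priority_window` and the inclusive bounds are assumptions for illustration; the real logic lives in `scripts/02_search_pdfs.py`, which is not shown here. The two sample windows are copied from `keywords/kirim.yaml`.

```python
from datetime import date

# Two windows copied from keywords/kirim.yaml; the dict layout mirrors the YAML schema.
PRIORITY_WINDOWS = [
    {"start": "1932-01-01", "end": "1933-12-31", "weight": 5},
    {"start": "1941-06-22", "end": "1942-12-31", "weight": 5},
]


def in_priority_window(issue: date, windows: list[dict]) -> bool:
    """Return True if the issue date falls inside any window (bounds inclusive, an assumption)."""
    return any(
        date.fromisoformat(w["start"]) <= issue <= date.fromisoformat(w["end"])
        for w in windows
    )


print(in_priority_window(date(1933, 2, 12), PRIORITY_WINDOWS))
print(in_priority_window(date(1935, 5, 1), PRIORITY_WINDOWS))
```

A run with `--priority-only` would presumably keep only the manifest rows for which such a check succeeds.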

## OCR tolerance

The PDFs' OCR is 2014 vintage and of medium-to-low quality. Turkish diacritics are systematically corrupted:

| Correct | In OCR | Regex class |
|---|---|---|
| `ı` | `1`, `i`, `l`, `\|` | `[1iIıİlj\|]` |
| `ş` | `~`, `s` | `[s~ş]` |
| `ç` | `c` | `[cç]` |
| `ğ` | `g` | `[gğ]` |
| `ü` | `u`, `ii` | `(?:[uü]\|ii)` |
| `ö` | `o` | `[oö]` |

`scripts/lib/fuzzy.py` applies this mapping automatically: `build_pattern("Kırım")` → `r"K[1iIıİlj|][rR][1iIıİlj|]m"`.
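
The mapping idea can be sketched in a few lines. This is not the code in `scripts/lib/fuzzy.py` and does not reproduce its exact output (for example the `[rR]` class); the name `build_fuzzy_pattern`, the per-character lookup, and the use of `re.IGNORECASE` are assumptions. Only the character classes come from the table.

```python
import re

# Character classes copied from the OCR table; everything else is an assumption.
OCR_CLASSES = {
    "ı": "[1iIıİlj|]",
    "ş": "[s~ş]",
    "ç": "[cç]",
    "ğ": "[gğ]",
    "ü": "(?:[uü]|ii)",
    "ö": "[oö]",
}


def build_fuzzy_pattern(term: str) -> re.Pattern:
    """Replace each OCR-fragile letter with its tolerant class; escape the rest."""
    parts = [OCR_CLASSES.get(ch, re.escape(ch)) for ch in term]
    return re.compile("".join(parts), re.IGNORECASE)


pattern = build_fuzzy_pattern("Kırım")
print(bool(pattern.search("gore K1r1m'da binlerce")))
print(bool(pattern.search("Kerim Bey")))
```

Because the class for `ı` does not include `e`, "Kerim" is not matched at this stage; looser classes would push that work onto the `disambiguators` list.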

## Limits and caveats

- **Greek/Armenian newspapers** (apoyevmatini, aravelk, jamanak, metapolitefsis): their OCR has not been tested yet. The first pass scans them through Latin-transcription aliases. If that proves insufficient, a re-OCR with Tesseract `ell`/`hye` will be added later.
- **Throttle:** 0.25 s/request. For the full archive of 581k pages: 4 workers × ~12-18 hours. This is a courtesy to the library.
- **False positives:** "Kerim" (a personal name) collides with "Kırım", and "Kefe" (the district) with "kefil/kefe". Extend the `disambiguators` list as you review the hit list.
- **Copyright:** the 1928-1942 PDFs are distributed by the library; we only search them and record URL references, without keeping permanent copies. No legal issue.
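
The false-positive filtering mentioned in the list above can be sketched as a context-window check: a hit is rejected when a disambiguator phrase appears near the match. The function name `is_rejected`, the case-insensitive comparison, and the default 50-character window are assumptions (the keyword files describe windows of ±20 and ±50 characters); the actual implementation is not shown in this diff.

```python
import re


def is_rejected(text: str, match: re.Match, disambiguators: list[str], window: int = 50) -> bool:
    """Reject a hit if any disambiguator occurs within `window` characters of the match."""
    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)
    context = text[start:end].lower()
    return any(d.lower() in context for d in disambiguators)


# Plain pattern as a hypothetical stand-in for the fuzzy one.
pattern = re.compile(r"Kefe", re.IGNORECASE)
disambiguators = ["kefil", "kefalet"]

good = "Kefe limanindan gelen haberlere gore"
bad = "borcun kefaleten odenmesi icin kefe konuldu"
print(is_rejected(good, pattern.search(good), disambiguators))
print(is_rejected(bad, pattern.search(bad), disambiguators))
```

A wider window removes more false positives but also risks discarding genuine hits that happen to sit near an unrelated phrase.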

## Output schema (`hits/*.jsonl`)

Each line is one hit:
```json
{
  "slug": "cumhuriyet",
  "year": "1933",
  "month": "subat",
  "day": "12",
  "page": 3,
  "keyword": "Kırım",
  "match": "K1r1m",
  "snippet": "...lan acl1k haberlerine gore K1r1m'da binlerce...",
  "url": "https://nek.istanbul.edu.tr/.../cumhuriyet_1933_subat_12_.pdf"
}
```
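
A short sketch of consuming this schema: parse JSONL records and count hits per year, roughly what a report renderer would need before writing per-year files. The helper name `hits_per_year` and the sample records are hypothetical; only the field names come from the schema above.

```python
import json
from collections import Counter


def hits_per_year(jsonl_text: str) -> Counter:
    """Count hits per year from JSONL text (one JSON object per non-empty line)."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if line.strip():
            counts[json.loads(line)["year"]] += 1
    return counts


# Hypothetical sample records using the documented field names.
sample = "\n".join([
    json.dumps({"slug": "cumhuriyet", "year": "1933", "keyword": "Kırım", "match": "K1r1m"}),
    json.dumps({"slug": "aksam", "year": "1933", "keyword": "Kefe", "match": "Kefe"}),
    json.dumps({"slug": "vakit", "year": "1941", "keyword": "Kerç", "match": "Kerc"}),
])
print(hits_per_year(sample))
```

Note that `year` is a string in the schema, so the counter keys are strings as well.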

## Persona integration

This skill is the working tool of the `persona-scribe-salva` persona (FOIA archivist). The Scribe persona calls it when given an archive-scanning task. Other related personas:
- `persona-frodo-russia`: interprets hits for Soviet/Russian-period analysis
- `persona-centurion`: military/war news (1941-42 Eastern Front)
- `persona-polyglot-russian`: once the Greek/Armenian newspapers are activated

## Known areas for improvement

- [ ] Greek/Armenian OCR re-pass (Tesseract 5)
- [ ] Hit-level Tesseract verification (to reduce false positives)
- [ ] Dataview view (sortable hit list in Obsidian)
- [ ] Written notice to the library (before a large scan)
@@ -0,0 +1,38 @@
# EKOS Gazete Arama: Keyword Set Template
# When adding a new topic, copy this file: cp _template.yaml <topic>.yaml
#
# Schema:
# - canonical: the "correct" spelling, shown in the report
# - aliases: other spellings of the same concept (transliteration, old Turkish, foreign languages)
# - suffixes: optional; Turkish suffix tolerance (Kırım+lı, Kırım+da, ...)
# - weight: hit importance (1 = weak signal, 5 = smoking gun). Report ordering uses this.
# - notes: context (not shown in the report)

topic: example
description: "Short topic description; appears in the report title"

# 1. Main terms: broad, concept level
keywords:
  - canonical: "Örnek"
    aliases: ["Example", "Beispiel"]
    weight: 3
    notes: "General term"

# 2. Proper nouns: people, places (smoking gun)
proper_nouns:
  - canonical: "Mustafa Kemal"
    aliases: ["Gazi", "Atatürk"]
    weight: 5

# 3. Disambiguators: false-positive filter
# A hit is rejected if any of these terms appears within ±20 characters of the match
disambiguators:
  - "Kerim Bey"  # personal name confused with "Kırım"
  - "Kerime Hanım"

# 4. Priority time windows: issues from these years are scanned first and highlighted in the report
priority_windows:
  - start: "1932-01-01"
    end: "1933-12-31"
    reason: "Explanation"
    weight: 5
286
personas/_shared/skills/ekos-gazete-search/keywords/kirim.yaml
Normal file
@@ -0,0 +1,286 @@
# EKOS Gazete Arama — Kırım Keyword Set
|
||||
# Kapsam: Kırım Hanlığı dönemi mirasından 1942'ye kadar Türk basınında
|
||||
# Kırım coğrafyası, halkı, diasporası, Sovyet dönemi, ve siyasi figürleri.
|
||||
|
||||
topic: Kirim
|
||||
description: "Kırım — Hanlık mirası, Tatar halkı, diaspora, Sovyet dönemi (1928-1942)"
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
# 1. ANA TERİMLER — coğrafya ve kavram (geniş, weight 3-4)
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
keywords:
|
||||
# Coğrafi temel
|
||||
- canonical: "Kırım"
|
||||
aliases: ["Crimea", "Krim", "Krym", "Krymea", "Crimee", "La Crimee"]
|
||||
suffixes: ["lı", "lılar", "dan", "ya", "a", "da", "de", "i", "ın", "ı"]
|
||||
weight: 4
|
||||
notes: "Ana terim. OCR'da K1r1m, Kirim varyantları hakim."
|
||||
|
||||
- canonical: "Kırım Hanlığı"
|
||||
aliases: ["Khanate of Crimea", "Crimean Khanate"]
|
||||
weight: 5
|
||||
notes: "Tarihsel devlet (1441-1783). Hanlık nostaljisi 1930'larda diaspora söyleminde aktif."
|
||||
|
||||
- canonical: "Kırım Yarımadası"
|
||||
aliases: ["Crimean Peninsula", "Tauride"]
|
||||
weight: 4
|
||||
|
||||
- canonical: "Kırım Türkleri"
|
||||
aliases: ["Kırım Tatarları", "Crimean Tatars", "Krimtataren", "Tatars de Crimee"]
|
||||
weight: 5
|
||||
notes: "Diaspora söyleminde 'Türk' kelimesi 'Tatar' yerine sık kullanıldı"
|
||||
|
||||
- canonical: "Tatar"
|
||||
aliases: ["Tatarlar", "Tatarların", "Tatare", "Tartar"]
|
||||
weight: 2
|
||||
notes: "WEIGHT DÜŞÜK — çok geniş hit verecek (Kazan Tatarı, Sibirya Tatarı vs). Disambiguator gerekir."
|
||||
|
||||
# Kırım şehirleri ve coğrafi noktalar
|
||||
- canonical: "Bahçesaray"
|
||||
aliases: ["Bahçe-saray", "Bagcesaray", "Bachtschisaraj", "Bakhchisaray", "Bakhchysaray"]
|
||||
weight: 5
|
||||
notes: "Hanlık başkenti — geçtiyse %100 Kırım bağlamı"
|
||||
|
||||
- canonical: "Akmescit"
|
||||
aliases: ["Ak-mescit", "Akmesçit", "Simferopol", "Симферополь", "Simferopole"]
|
||||
weight: 5
|
||||
|
||||
- canonical: "Kefe"
|
||||
aliases: ["Caffa", "Theodosia", "Feodosiya", "Feodosia", "Theodosie"]
|
||||
weight: 5
|
||||
notes: "Eski Ceneviz/Osmanlı liman şehri. 'kefil/kefa' ile çakışmaya dikkat — disambiguator zorunlu."
|
||||
|
||||
- canonical: "Gözleve"
|
||||
aliases: ["Yevpatoriya", "Eupatoria", "Yevpatoria"]
|
||||
weight: 5
|
||||
|
||||
- canonical: "Sivastopol"
|
||||
aliases: ["Sebastopol", "Sevastopol", "Sevastopolj", "Sevastopole"]
|
||||
weight: 4
|
||||
notes: "1854-55 Kırım Savaşı'nda meşhur, 1942'de Alman kuşatması"
|
||||
|
||||
- canonical: "Kerç"
|
||||
aliases: ["Kertsch", "Kerch", "Керчь", "Kerč"]
|
||||
weight: 4
|
||||
notes: "1941-42 Doğu Cephesi'nde stratejik"
|
||||
|
||||
- canonical: "Yalta"
|
||||
aliases: ["Jalta", "Ялта"]
|
||||
weight: 4
|
||||
|
||||
- canonical: "Çatırdağ"
|
||||
aliases: ["Çatır Dağı", "Chatyr-Dag", "Tschatyr-Dag"]
|
||||
weight: 5
|
||||
notes: "Kırım Tatar şiir/hatıra geleneğinde sembol — diaspora yazılarının imzası"
|
||||
|
||||
- canonical: "Or Kapı"
|
||||
aliases: ["Orkapı", "Perekop", "Перекоп"]
|
||||
weight: 5
|
||||
notes: "Kırım'a giriş kapısı; askeri haberlerin merkezi (1920 İç Savaş, 1941-42)"
|
||||
|
||||
- canonical: "Karasubazar"
|
||||
aliases: ["Karasu Bazar", "Karasubazaar", "Belogorsk"]
|
||||
weight: 5
|
||||
|
||||
- canonical: "Kezlev"
|
||||
aliases: ["Yevpatoria", "Kozlov"]
|
||||
weight: 4
|
||||
|
||||
# Tarihsel / siyasi kavramlar
|
||||
- canonical: "Kırım Muhtar Cumhuriyeti"
|
||||
aliases: ["Crimean ASSR", "Krimskaja ASSR", "Кримська АРСР", "Crimean Autonomous"]
|
||||
weight: 5
|
||||
notes: "1921'de kurulan Sovyet özerk cumhuriyeti — 1928-1942 arası tüm Kırım haberinin idari bağlamı"
|
||||
|
||||
- canonical: "Milli Fırka"
|
||||
aliases: ["Millî Fırka", "Milli Firka", "Kırım Milli Fırkası"]
|
||||
weight: 5
|
||||
notes: "Numan Çelebi Cihan'ın partisi — diaspora yazılarında smoking gun"
|
||||
|
||||
- canonical: "Kurultay"
|
||||
aliases: ["Kırım Kurultayı"]
|
||||
weight: 4
|
||||
notes: "1917 Kurultay'ı, Kazan Kurultay'ı ile karışabilir — bağlam denetimi gerekli"
|
||||
|
||||
- canonical: "muhacir"
|
||||
aliases: ["muhacirin", "muhacirler", "mültecilik", "mülteci"]
|
||||
weight: 2
|
||||
notes: "Genel terim ama Kırım göçü konusunda yoğun. Düşük weight + bağlam."
|
||||
|
||||
# Kırım Savaşı (tarihsel referans olarak gazetelerde geçer)
|
||||
- canonical: "Kırım Savaşı"
|
||||
aliases: ["Kırım Harbi", "Crimean War", "Krimkrieg", "Guerre de Crimee"]
|
||||
weight: 4
|
||||
notes: "1853-56. Tarihsel makaleler 1928-1942 boyunca düzenli."
|
||||
|
||||
# Sovyet dönem terminoloji
|
||||
- canonical: "kollektivizasyon"
|
||||
aliases: ["kollektifleştirme", "kolhoz", "sovhoz", "kolxoz"]
|
||||
weight: 2
|
||||
notes: "Geniş Sovyet bağlamı; Kırım haberleriyle birlikte gelirse weight artar"
|
||||
|
||||
- canonical: "açlık"
|
||||
aliases: ["kıtlık", "ac11k", "kit11k"]
|
||||
weight: 1
|
||||
notes: "Çok geniş — sadece Kırım/Sovyet ile yakınsa anlamlı"
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
# 2. ÖZEL İSİMLER — kişiler (smoking gun, weight 5)
|
||||
# Bir gazete sayfasında bu isimlerden biri geçtiyse Kırım içeriği %95 garanti.
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
proper_nouns:
|
||||
# Kırım Tatar siyasi liderleri
|
||||
- canonical: "Numan Çelebi Cihan"
|
||||
aliases: ["Noman Çelebicihan", "Numan Çelebicihan", "Çelebi Cihan", "Celebi Cihan"]
|
||||
weight: 5
|
||||
notes: "Kırım Halk Cumhuriyeti kurucusu (1917), Bolşeviklerce öldürüldü 1918"
|
||||
|
||||
- canonical: "Cafer Seydahmet"
|
||||
aliases: ["Cafer Seyit Ahmet", "Cafer Seydamet", "Seydahmet Kırımer", "Cafer Kırımer", "Cafer Seyid Ahmet"]
|
||||
weight: 5
|
||||
notes: "İstanbul'da Kırım diasporasının lideri; 1928-1942 arası aktif yazar"
|
||||
|
||||
- canonical: "Müstecip Ülküsal"
|
||||
aliases: ["Mustecip Ulkusal", "Müstecip Hacı Fazıl", "Ülküsal"]
|
||||
weight: 5
|
||||
notes: "Romanya/Köstence merkezli Kırım Tatar lideri, 'Emel' dergisi"
|
||||
|
||||
- canonical: "Hamdullah Suphi"
|
||||
aliases: ["Hamdullah Suphi Tanrıöver", "Tanrıöver"]
|
||||
weight: 4
|
||||
notes: "Türk Ocakları reisi, Kırım/Romanya muhaceretiyle ilgili devlet adamı"
|
||||
|
||||
- canonical: "Yusuf Akçura"
|
||||
aliases: ["Yusuf Akçuraoğlu", "Akçura", "Akcura"]
|
||||
weight: 4
|
||||
notes: "Kazan Tatarı ama Türkçü/Tatar dünyasının ortak figürü"
|
||||
|
||||
- canonical: "İsmail Gaspıralı"
|
||||
aliases: ["İsmail Bey Gaspıralı", "Gasprinski", "Gasprinsky", "Ismail Gaspirali"]
|
||||
weight: 5
|
||||
notes: "Tercüman gazetesi yayıncısı, Türkçülüğün babası — anma yazıları sık"
|
||||
|
||||
- canonical: "Veli İbrahim"
|
||||
aliases: ["Veli Ibraimov", "Veli Ibrahim"]
|
||||
weight: 5
|
||||
notes: "Kırım Muhtar Cumhuriyeti başkanı, 1928'de Stalin tarafından idam"
|
||||
|
||||
- canonical: "Bekir Çobanzade"
|
||||
aliases: ["Bekir Çoban-zade", "Çobanzade", "Cobanzade"]
|
||||
weight: 5
|
||||
notes: "Kırım Tatar dilbilimci, 1937'de tasfiye edildi"
|
||||
|
||||
- canonical: "Mehmet Niyazi"
|
||||
aliases: ["Memet Niyazi", "Mehmed Niyazi"]
|
||||
weight: 4
|
||||
notes: "Romanya/Köstence Kırım Tatar şairi"
|
||||
|
||||
- canonical: "Habibullah Kerimi"
|
||||
aliases: ["Habibullah Karimi", "Kerimi"]
|
||||
weight: 4
|
||||
|
||||
- canonical: "Asan Sabri Ayvaz"
|
||||
aliases: ["Asan Sabri Ayvazov", "Ayvazov", "Sabri Ayvazov"]
|
||||
weight: 5
|
||||
notes: "Kırım Tatar yazar, 1937'de Stalin terörü kurbanı"
|
||||
|
||||
- canonical: "Reşit Mediyev"
|
||||
aliases: ["Reşit Mediev", "Mediyev", "Medief"]
|
||||
weight: 5
|
||||
|
||||
# Sovyet/Rus tarafı (Kırım'la doğrudan iş tutmuş)
|
||||
- canonical: "Stalin"
|
||||
aliases: ["Staline", "Сталин"]
|
||||
weight: 1
|
||||
notes: "Çok geniş; sadece Kırım/Tatar ile co-occurring olduğunda anlamlı"
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
# 3. DİSAMBİGÜATÖRLER — false positive filtreleri
|
||||
# Bir match'in ±50 karakter çevresinde bu kelime varsa hit reddedilir
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
disambiguators:
|
||||
# "Kırım" ↔ "Kerim" (özel isim) çakışması
|
||||
- "Kerim Bey"
|
||||
- "Kerim Pa~a" # Kerim Paşa OCR
|
||||
- "Kerim Pasa"
|
||||
- "Kerime Hanim"
|
||||
- "Kerime Han1m"
|
||||
- "Kerim Efendi"
|
||||
- "Abdulkerim"
|
||||
- "Abdiilkerim" # OCR varyantı
|
||||
# "Kefe" (Crimea) ↔ "kefil/kefe" (sigorta/teminat)
|
||||
- "kefil"
|
||||
- "kefalet"
|
||||
- "kefaleten"
|
||||
# "Tatar" yiyecekler
|
||||
- "tatar boregi"
|
||||
- "tatar böreği"
|
||||
- "tatar pidesi"
|
||||
- "tatar sosu"
|
||||
# "Yalta" ↔ Türkçe "yalta" yok; "yaltak" var
|
||||
- "yaltakl"
|
||||
- "yaltaklan"
|
||||
# 1932 Türk Dili Kurultayı / Türk Tarih Kurultayı false positive'leri (POC iter-1 öğrendik)
|
||||
# "Kurultay" tek başına Kırım için yetersiz; bu kombinler Atatürk dönemi reformları
|
||||
- "Türk Dili Kurultayı"
|
||||
- "Türk Dili Kurultay"
|
||||
- "Tiirk Dili Kurultayi" # OCR varyantı
|
||||
- "Dil Kurultayı"
|
||||
- "Dil Kurultay"
|
||||
- "Türk Tarih Kurultayı"
|
||||
- "Türk Tarih Kurultay"
|
||||
- "Tarih Kurultay"
|
||||
- "tarih kurultay"
|
||||
- "Halkevi Kurultay"
|
||||
- "halkevleri kurultay"
|
||||
- "C.H.F. Kurultay" # Cumhuriyet Halk Fırkası Kurultayı
|
||||
- "C.H.P. Kurultay"
|
||||
- "Fırka Kurultay"
|
||||
- "Parti Kurultay"
|
||||
# Kefe varyantları (genel "kıfayet/kifaye" OCR çöplüğü)
|
||||
- "kifayet"
|
||||
- "kifayetli"
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
# 4. ÖNCELİKLİ ZAMAN PENCERELERİ
|
||||
# Bu pencerelerdeki sayılar önce taranır, raporda öne çıkar
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
priority_windows:
|
||||
- start: "1928-01-01"
|
||||
end: "1928-12-31"
|
||||
weight: 4
|
||||
reason: "Veli İbrahim idamı + Kırım Tatar tasfiyesinin başlangıcı"
|
||||
|
||||
- start: "1932-01-01"
|
||||
end: "1933-12-31"
|
||||
weight: 5
|
||||
reason: "Holodomor / Kırım açlığı — Sovyet kıtlığının zirvesi"
|
||||
|
||||
- start: "1936-01-01"
|
||||
end: "1938-06-30"
|
||||
weight: 5
|
||||
reason: "Stalin Büyük Terör — Çobanzade, Ayvazov, Bekirov tasfiyeleri"
|
||||
|
||||
- start: "1939-08-23"
|
||||
end: "1941-06-22"
|
||||
weight: 4
|
||||
reason: "Molotov-Ribbentrop dönemi; Sovyet politikasında diaspora söylemi"
|
||||
|
||||
- start: "1941-06-22"
|
||||
end: "1942-12-31"
|
||||
weight: 5
|
||||
reason: "Alman Doğu Cephesi ilerleyişi; Kırım'ın Wehrmacht tarafından işgali"
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
# 5. CO-OCCURRENCE BOOST — birlikte geçerse hit ağırlığı artar
|
||||
# (lib/fuzzy.py içinde proximity score için)
|
||||
# ═══════════════════════════════════════════════════════════════════════
|
||||
co_occurrence_boost:
|
||||
# Bu çiftler aynı paragrafta (±300 char) geçerse weight +2
|
||||
- ["Kırım", "Tatar"]
|
||||
- ["Kırım", "muhacir"]
|
||||
- ["Sovyet", "Kırım"]
|
||||
- ["Kırım", "açlık"]
|
||||
- ["Kırım", "kollektivizasyon"]
|
||||
- ["Tatar", "Bahçesaray"]
|
||||
- ["Stalin", "Kırım"]
|
||||
@@ -0,0 +1,297 @@
|
||||
topic: KirimCore
|
||||
description: Kırım — sadece toponym ve Kırım-prefix kavramlar (dar tarama)
|
||||
keywords:
|
||||
- canonical: Kırım
|
||||
aliases:
|
||||
- Crimea
|
||||
- Krim
|
||||
- Krym
|
||||
- Krymea
|
||||
- Crimee
|
||||
- La Crimee
|
||||
suffixes:
|
||||
- lı
|
||||
- lılar
|
||||
- dan
|
||||
- ya
|
||||
- a
|
||||
- da
|
||||
- de
|
||||
- i
|
||||
- ın
|
||||
- ı
|
||||
weight: 4
|
||||
notes: Ana terim. OCR'da K1r1m, Kirim varyantları hakim.
|
||||
- canonical: Kırım Hanlığı
|
||||
aliases:
|
||||
- Khanate of Crimea
|
||||
- Crimean Khanate
|
||||
weight: 5
|
||||
notes: Tarihsel devlet (1441-1783). Hanlık nostaljisi 1930'larda diaspora söyleminde aktif.
|
||||
- canonical: Kırım Yarımadası
|
||||
aliases:
|
||||
- Crimean Peninsula
|
||||
- Tauride
|
||||
weight: 4
|
||||
- canonical: Kırım Türkleri
|
||||
aliases:
|
||||
- Kırım Tatarları
|
||||
- Crimean Tatars
|
||||
- Krimtataren
|
||||
- Tatars de Crimee
|
||||
weight: 5
|
||||
notes: Diaspora söyleminde 'Türk' kelimesi 'Tatar' yerine sık kullanıldı
|
||||
- canonical: Bahçesaray
|
||||
aliases:
|
||||
- Bahçe-saray
|
||||
- Bagcesaray
|
||||
- Bachtschisaraj
|
||||
- Bakhchisaray
|
||||
- Bakhchysaray
|
||||
weight: 5
|
||||
notes: Hanlık başkenti — geçtiyse %100 Kırım bağlamı
|
||||
- canonical: Akmescit
|
||||
aliases:
|
||||
- Ak-mescit
|
||||
- Akmesçit
|
||||
- Simferopol
|
||||
- Симферополь
|
||||
- Simferopole
|
||||
weight: 5
|
||||
- canonical: Kefe
|
||||
aliases:
|
||||
- Caffa
|
||||
- Theodosia
|
||||
- Feodosiya
|
||||
- Feodosia
|
||||
- Theodosie
|
||||
weight: 5
|
||||
notes: Eski Ceneviz/Osmanlı liman şehri. 'kefil/kefa' ile çakışmaya dikkat — disambiguator zorunlu.
|
||||
- canonical: Gözleve
|
||||
aliases:
|
||||
- Yevpatoriya
|
||||
- Eupatoria
|
||||
- Yevpatoria
|
||||
weight: 5
|
||||
- canonical: Sivastopol
|
||||
aliases:
|
||||
- Sebastopol
|
||||
- Sevastopol
|
||||
- Sevastopolj
|
||||
- Sevastopole
|
||||
weight: 4
|
||||
notes: 1854-55 Kırım Savaşı'nda meşhur, 1942'de Alman kuşatması
|
||||
- canonical: Kerç
|
||||
aliases:
|
||||
- Kertsch
|
||||
- Kerch
|
||||
- Керчь
|
||||
- Kerč
|
||||
weight: 4
|
||||
notes: 1941-42 Doğu Cephesi'nde stratejik
|
||||
- canonical: Yalta
|
||||
aliases:
|
||||
- Jalta
|
||||
- Ялта
|
||||
weight: 4
|
||||
- canonical: Çatırdağ
|
||||
aliases:
|
||||
- Çatır Dağı
|
||||
- Chatyr-Dag
|
||||
- Tschatyr-Dag
|
||||
weight: 5
|
||||
notes: Kırım Tatar şiir/hatıra geleneğinde sembol — diaspora yazılarının imzası
|
||||
- canonical: Or Kapı
|
||||
aliases:
|
||||
- Orkapı
|
||||
- Perekop
|
||||
- Перекоп
|
||||
weight: 5
|
||||
notes: Kırım'a giriş kapısı; askeri haberlerin merkezi (1920 İç Savaş, 1941-42)
|
||||
- canonical: Karasubazar
|
||||
aliases:
|
||||
- Karasu Bazar
|
||||
- Karasubazaar
|
||||
- Belogorsk
|
||||
weight: 5
|
||||
- canonical: Kezlev
|
||||
aliases:
|
||||
- Yevpatoria
|
||||
- Kozlov
|
||||
weight: 4
|
||||
- canonical: Kırım Muhtar Cumhuriyeti
|
||||
aliases:
|
||||
- Crimean ASSR
|
||||
- Krimskaja ASSR
|
||||
- Кримська АРСР
|
||||
- Crimean Autonomous
|
||||
weight: 5
|
||||
notes: 1921'de kurulan Sovyet özerk cumhuriyeti — 1928-1942 arası tüm Kırım haberinin idari bağlamı
|
||||
- canonical: Kırım Savaşı
|
||||
aliases:
|
||||
- Kırım Harbi
|
||||
- Crimean War
|
||||
- Krimkrieg
|
||||
- Guerre de Crimee
|
||||
weight: 4
|
||||
notes: 1853-56. Tarihsel makaleler 1928-1942 boyunca düzenli.
|
||||
proper_nouns:
|
||||
- canonical: Numan Çelebi Cihan
|
||||
aliases:
|
||||
- Noman Çelebicihan
|
||||
- Numan Çelebicihan
|
||||
- Çelebi Cihan
|
||||
- Celebi Cihan
|
||||
weight: 5
|
||||
notes: Kırım Halk Cumhuriyeti kurucusu (1917), Bolşeviklerce öldürüldü 1918
|
||||
- canonical: Cafer Seydahmet
|
||||
aliases:
|
||||
- Cafer Seyit Ahmet
|
||||
- Cafer Seydamet
|
||||
- Seydahmet Kırımer
|
||||
- Cafer Kırımer
|
||||
- Cafer Seyid Ahmet
|
||||
weight: 5
|
||||
notes: İstanbul'da Kırım diasporasının lideri; 1928-1942 arası aktif yazar
|
||||
- canonical: Müstecip Ülküsal
|
||||
aliases:
|
||||
- Mustecip Ulkusal
|
||||
- Müstecip Hacı Fazıl
|
||||
- Ülküsal
|
||||
weight: 5
|
||||
notes: Romanya/Köstence merkezli Kırım Tatar lideri, 'Emel' dergisi
|
||||
- canonical: Hamdullah Suphi
|
||||
aliases:
|
||||
- Hamdullah Suphi Tanrıöver
|
||||
- Tanrıöver
|
||||
weight: 4
|
||||
notes: Türk Ocakları reisi, Kırım/Romanya muhaceretiyle ilgili devlet adamı
|
||||
- canonical: Yusuf Akçura
|
||||
aliases:
|
||||
- Yusuf Akçuraoğlu
|
||||
- Akçura
|
||||
- Akcura
|
||||
weight: 4
|
||||
notes: Kazan Tatarı ama Türkçü/Tatar dünyasının ortak figürü
|
||||
- canonical: İsmail Gaspıralı
|
||||
aliases:
|
||||
- İsmail Bey Gaspıralı
|
||||
- Gasprinski
|
||||
- Gasprinsky
|
||||
- Ismail Gaspirali
|
||||
weight: 5
|
||||
notes: Tercüman gazetesi yayıncısı, Türkçülüğün babası — anma yazıları sık
|
||||
- canonical: Veli İbrahim
|
||||
aliases:
|
||||
- Veli Ibraimov
|
||||
- Veli Ibrahim
|
||||
weight: 5
|
||||
notes: Kırım Muhtar Cumhuriyeti başkanı, 1928'de Stalin tarafından idam
|
||||
- canonical: Bekir Çobanzade
|
||||
aliases:
|
||||
- Bekir Çoban-zade
|
||||
- Çobanzade
|
||||
- Cobanzade
|
||||
weight: 5
|
||||
notes: Kırım Tatar dilbilimci, 1937'de tasfiye edildi
|
||||
- canonical: Mehmet Niyazi
|
||||
aliases:
|
||||
- Memet Niyazi
|
||||
- Mehmed Niyazi
|
||||
weight: 4
|
||||
notes: Romanya/Köstence Kırım Tatar şairi
|
||||
- canonical: Habibullah Kerimi
|
||||
aliases:
|
||||
- Habibullah Karimi
|
||||
- Kerimi
|
||||
weight: 4
|
||||
- canonical: Asan Sabri Ayvaz
|
||||
aliases:
|
||||
- Asan Sabri Ayvazov
|
||||
- Ayvazov
|
||||
- Sabri Ayvazov
|
||||
weight: 5
|
||||
notes: Kırım Tatar yazar, 1937'de Stalin terörü kurbanı
|
||||
- canonical: Reşit Mediyev
|
||||
aliases:
|
||||
- Reşit Mediev
|
||||
- Mediyev
|
||||
- Medief
|
||||
weight: 5
|
||||
- canonical: Stalin
|
||||
aliases:
|
||||
- Staline
|
||||
- Сталин
|
||||
weight: 1
|
||||
notes: Çok geniş; sadece Kırım/Tatar ile co-occurring olduğunda anlamlı
|
||||
disambiguators:
|
||||
- Kerim Bey
|
||||
- Kerim Pa~a
|
||||
- Kerim Pasa
|
||||
- Kerime Hanim
|
||||
- Kerime Han1m
|
||||
- Kerim Efendi
|
||||
- Abdulkerim
|
||||
- Abdiilkerim
|
||||
- kefil
|
||||
- kefalet
|
||||
- kefaleten
|
||||
- tatar boregi
|
||||
- tatar böreği
|
||||
- tatar pidesi
|
||||
- tatar sosu
|
||||
- yaltakl
|
||||
- yaltaklan
|
||||
- Türk Dili Kurultayı
|
||||
- Türk Dili Kurultay
|
||||
- Tiirk Dili Kurultayi
|
||||
- Dil Kurultayı
|
||||
- Dil Kurultay
|
||||
- Türk Tarih Kurultayı
|
||||
- Türk Tarih Kurultay
|
||||
- Tarih Kurultay
|
||||
- tarih kurultay
|
||||
- Halkevi Kurultay
|
||||
- halkevleri kurultay
|
||||
- C.H.F. Kurultay
|
||||
- C.H.P. Kurultay
|
||||
- Fırka Kurultay
|
||||
- Parti Kurultay
|
||||
- kifayet
|
||||
- kifayetli
|
||||
priority_windows:
|
||||
- start: '1928-01-01'
|
||||
end: '1928-12-31'
|
||||
weight: 4
|
||||
reason: Veli İbrahim idamı + Kırım Tatar tasfiyesinin başlangıcı
|
||||
- start: '1932-01-01'
|
||||
end: '1933-12-31'
|
||||
weight: 5
|
||||
reason: Holodomor / Kırım açlığı — Sovyet kıtlığının zirvesi
|
||||
- start: '1936-01-01'
|
||||
end: '1938-06-30'
|
||||
weight: 5
|
||||
reason: Stalin Büyük Terör — Çobanzade, Ayvazov, Bekirov tasfiyeleri
|
||||
- start: '1939-08-23'
|
||||
end: '1941-06-22'
|
||||
weight: 4
|
||||
reason: Molotov-Ribbentrop dönemi; Sovyet politikasında diaspora söylemi
|
||||
- start: '1941-06-22'
|
||||
end: '1942-12-31'
|
||||
weight: 5
|
||||
reason: Alman Doğu Cephesi ilerleyişi; Kırım'ın Wehrmacht tarafından işgali
|
||||
co_occurrence_boost:
|
||||
- - Kırım
|
||||
- Tatar
|
||||
- - Kırım
|
||||
- muhacir
|
||||
- - Sovyet
|
||||
- Kırım
|
||||
- - Kırım
|
||||
- açlık
|
||||
- - Kırım
|
||||
- kollektivizasyon
|
||||
- - Tatar
|
||||
- Bahçesaray
|
||||
- - Stalin
|
||||
- Kırım
|
||||
@@ -0,0 +1,121 @@
|
||||
#!/usr/bin/env python3
"""
Build a master manifest of every PDF in the EKOS gazette archive.

Fetches each gazete.php?gazete=<slug> page once, extracts all PDF
hrefs, and writes them into manifests/ekos_master.csv. This is a
one-time operation (~5 minutes); the resulting CSV drives subsequent
search runs.
"""
import csv
import re
import sys
import time
from pathlib import Path
import requests

BASE = "https://nek.istanbul.edu.tr/ekos/GAZETE/"
HERE = Path(__file__).resolve().parent.parent

# 53 newspaper slugs discovered during recon (2026-04-28)
SLUGS = [
    "aciksoz", "aksam", "anadolu", "apoyevmatini", "aravelk", "aydin",
    "beyoglu", "borsa", "bugun", "cerideihavadis", "cumhuriyet", "dogu",
    "ensondakika", "ensonhavadis", "haber", "hakikat", "hakimiyetimilliye",
    "hakkinsesi", "halkindili", "halkinsesi", "ikdam", "ikdamhalk",
    "ikdamsabahpostasi", "istanbul", "izmirpostasi", "jamanak", "kurun",
    "leechodebelgrade", "metapolitefsis", "milliyet", "munakasa",
    "piyasacetveli", "savas", "sondakika", "sonposta", "sonsaat",
    "sontelgraf", "tan", "tasviriefkar", "turkdili", "turkischepost",
    "turksozu", "ulus", "ulusalbirlik", "ulussesi", "vakit", "vatan",
    "yarin", "yeniasir", "yenigun", "yenimersin", "yenisabah", "yeniyol",
]

# /<slug>/<slug>_<year>/<slug>_<year>_<month>_/<slug>_<year>_<month>_<day>_.pdf
PDF_HREF_RE = re.compile(r'href="([^"]+\.pdf)"', re.IGNORECASE)
DATE_RE = re.compile(
    r'/([a-z][a-z0-9]+)_(\d{4})_([a-z]+?)_(\d+)_?\.pdf',
    re.IGNORECASE
)

UA = {"User-Agent": "Mozilla/5.0 (research; ekos-gazete-search; "
                    "contact: kutuphane@istanbul.edu.tr)"}


def normalize_url(href: str) -> str:
    if href.startswith("http"):
        return href
    if href.startswith("/"):
        return "https://nek.istanbul.edu.tr" + href
    # remove every leading "../" or "./" segment, not only the first one
    href = re.sub(r'^(?:\.+/)+', '', href)
    return BASE + href


def fetch_slug(slug: str, throttle: float = 1.0):
    url = f"{BASE}gazete.php?gazete={slug}"
    print(f" → {slug}", end=" ", flush=True)
    try:
        r = requests.get(url, headers=UA, timeout=30)
        r.raise_for_status()
    except Exception as e:
        print(f"FAIL: {e}")
        return []

    rows = []
    for href in PDF_HREF_RE.findall(r.text):
        m = DATE_RE.search(href)
        if not m:
            continue
        s, year, month, day = m.groups()
        rows.append({
            "slug": s.lower(),
            "year": year,
            "month": month.lower(),
            "day": day,
            "url": normalize_url(href),
        })
    print(f"{len(rows)} PDFs")
    time.sleep(throttle)
    return rows


def main():
    out_path = HERE / "manifests" / "ekos_master.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)

    print(f"Fetching {len(SLUGS)} gazette pages → {out_path}")
    print(f"Throttle: 1s/req, expected runtime ~1 minute")
    print()

    all_rows = []
    for slug in SLUGS:
        all_rows.extend(fetch_slug(slug))

    # De-dup (the catalogs occasionally repeat hrefs)
    seen = set()
    deduped = []
    for r in all_rows:
        key = r["url"]
        if key in seen:
            continue
        seen.add(key)
        deduped.append(r)

    with out_path.open("w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=["slug", "year", "month", "day", "url"])
        w.writeheader()
        w.writerows(deduped)

    print(f"\n✓ Manifest: {len(deduped)} unique PDFs → {out_path}")
    # Quick stats
    by_slug = {}
    for r in deduped:
        by_slug[r["slug"]] = by_slug.get(r["slug"], 0) + 1
    print(f"\nTop 10 by issue count:")
    for s, c in sorted(by_slug.items(), key=lambda x: -x[1])[:10]:
        print(f" {s:>25} {c:>5}")


if __name__ == "__main__":
    main()
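The manifest builder above turns each PDF href into a row by running `DATE_RE` over it. The sketch below reproduces only that parsing step so the regex can be checked offline. The sample href is hypothetical: it is built from the path layout described in the script's comment, not copied from the archive, and `parse_href` is an illustrative helper name that does not exist in the script.

```python
import re

# Same expression as DATE_RE in the manifest script.
DATE_RE = re.compile(
    r'/([a-z][a-z0-9]+)_(\d{4})_([a-z]+?)_(\d+)_?\.pdf',
    re.IGNORECASE,
)

# Hypothetical href following the documented layout
# /<slug>/<slug>_<year>/<slug>_<year>_<month>_/<slug>_<year>_<month>_<day>_.pdf
SAMPLE = ("cumhuriyet/cumhuriyet_1932/cumhuriyet_1932_ocak_/"
          "cumhuriyet_1932_ocak_5_.pdf")


def parse_href(href):
    """Return the slug/year/month/day fields of a PDF href, or None."""
    m = DATE_RE.search(href)
    if not m:
        return None
    slug, year, month, day = m.groups()
    return {"slug": slug.lower(), "year": year,
            "month": month.lower(), "day": day}


if __name__ == "__main__":
    print(parse_href(SAMPLE))
    print(parse_href("index.html"))
```

Only the last path segment satisfies the whole pattern, because the directory segments end in `/` before a day number or `.pdf` can match. All fields stay strings, which is why later stages convert `year` and `day` with `int()` before comparing dates.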
@@ -0,0 +1,293 @@
#!/usr/bin/env python3
"""
Iterate the EKOS manifest, download each PDF, extract its text-layer,
fuzzy-search against a YAML keyword set, and append hits to JSONL.

Storage policy:
  - Default: PDFs go to /tmp/ekos-cache/, processed, DELETED.
  - With --keep-pdfs DIR: PDFs that produce >=1 hit are MOVED to
    DIR/<slug>/<year>/<slug>_<year>_<month>_<day>.pdf for re-use.
    PDFs with zero hits are still deleted (content-driven curation).
"""
import argparse
import csv
import json
import os
import random
import re
import shutil
import subprocess
import sys
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
from pathlib import Path

import requests
import yaml

HERE = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(HERE / "scripts"))
from lib.fuzzy import (
    compile_keyword_set, compile_disambiguators,
    is_false_positive, extract_snippet
)

UA = {"User-Agent": "Mozilla/5.0 (research; ekos-gazete-search)"}
DEFAULT_CACHE = Path(os.environ.get("EKOS_CACHE", "/tmp/ekos-cache"))


def in_priority_window(year: int, month: str, day: str, windows: list):
    """Return (in_window: bool, weight: int, reason: str)."""
    from datetime import date, datetime

    # Map Turkish month slugs to numbers for comparison.
    # The legacy month names are the same ones 04_export.py recognises;
    # any other unknown slug still falls back to January.
    month_map = {
        "ocak": 1, "subat": 2, "şubat": 2, "mart": 3, "nisan": 4, "mayis": 5,
        "mayıs": 5, "haziran": 6, "temmuz": 7, "agustos": 8, "ağustos": 8,
        "eylul": 9, "eylül": 9, "ekim": 10, "kasim": 11, "kasım": 11,
        "aralik": 12, "aralık": 12,
        "kanunusani": 1, "kanunuevvel": 12,
        "tesrinievvel": 10, "tesrinisani": 11,
    }
    try:
        m = month_map.get(month.lower(), 1)
        d = int(day)
        cur = date(int(year), m, d)
    except Exception:
        return False, 0, None
    for w in windows:
        try:
            s = datetime.strptime(w["start"], "%Y-%m-%d").date()
            e = datetime.strptime(w["end"], "%Y-%m-%d").date()
            if s <= cur <= e:
                return True, w.get("weight", 3), w.get("reason", "")
        except Exception:
            continue
    return False, 0, None


def pdftotext_page(pdf_path: Path, page: int, timeout: int = 30) -> str:
    """Extract text from a single page using poppler-utils."""
    try:
        r = subprocess.run(
            ["pdftotext", "-layout", "-f", str(page), "-l", str(page),
             str(pdf_path), "-"],
            capture_output=True, text=True, timeout=timeout, errors="replace"
        )
        return r.stdout
    except subprocess.TimeoutExpired:
        return ""


def get_page_count(pdf_path: Path) -> int:
    try:
        r = subprocess.run(["pdfinfo", str(pdf_path)],
                           capture_output=True, text=True, timeout=15)
        m = re.search(r"Pages:\s+(\d+)", r.stdout)
        return int(m.group(1)) if m else 1
    except Exception:
        return 1


def process_pdf(row: dict, patterns: list, disambiguators: list,
                cache_dir: Path, priority_info: tuple,
                keep_dir: Path | None = None) -> list:
    """Returns list of hit dicts (possibly empty).

    If keep_dir is set and the PDF produces >=1 hit, the PDF is moved to
    keep_dir/<slug>/<year>/<basename>.pdf. Zero-hit PDFs are always deleted.
    """
    slug, year, month, day, url = (row["slug"], row["year"], row["month"],
                                   row["day"], row["url"])
    pdf_path = cache_dir / f"{slug}_{year}_{month}_{day}.pdf"
    in_window, win_weight, win_reason = priority_info

    # Download
    try:
        r = requests.get(url, headers=UA, timeout=120, stream=True)
        if r.status_code != 200:
            return []
        with pdf_path.open("wb") as f:
            for chunk in r.iter_content(8192):
                f.write(chunk)
    except Exception:
        # Remove a partially written file so failed downloads do not
        # accumulate in the cache directory.
        pdf_path.unlink(missing_ok=True)
        return []

    hits = []
    try:
        n_pages = get_page_count(pdf_path)
        for page in range(1, n_pages + 1):
            text = pdftotext_page(pdf_path, page)
            if len(text) < 50:
                continue
            # Search every compiled pattern against this page
            for label, weight, pat in patterns:
                for m in pat.finditer(text):
                    if is_false_positive(text, m.start(), m.end(),
                                         disambiguators, window=200):
                        continue
                    snippet = extract_snippet(text, m.start(), m.end(), 200)
                    final_weight = weight + (win_weight if in_window else 0)
                    hits.append({
                        "slug": slug, "year": year, "month": month,
                        "day": day, "page": page,
                        "keyword": label, "match": m.group(0),
                        "snippet": snippet, "url": url,
                        "weight": final_weight,
                        "priority_window": in_window,
                        "window_reason": win_reason if in_window else None,
                    })
    except Exception as e:
        print(f" [error] {slug} {year}/{month}/{day}: {e}", file=sys.stderr)
    finally:
        try:
            if keep_dir is not None and hits:
                target_dir = keep_dir / slug / str(year)
                target_dir.mkdir(parents=True, exist_ok=True)
                target_path = target_dir / pdf_path.name
                shutil.move(str(pdf_path), str(target_path))
                for h in hits:
                    h["local_pdf"] = str(target_path)
            else:
                pdf_path.unlink()
        except Exception as e:
            print(f" [retain error] {pdf_path}: {e}", file=sys.stderr)
    return hits


def main():
    ap = argparse.ArgumentParser(description=__doc__,
                                 formatter_class=argparse.RawDescriptionHelpFormatter)
    ap.add_argument("--manifest", default=str(HERE / "manifests/ekos_master.csv"))
    ap.add_argument("--keywords", default=str(HERE / "keywords/kirim.yaml"))
    ap.add_argument("--out", default=str(HERE / "hits/kirim.jsonl"))
    ap.add_argument("--priority-only", action="store_true",
                    help="Only process issues inside priority_windows")
    ap.add_argument("--year-from", type=int)
    ap.add_argument("--year-to", type=int)
    ap.add_argument("--slug", help="Restrict to gazette slug(s); comma-separated for multiple")
    ap.add_argument("--workers", type=int, default=4)
    ap.add_argument("--limit", type=int, help="Process at most N issues")
    ap.add_argument("--throttle", type=float, default=0.25,
                    help="Seconds to sleep between job dispatches")
    ap.add_argument("--cache", default=str(DEFAULT_CACHE))
    ap.add_argument("--keep-pdfs", default=None,
                    help="Move hit-producing PDFs into DIR/<slug>/<year>/ "
                         "instead of deleting them. Zero-hit PDFs are still "
                         "deleted (content-driven curation).")
    args = ap.parse_args()
    keep_dir = Path(args.keep_pdfs) if args.keep_pdfs else None
    if keep_dir:
        keep_dir.mkdir(parents=True, exist_ok=True)
        print(f"PDF retention: hit-only → {keep_dir}/<slug>/<year>/")

    # Load keywords (explicit UTF-8: the YAML contains Turkish characters)
    with open(args.keywords, encoding="utf-8") as f:
        keyword_data = yaml.safe_load(f)
    patterns = compile_keyword_set(keyword_data)
    disambiguators = compile_disambiguators(keyword_data)
    windows = keyword_data.get("priority_windows", [])

    print(f"Compiled {len(patterns)} patterns, "
          f"{len(disambiguators)} disambiguators, "
          f"{len(windows)} priority windows")

    cache_dir = Path(args.cache)
    cache_dir.mkdir(parents=True, exist_ok=True)
    out_path = Path(args.out)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # Load + filter manifest (written as UTF-8 by the manifest builder)
    slug_filter = set(s.strip() for s in args.slug.split(",")) if args.slug else None
    rows = []
    with open(args.manifest, newline="", encoding="utf-8") as f:
        for r in csv.DictReader(f):
            if slug_filter and r["slug"] not in slug_filter:
                continue
            try:
                y = int(r["year"])
            except ValueError:
                continue
            if args.year_from and y < args.year_from:
                continue
            if args.year_to and y > args.year_to:
                continue
            in_w, _, _ = in_priority_window(y, r["month"], r["day"], windows)
            r["_in_window"] = in_w
            if args.priority_only and not in_w:
                continue
            rows.append(r)

    # Sort priority-window first
    rows.sort(key=lambda r: (0 if r["_in_window"] else 1,
                             r["year"], r["month"], r["day"]))
    if args.limit:
        rows = rows[:args.limit]

    print(f"Processing {len(rows)} issues "
          f"({sum(1 for r in rows if r['_in_window'])} in priority windows)")
    print(f"Workers: {args.workers}, throttle: {args.throttle}s")
    print(f"Output: {out_path}")
    print()

    start = time.time()
    n_hits = 0
    n_done = 0

    def submit_job(executor, row):
        prio = in_priority_window(int(row["year"]), row["month"],
                                  row["day"], windows)
        return executor.submit(process_pdf, row, patterns, disambiguators,
                               cache_dir, prio, keep_dir)

    with out_path.open("a", encoding="utf-8") as out_f, \
            ThreadPoolExecutor(max_workers=args.workers) as ex:
        # Interleaved submit+collect: keep ~workers*2 jobs in flight,
        # flush hits & log progress as each future completes (crash-safe).
        row_iter = iter(rows)

        def submit_next():
            try:
                r = next(row_iter)
            except StopIteration:
                return None
            if args.throttle > 0:
                time.sleep(args.throttle)
            return submit_job(ex, r)

        in_flight = set()
        for _ in range(args.workers * 2):
            f = submit_next()
            if f is None:
                break
            in_flight.add(f)

        while in_flight:
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                try:
                    hits = fut.result()
                except Exception as e:
                    print(f" [worker error] {e}", file=sys.stderr)
                    hits = []
                for h in hits:
                    out_f.write(json.dumps(h, ensure_ascii=False) + "\n")
                if hits:
                    out_f.flush()
                n_hits += len(hits)
                n_done += 1
                if n_done % 25 == 0:
                    rate = n_done / (time.time() - start)
                    eta = (len(rows) - n_done) / max(rate, 0.01)
                    print(f" [{n_done}/{len(rows)}] hits={n_hits} "
                          f"rate={rate:.1f}/s eta={eta/60:.1f}min",
                          flush=True)
                f = submit_next()
                if f is not None:
                    in_flight.add(f)

    print(f"\n✓ Done in {(time.time()-start)/60:.1f}min: "
          f"{n_done} issues processed, {n_hits} hits → {out_path}")


if __name__ == "__main__":
    main()
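The search script above scores each hit as the keyword weight plus the weight of the priority window the issue falls into. The sketch below isolates that arithmetic. The two windows are copied from the `priority_windows` list in the keyword YAML; the keyword weight of 2 and the helper names `window_bonus` and `final_weight` are assumptions made for illustration only.

```python
from datetime import date, datetime

# Two adjacent windows from priority_windows; both include 1941-06-22.
WINDOWS = [
    {"start": "1939-08-23", "end": "1941-06-22", "weight": 4},
    {"start": "1941-06-22", "end": "1942-12-31", "weight": 5},
]


def window_bonus(issue_date, windows):
    """Weight of the first window that contains issue_date, else 0."""
    for w in windows:
        start = datetime.strptime(w["start"], "%Y-%m-%d").date()
        end = datetime.strptime(w["end"], "%Y-%m-%d").date()
        if start <= issue_date <= end:
            return w.get("weight", 3)
    return 0


def final_weight(keyword_weight, issue_date, windows):
    """Keyword weight plus the priority-window bonus."""
    return keyword_weight + window_bonus(issue_date, windows)


if __name__ == "__main__":
    kw = 2  # hypothetical keyword weight
    for d in (date(1940, 1, 15), date(1941, 6, 22),
              date(1942, 3, 1), date(1950, 1, 1)):
        print(d.isoformat(), final_weight(kw, d, WINDOWS))
```

Because the lookup returns on the first match, an issue dated 1941-06-22 receives the bonus of the earlier window (4) even though the later window (5) also contains that day. If the higher weight is wanted on the shared boundary, the windows would have to be reordered or the end date of the first one moved back by a day.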
@@ -0,0 +1,195 @@
#!/usr/bin/env python3
"""
Render hits/<topic>.jsonl into Obsidian-friendly Markdown reports
under 6-Geopolitics/Russia/03. HISTORICAL CONTEXT/.

Output:
  EKOS-<Topic>-Bulgular.md — master, cross-year overview
  EKOS-<Topic>-<YYYY>.md — per-year detailed list
"""
import argparse
import json
from collections import Counter, defaultdict
from datetime import datetime
from pathlib import Path

VAULT_BASE = Path("/home/salva/Obsidian/6-Geopolitics/Russia/03. HISTORICAL CONTEXT")
HERE = Path(__file__).resolve().parent.parent

MONTH_TR = {
    "ocak": "Ocak", "subat": "Şubat", "mart": "Mart", "nisan": "Nisan",
    "mayis": "Mayıs", "haziran": "Haziran", "temmuz": "Temmuz",
    "agustos": "Ağustos", "eylul": "Eylül", "ekim": "Ekim",
    "kasim": "Kasım", "aralik": "Aralık",
}


def fmt_date(year: str, month: str, day: str) -> str:
    return f"{year}-{month}-{day:>02s}"


def load_hits(path: Path) -> list:
    hits = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                hits.append(json.loads(line))
    return hits


def write_master(path: Path, hits: list, topic: str):
    by_year = defaultdict(list)
    for h in hits:
        by_year[h["year"]].append(h)

    kw_counter = Counter(h["keyword"] for h in hits)
    slug_counter = Counter(h["slug"] for h in hits)

    priority_hits = [h for h in hits if h.get("priority_window")]

    with path.open("w", encoding="utf-8") as f:
        f.write(f"""---
up:: [[Russia - PDF Library Index]]
tag:: [[6.1-Geopolitical Analysis]]
created:: {datetime.now().strftime('%Y-%m-%d')}
topic:: {topic}
total_hits:: {len(hits)}
priority_hits:: {len(priority_hits)}
source:: EKOS - İstanbul Üniversitesi NEK
---

# EKOS — {topic} Bulguları (Master)

> **İstanbul Üniversitesi Nadir Eserler Kütüphanesi gazete arşivi (1928-1942)**
> Toplam **{len(hits)} hit** — bunların **{len(priority_hits)}** tanesi öncelikli zaman pencerelerinde.
> Tarama tarihi: {datetime.now().strftime('%Y-%m-%d')}

## Yıllara Göre Dağılım

| Yıl | Toplam Hit | Öncelik Hit | Yıllık Rapor |
|---|---:|---:|---|
""")
        for year in sorted(by_year):
            year_hits = by_year[year]
            prio_count = sum(1 for h in year_hits if h.get("priority_window"))
            link = f"EKOS-{topic}-{year}"
            f.write(f"| {year} | {len(year_hits)} | {prio_count} | [[{link}]] |\n")

        f.write("\n## En Sık Geçen Anahtar Terimler\n\n")
        for kw, cnt in kw_counter.most_common(25):
            f.write(f"- **{kw}** — {cnt}\n")

        f.write("\n## En Verimli Gazeteler\n\n")
        for slug, cnt in slug_counter.most_common(20):
            f.write(f"- `{slug}` — {cnt}\n")

        # Top weighted hits (most likely smoking guns)
        f.write("\n## En Yüksek Skorlu 30 Hit (öncelikli inceleme)\n\n")
        top = sorted(hits, key=lambda h: -h.get("weight", 0))[:30]
        for h in top:
            date_str = fmt_date(h["year"], h["month"], h["day"])
            f.write(f"### {h['slug']} — {date_str} — sayfa {h['page']}\n\n")
            f.write(f"- **Kelime:** {h['keyword']} (match: `{h['match']}`)\n")
            f.write(f"- **Skor:** {h.get('weight', 0)}")
            if h.get("priority_window"):
                f.write(f" _(öncelikli pencere: {h.get('window_reason', '')})_")
            f.write(f"\n- **Kaynak:** [PDF]({h['url']})\n")
            f.write(f"- **Bağlam:**\n > {h['snippet']}\n\n")

        f.write(f"\n---\n_Otomatik üretildi: ekos-gazete-search skill, {datetime.now().strftime('%Y-%m-%d %H:%M')}_\n")


def write_yearly(path: Path, hits: list, year: str, topic: str, master_stem: str):
    by_date = defaultdict(list)
    for h in hits:
        by_date[fmt_date(h["year"], h["month"], h["day"])].append(h)

    # MONTH_TR is declared in calendar order, so its key order doubles as a
    # month index. Sorting the date strings directly would order the months
    # alphabetically (agustos before ocak) instead of chronologically.
    month_order = {name: i for i, name in enumerate(MONTH_TR, 1)}

    def chrono_key(date_str):
        h = by_date[date_str][0]
        day = int(h["day"]) if str(h["day"]).isdigit() else 99
        return (month_order.get(h["month"], 99), day, date_str)

    kw_counter = Counter(h["keyword"] for h in hits)
    priority_hits = [h for h in hits if h.get("priority_window")]
    window_reasons = set(h.get("window_reason") for h in hits if h.get("priority_window"))
    window_reasons.discard(None)

    with path.open("w", encoding="utf-8") as f:
        f.write(f"""---
up:: [[{master_stem}]]
tag:: [[6.1-Geopolitical Analysis]]
year:: {year}
topic:: {topic}
hit_count:: {len(hits)}
priority_hits:: {len(priority_hits)}
---

# EKOS — {topic} {year}

**Toplam hit:** {len(hits)}{f' — bunların {len(priority_hits)} tanesi öncelikli pencerede' if priority_hits else ''}.

""")
        if window_reasons:
            f.write("**Öncelikli pencereler bu yılda:**\n")
            # sorted() keeps the output stable between runs (sets are unordered)
            for r in sorted(window_reasons):
                f.write(f"- {r}\n")
            f.write("\n")

        f.write("**Kelime dağılımı:** ")
        f.write(", ".join(f"{k} ({v})" for k, v in kw_counter.most_common(10)))
        f.write("\n\n---\n\n")

        for date_str in sorted(by_date, key=chrono_key):
            date_hits = sorted(by_date[date_str], key=lambda h: -h.get("weight", 0))
            month_pretty = MONTH_TR.get(date_hits[0]["month"], date_hits[0]["month"])
            f.write(f"## {date_str} _({month_pretty})_\n\n")
            for h in date_hits:
                f.write(f"### {h['slug']} — sayfa {h['page']} — `{h['keyword']}`\n\n")
                f.write(f"> {h['snippet']}\n\n")
                f.write(f"- Match: `{h['match']}` • Skor: {h.get('weight', 0)}")
                if h.get("priority_window"):
                    f.write(" 🔥")
                f.write(f"\n- [PDF]({h['url']})\n\n")

        f.write(f"\n---\n_ekos-gazete-search, {datetime.now().strftime('%Y-%m-%d %H:%M')}_\n")


def main():
    ap = argparse.ArgumentParser(description=__doc__,
                                 formatter_class=argparse.RawDescriptionHelpFormatter)
    ap.add_argument("--hits", default=str(HERE / "hits/kirim.jsonl"))
    ap.add_argument("--topic", default="Kirim",
                    help="Used in filenames (e.g. Kirim → EKOS-Kirim-1932.md)")
    ap.add_argument("--vault", default=str(VAULT_BASE),
                    help="Output base dir under vault")
    args = ap.parse_args()

    hits_path = Path(args.hits)
    if not hits_path.exists() or hits_path.stat().st_size == 0:
        print(f"[!] No hits at {hits_path} — run 02_search_pdfs.py first")
        return

    hits = load_hits(hits_path)
    if not hits:
        print(f"[!] hits file empty: {hits_path}")
        return

    print(f"Loaded {len(hits)} hits")

    vault_dir = Path(args.vault)
    vault_dir.mkdir(parents=True, exist_ok=True)

    master_path = vault_dir / f"EKOS-{args.topic}-Bulgular.md"
    write_master(master_path, hits, args.topic)
    print(f" ✓ master → {master_path}")

    by_year = defaultdict(list)
    for h in hits:
        by_year[h["year"]].append(h)

    for year, year_hits in sorted(by_year.items()):
        year_path = vault_dir / f"EKOS-{args.topic}-{year}.md"
        write_yearly(year_path, year_hits, year, args.topic, master_path.stem)
        print(f" ✓ {year} ({len(year_hits)} hit) → {year_path.name}")

    print(f"\n✓ Reports rendered under {vault_dir}")


if __name__ == "__main__":
    main()
282
personas/_shared/skills/ekos-gazete-search/scripts/04_export.py
Normal file
@@ -0,0 +1,282 @@
#!/usr/bin/env python3
"""
Export hits/<topic>.jsonl into:
  reports/EKOS-<Topic>.csv — flat, all hits, chronological
  reports/EKOS-<Topic>-Rapor.docx — formatted Word report (summary tables,
      top-30 smoking guns, per-year sections with snippets)

Examples:
  python scripts/04_export.py
  python scripts/04_export.py --topic Kirim --out-dir /home/salva/Documents/EKOS-out
"""
import argparse
import csv
import json
from collections import Counter, defaultdict
from datetime import datetime
from pathlib import Path

from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Pt, RGBColor, Cm

HERE = Path(__file__).resolve().parent.parent

MONTH_TR = {
    "ocak": ("Ocak", 1), "subat": ("Şubat", 2), "şubat": ("Şubat", 2),
    "mart": ("Mart", 3), "nisan": ("Nisan", 4), "mayis": ("Mayıs", 5),
    "mayıs": ("Mayıs", 5), "haziran": ("Haziran", 6), "temmuz": ("Temmuz", 7),
    "agustos": ("Ağustos", 8), "ağustos": ("Ağustos", 8),
    "eylul": ("Eylül", 9), "eylül": ("Eylül", 9), "ekim": ("Ekim", 10),
    "kasim": ("Kasım", 11), "kasım": ("Kasım", 11),
    "aralik": ("Aralık", 12), "aralık": ("Aralık", 12),
    "kanunusani": ("Ocak", 1), "kanunuevvel": ("Aralık", 12),
    "tesrinievvel": ("Ekim", 10), "tesrinisani": ("Kasım", 11),
}


def date_key(h):
    """Sort key: (year, month_num, day)."""
    m = MONTH_TR.get(h["month"].lower(), (h["month"], 99))[1]
    try:
        d = int(h["day"])
    except Exception:
        d = 99
    return (int(h["year"]), m, d)


def load_hits(path):
    hits = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                hits.append(json.loads(line))
    return hits


def write_csv(path, hits):
    """All hits flat. Sort: chronological, then weight DESC within same date."""
    fields = ["year", "month", "day", "slug", "page",
              "keyword", "match", "weight",
              "priority_window", "window_reason", "snippet", "url"]
    sorted_hits = sorted(hits, key=lambda h: (date_key(h), -h.get("weight", 0)))
    with path.open("w", encoding="utf-8", newline="") as f:
        w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        w.writeheader()
        for h in sorted_hits:
            row = dict(h)
            row["snippet"] = (row.get("snippet") or "").replace("\n", " ").strip()
            w.writerow(row)
    return len(sorted_hits)


def _set_cell_bold(cell, bold=True):
    for p in cell.paragraphs:
        for r in p.runs:
            r.bold = bold


def write_docx(path, hits, topic):
    doc = Document()

    # Margins
    for s in doc.sections:
        s.top_margin = Cm(2.0)
        s.bottom_margin = Cm(2.0)
        s.left_margin = Cm(2.0)
        s.right_margin = Cm(2.0)

    # Title
    t = doc.add_heading(f"EKOS — {topic} Bulguları", level=0)
    t.alignment = WD_ALIGN_PARAGRAPH.CENTER

    sub = doc.add_paragraph()
    sub.alignment = WD_ALIGN_PARAGRAPH.CENTER
    r = sub.add_run("İstanbul Üniversitesi NEK Gazete Arşivi (1928-1942)")
    r.italic = True; r.font.size = Pt(11)

    # Stats overview
    by_year = defaultdict(list)
    for h in hits:
        by_year[h["year"]].append(h)
    kw_counter = Counter(h["keyword"] for h in hits)
    slug_counter = Counter(h["slug"] for h in hits)
    priority_hits = [h for h in hits if h.get("priority_window")]

    doc.add_paragraph()
    p = doc.add_paragraph()
    p.add_run("Üretim tarihi: ").bold = True
    p.add_run(datetime.now().strftime("%Y-%m-%d %H:%M"))
    p = doc.add_paragraph()
    p.add_run("Toplam vuruş: ").bold = True
    p.add_run(f"{len(hits)} ")
    p.add_run("Öncelikli pencere içinde: ").bold = True
    p.add_run(f"{len(priority_hits)}")
    p = doc.add_paragraph()
    p.add_run("Yıl aralığı: ").bold = True
    yrs = sorted(by_year)
    p.add_run(f"{yrs[0]} – {yrs[-1]} ")
    p.add_run("Gazete sayısı: ").bold = True
    p.add_run(f"{len(slug_counter)}")

    # Yearly distribution table
    doc.add_heading("Yıllara Göre Dağılım", level=1)
    tbl = doc.add_table(rows=1, cols=3)
    tbl.style = "Light Grid Accent 1"
    hdr = tbl.rows[0].cells
    hdr[0].text = "Yıl"; hdr[1].text = "Toplam"; hdr[2].text = "Öncelikli"
    for c in hdr: _set_cell_bold(c, True)
    for y in yrs:
        row = tbl.add_row().cells
        row[0].text = str(y)
        row[1].text = str(len(by_year[y]))
        row[2].text = str(sum(1 for h in by_year[y] if h.get("priority_window")))

    # Keyword distribution
    doc.add_heading("Anahtar Kelime Dağılımı (top 20)", level=1)
    tbl = doc.add_table(rows=1, cols=2)
    tbl.style = "Light Grid Accent 1"
    hdr = tbl.rows[0].cells
    hdr[0].text = "Anahtar"; hdr[1].text = "Sayı"
    for c in hdr: _set_cell_bold(c, True)
    for kw, n in kw_counter.most_common(20):
        row = tbl.add_row().cells
        row[0].text = kw
        row[1].text = str(n)

    # Slug productivity
    doc.add_heading("En Verimli Gazeteler", level=1)
    tbl = doc.add_table(rows=1, cols=2)
    tbl.style = "Light Grid Accent 1"
    hdr = tbl.rows[0].cells
    hdr[0].text = "Gazete"; hdr[1].text = "Vuruş"
    for c in hdr: _set_cell_bold(c, True)
    for slug, n in slug_counter.most_common(15):
        row = tbl.add_row().cells
        row[0].text = slug
        row[1].text = str(n)

    # Top scored hits
    doc.add_page_break()
    doc.add_heading("En Yüksek Skorlu 30 Vuruş", level=1)
    # Italic is a run-level property; setting it on the paragraph object
    # has no effect, so the note is added as an explicit run.
    note = doc.add_paragraph()
    note.add_run(
        "Skorlama: temel kelime ağırlığı + öncelikli pencere bonusu. "
        "Yüksek skorlu vuruşlar manuel okumada ilk öncelik."
    ).italic = True
    top = sorted(hits, key=lambda h: -h.get("weight", 0))[:30]
    for i, h in enumerate(top, 1):
        m_pretty = MONTH_TR.get(h["month"].lower(), (h["month"], 0))[0]
        head = doc.add_paragraph()
        run = head.add_run(f"{i}. {h['slug']} — {h['year']} {m_pretty} {h['day']} — s. {h['page']}")
        run.bold = True; run.font.size = Pt(11)

        meta = doc.add_paragraph()
        meta.add_run("Anahtar: ").bold = True
        meta.add_run(f"{h['keyword']} ")
        meta.add_run("Eşleşme: ").bold = True
        meta.add_run(f"{h['match']} ")
        meta.add_run("Skor: ").bold = True
        meta.add_run(f"{h.get('weight', 0)}")
        if h.get("priority_window"):
            wr = h.get("window_reason") or ""
            run2 = meta.add_run(f" [öncelikli: {wr[:60]}]")
            run2.italic = True
            run2.font.color.rgb = RGBColor(0xC0, 0x39, 0x2B)

        sn = doc.add_paragraph()
        sn.paragraph_format.left_indent = Cm(0.6)
        sn_run = sn.add_run(h.get("snippet", ""))
        sn_run.italic = True; sn_run.font.size = Pt(10)

        url_p = doc.add_paragraph()
        url_p.paragraph_format.left_indent = Cm(0.6)
        url_run = url_p.add_run(f"PDF: {h['url']}")
        url_run.font.size = Pt(8)
        url_run.font.color.rgb = RGBColor(0x55, 0x55, 0x55)

    # Per-year sections
    for year in yrs:
        doc.add_page_break()
        year_hits = sorted(by_year[year],
                           key=lambda h: (date_key(h), -h.get("weight", 0)))
        prio = sum(1 for h in year_hits if h.get("priority_window"))
        doc.add_heading(f"{year} ({len(year_hits)} vuruş, {prio} öncelikli)",
                        level=1)

        # Quick keyword summary for the year
        yk = Counter(h["keyword"] for h in year_hits)
        s = doc.add_paragraph()
        s.add_run("Anahtar dağılımı: ").bold = True
        s.add_run(", ".join(f"{k}({v})" for k, v in yk.most_common(8)))

        # Group by date
        by_date = defaultdict(list)
        for h in year_hits:
            key = (h["year"], h["month"], h["day"])
            by_date[key].append(h)

        for dk in sorted(by_date, key=lambda k: (
                int(k[0]),
                MONTH_TR.get(k[1].lower(), (k[1], 99))[1],
                int(k[2]) if str(k[2]).isdigit() else 99)):
            y, m, d = dk
            m_pretty = MONTH_TR.get(m.lower(), (m, 0))[0]
            doc.add_heading(f"{y} {m_pretty} {d}", level=3)
            for h in by_date[dk]:
                p = doc.add_paragraph()
                p.add_run(f"{h['slug']} ").bold = True
                p.add_run(f"s.{h['page']} — ")
                kr = p.add_run(h["keyword"])
                kr.bold = True
                if h.get("priority_window"):
                    kr.font.color.rgb = RGBColor(0xC0, 0x39, 0x2B)
                p.add_run(f" (skor {h.get('weight', 0)})")

                sn = doc.add_paragraph()
                sn.paragraph_format.left_indent = Cm(0.5)
                sr = sn.add_run(h.get("snippet", ""))
                sr.italic = True; sr.font.size = Pt(9)

    # Footer
    doc.add_page_break()
    f = doc.add_paragraph()
    f.alignment = WD_ALIGN_PARAGRAPH.CENTER
    fr = f.add_run(f"Otomatik üretildi: ekos-gazete-search skill, "
                   f"{datetime.now().strftime('%Y-%m-%d %H:%M')}")
    fr.italic = True; fr.font.size = Pt(9)

    doc.save(str(path))


def main():
    ap = argparse.ArgumentParser(description=__doc__,
                                 formatter_class=argparse.RawDescriptionHelpFormatter)
    ap.add_argument("--hits", default=str(HERE / "hits/kirim.jsonl"))
    ap.add_argument("--topic", default="Kirim")
    ap.add_argument("--out-dir", default=str(HERE / "reports"))
    args = ap.parse_args()

    hits_path = Path(args.hits)
    if not hits_path.exists() or hits_path.stat().st_size == 0:
        print(f"[!] No hits at {hits_path}")
        return

    hits = load_hits(hits_path)
    if not hits:
        # A file containing only blank lines would otherwise crash
        # write_docx on the empty year range.
        print(f"[!] hits file empty: {hits_path}")
        return
    print(f"Loaded {len(hits)} hits from {hits_path}")

    out_dir = Path(args.out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    csv_path = out_dir / f"EKOS-{args.topic}.csv"
    docx_path = out_dir / f"EKOS-{args.topic}-Rapor.docx"

    n = write_csv(csv_path, hits)
    print(f" ✓ CSV ({n} rows) → {csv_path}")

    write_docx(docx_path, hits, args.topic)
    print(f" ✓ DOCX → {docx_path}")


if __name__ == "__main__":
    main()
173
personas/_shared/skills/ekos-gazete-search/scripts/lib/fuzzy.py
Normal file
@@ -0,0 +1,173 @@
|
||||
"""
|
||||
OCR-tolerant fuzzy regex builder for Turkish text.
|
||||
|
||||
Strategy: 2014-vintage Turkish OCR systematically destroys diacritics.
|
||||
Each character is replaced with a character class that covers all
|
||||
plausible OCR misreadings. See: keywords/kirim.yaml notes column.
|
||||
"""
|
||||
import re
|
||||
import unicodedata
|
||||
|
||||
# Character → tolerant character class mapping.
|
||||
# Keys are lowercase: build_pattern() looks each character up via turkish_lower().
|
||||
DIACRITIC_CLASSES = {
|
||||
# The ı/i/I/İ family — the most damaged
|
||||
'i': r'[1iIıİlj|!]',
|
||||
'ı': r'[1iIıİlj|!]',
|
||||
# Sibilants
|
||||
's': r'[s$ş]',
|
||||
'ş': r'[s$ş~]',
|
||||
# Plosives
|
||||
'c': r'[cç]',
|
||||
'ç': r'[cç]',
|
||||
'g': r'[gğ]',
|
||||
'ğ': r'[gğq]',
|
||||
# Vowels
|
||||
'u': r'(?:[uü]|ii)',
|
||||
'ü': r'(?:[uü]|ii)',
|
||||
'o': r'[oö0]',
|
||||
'ö': r'[oö0]',
|
||||
'a': r'[aâå]',
|
||||
'â': r'[aâå]',
|
||||
'e': r'[eé]',
|
||||
}
|
||||
|
||||
# Non-letter separators in OCR can be space, dash, underscore, tilde, dot.
|
||||
WORD_SEP = r'[\s\-_~.,]+'
|
||||
|
||||
|
||||
def turkish_lower(s: str) -> str:
|
||||
"""Turkish-aware lowercase: İ→i, I→ı."""
|
||||
return s.replace('İ', 'i').replace('I', 'ı').lower()
|
||||
|
||||
|
||||
def build_pattern(word: str) -> str:
|
||||
"""Build OCR-tolerant regex for a single word or phrase."""
|
||||
parts = []
|
||||
for ch in word:
|
||||
lower = turkish_lower(ch)
|
||||
if lower in DIACRITIC_CLASSES:
|
||||
parts.append(DIACRITIC_CLASSES[lower])
|
||||
elif ch == ' ':
|
||||
parts.append(WORD_SEP)
|
||||
elif ch.isalpha():
|
||||
# Plain ASCII letter — case-insensitive
|
||||
parts.append(f'[{ch.lower()}{ch.upper()}]')
|
||||
else:
|
||||
parts.append(re.escape(ch))
|
||||
# Word boundaries: \b doesn't work well with character classes,
|
||||
# so use lookarounds for non-letter context.
|
||||
return r'(?<![\wıİşŞçÇğĞüÜöÖâÂ])' + ''.join(parts) + r'(?![\wıİşŞçÇğĞüÜöÖâÂ])'
|
||||
|
||||
|
||||
def build_pattern_with_suffixes(word: str, suffixes: list = None) -> str:
|
||||
"""Build pattern allowing optional Turkish suffixes."""
|
||||
base = build_pattern(word)
|
||||
# Strip trailing boundary, add suffix group, re-add boundary
|
||||
base_no_end = base[:-len(r'(?![\wıİşŞçÇğĞüÜöÖâÂ])')]
|
||||
if suffixes:
|
||||
suffix_alts = '|'.join(re.escape(s) for s in suffixes)
|
||||
suffix_group = rf'(?:{suffix_alts})?'
|
||||
return base_no_end + suffix_group + r'(?![\wıİşŞçÇğĞüÜöÖâÂ])'
|
||||
return base
|
||||
|
||||
|
||||
def compile_keyword_set(keyword_data: dict) -> list:
|
||||
"""
|
||||
Compile a YAML keyword set into a list of (label, weight, regex) tuples.
|
||||
Higher-weight matches surface first in reports.
|
||||
"""
|
||||
compiled = []
|
||||
# Main keywords
|
||||
for kw in keyword_data.get('keywords', []):
|
||||
canonical = kw['canonical']
|
||||
aliases = kw.get('aliases', [])
|
||||
suffixes = kw.get('suffixes', [])
|
||||
weight = kw.get('weight', 3)
|
||||
for term in [canonical] + aliases:
|
||||
try:
|
||||
pat = build_pattern_with_suffixes(term, suffixes)
|
||||
compiled.append((canonical, weight, re.compile(pat, re.IGNORECASE | re.UNICODE)))
|
||||
except re.error as e:
|
||||
print(f" [warn] regex compile failed for {term!r}: {e}")
|
||||
# Proper nouns (smoking guns)
|
||||
for pn in keyword_data.get('proper_nouns', []):
|
||||
canonical = pn['canonical']
|
||||
aliases = pn.get('aliases', [])
|
||||
weight = pn.get('weight', 5)
|
||||
for term in [canonical] + aliases:
|
||||
try:
|
||||
pat = build_pattern(term)
|
||||
compiled.append((canonical, weight, re.compile(pat, re.IGNORECASE | re.UNICODE)))
|
||||
except re.error as e:
|
||||
print(f" [warn] regex compile failed for {term!r}: {e}")
|
||||
return compiled
|
||||
|
||||
|
||||
def compile_disambiguators(keyword_data: dict) -> list:
|
||||
"""Compile false-positive filter patterns."""
|
||||
return [
|
||||
re.compile(build_pattern(term), re.IGNORECASE | re.UNICODE)
|
||||
for term in keyword_data.get('disambiguators', [])
|
||||
]
|
||||
|
||||
|
||||
def is_false_positive(text: str, match_start: int, match_end: int,
|
||||
disambiguators: list, window: int = 50) -> bool:
|
||||
"""Return True if any disambiguator (e.g., 'Kerim Bey') occurs within `window` chars of the match."""
|
||||
win_start = max(0, match_start - window)
|
||||
win_end = min(len(text), match_end + window)
|
||||
window_text = text[win_start:win_end]
|
||||
for dis_re in disambiguators:
|
||||
if dis_re.search(window_text):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def extract_snippet(text: str, match_start: int, match_end: int,
|
||||
radius: int = 200) -> str:
|
||||
"""Extract a clean ±radius snippet around a match."""
|
||||
s = max(0, match_start - radius)
|
||||
e = min(len(text), match_end + radius)
|
||||
snip = text[s:e]
|
||||
# Collapse whitespace, drop weird control chars
|
||||
snip = re.sub(r'\s+', ' ', snip).strip()
|
||||
snip = ''.join(c for c in snip if c.isprintable() or c in ' \n')
|
||||
return snip
|
||||
|
||||
|
||||
def co_occurrence_score(text: str, term_a: str, term_b: str,
|
||||
compiled_patterns: dict, window: int = 300) -> int:
|
||||
"""
|
||||
Count how many times term_a and term_b appear within `window` chars of each other.
|
||||
Used by report renderer for boost scoring.
|
||||
"""
|
||||
if term_a not in compiled_patterns or term_b not in compiled_patterns:
|
||||
return 0
|
||||
a_positions = [m.start() for m in compiled_patterns[term_a].finditer(text)]
|
||||
b_positions = [m.start() for m in compiled_patterns[term_b].finditer(text)]
|
||||
score = 0
|
||||
for ap in a_positions:
|
||||
for bp in b_positions:
|
||||
if abs(ap - bp) <= window:
|
||||
score += 1
|
||||
return score
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
# Smoke test
|
||||
test_words = ['Kırım', 'Bahçesaray', 'Cafer Seydahmet', 'İsmail Gaspıralı']
|
||||
test_text = """
|
||||
OCR çöplüğü:
|
||||
K1r1m Tatarlari hakkinda bir haber.
|
||||
Bahcesaray'da bir hadise.
|
||||
K~r~m Hanl1g1 tarihi.
|
||||
Cafer Seydamet Bey istanbula geldi.
|
||||
Ismail Gaspirali'nin 1934 anma toplantisi.
|
||||
Kerim Bey ile karistirma — bu yanlis pozitif.
|
||||
"""
|
||||
for w in test_words:
|
||||
pat = build_pattern(w)
|
||||
print(f"\n{w!r} → {pat}")
|
||||
for m in re.finditer(pat, test_text, re.IGNORECASE):
|
||||
print(f" hit: {m.group(0)!r} @ {m.start()}")
|
||||
@@ -0,0 +1,3 @@
|
||||
requests>=2.31
|
||||
PyYAML>=6.0
|
||||
python-docx>=1.0
|
||||
50
personas/_shared/skills/ekos-gazete-search/scripts/run_capped.sh
Executable file
@@ -0,0 +1,50 @@
|
||||
#!/usr/bin/env bash
|
||||
# Run the EKOS PDF searcher inside a transient systemd user-unit with
|
||||
# CPU + memory caps. All extra args are forwarded to 02_search_pdfs.py.
|
||||
#
|
||||
# Profile env vars (override before invocation):
|
||||
# EKOS_CPU_QUOTA default 300% (3 cores)
|
||||
# EKOS_MEM_MAX default 3G
|
||||
# EKOS_UNIT default ekos-search-<timestamp>
|
||||
#
|
||||
# Examples:
|
||||
# bash scripts/run_capped.sh --slug cumhuriyet --priority-only --year-to 1931 --workers 2
|
||||
# EKOS_CPU_QUOTA=500% EKOS_MEM_MAX=4G bash scripts/run_capped.sh --priority-only --workers 4
|
||||
#
|
||||
# Monitor:
|
||||
# systemctl --user status <unit>
|
||||
# journalctl --user -u <unit> -f
|
||||
# systemctl --user stop <unit>
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
|
||||
PY="${HERE}/.venv/bin/python"
|
||||
SCRIPT="${HERE}/scripts/02_search_pdfs.py"
|
||||
|
||||
CPU_QUOTA="${EKOS_CPU_QUOTA:-300%}"
|
||||
MEM_MAX="${EKOS_MEM_MAX:-3G}"
|
||||
UNIT="${EKOS_UNIT:-ekos-search-$(date +%Y%m%d-%H%M%S)}"
|
||||
|
||||
if [[ ! -x "$PY" ]]; then
|
||||
echo "venv not found: $PY" >&2
|
||||
echo "create with: cd $HERE && python3 -m venv .venv && .venv/bin/pip install -r scripts/requirements.txt" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Unit: $UNIT"
|
||||
echo "CPUQuota: $CPU_QUOTA"
|
||||
echo "MemoryMax: $MEM_MAX"
|
||||
echo "Forward: $*"
|
||||
echo
|
||||
|
||||
exec systemd-run --user \
|
||||
--unit="$UNIT" \
|
||||
--working-directory="$HERE" \
|
||||
-p "CPUQuota=$CPU_QUOTA" \
|
||||
-p "MemoryMax=$MEM_MAX" \
|
||||
-p "MemorySwapMax=1G" \
|
||||
-p "Nice=10" \
|
||||
-p "IOWeight=50" \
|
||||
--setenv=PYTHONUNBUFFERED=1 \
|
||||
"$PY" "$SCRIPT" "$@"
|
||||
229
personas/_shared/skills/telegram/SKILL.md
Normal file
@@ -0,0 +1,229 @@
|
||||
---
|
||||
name: telegram
|
||||
description: Use when reading, searching, sending, or managing Telegram messages and folders for the user's personal account. Triggers on "Telegram'a mesaj gönder", "şu kanaldan son mesajları getir", "Telegram'da ara", "okunmamış mesajlar", "Telegram klasörlerini güncelle", "yeni kanalları kategorize et", "fetch telegram dialogs", "telegram inbox", "@username'e şunu yaz", or any direct mention of fetch_all/tg_read/tg_send/tg_search/tg_inbox/apply_folders. Also covers the Telethon-based pipeline at /home/salva/Documents/telegram (auth, session, channels.json, assignments.json).
|
||||
---
|
||||
|
||||
# Telegram Operator (Telethon)
|
||||
|
||||
## Overview
|
||||
|
||||
Read, search, send, manage, and organize the user's Telegram personal account from the command line via Telethon. All scripts share one venv and one `.session` file at `/home/salva/Documents/telegram/`.
|
||||
|
||||
```
|
||||
                    ┌── tg_read.py     (fetch from a chat)
                    ├── tg_send.py     (send text/file, reply, silent)
Telethon client ────┼── tg_search.py   (global / scoped search)
(one .session)      ├── tg_inbox.py    (unread overview, mark-read)
                    └── folder pipeline:
                          fetch_all.py → build_assignments.py → apply_folders.py
|
||||
```
|
||||
|
||||
## Project location
|
||||
|
||||
`/home/salva/Documents/telegram/`
|
||||
|
||||
| File | Role |
|
||||
|---|---|
|
||||
| `api.txt` | api_id / api_hash from my.telegram.org. **Do not commit.** |
|
||||
| `config.py` | Loads creds → `API_ID`, `API_HASH`, `SESSION_NAME` |
|
||||
| `telegram_session.session` | Telethon SQLite session. **Do not delete unless re-login needed.** |
|
||||
| `venv/` | Project venv, activate with `source venv/bin/activate` |
|
||||
| `requirements.txt` | `telethon>=1.43.1` |
|
||||
| `tg_utils.py` | Shared helpers: `resolve_chat`, `fmt_msg`, `confirm`, `parse_date` |
|
||||
| `tg_read.py` | Read messages from a chat |
|
||||
| `tg_send.py` | Send text and/or file (interactive confirm by default) |
|
||||
| `tg_search.py` | Search messages globally or in one chat |
|
||||
| `tg_inbox.py` | Unread overview + mark-as-read (single or bulk) |
|
||||
| `fetch_all.py` | Snapshot all dialogs + 40 messages each → `data/channels.json` |
|
||||
| `build_assignments.py` | Static id→folder map → `data/assignments.json` |
|
||||
| `apply_folders.py` | Push folder layout to Telegram (interactive y/N) |
|
||||
| `categorize.py` | Library helper used if pipeline grows beyond static dict |
|
||||
| `data/` | All JSON outputs (channels, assignments, compact, names.tsv, …) |
|
||||
|
||||
## Setup
|
||||
|
||||
```bash
|
||||
cd /home/salva/Documents/telegram
|
||||
source venv/bin/activate # first time: python3 -m venv venv && venv/bin/pip install -r requirements.txt
|
||||
```
|
||||
|
||||
First run on a new machine: any script will prompt for phone number → login code (delivered in-app or by SMS) → 2FA password (if set), then write `telegram_session.session`. Subsequent runs are silent.
|
||||
|
||||
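If you see `AuthKeyUnregisteredError` after a long absence, the stored session is no longer valid: delete the `.session` file and log in again. `SessionPasswordNeededError` is different: it only means the account has a 2FA (cloud) password, so supply it at the prompt rather than deleting anything.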
|
||||
|
||||
## Common chat references
|
||||
|
||||
Every script accepts the same `chat` argument forms:
|
||||
|
||||
- `"@username"` — public username (channels, bots, users)
|
||||
- `12345` or `-1001234567890` — numeric id (positive = user, negative = group/channel)
|
||||
- `"some name"` — case-insensitive substring of the dialog name; errors out if 0 or >1 matches, listing the candidates
|
||||
- `"me"` / `"self"` — Saved Messages (your own DM-to-self chat)
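
The four argument forms above could be told apart roughly as follows. This is a minimal sketch, not the actual `resolve_chat` from `tg_utils.py`: the names `classify_chat_ref` and `match_by_name` are hypothetical, and the real helper presumably goes on to query Telethon for the entity.

```python
def classify_chat_ref(ref: str) -> tuple[str, object]:
    """Classify a chat argument into one of the four documented forms."""
    text = ref.strip()
    if text.lower() in ("me", "self"):
        return ("saved_messages", "me")
    if text.startswith("@"):
        return ("username", text[1:])
    try:
        # Covers both positive (user) and negative (group/channel) ids.
        return ("numeric_id", int(text))
    except ValueError:
        return ("name_substring", text.casefold())


def match_by_name(needle: str, dialog_names: list[str]) -> str:
    """Return the single dialog whose name contains `needle`, else raise."""
    hits = [n for n in dialog_names if needle.casefold() in n.casefold()]
    if len(hits) != 1:
        # Mirrors the documented behaviour: fail on 0 or >1 matches
        # and list the candidates.
        raise LookupError(f"{len(hits)} matches for {needle!r}: {hits}")
    return hits[0]
```

Failing loudly on ambiguous names keeps a send or mark-read from landing in the wrong chat.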
|
||||
|
||||
## Reading
|
||||
|
||||
```bash
|
||||
python tg_read.py "@durov" # last 20 messages
|
||||
python tg_read.py "Born2beroot" --limit 100
|
||||
python tg_read.py "@channel" --since 2026-04-01 # since a date
|
||||
python tg_read.py "@x" --search "CVE" # filter inside the chat
|
||||
python tg_read.py "@x" --json # machine-readable output
|
||||
python tg_read.py "@x" --mark-read # also clear the unread badge
|
||||
```
|
||||
|
||||
Output format: `id │ YYYY-MM-DD HH:MM │ sender(20) │ text(200)`, one row per message. `--json` switches to a JSON array with `{id,date,sender_id,text,has_media,reply_to}` per item.
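
As an illustration of that row layout, a formatter might look like the sketch below. It is an assumption about how `fmt_msg` could behave, not its real code: `fmt_row` is a hypothetical name, and collapsing newlines into spaces is a guess made so that each message stays on one row.

```python
from datetime import datetime


def fmt_row(msg_id: int, when: datetime, sender: str, text: str) -> str:
    """Render one message as `id │ date │ sender(20) │ text(200)`."""
    one_line = " ".join(text.split())  # collapse newlines and runs of spaces
    return " │ ".join([
        str(msg_id),
        when.strftime("%Y-%m-%d %H:%M"),
        sender[:20].ljust(20),  # truncate, then pad to a fixed 20 columns
        one_line[:200],         # hard cap on the text column
    ])
```

The fixed-width sender column keeps the text column aligned when rows are read in a terminal.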
|
||||
|
||||
## Sending
|
||||
|
||||
```bash
|
||||
python tg_send.py "@user" "Hello" # interactive [y/N]
|
||||
python tg_send.py "me" "note to self" --yes # auto-confirm
|
||||
python tg_send.py "@chan" "Caption" --file report.pdf
|
||||
python tg_send.py "@x" "" --file img.png --caption "ss" # file-only
|
||||
python tg_send.py "@x" "Reply" --reply-to 12345
|
||||
python tg_send.py "@x" "ping" --silent # no notification
|
||||
python tg_send.py "@x" "<b>bold</b>" --parse html
|
||||
```
|
||||
|
||||
**Send safety policy** — by default, `tg_send.py` prints a preview of the destination + payload and asks `Gönder? [y/N]` before transmitting. Pass `--yes` (or `-y`) to skip the prompt for scripted/automated runs. This matches the convention used by `apply_folders.py`.
|
||||
|
||||
Default text parse mode is **markdown**. Use `--parse html` for HTML-style entities (`<b>`, `<i>`, `<a href=…>`), or `--parse none` for plain.
|
||||
|
||||
## Searching
|
||||
|
||||
```bash
|
||||
python tg_search.py "ransomware" # global, last 50 hits
|
||||
python tg_search.py "Putin" --since 2026-04-01 -n 200
|
||||
python tg_search.py "kitap" --chat "E Kitap PDF" # scoped to one chat
|
||||
python tg_search.py "report" --chat me
|
||||
```
|
||||
|
||||
Global search uses Telegram's server-side message index. Each hit is prefixed with the chat's title in `[brackets]`. Scoped search (`--chat`) is faster and avoids the per-chat title resolution lookup.
|
||||
|
||||
## Inbox / unread management
|
||||
|
||||
```bash
|
||||
python tg_inbox.py # ranked by unread count
|
||||
python tg_inbox.py --top 20
|
||||
python tg_inbox.py --include-archived # include archived folder
|
||||
python tg_inbox.py --mark-read "Born2bero" # clear ONE chat
|
||||
python tg_inbox.py --mark-all-read # clear EVERY unread (asks y/N)
|
||||
python tg_inbox.py --mark-all-read --yes # … or skip prompt
|
||||
```
|
||||
|
||||
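The bulk `--mark-all-read` wipes the unread badge state and cannot be undone: Telegram's `messages.markDialogUnread` only sets a manual unread flag on a dialog and does not restore the original unread counts. The script always asks for confirmation unless `--yes` is passed.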
|
||||
|
||||
## Folder pipeline (≈600 dialogs → 9 folders)
|
||||
|
||||
3-stage workflow for organizing dialogs into Telegram client-side folders:
|
||||
|
||||
```bash
|
||||
python fetch_all.py # ~1-3 min, refreshes data/channels.json
|
||||
python build_assignments.py # warns about ⚠ unassigned ids
|
||||
# → if warnings: edit build_assignments.py:A, add the new ids, rerun
|
||||
python apply_folders.py # interactive y/N to push to Telegram
|
||||
```
|
||||
|
||||
### Folder schema (current — titles capped at 12 chars by Telegram)
|
||||
|
||||
| Emoji | Title | Scope |
|
||||
|---|---|---|
|
||||
| 🛡 | `Güvenlik` | Cybersec, hacking, intel feeds, OSINT, ham radio |
|
||||
| ☁ | `Logs & Cloud` | Cloud account dumps, ULP/redline logs, cracked services |
|
||||
| ⚔ | `Rus-Ukrayna` | Russia/Ukraine war channels, both sides + Western trackers |
|
||||
| 🕌 | `Ortadoğu` | Middle East news (Arabic/Persian/Turkish/English) |
|
||||
| 🎖 | `Askeri Jeo` | Turkish military, geopolitics, MGK, defense industry |
|
||||
| 📚 | `E-Kitap` | E-books, audiobooks, manga, KPSS/YKS material |
|
||||
| 🌐 | `Dil & Kurs` | Russian/Swahili/English language groups, Udemy/PacktPub |
|
||||
| 📈 | `Finans` | Borsa İstanbul, trading, stock tips, central bank |
|
||||
| 💬 | `Sosyal` | Twitch, social, hobby groups, anything else |
|
||||
|
||||
Full id→folder map: `build_assignments.py:A` (~260 entries). Edit the dict, **never** edit `data/assignments.json` directly — `build_assignments.py` regenerates it.
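
The regeneration step can be pictured with the sketch below. It assumes only the JSON layout that `apply_folders.py` documents (`folders` plus an `assignments` map keyed by stringified id); `build_payload` and its `known_ids` parameter are hypothetical names, not taken from `build_assignments.py`.

```python
import json


def build_payload(folders: list[dict], id_to_folder: dict[int, str],
                  known_ids: list[int]) -> tuple[dict, list[int]]:
    """Return the assignments.json payload and the ids still unassigned."""
    titles = {f["title"] for f in folders}
    too_long = sorted(t for t in titles if len(t) > 12)
    if too_long:
        raise ValueError(f"folder titles over 12 chars: {too_long}")
    unknown = sorted({t for t in id_to_folder.values() if t not in titles})
    if unknown:
        raise ValueError(f"assignments point at undefined folders: {unknown}")
    payload = {
        "folders": folders,
        # JSON object keys must be strings, hence str(cid).
        "assignments": {str(cid): title for cid, title in id_to_folder.items()},
    }
    unassigned = sorted(set(known_ids) - set(id_to_folder))
    return payload, unassigned


def dump(payload: dict) -> str:
    """Serialize while keeping Turkish letters and emoji readable."""
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Validating titles before writing means a typo in the dict surfaces at build time instead of as a missing folder after the push.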
|
||||
|
||||
### New-channel triage (unassigned id heuristic)
|
||||
|
||||
When `build_assignments.py` reports `⚠ assignment eksik`, read the channel's name and first messages from `data/channels.json`, then assign by these rules (first match wins):
|
||||
|
||||
```
|
||||
HACK / CVE / exploit / SOC / OSINT / red team / siber → Güvenlik
|
||||
cloud-free / ulp / redline / cracked / leaked logs / vbv → Logs & Cloud
|
||||
Ukraine / Russia / Donbas / Kyiv / Москва (war context) → Rus-Ukrayna
|
||||
Arabic-script (ar/fa) news, Israel/Gaza/Syria/Iran → Ortadoğu
|
||||
TSK / SİHA / NATO / geopolitics / military doctrine → Askeri Jeo
|
||||
PDF / kitap / e-book / sesli kitap / manga / KPSS / YKS → E-Kitap
|
||||
Udemy / Coursera / Russian/Swahili/Arabic/French/IELTS → Dil & Kurs
|
||||
borsa / hisse / trading / forex / kripto → Finans
|
||||
twitch / hobby / chat / barahol / banter → Sosyal
|
||||
```
|
||||
|
||||
Edge cases:
|
||||
- Russia/Ukraine **doctrine** (not war news) → `Askeri Jeo`, not `Rus-Ukrayna`.
|
||||
- Stock-tip Udemy channels → `Finans`, not `Dil & Kurs`.
|
||||
- Sesli Kitap / Manga / KPSS folded into `E-Kitap`.
|
||||
|
||||
## Telethon API constraints
|
||||
|
||||
- `DialogFilter.id` 0 and 1 are reserved (All Chats, etc.); `apply_folders.py` skips them.
|
||||
- Folder titles are capped at **12 characters** by Telegram. The number of folders per account is also capped (10 on free accounts, more with Premium); the current schema uses 9.
|
||||
- `iter_dialogs(archived=None)` returns both normal and archived; `archived=False` (default in `tg_inbox.py`) returns only normal.
|
||||
- `iter_messages(entity, search=...)` is server-side full-text; `iter_messages(None, search=...)` is the global search.
|
||||
- Rate limits: don't run `fetch_all.py` more than ~once per hour for accounts with many dialogs (`FloodWaitError`). For sending in tight loops, sleep ≥1s between messages or be ready to handle `FloodWait`.
|
||||
- `client.send_read_acknowledge(entity)` clears unread; no inverse call restores the previous unread count (`messages.markDialogUnread` only toggles a manual unread flag).
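
For the rate-limit bullet above, one way to "be ready to handle `FloodWait`" is a small retry wrapper. The sketch below is generic and makes no Telethon calls: `FloodWait` is a stand-in class for `telethon.errors.FloodWaitError` (which exposes the required wait as `.seconds`), and `call_with_flood_retry` is a hypothetical helper name.

```python
import time


class FloodWait(Exception):
    """Stand-in for telethon.errors.FloodWaitError, which carries `.seconds`."""

    def __init__(self, seconds: int):
        super().__init__(f"wait {seconds}s")
        self.seconds = seconds


def call_with_flood_retry(action, *, retries: int = 3, sleep=time.sleep,
                          exc_type=FloodWait):
    """Call `action()`; on a flood-wait error, sleep as instructed and retry."""
    for attempt in range(retries + 1):
        try:
            return action()
        except exc_type as err:
            if attempt == retries:
                raise  # give up after the configured number of retries
            sleep(err.seconds + 1)  # one extra second of safety margin
```

With real Telethon code the same wrapper would be called with `exc_type=FloodWaitError`; an async variant would use `await asyncio.sleep(...)` instead of `time.sleep`.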
|
||||
|
||||
## Auth & secrets
|
||||
|
||||
- `api.txt` and `telegram_session.session` are **as good as a password**: anyone with both can read all your messages and send as you. Keep them out of git, dotfiles sync, and shared backups.
|
||||
- The MTProto session is bound to the device fingerprint Telethon presents. Telegram → Settings → Devices lists active sessions; revoke "TelegramTUI / linux" entries you don't recognize.
|
||||
- 2FA password (cloud password) is **not** stored in `.session`; you'll be prompted on first login if it's set.
|
||||
|
||||
## When NOT to run
|
||||
|
||||
- `apply_folders.py` overwrites each folder's `include_peers` — manual folder rearrangements in the Telegram client are lost. Always confirm before pushing.
|
||||
- `tg_send.py` and `tg_inbox.py --mark-all-read` have user-visible side effects that cannot be undone, so both default to an interactive confirm. Don't pass `--yes` blindly in a script unless the destination/payload is hard-coded and reviewed.
|
||||
- Don't run `fetch_all.py` more than about once an hour: on large accounts it triggers `FloodWaitError`.
|
||||
|
||||
## Snippets cookbook
|
||||
|
||||
```python
|
||||
# One-off custom run inside the same venv:
import asyncio
from telethon import TelegramClient
from config import API_ID, API_HASH, SESSION_NAME
from tg_utils import resolve_chat, fmt_msg

async def main():
    async with TelegramClient(SESSION_NAME, API_ID, API_HASH) as c:
        e = await resolve_chat(c, "@durov")
        async for m in c.iter_messages(e, limit=5):
            print(fmt_msg(m))

asyncio.run(main())
|
||||
```
|
||||
|
||||
```python
|
||||
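# Live monitoring (event handler):
from telethon import TelegramClient, events
from config import API_ID, API_HASH, SESSION_NAME

client = TelegramClient(SESSION_NAME, API_ID, API_HASH)

@client.on(events.NewMessage(chats=["@channel1", "@channel2"]))
async def handler(event):
    chat = await event.get_chat()  # fetches the chat if it is not cached yet
    print(getattr(chat, "title", None), event.message.text)

client.start()  # reuses the existing .session; prompts only if login is needed
client.run_until_disconnected()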
|
||||
```
|
||||
|
||||
```python
|
||||
# Forward N messages from A to B:
|
||||
msgs = await client.get_messages(src_entity, ids=[101, 102, 103])
|
||||
await client.forward_messages(dst_entity, msgs)
|
||||
```
|
||||
|
||||
```python
|
||||
# Download media from the last 100 messages of a chat into ./media/:
async for msg in client.iter_messages(entity, limit=100):
    if msg.media:
        await msg.download_media(file="media/")
|
||||
```
|
||||
|
||||
## Related skills
|
||||
|
||||
- `obsidian-tasks` — track Telegram-organization items as tasks.
|
||||
- `news-crawler`, `freshrss`, `freshrss-reader` — alternative news ingestion paths; `Askeri Jeo`/`Ortadoğu` Telegram channels overlap with FreshRSS feeds.
|
||||
- `obsidian-linux` — once messages are extracted, can convert into vault notes via `notesmd-cli`.
|
||||
98
personas/_shared/skills/telegram/scripts/apply_folders.py
Normal file
@@ -0,0 +1,98 @@
|
||||
"""
|
||||
STEP 3: create/update the folders listed in data/assignments.json in Telegram.
|
||||
|
||||
assignments.json format:
|
||||
{
|
||||
"folders": [{"title": "...", "emoticon": "🛡"}, ...],
|
||||
"assignments": {"<channel_id>": "FolderTitle", ...}
|
||||
}
|
||||
"""
|
||||
import asyncio
|
||||
import json
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
|
||||
from telethon import TelegramClient
|
||||
from telethon.tl.functions.messages import (
|
||||
GetDialogFiltersRequest,
|
||||
UpdateDialogFilterRequest,
|
||||
)
|
||||
from telethon.tl.types import DialogFilter, TextWithEntities
|
||||
|
||||
from config import API_HASH, API_ID, SESSION_NAME
|
||||
|
||||
DATA_FILE = Path(__file__).parent / "data" / "assignments.json"
|
||||
|
||||
|
||||
def _title_text(f) -> str:
|
||||
t = getattr(f, "title", None)
|
||||
if t is None:
|
||||
return ""
|
||||
return t.text if hasattr(t, "text") else str(t)
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
cfg = json.loads(DATA_FILE.read_text(encoding="utf-8"))
|
||||
folders_meta = cfg["folders"] # ordered, with emoji
|
||||
assignments: dict[str, str] = cfg["assignments"] # "id" -> title
|
||||
|
||||
buckets: dict[str, list[int]] = defaultdict(list)
|
||||
for sid, title in assignments.items():
|
||||
buckets[title].append(int(sid))
|
||||
|
||||
print("Önizleme:")
|
||||
for f in folders_meta:
|
||||
n = len(buckets.get(f["title"], []))
|
||||
print(f" {f['emoticon']} {f['title']:<22} {n:>3} sohbet")
|
||||
|
||||
if input("\nTelegram'a uygula? [y/N]: ").strip().lower() != "y":
|
||||
print("iptal.")
|
||||
return
|
||||
|
||||
async with TelegramClient(SESSION_NAME, API_ID, API_HASH) as client:
|
||||
resp = await client(GetDialogFiltersRequest())
|
||||
existing = resp.filters if hasattr(resp, "filters") else resp
|
||||
by_title: dict[str, DialogFilter] = {}
|
||||
used_ids: set[int] = {0, 1}
|
||||
for f in existing:
|
||||
if isinstance(f, DialogFilter):
|
||||
by_title[_title_text(f)] = f
|
||||
used_ids.add(f.id)
|
||||
|
||||
next_id = max(used_ids) + 1
|
||||
|
||||
for fmeta in folders_meta:
|
||||
title = fmeta["title"]
|
||||
ids = buckets.get(title, [])
|
||||
include_peers = []
|
||||
for cid in ids:
|
||||
try:
|
||||
include_peers.append(await client.get_input_entity(cid))
|
||||
except Exception as e:
|
||||
print(f" ! {cid} eklenemedi: {e}")
|
||||
|
||||
if title in by_title:
|
||||
fid = by_title[title].id
|
||||
action = "güncellendi"
|
||||
else:
|
||||
fid = next_id
|
||||
next_id += 1
|
||||
action = "oluşturuldu"
|
||||
|
||||
df = DialogFilter(
|
||||
id=fid,
|
||||
title=TextWithEntities(text=title, entities=[]),
|
||||
pinned_peers=[],
|
||||
include_peers=include_peers,
|
||||
exclude_peers=[],
|
||||
contacts=False, non_contacts=False, groups=False,
|
||||
broadcasts=False, bots=False,
|
||||
exclude_muted=False, exclude_read=False, exclude_archived=False,
|
||||
emoticon=fmeta.get("emoticon"),
|
||||
)
|
||||
await client(UpdateDialogFilterRequest(id=fid, filter=df))
|
||||
print(f"✓ {fmeta['emoticon']} {title} ({len(include_peers)}) — {action}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
296
personas/_shared/skills/telegram/scripts/build_assignments.py
Normal file
@@ -0,0 +1,296 @@
|
||||
"""Claude's manual categorization → data/assignments.json."""
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
# Telegram folder titles are at most 12 characters.
|
||||
FOLDERS = [
|
||||
{"title": "Güvenlik", "emoticon": "🛡"},
|
||||
{"title": "Logs & Cloud", "emoticon": "☁"},
|
||||
{"title": "Rus-Ukrayna", "emoticon": "⚔"},
|
||||
{"title": "Ortadoğu", "emoticon": "🕌"},
|
||||
{"title": "Askeri Jeo", "emoticon": "🎖"},
|
||||
{"title": "E-Kitap", "emoticon": "📚"}, # audiobooks, manga + KPSS also live here
|
||||
{"title": "Dil & Kurs", "emoticon": "🌐"},
|
||||
{"title": "Finans", "emoticon": "📈"}, # new: stock market/trading
|
||||
{"title": "Sosyal", "emoticon": "💬"}, # remaining social/entertainment
|
||||
]
|
||||
|
||||
# id → folder title
|
||||
A = {
|
||||
# --- Cyber Security ---
|
||||
-1003772746107: "Güvenlik", # Born2beroot
|
||||
-1001182095274: "Güvenlik", # Cyber Threat Intelligence Feeds
|
||||
-1001601457644: "Güvenlik", # Linux Türkiye Topluluğu
|
||||
-5245874036: "Güvenlik", # APT10
|
||||
-1001424015690: "Güvenlik", # Siber Kulüpler Birliği
|
||||
-1002044403490: "Güvenlik", # Siber Güvenlik Turkey
|
||||
-1001433765532: "Güvenlik", # Türkiye Amatör Telsiz
|
||||
-1001448773154: "Güvenlik", # tinyGS Community
|
||||
-1001248961775: "Güvenlik", # Geek Hacker
|
||||
-1001820205147: "Güvenlik", # PSD
|
||||
-1001486620605: "Güvenlik", # OpenStreetMap Türkiye
|
||||
-1001224374951: "Güvenlik", # Siberdinc
|
||||
-1001175709038: "Güvenlik", # Özgür Yazılım Derneği
|
||||
-1001102366261: "Güvenlik", # Dark Web Intelligence
|
||||
-1001705864902: "Güvenlik", # HackCodeRepeat
|
||||
-4895889925: "Güvenlik", # Born2beroot (small)
|
||||
-1001560793071: "Güvenlik", # ForenSec
|
||||
-1001099338447: "Güvenlik", # burpsuite (unofficial)
|
||||
-1002019701877: "Güvenlik", # sc_sibermagazin
|
||||
-1001968311017: "Güvenlik", # CVE
|
||||
-4762809106: "Güvenlik", # RaCONF'25
|
||||
-4960718009: "Güvenlik", # OpZ
|
||||
-1001425186624: "Güvenlik", # Zer0Day Lab
|
||||
-1001369540037: "Güvenlik", # inj3ct0r exploit db
|
||||
-1002601559408: "Güvenlik", # Garuda Error System
|
||||
-1002389372004: "Güvenlik", # AnonSec (hacker crew)
|
||||
|
||||
# --- Logs & Cloud ---
|
||||
-1002696769378: "Logs & Cloud", # Valide Cloud Free
|
||||
-1001628710143: "Logs & Cloud", # Omega Cloud
|
||||
-1001921972180: "Logs & Cloud", # Burn Cloud
|
||||
-1001939548708: "Logs & Cloud", # Trident Cloud
|
||||
-1001602298018: "Logs & Cloud", # Free xbox game pass
|
||||
-1002231096661: "Logs & Cloud", # Vpesports Xbox
|
||||
-1002592627432: "Logs & Cloud", # Cvv190 Cloud
|
||||
-1002107853176: "Logs & Cloud", # Plutonium logs
|
||||
-1002047552897: "Logs & Cloud", # Бесплатный лицензионный
|
||||
-1002575521311: "Logs & Cloud", # Darknes-Cloud
|
||||
-1002355411584: "Logs & Cloud", # Roves Cloud
|
||||
-1002025418650: "Logs & Cloud", # Valide Cloud FREE
|
||||
-1002849195507: "Logs & Cloud", # azef cloud
|
||||
-1001440229722: "Logs & Cloud", # Freedom F0x
|
||||
-1002294768789: "Logs & Cloud", # D49d3k ULP-Cloud
|
||||
-1002415889954: "Logs & Cloud", # scale invite
|
||||
-1001773319933: "Logs & Cloud", # CRYPTOLOGS REDLINE
|
||||
-1001578557816: "Logs & Cloud", # Link Arşivleri
|
||||
-1001672949739: "Logs & Cloud", # BerserkLogs
|
||||
|
||||
# --- Rus-Ukrayna ---
|
||||
-1001668977160: "Rus-Ukrayna", # Rybar in English
|
||||
-1001326223284: "Rus-Ukrayna", # Рыбарь
|
||||
-1001082968817: "Rus-Ukrayna", # Минобороны России
|
||||
-1001513431778: "Rus-Ukrayna", # Два майора
|
||||
-1001475819126: "Rus-Ukrayna", # Роскосмос
|
||||
-1001220606936: "Rus-Ukrayna", # STERNENKO
|
||||
-1001783035076: "Rus-Ukrayna", # TrackANaziMerc
|
||||
-1001003313758: "Rus-Ukrayna", # Новости Москвы
|
||||
-1001654562332: "Rus-Ukrayna", # TASS
|
||||
-1001386375324: "Rus-Ukrayna", # МВС України
|
||||
-1001173684180: "Rus-Ukrayna", # ЧП / Крым
|
||||
-1001310984791: "Rus-Ukrayna", # Intel Slava
|
||||
-1001747148099: "Rus-Ukrayna", # Судоплатов
|
||||
-1002121256650: "Rus-Ukrayna", # Угруповання об'єднаних сил
|
||||
-1001583313036: "Rus-Ukrayna", # АРХАНГЕЛ СПЕЦНАЗА
|
||||
-1001352726486: "Rus-Ukrayna", # INSIDER UA
|
||||
-1001669110938: "Rus-Ukrayna", # UNITED24Media
|
||||
-1001350274993: "Rus-Ukrayna", # Tim Kirby Russia Hardcore
|
||||
-1001463721328: "Rus-Ukrayna", # Zelenskiy Official
|
||||
-1001117303064: "Rus-Ukrayna", # Россия в глобальной политике
|
||||
-1001509172593: "Rus-Ukrayna", # monitorwar
|
||||
-1001222633586: "Rus-Ukrayna", # FEDOROV
|
||||
-1001469021333: "Rus-Ukrayna", # DeepState
|
||||
-1001616052141: "Rus-Ukrayna", # Проект «Хочу жить»
|
||||
-1001900958834: "Rus-Ukrayna", # Ігор Клименко МВС
|
||||
-1001617325371: "Rus-Ukrayna", # Десантно-штурмові війська ЗСУ
|
||||
-1002490955621: "Rus-Ukrayna", # DIPLOMATIE RUSSE
|
||||
-1001385909762: "Rus-Ukrayna", # Артем Дмитрук
|
||||
-1001764041965: "Rus-Ukrayna", # Kremlin News EN
|
||||
-1001790907266: "Rus-Ukrayna", # Кремль Новости RU
|
||||
-1003222724492: "Rus-Ukrayna", # Ionfall
|
||||
-1001936622736: "Rus-Ukrayna", # ЖАХ З НЕБЕС 123
|
||||
-1002029042694: "Rus-Ukrayna", # 123 омсбр
|
||||
-1002051535105: "Rus-Ukrayna", # 114 Бригада
|
||||
|
||||
# --- Ortadoğu ---
|
||||
-1002062736232: "Ortadoğu", # نايا - NAYA
|
||||
-1002059959435: "Ortadoğu", # UAE MoD (multilingual AR)
|
||||
-1001272529767: "Ortadoğu", # Middle East News
|
||||
-1001822461311: "Ortadoğu", # JHArnous
|
||||
-1002263475135: "Ortadoğu", # Syrian FM
|
||||
-1001226363458: "Ortadoğu", # Stay Free
|
||||
-1001048133085: "Ortadoğu", # تَأكّدْ
|
||||
-1001081687249: "Ortadoğu", # مركز الزيتونة AR
|
||||
-1001147346052: "Ortadoğu", # Al-Zaytouna EN
|
||||
-1002142228056: "Ortadoğu", # Elly_bar Israel-Hamas
|
||||
-1001797479924: "Ortadoğu", # بيان نيوز
|
||||
-1001180533415: "Ortadoğu", # Orient - أورينت
|
||||
-1001463836083: "Ortadoğu", # Suriye Milli Ordusu
|
||||
-1002280669663: "Ortadoğu", # خیابون انقلاب
|
||||
-1002450267230: "Ortadoğu", # خیابون انقلاب (dup)
|
||||
|
||||
# --- Military & Geopolitics ---
|
||||
-1001173129471: "Askeri Jeo", # AZERTAC
|
||||
-1002143761332: "Askeri Jeo", # Askeri İstihbarat Sohbet
|
||||
-1001508782705: "Askeri Jeo", # 3. Dünya Savaşı
|
||||
-1001802903419: "Askeri Jeo", # Askeri İstihbarat TR
|
||||
-1001220118870: "Askeri Jeo", # Enformasyon
|
||||
-1001251299061: "Askeri Jeo", # SouthFront
|
||||
-1001699619673: "Askeri Jeo", # The Grayzone
|
||||
-1001689501969: "Askeri Jeo", # Fokus+
|
||||
-1001734228215: "Askeri Jeo", # People's Daily China
|
||||
-1001810182217: "Askeri Jeo", # Rerum Novarum
|
||||
-1002642181270: "Askeri Jeo", # Gallipoli General
|
||||
-1001834311682: "Askeri Jeo", # SOFTAÇAM
|
||||
-1001857092414: "Askeri Jeo", # FahrettinAltay_
|
||||
-1002334106447: "Askeri Jeo", # Source News
|
||||
-1001601338144: "Askeri Jeo", # ASKERİ HARP
|
||||
-1002388640996: "Askeri Jeo", # Military Vibe
|
||||
-1001055365200: "Askeri Jeo", # Nairobi News
|
||||
-1001127820109: "Askeri Jeo", # Bellingcat
|
||||
-990795574: "Askeri Jeo", # Milli Güvenlik Kurulu
|
||||
-1001381692248: "Askeri Jeo", # Rusya Ankara Büyükelçiliği
|
||||
|
||||
# --- E-Kitap ---
|
||||
-1001968002316: "E-Kitap", # E Kütüphanem
|
||||
-1001295770478: "E-Kitap", # Kitap Turşusu Premium
|
||||
-1001273763604: "E-Kitap", # Kitap Evreni
|
||||
-1001176839029: "E-Kitap", # e-kitap yardımlaşma
|
||||
-1003179138041: "E-Kitap", # Kitap Botu PDF
|
||||
-1001948357383: "E-Kitap", # PDF E Kitap İstek
|
||||
-1001267622915: "E-Kitap", # E Kitap Grup
|
||||
-1003339908160: "E-Kitap", # Kitap Arama Grubu
|
||||
-1001884485811: "E-Kitap", # Kitaplık Rafı
|
||||
-1001219338945: "E-Kitap", # e-Babil Kütüphanesi
|
||||
-1002761890261: "E-Kitap", # E kitap Roman PDF
|
||||
-1001379065337: "E-Kitap", # Dijital Kitap
|
||||
-1001436274859: "E-Kitap", # E-Kitap Oku
|
||||
-1002231474242: "E-Kitap", # E - Kitap PDF
|
||||
-1001837236620: "E-Kitap", # E-Kitap Paylaşım Sohbet
|
||||
-1001896451121: "E-Kitap", # Kitap Modu
|
||||
-1002123805391: "E-Kitap", # Kitap PDF Arşivi Roman Hikaye
|
||||
-1001651874667: "E-Kitap", # E Kitap PDF
|
||||
-1001869548408: "E-Kitap", # Aranan Kitapçık duyuru
|
||||
-1001741842267: "E-Kitap", # PDF Kitap Evreni
|
||||
-1002844665098: "E-Kitap", # PDF KİTAP
|
||||
-1001379762150: "E-Kitap", # Atatürk Pdf Kitap
|
||||
-1001380972711: "E-Kitap", # BUNDLE Kitap epub pdf
|
||||
-1002677555843: "E-Kitap", # Büyük Kitap Arşivi
|
||||
-1002084828902: "E-Kitap", # Kitapçı PDF Arşivi
|
||||
-1002502833110: "E-Kitap", # PDF KİTAP ROMAN HİKAYE
|
||||
-1002969079664: "E-Kitap", # PDF KİTAP ARŞİV
|
||||
-1003491537567: "E-Kitap", # YATIRIM KİTAPLARI
|
||||
-1001625595378: "E-Kitap", # Kütübhâne-i Tevârîh
|
||||
-1001616159980: "E-Kitap", # Books
|
||||
-1001916886683: "E-Kitap", # PDF Kitap İndir
|
||||
-1002066019978: "E-Kitap", # PDF Kitap Yurdu
|
||||
-1002233958112: "E-Kitap", # PDF Kitaplar pdfstok
|
||||
-1002739085389: "E-Kitap", # Telegram Kitap Grupları
|
||||
-1003450748883: "E-Kitap", # KÜTÜPHANE
|
||||
|
||||
# --- Sesli & Manga ---
|
||||
-1001651817526: "E-Kitap" , # RiF Новеллы, ранобэ и фф
|
||||
-1001559096136: "E-Kitap" , # Sesli Kitap
|
||||
-1002267174397: "E-Kitap" , # Aaron Arşiv
|
||||
-1001851524017: "E-Kitap" , # Hentai TV
|
||||
-1003037921710: "E-Kitap" , # Sesli Kitap Storytel
|
||||
-1003026138059: "E-Kitap" , # Sesli Kitap Dinlio
|
||||
-1003106544769: "E-Kitap" , # SESLİ KİTAP EDEBİYAT
|
||||
-1003483217842: "E-Kitap" , # MANGA KİTAPLARI
|
||||
-1002269816836: "E-Kitap" , # Anime Maniaxx
|
||||
-1003179794694: "E-Kitap" , # Dergi PDF Arşivi
|
||||
-1001519763115: "E-Kitap" , # Sesli Kitap Dinle
|
||||
-1003417275151: "E-Kitap" , # ÇİZGİ ROMAN KİTAPLARI
|
||||
|
||||
# --- KPSS & YKS ---
|
||||
-1002788272998: "E-Kitap" , # AGS KPSS PDF
|
||||
-1002967115062: "E-Kitap" , # YDS YÖKDİL
|
||||
-1002335523660: "E-Kitap" , # KPSS YKS KİTAP PDF
|
||||
-1002164684267: "E-Kitap" , # Yks PDF AYT TYT
|
||||
-1003029282639: "E-Kitap" , # KİTAP PDF YKS KPSS
|
||||
-1003332920930: "E-Kitap" , # SINAV KAYNAKLARI
|
||||
|
||||
# --- Dil & Kurs ---
|
||||
-1001279165634: "Dil & Kurs", # Udemy Courses Free
|
||||
-1001498152897: "Dil & Kurs", # Eduonix Courses Free
|
||||
-1001005463014: "Dil & Kurs", # PacktPub Free Learning
|
||||
-1001044241441: "Dil & Kurs", # Books Mania (grammar)
|
||||
-1002973548671: "Dil & Kurs", # RUSÇA ÖĞREN SOHBET
|
||||
-1001541869122: "Dil & Kurs", # I speak russian
|
||||
-1001205656183: "Dil & Kurs", # Russian Microlearning
|
||||
-1001262177780: "Dil & Kurs", # Russian With Max
|
||||
-1002374924223: "Dil & Kurs", # LLama Russian Study
|
||||
-1001475363663: "Dil & Kurs", # LEARN SWAHILI
|
||||
-1001654101128: "Dil & Kurs", # Russian for lunch
|
||||
-1001912229645: "Dil & Kurs", # Russian home
|
||||
-1001933331449: "Dil & Kurs", # Study Russian
|
||||
-1002647054427: "Dil & Kurs", # Tutorial for new joiners
|
||||
-1003023929968: "Dil & Kurs", # RUSÇA ÖĞRENİYORUM 2025
|
||||
-1001159423770: "Dil & Kurs", # English Books Magazines Novels
|
||||
-4509421355: "Dil & Kurs", # Russian LLama
|
||||
-1001612802963: "Dil & Kurs", # Vitabu vya Kiislamu (Swahili)
|
||||
-1001807530830: "Dil & Kurs", # Ankara Rus Evi
|
||||
|
||||
# --- Sosyal & Diğer ---
|
||||
-1001379307100: "Sosyal" , # Ламповая беседка
|
||||
-1001887551302: "Sosyal" , # ТРОЕТОЧИЕ
|
||||
-1001492338580: "Sosyal" , # Sirius poets
|
||||
-1001751338081: "Sosyal" , # Geometric Telegramssion
|
||||
-1001760743689: "Sosyal" , # Квартал красных фонарей
|
||||
-1001714372021: "Sosyal" , # Kaktüs v2.0
|
||||
-1002464236122: "Sosyal" , # Malvinkin Twitch
|
||||
-1001865528673: "Sosyal" , # Fiftnmls
|
||||
-1002466936546: "Finans" , # İnfo Yatırım Hisse
|
||||
-1001961199646: "Sosyal" , # аничух (twitch)
|
||||
-1003321701261: "Finans" , # ADRENALİN TRADE
|
||||
-1001613153861: "Sosyal" , # FOŞİX ERLİK
|
||||
-1001495437712: "Sosyal" , # Erlik Video Deposu
|
||||
-1001363595671: "Finans" , # Advo
|
||||
-1001874359773: "Finans" , # Udemy Türkçe (stock tips)
|
||||
-1001591294939: "Finans" , # Hazine-i BORSA
|
||||
-4789484210: "Finans" , # Trade
|
||||
-1002517056894: "Sosyal" , # Барахолка Москва
|
||||
-1003222134628: "Sosyal" , # GAME CHILL
|
||||
-1001476005114: "Finans" , # cBank
|
||||
-901188134: "Sosyal" , # CHP GENÇLİK
|
||||
-4543354861: "Sosyal" , # Atatürkçüler Birliği
|
||||
-693237968: "Sosyal" , # İzmir Kavram
|
||||
-525645675: "Sosyal" , # GB - Jeoloji
|
||||
-567627579: "Sosyal" , # GB - Psikoloji
|
||||
-562687596: "Sosyal" , # GB - tarih
|
||||
-537589148: "Sosyal" , # GB - tıp
|
||||
-541262586: "Sosyal" , # GB - kimya
|
||||
-516377683: "Sosyal" , # GB - biyoloji
|
||||
-500211559: "Sosyal" , # GB - mühendislik
|
||||
}
|
||||
|
||||
|
||||
def main() -> None:
|
||||
here = Path(__file__).parent
|
||||
channels = json.loads((here / "data" / "channels.json").read_text(encoding="utf-8"))
|
||||
|
||||
channel_ids = {c["id"] for c in channels}
|
||||
assigned_ids = set(A.keys())
|
||||
|
||||
missing = channel_ids - assigned_ids
|
||||
extra = assigned_ids - channel_ids
|
||||
if missing:
|
||||
print("⚠ missing assignments:")
|
||||
for mid in missing:
|
||||
name = next((c["name"] for c in channels if c["id"] == mid), "?")
|
||||
print(f" {mid} {name!r}")
|
||||
if extra:
|
||||
print("⚠ extra IDs in assignments:", extra)
|
||||
|
||||
assignments_str = {str(k): v for k, v in A.items()}
|
||||
counts: dict[str, int] = {}
|
||||
for v in A.values():
|
||||
counts[v] = counts.get(v, 0) + 1
|
||||
|
||||
out = here / "data" / "assignments.json"
|
||||
out.write_text(
|
||||
json.dumps(
|
||||
{"folders": FOLDERS, "assignments": assignments_str},
|
||||
ensure_ascii=False, indent=2,
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
print(f"\n✓ {len(A)} assignments → {out}")
|
||||
for f in FOLDERS:
|
||||
n = counts.get(f["title"], 0)
|
||||
print(f" {f['emoticon']} {f['title']:<22} {n:>3}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
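The consistency check in `main()` above comes down to two set differences plus a per-folder tally. A minimal standalone sketch of that logic follows; `check_coverage` and `folder_counts` are hypothetical helper names, not part of the commit, and the id `-1` in the sample is made up to stand in for an unassigned chat.

```python
from collections import Counter


def check_coverage(channel_ids, assignments):
    """Return (missing, extra) ids between fetched channels and the assignment map."""
    fetched = set(channel_ids)
    assigned = set(assignments)
    # missing: fetched but never assigned; extra: assigned but no longer fetched
    return fetched - assigned, assigned - fetched


def folder_counts(assignments):
    """Count how many chats land in each folder title."""
    return Counter(assignments.values())


if __name__ == "__main__":
    # Two ids taken from the table above, plus the made-up id -1.
    sample = {-1001127820109: "Askeri Jeo", -1001968002316: "E-Kitap"}
    missing, extra = check_coverage([-1001127820109, -1], sample)
    print("missing:", missing)
    print("extra:", extra)
    print(folder_counts(sample))
```

`Counter` here replaces the manual `counts.get(v, 0) + 1` loop; both should give the same tallies.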
||||
35
personas/_shared/skills/telegram/scripts/categorize.py
Normal file
@@ -0,0 +1,35 @@
|
||||
"""
|
||||
STEP 2: Reads the classification result (ID → folder) from data/assignments.json.
|
||||
|
||||
assignments.json format:
|
||||
{
"folders": ["Folder1", "Folder2", ...], # exactly 10
"assignments": { "<channel_id>": "Folder1", ... }
|
||||
}
|
||||
|
||||
Claude (me) produces this file by analyzing data/channels.json.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
_ASSIGN_FILE = Path(__file__).parent / "data" / "assignments.json"
|
||||
_cache: dict | None = None
|
||||
|
||||
|
||||
def _load() -> dict:
|
||||
global _cache
|
||||
if _cache is None:
|
||||
if not _ASSIGN_FILE.exists():
|
||||
raise FileNotFoundError(
f"{_ASSIGN_FILE} does not exist. Generate data/channels.json first, "
"then Claude will write assignments.json."
|
||||
)
|
||||
_cache = json.loads(_ASSIGN_FILE.read_text(encoding="utf-8"))
|
||||
return _cache
|
||||
|
||||
|
||||
def categorize(channel: dict) -> str | None:
|
||||
data = _load()
|
||||
return data["assignments"].get(str(channel["id"]))
|
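`categorize()` above stringifies the channel id before the lookup because JSON object keys are always strings, even though Telegram ids are integers. A small self-contained sketch of that round trip follows; `lookup_folder` is a hypothetical name, and the sample document follows the format in the docstring with a single folder.

```python
from __future__ import annotations

import json


def lookup_folder(assignments_json: str, channel_id: int) -> str | None:
    """Return the folder title for channel_id, or None if it is unassigned."""
    data = json.loads(assignments_json)
    # Integer ids were written as string keys, so convert before the lookup.
    return data["assignments"].get(str(channel_id))


if __name__ == "__main__":
    doc = json.dumps({
        "folders": ["Askeri Jeo"],
        "assignments": {"-1001127820109": "Askeri Jeo"},
    })
    print(lookup_folder(doc, -1001127820109))
    print(lookup_folder(doc, 42))
```

Returning `None` for unknown ids leaves the decision about unassigned chats to the caller.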
||||
9
personas/_shared/skills/telegram/scripts/config.py
Normal file
@@ -0,0 +1,9 @@
|
||||
import re
|
||||
from pathlib import Path
|
||||
|
||||
_API_TXT = Path(__file__).parent / "api.txt"
|
||||
_text = _API_TXT.read_text(encoding="utf-8")
|
||||
|
||||
API_ID = int(re.search(r"api_id:\s*\n?\s*(\d+)", _text).group(1))
|
||||
API_HASH = re.search(r"api_hash:\s*\n?\s*([a-f0-9]+)", _text).group(1)
|
||||
SESSION_NAME = str(Path(__file__).parent / "telegram_session")
|
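`config.py` above assumes `api.txt` always contains both fields. If either is missing, `re.search(...)` returns `None` and `.group(1)` raises an `AttributeError` at import time. A minimal sketch of a stricter parser follows, assuming the same `api_id:` / `api_hash:` layout; `parse_api_txt` is a hypothetical helper, not part of the commit, and the sample values are made up.

```python
import re


def parse_api_txt(text: str) -> tuple[int, str]:
    """Extract (api_id, api_hash) from api.txt-style text or raise ValueError."""
    # \s* also matches newlines, so the value may sit on the following line.
    id_match = re.search(r"api_id:\s*(\d+)", text)
    hash_match = re.search(r"api_hash:\s*([a-f0-9]+)", text)
    if id_match is None:
        raise ValueError("api_id not found in api.txt")
    if hash_match is None:
        raise ValueError("api_hash not found in api.txt")
    return int(id_match.group(1)), hash_match.group(1)


if __name__ == "__main__":
    sample = "api_id:\n  12345\napi_hash: 0123456789abcdef\n"
    print(parse_api_txt(sample))
```

A named `ValueError` points at the missing field, where the bare `AttributeError` does not.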
||||
63
personas/_shared/skills/telegram/scripts/fetch_all.py
Normal file
@@ -0,0 +1,63 @@
|
||||
"""
|
||||
STEP 1: Fetch every group/channel plus its latest messages and save them to data/channels.json.
|
||||
|
||||
Archived chats are included (iter_dialogs(archived=None) returns both normal and archived dialogs).
|
||||
"""
|
||||
import asyncio
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
from telethon import TelegramClient
|
||||
|
||||
from config import API_ID, API_HASH, SESSION_NAME
|
||||
|
||||
DATA_DIR = Path(__file__).parent / "data"
|
||||
OUTPUT = DATA_DIR / "channels.json"
|
||||
|
||||
MESSAGE_SAMPLE = 40 # number of messages sampled per channel
|
||||
MESSAGE_CHAR_LIMIT = 600 # max characters kept per message
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
DATA_DIR.mkdir(exist_ok=True)
|
||||
|
||||
async with TelegramClient(SESSION_NAME, API_ID, API_HASH) as client:
|
||||
me = await client.get_me()
|
||||
print(f"Connected: @{me.username or me.first_name}\n")
|
||||
|
||||
results: list[dict] = []
|
||||
async for d in client.iter_dialogs(archived=None):
|
||||
if not (d.is_group or d.is_channel):
|
||||
continue
|
||||
|
||||
idx = len(results) + 1
|
||||
print(f"[{idx:>3}] {d.name} (archived={bool(d.archived)})")
|
||||
|
||||
messages: list[str] = []
|
||||
try:
|
||||
async for msg in client.iter_messages(d.entity, limit=MESSAGE_SAMPLE):
|
||||
text = (msg.message or "").strip()
|
||||
if text:
|
||||
messages.append(text[:MESSAGE_CHAR_LIMIT])
|
||||
except Exception as e:
|
||||
print(f" ! could not fetch messages: {e}")
|
||||
|
||||
results.append({
|
||||
"id": d.id,
|
||||
"name": d.name or "",
|
||||
"type": "channel" if (d.is_channel and not d.is_group) else "group",
|
||||
"is_broadcast": bool(getattr(d.entity, "broadcast", False)),
|
||||
"archived": bool(d.archived),
|
||||
"unread_count": d.unread_count,
|
||||
"messages": messages,
|
||||
})
|
||||
|
||||
OUTPUT.write_text(
|
||||
json.dumps(results, ensure_ascii=False, indent=2),
|
||||
encoding="utf-8",
|
||||
)
|
||||
print(f"\n✓ {len(results)} chats saved -> {OUTPUT}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@@ -0,0 +1 @@
|
||||
telethon>=1.43.1
|
||||
77
personas/_shared/skills/telegram/scripts/tg_inbox.py
Normal file
@@ -0,0 +1,77 @@
|
||||
"""Show unread chats — your real inbox view.
|
||||
|
||||
Usage:
|
||||
python tg_inbox.py # all unread, sorted by count desc
|
||||
python tg_inbox.py --top 20
|
||||
python tg_inbox.py --include-archived
|
||||
python tg_inbox.py --mark-read "Born2bero" # zero-out a specific chat
|
||||
python tg_inbox.py --mark-all-read --yes # nuke ALL unread (destructive)
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
|
||||
from telethon import TelegramClient
|
||||
|
||||
from config import API_HASH, API_ID, SESSION_NAME
|
||||
from tg_utils import confirm, resolve_chat
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
p = argparse.ArgumentParser(description=__doc__.splitlines()[0])
|
||||
p.add_argument("--top", type=int, default=0, help="show only top N (default: all)")
|
||||
p.add_argument("--include-archived", action="store_true",
|
||||
help="include archived dialogs (default: only normal)")
|
||||
p.add_argument("--mark-read", help="mark this specific chat as read")
|
||||
p.add_argument("--mark-all-read", action="store_true",
|
||||
help="mark every unread chat as read (DESTRUCTIVE)")
|
||||
p.add_argument("--yes", "-y", action="store_true", help="skip confirmation")
|
||||
args = p.parse_args()
|
||||
|
||||
archived = None if args.include_archived else False
|
||||
|
||||
async with TelegramClient(SESSION_NAME, API_ID, API_HASH) as client:
|
||||
# Single-target mark-read
|
||||
if args.mark_read:
|
||||
entity = await resolve_chat(client, args.mark_read)
|
||||
await client.send_read_acknowledge(entity)
|
||||
title = getattr(entity, "title", None) or getattr(entity, "username", None) or str(entity.id)
|
||||
print(f"✓ {title} marked as read")
|
||||
return
|
||||
|
||||
# Collect unread
|
||||
unread: list[tuple[int, str, int, str]] = []
|
||||
async for d in client.iter_dialogs(archived=archived):
|
||||
if d.unread_count > 0:
|
||||
kind = "channel" if (d.is_channel and not d.is_group) else (
|
||||
"group" if d.is_group else "user")
|
||||
unread.append((d.unread_count, d.name or str(d.id), d.id, kind))
|
||||
|
||||
unread.sort(reverse=True)
|
||||
if args.top:
|
||||
unread = unread[:args.top]
|
||||
|
||||
total = sum(n for n, *_ in unread)
|
||||
print(f"# {len(unread)} unread chats — {total} unread messages\n")
|
||||
for n, name, cid, kind in unread:
|
||||
print(f" {n:>5} [{kind:<7}] {name} (id={cid})")
|
||||
|
||||
# Mark-all-read
|
||||
if args.mark_all_read:
|
||||
print()
|
||||
if not args.yes and not confirm(f"Mark ALL {len(unread)} chats as read?"):
|
||||
print("cancelled.")
|
||||
return
|
||||
for _, name, cid, _ in unread:
|
||||
try:
|
||||
await client.send_read_acknowledge(await client.get_input_entity(cid))
|
||||
print(f" ✓ {name}")
|
||||
except Exception as e:
|
||||
print(f" ! {name}: {e}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
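The inbox listing above sorts `(unread_count, name, chat_id, kind)` tuples in reverse and then applies `--top`. A standalone sketch of that ordering step follows, under one stated difference: it sorts on the unread count only, so ties keep their original order instead of falling back to the name. `top_unread` is a hypothetical helper and the sample rows are made up.

```python
def top_unread(rows, top=0):
    """Order (unread_count, name, chat_id, kind) rows by unread count, highest first.

    top=0 means "no limit", mirroring the --top default.
    """
    ordered = sorted(rows, key=lambda row: row[0], reverse=True)
    return ordered[:top] if top else ordered


if __name__ == "__main__":
    rows = [
        (3, "a", 1, "group"),
        (10, "b", 2, "channel"),
        (1, "c", 3, "user"),
    ]
    for count, name, chat_id, kind in top_unread(rows, top=2):
        print(f"{count:>5} [{kind:<7}] {name} (id={chat_id})")
```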
||||
71
personas/_shared/skills/telegram/scripts/tg_read.py
Normal file
@@ -0,0 +1,71 @@
|
||||
"""Read messages from a Telegram chat.
|
||||
|
||||
Usage:
|
||||
python tg_read.py "@username"
|
||||
python tg_read.py "Born2beroot" --limit 50
|
||||
python tg_read.py -1001182095274 --since 2026-04-01
|
||||
python tg_read.py "@durov" --limit 5 --json
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import json
|
||||
|
||||
from telethon import TelegramClient
|
||||
|
||||
from config import API_HASH, API_ID, SESSION_NAME
|
||||
from tg_utils import fmt_msg, parse_date, resolve_chat
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
p = argparse.ArgumentParser(description=__doc__.splitlines()[0])
|
||||
p.add_argument("chat", help="@username, numeric id, or name substring")
|
||||
p.add_argument("--limit", "-n", type=int, default=20, help="max messages (default 20)")
|
||||
p.add_argument("--since", help="YYYY-MM-DD; only newer than this date")
|
||||
p.add_argument("--search", "-s", help="filter to messages containing this text")
|
||||
p.add_argument("--json", action="store_true", help="emit JSON instead of table")
|
||||
p.add_argument("--mark-read", action="store_true", help="mark the chat as read after fetching")
|
||||
args = p.parse_args()
|
||||
|
||||
offset_date = parse_date(args.since) if args.since else None
|
||||
|
||||
async with TelegramClient(SESSION_NAME, API_ID, API_HASH) as client:
|
||||
entity = await resolve_chat(client, args.chat)
|
||||
title = getattr(entity, "title", None) or getattr(entity, "username", None) or str(entity.id)
|
||||
|
||||
if not args.json:
|
||||
print(f"# {title} (id={entity.id})\n")
|
||||
|
||||
kwargs = {"limit": args.limit}
|
||||
if offset_date:
|
||||
kwargs["reverse"] = True
|
||||
kwargs["offset_date"] = offset_date
|
||||
if args.search:
|
||||
kwargs["search"] = args.search
|
||||
|
||||
rows = []
|
||||
async for msg in client.iter_messages(entity, **kwargs):
|
||||
if args.json:
|
||||
rows.append({
|
||||
"id": msg.id,
|
||||
"date": msg.date.isoformat(),
|
||||
"sender_id": msg.sender_id,
|
||||
"text": msg.message or "",
|
||||
"has_media": msg.media is not None,
|
||||
"reply_to": msg.reply_to_msg_id,
|
||||
})
|
||||
else:
|
||||
print(fmt_msg(msg))
|
||||
|
||||
if args.json:
|
||||
print(json.dumps(rows, ensure_ascii=False, indent=2))
|
||||
|
||||
if args.mark_read:
|
||||
await client.send_read_acknowledge(entity)
|
||||
if not args.json:
|
||||
print(f"\n✓ {title} marked as read")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
72
personas/_shared/skills/telegram/scripts/tg_search.py
Normal file
@@ -0,0 +1,72 @@
|
||||
"""Search messages — globally or scoped to one chat.
|
||||
|
||||
Usage:
|
||||
python tg_search.py "CVE-2024" # global, last 50 hits
|
||||
python tg_search.py "kitap" --chat "E Kitap" # scoped to one chat
|
||||
python tg_search.py "Putin" --since 2026-04-01 --limit 100
|
||||
python tg_search.py "report" --chat me # only Saved Messages
|
||||
|
||||
Global search uses Telegram's server-side message index (telethon
|
||||
client.iter_messages(None, search=...)).
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
|
||||
from telethon import TelegramClient
|
||||
|
||||
from config import API_HASH, API_ID, SESSION_NAME
|
||||
from tg_utils import fmt_msg, parse_date, resolve_chat
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
p = argparse.ArgumentParser(description=__doc__.splitlines()[0])
|
||||
p.add_argument("query", help="search text")
|
||||
p.add_argument("--chat", help="restrict to this chat (@user/id/name/me)")
|
||||
p.add_argument("--limit", "-n", type=int, default=50)
|
||||
p.add_argument("--since", help="YYYY-MM-DD lower bound")
|
||||
args = p.parse_args()
|
||||
|
||||
offset_date = parse_date(args.since) if args.since else None
|
||||
|
||||
async with TelegramClient(SESSION_NAME, API_ID, API_HASH) as client:
|
||||
if args.chat:
|
||||
entity = await resolve_chat(client, "me" if args.chat in {"me", "self"} else args.chat)
|
||||
else:
|
||||
entity = None
|
||||
|
||||
kwargs = {"search": args.query, "limit": args.limit}
|
||||
if offset_date:
|
||||
kwargs["reverse"] = True
|
||||
kwargs["offset_date"] = offset_date
|
||||
|
||||
# Cache chat titles to annotate global hits.
|
||||
chat_titles: dict[int, str] = {}
|
||||
|
||||
async def title_for(chat_id: int) -> str:
|
||||
if chat_id in chat_titles:
|
||||
return chat_titles[chat_id]
|
||||
try:
|
||||
e = await client.get_entity(chat_id)
|
||||
t = getattr(e, "title", None) or getattr(e, "username", None) or str(chat_id)
|
||||
except Exception:
|
||||
t = str(chat_id)
|
||||
chat_titles[chat_id] = t
|
||||
return t
|
||||
|
||||
count = 0
|
||||
async for msg in client.iter_messages(entity, **kwargs):
|
||||
count += 1
|
||||
if entity is None:
|
||||
where = await title_for(msg.chat_id)
|
||||
print(f"[{where[:25]:<25}] {fmt_msg(msg)}")
|
||||
else:
|
||||
print(fmt_msg(msg))
|
||||
|
||||
print(f"\n{count} hit(s)")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
90
personas/_shared/skills/telegram/scripts/tg_send.py
Normal file
@@ -0,0 +1,90 @@
|
||||
"""Send a message (text and/or file) to a Telegram chat.
|
||||
|
||||
Usage:
|
||||
python tg_send.py "@username" "Hello"
|
||||
python tg_send.py "Born2beroot" "Check this" --file report.pdf
|
||||
python tg_send.py "@chan" "" --file image.png --caption "screenshot"
|
||||
python tg_send.py "Saved Messages" "note to self" --yes
|
||||
python tg_send.py "@x" "Reply" --reply-to 12345
|
||||
python tg_send.py "@x" "Quiet ping" --silent
|
||||
|
||||
Defaults to dry-run preview + interactive [y/N] confirm. --yes skips it.
|
||||
Saved Messages is resolvable as "me" or by your own username.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
from telethon import TelegramClient
|
||||
|
||||
from config import API_HASH, API_ID, SESSION_NAME
|
||||
from tg_utils import confirm, resolve_chat
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
p = argparse.ArgumentParser(description=__doc__.splitlines()[0])
|
||||
p.add_argument("chat", help='@username, id, name; "me" for Saved Messages')
|
||||
p.add_argument("text", help="message text (use '' if only sending a file)")
|
||||
p.add_argument("--file", "-f", help="path to file/image to attach")
|
||||
p.add_argument("--caption", help="caption for the file (overrides text if --file given)")
|
||||
p.add_argument("--reply-to", type=int, help="message id to reply to")
|
||||
p.add_argument("--silent", action="store_true", help="send without notification")
|
||||
p.add_argument("--parse", choices=["md", "html", "none"], default="md",
|
||||
help="text parse mode (default: md)")
|
||||
p.add_argument("--yes", "-y", action="store_true", help="skip confirmation")
|
||||
args = p.parse_args()
|
||||
|
||||
if not args.text and not args.file:
|
||||
sys.exit("nothing to send: provide text and/or --file")
|
||||
|
||||
if args.file and not Path(args.file).exists():
|
||||
sys.exit(f"file not found: {args.file}")
|
||||
|
||||
parse_mode = None if args.parse == "none" else args.parse
|
||||
|
||||
async with TelegramClient(SESSION_NAME, API_ID, API_HASH) as client:
|
||||
entity = await resolve_chat(client, "me" if args.chat in {"me", "self"} else args.chat)
|
||||
title = getattr(entity, "title", None) or getattr(entity, "username", None) or str(entity.id)
|
||||
|
||||
print(f"→ to: {title} (id={entity.id})")
|
||||
if args.file:
|
||||
print(f"→ file: {args.file}")
|
||||
print(f"→ cap: {(args.caption or args.text)[:120]}")
|
||||
else:
|
||||
preview = args.text if len(args.text) < 200 else args.text[:200] + "…"
|
||||
print(f"→ text: {preview}")
|
||||
if args.reply_to:
|
||||
print(f"→ reply: msg #{args.reply_to}")
|
||||
if args.silent:
|
||||
print("→ silent: yes")
|
||||
|
||||
if not args.yes and not confirm("Send?"):
|
||||
print("cancelled.")
|
||||
return
|
||||
|
||||
if args.file:
|
||||
sent = await client.send_file(
|
||||
entity,
|
||||
args.file,
|
||||
caption=args.caption or args.text or None,
|
||||
reply_to=args.reply_to,
|
||||
silent=args.silent,
|
||||
parse_mode=parse_mode,
|
||||
)
|
||||
else:
|
||||
sent = await client.send_message(
|
||||
entity,
|
||||
args.text,
|
||||
reply_to=args.reply_to,
|
||||
silent=args.silent,
|
||||
parse_mode=parse_mode,
|
||||
)
|
||||
|
||||
print(f"✓ sent msg id={sent.id}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
100
personas/_shared/skills/telegram/scripts/tg_utils.py
Normal file
@@ -0,0 +1,100 @@
|
||||
"""Shared helpers for the tg_* CLI scripts (read/send/search/inbox)."""
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
from datetime import datetime, timezone
|
||||
from typing import Iterable
|
||||
|
||||
from telethon import TelegramClient
|
||||
|
||||
|
||||
async def resolve_chat(client: TelegramClient, ref: str):
|
||||
"""Resolve a chat reference to a Telethon entity.
|
||||
|
||||
Accepts:
|
||||
- "@username", or a phone number with a leading '+'
|
||||
- numeric id (positive or negative; large negative for supergroups)
|
||||
- case-insensitive name substring; errors if 0 or >1 matches
|
||||
"""
|
||||
ref = ref.strip()
|
||||
|
||||
if ref.startswith(("@", "+")) or ref.lower() in {"me", "self"}:
|
||||
return await client.get_entity(ref)
|
||||
|
||||
try:
|
||||
return await client.get_entity(int(ref))
|
||||
except ValueError:
|
||||
pass
|
||||
|
||||
needle = ref.lower()
|
||||
matches = []
|
||||
async for d in client.iter_dialogs(archived=None):
|
||||
if needle in (d.name or "").lower():
|
||||
matches.append(d)
|
||||
|
||||
if not matches:
|
||||
sys.exit(f"chat not found: {ref!r}")
|
||||
if len(matches) > 1:
|
||||
preview = "\n ".join(f"{d.id:>15} {d.name}" for d in matches[:10])
|
||||
more = "" if len(matches) <= 10 else f"\n ... +{len(matches)-10} more"
|
||||
sys.exit(
|
||||
f"ambiguous chat {ref!r} ({len(matches)} matches):\n "
|
||||
f"{preview}{more}\nuse the numeric id or @username"
|
||||
)
|
||||
return matches[0].entity
|
||||
|
||||
|
||||
def parse_date(s: str) -> datetime:
|
||||
"""Parse YYYY-MM-DD or full ISO into UTC-aware datetime."""
|
||||
if "T" in s or " " in s:
|
||||
dt = datetime.fromisoformat(s.replace("Z", "+00:00"))
|
||||
else:
|
||||
dt = datetime.strptime(s, "%Y-%m-%d")
|
||||
if dt.tzinfo is None:
|
||||
dt = dt.replace(tzinfo=timezone.utc)
|
||||
return dt
|
||||
|
||||
|
||||
def fmt_msg(msg, max_chars: int = 200) -> str:
|
||||
"""Compact one-line representation of a Telethon Message."""
|
||||
sender = ""
|
||||
if getattr(msg, "sender", None) is not None:
|
||||
sender = (
|
||||
getattr(msg.sender, "username", None)
|
||||
or getattr(msg.sender, "first_name", None)
|
||||
or getattr(msg.sender, "title", None)
|
||||
or str(msg.sender_id)
|
||||
)
|
||||
elif msg.sender_id:
|
||||
sender = str(msg.sender_id)
|
||||
|
||||
text = (msg.message or "").replace("\n", " ⏎ ")
|
||||
if len(text) > max_chars:
|
||||
text = text[: max_chars - 1] + "…"
|
||||
media = ""
|
||||
if msg.media and not msg.message:
|
||||
media = f" [media:{type(msg.media).__name__}]"
|
||||
return f"{msg.id:>9} │ {msg.date.strftime('%Y-%m-%d %H:%M')} │ {sender[:20]:<20} │ {text}{media}"
|
||||
|
||||
|
||||
def confirm(prompt: str = "Confirm", default: bool = False) -> bool:
|
||||
"""Interactive y/N. default=False → [y/N], default=True → [Y/n]."""
|
||||
suffix = " [Y/n]: " if default else " [y/N]: "
|
||||
try:
|
||||
r = input(prompt + suffix).strip().lower()
|
||||
except EOFError:
|
||||
return default
|
||||
if not r:
|
||||
return default
|
||||
return r in ("y", "yes", "evet", "e", "ok")
|
||||
|
||||
|
||||
def chunked(items: Iterable, size: int):
|
||||
buf = []
|
||||
for x in items:
|
||||
buf.append(x)
|
||||
if len(buf) == size:
|
||||
yield buf
|
||||
buf = []
|
||||
if buf:
|
||||
yield buf
|
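Two of the helpers above are pure functions and can be exercised without a Telegram session. The sketch below copies `parse_date` and `chunked` from `tg_utils.py` so it stays self-contained, and shows how `--since` values are interpreted (a date-only value becomes midnight UTC) and how `chunked` batches an iterable. The sample inputs are illustrative only.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator


def parse_date(s: str) -> datetime:
    """Parse YYYY-MM-DD or full ISO into a UTC-aware datetime."""
    if "T" in s or " " in s:
        dt = datetime.fromisoformat(s.replace("Z", "+00:00"))
    else:
        dt = datetime.strptime(s, "%Y-%m-%d")
    if dt.tzinfo is None:
        # Naive values are assumed to be UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    buf = []
    for x in items:
        buf.append(x)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


if __name__ == "__main__":
    print(parse_date("2026-04-01").isoformat())
    print(parse_date("2026-04-01T12:30:00Z").isoformat())
    print(list(chunked(range(5), 2)))
```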
||||