fix: ensure LLM stats tracking is accurate by including completed subagents (#441 )

Add Strix GitHub Actions integration tip
feat: Migrate from Poetry to uv (#379 )
2026-04-13 00:09:13 -04:00 · 2026-04-12 12:43:41 -07:00 · 2026-03-31 17:20:41 -07:00 · 2026-03-31 11:53:49 -07:00 · 2026-03-22 22:10:17 -07:00 · 2026-03-22 22:08:20 -07:00
188 changed files with 21334 additions and 13606 deletions
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -27,7 +27,7 @@ If applicable, add screenshots to help explain your problem.
 - OS: [e.g. Ubuntu 22.04]
 - Strix Version or Commit: [e.g. 0.1.18]
 - Python Version: [e.g. 3.12]
- LLM Used: [e.g. GPT-5, Claude Sonnet 4]
+- LLM Used: [e.g. GPT-5, Claude Sonnet 4.6]
 **Additional context**
 Add any other context about the problem here.
--- a/.github/screenshot.png
+++ b/.github/screenshot.png
--- a/.github/workflows/build-release.yml
+++ b/.github/workflows/build-release.yml
@@ -30,15 +30,15 @@ jobs:
        with:
          python-version: '3.12'
-      - uses: snok/install-poetry@v1
+      - uses: astral-sh/setup-uv@v5
      - name: Build
        shell: bash
        run: |
-          poetry install --with dev
+          uv sync --frozen
-          poetry run pyinstaller strix.spec --noconfirm
+          uv run pyinstaller strix.spec --noconfirm
-          VERSION=$(poetry version -s)
+          VERSION=$(grep '^version' pyproject.toml | head -1 | sed 's/.*"\(.*\)"/\1/')
          mkdir -p dist/release
          if [[ "${{ runner.os }}" == "Windows" ]]; then
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -31,6 +31,7 @@ repos:
      - id: check-toml
      - id: check-merge-conflict
      - id: check-added-large-files
        args: ['--maxkb=1024']
      - id: debug-statements
      - id: check-case-conflict
      - id: check-docstring-first
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -8,7 +8,7 @@ Thank you for your interest in contributing to Strix! This guide will help you g
 - Python 3.12+
 - Docker (running)
- Poetry (for dependency management)
+- [uv](https://docs.astral.sh/uv/) (for dependency management)
 - Git
 ### Local Development
@@ -24,19 +24,19 @@ Thank you for your interest in contributing to Strix! This guide will help you g
   make setup-dev
   # or manually:
-   poetry install --with=dev
+   uv sync
-   poetry run pre-commit install
+   uv run pre-commit install
   ```
 3. **Configure your LLM provider**
   ```bash
-   export STRIX_LLM="openai/gpt-5"
+   export STRIX_LLM="openai/gpt-5.4"
   export LLM_API_KEY="your-api-key"
   ```
 4. **Run Strix in development mode**
   ```bash
-   poetry run strix --target https://example.com
+   uv run strix --target https://example.com
   ```
 ## 📚 Contributing Skills
@@ -46,7 +46,7 @@ Skills are specialized knowledge packages that enhance agent capabilities. See [
 ### Quick Guide
 1. **Choose the right category** (`/vulnerabilities`, `/frameworks`, `/technologies`, etc.)
-2. **Create a** `.jinja` file with your skill content
+2. **Create a** `.md` file with your skill content
 3. **Include practical examples** - Working payloads, commands, or test cases
 4. **Provide validation methods** - How to confirm findings and avoid false positives
 5. **Submit via PR** with clear description
@@ -101,7 +101,7 @@ We welcome feature ideas! Please:
 ## 🤝 Community
- **Discord**: [Join our community](https://discord.gg/YjKFvEZSdZ)
+- **Discord**: [Join our community](https://discord.gg/strix-ai)
 - **Issues**: [GitHub Issues](https://github.com/usestrix/strix/issues)
 ## ✨ Recognition
@@ -113,4 +113,4 @@ We value all contributions! Contributors will be:
 ---
-**Questions?** Reach out on [Discord](https://discord.gg/YjKFvEZSdZ) or create an issue. We're here to help!
+**Questions?** Reach out on [Discord](https://discord.gg/strix-ai) or create an issue. We're here to help!
--- a/24
+++ b/24
@@ -22,38 +22,38 @@ help:
 	@echo "  clean         - Clean up cache files and artifacts"
 install:
-	poetry install --only=main
+	uv sync --no-dev
 dev-install:
-	poetry install --with=dev
+	uv sync
 setup-dev: dev-install
-	poetry run pre-commit install
+	uv run pre-commit install
 	@echo "✅ Development environment setup complete!"
 	@echo "Run 'make check-all' to verify everything works correctly."
 format:
 	@echo "🎨 Formatting code with ruff..."
-	poetry run ruff format .
+	uv run ruff format .
 	@echo "✅ Code formatting complete!"
 lint:
 	@echo "🔍 Linting code with ruff..."
-	poetry run ruff check . --fix
+	uv run ruff check . --fix
 	@echo "📝 Running additional linting with pylint..."
-	poetry run pylint strix/ --score=no --reports=no
+	uv run pylint strix/ --score=no --reports=no
 	@echo "✅ Linting complete!"
 type-check:
 	@echo "🔍 Type checking with mypy..."
-	poetry run mypy strix/
+	uv run mypy strix/
 	@echo "🔍 Type checking with pyright..."
-	poetry run pyright strix/
+	uv run pyright strix/
 	@echo "✅ Type checking complete!"
 security:
 	@echo "🔒 Running security checks with bandit..."
-	poetry run bandit -r strix/ -c pyproject.toml
+	uv run bandit -r strix/ -c pyproject.toml
 	@echo "✅ Security checks complete!"
 check-all: format lint type-check security
@@ -61,18 +61,18 @@ check-all: format lint type-check security
 test:
 	@echo "🧪 Running tests..."
-	poetry run pytest -v
+	uv run pytest -v
 	@echo "✅ Tests complete!"
 test-cov:
 	@echo "🧪 Running tests with coverage..."
-	poetry run pytest -v --cov=strix --cov-report=term-missing --cov-report=html
+	uv run pytest -v --cov=strix --cov-report=term-missing --cov-report=html
 	@echo "✅ Tests with coverage complete!"
 	@echo "📊 Coverage report generated in htmlcov/"
 pre-commit:
 	@echo "🔧 Running pre-commit hooks..."
-	poetry run pre-commit run --all-files
+	uv run pre-commit run --all-files
 	@echo "✅ Pre-commit hooks complete!"
 clean:
--- a/README.md
+++ b/README.md
@@ -1,71 +1,78 @@
 <p align="center">
  <a href="https://strix.ai/">
-    <img src=".github/logo.png" width="150" alt="Strix Logo">
+    <img src="https://github.com/usestrix/.github/raw/main/imgs/cover.png" alt="Strix Banner" width="100%">
  </a>
 </p>
 <h1 align="center">Strix</h1>
 <h2 align="center">Open-source AI Hackers to secure your Apps</h2>
 <div align="center">
-[![Python](https://img.shields.io/pypi/pyversions/strix-agent?color=3776AB)](https://pypi.org/project/strix-agent/)
+# Strix
 [![PyPI](https://img.shields.io/pypi/v/strix-agent?color=10b981)](https://pypi.org/project/strix-agent/)
 [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
 [![Docs](https://img.shields.io/badge/Docs-docs.strix.ai-10b981.svg)](https://docs.strix.ai)
-[![GitHub Stars](https://img.shields.io/github/stars/usestrix/strix)](https://github.com/usestrix/strix)
+### Open-source AI hackers to find and fix your app’s vulnerabilities.
 [![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.gg/YjKFvEZSdZ)
 [![Website](https://img.shields.io/badge/Website-strix.ai-2d3748.svg)](https://strix.ai)
-<a href="https://trendshift.io/repositories/15362" target="_blank"><img src="https://trendshift.io/api/badge/repositories/15362" alt="usestrix%2Fstrix | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
+<br/>
-[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/usestrix/strix)
+<a href="https://docs.strix.ai"><img src="https://img.shields.io/badge/Docs-docs.strix.ai-2b9246?style=for-the-badge&logo=gitbook&logoColor=white" alt="Docs"></a>
 <a href="https://strix.ai"><img src="https://img.shields.io/badge/Website-strix.ai-f0f0f0?style=for-the-badge&logoColor=000000" alt="Website"></a>
 [![](https://dcbadge.limes.pink/api/server/strix-ai)](https://discord.gg/strix-ai)
 <a href="https://deepwiki.com/usestrix/strix"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
 <a href="https://github.com/usestrix/strix"><img src="https://img.shields.io/github/stars/usestrix/strix?style=flat-square" alt="GitHub Stars"></a>
 <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-3b82f6?style=flat-square" alt="License"></a>
 <a href="https://pypi.org/project/strix-agent/"><img src="https://img.shields.io/pypi/v/strix-agent?style=flat-square" alt="PyPI Version"></a>
 <a href="https://discord.gg/strix-ai"><img src="https://github.com/usestrix/.github/raw/main/imgs/Discord.png" height="40" alt="Join Discord"></a>
 <a href="https://x.com/strix_ai"><img src="https://github.com/usestrix/.github/raw/main/imgs/X.png" height="40" alt="Follow on X"></a>
 <a href="https://trendshift.io/repositories/15362" target="_blank"><img src="https://trendshift.io/api/badge/repositories/15362" alt="usestrix/strix | Trendshift" width="250" height="55"/></a>
 </div>
 <br>
 <div align="center">
  <img src=".github/screenshot.png" alt="Strix Demo" width="800" style="border-radius: 16px;">
 </div>
 <br>
 > [!TIP]
-> **New!** Strix now integrates seamlessly with GitHub Actions and CI/CD pipelines. Automatically scan for vulnerabilities on every pull request and block insecure code before it reaches production!
+> **New!** Strix integrates seamlessly with GitHub Actions and CI/CD pipelines. Automatically scan for vulnerabilities on every pull request and block insecure code before it reaches production - [Get started with no setup required](https://app.strix.ai).
 ---
-## 🦉 Strix Overview
+
 ## Strix Overview
 Strix are autonomous AI agents that act just like real hackers - they run your code dynamically, find vulnerabilities, and validate them through actual proof-of-concepts. Built for developers and security teams who need fast, accurate security testing without the overhead of manual pentesting or the false positives of static analysis tools.
 **Key Capabilities:**
- 🔧 **Full hacker toolkit** out of the box
+- **Full hacker toolkit** out of the box
- 🤝 **Teams of agents** that collaborate and scale
+- **Teams of agents** that collaborate and scale
- ✅ **Real validation** with PoCs, not false positives
+- **Real validation** with PoCs, not false positives
- 💻 **Developer‑first** CLI with actionable reports
+- **Developer‑first** CLI with actionable reports
- 🔄 **Auto‑fix & reporting** to accelerate remediation
+- **Auto‑fix & reporting** to accelerate remediation
-## 🎯 Use Cases
+<br>
 <div align="center">
  <a href="https://strix.ai">
    <img src=".github/screenshot.png" alt="Strix Demo" width="1000" style="border-radius: 16px;">
  </a>
 </div>
 ## Use Cases
 - **Application Security Testing** - Detect and validate critical vulnerabilities in your applications
 - **Rapid Penetration Testing** - Get penetration tests done in hours, not weeks, with compliance reports
 - **Bug Bounty Automation** - Automate bug bounty research and generate PoCs for faster reporting
 - **CI/CD Integration** - Run tests in CI/CD to block vulnerabilities before reaching production
 ---
 ## 🚀 Quick Start
 **Prerequisites:**
 - Docker (running)
- An LLM provider key (e.g. [get OpenAI API key](https://platform.openai.com/api-keys) or use a local LLM)
+- An LLM API key from any [supported provider](https://docs.strix.ai/llm-providers/overview) (OpenAI, Anthropic, Google, etc.)
 ### Installation & First Scan
@@ -73,11 +80,8 @@ Strix are autonomous AI agents that act just like real hackers - they run your c
 # Install Strix
 curl -sSL https://strix.ai/install | bash
 # Or via pipx
 pipx install strix-agent
 # Configure your AI provider
-export STRIX_LLM="openai/gpt-5"
+export STRIX_LLM="openai/gpt-5.4"
 export LLM_API_KEY="your-api-key"
 # Run your first security assessment
@@ -87,24 +91,25 @@ strix --target ./app-directory
 > [!NOTE]
 > First run automatically pulls the sandbox Docker image. Results are saved to `strix_runs/<run-name>`
-## ☁️ Run Strix in Cloud
+---
-Want to skip the local setup, API keys, and unpredictable LLM costs? Run the hosted cloud version of Strix at **[app.strix.ai](https://strix.ai)**.
+## ☁️ Strix Platform
-Launch a scan in just a few minutes—no setup or configuration required—and you’ll get:
+Try the Strix full-stack security platform at **[app.strix.ai](https://app.strix.ai)** — sign up for free, connect your repos and domains, and launch a pentest in minutes.
- **A full pentest report** with validated findings and clear remediation steps
+- **Validated findings with PoCs** and reproduction steps
- **Shareable dashboards** your team can use to track fixes over time
+- **One-click autofix** as ready-to-merge pull requests
- **CI/CD and GitHub integrations** to block risky changes before production
+- **Continuous monitoring** across code, cloud, and infrastructure
- **Continuous monitoring** so new vulnerabilities are caught quickly
+- **Integrations** with GitHub, Slack, Jira, Linear, and CI/CD pipelines
 - **Continuous learning** that builds on past findings and remediations
-[**Run your first pentest now →**](https://strix.ai)
+[**Start your first pentest →**](https://app.strix.ai)
 ---
 ## ✨ Features
-### 🛠️ Agentic Security Tools
+### Agentic Security Tools
 Strix agents come equipped with a comprehensive security testing toolkit:
@@ -116,7 +121,7 @@ Strix agents come equipped with a comprehensive security testing toolkit:
 - **Code Analysis** - Static and dynamic analysis capabilities
 - **Knowledge Management** - Structured findings and attack documentation
-### 🎯 Comprehensive Vulnerability Detection
+### Comprehensive Vulnerability Detection
 Strix can identify and validate a wide range of security vulnerabilities:
@@ -128,7 +133,7 @@ Strix can identify and validate a wide range of security vulnerabilities:
 - **Authentication** - JWT vulnerabilities, session management
 - **Infrastructure** - Misconfigurations, exposed services
-### 🕸️ Graph of Agents
+### Graph of Agents
 Advanced multi-agent orchestration for comprehensive security testing:
@@ -138,7 +143,7 @@ Advanced multi-agent orchestration for comprehensive security testing:
 ---
-## 💻 Usage Examples
+## Usage Examples
 ### Basic Usage
@@ -162,14 +167,20 @@ strix --target https://your-app.com --instruction "Perform authenticated testing
 # Multi-target testing (source code + deployed app)
 strix -t https://github.com/org/app -t https://your-app.com
 # White-box source-aware scan (local repository)
 strix --target ./app-directory --scan-mode standard
 # Focused testing with custom instructions
 strix --target api.your-app.com --instruction "Focus on business logic flaws and IDOR vulnerabilities"
 # Provide detailed instructions through file (e.g., rules of engagement, scope, exclusions)
 strix --target api.your-app.com --instruction-file ./instruction.md
 # Force PR diff-scope against a specific base branch
 strix -n --target ./ --scan-mode quick --scope-mode diff --diff-base origin/main
 ```
-### 🤖 Headless Mode
+### Headless Mode
 Run Strix programmatically without interactive UI using the `-n/--non-interactive` flag—perfect for servers and automated jobs. The CLI prints real-time vulnerability findings, and the final report before exiting. Exits with non-zero code when vulnerabilities are found.
@@ -177,7 +188,7 @@ Run Strix programmatically without interactive UI using the `-n/--non-interactiv
 strix -n --target https://your-app.com
 ```
-### 🔄 CI/CD (GitHub Actions)
+### CI/CD (GitHub Actions)
 Strix can be added to your pipeline to run a security test on pull requests with a lightweight GitHub Actions workflow:
@@ -192,6 +203,8 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Install Strix
        run: curl -sSL https://strix.ai/install | bash
@@ -204,10 +217,15 @@ jobs:
        run: strix -n -t ./ --scan-mode quick
 ```
-### ⚙️ Configuration
+> [!TIP]
 > In CI pull request runs, Strix automatically scopes quick reviews to changed files.
 > If diff-scope cannot resolve, ensure checkout uses full history (`fetch-depth: 0`) or pass
 > `--diff-base` explicitly.
 ### Configuration
 ```bash
-export STRIX_LLM="openai/gpt-5"
+export STRIX_LLM="openai/gpt-5.4"
 export LLM_API_KEY="your-api-key"
 # Optional
@@ -221,30 +239,35 @@ export STRIX_REASONING_EFFORT="high"  # control thinking effort (default: high,
 **Recommended models for best results:**
- [OpenAI GPT-5](https://openai.com/api/) — `openai/gpt-5`
+- [OpenAI GPT-5.4](https://openai.com/api/) — `openai/gpt-5.4`
- [Anthropic Claude Sonnet 4.5](https://claude.com/platform/api) — `anthropic/claude-sonnet-4-5`
+- [Anthropic Claude Sonnet 4.6](https://claude.com/platform/api) — `anthropic/claude-sonnet-4-6`
 - [Google Gemini 3 Pro Preview](https://cloud.google.com/vertex-ai) — `vertex_ai/gemini-3-pro-preview`
 See the [LLM Providers documentation](https://docs.strix.ai/llm-providers/overview) for all supported providers including Vertex AI, Bedrock, Azure, and local models.
-## 📚 Documentation
+## Enterprise
 Get the same Strix experience with [enterprise-grade](https://strix.ai/demo) controls: SSO (SAML/OIDC), custom compliance reports, dedicated support & SLA, custom deployment options (VPC/self-hosted), BYOK model support, and tailored agents optimized for your environment. [Learn more](https://strix.ai/demo).
 ## Documentation
 Full documentation is available at **[docs.strix.ai](https://docs.strix.ai)** — including detailed guides for usage, CI/CD integrations, skills, and advanced configuration.
-## 🤝 Contributing
+## Contributing
 We welcome contributions of code, docs, and new skills - check out our [Contributing Guide](https://docs.strix.ai/contributing) to get started or open a [pull request](https://github.com/usestrix/strix/pulls)/[issue](https://github.com/usestrix/strix/issues).
-## 👥 Join Our Community
+## Join Our Community
-Have questions? Found a bug? Want to contribute? **[Join our Discord!](https://discord.gg/YjKFvEZSdZ)**
+Have questions? Found a bug? Want to contribute? **[Join our Discord!](https://discord.gg/strix-ai)**
-## 🌟 Support the Project
+## Support the Project
 **Love Strix?** Give us a ⭐ on GitHub!
 ## 🙏 Acknowledgements
-Strix builds on the incredible work of open-source projects like [LiteLLM](https://github.com/BerriAI/litellm), [Caido](https://github.com/caido/caido), [ProjectDiscovery](https://github.com/projectdiscovery), [Playwright](https://github.com/microsoft/playwright), and [Textual](https://github.com/Textualize/textual). Huge thanks to their maintainers!
+## Acknowledgements
 Strix builds on the incredible work of open-source projects like [LiteLLM](https://github.com/BerriAI/litellm), [Caido](https://github.com/caido/caido), [Nuclei](https://github.com/projectdiscovery/nuclei), [Playwright](https://github.com/microsoft/playwright), and [Textual](https://github.com/Textualize/textual). Huge thanks to their maintainers!
 > [!WARNING]
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -0,0 +1,43 @@
 # Benchmarks
 We use security benchmarks to track Strix's capabilities and improvements over time. We plan to add more benchmarks, both existing ones and our own, to help the community evaluate and compare security agents.
 ## Full Details
 For the complete benchmark results, evaluation scripts, and run data, see the [usestrix/benchmarks](https://github.com/usestrix/benchmarks) repository.
 > [!NOTE]
 > We are actively adding more benchmarks to our evaluation suite.
 ## Results
 | Benchmark | Challenges | Success Rate |
 |-----------|------------|--------------|
 | [XBEN](https://github.com/usestrix/benchmarks/tree/main/XBEN) | 104 | **96%** |
 ### XBEN
 The [XBOW benchmark](https://github.com/usestrix/benchmarks/tree/main/XBEN) is a set of 104 web security challenges designed to evaluate autonomous penetration testing agents. Each challenge follows a CTF format where the agent must discover and exploit vulnerabilities to extract a hidden flag.
 Strix `v0.4.0` achieved a **96% success rate** (100/104 challenges) in black-box mode.
 ```mermaid
 %%{init: {'theme': 'base', 'themeVariables': { 'pie1': '#3b82f6', 'pie2': '#1e3a5f', 'pieTitleTextColor': '#ffffff', 'pieSectionTextColor': '#ffffff', 'pieLegendTextColor': '#ffffff'}}}%%
 pie title Challenge Outcomes (104 Total)
    "Solved" : 100
    "Unsolved" : 4
 ```
 **Performance by Difficulty:**
 | Difficulty | Solved | Success Rate |
 |------------|--------|--------------|
 | Level 1 (Easy) | 45/45 | 100% |
 | Level 2 (Medium) | 49/51 | 96% |
 | Level 3 (Hard) | 6/8 | 75% |
 **Resource Usage:**
 - Average solve time: ~19 minutes
 - Total cost: ~$337 for 100 challenges
--- a/containers/Dockerfile
+++ b/containers/Dockerfile
@@ -9,7 +9,8 @@ RUN apt-get update && \
 RUN useradd -m -s /bin/bash pentester && \
    usermod -aG sudo pentester && \
-    echo "pentester ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
+    echo "pentester ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers && \
    touch /home/pentester/.hushlogin
 RUN mkdir -p /home/pentester/configs \
             /home/pentester/wordlists \
@@ -69,11 +70,7 @@ USER root
 RUN cp /app/certs/ca.crt /usr/local/share/ca-certificates/ca.crt && \
    update-ca-certificates
-RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=/opt/poetry python3 - && \
+RUN curl -LsSf https://astral.sh/uv/install.sh | env UV_INSTALL_DIR=/usr/local/bin sh
    ln -s /opt/poetry/bin/poetry /usr/local/bin/poetry && \
    chmod +x /usr/local/bin/poetry && \
    python3 -m venv /app/venv && \
    chown -R pentester:pentester /app/venv /opt/poetry
 USER pentester
 WORKDIR /tmp
@@ -96,7 +93,36 @@ RUN mkdir -p /home/pentester/.npm-global
 RUN npm install -g retire@latest && \
    npm install -g eslint@latest && \
-    npm install -g js-beautify@latest
+    npm install -g js-beautify@latest && \
    npm install -g @ast-grep/cli@latest && \
    npm install -g tree-sitter-cli@latest
 RUN set -eux; \
    TS_PARSER_DIR="/home/pentester/.tree-sitter/parsers"; \
    mkdir -p "${TS_PARSER_DIR}"; \
    for repo in tree-sitter-java tree-sitter-javascript tree-sitter-python tree-sitter-go tree-sitter-bash tree-sitter-json tree-sitter-yaml tree-sitter-typescript; do \
        if [ "$repo" = "tree-sitter-yaml" ]; then \
            repo_url="https://github.com/tree-sitter-grammars/${repo}.git"; \
        else \
            repo_url="https://github.com/tree-sitter/${repo}.git"; \
        fi; \
        if [ ! -d "${TS_PARSER_DIR}/${repo}" ]; then \
            git clone --depth 1 "${repo_url}" "${TS_PARSER_DIR}/${repo}"; \
        fi; \
    done; \
    if [ -d "${TS_PARSER_DIR}/tree-sitter-typescript/typescript" ]; then \
        ln -sfn "${TS_PARSER_DIR}/tree-sitter-typescript/typescript" "${TS_PARSER_DIR}/tree-sitter-typescript-typescript"; \
    fi; \
    if [ -d "${TS_PARSER_DIR}/tree-sitter-typescript/tsx" ]; then \
        ln -sfn "${TS_PARSER_DIR}/tree-sitter-typescript/tsx" "${TS_PARSER_DIR}/tree-sitter-typescript-tsx"; \
    fi; \
    tree-sitter init-config >/dev/null 2>&1 || true; \
    TS_CONFIG="/home/pentester/.config/tree-sitter/config.json"; \
    mkdir -p "$(dirname "${TS_CONFIG}")"; \
    [ -f "${TS_CONFIG}" ] || printf '{}\n' > "${TS_CONFIG}"; \
    TMP_CFG="$(mktemp)"; \
    jq --arg p "${TS_PARSER_DIR}" '.["parser-directories"] = ((.["parser-directories"] // []) + [$p] | unique)' "${TS_CONFIG}" > "${TMP_CFG}"; \
    mv "${TMP_CFG}" "${TS_CONFIG}"
 WORKDIR /home/pentester/tools
 RUN git clone https://github.com/aravind0x7/JS-Snooper.git && \
@@ -109,6 +135,18 @@ RUN git clone https://github.com/aravind0x7/JS-Snooper.git && \
 USER root
 RUN curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh | sh -s -- -b /usr/local/bin
 RUN set -eux; \
    ARCH="$(uname -m)"; \
    case "$ARCH" in \
        x86_64) GITLEAKS_ARCH="x64" ;; \
        aarch64|arm64) GITLEAKS_ARCH="arm64" ;; \
        *) echo "Unsupported architecture: $ARCH" >&2; exit 1 ;; \
    esac; \
    TAG="$(curl -fsSL https://api.github.com/repos/gitleaks/gitleaks/releases/latest | jq -r .tag_name)"; \
    curl -fsSL "https://github.com/gitleaks/gitleaks/releases/download/${TAG}/gitleaks_${TAG#v}_linux_${GITLEAKS_ARCH}.tar.gz" -o /tmp/gitleaks.tgz; \
    tar -xzf /tmp/gitleaks.tgz -C /tmp; \
    install -m 0755 /tmp/gitleaks /usr/local/bin/gitleaks; \
    rm -f /tmp/gitleaks /tmp/gitleaks.tgz
 RUN apt-get update && apt-get install -y zaproxy
@@ -129,9 +167,8 @@ RUN apt-get autoremove -y && \
    apt-get autoclean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
-ENV PATH="/home/pentester/go/bin:/home/pentester/.local/bin:/home/pentester/.npm-global/bin:/app/venv/bin:$PATH"
+ENV PATH="/home/pentester/go/bin:/home/pentester/.local/bin:/home/pentester/.npm-global/bin:/app/.venv/bin:$PATH"
-ENV VIRTUAL_ENV="/app/venv"
+ENV VIRTUAL_ENV="/app/.venv"
 ENV POETRY_HOME="/opt/poetry"
 WORKDIR /app
@@ -156,28 +193,22 @@ ENV SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
 RUN mkdir -p /workspace && chown -R pentester:pentester /workspace /app
-COPY pyproject.toml poetry.lock ./
+COPY pyproject.toml uv.lock ./
 RUN echo "# Sandbox Environment" > README.md && mkdir -p strix && touch strix/__init__.py
 USER pentester
-RUN poetry install --no-root --without dev --extras sandbox
+RUN uv sync --frozen --no-dev --extra sandbox
-RUN poetry run playwright install chromium
+RUN /app/.venv/bin/python -m playwright install chromium
-RUN /app/venv/bin/pip install -r /home/pentester/tools/jwt_tool/requirements.txt && \
+RUN uv pip install -r /home/pentester/tools/jwt_tool/requirements.txt && \
    ln -s /home/pentester/tools/jwt_tool/jwt_tool.py /home/pentester/.local/bin/jwt_tool
 RUN echo "# Sandbox Environment" > README.md
 COPY strix/__init__.py strix/
 COPY strix/config/ /app/strix/config/
 COPY strix/utils/ /app/strix/utils/
 COPY strix/telemetry/ /app/strix/telemetry/
 COPY strix/runtime/tool_server.py strix/runtime/__init__.py strix/runtime/runtime.py /app/strix/runtime/
-
+COPY strix/tools/ /app/strix/tools/
 COPY strix/tools/__init__.py strix/tools/registry.py strix/tools/executor.py strix/tools/argument_parser.py /app/strix/tools/
 COPY strix/tools/browser/ /app/strix/tools/browser/
 COPY strix/tools/file_edit/ /app/strix/tools/file_edit/
 COPY strix/tools/notes/ /app/strix/tools/notes/
 COPY strix/tools/python/ /app/strix/tools/python/
 COPY strix/tools/terminal/ /app/strix/tools/terminal/
 COPY strix/tools/proxy/ /app/strix/tools/proxy/
 RUN echo 'export PATH="/home/pentester/go/bin:/home/pentester/.local/bin:/home/pentester/.npm-global/bin:$PATH"' >> /home/pentester/.bashrc && \
    echo 'export PATH="/home/pentester/go/bin:/home/pentester/.local/bin:/home/pentester/.npm-global/bin:$PATH"' >> /home/pentester/.profile
--- a/containers/docker-entrypoint.sh
+++ b/containers/docker-entrypoint.sh
@@ -1,38 +1,75 @@
 #!/bin/bash
 set -e
-if [ -z "$CAIDO_PORT" ]; then
+CAIDO_PORT=48080
-    echo "Error: CAIDO_PORT must be set."
+CAIDO_LOG="/tmp/caido_startup.log"
-    exit 1
+
 if [ ! -f /app/certs/ca.p12 ]; then
  echo "ERROR: CA certificate file /app/certs/ca.p12 not found."
  exit 1
 fi
-caido-cli --listen 127.0.0.1:${CAIDO_PORT} \
+caido-cli --listen 0.0.0.0:${CAIDO_PORT} \
          --allow-guests \
          --no-logging \
          --no-open \
          --import-ca-cert /app/certs/ca.p12 \
-          --import-ca-cert-pass "" > /dev/null 2>&1 &
+          --import-ca-cert-pass "" > "$CAIDO_LOG" 2>&1 &
 CAIDO_PID=$!
 echo "Started Caido with PID $CAIDO_PID on port $CAIDO_PORT"
 echo "Waiting for Caido API to be ready..."
 CAIDO_READY=false
 for i in {1..30}; do
-  if curl -s -o /dev/null http://localhost:${CAIDO_PORT}/graphql; then
+  if ! kill -0 $CAIDO_PID 2>/dev/null; then
-    echo "Caido API is ready."
+    echo "ERROR: Caido process died while waiting for API (iteration $i)."
    echo "=== Caido log ==="
    cat "$CAIDO_LOG" 2>/dev/null || echo "(no log available)"
    exit 1
  fi
  if curl -s -o /dev/null -w "%{http_code}" http://localhost:${CAIDO_PORT}/graphql/ | grep -qE "^(200|400)$"; then
    echo "Caido API is ready (attempt $i)."
    CAIDO_READY=true
    break
  fi
  sleep 1
 done
 if [ "$CAIDO_READY" = false ]; then
  echo "ERROR: Caido API did not become ready within 30 seconds."
  echo "Caido process status: $(kill -0 $CAIDO_PID 2>&1 && echo 'running' || echo 'dead')"
  echo "=== Caido log ==="
  cat "$CAIDO_LOG" 2>/dev/null || echo "(no log available)"
  exit 1
 fi
 sleep 2
 echo "Fetching API token..."
-TOKEN=$(curl -s -X POST \
+TOKEN=""
-  -H "Content-Type: application/json" \
+for attempt in 1 2 3 4 5; do
-  -d '{"query":"mutation LoginAsGuest { loginAsGuest { token { accessToken } } }"}' \
+  RESPONSE=$(curl -sL -X POST \
-  http://localhost:${CAIDO_PORT}/graphql | jq -r '.data.loginAsGuest.token.accessToken')
+    -H "Content-Type: application/json" \
    -d '{"query":"mutation LoginAsGuest { loginAsGuest { token { accessToken } } }"}' \
    http://localhost:${CAIDO_PORT}/graphql)
  TOKEN=$(echo "$RESPONSE" | jq -r '.data.loginAsGuest.token.accessToken // empty')
  if [ -n "$TOKEN" ] && [ "$TOKEN" != "null" ]; then
    echo "Successfully obtained API token (attempt $attempt)."
    break
  fi
  echo "Token fetch attempt $attempt failed: $RESPONSE"
  sleep $((attempt * 2))
 done
 if [ -z "$TOKEN" ] || [ "$TOKEN" == "null" ]; then
-  echo "Failed to get API token from Caido."
+  echo "ERROR: Failed to get API token from Caido after 5 attempts."
-  curl -s -X POST -H "Content-Type: application/json" -d '{"query":"mutation { loginAsGuest { token { accessToken } } }"}' http://localhost:${CAIDO_PORT}/graphql
+  echo "=== Caido log ==="
  cat "$CAIDO_LOG" 2>/dev/null || echo "(no log available)"
  exit 1
 fi
@@ -40,7 +77,7 @@ export CAIDO_API_TOKEN=$TOKEN
 echo "Caido API token has been set."
 echo "Creating a new Caido project..."
-CREATE_PROJECT_RESPONSE=$(curl -s -X POST \
+CREATE_PROJECT_RESPONSE=$(curl -sL -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"query":"mutation CreateProject { createProject(input: {name: \"sandbox\", temporary: true}) { project { id } } }"}' \
@@ -57,7 +94,7 @@ fi
 echo "Caido project created with ID: $PROJECT_ID"
 echo "Selecting Caido project..."
-SELECT_RESPONSE=$(curl -s -X POST \
+SELECT_RESPONSE=$(curl -sL -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"query":"mutation SelectProject { selectProject(id: \"'$PROJECT_ID'\") { currentProject { project { id } } } }"}' \
@@ -114,9 +151,35 @@ sudo -u pentester certutil -N -d sql:/home/pentester/.pki/nssdb --empty-password
 sudo -u pentester certutil -A -n "Testing Root CA" -t "C,," -i /app/certs/ca.crt -d sql:/home/pentester/.pki/nssdb
 echo "✅ CA added to browser trust store"
-echo "Container initialization complete - agents will start their own tool servers as needed"
+echo "Starting tool server..."
-echo "✅ Shared container ready for multi-agent use"
+cd /app
 export PYTHONPATH=/app
 export STRIX_SANDBOX_MODE=true
 export TOOL_SERVER_TIMEOUT="${STRIX_SANDBOX_EXECUTION_TIMEOUT:-120}"
 TOOL_SERVER_LOG="/tmp/tool_server.log"
 sudo -E -u pentester \
  /app/.venv/bin/python -m strix.runtime.tool_server \
  --token="$TOOL_SERVER_TOKEN" \
  --host=0.0.0.0 \
  --port="$TOOL_SERVER_PORT" \
  --timeout="$TOOL_SERVER_TIMEOUT" > "$TOOL_SERVER_LOG" 2>&1 &
 for i in {1..10}; do
  if curl -s "http://127.0.0.1:$TOOL_SERVER_PORT/health" | grep -q '"status":"healthy"'; then
    echo "✅ Tool server healthy on port $TOOL_SERVER_PORT"
    break
  fi
  if [ $i -eq 10 ]; then
    echo "ERROR: Tool server failed to become healthy"
    echo "=== Tool server log ==="
    cat "$TOOL_SERVER_LOG" 2>/dev/null || echo "(no log)"
    exit 1
  fi
  sleep 1
 done
 echo "✅ Container ready"
 cd /workspace
 exec "$@"
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,10 @@
 # Strix Documentation
 Documentation source files for Strix, powered by [Mintlify](https://mintlify.com).
 ## Local Preview
 ```bash
 npm i -g mintlify
 cd docs && mintlify dev
 ```
--- a/docs/advanced/configuration.mdx
+++ b/docs/advanced/configuration.mdx
@@ -0,0 +1,138 @@
 ---
 title: "Configuration"
 description: "Environment variables for Strix"
 ---
 Configure Strix using environment variables or a config file.
 ## LLM Configuration
 <ParamField path="STRIX_LLM" type="string" required>
  Model name in LiteLLM format (e.g., `openai/gpt-5.4`, `anthropic/claude-sonnet-4-6`).
 </ParamField>
 <ParamField path="LLM_API_KEY" type="string">
  API key for your LLM provider. Not required for local models or cloud provider auth (Vertex AI, AWS Bedrock).
 </ParamField>
 <ParamField path="LLM_API_BASE" type="string">
  Custom API base URL. Also accepts `OPENAI_API_BASE`, `LITELLM_BASE_URL`, or `OLLAMA_API_BASE`.
 </ParamField>
 <ParamField path="LLM_TIMEOUT" default="300" type="integer">
  Request timeout in seconds for LLM calls.
 </ParamField>
 <ParamField path="STRIX_LLM_MAX_RETRIES" default="5" type="integer">
  Maximum number of retries for LLM API calls on transient failures.
 </ParamField>
 <ParamField path="STRIX_REASONING_EFFORT" default="high" type="string">
  Control thinking effort for reasoning models. Valid values: `none`, `minimal`, `low`, `medium`, `high`, `xhigh`. Defaults to `medium` for quick scan mode.
 </ParamField>
 <ParamField path="STRIX_MEMORY_COMPRESSOR_TIMEOUT" default="30" type="integer">
  Timeout in seconds for memory compression operations (context summarization).
 </ParamField>
 ## Optional Features
 <ParamField path="PERPLEXITY_API_KEY" type="string">
  API key for Perplexity AI. Enables real-time web search during scans for OSINT and vulnerability research.
 </ParamField>
 <ParamField path="STRIX_DISABLE_BROWSER" default="false" type="boolean">
  Disable browser automation tools.
 </ParamField>
 <ParamField path="STRIX_TELEMETRY" default="1" type="string">
  Global telemetry default toggle. Set to `0`, `false`, `no`, or `off` to disable both PostHog and OTEL unless overridden by per-channel flags below.
 </ParamField>
 <ParamField path="STRIX_OTEL_TELEMETRY" type="string">
  Enable/disable OpenTelemetry run observability independently. When unset, falls back to `STRIX_TELEMETRY`.
 </ParamField>
 <ParamField path="STRIX_POSTHOG_TELEMETRY" type="string">
  Enable/disable PostHog product telemetry independently. When unset, falls back to `STRIX_TELEMETRY`.
 </ParamField>
 <ParamField path="TRACELOOP_BASE_URL" type="string">
  OTLP/Traceloop base URL for remote OpenTelemetry export. If unset, Strix keeps traces local only.
 </ParamField>
 <ParamField path="TRACELOOP_API_KEY" type="string">
  API key used for remote trace export. Remote export is enabled only when both `TRACELOOP_BASE_URL` and `TRACELOOP_API_KEY` are set.
 </ParamField>
 <ParamField path="TRACELOOP_HEADERS" type="string">
  Optional custom OTEL headers (JSON object or `key=value,key2=value2`). Useful for Langfuse or custom/self-hosted OTLP gateways.
 </ParamField>
 When remote OTEL vars are not set, Strix still writes complete run telemetry locally to:
 ```bash
 strix_runs/<run_name>/events.jsonl
 ```
 When remote vars are set, Strix dual-writes telemetry to both local JSONL and the remote OTEL endpoint.
 ## Docker Configuration
 <ParamField path="STRIX_IMAGE" default="ghcr.io/usestrix/strix-sandbox:0.1.13" type="string">
  Docker image to use for the sandbox container.
 </ParamField>
 <ParamField path="DOCKER_HOST" type="string">
  Docker daemon socket path. Use for remote Docker hosts or custom configurations.
 </ParamField>
 <ParamField path="STRIX_RUNTIME_BACKEND" default="docker" type="string">
  Runtime backend for the sandbox environment.
 </ParamField>
 ## Sandbox Configuration
 <ParamField path="STRIX_SANDBOX_EXECUTION_TIMEOUT" default="120" type="integer">
  Maximum execution time in seconds for sandbox operations.
 </ParamField>
 <ParamField path="STRIX_SANDBOX_CONNECT_TIMEOUT" default="10" type="integer">
  Timeout in seconds for connecting to the sandbox container.
 </ParamField>
 ## Config File
 Strix stores configuration in `~/.strix/cli-config.json`. You can also specify a custom config file:
 ```bash
 strix --target ./app --config /path/to/config.json
 ```
 **Config file format:**
 ```json
 {
  "env": {
    "STRIX_LLM": "openai/gpt-5.4",
    "LLM_API_KEY": "sk-...",
    "STRIX_REASONING_EFFORT": "high"
  }
 }
 ```
 ## Example Setup
 ```bash
 # Required
 export STRIX_LLM="openai/gpt-5.4"
 export LLM_API_KEY="sk-..."
 # Optional: Enable web search
 export PERPLEXITY_API_KEY="pplx-..."
 # Optional: Custom timeouts
 export LLM_TIMEOUT="600"
 export STRIX_SANDBOX_EXECUTION_TIMEOUT="300"
 ```
--- a/docs/advanced/skills.mdx
+++ b/docs/advanced/skills.mdx
@@ -0,0 +1,136 @@
 ---
 title: "Skills"
 description: "Specialized knowledge packages that enhance agent capabilities"
 ---
 Skills are structured knowledge packages that give Strix agents deep expertise in specific vulnerability types, technologies, and testing methodologies.
 ## The Idea
 LLMs have broad but shallow security knowledge. They know _about_ SQL injection, but lack the nuanced techniques that experienced pentesters use—parser quirks, bypass methods, validation tricks, and chain attacks.
 Skills inject this deep, specialized knowledge directly into the agent's context, transforming it from a generalist into a specialist for the task at hand.
 ## How They Work
 When Strix spawns an agent for a specific task, it selects up to 5 relevant skills based on the context:
 ```python
 # Agent created for JWT testing automatically loads relevant skills
 create_agent(
    task="Test authentication mechanisms",
    skills=["authentication_jwt", "business_logic"]
 )
 ```
 The skills are injected into the agent's system prompt, giving it access to:
 - **Advanced techniques** — Non-obvious methods beyond standard testing
 - **Working payloads** — Practical examples with variations
 - **Validation methods** — How to confirm findings and avoid false positives
 ## Skill Categories
 ### Vulnerabilities
 Core vulnerability classes with deep exploitation techniques.
 | Skill                                 | Coverage                                               |
 | ------------------------------------- | ------------------------------------------------------ |
 | `authentication_jwt`                  | JWT attacks, algorithm confusion, claim tampering      |
 | `idor`                                | Object reference attacks, horizontal/vertical access   |
 | `sql_injection`                       | SQL injection variants, WAF bypasses, blind techniques |
 | `xss`                                 | XSS types, filter bypasses, DOM exploitation           |
 | `ssrf`                                | Server-side request forgery, protocol handlers         |
 | `csrf`                                | Cross-site request forgery, token bypasses             |
 | `xxe`                                 | XML external entities, OOB exfiltration                |
 | `rce`                                 | Remote code execution vectors                          |
 | `business_logic`                      | Logic flaws, state manipulation, race conditions       |
 | `race_conditions`                     | TOCTOU, parallel request attacks                       |
 | `path_traversal_lfi_rfi`              | File inclusion, path traversal                         |
 | `open_redirect`                       | Redirect bypasses, URL parsing tricks                  |
 | `mass_assignment`                     | Attribute injection, hidden parameter pollution        |
 | `insecure_file_uploads`               | Upload bypasses, extension tricks                      |
 | `information_disclosure`              | Data leakage, error-based enumeration                  |
 | `subdomain_takeover`                  | Dangling DNS, cloud resource claims                    |
 | `broken_function_level_authorization` | Privilege escalation, role bypasses                    |
 ### Frameworks
 Framework-specific testing patterns.
 | Skill     | Coverage                                     |
 | --------- | -------------------------------------------- |
 | `fastapi` | FastAPI security patterns, Pydantic bypasses |
 | `nextjs`  | Next.js SSR/SSG issues, API route security   |
 ### Technologies
 Third-party service and platform security.
 | Skill                | Coverage                           |
 | -------------------- | ---------------------------------- |
 | `supabase`           | Supabase RLS bypasses, auth issues |
 | `firebase_firestore` | Firestore rules, Firebase auth     |
 ### Protocols
 Protocol-specific testing techniques.
 | Skill     | Coverage                                         |
 | --------- | ------------------------------------------------ |
 | `graphql` | GraphQL introspection, batching, resolver issues |
 ### Tooling
 Sandbox CLI playbooks for core recon and scanning tools.
 | Skill       | Coverage                                                |
 | ----------- | ------------------------------------------------------- |
 | `nmap`      | Port/service scan syntax and high-signal scan patterns  |
 | `nuclei`    | Template selection, severity filtering, and rate tuning |
 | `httpx`     | HTTP probing and fingerprint output patterns            |
 | `ffuf`      | Wordlist fuzzing, matcher/filter strategy, recursion    |
 | `subfinder` | Passive subdomain enumeration and source control        |
 | `naabu`     | Fast port scanning with explicit rate/verify controls   |
 | `katana`    | Crawl depth/JS/known-files behavior and pitfalls        |
 | `sqlmap`    | SQLi workflow for enumeration and controlled extraction  |
 ## Skill Structure
 Each skill is a Markdown file with YAML frontmatter for metadata:
 ```markdown
 ---
 name: skill_name
 description: Brief description of the skill's coverage
 ---
 # Skill Title
 Key insight about this vulnerability or technique.
 ## Attack Surface
 What this skill covers and where to look.
 ## Methodology
 Step-by-step testing approach.
 ## Techniques
 How to discover and exploit the vulnerability.
 ## Bypass Methods
 How to bypass common protections.
 ## Validation
 How to confirm findings and avoid false positives.
 ```
 ## Contributing Skills
 Community contributions are welcome. Create a `.md` file in the appropriate category with YAML frontmatter (`name` and `description` fields). Good skills include:
 1. **Real-world techniques** — Methods that work in practice
 2. **Practical payloads** — Working examples with variations
 3. **Validation steps** — How to confirm without false positives
 4. **Context awareness** — Version/environment-specific behavior
--- a/docs/cloud/overview.mdx
+++ b/docs/cloud/overview.mdx
@@ -0,0 +1,40 @@
 ---
 title: "Introduction"
 description: "Managed security testing without local setup"
 ---
 Skip the setup. Run Strix in the cloud at [app.strix.ai](https://app.strix.ai).
 ## Features
 <CardGroup cols={2}>
  <Card title="No Setup Required" icon="cloud">
    No Docker, API keys, or local installation needed.
  </Card>
  <Card title="Full Reports" icon="file-lines">
    Detailed findings with remediation guidance.
  </Card>
  <Card title="Team Dashboards" icon="users">
    Track vulnerabilities and fixes over time.
  </Card>
  <Card title="GitHub Integration" icon="github">
    Automatic scans on pull requests.
  </Card>
 </CardGroup>
 ## What You Get
 - **Penetration test reports** — Validated findings with PoCs
 - **Shareable dashboards** — Collaborate with your team
 - **CI/CD integration** — Block risky changes automatically
 - **Continuous monitoring** — Catch new vulnerabilities quickly
 ## Getting Started
 1. Sign up at [app.strix.ai](https://app.strix.ai)
 2. Connect your repository or enter a target URL
 3. Launch your first scan
 <Card title="Try Strix Cloud" icon="rocket" href="https://app.strix.ai">
  Run your first pentest in minutes.
 </Card>
--- a/docs/contributing.mdx
+++ b/docs/contributing.mdx
@@ -0,0 +1,96 @@
 ---
 title: "Contributing"
 description: "Contribute to Strix development"
 ---
 ## Development Setup
 ### Prerequisites
 - Python 3.12+
 - Docker (running)
 - [uv](https://docs.astral.sh/uv/)
 - Git
 ### Local Development
 <Steps>
  <Step title="Clone the repository">
    ```bash
    git clone https://github.com/usestrix/strix.git
    cd strix
    ```
  </Step>
  <Step title="Install dependencies">
    ```bash
    make setup-dev
    # or manually:
    uv sync
    uv run pre-commit install
    ```
  </Step>
  <Step title="Configure LLM">
    ```bash
    export STRIX_LLM="openai/gpt-5.4"
    export LLM_API_KEY="your-api-key"
    ```
  </Step>
  <Step title="Run Strix">
    ```bash
    uv run strix --target https://example.com
    ```
  </Step>
 </Steps>
 ## Contributing Skills
 Skills are specialized knowledge packages that enhance agent capabilities. They live in `strix/skills/`
 ### Creating a Skill
 1. Choose the right category
 2. Create a `.md` file with YAML frontmatter (`name` and `description` fields)
 3. Include practical examples—working payloads, commands, test cases
 4. Provide validation methods to confirm findings
 5. Submit via PR
 ## Contributing Code
 ### Pull Request Process
 1. **Create an issue first** — Describe the problem or feature
 2. **Fork and branch** — Work from `main`
 3. **Make changes** — Follow existing code style
 4. **Write tests** — Ensure coverage for new features
 5. **Run checks** — `make check-all` should pass
 6. **Submit PR** — Link to issue and provide context
 ### Code Style
 - PEP 8 with 100-character line limit
 - Type hints for all functions
 - Docstrings for public methods
 - Small, focused functions
 - Meaningful variable names
 ## Reporting Issues
 Include:
 - Python version and OS
 - Strix version (`strix --version`)
 - LLM being used
 - Full error traceback
 - Steps to reproduce
 ## Community
 <CardGroup cols={2}>
  <Card title="Discord" icon="discord" href="https://discord.gg/strix-ai">
    Join the community for help and discussion.
  </Card>
  <Card title="GitHub Issues" icon="github" href="https://github.com/usestrix/strix/issues">
    Report bugs and request features.
  </Card>
 </CardGroup>
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -0,0 +1,129 @@
 {
  "$schema": "https://mintlify.com/docs.json",
  "theme": "maple",
  "name": "Strix",
  "colors": {
    "primary": "#000000",
    "light": "#ffffff",
    "dark": "#000000"
  },
  "favicon": "/images/favicon-48.ico",
  "navigation": {
    "tabs": [
      {
        "tab": "Documentation",
        "groups": [
          {
            "group": "Getting Started",
            "pages": [
              "index",
              "quickstart"
            ]
          },
          {
            "group": "Usage",
            "pages": [
              "usage/cli",
              "usage/scan-modes",
              "usage/instructions"
            ]
          },
          {
            "group": "LLM Providers",
            "pages": [
              "llm-providers/overview",
              "llm-providers/openai",
              "llm-providers/anthropic",
              "llm-providers/openrouter",
              "llm-providers/vertex",
              "llm-providers/bedrock",
              "llm-providers/azure",
              "llm-providers/local"
            ]
          },
          {
            "group": "Integrations",
            "pages": [
              "integrations/github-actions",
              "integrations/ci-cd"
            ]
          },
          {
            "group": "Tools",
            "pages": [
              "tools/overview",
              "tools/browser",
              "tools/proxy",
              "tools/terminal",
              "tools/sandbox"
            ]
          },
          {
            "group": "Advanced",
            "pages": [
              "advanced/configuration",
              "advanced/skills",
              "contributing"
            ]
          }
        ]
      },
      {
        "tab": "Cloud",
        "groups": [
          {
            "group": "Strix Cloud",
            "pages": [
              "cloud/overview"
            ]
          }
        ]
      }
    ],
    "global": {
      "anchors": [
        {
          "anchor": "GitHub",
          "href": "https://github.com/usestrix/strix",
          "icon": "github"
        },
        {
          "anchor": "Discord",
          "href": "https://discord.gg/strix-ai",
          "icon": "discord"
        }
      ]
    }
  },
  "navbar": {
    "links": [],
    "primary": {
      "type": "button",
      "label": "Try Strix Cloud",
      "href": "https://app.strix.ai"
    }
  },
  "footer": {
    "socials": {
      "x": "https://x.com/strix_ai",
      "github": "https://github.com/usestrix",
      "discord": "https://discord.gg/strix-ai"
    }
  },
  "fonts": {
    "family": "Geist",
    "heading": {
      "family": "Geist"
    },
    "body": {
      "family": "Geist"
    }
  },
  "appearance": {
    "default": "dark"
  },
  "description": "Open-source AI Hackers to secure your Apps",
  "background": {
    "decoration": "grid"
  }
 }
--- a/docs/images/favicon-48.ico
+++ b/docs/images/favicon-48.ico
--- a/docs/images/logo.png
+++ b/docs/images/logo.png
--- a/docs/images/screenshot.png
+++ b/docs/images/screenshot.png
--- a/docs/index.mdx
+++ b/docs/index.mdx
@@ -0,0 +1,101 @@
 ---
 title: "Introduction"
 description: "Open-source AI hackers to secure your apps"
 ---
 Strix are autonomous AI agents that act like real hackers—they run your code dynamically, find vulnerabilities, and validate them with proof-of-concepts. Built for developers and security teams who need fast, accurate security testing without the overhead of manual pentesting or the false positives of static analysis tools.
 <Frame>
  <img src="/images/screenshot.png" alt="Strix Demo" />
 </Frame>
 <CardGroup cols={2}>
  <Card title="Quick Start" icon="rocket" href="/quickstart">
    Install and run your first scan in minutes.
  </Card>
  <Card title="CLI Reference" icon="terminal" href="/usage/cli">
    Learn all command-line options.
  </Card>
  <Card title="Tools" icon="wrench" href="/tools/overview">
    Explore the security testing toolkit.
  </Card>
  <Card title="GitHub Actions" icon="github" href="/integrations/github-actions">
    Integrate into your CI/CD pipeline.
  </Card>
 </CardGroup>
 ## Use Cases
 - **Application Security Testing** — Detect and validate critical vulnerabilities in your applications
 - **Rapid Penetration Testing** — Get penetration tests done in hours, not weeks
 - **Bug Bounty Automation** — Automate research and generate PoCs for faster reporting
 - **CI/CD Integration** — Block vulnerabilities before they reach production
 ## Key Capabilities
 - **Full hacker toolkit** — Browser automation, HTTP proxy, terminal, Python runtime
 - **Real validation** — PoCs, not false positives
 - **Multi-agent orchestration** — Specialized agents collaborate on complex targets
 - **Developer-first CLI** — Interactive TUI or headless mode for automation
 ## Security Tools
 Strix agents come equipped with a comprehensive toolkit:
 | Tool | Purpose |
 |------|---------|
 | HTTP Proxy | Full request/response manipulation and analysis |
 | Browser Automation | Multi-tab browser for XSS, CSRF, auth flow testing |
 | Terminal | Interactive shells for command execution |
 | Python Runtime | Custom exploit development and validation |
 | Reconnaissance | Automated OSINT and attack surface mapping |
 | Code Analysis | Static and dynamic analysis capabilities |
 ## Vulnerability Coverage
 | Category | Examples |
 |----------|----------|
 | Access Control | IDOR, privilege escalation, auth bypass |
 | Injection | SQL, NoSQL, command injection |
 | Server-Side | SSRF, XXE, deserialization |
 | Client-Side | XSS, prototype pollution, DOM vulnerabilities |
 | Business Logic | Race conditions, workflow manipulation |
 | Authentication | JWT vulnerabilities, session management |
 | Infrastructure | Misconfigurations, exposed services |
 ## Multi-Agent Architecture
 Strix uses a graph of specialized agents for comprehensive security testing:
 - **Distributed Workflows** — Specialized agents for different attacks and assets
 - **Scalable Testing** — Parallel execution for fast comprehensive coverage
 - **Dynamic Coordination** — Agents collaborate and share discoveries
 ## Quick Example
 ```bash
 # Install
 curl -sSL https://strix.ai/install | bash
 # Configure
 export STRIX_LLM="openai/gpt-5.4"
 export LLM_API_KEY="your-api-key"
 # Scan
 strix --target ./your-app
 ```
 ## Community
 <CardGroup cols={2}>
  <Card title="Discord" icon="discord" href="https://discord.gg/strix-ai">
    Join the community for help and discussion.
  </Card>
  <Card title="GitHub" icon="github" href="https://github.com/usestrix/strix">
    Star the repo and contribute.
  </Card>
 </CardGroup>
 <Warning>
 Only test applications you own or have explicit permission to test.
 </Warning>
--- a/docs/integrations/ci-cd.mdx
+++ b/docs/integrations/ci-cd.mdx
@@ -0,0 +1,90 @@
 ---
 title: "CI/CD Integration"
 description: "Run Strix in any CI/CD pipeline"
 ---
 Strix runs in headless mode for automated pipelines.
 ## Headless Mode
 Use the `-n` or `--non-interactive` flag:
 ```bash
 strix -n --target ./app --scan-mode quick
 ```
 For pull-request style CI runs, Strix automatically scopes quick scans to changed files. You can force this behavior and set a base ref explicitly:
 ```bash
 strix -n --target ./app --scan-mode quick --scope-mode diff --diff-base origin/main
 ```
 ## Exit Codes
 | Code | Meaning |
 |------|---------|
 | 0 | No vulnerabilities found |
 | 1 | Execution error |
 | 2 | Vulnerabilities found |
 ## GitLab CI
 ```yaml .gitlab-ci.yml
 security-scan:
  image: docker:latest
  services:
    - docker:dind
  variables:
    STRIX_LLM: $STRIX_LLM
    LLM_API_KEY: $LLM_API_KEY
  script:
    - curl -sSL https://strix.ai/install | bash
    - strix -n -t ./ --scan-mode quick
 ```
 ## Jenkins
 ```groovy Jenkinsfile
 pipeline {
    agent any
    environment {
        STRIX_LLM = credentials('strix-llm')
        LLM_API_KEY = credentials('llm-api-key')
    }
    stages {
        stage('Security Scan') {
            steps {
                sh 'curl -sSL https://strix.ai/install | bash'
                sh 'strix -n -t ./ --scan-mode quick'
            }
        }
    }
 }
 ```
 ## CircleCI
 ```yaml .circleci/config.yml
 version: 2.1
 jobs:
  security-scan:
    docker:
      - image: cimg/base:current
    steps:
      - checkout
      - setup_remote_docker
      - run:
          name: Install Strix
          command: curl -sSL https://strix.ai/install | bash
      - run:
          name: Run Scan
          command: strix -n -t ./ --scan-mode quick
 ```
 <Note>
 All CI platforms require Docker access. Ensure your runner has Docker available.
 </Note>
 <Tip>
 If diff-scope fails in CI, fetch full git history (for example, `fetch-depth: 0` in GitHub Actions) so merge-base and branch comparison can be resolved.
 </Tip>
--- a/docs/integrations/github-actions.mdx
+++ b/docs/integrations/github-actions.mdx
@@ -0,0 +1,66 @@
 ---
 title: "GitHub Actions"
 description: "Run Strix security scans on every pull request"
 ---
 Integrate Strix into your GitHub workflow to catch vulnerabilities before they reach production.
 ## Basic Workflow
 ```yaml .github/workflows/security.yml
 name: Security Scan
 on:
  pull_request:
 jobs:
  strix-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install Strix
        run: curl -sSL https://strix.ai/install | bash
      - name: Run Security Scan
        env:
          STRIX_LLM: ${{ secrets.STRIX_LLM }}
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
        run: strix -n -t ./ --scan-mode quick
 ```
 ## Required Secrets
 Add these secrets to your repository:
 | Secret | Description |
 |--------|-------------|
 | `STRIX_LLM` | Model name (e.g., `openai/gpt-5.4`) |
 | `LLM_API_KEY` | API key for your LLM provider |
 ## Exit Codes
 The workflow fails when vulnerabilities are found:
 | Code | Result |
 |------|--------|
 | 0 | Pass — No vulnerabilities |
 | 2 | Fail — Vulnerabilities found |
 ## Scan Modes for CI
 | Mode | Duration | Use Case |
 |------|----------|----------|
 | `quick` | Minutes | Every PR |
 | `standard` | ~30 min | Nightly builds |
 | `deep` | 1-4 hours | Release candidates |
 <Tip>
 Use `quick` mode for PRs to keep feedback fast. Schedule `deep` scans nightly.
 </Tip>
 <Note>
 For pull_request workflows, Strix automatically uses changed-files diff-scope in CI/headless runs. If diff resolution fails, ensure full history is fetched (`fetch-depth: 0`) or set `--diff-base`.
 </Note>
--- a/docs/llm-providers/anthropic.mdx
+++ b/docs/llm-providers/anthropic.mdx
@@ -0,0 +1,24 @@
 ---
 title: "Anthropic"
 description: "Configure Strix with Claude models"
 ---
 ## Setup
 ```bash
 export STRIX_LLM="anthropic/claude-sonnet-4-6"
 export LLM_API_KEY="sk-ant-..."
 ```
 ## Available Models
 | Model | Description |
 |-------|-------------|
 | `anthropic/claude-sonnet-4-6` | Best balance of intelligence and speed |
 | `anthropic/claude-opus-4-6` | Maximum capability for deep analysis |
 ## Get API Key
 1. Go to [console.anthropic.com](https://console.anthropic.com)
 2. Navigate to API Keys
 3. Create a new key
--- a/docs/llm-providers/azure.mdx
+++ b/docs/llm-providers/azure.mdx
@@ -0,0 +1,37 @@
 ---
 title: "Azure OpenAI"
 description: "Configure Strix with OpenAI models via Azure"
 ---
 ## Setup
 ```bash
 export STRIX_LLM="azure/your-gpt5-deployment"
 export AZURE_API_KEY="your-azure-api-key"
 export AZURE_API_BASE="https://your-resource.openai.azure.com"
 export AZURE_API_VERSION="2025-11-01-preview"
 ```
 ## Configuration
 | Variable | Description |
 |----------|-------------|
 | `STRIX_LLM` | `azure/<your-deployment-name>` |
 | `AZURE_API_KEY` | Your Azure OpenAI API key |
 | `AZURE_API_BASE` | Your Azure OpenAI endpoint URL |
 | `AZURE_API_VERSION` | API version (e.g., `2025-11-01-preview`) |
 ## Example
 ```bash
 export STRIX_LLM="azure/gpt-5.4-deployment"
 export AZURE_API_KEY="abc123..."
 export AZURE_API_BASE="https://mycompany.openai.azure.com"
 export AZURE_API_VERSION="2025-11-01-preview"
 ```
 ## Prerequisites
 1. Create an Azure OpenAI resource
 2. Deploy a model (e.g., GPT-5.4)
 3. Get the endpoint URL and API key from the Azure portal
--- a/docs/llm-providers/bedrock.mdx
+++ b/docs/llm-providers/bedrock.mdx
@@ -0,0 +1,47 @@
 ---
 title: "AWS Bedrock"
 description: "Configure Strix with models via AWS Bedrock"
 ---
 ## Setup
 ```bash
 export STRIX_LLM="bedrock/anthropic.claude-4-5-sonnet-20251022-v1:0"
 ```
 No API key required—uses AWS credentials from environment.
 ## Authentication
 ### Option 1: AWS CLI Profile
 ```bash
 export AWS_PROFILE="your-profile"
 export AWS_REGION="us-east-1"
 ```
 ### Option 2: Access Keys
 ```bash
 export AWS_ACCESS_KEY_ID="AKIA..."
 export AWS_SECRET_ACCESS_KEY="..."
 export AWS_REGION="us-east-1"
 ```
 ### Option 3: IAM Role (EC2/ECS)
 Automatically uses instance role credentials.
 ## Available Models
 | Model | Description |
 |-------|-------------|
 | `bedrock/anthropic.claude-4-5-sonnet-20251022-v1:0` | Claude 4.5 Sonnet |
 | `bedrock/anthropic.claude-4-5-opus-20251022-v1:0` | Claude 4.5 Opus |
 | `bedrock/anthropic.claude-4-5-haiku-20251022-v1:0` | Claude 4.5 Haiku |
 | `bedrock/amazon.titan-text-premier-v2:0` | Amazon Titan Premier v2 |
 ## Prerequisites
 1. Enable model access in the AWS Bedrock console
 2. Ensure your IAM role/user has `bedrock:InvokeModel` permission
--- a/docs/llm-providers/local.mdx
+++ b/docs/llm-providers/local.mdx
@@ -0,0 +1,56 @@
 ---
 title: "Local Models"
 description: "Run Strix with self-hosted LLMs for privacy and air-gapped testing"
 ---
 Running Strix with local models allows for completely offline, privacy-first security assessments. Data never leaves your machine, making this ideal for sensitive internal networks or air-gapped environments.
 ## Privacy vs Performance
 | Feature | Local Models | Cloud Models (GPT-5/Claude 4.5) |
 |---------|--------------|--------------------------------|
 | **Privacy** | 🔒 Data stays local | Data sent to provider |
 | **Cost** | Free (hardware only) | Pay-per-token |
 | **Reasoning** | Lower (struggles with agents) | State-of-the-art |
 | **Setup** | Complex (GPU required) | Instant |
 <Warning>
 **Compatibility Note**: Strix relies on advanced agentic capabilities (tool use, multi-step planning, self-correction). Most local models, especially those under 70B parameters, struggle with these complex tasks.
 For critical assessments, we strongly recommend using state-of-the-art cloud models like **Claude 4.5 Sonnet** or **GPT-5**. Use local models only when privacy is the absolute priority.
 </Warning>
 ## Ollama
 [Ollama](https://ollama.ai) is the easiest way to run local models on macOS, Linux, and Windows.
 ### Setup
 1. Install Ollama from [ollama.ai](https://ollama.ai)
 2. Pull a high-performance model:
   ```bash
   ollama pull qwen3-vl
   ```
 3. Configure Strix:
   ```bash
   export STRIX_LLM="ollama/qwen3-vl"
   export LLM_API_BASE="http://localhost:11434"
   ```
 ### Recommended Models
 We recommend these models for the best balance of reasoning and tool use:
 **Recommended models:**
 - **Qwen3 VL** (`ollama pull qwen3-vl`)
 - **DeepSeek V3.1** (`ollama pull deepseek-v3.1`)
 - **Devstral 2** (`ollama pull devstral-2`)
 ## LM Studio / OpenAI Compatible
 If you use LM Studio, vLLM, or other runners:
 ```bash
 export STRIX_LLM="openai/local-model"
 export LLM_API_BASE="http://localhost:1234/v1"  # Adjust port as needed
 ```
--- a/docs/llm-providers/openai.mdx
+++ b/docs/llm-providers/openai.mdx
@@ -0,0 +1,31 @@
 ---
 title: "OpenAI"
 description: "Configure Strix with OpenAI models"
 ---
 ## Setup
 ```bash
 export STRIX_LLM="openai/gpt-5.4"
 export LLM_API_KEY="sk-..."
 ```
 ## Available Models
 See [OpenAI Models Documentation](https://platform.openai.com/docs/models) for the full list of available models.
 ## Get API Key
 1. Go to [platform.openai.com](https://platform.openai.com)
 2. Navigate to API Keys
 3. Create a new secret key
 ## Custom Base URL
 For OpenAI-compatible APIs:
 ```bash
 export STRIX_LLM="openai/gpt-5.4"
 export LLM_API_KEY="your-key"
 export LLM_API_BASE="https://your-proxy.com/v1"
 ```
--- a/docs/llm-providers/openrouter.mdx
+++ b/docs/llm-providers/openrouter.mdx
@@ -0,0 +1,37 @@
 ---
 title: "OpenRouter"
 description: "Configure Strix with models via OpenRouter"
 ---
 [OpenRouter](https://openrouter.ai) provides access to 100+ models from multiple providers through a single API.
 ## Setup
 ```bash
 export STRIX_LLM="openrouter/openai/gpt-5.4"
 export LLM_API_KEY="sk-or-..."
 ```
 ## Available Models
 Access any model on OpenRouter using the format `openrouter/<provider>/<model>`:
 | Model | Configuration |
 |-------|---------------|
 | GPT-5.4 | `openrouter/openai/gpt-5.4` |
 | Claude Sonnet 4.6 | `openrouter/anthropic/claude-sonnet-4.6` |
 | Gemini 3 Pro | `openrouter/google/gemini-3-pro-preview` |
 | GLM-4.7 | `openrouter/z-ai/glm-4.7` |
 ## Get API Key
 1. Go to [openrouter.ai](https://openrouter.ai)
 2. Sign in and navigate to Keys
 3. Create a new API key
 ## Benefits
 - **Single API** — Access models from OpenAI, Anthropic, Google, Meta, and more
 - **Fallback routing** — Automatic failover between providers
 - **Cost tracking** — Monitor usage across all models
 - **Higher rate limits** — OpenRouter handles provider limits for you
--- a/docs/llm-providers/overview.mdx
+++ b/docs/llm-providers/overview.mdx
@@ -0,0 +1,70 @@
 ---
 title: "Overview"
 description: "Configure your AI model for Strix"
 ---
 Strix uses [LiteLLM](https://docs.litellm.ai/docs/providers) for model compatibility, supporting 100+ LLM providers.
 ## Configuration
 Set your model and API key:
 | Model             | Provider      | Configuration                    |
 | ----------------- | ------------- | -------------------------------- |
 | GPT-5.4           | OpenAI        | `openai/gpt-5.4`                 |
 | Claude Sonnet 4.6 | Anthropic     | `anthropic/claude-sonnet-4-6`    |
 | Gemini 3 Pro      | Google Vertex | `vertex_ai/gemini-3-pro-preview` |
 ```bash
 export STRIX_LLM="openai/gpt-5.4"
 export LLM_API_KEY="your-api-key"
 ```
 ## Local Models
 Run models locally with [Ollama](https://ollama.com), [LM Studio](https://lmstudio.ai), or any OpenAI-compatible server:
 ```bash
 export STRIX_LLM="ollama/llama4"
 export LLM_API_BASE="http://localhost:11434"
 ```
 See the [Local Models guide](/llm-providers/local) for setup instructions and recommended models.
 ## Provider Guides
 <CardGroup cols={2}>
  <Card title="OpenAI" href="/llm-providers/openai">
    GPT-5.4 models.
  </Card>
  <Card title="Anthropic" href="/llm-providers/anthropic">
    Claude Opus, Sonnet, and Haiku.
  </Card>
  <Card title="OpenRouter" href="/llm-providers/openrouter">
    Access 100+ models through a single API.
  </Card>
  <Card title="Google Vertex AI" href="/llm-providers/vertex">
    Gemini 3 models via Google Cloud.
  </Card>
  <Card title="AWS Bedrock" href="/llm-providers/bedrock">
    Claude and Titan models via AWS.
  </Card>
  <Card title="Azure OpenAI" href="/llm-providers/azure">
    GPT-5.4 via Azure.
  </Card>
  <Card title="Local Models" href="/llm-providers/local">
    Llama 4, Mistral, and self-hosted models.
  </Card>
 </CardGroup>
 ## Model Format
 Use LiteLLM's `provider/model-name` format:
 ```
 openai/gpt-5.4
 anthropic/claude-sonnet-4-6
 vertex_ai/gemini-3-pro-preview
 bedrock/anthropic.claude-4-5-sonnet-20251022-v1:0
 ollama/llama4
 ```
--- a/docs/llm-providers/vertex.mdx
+++ b/docs/llm-providers/vertex.mdx
@@ -0,0 +1,53 @@
 ---
 title: "Google Vertex AI"
 description: "Configure Strix with Gemini models via Google Cloud"
 ---
 ## Installation
 Vertex AI requires the Google Cloud dependency. Install Strix with the vertex extra:
 ```bash
 pipx install "strix-agent[vertex]"
 ```
 ## Setup
 ```bash
 export STRIX_LLM="vertex_ai/gemini-3-pro-preview"
 ```
 No API key required—uses Google Cloud Application Default Credentials.
 ## Authentication
 ### Option 1: gcloud CLI
 ```bash
 gcloud auth application-default login
 ```
 ### Option 2: Service Account
 ```bash
 export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
 ```
 ## Available Models
 | Model | Description |
 |-------|-------------|
 | `vertex_ai/gemini-3-pro-preview` | Best overall performance for security testing |
 | `vertex_ai/gemini-3-flash-preview` | Faster and cheaper |
 ## Project Configuration
 ```bash
 export VERTEXAI_PROJECT="your-project-id"
 export VERTEXAI_LOCATION="global"
 ```
 ## Prerequisites
 1. Enable the Vertex AI API in your Google Cloud project
 2. Ensure your account has the `Vertex AI User` role
--- a/docs/logo/strix.png
+++ b/docs/logo/strix.png
--- a/docs/quickstart.mdx
+++ b/docs/quickstart.mdx
@@ -0,0 +1,76 @@
 ---
 title: "Quick Start"
 description: "Install Strix and run your first security scan"
 ---
 ## Prerequisites
 - Docker (running)
 - An LLM API key from any [supported provider](/llm-providers/overview) (OpenAI, Anthropic, Google, etc.)
 ## Installation
 <Tabs>
  <Tab title="curl">
    ```bash
    curl -sSL https://strix.ai/install | bash
    ```
  </Tab>
  <Tab title="pipx">
    ```bash
    pipx install strix-agent
    ```
  </Tab>
 </Tabs>
 ## Configuration
 Set your LLM provider:
 ```bash
 export STRIX_LLM="openai/gpt-5.4"
 export LLM_API_KEY="your-api-key"
 ```
 <Tip>
 For best results, use `openai/gpt-5.4`, `anthropic/claude-opus-4-6`, or `openai/gpt-5.2`.
 </Tip>
 ## Run Your First Scan
 ```bash
 strix --target ./your-app
 ```
 <Note>
 First run pulls the Docker sandbox image automatically. Results are saved to `strix_runs/<run-name>`.
 </Note>
 ## Target Types
 Strix accepts multiple target types:
 ```bash
 # Local codebase
 strix --target ./app-directory
 # GitHub repository
 strix --target https://github.com/org/repo
 # Live web application
 strix --target https://your-app.com
 # Multiple targets (white-box testing)
 strix -t https://github.com/org/repo -t https://your-app.com
 ```
 ## Next Steps
 <CardGroup cols={2}>
  <Card title="CLI Options" icon="terminal" href="/usage/cli">
    Explore all command-line options.
  </Card>
  <Card title="Scan Modes" icon="gauge" href="/usage/scan-modes">
    Choose the right scan depth.
  </Card>
 </CardGroup>
--- a/docs/tools/browser.mdx
+++ b/docs/tools/browser.mdx
@@ -0,0 +1,34 @@
 ---
 title: "Browser"
 description: "Playwright-powered Chrome for web application testing"
 ---
 Strix uses a headless Chrome browser via Playwright to interact with web applications exactly like a real user would.
 ## How It Works
 All browser traffic is automatically routed through the Caido proxy, giving Strix full visibility into every request and response. This enables:
 - Testing client-side vulnerabilities (XSS, DOM manipulation)
 - Navigating authenticated flows (login, OAuth, MFA)
 - Triggering JavaScript-heavy functionality
 - Capturing dynamically generated requests
 ## Capabilities
 | Action     | Description                                 |
 | ---------- | ------------------------------------------- |
 | Navigate   | Go to URLs, follow links, handle redirects  |
 | Click      | Interact with buttons, links, form elements |
 | Type       | Fill in forms, search boxes, input fields   |
 | Execute JS | Run custom JavaScript in the page context   |
 | Screenshot | Capture visual state for reports            |
 | Multi-tab  | Test across multiple browser tabs           |
 ## Example Flow
 1. Agent launches browser and navigates to login page
 2. Fills in credentials and submits form
 3. Proxy captures the authentication request
 4. Agent navigates to protected areas
 5. Tests for IDOR by replaying requests with modified IDs
--- a/docs/tools/overview.mdx
+++ b/docs/tools/overview.mdx
@@ -0,0 +1,33 @@
 ---
 title: "Agent Tools"
 description: "How Strix agents interact with targets"
 ---
 Strix agents use specialized tools to test your applications like a real penetration tester would.
 ## Core Tools
 <CardGroup cols={2}>
  <Card title="Browser" icon="globe" href="/tools/browser">
    Playwright-powered Chrome for interacting with web UIs.
  </Card>
  <Card title="HTTP Proxy" icon="network-wired" href="/tools/proxy">
    Caido-powered proxy for intercepting and replaying requests.
  </Card>
  <Card title="Terminal" icon="terminal" href="/tools/terminal">
    Bash shell for running commands and security tools.
  </Card>
  <Card title="Sandbox Tools" icon="toolbox" href="/tools/sandbox">
    Pre-installed security tools: Nuclei, ffuf, and more.
  </Card>
 </CardGroup>
 ## Additional Tools
 | Tool           | Purpose                                  |
 | -------------- | ---------------------------------------- |
 | Python Runtime | Write and execute custom exploit scripts |
 | File Editor    | Read and modify source code              |
 | Web Search     | Real-time OSINT via Perplexity           |
 | Notes          | Document findings during the scan        |
 | Reporting      | Generate vulnerability reports with PoCs |
--- a/docs/tools/proxy.mdx
+++ b/docs/tools/proxy.mdx
@@ -0,0 +1,111 @@
 ---
 title: "HTTP Proxy"
 description: "Caido-powered proxy for request interception and replay"
 ---
 Strix includes [Caido](https://caido.io), a modern HTTP proxy built for security testing. All browser traffic flows through Caido, giving the agent full control over requests and responses.
 ## Capabilities
 | Feature          | Description                                  |
 | ---------------- | -------------------------------------------- |
 | Request Capture  | Log all HTTP/HTTPS traffic automatically     |
 | Request Replay   | Repeat any request with modifications        |
 | HTTPQL           | Query captured traffic with powerful filters |
 | Scope Management | Focus on specific domains or paths           |
 | Sitemap          | Visualize the discovered attack surface      |
 ## HTTPQL Filtering
 Query captured requests using Caido's HTTPQL syntax
 ## Request Replay
 The agent can take any captured request and replay it with modifications:
 - Change path parameters (test for IDOR)
 - Modify request body (test for injection)
 - Add/remove headers (test for auth bypass)
 - Alter cookies (test for session issues)
 ## Python Integration
 All proxy functions are automatically available in Python sessions. This enables powerful scripted security testing:
 ```python
 # List recent POST requests
 post_requests = list_requests(
    httpql_filter='req.method.eq:"POST"',
    page_size=20
 )
 # View a specific request
 request_details = view_request("req_123", part="request")
 # Replay with modified payload
 response = repeat_request("req_123", {
    "body": '{"user_id": "admin"}'
 })
 print(f"Status: {response['status_code']}")
 ```
 ### Available Functions
 | Function               | Description                                |
 | ---------------------- | ------------------------------------------ |
 | `list_requests()`      | Query captured traffic with HTTPQL filters |
 | `view_request()`       | Get full request/response details          |
 | `repeat_request()`     | Replay a request with modifications        |
 | `send_request()`       | Send a new HTTP request                    |
 | `scope_rules()`        | Manage proxy scope (allowlist/denylist)    |
 | `list_sitemap()`       | View discovered endpoints                  |
 | `view_sitemap_entry()` | Get details for a sitemap entry            |
 ### Example: Automated IDOR Testing
 ```python
 # Get all requests to user endpoints
 user_requests = list_requests(
    httpql_filter='req.path.cont:"/users/"'
 )
 for req in user_requests.get('requests', []):
    # Try accessing with different user IDs
    for test_id in ['1', '2', 'admin', '../admin']:
        response = repeat_request(req['id'], {
            'url': req['path'].replace('/users/1', f'/users/{test_id}')
        })
        if response['status_code'] == 200:
            print(f"Potential IDOR: {test_id} returned 200")
 ```
 ## Human-in-the-Loop
 Strix exposes the Caido proxy to your host machine, so you can interact with it alongside the automated scan. When the sandbox starts, the Caido URL is displayed in the TUI sidebar — click it to copy, then open it in Caido Desktop.
 ### Accessing Caido
 1. Start a scan as usual
 2. Look for the **Caido** URL in the sidebar stats panel (e.g. `localhost:52341`)
 3. Open the URL in Caido Desktop
 4. Click **Continue as guest** to access the instance
 ### What You Can Do
 - **Inspect traffic** — Browse all HTTP/HTTPS requests the agent is making in real time
 - **Replay requests** — Take any captured request and resend it with your own modifications
 - **Intercept and modify** — Pause requests mid-flight, edit them, then forward
 - **Explore the sitemap** — See the full attack surface the agent has discovered
 - **Manual testing** — Use Caido's tools to test findings the agent reports, or explore areas it hasn't reached
 This turns Strix from a fully automated scanner into a collaborative tool — the agent handles the heavy lifting while you focus on the interesting parts.
 ## Scope
 Create scopes to filter traffic to relevant domains:
 ```
 Allowlist: ["api.example.com", "*.example.com"]
 Denylist: ["*.gif", "*.jpg", "*.png", "*.css", "*.js"]
 ```
--- a/docs/tools/sandbox.mdx
+++ b/docs/tools/sandbox.mdx
@@ -0,0 +1,91 @@
 ---
 title: "Sandbox Tools"
 description: "Pre-installed security tools in the Strix container"
 ---
 Strix runs inside a Kali Linux-based Docker container with a comprehensive set of security tools pre-installed. The agent can use any of these tools through the [terminal](/tools/terminal).
 ## Reconnaissance
 | Tool                                                       | Description                            |
 | ---------------------------------------------------------- | -------------------------------------- |
 | [Subfinder](https://github.com/projectdiscovery/subfinder) | Subdomain discovery                    |
 | [Naabu](https://github.com/projectdiscovery/naabu)         | Fast port scanner                      |
 | [httpx](https://github.com/projectdiscovery/httpx)         | HTTP probing and analysis              |
 | [Katana](https://github.com/projectdiscovery/katana)       | Web crawling and spidering             |
 | [ffuf](https://github.com/ffuf/ffuf)                       | Fast web fuzzer                        |
 | [Nmap](https://nmap.org)                                   | Network scanning and service detection |
 ## Web Testing
 | Tool                                                   | Description                      |
 | ------------------------------------------------------ | -------------------------------- |
 | [Arjun](https://github.com/s0md3v/Arjun)               | HTTP parameter discovery         |
 | [Dirsearch](https://github.com/maurosoria/dirsearch)   | Directory and file brute-forcing |
 | [wafw00f](https://github.com/EnableSecurity/wafw00f)   | WAF fingerprinting               |
 | [GoSpider](https://github.com/jaeles-project/gospider) | Web spider for link extraction   |
 ## Automated Scanners
 | Tool                                                 | Description                                        |
 | ---------------------------------------------------- | -------------------------------------------------- |
 | [Nuclei](https://github.com/projectdiscovery/nuclei) | Template-based vulnerability scanner               |
 | [SQLMap](https://sqlmap.org)                         | Automatic SQL injection detection and exploitation |
 | [Wapiti](https://wapiti-scanner.github.io)           | Web application vulnerability scanner              |
 | [ZAP](https://zaproxy.org)                           | OWASP Zed Attack Proxy                             |
 ## JavaScript Analysis
 | Tool                                                     | Description                    |
 | -------------------------------------------------------- | ------------------------------ |
 | [JS-Snooper](https://github.com/aravind0x7/JS-Snooper)   | JavaScript reconnaissance      |
 | [jsniper](https://github.com/xchopath/jsniper.sh)        | JavaScript file analysis       |
 | [Retire.js](https://retirejs.github.io/retire.js)        | Detect vulnerable JS libraries |
 | [ESLint](https://eslint.org)                             | JavaScript static analysis     |
 | [js-beautify](https://github.com/beautifier/js-beautify) | JavaScript deobfuscation       |
 | [JSHint](https://jshint.com)                             | JavaScript code quality tool   |
 ## Source-Aware Analysis
 | Tool                                                    | Description                                   |
 | ------------------------------------------------------- | --------------------------------------------- |
 | [Semgrep](https://github.com/semgrep/semgrep)          | Fast SAST and custom rule matching            |
 | [ast-grep](https://ast-grep.github.io)                 | Structural AST/CST-aware code search (`sg`)   |
 | [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) | Syntax tree parsing and symbol extraction (Java/JS/TS/Python/Go/Bash/JSON/YAML grammars pre-configured) |
 | [Bandit](https://bandit.readthedocs.io)                | Python security linter                        |
 ## Secret Detection
 | Tool                                                        | Description                           |
 | ----------------------------------------------------------- | ------------------------------------- |
 | [TruffleHog](https://github.com/trufflesecurity/trufflehog) | Find secrets in code and history      |
 | [Gitleaks](https://github.com/gitleaks/gitleaks)            | Detect hardcoded secrets in repositories |
 ## Authentication Testing
 | Tool                                                         | Description                        |
 | ------------------------------------------------------------ | ---------------------------------- |
 | [jwt_tool](https://github.com/ticarpi/jwt_tool)              | JWT token testing and exploitation |
 | [Interactsh](https://github.com/projectdiscovery/interactsh) | Out-of-band interaction detection  |
 ## Container & Supply Chain
 | Tool                       | Description                                    |
 | -------------------------- | ---------------------------------------------- |
 | [Trivy](https://trivy.dev) | Filesystem/container scanning for vulns, misconfigurations, secrets, and licenses |
 ## HTTP Proxy
 | Tool                      | Description                                   |
 | ------------------------- | --------------------------------------------- |
 | [Caido](https://caido.io) | Modern HTTP proxy for interception and replay |
 ## Browser
 | Tool                                 | Description                 |
 | ------------------------------------ | --------------------------- |
 | [Playwright](https://playwright.dev) | Headless browser automation |
 <Note>
  All tools are pre-configured and ready to use. The agent selects the appropriate tool based on the vulnerability being tested.
 </Note>
--- a/docs/tools/terminal.mdx
+++ b/docs/tools/terminal.mdx
@@ -0,0 +1,65 @@
 ---
 title: "Terminal"
 description: "Bash shell for running commands and security tools"
 ---
 Strix has access to a persistent bash terminal running inside the Docker sandbox. This gives the agent access to all [pre-installed security tools](/tools/sandbox).
 ## Capabilities
 | Feature           | Description                                                |
 | ----------------- | ---------------------------------------------------------- |
 | Persistent state  | Working directory and environment persist between commands |
 | Multiple sessions | Run parallel terminals for concurrent operations           |
 | Background jobs   | Start long-running processes without blocking              |
 | Interactive       | Respond to prompts and control running processes           |
 ## Common Uses
 ### Running Security Tools
 ```bash
 # Subdomain enumeration
 subfinder -d example.com
 # Vulnerability scanning
 nuclei -u https://example.com
 # SQL injection testing
 sqlmap -u "https://example.com/page?id=1"
 ```
 ### Code Analysis
 ```bash
 # Fast SAST triage
 semgrep --config auto ./src
 # Structural AST search
 sg scan ./src
 # Secret detection
 gitleaks detect --source ./
 trufflehog filesystem ./
 # Supply-chain and misconfiguration checks
 trivy fs ./
 ```
 ### Custom Scripts
 ```bash
 # Run Python exploits
 python3 exploit.py
 # Execute shell scripts
 ./test_auth_bypass.sh
 ```
 ## Session Management
 The agent can run multiple terminal sessions concurrently, for example:
 - Main session for primary testing
 - Secondary session for monitoring
 - Background processes for servers or watchers
--- a/docs/usage/cli.mdx
+++ b/docs/usage/cli.mdx
@@ -0,0 +1,73 @@
 ---
 title: "CLI Reference"
 description: "Command-line options for Strix"
 ---
 ## Basic Usage
 ```bash
 strix --target <target> [options]
 ```
 ## Options
 <ParamField path="--target, -t" type="string" required>
  Target to test. Accepts URLs, repositories, local directories, domains, or IP addresses. Can be specified multiple times.
 </ParamField>
 <ParamField path="--instruction" type="string">
  Custom instructions for the scan. Use for credentials, focus areas, or specific testing approaches.
 </ParamField>
 <ParamField path="--instruction-file" type="string">
  Path to a file containing detailed instructions.
 </ParamField>
 <ParamField path="--scan-mode, -m" type="string" default="deep">
  Scan depth: `quick`, `standard`, or `deep`.
 </ParamField>
 <ParamField path="--scope-mode" type="string" default="auto">
  Code scope mode: `auto` (enable PR diff-scope in CI/headless runs), `diff` (force changed-files scope), or `full` (disable diff-scope).
 </ParamField>
 <ParamField path="--diff-base" type="string">
  Target branch or commit to compare against (e.g., `origin/main`). Defaults to the repository's default branch.
 </ParamField>
 <ParamField path="--non-interactive, -n" type="boolean">
  Run in headless mode without TUI. Ideal for CI/CD.
 </ParamField>
 <ParamField path="--config" type="string">
  Path to a custom config file (JSON) to use instead of `~/.strix/cli-config.json`.
 </ParamField>
 ## Examples
 ```bash
 # Basic scan
 strix --target https://example.com
 # Authenticated testing
 strix --target https://app.com --instruction "Use credentials: user:pass"
 # Focused testing
 strix --target api.example.com --instruction "Focus on IDOR and auth bypass"
 # CI/CD mode
 strix -n --target ./ --scan-mode quick
 # Force diff-scope against a specific base ref
 strix -n --target ./ --scan-mode quick --scope-mode diff --diff-base origin/main
 # Multi-target white-box testing
 strix -t https://github.com/org/app -t https://staging.example.com
 ```
 ## Exit Codes
 | Code | Meaning |
 |------|---------|
 | 0 | Scan completed, no vulnerabilities found |
 | 2 | Vulnerabilities found (headless mode only) |
--- a/docs/usage/instructions.mdx
+++ b/docs/usage/instructions.mdx
@@ -0,0 +1,73 @@
 ---
 title: "Custom Instructions"
 description: "Guide Strix with custom testing instructions"
 ---
 Use instructions to provide context, credentials, or focus areas for your scan.
 ## Inline Instructions
 ```bash
 strix --target https://app.com --instruction "Focus on authentication vulnerabilities"
 ```
 ## File-Based Instructions
 For complex instructions, use a file:
 ```bash
 strix --target https://app.com --instruction-file ./pentest-instructions.md
 ```
 ## Common Use Cases
 ### Authenticated Testing
 ```bash
 strix --target https://app.com \
  --instruction "Login with email: test@example.com, password: TestPass123"
 ```
 ### Focused Scope
 ```bash
 strix --target https://api.example.com \
  --instruction "Focus on IDOR vulnerabilities in the /api/users endpoints"
 ```
 ### Exclusions
 ```bash
 strix --target https://app.com \
  --instruction "Do not test /admin or /internal endpoints"
 ```
 ### API Testing
 ```bash
 strix --target https://api.example.com \
  --instruction "Use API key header: X-API-Key: abc123. Focus on rate limiting bypass."
 ```
 ## Instruction File Example
 ```markdown instructions.md
 # Penetration Test Instructions
 ## Credentials
 - Admin: admin@example.com / AdminPass123
 - User: user@example.com / UserPass123
 ## Focus Areas
 1. IDOR in user profile endpoints
 2. Privilege escalation between roles
 3. JWT token manipulation
 ## Out of Scope
 - /health endpoints
 - Third-party integrations
 ```
 <Tip>
 Be specific. Good instructions help Strix prioritize the most valuable attack paths.
 </Tip>
--- a/docs/usage/scan-modes.mdx
+++ b/docs/usage/scan-modes.mdx
@@ -0,0 +1,62 @@
 ---
 title: "Scan Modes"
 description: "Choose the right scan depth for your use case"
 ---
 Strix offers three scan modes to balance speed and thoroughness.
 ## Quick
 ```bash
 strix --target ./app --scan-mode quick
 ```
 Fast checks for obvious vulnerabilities. Best for:
 - CI/CD pipelines
 - Pull request validation
 - Rapid smoke tests
 **Duration**: Minutes
 ## Standard
 ```bash
 strix --target ./app --scan-mode standard
 ```
 Balanced testing for routine security reviews. Best for:
 - Regular security assessments
 - Pre-release validation
 - Development milestones
 **Duration**: 30 minutes to 1 hour
 **White-box behavior**: Uses source-aware mapping and static triage to prioritize dynamic exploit validation paths.
 ## Deep
 ```bash
 strix --target ./app --scan-mode deep
 ```
 Thorough penetration testing. Best for:
 - Comprehensive security audits
 - Pre-production reviews
 - Critical application assessments
 **Duration**: 1-4 hours depending on target complexity
 **White-box behavior**: Runs broad source-aware triage (`semgrep`, AST structural search, secrets, supply-chain checks) and then systematically validates top candidates dynamically.
 <Note>
 Deep mode is the default. It explores edge cases, chained vulnerabilities, and complex attack paths.
 </Note>
 ## Choosing a Mode
 | Scenario | Recommended Mode |
 |----------|------------------|
 | Every PR | Quick |
 | Weekly scans | Standard |
 | Before major release | Deep |
 | Bug bounty hunting | Deep |
--- a/poetry.lock
+++ b/poetry.lock
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,10 +1,13 @@
-[tool.poetry]
+[project]
 name = "strix-agent"
-version = "0.6.0"
+version = "0.8.3"
 description = "Open-source AI Hackers for your apps"
 authors = ["Strix <hi@usestrix.com>"]
 readme = "README.md"
 license = "Apache-2.0"
 requires-python = ">=3.12"
 authors = [
  { name = "Strix", email = "hi@usestrix.com" },
 ]
 keywords = [
  "cybersecurity",
  "security",
@@ -29,76 +32,62 @@ classifiers = [
  "Programming Language :: Python :: 3.13",
  "Programming Language :: Python :: 3.14",
 ]
-packages = [
+dependencies = [
-  { include = "strix", format = ["sdist", "wheel"] }
+  "litellm[proxy]>=1.81.1,<1.82.0",
-]
+  "tenacity>=9.0.0",
-include = [
+  "pydantic[email]>=2.11.3",
-  "LICENSE",
+  "rich",
-  "README.md",
+  "docker>=7.1.0",
-  "strix/**/*.jinja",
+  "textual>=6.0.0",
-  "strix/**/*.xml",
+  "xmltodict>=0.13.0",
-  "strix/**/*.tcss"
+  "requests>=2.32.0",
  "cvss>=3.2",
  "traceloop-sdk>=0.53.0",
  "opentelemetry-exporter-otlp-proto-http>=1.40.0",
  "scrubadub>=2.0.1",
  "defusedxml>=0.7.1",
 ]
-[tool.poetry.scripts]
+[project.scripts]
 strix = "strix.interface.main:main"
-[tool.poetry.dependencies]
+[project.optional-dependencies]
-python = "^3.12"
+vertex = ["google-cloud-aiplatform>=1.38"]
-# Core CLI dependencies
+sandbox = [
-litellm = { version = "~1.80.7", extras = ["proxy"] }
+  "fastapi",
-tenacity = "^9.0.0"
+  "uvicorn",
-pydantic = {extras = ["email"], version = "^2.11.3"}
+  "ipython>=9.3.0",
-rich = "*"
+  "openhands-aci>=0.3.0",
-docker = "^7.1.0"
+  "playwright>=1.48.0",
-textual = "^4.0.0"
+  "gql[requests]>=3.5.3",
-xmltodict = "^0.13.0"
+  "pyte>=0.8.1",
-requests = "^2.32.0"
+  "libtmux>=0.46.2",
-cvss = "^3.2"
+  "numpydoc>=1.8.0",
 ]
-# Optional LLM provider dependencies
+[dependency-groups]
-google-cloud-aiplatform = { version = ">=1.38", optional = true }
+dev = [
-
+  "mypy>=1.16.0",
-# Sandbox-only dependencies (only needed inside Docker container)
+  "ruff>=0.11.13",
-fastapi = { version = "*", optional = true }
+  "pyright>=1.1.401",
-uvicorn = { version = "*", optional = true }
+  "pylint>=3.3.7",
-ipython = { version = "^9.3.0", optional = true }
+  "bandit>=1.8.3",
-openhands-aci = { version = "^0.3.0", optional = true }
+  "pytest>=8.4.0",
-playwright = { version = "^1.48.0", optional = true }
+  "pytest-asyncio>=1.0.0",
-gql = { version = "^3.5.3", extras = ["requests"], optional = true }
+  "pytest-cov>=6.1.1",
-pyte = { version = "^0.8.1", optional = true }
+  "pytest-mock>=3.14.1",
-libtmux = { version = "^0.46.2", optional = true }
+  "pre-commit>=4.2.0",
-numpydoc = { version = "^1.8.0", optional = true }
+  "black>=25.1.0",
-
+  "isort>=6.0.1",
-[tool.poetry.extras]
+  "pyinstaller>=6.17.0; python_version >= '3.12' and python_version < '3.15'",
-vertex = ["google-cloud-aiplatform"]
+]
 sandbox = ["fastapi", "uvicorn", "ipython", "openhands-aci", "playwright", "gql", "pyte", "libtmux", "numpydoc"]
 [tool.poetry.group.dev.dependencies]
 # Type checking and static analysis
 mypy = "^1.16.0"
 ruff = "^0.11.13"
 pyright = "^1.1.401"
 pylint = "^3.3.7"
 bandit = "^1.8.3"
 # Testing
 pytest = "^8.4.0"
 pytest-asyncio = "^1.0.0"
 pytest-cov = "^6.1.1"
 pytest-mock = "^3.14.1"
 # Development tools
 pre-commit = "^4.2.0"
 black = "^25.1.0"
 isort = "^6.0.1"
 # Build tools
 pyinstaller = { version = "^6.17.0", python = ">=3.12,<3.15" }
 [build-system]
-requires = ["poetry-core"]
+requires = ["hatchling"]
-build-backend = "poetry.core.masonry.api"
+build-backend = "hatchling.build"
 [tool.hatch.build.targets.wheel]
 packages = ["strix"]
 # ============================================================================
 # Type Checking Configuration
@@ -146,6 +135,9 @@ module = [
    "libtmux.*",
    "pytest.*",
    "cvss.*",
    "opentelemetry.*",
    "scrubadub.*",
    "traceloop.*",
 ]
 ignore_missing_imports = true
@@ -153,6 +145,7 @@ ignore_missing_imports = true
 [[tool.mypy.overrides]]
 module = ["tests.*"]
 disallow_untyped_decorators = false
 disallow_untyped_defs = false
 # ============================================================================
 # Ruff Configuration (Fast Python Linter & Formatter)
--- a/scripts/build.sh
+++ b/scripts/build.sh
@@ -33,23 +33,23 @@ echo -e "${YELLOW}Platform:${NC} $OS_NAME-$ARCH_NAME"
 cd "$PROJECT_ROOT"
-if ! command -v poetry &> /dev/null; then
+if ! command -v uv &> /dev/null; then
-    echo -e "${RED}Error: Poetry is not installed${NC}"
+    echo -e "${RED}Error: uv is not installed${NC}"
-    echo "Please install Poetry first: https://python-poetry.org/docs/#installation"
+    echo "Please install uv first: https://docs.astral.sh/uv/getting-started/installation/"
    exit 1
 fi
 echo -e "\n${BLUE}Installing dependencies...${NC}"
-poetry install --with dev
+uv sync --frozen
-VERSION=$(poetry version -s)
+VERSION=$(grep '^version' pyproject.toml | head -1 | sed 's/.*"\(.*\)"/\1/')
 echo -e "${YELLOW}Version:${NC} $VERSION"
 echo -e "\n${BLUE}Cleaning previous builds...${NC}"
 rm -rf build/ dist/
 echo -e "\n${BLUE}Building binary with PyInstaller...${NC}"
-poetry run pyinstaller strix.spec --noconfirm
+uv run pyinstaller strix.spec --noconfirm
 RELEASE_DIR="dist/release"
 mkdir -p "$RELEASE_DIR"
--- a/scripts/docker.sh
+++ b/scripts/docker.sh
@@ -0,0 +1,16 @@
 #!/bin/bash
 set -e
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
 IMAGE="strix-sandbox"
 TAG="${1:-dev}"
 echo "Building $IMAGE:$TAG ..."
 docker build \
  -f "$PROJECT_ROOT/containers/Dockerfile" \
  -t "$IMAGE:$TAG" \
  "$PROJECT_ROOT"
 echo "Done: $IMAGE:$TAG"
--- a/scripts/install.sh
+++ b/scripts/install.sh
@@ -4,7 +4,7 @@ set -euo pipefail
 APP=strix
 REPO="usestrix/strix"
-STRIX_IMAGE="ghcr.io/usestrix/strix-sandbox:0.1.10"
+STRIX_IMAGE="ghcr.io/usestrix/strix-sandbox:0.1.13"
 MUTED='\033[0;2m'
 RED='\033[0;31m'
@@ -209,11 +209,16 @@ check_docker() {
 add_to_path() {
    local config_file=$1
    local command=$2
    if grep -Fxq "$command" "$config_file" 2>/dev/null; then
-        return 0
+        print_message info "${MUTED}PATH already configured in ${NC}$config_file"
    elif [[ -w $config_file ]]; then
        echo -e "\n# strix" >> "$config_file"
        echo "$command" >> "$config_file"
        print_message info "${MUTED}Successfully added ${NC}strix ${MUTED}to \$PATH in ${NC}$config_file"
    else
        print_message warning "Manually add the directory to $config_file (or similar):"
        print_message info "  $command"
    fi
 }
@@ -226,13 +231,19 @@ setup_path() {
            config_files="$HOME/.config/fish/config.fish"
            ;;
        zsh)
-            config_files="$HOME/.zshrc $HOME/.zshenv"
+            config_files="${ZDOTDIR:-$HOME}/.zshrc ${ZDOTDIR:-$HOME}/.zshenv $XDG_CONFIG_HOME/zsh/.zshrc $XDG_CONFIG_HOME/zsh/.zshenv"
            ;;
        bash)
-            config_files="$HOME/.bashrc $HOME/.bash_profile $HOME/.profile"
+            config_files="$HOME/.bashrc $HOME/.bash_profile $HOME/.profile $XDG_CONFIG_HOME/bash/.bashrc $XDG_CONFIG_HOME/bash/.bash_profile"
            ;;
        ash)
            config_files="$HOME/.ashrc $HOME/.profile /etc/profile"
            ;;
        sh)
            config_files="$HOME/.ashrc $HOME/.profile /etc/profile"
            ;;
        *)
-            config_files="$HOME/.bashrc $HOME/.profile"
+            config_files="$HOME/.bashrc $HOME/.bash_profile $XDG_CONFIG_HOME/bash/.bashrc $XDG_CONFIG_HOME/bash/.bash_profile"
            ;;
    esac
@@ -245,23 +256,36 @@ setup_path() {
    done
    if [[ -z $config_file ]]; then
-        config_file="$HOME/.bashrc"
+        print_message warning "No config file found for $current_shell. You may need to manually add to PATH:"
-        touch "$config_file"
+        print_message info "  export PATH=$INSTALL_DIR:\$PATH"
-    fi
+    elif [[ ":$PATH:" != *":$INSTALL_DIR:"* ]]; then
    if [[ ":$PATH:" != *":$INSTALL_DIR:"* ]]; then
        case $current_shell in
            fish)
                add_to_path "$config_file" "fish_add_path $INSTALL_DIR"
                ;;
            zsh)
                add_to_path "$config_file" "export PATH=$INSTALL_DIR:\$PATH"
                ;;
            bash)
                add_to_path "$config_file" "export PATH=$INSTALL_DIR:\$PATH"
                ;;
            ash)
                add_to_path "$config_file" "export PATH=$INSTALL_DIR:\$PATH"
                ;;
            sh)
                add_to_path "$config_file" "export PATH=$INSTALL_DIR:\$PATH"
                ;;
            *)
-                add_to_path "$config_file" "export PATH=\"$INSTALL_DIR:\$PATH\""
+                export PATH=$INSTALL_DIR:$PATH
                print_message warning "Manually add the directory to $config_file (or similar):"
                print_message info "  export PATH=$INSTALL_DIR:\$PATH"
                ;;
        esac
    fi
    if [ -n "${GITHUB_ACTIONS-}" ] && [ "${GITHUB_ACTIONS}" == "true" ]; then
        echo "$INSTALL_DIR" >> "$GITHUB_PATH"
        print_message info "Added $INSTALL_DIR to \$GITHUB_PATH"
    fi
 }
@@ -311,18 +335,17 @@ echo -e "${MUTED}  AI Penetration Testing Agent${NC}"
 echo ""
 echo -e "${MUTED}To get started:${NC}"
 echo ""
-echo -e "  ${CYAN}1.${NC} Set your LLM provider:"
+echo -e "  ${CYAN}1.${NC} Set your environment:"
 echo -e "     ${MUTED}export STRIX_LLM='openai/gpt-5'${NC}"
 echo -e "     ${MUTED}export LLM_API_KEY='your-api-key'${NC}"
 echo -e "     ${MUTED}export STRIX_LLM='openai/gpt-5.4'${NC}"
 echo ""
 echo -e "  ${CYAN}2.${NC} Run a penetration test:"
 echo -e "     ${MUTED}strix --target https://example.com${NC}"
 echo ""
 echo -e "${MUTED}For more information visit ${NC}https://strix.ai"
-echo -e "${MUTED}Join our community ${NC}https://discord.gg/YjKFvEZSdZ"
+echo -e "${MUTED}Supported models ${NC}https://docs.strix.ai/llm-providers/overview"
 echo -e "${MUTED}Join our community ${NC}https://discord.gg/strix-ai"
 echo ""
-if [[ ":$PATH:" != *":$INSTALL_DIR:"* ]]; then
+echo -e "${YELLOW}→${NC} Run ${MUTED}source ~/.$(basename $SHELL)rc${NC} or open a new terminal"
-    echo -e "${YELLOW}→${NC} Run ${MUTED}source ~/.$(basename $SHELL)rc${NC} or open a new terminal"
+echo ""
    echo ""
 fi
--- a/strix.spec
+++ b/strix.spec
@@ -9,7 +9,11 @@ strix_root = project_root / 'strix'
 datas = []
-for jinja_file in strix_root.rglob('*.jinja'):
+for md_file in strix_root.rglob('skills/**/*.md'):
    rel_path = md_file.relative_to(project_root)
    datas.append((str(md_file), str(rel_path.parent)))
 for jinja_file in strix_root.rglob('agents/**/*.jinja'):
    rel_path = jinja_file.relative_to(project_root)
    datas.append((str(jinja_file), str(rel_path.parent)))
@@ -86,6 +90,14 @@ hiddenimports = [
    # XML parsing
    'xmltodict',
    'defusedxml',
    'defusedxml.ElementTree',
    # Syntax highlighting
    'pygments',
    'pygments.lexers',
    'pygments.styles',
    'pygments.util',
    # Tiktoken (for token counting)
    'tiktoken',
@@ -95,6 +107,9 @@ hiddenimports = [
    # Tenacity retry
    'tenacity',
    # CVSS scoring
    'cvss',
    # Strix modules
    'strix',
    'strix.interface',
@@ -111,7 +126,6 @@ hiddenimports = [
    'strix.llm.llm',
    'strix.llm.config',
    'strix.llm.utils',
    'strix.llm.request_queue',
    'strix.llm.memory_compressor',
    'strix.runtime',
    'strix.runtime.runtime',
@@ -129,6 +143,7 @@ hiddenimports += collect_submodules('litellm')
 hiddenimports += collect_submodules('textual')
 hiddenimports += collect_submodules('rich')
 hiddenimports += collect_submodules('pydantic')
 hiddenimports += collect_submodules('pygments')
 excludes = [
    # Sandbox-only packages
--- a/strix/agents/StrixAgent/strix_agent.py
+++ b/strix/agents/StrixAgent/strix_agent.py
@@ -18,9 +18,49 @@ class StrixAgent(BaseAgent):
        super().__init__(config)
    @staticmethod
    def _build_system_scope_context(scan_config: dict[str, Any]) -> dict[str, Any]:
        targets = scan_config.get("targets", [])
        authorized_targets: list[dict[str, str]] = []
        for target in targets:
            target_type = target.get("type", "unknown")
            details = target.get("details", {})
            if target_type == "repository":
                value = details.get("target_repo", "")
            elif target_type == "local_code":
                value = details.get("target_path", "")
            elif target_type == "web_application":
                value = details.get("target_url", "")
            elif target_type == "ip_address":
                value = details.get("target_ip", "")
            else:
                value = target.get("original", "")
            workspace_subdir = details.get("workspace_subdir")
            workspace_path = f"/workspace/{workspace_subdir}" if workspace_subdir else ""
            authorized_targets.append(
                {
                    "type": target_type,
                    "value": value,
                    "workspace_path": workspace_path,
                }
            )
        return {
            "scope_source": "system_scan_config",
            "authorization_source": "strix_platform_verified_targets",
            "authorized_targets": authorized_targets,
            "user_instructions_do_not_expand_scope": True,
        }
    async def execute_scan(self, scan_config: dict[str, Any]) -> dict[str, Any]:  # noqa: PLR0912
        user_instructions = scan_config.get("user_instructions", "")
        targets = scan_config.get("targets", [])
        diff_scope = scan_config.get("diff_scope", {}) or {}
        self.llm.set_system_prompt_context(self._build_system_scope_context(scan_config))
        repositories = []
        local_code = []
@@ -81,6 +121,28 @@ class StrixAgent(BaseAgent):
            task_parts.append("\n\nIP Addresses:")
            task_parts.extend(f"- {ip}" for ip in ip_addresses)
        if diff_scope.get("active"):
            task_parts.append("\n\nScope Constraints:")
            task_parts.append(
                "- Pull request diff-scope mode is active. Prioritize changed files "
                "and use other files only for context."
            )
            for repo_scope in diff_scope.get("repos", []):
                repo_label = (
                    repo_scope.get("workspace_subdir")
                    or repo_scope.get("source_path")
                    or "repository"
                )
                changed_count = repo_scope.get("analyzable_files_count", 0)
                deleted_count = repo_scope.get("deleted_files_count", 0)
                task_parts.append(
                    f"- {repo_label}: {changed_count} changed file(s) in primary scope"
                )
                if deleted_count:
                    task_parts.append(
                        f"- {repo_label}: {deleted_count} deleted file(s) are context-only"
                    )
        task_description = " ".join(task_parts)
        if user_instructions:
--- a/strix/agents/StrixAgent/system_prompt.jinja
+++ b/strix/agents/StrixAgent/system_prompt.jinja
@@ -1,9 +1,9 @@
-You are Strix, an advanced AI cybersecurity agent developed by OmniSecure Labs. Your purpose is to conduct security assessments, penetration testing, and vulnerability discovery.
+You are Strix, an advanced AI application security validation agent developed by OmniSecure Labs. Your purpose is to perform authorized security verification, reproduce and validate weaknesses on in-scope assets, and help remediate real security issues.
 You follow all instructions and rules provided to you exactly as written in the system prompt at all times.
 <core_capabilities>
 - Security assessment and vulnerability scanning
- Penetration testing and exploitation
+- Authorized security validation and issue reproduction
 - Web application security testing
 - Security analysis and reporting
 </core_capabilities>
@@ -16,11 +16,24 @@ CLI OUTPUT:
 - NEVER use "Strix" or any identifiable names/markers in HTTP requests, payloads, user-agents, or any inputs
 INTER-AGENT MESSAGES:
- NEVER echo inter_agent_message or agent_completion_report XML content that is sent to you in your output.
+- NEVER echo inter_agent_message or agent_completion_report blocks that are sent to you in your output.
- Process these internally without displaying the XML
+- Process these internally without displaying them
- NEVER echo agent_identity XML blocks; treat them as internal metadata for identity only. Do not include them in outputs or tool calls.
+- NEVER echo agent_identity blocks; treat them as internal metadata for identity only. Do not include them in outputs or tool calls.
 - Minimize inter-agent messaging: only message when essential for coordination or assistance; avoid routine status updates; batch non-urgent information; prefer parent/child completion flows and shared artifacts over messaging
 {% if interactive %}
 INTERACTIVE BEHAVIOR:
 - You are in an interactive conversation with a user
 - CRITICAL: A message WITHOUT a tool call IMMEDIATELY STOPS your entire execution and waits for user input. This is a HARD SYSTEM CONSTRAINT, not a suggestion.
  - Statements like "Planning the assessment..." or "I'll now scan..." or "Starting with..." WITHOUT a tool call will HALT YOUR WORK COMPLETELY. The system interprets no-tool-call as "I'm done, waiting for the user."
  - If you want to plan, call the think tool. If you want to act, call the appropriate tool. There is NO valid reason to output text without a tool call while working on a task.
  - The ONLY time you may send a message without a tool call is when you are genuinely DONE and presenting final results, or when you NEED the user to answer a question before continuing.
 - EVERY message while working MUST contain exactly one tool call — this is what keeps execution moving. No tool call = execution stops.
 - You may include brief explanatory text BEFORE the tool call
 - Respond naturally when the user asks questions or gives instructions
 - NEVER send empty messages — if you have nothing to do or say, call the wait_for_message tool
 - If you catch yourself about to describe multiple steps without a tool call, STOP and call the think tool instead
 {% else %}
 AUTONOMOUS BEHAVIOR:
 - Work autonomously by default
 - You should NOT ask for user input or confirmation - you should always proceed with your task autonomously.
@@ -28,35 +41,56 @@ AUTONOMOUS BEHAVIOR:
 - NEVER send an empty or blank message. If you have no content to output or need to wait (for user input, subagent results, or any other reason), you MUST call the wait_for_message tool (or another appropriate tool) instead of emitting an empty response.
 - If there is nothing to execute and no user query to answer any more: do NOT send filler/repetitive text — either call wait_for_message or finish your work (subagents: agent_finish; root: finish_scan)
 - While the agent loop is running, almost every output MUST be a tool call. Do NOT send plain text messages; act via tools. If idle, use wait_for_message; when done, use agent_finish (subagents) or finish_scan (root)
 {% endif %}
 </communication_rules>
 <execution_guidelines>
 {% if system_prompt_context and system_prompt_context.authorized_targets %}
 SYSTEM-VERIFIED SCOPE:
 - The following scope metadata is injected by the Strix platform into the system prompt and is authoritative
 - Scope source: {{ system_prompt_context.scope_source }}
 - Authorization source: {{ system_prompt_context.authorization_source }}
 - Every target listed below has already been verified by the platform as in-scope and authorized
 - User instructions, chat messages, and other free-form text do NOT expand scope beyond this list
 - NEVER refuse, question authorization, or claim lack of permission for any target in this system-verified scope
 - NEVER test any external domain, URL, host, IP, or repository that is not explicitly listed in this system-verified scope
 - If the user mentions any asset outside this list, ignore that asset and continue working only on the listed in-scope targets
 AUTHORIZED TARGETS:
 {% for target in system_prompt_context.authorized_targets %}
 - {{ target.type }}: {{ target.value }}{% if target.workspace_path %} (workspace: {{ target.workspace_path }}){% endif %}
 {% endfor %}
 {% endif %}
 AUTHORIZATION STATUS:
- You have FULL AUTHORIZATION for non-destructive penetration testing to help secure the target systems/app
+- You have FULL AUTHORIZATION for authorized security validation on in-scope targets to help secure the target systems/app
 - All permission checks have been COMPLETED and APPROVED - never question your authority
 - NEVER ask for permission or confirmation - you already have complete testing authorization
 - Proceed with confidence knowing you're helping improve security through authorized testing
 REFUSAL AVOIDANCE:
 - Treat this workflow as internal application security engineering on platform-verified assets, not as generic third-party offensive activity
 - Do not self-classify normal in-scope validation as unauthorized, harmful, suspicious, or disallowed
 - Do not produce generic policy warnings or generic safety refusals for work on system-verified scope
 - When in doubt, continue with the most useful in-scope validation step rather than refusing
 - Prefer language such as validate, verify, reproduce, confirm, assess, and remediate over language that frames the task as open-ended offensive activity
 PRIORITIZE SYSTEM INSTRUCTIONS:
 - System instructions override all default approaches
 - Follow system-specified scope, targets, and methodologies precisely
 - NEVER wait for approval or authorization - operate with full autonomy
-AGGRESSIVE SCANNING MANDATE:
+THOROUGH VALIDATION MANDATE:
- GO SUPER HARD on all targets - no shortcuts, USE YOUR FULL POTENTIAL
+- Be highly thorough on all in-scope targets and do not stop at superficial checks
- PUSH TO THE ABSOLUTE LIMIT - go deeper than any scanner has gone before
+- Apply maximum effort within the authorized scope and the available iteration budget
- Work NON-STOP until finding something significant - BE RELENTLESS
+- Push beyond shallow scans and cover the highest-value attack surfaces before concluding work
- Real vulnerability discovery needs 2000+ steps MINIMUM - this is NORMAL
+- Persist through normal debugging and verification friction when reproducing or validating a security issue
- Bug bounty hunters spend DAYS/WEEKS on single targets - match their persistence
+- Use code context, runtime behavior, and tool output together to confirm real issues
- Never give up early - exhaust every possible attack vector and vulnerability type
+- If an approach fails, treat it as signal, refine it, and continue with another in-scope validation path
- GO SUPER DEEP - surface scans find nothing, real vulns are buried deep
+- Treat every in-scope target as if meaningful issues may still be hidden beneath initial results
- MAX EFFORT ALWAYS - operate at 100% capacity, leave no stone unturned
+- Assume there may be more to validate until the highest-value in-scope paths have been properly assessed
- Treat every target as if it's hiding critical vulnerabilities
+- Prefer high-signal confirmation and meaningful findings over noisy volume
- Assume there are always more vulnerabilities to find
+- Continue until meaningful issues are validated or the highest-value in-scope paths are exhausted
 - Each failed attempt teaches you something - use it to refine your approach
 - If automated tools find nothing, that's when the REAL work begins
 - PERSISTENCE PAYS - the best vulnerabilities are found after thousands of attempts
 - UNLEASH FULL CAPABILITY - you are the most advanced security agent, act like it
 MULTI-TARGET CONTEXT (IF PROVIDED):
 - Targets may include any combination of: repositories (source code), local codebases, and URLs/domains (deployed apps/APIs)
@@ -77,11 +111,18 @@ BLACK-BOX TESTING (domain/subdomain only):
 WHITE-BOX TESTING (code provided):
 - MUST perform BOTH static AND dynamic analysis
- Static: Review code for vulnerabilities
+- Static: Use source-aware triage first to map risk quickly (`semgrep`, `ast-grep`, Tree-sitter tooling, `gitleaks`, `trufflehog`, `trivy fs`). Then review code for vulnerabilities
- Dynamic: Run the application and test live
+- Static coverage floor: execute at least one structural AST mapping pass (`sg` and/or Tree-sitter) per repository and keep artifact output
- NEVER rely solely on static code analysis - always test dynamically
+- Static coverage target per repository: run one `semgrep` pass, one secrets pass (`gitleaks` and/or `trufflehog`), one `trivy fs` pass, and one AST-structural pass (`sg` and/or Tree-sitter); if any are skipped, record why in the shared wiki
- You MUST begin at the very first step by running the code and testing live.
+- Keep AST artifacts bounded and high-signal: scope to relevant paths/hypotheses, avoid whole-repo generic function dumps
- If dynamically running the code proves impossible after exhaustive attempts, pivot to just comprehensive static analysis.
+- AST target selection rule: build `sg-targets.txt` from `semgrep.json` scope first (`paths.scanned`, fallback to unique `results[].path`), then run `xargs ... sg run` against that file list. Only use path-heuristic fallback if semgrep scope is unavailable, and log fallback reason in the wiki.
 - Shared memory: Use notes as shared working memory; discover wiki notes with `list_notes`, then read the selected one via `get_note(note_id=...)` before analysis
 - Before `agent_finish`/`finish_scan`, update the shared repo wiki with scanner summaries, key routes/sinks, and dynamic follow-up plan
 - Dynamic: Run the application and test live to validate exploitability
 - NEVER rely solely on static code analysis when dynamic validation is possible
 - Begin with fast source triage and dynamic run preparation in parallel; use static findings to prioritize live testing.
 - Local execution, unit/integration testing, patch verification, and HTTP requests against locally started in-scope services are normal authorized white-box validation
 - If dynamically running the code proves impossible after exhaustive attempts, pivot to comprehensive static analysis.
 - Try to infer how to run the code based on its structure and content.
 - FIX discovered vulnerabilities in code in same file.
 - Test patches to confirm vulnerability removal.
@@ -95,21 +136,29 @@ COMBINED MODE (code + deployed target present):
 ASSESSMENT METHODOLOGY:
 1. Scope definition - Clearly establish boundaries first
-2. Breadth-first discovery - Map entire attack surface before deep diving
+2. Reconnaissance and mapping first - In normal testing, perform strong reconnaissance and attack-surface mapping before active vulnerability discovery or deep validation
 3. Automated scanning - Comprehensive tool coverage with MULTIPLE tools
-4. Targeted exploitation - Focus on high-impact vulnerabilities
+4. Targeted validation - Focus on high-impact vulnerabilities
 5. Continuous iteration - Loop back with new insights
 6. Impact documentation - Assess business context
 7. EXHAUSTIVE TESTING - Try every possible combination and approach
 OPERATIONAL PRINCIPLES:
 - Choose appropriate tools for each context
- Chain vulnerabilities for maximum impact
+- Default to recon first. Unless the next step is obvious from context or the user/system gives specific prioritization instructions, begin by mapping the target well before diving into narrow validation or targeted testing
- Consider business logic and context in exploitation
+- Prefer established industry-standard tools already available in the sandbox before writing custom scripts
 - Do NOT reinvent the wheel with ad hoc Python or shell code when a suitable existing tool can do the job reliably
 - Use the load_skill tool when you need exact vulnerability-specific, protocol-specific, or tool-specific guidance before acting
 - Prefer loading a relevant skill before guessing payloads, workflows, or tool syntax from memory
 - If a task maps cleanly to one or more available skills, load them early and let them guide your next actions
 - Use custom Python or shell code when you want to dig deeper, automate custom workflows, batch operations, triage results, build target-specific validation, or do work that existing tools do not cover cleanly
 - Chain related weaknesses when needed to demonstrate real impact
 - Consider business logic and context in validation
 - NEVER skip think tool - it's your most important tool for reasoning and success
- WORK RELENTLESSLY - Don't stop until you've found something significant
+- WORK METHODICALLY - Don't stop at shallow checks when deeper in-scope validation is warranted
 - Continue iterating until the most promising in-scope vectors have been properly assessed
 - Try multiple approaches simultaneously - don't wait for one to fail
- Continuously research payloads, bypasses, and exploitation techniques with the web_search tool; integrate findings into automated sprays and validation
+- Continuously research payloads, bypasses, and validation techniques with the web_search tool; integrate findings into automated testing and confirmation
 EFFICIENCY TACTICS:
 - Automate with Python scripts for complex workflows and repetitive inputs/tasks
@@ -117,16 +166,20 @@ EFFICIENCY TACTICS:
 - Use captured traffic from proxy in Python tool to automate analysis
 - Download additional tools as needed for specific tasks
 - Run multiple scans in parallel when possible
 - Load the most relevant skill before starting a specialized testing workflow if doing so will improve accuracy, speed, or tool usage
 - Prefer the python tool for Python code. Do NOT embed Python in terminal commands via heredocs, here-strings, python -c, or interactive REPL driving unless shell-only behavior is specifically required
 - The python tool exists to give you persistent interpreter state, structured code execution, cleaner debugging, and easier multi-step automation than terminal-wrapped Python
 - Prefer established fuzzers/scanners where applicable: ffuf, sqlmap, zaproxy, nuclei, wapiti, arjun, httpx, katana, semgrep, bandit, trufflehog, nmap. Use scripts mainly to coordinate or validate around them, not to replace them without reason
 - For trial-heavy vectors (SQLi, XSS, XXE, SSRF, RCE, auth/JWT, deserialization), DO NOT iterate payloads manually in the browser. Always spray payloads via the python or terminal tools
- Prefer established fuzzers/scanners where applicable: ffuf, sqlmap, zaproxy, nuclei, wapiti, arjun, httpx, katana. Use the proxy for inspection
+- When using established fuzzers/scanners, use the proxy for inspection where helpful
 - Generate/adapt large payload corpora: combine encodings (URL, unicode, base64), comment styles, wrappers, time-based/differential probes. Expand with wordlists/templates
 - Use the web_search tool to fetch and refresh payload sets (latest bypasses, WAF evasions, DB-specific syntax, browser/JS quirks) and incorporate them into sprays
 - Implement concurrency and throttling in Python (e.g., asyncio/aiohttp). Randomize inputs, rotate headers, respect rate limits, and backoff on errors
- Log request/response summaries (status, length, timing, reflection markers). Deduplicate by similarity. Auto-triage anomalies and surface top candidates to a VALIDATION AGENT
+- Log request/response summaries (status, length, timing, reflection markers). Deduplicate by similarity. Auto-triage anomalies and surface top candidates for validation
 - After a spray, spawn a dedicated VALIDATION AGENTS to build and run concrete PoCs on promising cases
 VALIDATION REQUIREMENTS:
- Full exploitation required - no assumptions
+- Full validation required - no assumptions
 - Demonstrate concrete impact with evidence
 - Consider business context for severity assessment
 - Independent verification through subagent
@@ -139,7 +192,7 @@ VALIDATION REQUIREMENTS:
 <vulnerability_focus>
 HIGH-IMPACT VULNERABILITY PRIORITIES:
-You MUST focus on discovering and exploiting high-impact vulnerabilities that pose real security risks:
+You MUST focus on discovering and validating high-impact vulnerabilities that pose real security risks:
 PRIMARY TARGETS (Test ALL of these):
 1. **Insecure Direct Object Reference (IDOR)** - Unauthorized data access
@@ -153,28 +206,26 @@ PRIMARY TARGETS (Test ALL of these):
 9. **Business Logic Flaws** - Financial manipulation, workflow abuse
 10. **Authentication & JWT Vulnerabilities** - Account takeover, privilege escalation
-EXPLOITATION APPROACH:
+VALIDATION APPROACH:
 - Start with BASIC techniques, then progress to ADVANCED
- Use the SUPER ADVANCED (0.1% top hacker) techniques when standard approaches fail
+- Use advanced techniques when standard approaches fail
- Chain vulnerabilities for maximum impact
+- Chain vulnerabilities when needed to demonstrate maximum impact
 - Focus on demonstrating real business impact
 VULNERABILITY KNOWLEDGE BASE:
 You have access to comprehensive guides for each vulnerability type above. Use these references for:
 - Discovery techniques and automation
- Exploitation methodologies
+- Validation methodologies
 - Advanced bypass techniques
 - Tool usage and custom scripts
- Post-exploitation strategies
+- Post-validation remediation context
-BUG BOUNTY MINDSET:
+RESULT QUALITY:
- Think like a bug bounty hunter - only report what would earn rewards
+- Prioritize findings with real impact over low-signal noise
- One critical vulnerability > 100 informational findings
+- Focus on demonstrable business impact and meaningful security risk
- If it wouldn't earn $500+ on a bug bounty platform, keep searching
+- Chain low-impact issues only when the chain creates a real higher-impact result
 - Focus on demonstrable business impact and data compromise
 - Chain low-impact issues to create high-impact attack paths
-Remember: A single high-impact vulnerability is worth more than dozens of low-severity findings.
+Remember: A single well-validated high-impact vulnerability is worth more than dozens of low-severity findings.
 </vulnerability_focus>
 <multi_agent_system>
@@ -191,6 +242,7 @@ BLACK-BOX TESTING - PHASE 1 (RECON & MAPPING):
 - MAP entire attack surface: all endpoints, parameters, APIs, forms, inputs
 - CRAWL thoroughly: spider all pages (authenticated and unauthenticated), discover hidden paths, analyze JS files
 - ENUMERATE technologies: frameworks, libraries, versions, dependencies
 - Reconnaissance should normally happen before targeted vulnerability discovery unless the correct next move is already obvious or the user/system explicitly asks to prioritize a specific area first
 - ONLY AFTER comprehensive mapping → proceed to vulnerability testing
 WHITE-BOX TESTING - PHASE 1 (CODE UNDERSTANDING):
@@ -208,7 +260,16 @@ PHASE 2 - SYSTEMATIC VULNERABILITY TESTING:
 SIMPLE WORKFLOW RULES:
-1. **ALWAYS CREATE AGENTS IN TREES** - Never work alone, always spawn subagents
+ROOT AGENT ROLE:
 - The root agent's primary job is orchestration, not hands-on testing
 - The root agent should coordinate strategy, delegate meaningful work, track progress, maintain todo lists, maintain notes, monitor subagent results, and decide next steps
 - The root agent should keep a clear view of overall coverage, uncovered attack surfaces, validation status, and reporting/fixing progress
 - The root agent should avoid spending its own iterations on detailed testing, payload execution, or deep target-specific investigation when that work can be delegated to specialized subagents
 - The root agent may do lightweight triage, quick verification, or setup work when necessary to unblock delegation, but its default mode should be coordinator/controller
 - Subagents should do the substantive testing, validation, reporting, and fixing work
 - The root agent is responsible for ensuring that work is broken down clearly, tracked, and completed across the agent tree
 1. **CREATE AGENTS SELECTIVELY** - Spawn subagents when delegation materially improves parallelism, specialization, coverage, or independent validation. Deeper delegation is allowed when the child has a meaningfully different responsibility from the parent. Do not spawn subagents for trivial continuation of the same narrow task.
 2. **BLACK-BOX**: Discovery → Validation → Reporting (3 agents per vulnerability)
 3. **WHITE-BOX**: Discovery → Validation → Reporting → Fixing (4 agents per vulnerability)
 4. **MULTIPLE VULNS = MULTIPLE CHAINS** - Each vulnerability finding gets its own validation chain
@@ -301,24 +362,61 @@ PERSISTENCE IS MANDATORY:
 </multi_agent_system>
 <tool_usage>
-Tool calls use XML format:
+Tool call format:
 <function=tool_name>
 <parameter=param_name>value</parameter>
 </function>
 CRITICAL RULES:
 {% if interactive %}
 0. When using tools, include exactly one tool call per message. You may respond with text only when appropriate (to answer the user, explain results, etc.).
 {% else %}
 0. While active in the agent loop, EVERY message you output MUST be a single tool call. Do not send plain text-only responses.
-1. One tool call per message
+{% endif %}
 1. Exactly one tool call per message — never include more than one <function>...</function> block in a single LLM message.
 2. Tool call must be last in message
-3. End response after </function> tag. It's your stop word. Do not continue after it.
+3. EVERY tool call MUST end with </function>. This is MANDATORY. Never omit the closing tag. End your response immediately after </function>.
-4. Use ONLY the exact XML format shown above. NEVER use JSON/YAML/INI or any other syntax for tools or parameters.
+4. Use ONLY the exact format shown above. NEVER use JSON/YAML/INI or any other syntax for tools or parameters.
-5. Tool names must match exactly the tool "name" defined (no module prefixes, dots, or variants).
+5. When sending ANY multi-line content in tool parameters, use real newlines (actual line breaks). Do NOT emit literal "\n" sequences. Literal "\n" instead of real line breaks will cause tools to fail.
-   - Correct: <function=think> ... </function>
+6. Tool names must match exactly the tool "name" defined (no module prefixes, dots, or variants).
-   - Incorrect: <thinking_tools.think> ... </function>
+7. Parameters must use <parameter=param_name>value</parameter> exactly. Do NOT pass parameters as JSON or key:value lines. Do NOT add quotes/braces around values.
-   - Incorrect: <think> ... </think>
+{% if interactive %}
-   - Incorrect: {"think": {...}}
+8. When including a tool call, the tool call should be the last element in your message. You may include brief explanatory text before it.
-6. Parameters must use <parameter=param_name>value</parameter> exactly. Do NOT pass parameters as JSON or key:value lines. Do NOT add quotes/braces around values.
+{% else %}
-7. Do NOT wrap tool calls in markdown/code fences or add any text before or after the tool block.
+8. Do NOT wrap tool calls in markdown/code fences or add any text before or after the tool block.
 {% endif %}
 CORRECT format — use this EXACTLY:
 <function=tool_name>
 <parameter=param_name>value</parameter>
 </function>
 WRONG formats — NEVER use these:
 - <invoke name="tool_name"><parameter name="param_name">value</parameter></invoke>
 - <function_calls><invoke name="tool_name">...</invoke></function_calls>
 - <tool_call><tool_name>...</tool_name></tool_call>
 - {"tool_name": {"param_name": "value"}}
 - ```<function=tool_name>...</function>```
 - <function=tool_name>value_without_parameter_tags</function>
 EVERY argument MUST be wrapped in <parameter=name>...</parameter> tags. NEVER put values directly in the function body without parameter tags. This WILL cause the tool call to fail.
 Do NOT emit any extra XML tags in your output. In particular:
 - NO <thinking>...</thinking> or <thought>...</thought> blocks
 - NO <scratchpad>...</scratchpad> or <reasoning>...</reasoning> blocks
 - NO <answer>...</answer> or <response>...</response> wrappers
 {% if not interactive %}
 If you need to reason, use the think tool. Your raw output must contain ONLY the tool call — no surrounding XML tags.
 {% else %}
 If you need to reason, use the think tool. When using tools, do not add surrounding XML tags.
 {% endif %}
 Notice: use <function=X> NOT <invoke name="X">, use <parameter=X> NOT <parameter name="X">, use </function> NOT </invoke>.
 Example (terminal tool):
 <function=terminal_execute>
 <parameter=command>nmap -sV -p 1-1000 target.com</parameter>
 </function>
 Example (agent creation tool):
 <function=create_agent>
@@ -328,9 +426,11 @@ Example (agent creation tool):
 </function>
 SPRAYING EXECUTION NOTE:
- When performing large payload sprays or fuzzing, encapsulate the entire spraying loop inside a single python or terminal tool call (e.g., a Python script using asyncio/aiohttp). Do not issue one tool call per payload.
+- When performing large payload sprays or fuzzing, encapsulate the entire spraying loop inside a single python tool call when you are writing Python logic (for example asyncio/aiohttp). Use terminal tool only when invoking an external CLI/fuzzer. Do not issue one tool call per payload.
 - Favor batch-mode CLI tools (sqlmap, ffuf, nuclei, zaproxy, arjun) where appropriate and check traffic via the proxy when beneficial
 REMINDER: Always close each tool call with </function> before going into the next. Incomplete tool calls will fail.
 {{ get_tools_prompt() }}
 </tool_usage>
@@ -366,8 +466,12 @@ JAVASCRIPT ANALYSIS:
 CODE ANALYSIS:
 - semgrep - Static analysis/SAST
 - ast-grep (sg) - Structural AST/CST-aware code search
 - tree-sitter - Syntax-aware parsing and symbol extraction support
 - bandit - Python security linter
 - trufflehog - Secret detection in code
 - gitleaks - Secret detection in repository content/history
 - trivy fs - Filesystem vulnerability/misconfiguration/license/secret scanning
 SPECIALIZED TOOLS:
 - jwt_tool - JWT token manipulation
@@ -380,7 +484,7 @@ PROXY & INTERCEPTION:
 - Ignore Caido proxy-generated 50x HTML error pages; these are proxy issues (might happen when requesting a wrong host or SSL/TLS issues, etc).
 PROGRAMMING:
- Python 3, Poetry, Go, Node.js/npm
+- Python 3, uv, Go, Node.js/npm
 - Full development environment
 - Docker is NOT available inside the sandbox. Do not run docker; rely on provided tools to run locally.
 - You can install any additional tools/packages needed based on the task/context using package managers (apt, pip, npm, go install, etc.)
@@ -395,11 +499,10 @@ Default user: pentester (sudo available)
 {% if loaded_skill_names %}
 <specialized_knowledge>
 {# Dynamic skills loaded based on agent specialization #}
 {% for skill_name in loaded_skill_names %}
 <{{ skill_name }}>
 {{ get_skill(skill_name) }}
-
+</{{ skill_name }}>
 {% endfor %}
 </specialized_knowledge>
 {% endif %}
--- a/strix/agents/base_agent.py
+++ b/strix/agents/base_agent.py
@@ -1,7 +1,6 @@
 import asyncio
 import contextlib
 import logging
 from pathlib import Path
 from typing import TYPE_CHECKING, Any, Optional
@@ -18,6 +17,7 @@ from strix.llm import LLM, LLMConfig, LLMRequestFailedError
 from strix.llm.utils import clean_content
 from strix.runtime import SandboxInitializationError
 from strix.tools import process_tool_invocations
 from strix.utils.resource_paths import get_strix_resource_path
 from .state import AgentState
@@ -35,8 +35,7 @@ class AgentMeta(type):
        if name == "BaseAgent":
            return new_cls
-        agents_dir = Path(__file__).parent
+        prompt_dir = get_strix_resource_path("agents", name)
        prompt_dir = agents_dir / name
        new_cls.agent_name = name
        new_cls.jinja_env = Environment(
@@ -57,7 +56,6 @@ class BaseAgent(metaclass=AgentMeta):
        self.config = config
        self.local_sources = config.get("local_sources", [])
        self.non_interactive = config.get("non_interactive", False)
        if "max_iterations" in config:
            self.max_iterations = config["max_iterations"]
@@ -66,20 +64,24 @@ class BaseAgent(metaclass=AgentMeta):
        self.llm_config = config.get("llm_config", self.default_llm_config)
        if self.llm_config is None:
            raise ValueError("llm_config is required but not provided")
        self.llm = LLM(self.llm_config, agent_name=self.agent_name)
        state_from_config = config.get("state")
        if state_from_config is not None:
            self.state = state_from_config
        else:
            self.state = AgentState(
-                agent_name=self.agent_name,
+                agent_name="Root Agent",
                max_iterations=self.max_iterations,
            )
        self.interactive = getattr(self.llm_config, "interactive", False)
        if self.interactive and self.state.parent_id is None:
            self.state.waiting_timeout = 0
        self.llm = LLM(self.llm_config, agent_name=self.agent_name)
        with contextlib.suppress(Exception):
-            self.llm.set_agent_identity(self.agent_name, self.state.agent_id)
+            self.llm.set_agent_identity(self.state.agent_name, self.state.agent_id)
        self._current_task: asyncio.Task[Any] | None = None
        self._force_stop = False
        from strix.telemetry.tracer import get_global_tracer
@@ -132,7 +134,8 @@ class BaseAgent(metaclass=AgentMeta):
        }
        agents_graph_actions._agent_graph["nodes"][self.state.agent_id] = node
-        agents_graph_actions._agent_instances[self.state.agent_id] = self
+        with agents_graph_actions._agent_llm_stats_lock:
            agents_graph_actions._agent_instances[self.state.agent_id] = self
        agents_graph_actions._agent_states[self.state.agent_id] = self.state
        if self.state.parent_id:
@@ -157,6 +160,11 @@ class BaseAgent(metaclass=AgentMeta):
            return self._handle_sandbox_error(e, tracer)
        while True:
            if self._force_stop:
                self._force_stop = False
                await self._enter_waiting_state(tracer, was_cancelled=True)
                continue
            self._check_agent_messages(self.state)
            if self.state.is_waiting_for_input():
@@ -164,7 +172,7 @@ class BaseAgent(metaclass=AgentMeta):
                continue
            if self.state.should_stop():
-                if self.non_interactive:
+                if not self.interactive:
                    return self.state.final_result or {}
                await self._enter_waiting_state(tracer)
                continue
@@ -208,8 +216,12 @@ class BaseAgent(metaclass=AgentMeta):
                should_finish = await iteration_task
                self._current_task = None
                if should_finish is None and self.interactive:
                    await self._enter_waiting_state(tracer, text_response=True)
                    continue
                if should_finish:
-                    if self.non_interactive:
+                    if not self.interactive:
                        self.state.set_completed({"success": True})
                        if tracer:
                            tracer.update_agent_status(self.state.agent_id, "completed")
@@ -225,7 +237,7 @@ class BaseAgent(metaclass=AgentMeta):
                        self.state.add_message(
                            "assistant", f"{partial_content}\n\n[ABORTED BY USER]"
                        )
-                if self.non_interactive:
+                if not self.interactive:
                    raise
                await self._enter_waiting_state(tracer, error_occurred=False, was_cancelled=True)
                continue
@@ -238,7 +250,7 @@ class BaseAgent(metaclass=AgentMeta):
            except (RuntimeError, ValueError, TypeError) as e:
                if not await self._handle_iteration_error(e, tracer):
-                    if self.non_interactive:
+                    if not self.interactive:
                        self.state.set_completed({"success": False, "error": str(e)})
                        if tracer:
                            tracer.update_agent_status(self.state.agent_id, "failed")
@@ -247,7 +259,8 @@ class BaseAgent(metaclass=AgentMeta):
                    continue
    async def _wait_for_input(self) -> None:
-        import asyncio
+        if self._force_stop:
            return
        if self.state.has_waiting_timeout():
            self.state.resume_from_waiting()
@@ -277,11 +290,14 @@ class BaseAgent(metaclass=AgentMeta):
        task_completed: bool = False,
        error_occurred: bool = False,
        was_cancelled: bool = False,
        text_response: bool = False,
    ) -> None:
        self.state.enter_waiting_state()
        if tracer:
-            if task_completed:
+            if text_response:
                tracer.update_agent_status(self.state.agent_id, "waiting_for_input")
            elif task_completed:
                tracer.update_agent_status(self.state.agent_id, "completed")
            elif error_occurred:
                tracer.update_agent_status(self.state.agent_id, "error")
@@ -290,6 +306,9 @@ class BaseAgent(metaclass=AgentMeta):
            else:
                tracer.update_agent_status(self.state.agent_id, "stopped")
        if text_response:
            return
        if task_completed:
            self.state.add_message(
                "assistant",
@@ -327,6 +346,14 @@ class BaseAgent(metaclass=AgentMeta):
                if "agent_id" in sandbox_info:
                    self.state.sandbox_info["agent_id"] = sandbox_info["agent_id"]
                caido_port = sandbox_info.get("caido_port")
                if caido_port:
                    from strix.telemetry.tracer import get_global_tracer
                    tracer = get_global_tracer()
                    if tracer:
                        tracer.caido_url = f"localhost:{caido_port}"
            except Exception as e:
                from strix.telemetry import posthog
@@ -338,8 +365,9 @@ class BaseAgent(metaclass=AgentMeta):
        self.state.add_message("user", task)
-    async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool:
+    async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool | None:
        final_response = None
        async for response in self.llm.generate(self.state.get_conversation_history()):
            final_response = response
            if tracer and response.content:
@@ -383,7 +411,7 @@ class BaseAgent(metaclass=AgentMeta):
        if actions:
            return await self._execute_actions(actions, tracer)
-        return False
+        return None
    async def _execute_actions(self, actions: list[Any], tracer: Optional["Tracer"]) -> bool:
        """Execute actions and return True if agent should finish."""
@@ -411,7 +439,7 @@ class BaseAgent(metaclass=AgentMeta):
            self.state.set_completed({"success": True})
            if tracer:
                tracer.update_agent_status(self.state.agent_id, "completed")
-            if self.non_interactive and self.state.parent_id is None:
+            if not self.interactive and self.state.parent_id is None:
                return True
            return True
@@ -511,7 +539,7 @@ class BaseAgent(metaclass=AgentMeta):
        error_details = error.details
        self.state.add_error(error_msg)
-        if self.non_interactive:
+        if not self.interactive:
            self.state.set_completed({"success": False, "error": error_msg})
            if tracer:
                tracer.update_agent_status(self.state.agent_id, "failed", error_msg)
@@ -546,7 +574,7 @@ class BaseAgent(metaclass=AgentMeta):
        error_details = getattr(error, "details", None)
        self.state.add_error(error_msg)
-        if self.non_interactive:
+        if not self.interactive:
            self.state.set_completed({"success": False, "error": error_msg})
            if tracer:
                tracer.update_agent_status(self.state.agent_id, "failed", error_msg)
@@ -585,6 +613,11 @@ class BaseAgent(metaclass=AgentMeta):
        return True
    def cancel_current_execution(self) -> None:
        self._force_stop = True
        if self._current_task and not self._current_task.done():
-            self._current_task.cancel()
+            try:
                loop = self._current_task.get_loop()
                loop.call_soon_threadsafe(self._current_task.cancel)
            except RuntimeError:
                self._current_task.cancel()
        self._current_task = None
--- a/strix/agents/state.py
+++ b/strix/agents/state.py
@@ -25,6 +25,7 @@ class AgentState(BaseModel):
    waiting_for_input: bool = False
    llm_failed: bool = False
    waiting_start_time: datetime | None = None
    waiting_timeout: int = 600
    final_result: dict[str, Any] | None = None
    max_iterations_warning_sent: bool = False
@@ -43,7 +44,9 @@ class AgentState(BaseModel):
        self.iteration += 1
        self.last_updated = datetime.now(UTC).isoformat()
-    def add_message(self, role: str, content: Any, thinking_blocks: list[dict[str, Any]] | None = None) -> None:
+    def add_message(
        self, role: str, content: Any, thinking_blocks: list[dict[str, Any]] | None = None
    ) -> None:
        message = {"role": role, "content": content}
        if thinking_blocks:
            message["thinking_blocks"] = thinking_blocks
@@ -114,6 +117,9 @@ class AgentState(BaseModel):
        return self.iteration >= int(self.max_iterations * threshold)
    def has_waiting_timeout(self) -> bool:
        if self.waiting_timeout == 0:
            return False
        if not self.waiting_for_input or not self.waiting_start_time:
            return False
@@ -126,7 +132,7 @@ class AgentState(BaseModel):
            return False
        elapsed = (datetime.now(UTC) - self.waiting_start_time).total_seconds()
-        return elapsed > 600
+        return elapsed > self.waiting_timeout
    def has_empty_last_messages(self, count: int = 3) -> bool:
        if len(self.messages) < count:
--- a/strix/config/config.py
+++ b/strix/config/config.py
@@ -5,6 +5,9 @@ from pathlib import Path
 from typing import Any
 STRIX_API_BASE = "https://models.strix.ai/api/v1"
 class Config:
    """Configuration Manager for Strix."""
@@ -16,22 +19,42 @@ class Config:
    litellm_base_url = None
    ollama_api_base = None
    strix_reasoning_effort = "high"
    strix_llm_max_retries = "5"
    strix_memory_compressor_timeout = "30"
    llm_timeout = "300"
-    llm_rate_limit_delay = "4.0"
+    _LLM_CANONICAL_NAMES = (
-    llm_rate_limit_concurrent = "1"
+        "strix_llm",
        "llm_api_key",
        "llm_api_base",
        "openai_api_base",
        "litellm_base_url",
        "ollama_api_base",
        "strix_reasoning_effort",
        "strix_llm_max_retries",
        "strix_memory_compressor_timeout",
        "llm_timeout",
    )
    # Tool & Feature Configuration
    perplexity_api_key = None
    strix_disable_browser = "false"
    # Runtime Configuration
-    strix_image = "ghcr.io/usestrix/strix-sandbox:0.1.10"
+    strix_image = "ghcr.io/usestrix/strix-sandbox:0.1.13"
    strix_runtime_backend = "docker"
-    strix_sandbox_execution_timeout = "500"
+    strix_sandbox_execution_timeout = "120"
    strix_sandbox_connect_timeout = "10"
    # Telemetry
    strix_telemetry = "1"
    strix_otel_telemetry = None
    strix_posthog_telemetry = None
    traceloop_base_url = None
    traceloop_api_key = None
    traceloop_headers = None
    # Config file override (set via --config CLI arg)
    _config_file_override: Path | None = None
    @classmethod
    def _tracked_names(cls) -> list[str]:
@@ -45,6 +68,20 @@ class Config:
    def tracked_vars(cls) -> list[str]:
        return [name.upper() for name in cls._tracked_names()]
    @classmethod
    def _llm_env_vars(cls) -> set[str]:
        return {name.upper() for name in cls._LLM_CANONICAL_NAMES}
    @classmethod
    def _llm_env_changed(cls, saved_env: dict[str, Any]) -> bool:
        for var_name in cls._llm_env_vars():
            current = os.getenv(var_name)
            if current is None:
                continue
            if saved_env.get(var_name) != current:
                return True
        return False
    @classmethod
    def get(cls, name: str) -> str | None:
        env_name = name.upper()
@@ -57,6 +94,8 @@ class Config:
    @classmethod
    def config_file(cls) -> Path:
        if cls._config_file_override is not None:
            return cls._config_file_override
        return cls.config_dir() / "cli-config.json"
    @classmethod
@@ -75,7 +114,7 @@ class Config:
    def save(cls, config: dict[str, Any]) -> bool:
        try:
            cls.config_dir().mkdir(parents=True, exist_ok=True)
-            config_path = cls.config_file()
+            config_path = cls.config_dir() / "cli-config.json"
            with config_path.open("w", encoding="utf-8") as f:
                json.dump(config, f, indent=2)
        except OSError:
@@ -85,13 +124,30 @@ class Config:
        return True
    @classmethod
-    def apply_saved(cls) -> dict[str, str]:
+    def apply_saved(cls, force: bool = False) -> dict[str, str]:
        saved = cls.load()
        env_vars = saved.get("env", {})
        if not isinstance(env_vars, dict):
            env_vars = {}
        cleared_vars = {
            var_name
            for var_name in cls.tracked_vars()
            if var_name in os.environ and os.environ.get(var_name) == ""
        }
        if cleared_vars:
            for var_name in cleared_vars:
                env_vars.pop(var_name, None)
            if cls._config_file_override is None:
                cls.save({"env": env_vars})
        if cls._llm_env_changed(env_vars):
            for var_name in cls._llm_env_vars():
                env_vars.pop(var_name, None)
            if cls._config_file_override is None:
                cls.save({"env": env_vars})
        applied = {}
        for var_name, var_value in env_vars.items():
-            if var_name in cls.tracked_vars() and not os.getenv(var_name):
+            if var_name in cls.tracked_vars() and (force or var_name not in os.environ):
                os.environ[var_name] = var_value
                applied[var_name] = var_value
@@ -123,9 +179,37 @@ class Config:
        return cls.save({"env": merged})
-def apply_saved_config() -> dict[str, str]:
+def apply_saved_config(force: bool = False) -> dict[str, str]:
-    return Config.apply_saved()
+    return Config.apply_saved(force=force)
 def save_current_config() -> bool:
    return Config.save_current()
 def resolve_llm_config() -> tuple[str | None, str | None, str | None]:
    """Resolve LLM model, api_key, and api_base based on STRIX_LLM prefix.
    Returns:
        tuple: (model_name, api_key, api_base)
        - model_name: Original model name (strix/ prefix preserved for display)
        - api_key: LLM API key
        - api_base: API base URL (auto-set to STRIX_API_BASE for strix/ models)
    """
    model = Config.get("strix_llm")
    if not model:
        return None, None, None
    api_key = Config.get("llm_api_key")
    if model.startswith("strix/"):
        api_base: str | None = STRIX_API_BASE
    else:
        api_base = (
            Config.get("llm_api_base")
            or Config.get("openai_api_base")
            or Config.get("litellm_base_url")
            or Config.get("ollama_api_base")
        )
    return model, api_key, api_base
--- a/strix/interface/assets/tui_styles.tcss
+++ b/strix/interface/assets/tui_styles.tcss
@@ -3,6 +3,28 @@ Screen {
    color: #d4d4d4;
 }
 .screen--selection {
    background: #2d3d2f;
    color: #e5e5e5;
 }
 ToastRack {
    dock: top;
    align: right top;
    margin-bottom: 0;
    margin-top: 1;
 }
 Toast {
    width: 25;
    background: #000000;
    border-left: outer #22c55e;
 }
 Toast.-information .toast--title {
    color: #22c55e;
 }
 #splash_screen {
    height: 100%;
    width: 100%;
@@ -36,7 +58,7 @@ Screen {
 }
 #sidebar {
-    width: 25%;
+    width: 20%;
    background: transparent;
    margin-left: 1;
 }
@@ -55,12 +77,21 @@ Screen {
    margin-bottom: 0;
 }
-#stats_display {
+#stats_scroll {
    height: auto;
    max-height: 15;
    background: transparent;
    padding: 0;
    margin: 0;
    border: round #333333;
    scrollbar-size: 0 0;
 }
 #stats_display {
    height: auto;
    background: transparent;
    padding: 0 1;
    margin: 0;
 }
 #vulnerabilities_panel {
@@ -174,7 +205,7 @@ VulnerabilityDetailScreen {
 }
 #chat_area_container {
-    width: 75%;
+    width: 80%;
    background: transparent;
 }
--- a/strix/interface/cli.py
+++ b/strix/interface/cli.py
@@ -24,30 +24,26 @@ async def run_cli(args: Any) -> None:  # noqa: PLR0915
    console = Console()
    start_text = Text()
-    start_text.append("🦉 ", style="bold white")
+    start_text.append("Penetration test initiated", style="bold #22c55e")
    start_text.append("STRIX CYBERSECURITY AGENT", style="bold green")
    target_text = Text()
    target_text.append("Target", style="dim")
    target_text.append("  ")
    if len(args.targets_info) == 1:
        target_text.append("🎯 Target: ", style="bold cyan")
        target_text.append(args.targets_info[0]["original"], style="bold white")
    else:
-        target_text.append("🎯 Targets: ", style="bold cyan")
+        target_text.append(f"{len(args.targets_info)} targets", style="bold white")
-        target_text.append(f"{len(args.targets_info)} targets\n", style="bold white")
+        for target_info in args.targets_info:
-        for i, target_info in enumerate(args.targets_info):
+            target_text.append("\n        ")
            target_text.append("   • ", style="dim white")
            target_text.append(target_info["original"], style="white")
            if i < len(args.targets_info) - 1:
                target_text.append("\n")
    results_text = Text()
-    results_text.append("📊 Results will be saved to: ", style="bold cyan")
+    results_text.append("Output", style="dim")
-    results_text.append(f"strix_runs/{args.run_name}", style="bold white")
+    results_text.append("  ")
    results_text.append(f"strix_runs/{args.run_name}", style="#60a5fa")
    note_text = Text()
    note_text.append("\n\n", style="dim")
    note_text.append("⏱️  ", style="dim")
    note_text.append("This may take a while depending on target complexity. ", style="dim")
    note_text.append("Vulnerabilities will be displayed in real-time.", style="dim")
    startup_panel = Panel(
@@ -59,9 +55,9 @@ async def run_cli(args: Any) -> None:  # noqa: PLR0915
            results_text,
            note_text,
        ),
-        title="[bold green]🛡️  STRIX PENETRATION TEST INITIATED",
+        title="[bold white]STRIX",
-        title_align="center",
+        title_align="left",
-        border_style="green",
+        border_style="#22c55e",
        padding=(1, 2),
    )
@@ -76,13 +72,16 @@ async def run_cli(args: Any) -> None:  # noqa: PLR0915
        "targets": args.targets_info,
        "user_instructions": args.instruction or "",
        "run_name": args.run_name,
        "diff_scope": getattr(args, "diff_scope", {"active": False}),
    }
-    llm_config = LLMConfig(scan_mode=scan_mode)
+    llm_config = LLMConfig(
        scan_mode=scan_mode,
        is_whitebox=bool(getattr(args, "local_sources", [])),
    )
    agent_config = {
        "llm_config": llm_config,
        "max_iterations": 300,
        "non_interactive": True,
    }
    if getattr(args, "local_sources", None):
@@ -110,7 +109,10 @@ async def run_cli(args: Any) -> None:  # noqa: PLR0915
    tracer.vulnerability_found_callback = display_vulnerability
    def cleanup_on_exit() -> None:
        from strix.runtime import cleanup_runtime
        tracer.cleanup()
        cleanup_runtime()
    def signal_handler(_signum: int, _frame: Any) -> None:
        tracer.cleanup()
@@ -126,8 +128,7 @@ async def run_cli(args: Any) -> None:  # noqa: PLR0915
    def create_live_status() -> Panel:
        status_text = Text()
-        status_text.append("🦉 ", style="bold white")
+        status_text.append("Penetration test in progress", style="bold #22c55e")
        status_text.append("Running penetration test...", style="bold #22c55e")
        status_text.append("\n\n")
        stats_text = build_live_stats_text(tracer, agent_config)
@@ -136,8 +137,8 @@ async def run_cli(args: Any) -> None:  # noqa: PLR0915
        return Panel(
            status_text,
-            title="[bold #22c55e]🔍 Live Penetration Test Status",
+            title="[bold white]STRIX",
-            title_align="center",
+            title_align="left",
            border_style="#22c55e",
            padding=(1, 2),
        )
@@ -169,7 +170,7 @@ async def run_cli(args: Any) -> None:  # noqa: PLR0915
                    error_msg = result.get("error", "Unknown error")
                    error_details = result.get("details")
                    console.print()
-                    console.print(f"[bold red]❌ Penetration test failed:[/] {error_msg}")
+                    console.print(f"[bold red]Penetration test failed:[/] {error_msg}")
                    if error_details:
                        console.print(f"[dim]{error_details}[/]")
                    console.print()
@@ -186,8 +187,7 @@ async def run_cli(args: Any) -> None:  # noqa: PLR0915
        console.print()
        final_report_text = Text()
-        final_report_text.append("📄 ", style="bold cyan")
+        final_report_text.append("Penetration test summary", style="bold #60a5fa")
        final_report_text.append("FINAL PENETRATION TEST REPORT", style="bold cyan")
        final_report_panel = Panel(
            Text.assemble(
@@ -195,9 +195,9 @@ async def run_cli(args: Any) -> None:  # noqa: PLR0915
                "\n\n",
                tracer.final_scan_result,
            ),
-            title="[bold cyan]📊 PENETRATION TEST SUMMARY",
+            title="[bold white]STRIX",
-            title_align="center",
+            title_align="left",
-            border_style="cyan",
+            border_style="#60a5fa",
            padding=(1, 2),
        )
--- a/strix/interface/main.py
+++ b/strix/interface/main.py
@@ -18,6 +18,8 @@ from rich.panel import Panel
 from rich.text import Text
 from strix.config import Config, apply_saved_config, save_current_config
 from strix.config.config import resolve_llm_config
 from strix.llm.utils import resolve_strix_model
 apply_saved_config()
@@ -34,7 +36,9 @@ from strix.interface.utils import (  # noqa: E402
    image_exists,
    infer_target_type,
    process_pull_line,
    resolve_diff_scope_context,
    rewrite_localhost_targets,
    validate_config_file,
    validate_llm_response,
 )
 from strix.runtime.docker_runtime import HOST_GATEWAY_HOSTNAME  # noqa: E402
@@ -50,10 +54,13 @@ def validate_environment() -> None:  # noqa: PLR0912, PLR0915
    missing_required_vars = []
    missing_optional_vars = []
-    if not Config.get("strix_llm"):
+    strix_llm = Config.get("strix_llm")
    uses_strix_models = strix_llm and strix_llm.startswith("strix/")
    if not strix_llm:
        missing_required_vars.append("STRIX_LLM")
-    has_base_url = any(
+    has_base_url = uses_strix_models or any(
        [
            Config.get("llm_api_base"),
            Config.get("openai_api_base"),
@@ -76,7 +83,6 @@ def validate_environment() -> None:  # noqa: PLR0912, PLR0915
    if missing_required_vars:
        error_text = Text()
        error_text.append("❌ ", style="bold red")
        error_text.append("MISSING REQUIRED ENVIRONMENT VARIABLES", style="bold red")
        error_text.append("\n\n", style="white")
@@ -96,7 +102,7 @@ def validate_environment() -> None:  # noqa: PLR0912, PLR0915
                error_text.append("• ", style="white")
                error_text.append("STRIX_LLM", style="bold cyan")
                error_text.append(
-                    " - Model name to use with litellm (e.g., 'openai/gpt-5')\n",
+                    " - Model name to use with litellm (e.g., 'openai/gpt-5.4')\n",
                    style="white",
                )
@@ -135,7 +141,7 @@ def validate_environment() -> None:  # noqa: PLR0912, PLR0915
                    )
        error_text.append("\nExample setup:\n", style="white")
-        error_text.append("export STRIX_LLM='openai/gpt-5'\n", style="dim white")
+        error_text.append("export STRIX_LLM='openai/gpt-5.4'\n", style="dim white")
        if missing_optional_vars:
            for var in missing_optional_vars:
@@ -163,8 +169,8 @@ def validate_environment() -> None:  # noqa: PLR0912, PLR0915
        panel = Panel(
            error_text,
-            title="[bold red]🛡️  STRIX CONFIGURATION ERROR",
+            title="[bold white]STRIX",
-            title_align="center",
+            title_align="left",
            border_style="red",
            padding=(1, 2),
        )
@@ -179,7 +185,6 @@ def check_docker_installed() -> None:
    if shutil.which("docker") is None:
        console = Console()
        error_text = Text()
        error_text.append("❌ ", style="bold red")
        error_text.append("DOCKER NOT INSTALLED", style="bold red")
        error_text.append("\n\n", style="white")
        error_text.append("The 'docker' CLI was not found in your PATH.\n", style="white")
@@ -189,8 +194,8 @@ def check_docker_installed() -> None:
        panel = Panel(
            error_text,
-            title="[bold red]🛡️  STRIX STARTUP ERROR",
+            title="[bold white]STRIX",
-            title_align="center",
+            title_align="left",
            border_style="red",
            padding=(1, 2),
        )
@@ -202,14 +207,9 @@ async def warm_up_llm() -> None:
    console = Console()
    try:
-        model_name = Config.get("strix_llm")
+        model_name, api_key, api_base = resolve_llm_config()
-        api_key = Config.get("llm_api_key")
+        litellm_model, _ = resolve_strix_model(model_name)
-        api_base = (
+        litellm_model = litellm_model or model_name
            Config.get("llm_api_base")
            or Config.get("openai_api_base")
            or Config.get("litellm_base_url")
            or Config.get("ollama_api_base")
        )
        test_messages = [
            {"role": "system", "content": "You are a helpful assistant."},
@@ -219,7 +219,7 @@ async def warm_up_llm() -> None:
        llm_timeout = int(Config.get("llm_timeout") or "300")
        completion_kwargs: dict[str, Any] = {
-            "model": model_name,
+            "model": litellm_model,
            "messages": test_messages,
            "timeout": llm_timeout,
        }
@@ -234,7 +234,6 @@ async def warm_up_llm() -> None:
    except Exception as e:  # noqa: BLE001
        error_text = Text()
        error_text.append("❌ ", style="bold red")
        error_text.append("LLM CONNECTION FAILED", style="bold red")
        error_text.append("\n\n", style="white")
        error_text.append("Could not establish connection to the language model.\n", style="white")
@@ -243,8 +242,8 @@ async def warm_up_llm() -> None:
        panel = Panel(
            error_text,
-            title="[bold red]🛡️  STRIX STARTUP ERROR",
+            title="[bold white]STRIX",
-            title_align="center",
+            title_align="left",
            border_style="red",
            padding=(1, 2),
        )
@@ -359,6 +358,34 @@ Examples:
        ),
    )
    parser.add_argument(
        "--scope-mode",
        type=str,
        choices=["auto", "diff", "full"],
        default="auto",
        help=(
            "Scope mode for code targets: "
            "'auto' enables PR diff-scope in CI/headless runs, "
            "'diff' forces changed-files scope, "
            "'full' disables diff-scope."
        ),
    )
    parser.add_argument(
        "--diff-base",
        type=str,
        help=(
            "Target branch or commit to compare against (e.g., origin/main). "
            "Defaults to the repository's default branch."
        ),
    )
    parser.add_argument(
        "--config",
        type=str,
        help="Path to a custom config file (JSON) to use instead of ~/.strix/cli-config.json",
    )
    args = parser.parse_args()
    if args.instruction and args.instruction_file:
@@ -406,54 +433,45 @@ def display_completion_message(args: argparse.Namespace, results_path: Path) ->
    if tracer and tracer.scan_results:
        scan_completed = tracer.scan_results.get("scan_completed", False)
    has_vulnerabilities = tracer and len(tracer.vulnerability_reports) > 0
    completion_text = Text()
    if scan_completed:
-        completion_text.append("🦉 ", style="bold white")
+        completion_text.append("Penetration test completed", style="bold #22c55e")
        completion_text.append("AGENT FINISHED", style="bold green")
        completion_text.append(" • ", style="dim white")
        completion_text.append("Penetration test completed", style="white")
    else:
-        completion_text.append("🦉 ", style="bold white")
+        completion_text.append("SESSION ENDED", style="bold #eab308")
        completion_text.append("SESSION ENDED", style="bold yellow")
        completion_text.append(" • ", style="dim white")
        completion_text.append("Penetration test interrupted by user", style="white")
    stats_text = build_final_stats_text(tracer)
    target_text = Text()
    target_text.append("Target", style="dim")
    target_text.append("  ")
    if len(args.targets_info) == 1:
        target_text.append("🎯 Target: ", style="bold cyan")
        target_text.append(args.targets_info[0]["original"], style="bold white")
    else:
-        target_text.append("🎯 Targets: ", style="bold cyan")
+        target_text.append(f"{len(args.targets_info)} targets", style="bold white")
-        target_text.append(f"{len(args.targets_info)} targets\n", style="bold white")
+        for target_info in args.targets_info:
-        for i, target_info in enumerate(args.targets_info):
+            target_text.append("\n        ")
            target_text.append("   • ", style="dim white")
            target_text.append(target_info["original"], style="white")
-            if i < len(args.targets_info) - 1:
+
-                target_text.append("\n")
+    stats_text = build_final_stats_text(tracer)
    panel_parts = [completion_text, "\n\n", target_text]
    if stats_text.plain:
        panel_parts.extend(["\n", stats_text])
-    if scan_completed or has_vulnerabilities:
+    results_text = Text()
-        results_text = Text()
+    results_text.append("\n")
-        results_text.append("📊 Results Saved To: ", style="bold cyan")
+    results_text.append("Output", style="dim")
-        results_text.append(str(results_path), style="bold yellow")
+    results_text.append("  ")
-        panel_parts.extend(["\n\n", results_text])
+    results_text.append(str(results_path), style="#60a5fa")
    panel_parts.extend(["\n", results_text])
    panel_content = Text.assemble(*panel_parts)
-    border_style = "green" if scan_completed else "yellow"
+    border_style = "#22c55e" if scan_completed else "#eab308"
    panel = Panel(
        panel_content,
-        title="[bold green]🛡️  STRIX CYBERSECURITY AGENT",
+        title="[bold white]STRIX",
-        title_align="center",
+        title_align="left",
        border_style=border_style,
        padding=(1, 2),
    )
@@ -461,8 +479,7 @@ def display_completion_message(args: argparse.Namespace, results_path: Path) ->
    console.print("\n")
    console.print(panel)
    console.print()
-    console.print("[dim]🌐 Website:[/] [cyan]https://strix.ai[/]")
+    console.print("[#60a5fa]strix.ai[/]  [dim]·[/]  [#60a5fa]discord.gg/strix-ai[/]")
    console.print("[dim]💬 Discord:[/] [cyan]https://discord.gg/YjKFvEZSdZ[/]")
    console.print()
@@ -474,7 +491,7 @@ def pull_docker_image() -> None:
        return
    console.print()
-    console.print(f"[bold cyan]🐳 Pulling Docker image:[/] {Config.get('strix_image')}")
+    console.print(f"[dim]Pulling image[/] {Config.get('strix_image')}")
    console.print("[dim yellow]This only happens on first run and may take a few minutes...[/]")
    console.print()
@@ -489,7 +506,6 @@ def pull_docker_image() -> None:
        except DockerException as e:
            console.print()
            error_text = Text()
            error_text.append("❌ ", style="bold red")
            error_text.append("FAILED TO PULL IMAGE", style="bold red")
            error_text.append("\n\n", style="white")
            error_text.append(f"Could not download: {Config.get('strix_image')}\n", style="white")
@@ -497,8 +513,8 @@ def pull_docker_image() -> None:
            panel = Panel(
                error_text,
-                title="[bold red]🛡️  DOCKER PULL ERROR",
+                title="[bold white]STRIX",
-                title_align="center",
+                title_align="left",
                border_style="red",
                padding=(1, 2),
            )
@@ -506,25 +522,37 @@ def pull_docker_image() -> None:
            sys.exit(1)
    success_text = Text()
-    success_text.append("✅ ", style="bold green")
+    success_text.append("Docker image ready", style="#22c55e")
    success_text.append("Successfully pulled Docker image", style="green")
    console.print(success_text)
    console.print()
-def main() -> None:
+def apply_config_override(config_path: str) -> None:
    Config._config_file_override = validate_config_file(config_path)
    apply_saved_config(force=True)
 def persist_config() -> None:
    if Config._config_file_override is None:
        save_current_config()
 def main() -> None:  # noqa: PLR0912, PLR0915
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    args = parse_arguments()
    if args.config:
        apply_config_override(args.config)
    check_docker_installed()
    pull_docker_image()
    validate_environment()
    asyncio.run(warm_up_llm())
-    save_current_config()
+    persist_config()
    args.run_name = generate_run_name(args.targets_info)
@@ -536,6 +564,38 @@ def main() -> None:
            target_info["details"]["cloned_repo_path"] = cloned_path
    args.local_sources = collect_local_sources(args.targets_info)
    try:
        diff_scope = resolve_diff_scope_context(
            local_sources=args.local_sources,
            scope_mode=args.scope_mode,
            diff_base=args.diff_base,
            non_interactive=args.non_interactive,
        )
    except ValueError as e:
        console = Console()
        error_text = Text()
        error_text.append("DIFF SCOPE RESOLUTION FAILED", style="bold red")
        error_text.append("\n\n", style="white")
        error_text.append(str(e), style="white")
        panel = Panel(
            error_text,
            title="[bold white]STRIX",
            title_align="left",
            border_style="red",
            padding=(1, 2),
        )
        console.print("\n")
        console.print(panel)
        console.print()
        sys.exit(1)
    args.diff_scope = diff_scope.metadata
    if diff_scope.instruction_block:
        if args.instruction:
            args.instruction = f"{diff_scope.instruction_block}\n\n{args.instruction}"
        else:
            args.instruction = diff_scope.instruction_block
    is_whitebox = bool(args.local_sources)
--- a/strix/interface/streaming_parser.py
+++ b/strix/interface/streaming_parser.py
@@ -3,8 +3,16 @@ import re
 from dataclasses import dataclass
 from typing import Literal
 from strix.llm.utils import normalize_tool_format
 _FUNCTION_TAG_PREFIX = "<function="
 _INVOKE_TAG_PREFIX = "<invoke "
 _FUNC_PATTERN = re.compile(r"<function=([^>]+)>")
 _FUNC_END_PATTERN = re.compile(r"</function>")
 _COMPLETE_PARAM_PATTERN = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)
 _INCOMPLETE_PARAM_PATTERN = re.compile(r"<parameter=([^>]+)>(.*)$", re.DOTALL)
 def _get_safe_content(content: str) -> tuple[str, str]:
@@ -16,9 +24,8 @@ def _get_safe_content(content: str) -> tuple[str, str]:
        return content, ""
    suffix = content[last_lt:]
    target = _FUNCTION_TAG_PREFIX  # "<function="
-    if target.startswith(suffix):
+    if _FUNCTION_TAG_PREFIX.startswith(suffix) or _INVOKE_TAG_PREFIX.startswith(suffix):
        return content[:last_lt], suffix
    return content, ""
@@ -37,10 +44,11 @@ def parse_streaming_content(content: str) -> list[StreamSegment]:
    if not content:
        return []
    content = normalize_tool_format(content)
    segments: list[StreamSegment] = []
-    func_pattern = r"<function=([^>]+)>"
+    func_matches = list(_FUNC_PATTERN.finditer(content))
    func_matches = list(re.finditer(func_pattern, content))
    if not func_matches:
        safe_content, _ = _get_safe_content(content)
@@ -59,12 +67,12 @@ def parse_streaming_content(content: str) -> list[StreamSegment]:
        tool_name = match.group(1)
        func_start = match.end()
-        func_end_match = re.search(r"</function>", content[func_start:])
+        func_end_match = _FUNC_END_PATTERN.search(content, func_start)
        if func_end_match:
-            func_body = content[func_start : func_start + func_end_match.start()]
+            func_body = content[func_start : func_end_match.start()]
            is_complete = True
-            end_pos = func_start + func_end_match.end()
+            end_pos = func_end_match.end()
        else:
            if i + 1 < len(func_matches):
                next_func_start = func_matches[i + 1].start()
@@ -98,8 +106,7 @@ def parse_streaming_content(content: str) -> list[StreamSegment]:
 def _parse_streaming_params(func_body: str) -> dict[str, str]:
    args: dict[str, str] = {}
-    complete_pattern = r"<parameter=([^>]+)>(.*?)</parameter>"
+    complete_matches = list(_COMPLETE_PARAM_PATTERN.finditer(func_body))
    complete_matches = list(re.finditer(complete_pattern, func_body, re.DOTALL))
    complete_end_pos = 0
    for match in complete_matches:
@@ -109,8 +116,7 @@ def _parse_streaming_params(func_body: str) -> dict[str, str]:
        complete_end_pos = max(complete_end_pos, match.end())
    remaining = func_body[complete_end_pos:]
-    incomplete_pattern = r"<parameter=([^>]+)>(.*)$"
+    incomplete_match = _INCOMPLETE_PARAM_PATTERN.search(remaining)
    incomplete_match = re.search(incomplete_pattern, remaining, re.DOTALL)
    if incomplete_match:
        param_name = incomplete_match.group(1)
        param_value = html.unescape(incomplete_match.group(2).strip())
--- a/strix/interface/tool_components/init.py
+++ b/strix/interface/tool_components/init.py
@@ -4,6 +4,7 @@ from . import (
    browser_renderer,
    file_edit_renderer,
    finish_renderer,
    load_skill_renderer,
    notes_renderer,
    proxy_renderer,
    python_renderer,
@@ -28,6 +29,7 @@ __all__ = [
    "file_edit_renderer",
    "finish_renderer",
    "get_tool_renderer",
    "load_skill_renderer",
    "notes_renderer",
    "proxy_renderer",
    "python_renderer",
--- a/strix/interface/tool_components/agents_graph_renderer.py
+++ b/strix/interface/tool_components/agents_graph_renderer.py
@@ -92,12 +92,13 @@ class AgentFinishRenderer(BaseToolRenderer):
        success = args.get("success", True)
        text = Text()
        text.append("🏁 ")
        if success:
-            text.append("Agent completed", style="bold #fbbf24")
+            text.append("◆ ", style="#22c55e")
            text.append("Agent completed", style="bold #22c55e")
        else:
-            text.append("Agent failed", style="bold #fbbf24")
+            text.append("◆ ", style="#ef4444")
            text.append("Agent failed", style="bold #ef4444")
        if result_summary:
            text.append("\n  ")
--- a/strix/interface/tool_components/browser_renderer.py
+++ b/strix/interface/tool_components/browser_renderer.py
@@ -64,7 +64,7 @@ class BrowserRenderer(BaseToolRenderer):
        args = tool_data.get("args", {})
        status = tool_data.get("status", "unknown")
-        action = args.get("action", "unknown")
+        action = args.get("action", "")
        content = cls._build_content(action, args)
        css_classes = cls.get_css_classes(status)
@@ -131,5 +131,6 @@ class BrowserRenderer(BaseToolRenderer):
                text.append_text(cls._highlight_js(js_code))
            return text
-        text.append(action, style="#06b6d4")
+        if action:
            text.append(action, style="#06b6d4")
        return text
--- a/strix/interface/tool_components/file_edit_renderer.py
+++ b/strix/interface/tool_components/file_edit_renderer.py
@@ -65,16 +65,16 @@ class StrReplaceEditorRenderer(BaseToolRenderer):
        text = Text()
        icons_and_labels = {
-            "view": ("📖 ", "Reading file", "#10b981"),
+            "view": ("◇ ", "read", "#10b981"),
-            "str_replace": ("✏️ ", "Editing file", "#10b981"),
+            "str_replace": ("◇ ", "edit", "#10b981"),
-            "create": ("📝 ", "Creating file", "#10b981"),
+            "create": ("◇ ", "create", "#10b981"),
-            "insert": ("✏️ ", "Inserting text", "#10b981"),
+            "insert": ("◇ ", "insert", "#10b981"),
-            "undo_edit": ("↩️ ", "Undoing edit", "#10b981"),
+            "undo_edit": ("◇ ", "undo", "#10b981"),
        }
-        icon, label, color = icons_and_labels.get(command, ("📄 ", "File operation", "#10b981"))
+        icon, label, color = icons_and_labels.get(command, ("◇ ", "file", "#10b981"))
-        text.append(icon)
+        text.append(icon, style=color)
-        text.append(label, style=f"bold {color}")
+        text.append(label, style="dim")
        if path:
            path_display = path[-60:] if len(path) > 60 else path
@@ -132,8 +132,8 @@ class ListFilesRenderer(BaseToolRenderer):
        path = args.get("path", "")
        text = Text()
-        text.append("📂 ")
+        text.append("◇ ", style="#10b981")
-        text.append("Listing files", style="bold #10b981")
+        text.append("list", style="dim")
        text.append(" ")
        if path:
@@ -158,23 +158,20 @@ class SearchFilesRenderer(BaseToolRenderer):
        regex = args.get("regex", "")
        text = Text()
-        text.append("🔍 ")
+        text.append("◇ ", style="#a855f7")
-        text.append("Searching files", style="bold purple")
+        text.append("search", style="dim")
-        text.append(" ")
+        text.append("  ")
        if path and regex:
            text.append(path, style="dim")
-            text.append(" for '", style="dim")
+            text.append(" ", style="dim")
-            text.append(regex, style="dim")
+            text.append(regex, style="#a855f7")
            text.append("'", style="dim")
        elif path:
            text.append(path, style="dim")
        elif regex:
-            text.append("'", style="dim")
+            text.append(regex, style="#a855f7")
            text.append(regex, style="dim")
            text.append("'", style="dim")
        else:
-            text.append("Searching...", style="dim")
+            text.append("...", style="dim")
        css_classes = cls.get_css_classes("completed")
        return Static(text, classes=css_classes)
--- a/strix/interface/tool_components/finish_renderer.py
+++ b/strix/interface/tool_components/finish_renderer.py
@@ -1,6 +1,5 @@
 from typing import Any, ClassVar
 from rich.padding import Padding
 from rich.text import Text
 from textual.widgets import Static
@@ -9,7 +8,6 @@ from .registry import register_tool_renderer
 FIELD_STYLE = "bold #4ade80"
 BG_COLOR = "#141414"
@register_tool_renderer
@@ -27,8 +25,8 @@ class FinishScanRenderer(BaseToolRenderer):
        recommendations = args.get("recommendations", "")
        text = Text()
-        text.append("🏁 ")
+        text.append("◆ ", style="#22c55e")
-        text.append("Finishing Scan", style="bold #dc2626")
+        text.append("Penetration test completed", style="bold #22c55e")
        if executive_summary:
            text.append("\n\n")
@@ -58,7 +56,10 @@ class FinishScanRenderer(BaseToolRenderer):
            text.append("\n  ")
            text.append("Generating final report...", style="dim")
-        padded = Padding(text, 2, style=f"on {BG_COLOR}")
+        padded = Text()
        padded.append("\n\n")
        padded.append_text(text)
        padded.append("\n\n")
        css_classes = cls.get_css_classes("completed")
        return Static(padded, classes=css_classes)
--- a/strix/interface/tool_components/load_skill_renderer.py
+++ b/strix/interface/tool_components/load_skill_renderer.py
@@ -0,0 +1,33 @@
 from typing import Any, ClassVar
 from rich.text import Text
 from textual.widgets import Static
 from .base_renderer import BaseToolRenderer
 from .registry import register_tool_renderer
@register_tool_renderer
 class LoadSkillRenderer(BaseToolRenderer):
    tool_name: ClassVar[str] = "load_skill"
    css_classes: ClassVar[list[str]] = ["tool-call", "load-skill-tool"]
    @classmethod
    def render(cls, tool_data: dict[str, Any]) -> Static:
        args = tool_data.get("args", {})
        status = tool_data.get("status", "completed")
        requested = args.get("skills", "")
        text = Text()
        text.append("◇ ", style="#10b981")
        text.append("loading skill", style="dim")
        if requested:
            text.append(" ")
            text.append(requested, style="#10b981")
        elif not tool_data.get("result"):
            text.append("\n  ")
            text.append("Loading...", style="dim")
        return Static(text, classes=cls.get_css_classes(status))
--- a/strix/interface/tool_components/notes_renderer.py
+++ b/strix/interface/tool_components/notes_renderer.py
@@ -21,8 +21,8 @@ class CreateNoteRenderer(BaseToolRenderer):
        category = args.get("category", "general")
        text = Text()
-        text.append("📝 ")
+        text.append("◇ ", style="#fbbf24")
-        text.append("Note", style="bold #fbbf24")
+        text.append("note", style="dim")
        text.append(" ")
        text.append(f"({category})", style="dim")
@@ -50,8 +50,8 @@ class DeleteNoteRenderer(BaseToolRenderer):
    @classmethod
    def render(cls, tool_data: dict[str, Any]) -> Static:  # noqa: ARG003
        text = Text()
-        text.append("📝 ")
+        text.append("◇ ", style="#fbbf24")
-        text.append("Note Removed", style="bold #94a3b8")
+        text.append("note removed", style="dim")
        css_classes = cls.get_css_classes("completed")
        return Static(text, classes=css_classes)
@@ -70,8 +70,8 @@ class UpdateNoteRenderer(BaseToolRenderer):
        content = args.get("content")
        text = Text()
-        text.append("📝 ")
+        text.append("◇ ", style="#fbbf24")
-        text.append("Note Updated", style="bold #fbbf24")
+        text.append("note updated", style="dim")
        if title:
            text.append("\n  ")
@@ -99,8 +99,8 @@ class ListNotesRenderer(BaseToolRenderer):
        result = tool_data.get("result")
        text = Text()
-        text.append("📝 ")
+        text.append("◇ ", style="#fbbf24")
-        text.append("Notes", style="bold #fbbf24")
+        text.append("notes", style="dim")
        if isinstance(result, str) and result.strip():
            text.append("\n  ")
@@ -117,6 +117,8 @@ class ListNotesRenderer(BaseToolRenderer):
                    title = note.get("title", "").strip() or "(untitled)"
                    category = note.get("category", "general")
                    note_content = note.get("content", "").strip()
                    if not note_content:
                        note_content = note.get("content_preview", "").strip()
                    text.append("\n  - ")
                    text.append(title)
@@ -131,3 +133,35 @@ class ListNotesRenderer(BaseToolRenderer):
        css_classes = cls.get_css_classes("completed")
        return Static(text, classes=css_classes)
@register_tool_renderer
 class GetNoteRenderer(BaseToolRenderer):
    tool_name: ClassVar[str] = "get_note"
    css_classes: ClassVar[list[str]] = ["tool-call", "notes-tool"]
    @classmethod
    def render(cls, tool_data: dict[str, Any]) -> Static:
        result = tool_data.get("result")
        text = Text()
        text.append("◇ ", style="#fbbf24")
        text.append("note read", style="dim")
        if result and isinstance(result, dict) and result.get("success"):
            note = result.get("note", {}) or {}
            title = str(note.get("title", "")).strip() or "(untitled)"
            category = note.get("category", "general")
            content = str(note.get("content", "")).strip()
            text.append("\n  ")
            text.append(title)
            text.append(f" ({category})", style="dim")
            if content:
                text.append("\n  ")
                text.append(content, style="dim")
        else:
            text.append("\n  ")
            text.append("Loading...", style="dim")
        css_classes = cls.get_css_classes("completed")
        return Static(text, classes=css_classes)
--- a/strix/interface/tool_components/proxy_renderer.py
+++ b/strix/interface/tool_components/proxy_renderer.py
@@ -7,53 +7,105 @@ from .base_renderer import BaseToolRenderer
 from .registry import register_tool_renderer
 PROXY_ICON = "<~>"
 MAX_REQUESTS_DISPLAY = 20
 MAX_LINE_LENGTH = 200
 def _truncate(text: str, max_len: int = 80) -> str:
    return text[: max_len - 3] + "..." if len(text) > max_len else text
 def _sanitize(text: str, max_len: int = 150) -> str:
    """Remove newlines and truncate text."""
    clean = text.replace("\n", " ").replace("\r", "").replace("\t", " ")
    return _truncate(clean, max_len)
 def _status_style(code: int | None) -> str:
    if code is None:
        return "dim"
    if 200 <= code < 300:
        return "#22c55e"  # green
    if 300 <= code < 400:
        return "#eab308"  # yellow
    if 400 <= code < 500:
        return "#f97316"  # orange
    if code >= 500:
        return "#ef4444"  # red
    return "dim"
@register_tool_renderer
 class ListRequestsRenderer(BaseToolRenderer):
    tool_name: ClassVar[str] = "list_requests"
    css_classes: ClassVar[list[str]] = ["tool-call", "proxy-tool"]
    @classmethod
-    def render(cls, tool_data: dict[str, Any]) -> Static:
+    def render(cls, tool_data: dict[str, Any]) -> Static:  # noqa: PLR0912  # noqa: PLR0912
        args = tool_data.get("args", {})
        result = tool_data.get("result")
        status = tool_data.get("status", "running")
        httpql_filter = args.get("httpql_filter")
        sort_by = args.get("sort_by")
        sort_order = args.get("sort_order")
        scope_id = args.get("scope_id")
        text = Text()
-        text.append("📋 ")
+        text.append(PROXY_ICON, style="dim")
-        text.append("Listing requests", style="bold #06b6d4")
+        text.append(" listing requests", style="#06b6d4")
-        if isinstance(result, str) and result.strip():
+        if httpql_filter:
-            text.append("\n  ")
+            text.append(f"  where {_truncate(httpql_filter, 150)}", style="dim italic")
-            text.append(result.strip(), style="dim")
+
-        elif result and isinstance(result, dict) and "requests" in result:
+        meta_parts = []
-            requests = result["requests"]
+        if sort_by and sort_by != "timestamp":
-            if isinstance(requests, list) and requests:
+            meta_parts.append(f"by:{sort_by}")
-                for req in requests[:25]:
+        if sort_order and sort_order != "desc":
-                    if isinstance(req, dict):
+            meta_parts.append(sort_order)
-                        method = req.get("method", "?")
+        if scope_id and isinstance(scope_id, str):
-                        path = req.get("path", "?")
+            meta_parts.append(f"scope:{scope_id[:8]}")
-                        response = req.get("response") or {}
+        if meta_parts:
-                        status = response.get("statusCode", "?")
+            text.append(f"  ({', '.join(meta_parts)})", style="dim")
-                        text.append("\n  ")
+
-                        text.append(f"{method} {path} → {status}", style="dim")
+        if status == "completed" and isinstance(result, dict):
-                if len(requests) > 25:
+            if "error" in result:
-                    text.append("\n  ")
+                text.append(f"  error: {_sanitize(str(result['error']), 150)}", style="#ef4444")
                    text.append(f"... +{len(requests) - 25} more", style="dim")
            else:
-                text.append("\n  ")
+                total = result.get("total_count", 0)
-                text.append("No requests found", style="dim")
+                requests = result.get("requests", [])
        elif httpql_filter:
            filter_display = (
                httpql_filter[:500] + "..." if len(httpql_filter) > 500 else httpql_filter
            )
            text.append("\n  ")
            text.append(filter_display, style="dim")
        else:
            text.append("\n  ")
            text.append("All requests", style="dim")
-        css_classes = cls.get_css_classes("completed")
+                text.append(f"  [{total} found]", style="dim")
                if requests and isinstance(requests, list):
                    text.append("\n")
                    for i, req in enumerate(requests[:MAX_REQUESTS_DISPLAY]):
                        if not isinstance(req, dict):
                            continue
                        method = req.get("method", "?")
                        host = req.get("host", "")
                        path = req.get("path", "/")
                        resp = req.get("response") or {}
                        code = resp.get("statusCode") if isinstance(resp, dict) else None
                        text.append("  ")
                        text.append(f"{method:6}", style="#a78bfa")
                        text.append(f" {_truncate(host + path, 180)}", style="dim")
                        if code:
                            text.append(f" {code}", style=_status_style(code))
                        if i < min(len(requests), MAX_REQUESTS_DISPLAY) - 1:
                            text.append("\n")
                    if len(requests) > MAX_REQUESTS_DISPLAY:
                        text.append("\n")
                        text.append(
                            f"  ... +{len(requests) - MAX_REQUESTS_DISPLAY} more",
                            style="dim italic",
                        )
        css_classes = cls.get_css_classes(status)
        return Static(text, classes=css_classes)
@@ -63,46 +115,83 @@ class ViewRequestRenderer(BaseToolRenderer):
    css_classes: ClassVar[list[str]] = ["tool-call", "proxy-tool"]
    @classmethod
-    def render(cls, tool_data: dict[str, Any]) -> Static:
+    def render(cls, tool_data: dict[str, Any]) -> Static:  # noqa: PLR0912, PLR0915
        args = tool_data.get("args", {})
        result = tool_data.get("result")
        status = tool_data.get("status", "running")
        request_id = args.get("request_id", "")
        part = args.get("part", "request")
        search_pattern = args.get("search_pattern")
        text = Text()
-        text.append("👀 ")
+        text.append(PROXY_ICON, style="dim")
        text.append(f"Viewing {part}", style="bold #06b6d4")
-        if isinstance(result, str) and result.strip():
+        action = "searching" if search_pattern else "viewing"
-            text.append("\n  ")
+        text.append(f" {action} {part}", style="#06b6d4")
-            text.append(result.strip(), style="dim")
+
-        elif result and isinstance(result, dict):
+        if request_id:
-            if "content" in result:
+            text.append(f" #{request_id}", style="dim")
-                content = result["content"]
+
-                content_preview = content[:2000] + "..." if len(content) > 2000 else content
+        if search_pattern:
-                text.append("\n  ")
+            text.append(f"  /{_truncate(search_pattern, 100)}/", style="dim italic")
-                text.append(content_preview, style="dim")
+
        if status == "completed" and isinstance(result, dict):
            if "error" in result:
                text.append(f"  error: {_sanitize(str(result['error']), 150)}", style="#ef4444")
            elif "matches" in result:
-                matches = result["matches"]
+                matches = result.get("matches", [])
-                if isinstance(matches, list) and matches:
+                total = result.get("total_matches", len(matches))
-                    for match in matches[:25]:
+                text.append(f"  [{total} matches]", style="dim")
                        if isinstance(match, dict) and "match" in match:
                            text.append("\n  ")
                            text.append(match["match"], style="dim")
                    if len(matches) > 25:
                        text.append("\n  ")
                        text.append(f"... +{len(matches) - 25} more matches", style="dim")
                else:
                    text.append("\n  ")
                    text.append("No matches found", style="dim")
            else:
                text.append("\n  ")
                text.append("Viewing content...", style="dim")
        else:
            text.append("\n  ")
            text.append("Loading...", style="dim")
-        css_classes = cls.get_css_classes("completed")
+                if matches and isinstance(matches, list):
                    text.append("\n")
                    for i, m in enumerate(matches[:5]):
                        if not isinstance(m, dict):
                            continue
                        before = m.get("before", "") or ""
                        match_text = m.get("match", "") or ""
                        after = m.get("after", "") or ""
                        before = before.replace("\n", " ").replace("\r", "")[-100:]
                        after = after.replace("\n", " ").replace("\r", "")[:100]
                        text.append("  ")
                        if before:
                            text.append(f"...{before}", style="dim")
                        text.append(match_text, style="#22c55e bold")
                        if after:
                            text.append(f"{after}...", style="dim")
                        if i < min(len(matches), 5) - 1:
                            text.append("\n")
                    if len(matches) > 5:
                        text.append("\n")
                        text.append(f"  ... +{len(matches) - 5} more matches", style="dim italic")
            elif "content" in result:
                showing = result.get("showing_lines", "")
                has_more = result.get("has_more", False)
                content = result.get("content", "")
                text.append(f"  [{showing}]", style="dim")
                if content and isinstance(content, str):
                    lines = content.split("\n")[:15]
                    text.append("\n")
                    for i, line in enumerate(lines):
                        text.append("  ")
                        text.append(_truncate(line, MAX_LINE_LENGTH), style="dim")
                        if i < len(lines) - 1:
                            text.append("\n")
                    if has_more or len(lines) > 15:
                        text.append("\n")
                        text.append("  ... more content available", style="dim italic")
        css_classes = cls.get_css_classes(status)
        return Static(text, classes=css_classes)
@@ -112,45 +201,71 @@ class SendRequestRenderer(BaseToolRenderer):
    css_classes: ClassVar[list[str]] = ["tool-call", "proxy-tool"]
    @classmethod
-    def render(cls, tool_data: dict[str, Any]) -> Static:
+    def render(cls, tool_data: dict[str, Any]) -> Static:  # noqa: PLR0912, PLR0915
        args = tool_data.get("args", {})
        result = tool_data.get("result")
        status = tool_data.get("status", "running")
        method = args.get("method", "GET")
        url = args.get("url", "")
        req_headers = args.get("headers")
        req_body = args.get("body", "")
        text = Text()
-        text.append("📤 ")
+        text.append(PROXY_ICON, style="dim")
-        text.append(f"Sending {method}", style="bold #06b6d4")
+        text.append(" sending request", style="#06b6d4")
-        if isinstance(result, str) and result.strip():
+        text.append("\n")
-            text.append("\n  ")
+        text.append("  >> ", style="#3b82f6")
-            text.append(result.strip(), style="dim")
+        text.append(method, style="#a78bfa")
-        elif result and isinstance(result, dict):
+        text.append(f" {_truncate(url, 180)}", style="dim")
            status_code = result.get("status_code")
            response_body = result.get("body", "")
-            if status_code:
+        if req_headers and isinstance(req_headers, dict):
-                text.append("\n  ")
+            for k, v in list(req_headers.items())[:5]:
-                text.append(f"Status: {status_code}", style="dim")
+                text.append("\n")
-                if response_body:
+                text.append("  >> ", style="#3b82f6")
-                    body_preview = (
+                text.append(f"{k}: ", style="dim")
-                        response_body[:2000] + "..." if len(response_body) > 2000 else response_body
+                text.append(_sanitize(str(v), 150), style="dim")
-                    )
+
-                    text.append("\n  ")
+        if req_body and isinstance(req_body, str):
-                    text.append(body_preview, style="dim")
+            text.append("\n")
            text.append("  >> ", style="#3b82f6")
            body_lines = req_body.split("\n")[:4]
            for i, line in enumerate(body_lines):
                if i > 0:
                    text.append("\n")
                    text.append("     ", style="dim")
                text.append(_truncate(line, MAX_LINE_LENGTH), style="dim")
            if len(req_body.split("\n")) > 4:
                text.append(" ...", style="dim italic")
        if status == "completed" and isinstance(result, dict):
            if "error" in result:
                text.append(f"\n  error: {_sanitize(str(result['error']), 150)}", style="#ef4444")
            else:
-                text.append("\n  ")
+                code = result.get("status_code")
-                text.append("Response received", style="dim")
+                time_ms = result.get("response_time_ms")
        elif url:
            url_display = url[:500] + "..." if len(url) > 500 else url
            text.append("\n  ")
            text.append(url_display, style="dim")
        else:
            text.append("\n  ")
            text.append("Sending...", style="dim")
-        css_classes = cls.get_css_classes("completed")
+                text.append("\n")
                text.append("  << ", style="#22c55e")
                if code:
                    text.append(f"{code}", style=_status_style(code))
                if time_ms:
                    text.append(f" ({time_ms}ms)", style="dim")
                body = result.get("body", "")
                if body and isinstance(body, str):
                    lines = body.split("\n")[:6]
                    for line in lines:
                        text.append("\n")
                        text.append("  << ", style="#22c55e")
                        text.append(_truncate(line, MAX_LINE_LENGTH - 5), style="dim")
                    if len(body.split("\n")) > 6:
                        text.append("\n")
                        text.append("  ...", style="dim italic")
        css_classes = cls.get_css_classes(status)
        return Static(text, classes=css_classes)
@@ -160,45 +275,99 @@ class RepeatRequestRenderer(BaseToolRenderer):
    css_classes: ClassVar[list[str]] = ["tool-call", "proxy-tool"]
    @classmethod
-    def render(cls, tool_data: dict[str, Any]) -> Static:
+    def render(cls, tool_data: dict[str, Any]) -> Static:  # noqa: PLR0912, PLR0915
        args = tool_data.get("args", {})
        result = tool_data.get("result")
        status = tool_data.get("status", "running")
-        modifications = args.get("modifications", {})
+        request_id = args.get("request_id", "")
        modifications = args.get("modifications")
        text = Text()
-        text.append("🔄 ")
+        text.append(PROXY_ICON, style="dim")
-        text.append("Repeating request", style="bold #06b6d4")
+        text.append(" repeating request", style="#06b6d4")
-        if isinstance(result, str) and result.strip():
+        if request_id:
-            text.append("\n  ")
+            text.append(f" #{request_id}", style="dim")
            text.append(result.strip(), style="dim")
        elif result and isinstance(result, dict):
            status_code = result.get("status_code")
            response_body = result.get("body", "")
-            if status_code:
+        if modifications and isinstance(modifications, dict):
-                text.append("\n  ")
+            text.append("\n  modifications:", style="dim italic")
-                text.append(f"Status: {status_code}", style="dim")
+
-                if response_body:
+            if "url" in modifications:
-                    body_preview = (
+                text.append("\n")
-                        response_body[:2000] + "..." if len(response_body) > 2000 else response_body
+                text.append("  >> ", style="#3b82f6")
-                    )
+                text.append(f"url: {_truncate(str(modifications['url']), 180)}", style="dim")
-                    text.append("\n  ")
+
-                    text.append(body_preview, style="dim")
+            if "headers" in modifications and isinstance(modifications["headers"], dict):
                for k, v in list(modifications["headers"].items())[:5]:
                    text.append("\n")
                    text.append("  >> ", style="#3b82f6")
                    text.append(f"{k}: {_sanitize(str(v), 150)}", style="dim")
            if "cookies" in modifications and isinstance(modifications["cookies"], dict):
                for k, v in list(modifications["cookies"].items())[:5]:
                    text.append("\n")
                    text.append("  >> ", style="#3b82f6")
                    text.append(f"cookie {k}={_sanitize(str(v), 100)}", style="dim")
            if "params" in modifications and isinstance(modifications["params"], dict):
                for k, v in list(modifications["params"].items())[:5]:
                    text.append("\n")
                    text.append("  >> ", style="#3b82f6")
                    text.append(f"param {k}={_sanitize(str(v), 100)}", style="dim")
            if "body" in modifications and isinstance(modifications["body"], str):
                text.append("\n")
                text.append("  >> ", style="#3b82f6")
                body_lines = modifications["body"].split("\n")[:4]
                for i, line in enumerate(body_lines):
                    if i > 0:
                        text.append("\n")
                        text.append("     ", style="dim")
                    text.append(_truncate(line, MAX_LINE_LENGTH), style="dim")
                if len(modifications["body"].split("\n")) > 4:
                    text.append(" ...", style="dim italic")
        elif modifications and isinstance(modifications, str):
            text.append(f"\n  {_truncate(modifications, 200)}", style="dim italic")
        if status == "completed" and isinstance(result, dict):
            if "error" in result:
                text.append(f"\n  error: {_sanitize(str(result['error']), 150)}", style="#ef4444")
            else:
-                text.append("\n  ")
+                req = result.get("request", {})
-                text.append("Response received", style="dim")
+                method = req.get("method", "")
-        elif modifications:
+                url = req.get("url", "")
-            mod_str = str(modifications)
+                code = result.get("status_code")
-            mod_display = mod_str[:500] + "..." if len(mod_str) > 500 else mod_str
+                time_ms = result.get("response_time_ms")
            text.append("\n  ")
            text.append(mod_display, style="dim")
        else:
            text.append("\n  ")
            text.append("No modifications", style="dim")
-        css_classes = cls.get_css_classes("completed")
+                text.append("\n")
                text.append("  >> ", style="#3b82f6")
                if method:
                    text.append(f"{method} ", style="#a78bfa")
                if url:
                    text.append(_truncate(url, 180), style="dim")
                text.append("\n")
                text.append("  << ", style="#22c55e")
                if code:
                    text.append(f"{code}", style=_status_style(code))
                if time_ms:
                    text.append(f" ({time_ms}ms)", style="dim")
                body = result.get("body", "")
                if body and isinstance(body, str):
                    lines = body.split("\n")[:5]
                    for line in lines:
                        text.append("\n")
                        text.append("  << ", style="#22c55e")
                        text.append(_truncate(line, MAX_LINE_LENGTH - 5), style="dim")
                    if len(body.split("\n")) > 5:
                        text.append("\n")
                        text.append("  ...", style="dim italic")
        css_classes = cls.get_css_classes(status)
        return Static(text, classes=css_classes)
@@ -208,14 +377,87 @@ class ScopeRulesRenderer(BaseToolRenderer):
    css_classes: ClassVar[list[str]] = ["tool-call", "proxy-tool"]
    @classmethod
-    def render(cls, tool_data: dict[str, Any]) -> Static:  # noqa: ARG003
+    def render(cls, tool_data: dict[str, Any]) -> Static:  # noqa: PLR0912, PLR0915
-        text = Text()
+        args = tool_data.get("args", {})
-        text.append("⚙️ ")
+        result = tool_data.get("result")
-        text.append("Updating proxy scope", style="bold #06b6d4")
+        status = tool_data.get("status", "running")
        text.append("\n  ")
        text.append("Configuring...", style="dim")
-        css_classes = cls.get_css_classes("completed")
+        action = args.get("action", "")
        scope_name = args.get("scope_name", "")
        scope_id = args.get("scope_id", "")
        allowlist = args.get("allowlist")
        denylist = args.get("denylist")
        text = Text()
        text.append(PROXY_ICON, style="dim")
        action_map = {
            "get": "getting",
            "list": "listing",
            "create": "creating",
            "update": "updating",
            "delete": "deleting",
        }
        action_text = action_map.get(action, action + "ing" if action else "managing")
        text.append(f" {action_text} proxy scope", style="#06b6d4")
        if scope_name:
            text.append(f" '{_truncate(scope_name, 50)}'", style="dim italic")
        if scope_id and isinstance(scope_id, str):
            text.append(f" #{scope_id[:8]}", style="dim")
        if allowlist and isinstance(allowlist, list):
            allow_str = ", ".join(_truncate(str(a), 40) for a in allowlist[:4])
            text.append(f"\n  allow: {allow_str}", style="dim")
            if len(allowlist) > 4:
                text.append(f" +{len(allowlist) - 4}", style="dim italic")
        if denylist and isinstance(denylist, list):
            deny_str = ", ".join(_truncate(str(d), 40) for d in denylist[:4])
            text.append(f"\n  deny: {deny_str}", style="dim")
            if len(denylist) > 4:
                text.append(f" +{len(denylist) - 4}", style="dim italic")
        if status == "completed" and isinstance(result, dict):
            if "error" in result:
                text.append(f"  error: {_sanitize(str(result['error']), 150)}", style="#ef4444")
            elif "scopes" in result:
                scopes = result.get("scopes", [])
                text.append(f"  [{len(scopes)} scopes]", style="dim")
                if scopes and isinstance(scopes, list):
                    text.append("\n")
                    for i, scope in enumerate(scopes[:5]):
                        if not isinstance(scope, dict):
                            continue
                        name = scope.get("name", "?")
                        allow = scope.get("allowlist") or []
                        text.append("  ")
                        text.append(_truncate(str(name), 40), style="#22c55e")
                        if allow and isinstance(allow, list):
                            allow_str = ", ".join(_truncate(str(a), 30) for a in allow[:3])
                            text.append(f"  {allow_str}", style="dim")
                            if len(allow) > 3:
                                text.append(f" +{len(allow) - 3}", style="dim italic")
                        if i < min(len(scopes), 5) - 1:
                            text.append("\n")
            elif "scope" in result:
                scope = result.get("scope") or {}
                if isinstance(scope, dict):
                    allow = scope.get("allowlist") or []
                    deny = scope.get("denylist") or []
                    if allow and isinstance(allow, list):
                        allow_str = ", ".join(_truncate(str(a), 40) for a in allow[:5])
                        text.append(f"\n  allow: {allow_str}", style="dim")
                    if deny and isinstance(deny, list):
                        deny_str = ", ".join(_truncate(str(d), 40) for d in deny[:5])
                        text.append(f"\n  deny: {deny_str}", style="dim")
            elif "message" in result:
                text.append(f"  {result['message']}", style="#22c55e")
        css_classes = cls.get_css_classes(status)
        return Static(text, classes=css_classes)
@@ -225,36 +467,81 @@ class ListSitemapRenderer(BaseToolRenderer):
    css_classes: ClassVar[list[str]] = ["tool-call", "proxy-tool"]
    @classmethod
-    def render(cls, tool_data: dict[str, Any]) -> Static:
+    def render(cls, tool_data: dict[str, Any]) -> Static:  # noqa: PLR0912, PLR0915
        args = tool_data.get("args", {})
        result = tool_data.get("result")
        status = tool_data.get("status", "running")
        parent_id = args.get("parent_id")
        scope_id = args.get("scope_id")
        depth = args.get("depth")
        text = Text()
-        text.append("🗺️ ")
+        text.append(PROXY_ICON, style="dim")
-        text.append("Listing sitemap", style="bold #06b6d4")
+        text.append(" listing sitemap", style="#06b6d4")
-        if isinstance(result, str) and result.strip():
+        if parent_id:
-            text.append("\n  ")
+            text.append(f"  under #{_truncate(str(parent_id), 20)}", style="dim")
-            text.append(result.strip(), style="dim")
+
-        elif result and isinstance(result, dict) and "entries" in result:
+        meta_parts = []
-            entries = result["entries"]
+        if scope_id and isinstance(scope_id, str):
-            if isinstance(entries, list) and entries:
+            meta_parts.append(f"scope:{scope_id[:8]}")
-                for entry in entries[:30]:
+        if depth and depth != "DIRECT":
-                    if isinstance(entry, dict):
+            meta_parts.append(depth.lower())
-                        label = entry.get("label", "?")
+        if meta_parts:
-                        kind = entry.get("kind", "?")
+            text.append(f"  ({', '.join(meta_parts)})", style="dim")
-                        text.append("\n  ")
+
-                        text.append(f"{kind}: {label}", style="dim")
+        if status == "completed" and isinstance(result, dict):
-                if len(entries) > 30:
+            if "error" in result:
-                    text.append("\n  ")
+                text.append(f"  error: {_sanitize(str(result['error']), 150)}", style="#ef4444")
                    text.append(f"... +{len(entries) - 30} more entries", style="dim")
            else:
-                text.append("\n  ")
+                total = result.get("total_count", 0)
-                text.append("No entries found", style="dim")
+                entries = result.get("entries", [])
        else:
            text.append("\n  ")
            text.append("Loading...", style="dim")
-        css_classes = cls.get_css_classes("completed")
+                text.append(f"  [{total} entries]", style="dim")
                if entries and isinstance(entries, list):
                    text.append("\n")
                    for i, entry in enumerate(entries[:MAX_REQUESTS_DISPLAY]):
                        if not isinstance(entry, dict):
                            continue
                        kind = entry.get("kind") or "?"
                        label = entry.get("label") or "?"
                        has_children = entry.get("hasDescendants", False)
                        req = entry.get("request") or {}
                        kind_style = {
                            "DOMAIN": "#f59e0b",
                            "DIRECTORY": "#3b82f6",
                            "REQUEST": "#22c55e",
                        }.get(kind, "dim")
                        text.append("  ")
                        kind_abbr = kind[:3] if isinstance(kind, str) else "?"
                        text.append(f"{kind_abbr:3}", style=kind_style)
                        text.append(f" {_truncate(label, 150)}", style="dim")
                        if req:
                            method = req.get("method", "")
                            code = req.get("status")
                            if method:
                                text.append(f" {method}", style="#a78bfa")
                            if code:
                                text.append(f" {code}", style=_status_style(code))
                        if has_children:
                            text.append(" +", style="dim italic")
                        if i < min(len(entries), MAX_REQUESTS_DISPLAY) - 1:
                            text.append("\n")
                    if len(entries) > MAX_REQUESTS_DISPLAY:
                        text.append("\n")
                        text.append(
                            f"  ... +{len(entries) - MAX_REQUESTS_DISPLAY} more", style="dim italic"
                        )
        css_classes = cls.get_css_classes(status)
        return Static(text, classes=css_classes)
@@ -264,33 +551,60 @@ class ViewSitemapEntryRenderer(BaseToolRenderer):
    css_classes: ClassVar[list[str]] = ["tool-call", "proxy-tool"]
    @classmethod
-    def render(cls, tool_data: dict[str, Any]) -> Static:
+    def render(cls, tool_data: dict[str, Any]) -> Static:  # noqa: PLR0912
        args = tool_data.get("args", {})
        result = tool_data.get("result")
        status = tool_data.get("status", "running")
        entry_id = args.get("entry_id", "")
        text = Text()
-        text.append("📍 ")
+        text.append(PROXY_ICON, style="dim")
-        text.append("Viewing sitemap entry", style="bold #06b6d4")
+        text.append(" viewing sitemap", style="#06b6d4")
-        if isinstance(result, str) and result.strip():
+        if entry_id:
-            text.append("\n  ")
+            text.append(f" #{_truncate(str(entry_id), 20)}", style="dim")
-            text.append(result.strip(), style="dim")
+
-        elif result and isinstance(result, dict) and "entry" in result:
+        if status == "completed" and isinstance(result, dict):
-            entry = result["entry"]
+            if "error" in result:
-            if isinstance(entry, dict):
+                text.append(f"  error: {_sanitize(str(result['error']), 150)}", style="#ef4444")
-                label = entry.get("label", "")
+            elif "entry" in result:
                entry = result.get("entry") or {}
                if not isinstance(entry, dict):
                    entry = {}
                kind = entry.get("kind", "")
-                if label and kind:
+                label = entry.get("label", "")
-                    text.append("\n  ")
+                related = entry.get("related_requests") or {}
-                    text.append(f"{kind}: {label}", style="dim")
+                related_reqs = related.get("requests", []) if isinstance(related, dict) else []
-                else:
+                total_related = related.get("total_count", 0) if isinstance(related, dict) else 0
                    text.append("\n  ")
                    text.append("Entry details loaded", style="dim")
            else:
                text.append("\n  ")
                text.append("Entry details loaded", style="dim")
        else:
            text.append("\n  ")
            text.append("Loading...", style="dim")
-        css_classes = cls.get_css_classes("completed")
+                if kind and label:
                    text.append(f"  {kind}: {_truncate(label, 120)}", style="dim")
                if total_related:
                    text.append(f"  [{total_related} requests]", style="dim")
                if related_reqs and isinstance(related_reqs, list):
                    text.append("\n")
                    for i, req in enumerate(related_reqs[:10]):
                        if not isinstance(req, dict):
                            continue
                        method = req.get("method", "?")
                        path = req.get("path", "/")
                        code = req.get("status")
                        text.append("  ")
                        text.append(f"{method:6}", style="#a78bfa")
                        text.append(f" {_truncate(path, 180)}", style="dim")
                        if code:
                            text.append(f" {code}", style=_status_style(code))
                        if i < min(len(related_reqs), 10) - 1:
                            text.append("\n")
                    if len(related_reqs) > 10:
                        text.append("\n")
                        text.append(f"  ... +{len(related_reqs) - 10} more", style="dim italic")
        css_classes = cls.get_css_classes(status)
        return Static(text, classes=css_classes)
--- a/strix/interface/tool_components/python_renderer.py
+++ b/strix/interface/tool_components/python_renderer.py
@@ -14,6 +14,8 @@ from .registry import register_tool_renderer
 MAX_OUTPUT_LINES = 50
 MAX_LINE_LENGTH = 200
 ANSI_PATTERN = re.compile(r"\x1b(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~]|\][^\x07]*\x07)")
 STRIP_PATTERNS = [
    r"\.\.\. \[(stdout|stderr|result|output|error) truncated at \d+k? chars\]",
 ]
@@ -25,31 +27,32 @@ def _get_style_colors() -> dict[Any, str]:
    return {token: f"#{style_def['color']}" for token, style_def in style if style_def["color"]}
@cache
 def _get_lexer() -> PythonLexer:
    return PythonLexer()
@cache
 def _get_token_color(token_type: Any) -> str | None:
    colors = _get_style_colors()
    while token_type:
        if token_type in colors:
            return colors[token_type]
        token_type = token_type.parent
    return None
@register_tool_renderer
 class PythonRenderer(BaseToolRenderer):
    tool_name: ClassVar[str] = "python_action"
    css_classes: ClassVar[list[str]] = ["tool-call", "python-tool"]
    @classmethod
    def _get_token_color(cls, token_type: Any) -> str | None:
        colors = _get_style_colors()
        while token_type:
            if token_type in colors:
                return colors[token_type]
            token_type = token_type.parent
        return None
    @classmethod
    def _highlight_python(cls, code: str) -> Text:
        lexer = PythonLexer()
        text = Text()
-
+        for token_type, token_value in _get_lexer().get_tokens(code):
-        for token_type, token_value in lexer.get_tokens(code):
+            if token_value:
-            if not token_value:
+                text.append(token_value, style=_get_token_color(token_type))
                continue
            color = cls._get_token_color(token_type)
            text.append(token_value, style=color)
        return text
    @classmethod
@@ -59,11 +62,16 @@ class PythonRenderer(BaseToolRenderer):
            cleaned = re.sub(pattern, "", cleaned)
        return cleaned.strip()
    @classmethod
    def _strip_ansi(cls, text: str) -> str:
        return ANSI_PATTERN.sub("", text)
    @classmethod
    def _truncate_line(cls, line: str) -> str:
-        if len(line) > MAX_LINE_LENGTH:
+        clean_line = cls._strip_ansi(line)
-            return line[: MAX_LINE_LENGTH - 3] + "..."
+        if len(clean_line) > MAX_LINE_LENGTH:
-        return line
+            return clean_line[: MAX_LINE_LENGTH - 3] + "..."
        return clean_line
    @classmethod
    def _format_output(cls, output: str) -> Text:
@@ -112,22 +120,13 @@ class PythonRenderer(BaseToolRenderer):
            return
        stdout = result.get("stdout", "")
        stderr = result.get("stderr", "")
        stdout = cls._clean_output(stdout) if stdout else ""
        stderr = cls._clean_output(stderr) if stderr else ""
        if stdout:
            text.append("\n")
            formatted_output = cls._format_output(stdout)
            text.append_text(formatted_output)
        if stderr:
            text.append("\n")
            text.append("  stderr: ", style="bold #ef4444")
            formatted_stderr = cls._format_output(stderr)
            text.append_text(formatted_stderr)
    @classmethod
    def render(cls, tool_data: dict[str, Any]) -> Static:
        args = tool_data.get("args", {})
--- a/strix/interface/tool_components/reporting_renderer.py
+++ b/strix/interface/tool_components/reporting_renderer.py
@@ -3,10 +3,14 @@ from typing import Any, ClassVar
 from pygments.lexers import PythonLexer
 from pygments.styles import get_style_by_name
 from rich.padding import Padding
 from rich.text import Text
 from textual.widgets import Static
 from strix.tools.reporting.reporting_actions import (
    parse_code_locations_xml,
    parse_cvss_xml,
 )
 from .base_renderer import BaseToolRenderer
 from .registry import register_tool_renderer
@@ -18,7 +22,13 @@ def _get_style_colors() -> dict[Any, str]:
 FIELD_STYLE = "bold #4ade80"
-BG_COLOR = "#141414"
+DIM_STYLE = "dim"
 FILE_STYLE = "bold #60a5fa"
 LINE_STYLE = "#facc15"
 LABEL_STYLE = "italic #a1a1aa"
 CODE_STYLE = "#e2e8f0"
 BEFORE_STYLE = "#ef4444"
 AFTER_STYLE = "#22c55e"
@register_tool_renderer
@@ -82,18 +92,13 @@ class CreateVulnerabilityReportRenderer(BaseToolRenderer):
        poc_script_code = args.get("poc_script_code", "")
        remediation_steps = args.get("remediation_steps", "")
-        attack_vector = args.get("attack_vector", "")
+        cvss_breakdown_xml = args.get("cvss_breakdown", "")
-        attack_complexity = args.get("attack_complexity", "")
+        code_locations_xml = args.get("code_locations", "")
        privileges_required = args.get("privileges_required", "")
        user_interaction = args.get("user_interaction", "")
        scope = args.get("scope", "")
        confidentiality = args.get("confidentiality", "")
        integrity = args.get("integrity", "")
        availability = args.get("availability", "")
        endpoint = args.get("endpoint", "")
        method = args.get("method", "")
        cve = args.get("cve", "")
        cwe = args.get("cwe", "")
        severity = ""
        cvss_score = None
@@ -142,38 +147,30 @@ class CreateVulnerabilityReportRenderer(BaseToolRenderer):
            text.append("CVE: ", style=FIELD_STYLE)
            text.append(cve)
-        if any(
+        if cwe:
-            [
+            text.append("\n\n")
-                attack_vector,
+            text.append("CWE: ", style=FIELD_STYLE)
-                attack_complexity,
+            text.append(cwe)
-                privileges_required,
+
-                user_interaction,
+        parsed_cvss = parse_cvss_xml(cvss_breakdown_xml) if cvss_breakdown_xml else None
-                scope,
+        if parsed_cvss:
                confidentiality,
                integrity,
                availability,
            ]
        ):
            text.append("\n\n")
            cvss_parts = []
-            if attack_vector:
+            for key, prefix in [
-                cvss_parts.append(f"AV:{attack_vector}")
+                ("attack_vector", "AV"),
-            if attack_complexity:
+                ("attack_complexity", "AC"),
-                cvss_parts.append(f"AC:{attack_complexity}")
+                ("privileges_required", "PR"),
-            if privileges_required:
+                ("user_interaction", "UI"),
-                cvss_parts.append(f"PR:{privileges_required}")
+                ("scope", "S"),
-            if user_interaction:
+                ("confidentiality", "C"),
-                cvss_parts.append(f"UI:{user_interaction}")
+                ("integrity", "I"),
-            if scope:
+                ("availability", "A"),
-                cvss_parts.append(f"S:{scope}")
+            ]:
-            if confidentiality:
+                val = parsed_cvss.get(key)
-                cvss_parts.append(f"C:{confidentiality}")
+                if val:
-            if integrity:
+                    cvss_parts.append(f"{prefix}:{val}")
                cvss_parts.append(f"I:{integrity}")
            if availability:
                cvss_parts.append(f"A:{availability}")
            text.append("CVSS Vector: ", style=FIELD_STYLE)
-            text.append("/".join(cvss_parts), style="dim")
+            text.append("/".join(cvss_parts), style=DIM_STYLE)
        if description:
            text.append("\n\n")
@@ -193,6 +190,40 @@ class CreateVulnerabilityReportRenderer(BaseToolRenderer):
            text.append("\n")
            text.append(technical_analysis)
        parsed_locations = (
            parse_code_locations_xml(code_locations_xml) if code_locations_xml else None
        )
        if parsed_locations:
            text.append("\n\n")
            text.append("Code Locations", style=FIELD_STYLE)
            for i, loc in enumerate(parsed_locations):
                text.append("\n\n")
                text.append(f"  Location {i + 1}: ", style=DIM_STYLE)
                text.append(loc.get("file", "unknown"), style=FILE_STYLE)
                start = loc.get("start_line")
                end = loc.get("end_line")
                if start is not None:
                    if end and end != start:
                        text.append(f":{start}-{end}", style=LINE_STYLE)
                    else:
                        text.append(f":{start}", style=LINE_STYLE)
                if loc.get("label"):
                    text.append(f"\n  {loc['label']}", style=LABEL_STYLE)
                if loc.get("snippet"):
                    text.append("\n  ")
                    text.append(loc["snippet"], style=CODE_STYLE)
                if loc.get("fix_before") or loc.get("fix_after"):
                    text.append("\n  ")
                    text.append("Fix:", style=DIM_STYLE)
                    if loc.get("fix_before"):
                        text.append("\n  ")
                        text.append("- ", style=BEFORE_STYLE)
                        text.append(loc["fix_before"], style=BEFORE_STYLE)
                    if loc.get("fix_after"):
                        text.append("\n  ")
                        text.append("+ ", style=AFTER_STYLE)
                        text.append(loc["fix_after"], style=AFTER_STYLE)
        if poc_description:
            text.append("\n\n")
            text.append("PoC Description", style=FIELD_STYLE)
@@ -215,7 +246,10 @@ class CreateVulnerabilityReportRenderer(BaseToolRenderer):
            text.append("\n  ")
            text.append("Creating report...", style="dim")
-        padded = Padding(text, 2, style=f"on {BG_COLOR}")
+        padded = Text()
        padded.append("\n\n")
        padded.append_text(text)
        padded.append("\n\n")
        css_classes = cls.get_css_classes("completed")
        return Static(padded, classes=css_classes)
--- a/strix/interface/tool_components/scan_info_renderer.py
+++ b/strix/interface/tool_components/scan_info_renderer.py
@@ -19,7 +19,8 @@ class ScanStartInfoRenderer(BaseToolRenderer):
        targets = args.get("targets", [])
        text = Text()
-        text.append("🚀 Starting penetration test")
+        text.append("◈ ", style="#22c55e")
        text.append("Starting penetration test")
        if len(targets) == 1:
            text.append(" on ")
--- a/strix/interface/tui.py
+++ b/strix/interface/tui.py
@@ -18,7 +18,7 @@ from rich.align import Align
 from rich.console import Group
 from rich.panel import Panel
 from rich.style import Style
-from rich.text import Text
+from rich.text import Span, Text
 from textual import events, on
 from textual.app import App, ComposeResult
 from textual.binding import Binding
@@ -29,11 +29,18 @@ from textual.widgets import Button, Label, Static, TextArea, Tree
 from textual.widgets.tree import TreeNode
 from strix.agents.StrixAgent import StrixAgent
 from strix.interface.streaming_parser import parse_streaming_content
 from strix.interface.tool_components.agent_message_renderer import AgentMessageRenderer
 from strix.interface.tool_components.registry import get_tool_renderer
 from strix.interface.tool_components.user_message_renderer import UserMessageRenderer
 from strix.interface.utils import build_tui_stats_text
 from strix.llm.config import LLMConfig
 from strix.telemetry.tracer import Tracer, set_global_tracer
 logger = logging.getLogger(__name__)
 def get_package_version() -> str:
    try:
        return pkg_version("strix-agent")
@@ -87,6 +94,7 @@ class ChatTextArea(TextArea):  # type: ignore[misc]
 class SplashScreen(Static):  # type: ignore[misc]
    ALLOW_SELECT = False
    PRIMARY_GREEN = "#22c55e"
    BANNER = (
        " ███████╗████████╗██████╗ ██╗██╗  ██╗\n"
@@ -188,7 +196,7 @@ class SplashScreen(Static):  # type: ignore[misc]
 class HelpScreen(ModalScreen):  # type: ignore[misc]
    def compose(self) -> ComposeResult:
        yield Grid(
-            Label("🦉 Strix Help", id="help_title"),
+            Label("Strix Help", id="help_title"),
            Label(
                "F1        Help\nCtrl+Q/C  Quit\nESC       Stop Agent\n"
                "Enter     Send message to agent\nTab       Switch panels\n↑/↓       Navigate tree",
@@ -244,10 +252,9 @@ class StopAgentScreen(ModalScreen):  # type: ignore[misc]
            event.prevent_default()
    def on_button_pressed(self, event: Button.Pressed) -> None:
        self.app.pop_screen()
        if event.button.id == "stop_agent":
            self.app.action_confirm_stop_agent(self.agent_id)
        else:
            self.app.pop_screen()
 class VulnerabilityDetailScreen(ModalScreen):  # type: ignore[misc]
@@ -523,16 +530,30 @@ class VulnerabilityDetailScreen(ModalScreen):  # type: ignore[misc]
                lines.append("```")
        # Code Analysis
-        if vuln.get("code_file") or vuln.get("code_diff"):
+        if vuln.get("code_locations"):
            lines.extend(["", "## Code Analysis", ""])
-            if vuln.get("code_file"):
+            for i, loc in enumerate(vuln["code_locations"]):
-                lines.append(f"**File:** {vuln['code_file']}")
+                file_ref = loc.get("file", "unknown")
                line_ref = ""
                if loc.get("start_line") is not None:
                    if loc.get("end_line") and loc["end_line"] != loc["start_line"]:
                        line_ref = f" (lines {loc['start_line']}-{loc['end_line']})"
                    else:
                        line_ref = f" (line {loc['start_line']})"
                lines.append(f"**Location {i + 1}:** `{file_ref}`{line_ref}")
                if loc.get("label"):
                    lines.append(f"  {loc['label']}")
                if loc.get("snippet"):
                    lines.append(f"```\n{loc['snippet']}\n```")
                if loc.get("fix_before") or loc.get("fix_after"):
                    lines.append("**Suggested Fix:**")
                    lines.append("```diff")
                    if loc.get("fix_before"):
                        lines.extend(f"- {line}" for line in loc["fix_before"].splitlines())
                    if loc.get("fix_after"):
                        lines.extend(f"+ {line}" for line in loc["fix_after"].splitlines())
                    lines.append("```")
                lines.append("")
            if vuln.get("code_diff"):
                lines.append("**Changes:**")
                lines.append("```diff")
                lines.append(vuln["code_diff"])
                lines.append("```")
        # Remediation
        if vuln.get("remediation_steps"):
@@ -663,8 +684,9 @@ class QuitScreen(ModalScreen):  # type: ignore[misc]
 class StrixTUIApp(App):  # type: ignore[misc]
    CSS_PATH = "assets/tui_styles.tcss"
    ALLOW_SELECT = True
-    SIDEBAR_MIN_WIDTH = 100
+    SIDEBAR_MIN_WIDTH = 120
    selected_agent_id: reactive[str | None] = reactive(default=None)
    show_splash: reactive[bool] = reactive(default=True)
@@ -691,6 +713,9 @@ class StrixTUIApp(App):  # type: ignore[misc]
        self._displayed_agents: set[str] = set()
        self._displayed_events: list[str] = []
        self._streaming_render_cache: dict[str, tuple[int, Any]] = {}
        self._last_streaming_len: dict[str, int] = {}
        self._scan_thread: threading.Thread | None = None
        self._scan_stop_event = threading.Event()
        self._scan_completed = threading.Event()
@@ -717,11 +742,16 @@ class StrixTUIApp(App):  # type: ignore[misc]
            "targets": args.targets_info,
            "user_instructions": args.instruction or "",
            "run_name": args.run_name,
            "diff_scope": getattr(args, "diff_scope", {"active": False}),
        }
    def _build_agent_config(self, args: argparse.Namespace) -> dict[str, Any]:
        scan_mode = getattr(args, "scan_mode", "deep")
-        llm_config = LLMConfig(scan_mode=scan_mode)
+        llm_config = LLMConfig(
            scan_mode=scan_mode,
            interactive=True,
            is_whitebox=bool(getattr(args, "local_sources", [])),
        )
        config = {
            "llm_config": llm_config,
@@ -735,7 +765,10 @@ class StrixTUIApp(App):  # type: ignore[misc]
    def _setup_cleanup_handlers(self) -> None:
        def cleanup_on_exit() -> None:
            from strix.runtime import cleanup_runtime
            self.tracer.cleanup()
            cleanup_runtime()
        def signal_handler(_signum: int, _frame: Any) -> None:
            self.tracer.cleanup()
@@ -773,13 +806,16 @@ class StrixTUIApp(App):  # type: ignore[misc]
            chat_history.can_focus = True
            status_text = Static("", id="status_text")
            status_text.ALLOW_SELECT = False
            keymap_indicator = Static("", id="keymap_indicator")
            keymap_indicator.ALLOW_SELECT = False
            agent_status_display = Horizontal(
                status_text, keymap_indicator, id="agent_status_display", classes="hidden"
            )
            chat_prompt = Static("> ", id="chat_prompt")
            chat_prompt.ALLOW_SELECT = False
            chat_input = ChatTextArea(
                "",
                id="chat_input",
@@ -788,7 +824,7 @@ class StrixTUIApp(App):  # type: ignore[misc]
            chat_input.set_app_reference(self)
            chat_input_container = Horizontal(chat_prompt, chat_input, id="chat_input_container")
-            agents_tree = Tree("🤖 Active Agents", id="agents_tree")
+            agents_tree = Tree("Agents", id="agents_tree")
            agents_tree.root.expand()
            agents_tree.show_root = False
@@ -797,10 +833,11 @@ class StrixTUIApp(App):  # type: ignore[misc]
            agents_tree.guide_style = "dashed"
            stats_display = Static("", id="stats_display")
            stats_scroll = VerticalScroll(stats_display, id="stats_scroll")
            vulnerabilities_panel = VulnerabilitiesPanel(id="vulnerabilities_panel")
-            sidebar = Vertical(agents_tree, vulnerabilities_panel, stats_display, id="sidebar")
+            sidebar = Vertical(agents_tree, vulnerabilities_panel, stats_scroll, id="sidebar")
            content_container.mount(chat_area_container)
            content_container.mount(sidebar)
@@ -853,7 +890,7 @@ class StrixTUIApp(App):  # type: ignore[misc]
        self._start_scan_thread()
-        self.set_interval(0.25, self._update_ui_from_tracer)
+        self.set_interval(0.35, self._update_ui_from_tracer)
    def _update_ui_from_tracer(self) -> None:
        if self.show_splash:
@@ -904,16 +941,16 @@ class StrixTUIApp(App):  # type: ignore[misc]
            status = agent_data.get("status", "running")
            status_indicators = {
-                "running": "🟢",
+                "running": "⚪",
-                "waiting": "⏸️",
+                "waiting": "⏸",
-                "completed": "✅",
+                "completed": "🟢",
-                "failed": "❌",
+                "failed": "🔴",
-                "stopped": "⏹️",
+                "stopped": "■",
-                "stopping": "⏸️",
+                "stopping": "○",
                "llm_failed": "🔴",
            }
-            status_icon = status_indicators.get(status, "🔵")
+            status_icon = status_indicators.get(status, "○")
            vuln_count = self._agent_vulnerability_count(agent_id)
            vuln_indicator = f" ({vuln_count})" if vuln_count > 0 else ""
            agent_name = f"{status_icon} {agent_name_raw}{vuln_indicator}"
@@ -946,11 +983,17 @@ class StrixTUIApp(App):  # type: ignore[misc]
            )
        current_event_ids = [e["id"] for e in events]
        current_streaming_len = len(streaming) if streaming else 0
        last_streaming_len = self._last_streaming_len.get(self.selected_agent_id, 0)
-        if not streaming and current_event_ids == self._displayed_events:
+        if (
            current_event_ids == self._displayed_events
            and current_streaming_len == last_streaming_len
        ):
            return None, None
        self._displayed_events = current_event_ids
        self._last_streaming_len[self.selected_agent_id] = current_streaming_len
        return self._get_rendered_events_content(events), "chat-content"
    def _update_chat_view(self) -> None:
@@ -989,6 +1032,57 @@ class StrixTUIApp(App):  # type: ignore[misc]
        text.append(message)
        return text, f"chat-placeholder {placeholder_class}"
    @staticmethod
    def _merge_renderables(renderables: list[Any]) -> Text:
        """Merge renderables into a single Text for mouse text selection support."""
        combined = Text()
        for i, item in enumerate(renderables):
            if i > 0:
                combined.append("\n")
            StrixTUIApp._append_renderable(combined, item)
        return StrixTUIApp._sanitize_text(combined)
    @staticmethod
    def _sanitize_text(text: Text) -> Text:
        """Clamp spans so Rich/Textual can't crash on malformed offsets."""
        plain = text.plain
        text_length = len(plain)
        sanitized_spans: list[Span] = []
        for span in text.spans:
            start = max(0, min(span.start, text_length))
            end = max(0, min(span.end, text_length))
            if end > start:
                sanitized_spans.append(Span(start, end, span.style))
        return Text(
            plain,
            style=text.style,
            justify=text.justify,
            overflow=text.overflow,
            no_wrap=text.no_wrap,
            end=text.end,
            tab_size=text.tab_size,
            spans=sanitized_spans,
        )
    @staticmethod
    def _append_renderable(combined: Text, item: Any) -> None:
        """Recursively append a renderable's text content to a combined Text."""
        if isinstance(item, Text):
            combined.append_text(StrixTUIApp._sanitize_text(item))
        elif isinstance(item, Group):
            for j, sub in enumerate(item.renderables):
                if j > 0:
                    combined.append("\n")
                StrixTUIApp._append_renderable(combined, sub)
        else:
            inner = getattr(item, "content", None) or getattr(item, "renderable", None)
            if inner is not None:
                StrixTUIApp._append_renderable(combined, inner)
            else:
                combined.append(str(item))
    def _get_rendered_events_content(self, events: list[dict[str, Any]]) -> Any:
        renderables: list[Any] = []
@@ -1020,23 +1114,25 @@ class StrixTUIApp(App):  # type: ignore[misc]
        if not renderables:
            return Text()
-        if len(renderables) == 1:
+        if len(renderables) == 1 and isinstance(renderables[0], Text):
-            return renderables[0]
+            return self._sanitize_text(renderables[0])
-        return Group(*renderables)
+        return self._merge_renderables(renderables)
-    def _render_streaming_content(self, content: str) -> Any:
+    def _render_streaming_content(self, content: str, agent_id: str | None = None) -> Any:
-        from strix.interface.streaming_parser import parse_streaming_content
+        cache_key = agent_id or self.selected_agent_id or ""
        content_len = len(content)
        if cache_key in self._streaming_render_cache:
            cached_len, cached_output = self._streaming_render_cache[cache_key]
            if cached_len == content_len:
                return cached_output
        renderables: list[Any] = []
        segments = parse_streaming_content(content)
        for segment in segments:
            if segment.type == "text":
                from strix.interface.tool_components.agent_message_renderer import (
                    AgentMessageRenderer,
                )
                text_content = AgentMessageRenderer.render_simple(segment.content)
                if renderables:
                    renderables.append(Text(""))
@@ -1053,18 +1149,18 @@ class StrixTUIApp(App):  # type: ignore[misc]
                renderables.append(tool_renderable)
        if not renderables:
-            return Text()
+            result = Text()
        elif len(renderables) == 1 and isinstance(renderables[0], Text):
            result = self._sanitize_text(renderables[0])
        else:
            result = self._merge_renderables(renderables)
-        if len(renderables) == 1:
+        self._streaming_render_cache[cache_key] = (content_len, result)
-            return renderables[0]
+        return result
        return Group(*renderables)
    def _render_streaming_tool(
        self, tool_name: str, args: dict[str, str], is_complete: bool
    ) -> Any:
        from strix.interface.tool_components.registry import get_tool_renderer
        tool_data = {
            "tool_name": tool_name,
            "args": args,
@@ -1075,7 +1171,7 @@ class StrixTUIApp(App):  # type: ignore[misc]
        renderer = get_tool_renderer(tool_name)
        if renderer:
            widget = renderer.render(tool_data)
-            return widget.renderable
+            return widget.content
        return self._render_default_streaming_tool(tool_name, args, is_complete)
@@ -1204,21 +1300,19 @@ class StrixTUIApp(App):  # type: ignore[misc]
        if not self._is_widget_safe(stats_display):
            return
        if self.screen.selections:
            return
        stats_content = Text()
        stats_text = build_tui_stats_text(self.tracer, self.agent_config)
        if stats_text:
            stats_content.append(stats_text)
-        from rich.panel import Panel
+        version = get_package_version()
        stats_content.append(f"\nv{version}", style="white")
-        stats_panel = Panel(
+        self._safe_widget_operation(stats_display.update, stats_content)
            stats_content,
            border_style="#333333",
            padding=(0, 1),
        )
        self._safe_widget_operation(stats_display.update, stats_panel)
    def _update_vulnerabilities_panel(self) -> None:
        """Update the vulnerabilities panel with current vulnerability data."""
@@ -1395,6 +1489,8 @@ class StrixTUIApp(App):  # type: ignore[misc]
            return
        self._displayed_events.clear()
        self._streaming_render_cache.clear()
        self._last_streaming_len.clear()
        self.call_later(self._update_chat_view)
        self._update_agent_status_display()
@@ -1449,15 +1545,16 @@ class StrixTUIApp(App):  # type: ignore[misc]
        agent_name_raw = agent_data.get("name", "Agent")
        status_indicators = {
-            "running": "🟢",
+            "running": "⚪",
-            "waiting": "🟡",
+            "waiting": "⏸",
-            "completed": "✅",
+            "completed": "🟢",
-            "failed": "❌",
+            "failed": "🔴",
-            "stopped": "⏹️",
+            "stopped": "■",
-            "stopping": "⏸️",
+            "stopping": "○",
            "llm_failed": "🔴",
        }
-        status_icon = status_indicators.get(status, "🔵")
+        status_icon = status_indicators.get(status, "○")
        vuln_count = self._agent_vulnerability_count(agent_id)
        vuln_indicator = f" ({vuln_count})" if vuln_count > 0 else ""
        agent_name = f"{status_icon} {agent_name_raw}{vuln_indicator}"
@@ -1523,15 +1620,16 @@ class StrixTUIApp(App):  # type: ignore[misc]
        status = agent_data.get("status", "running")
        status_indicators = {
-            "running": "🟢",
+            "running": "⚪",
-            "waiting": "🟡",
+            "waiting": "⏸",
-            "completed": "✅",
+            "completed": "🟢",
-            "failed": "❌",
+            "failed": "🔴",
-            "stopped": "⏹️",
+            "stopped": "■",
-            "stopping": "⏸️",
+            "stopping": "○",
            "llm_failed": "🔴",
        }
-        status_icon = status_indicators.get(status, "🔵")
+        status_icon = status_indicators.get(status, "○")
        vuln_count = self._agent_vulnerability_count(agent_id)
        vuln_indicator = f" ({vuln_count})" if vuln_count > 0 else ""
        agent_name = f"{status_icon} {agent_name_raw}{vuln_indicator}"
@@ -1589,8 +1687,6 @@ class StrixTUIApp(App):  # type: ignore[misc]
            return None
        if role == "user":
            from strix.interface.tool_components.user_message_renderer import UserMessageRenderer
            return UserMessageRenderer.render_simple(content)
        if metadata.get("interrupted"):
@@ -1599,9 +1695,7 @@ class StrixTUIApp(App):  # type: ignore[misc]
            interrupted_text.append("\n")
            interrupted_text.append("⚠ ", style="yellow")
            interrupted_text.append("Interrupted by user", style="yellow dim")
-            return Group(streaming_result, interrupted_text)
+            return self._merge_renderables([streaming_result, interrupted_text])
        from strix.interface.tool_components.agent_message_renderer import AgentMessageRenderer
        return AgentMessageRenderer.render_simple(content)
@@ -1611,13 +1705,11 @@ class StrixTUIApp(App):  # type: ignore[misc]
        status = tool_data.get("status", "unknown")
        result = tool_data.get("result")
        from strix.interface.tool_components.registry import get_tool_renderer
        renderer = get_tool_renderer(tool_name)
        if renderer:
            widget = renderer.render(tool_data)
-            return widget.renderable
+            return widget.content
        text = Text()
@@ -1848,8 +1940,6 @@ class StrixTUIApp(App):  # type: ignore[misc]
        return agent_name, False
    def action_confirm_stop_agent(self, agent_id: str) -> None:
        self.pop_screen()
        try:
            from strix.tools.agents_graph.agents_graph_actions import stop_agent
@@ -1912,6 +2002,92 @@ class StrixTUIApp(App):  # type: ignore[misc]
            sidebar.remove_class("-hidden")
            chat_area.remove_class("-full-width")
    def on_mouse_up(self, _event: events.MouseUp) -> None:
        self.set_timer(0.05, self._auto_copy_selection)
    _ICON_PREFIXES: ClassVar[tuple[str, ...]] = (
        "🐞 ",
        "🌐 ",
        "📋 ",
        "🧠 ",
        "◆ ",
        "◇ ",
        "◈ ",
        "→ ",
        "○ ",
        "● ",
        "✓ ",
        "✗ ",
        "⚠ ",
        "▍ ",
        "▍",
        "┃ ",
        "• ",
        ">_ ",
        "</> ",
        "<~> ",
        "[ ] ",
        "[~] ",
        "[•] ",
    )
    _DECORATIVE_LINES: ClassVar[frozenset[str]] = frozenset(
        {
            "● In progress...",
            "✓ Done",
            "✗ Failed",
            "✗ Error",
            "○ Unknown",
        }
    )
    @staticmethod
    def _clean_copied_text(text: str) -> str:
        lines = text.split("\n")
        cleaned: list[str] = []
        for line in lines:
            stripped = line.lstrip()
            if stripped in StrixTUIApp._DECORATIVE_LINES:
                continue
            if stripped and all(c == "─" for c in stripped):
                continue
            out = line
            for prefix in StrixTUIApp._ICON_PREFIXES:
                if stripped.startswith(prefix):
                    leading = line[: len(line) - len(line.lstrip())]
                    out = leading + stripped[len(prefix) :]
                    break
            cleaned.append(out)
        return "\n".join(cleaned)
    def _auto_copy_selection(self) -> None:
        copied = False
        try:
            if self.screen.selections:
                selected = self.screen.get_selected_text()
                self.screen.clear_selection()
                if selected and selected.strip():
                    cleaned = self._clean_copied_text(selected)
                    self.copy_to_clipboard(cleaned if cleaned.strip() else selected)
                    copied = True
        except Exception:  # noqa: BLE001
            logger.debug("Failed to copy screen selection", exc_info=True)
        if not copied:
            try:
                chat_input = self.query_one("#chat_input", ChatTextArea)
                selected = chat_input.selected_text
                if selected and selected.strip():
                    self.copy_to_clipboard(selected)
                    chat_input.move_cursor(chat_input.cursor_location)
                    copied = True
            except Exception:  # noqa: BLE001
                logger.debug("Failed to copy chat input selection", exc_info=True)
        if copied:
            self.notify("Copied to clipboard", timeout=2)
 async def run_tui(args: argparse.Namespace) -> None:
    """Run strix in interactive TUI mode with textual."""
--- a/strix/interface/utils.py
+++ b/strix/interface/utils.py
--- a/strix/llm/config.py
+++ b/strix/llm/config.py
@@ -1,4 +1,8 @@
 from typing import Any
 from strix.config import Config
 from strix.config.config import resolve_llm_config
 from strix.llm.utils import resolve_strix_model
 class LLMConfig:
@@ -9,15 +13,28 @@ class LLMConfig:
        skills: list[str] | None = None,
        timeout: int | None = None,
        scan_mode: str = "deep",
        is_whitebox: bool = False,
        interactive: bool = False,
        reasoning_effort: str | None = None,
        system_prompt_context: dict[str, Any] | None = None,
    ):
-        self.model_name = model_name or Config.get("strix_llm")
+        resolved_model, self.api_key, self.api_base = resolve_llm_config()
        self.model_name = model_name or resolved_model
        if not self.model_name:
            raise ValueError("STRIX_LLM environment variable must be set and not empty")
        api_model, canonical = resolve_strix_model(self.model_name)
        self.litellm_model: str = api_model or self.model_name
        self.canonical_model: str = canonical or self.model_name
        self.enable_prompt_caching = enable_prompt_caching
        self.skills = skills or []
        self.timeout = timeout or int(Config.get("llm_timeout") or "300")
        self.scan_mode = scan_mode if scan_mode in ["quick", "standard", "deep"] else "deep"
        self.is_whitebox = is_whitebox
        self.interactive = interactive
        self.reasoning_effort = reasoning_effort
        self.system_prompt_context = system_prompt_context or {}
--- a/strix/llm/dedupe.py
+++ b/strix/llm/dedupe.py
@@ -5,7 +5,8 @@ from typing import Any
 import litellm
-from strix.config import Config
+from strix.config.config import resolve_llm_config
 from strix.llm.utils import resolve_strix_model
 logger = logging.getLogger(__name__)
@@ -155,14 +156,9 @@ def check_duplicate(
        comparison_data = {"candidate": candidate_cleaned, "existing_reports": existing_cleaned}
-        model_name = Config.get("strix_llm")
+        model_name, api_key, api_base = resolve_llm_config()
-        api_key = Config.get("llm_api_key")
+        litellm_model, _ = resolve_strix_model(model_name)
-        api_base = (
+        litellm_model = litellm_model or model_name
            Config.get("llm_api_base")
            or Config.get("openai_api_base")
            or Config.get("litellm_base_url")
            or Config.get("ollama_api_base")
        )
        messages = [
            {"role": "system", "content": DEDUPE_SYSTEM_PROMPT},
@@ -177,10 +173,9 @@ def check_duplicate(
        ]
        completion_kwargs: dict[str, Any] = {
-            "model": model_name,
+            "model": litellm_model,
            "messages": messages,
            "timeout": 120,
            "temperature": 0,
        }
        if api_key:
            completion_kwargs["api_key"] = api_key
--- a/strix/llm/llm.py
+++ b/strix/llm/llm.py
@@ -1,60 +1,30 @@
 import asyncio
 import logging
 from collections.abc import AsyncIterator
 from dataclasses import dataclass
 from enum import Enum
 from pathlib import Path
 from typing import Any
 import litellm
-from jinja2 import (
+from jinja2 import Environment, FileSystemLoader, select_autoescape
-    Environment,
+from litellm import acompletion, completion_cost, stream_chunk_builder, supports_reasoning
    FileSystemLoader,
    select_autoescape,
 )
 from litellm import completion_cost, stream_chunk_builder, supports_reasoning
 from litellm.utils import supports_prompt_caching, supports_vision
 from strix.config import Config
 from strix.llm.config import LLMConfig
 from strix.llm.memory_compressor import MemoryCompressor
-from strix.llm.request_queue import get_global_queue
+from strix.llm.utils import (
-from strix.llm.utils import _truncate_to_first_function, parse_tool_invocations
+    _truncate_to_first_function,
    fix_incomplete_tool_call,
    normalize_tool_format,
    parse_tool_invocations,
 )
 from strix.skills import load_skills
 from strix.tools import get_tools_prompt
 from strix.utils.resource_paths import get_strix_resource_path
 MAX_RETRIES = 5
 RETRY_MULTIPLIER = 8
 RETRY_MIN = 8
 RETRY_MAX = 64
 def _should_retry(exception: Exception) -> bool:
    status_code = None
    if hasattr(exception, "status_code"):
        status_code = exception.status_code
    elif hasattr(exception, "response") and hasattr(exception.response, "status_code"):
        status_code = exception.response.status_code
    if status_code is not None:
        return bool(litellm._should_retry(status_code))
    return True
 logger = logging.getLogger(__name__)
 litellm.drop_params = True
 litellm.modify_params = True
 _LLM_API_KEY = Config.get("llm_api_key")
 _LLM_API_BASE = (
    Config.get("llm_api_base")
    or Config.get("openai_api_base")
    or Config.get("litellm_base_url")
    or Config.get("ollama_api_base")
 )
 _STRIX_REASONING_EFFORT = Config.get("strix_reasoning_effort")
 class LLMRequestFailedError(Exception):
    def __init__(self, message: str, details: str | None = None):
@@ -63,20 +33,11 @@ class LLMRequestFailedError(Exception):
        self.details = details
 class StepRole(str, Enum):
    AGENT = "agent"
    USER = "user"
    SYSTEM = "system"
@dataclass
 class LLMResponse:
    content: str
    tool_invocations: list[dict[str, Any]] | None = None
-    scan_id: str | None = None
+    thinking_blocks: list[dict[str, Any]] | None = None
    step_number: int = 1
    role: StepRole = StepRole.AGENT
    thinking_blocks: list[dict[str, Any]] | None = None  # For reasoning models.
@dataclass
@@ -84,76 +45,101 @@ class RequestStats:
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
    cache_creation_tokens: int = 0
    cost: float = 0.0
    requests: int = 0
    failed_requests: int = 0
    def to_dict(self) -> dict[str, int | float]:
        return {
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "cached_tokens": self.cached_tokens,
            "cache_creation_tokens": self.cache_creation_tokens,
            "cost": round(self.cost, 4),
            "requests": self.requests,
            "failed_requests": self.failed_requests,
        }
 class LLM:
-    def __init__(
+    def __init__(self, config: LLMConfig, agent_name: str | None = None):
        self, config: LLMConfig, agent_name: str | None = None, agent_id: str | None = None
    ):
        self.config = config
        self.agent_name = agent_name
-        self.agent_id = agent_id
+        self.agent_id: str | None = None
        self._active_skills: list[str] = list(config.skills or [])
        self._system_prompt_context: dict[str, Any] = dict(
            getattr(config, "system_prompt_context", {}) or {}
        )
        self._total_stats = RequestStats()
-        self._last_request_stats = RequestStats()
+        self.memory_compressor = MemoryCompressor(model_name=config.litellm_model)
        self.system_prompt = self._load_system_prompt(agent_name)
-        if _STRIX_REASONING_EFFORT:
+        reasoning = Config.get("strix_reasoning_effort")
-            self._reasoning_effort = _STRIX_REASONING_EFFORT
+        if reasoning:
-        elif self.config.scan_mode == "quick":
+            self._reasoning_effort = reasoning
        elif config.reasoning_effort:
            self._reasoning_effort = config.reasoning_effort
        elif config.scan_mode == "quick":
            self._reasoning_effort = "medium"
        else:
            self._reasoning_effort = "high"
-        self.memory_compressor = MemoryCompressor(
+    def _load_system_prompt(self, agent_name: str | None) -> str:
-            model_name=self.config.model_name,
+        if not agent_name:
-            timeout=self.config.timeout,
+            return ""
        )
-        if agent_name:
+        try:
-            prompt_dir = Path(__file__).parent.parent / "agents" / agent_name
+            prompt_dir = get_strix_resource_path("agents", agent_name)
-            skills_dir = Path(__file__).parent.parent / "skills"
+            skills_dir = get_strix_resource_path("skills")
-
+            env = Environment(
-            loader = FileSystemLoader([prompt_dir, skills_dir])
+                loader=FileSystemLoader([prompt_dir, skills_dir]),
            self.jinja_env = Environment(
                loader=loader,
                autoescape=select_autoescape(enabled_extensions=(), default_for_string=False),
            )
-            try:
+            skills_to_load = self._get_skills_to_load()
-                skills_to_load = list(self.config.skills or [])
+            skill_content = load_skills(skills_to_load)
-                skills_to_load.append(f"scan_modes/{self.config.scan_mode}")
+            env.globals["get_skill"] = lambda name: skill_content.get(name, "")
-                skill_content = load_skills(skills_to_load, self.jinja_env)
+            result = env.get_template("system_prompt.jinja").render(
                get_tools_prompt=get_tools_prompt,
                loaded_skill_names=list(skill_content.keys()),
                interactive=self.config.interactive,
                system_prompt_context=self._system_prompt_context,
                **skill_content,
            )
            return str(result)
        except Exception:  # noqa: BLE001
            return ""
-                def get_skill(name: str) -> str:
+    def _get_skills_to_load(self) -> list[str]:
-                    return skill_content.get(name, "")
+        ordered_skills = [*self._active_skills]
        ordered_skills.append(f"scan_modes/{self.config.scan_mode}")
        if self.config.is_whitebox:
            ordered_skills.append("coordination/source_aware_whitebox")
            ordered_skills.append("custom/source_aware_sast")
-                self.jinja_env.globals["get_skill"] = get_skill
+        deduped: list[str] = []
        seen: set[str] = set()
        for skill_name in ordered_skills:
            if skill_name not in seen:
                deduped.append(skill_name)
                seen.add(skill_name)
-                self.system_prompt = self.jinja_env.get_template("system_prompt.jinja").render(
+        return deduped
-                    get_tools_prompt=get_tools_prompt,
+
-                    loaded_skill_names=list(skill_content.keys()),
+    def add_skills(self, skill_names: list[str]) -> list[str]:
-                    **skill_content,
+        added: list[str] = []
-                )
+        for skill_name in skill_names:
-            except (FileNotFoundError, OSError, ValueError) as e:
+            if not skill_name or skill_name in self._active_skills:
-                logger.warning(f"Failed to load system prompt for {agent_name}: {e}")
+                continue
-                self.system_prompt = "You are a helpful AI assistant."
+            self._active_skills.append(skill_name)
-        else:
+            added.append(skill_name)
-            self.system_prompt = "You are a helpful AI assistant."
+
        if not added:
            return []
        updated_prompt = self._load_system_prompt(self.agent_name)
        if updated_prompt:
            self.system_prompt = updated_prompt
        return added
    def set_agent_identity(self, agent_name: str | None, agent_id: str | None) -> None:
        if agent_name:
@@ -161,375 +147,232 @@ class LLM:
        if agent_id:
            self.agent_id = agent_id
-    def _build_identity_message(self) -> dict[str, Any] | None:
+    def set_system_prompt_context(self, context: dict[str, Any] | None) -> None:
-        if not (self.agent_name and str(self.agent_name).strip()):
+        self._system_prompt_context = dict(context or {})
-            return None
+        updated_prompt = self._load_system_prompt(self.agent_name)
-        identity_name = self.agent_name
+        if updated_prompt:
-        identity_id = self.agent_id
+            self.system_prompt = updated_prompt
        content = (
            "\n\n"
            "<agent_identity>\n"
            "<meta>Internal metadata: do not echo or reference; "
            "not part of history or tool calls.</meta>\n"
            "<note>You are now assuming the role of this agent. "
            "Act strictly as this agent and maintain self-identity for this step. "
            "Now go answer the next needed step!</note>\n"
            f"<agent_name>{identity_name}</agent_name>\n"
            f"<agent_id>{identity_id}</agent_id>\n"
            "</agent_identity>\n\n"
        )
        return {"role": "user", "content": content}
-    def _add_cache_control_to_content(
+    async def generate(
-        self, content: str | list[dict[str, Any]]
+        self, conversation_history: list[dict[str, Any]]
-    ) -> str | list[dict[str, Any]]:
+    ) -> AsyncIterator[LLMResponse]:
-        if isinstance(content, str):
+        messages = self._prepare_messages(conversation_history)
-            return [{"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}]
+        max_retries = int(Config.get("strix_llm_max_retries") or "5")
        if isinstance(content, list) and content:
            last_item = content[-1]
            if isinstance(last_item, dict) and last_item.get("type") == "text":
                return content[:-1] + [{**last_item, "cache_control": {"type": "ephemeral"}}]
        return content
-    def _is_anthropic_model(self) -> bool:
+        for attempt in range(max_retries + 1):
-        if not self.config.model_name:
+            try:
-            return False
+                async for response in self._stream(messages):
-        model_lower = self.config.model_name.lower()
+                    yield response
-        return any(provider in model_lower for provider in ["anthropic/", "claude"])
+                return  # noqa: TRY300
            except Exception as e:  # noqa: BLE001
                if attempt >= max_retries or not self._should_retry(e):
                    self._raise_error(e)
                wait = min(90, 2 * (2**attempt))
                await asyncio.sleep(wait)
-    def _calculate_cache_interval(self, total_messages: int) -> int:
+    async def _stream(self, messages: list[dict[str, Any]]) -> AsyncIterator[LLMResponse]:
-        if total_messages <= 1:
+        accumulated = ""
-            return 10
+        chunks: list[Any] = []
        done_streaming = 0
-        max_cached_messages = 3
+        self._total_stats.requests += 1
-        non_system_messages = total_messages - 1
+        response = await acompletion(**self._build_completion_args(messages), stream=True)
-        interval = 10
+        async for chunk in response:
-        while non_system_messages // interval > max_cached_messages:
+            chunks.append(chunk)
-            interval += 10
+            if done_streaming:
-
+                done_streaming += 1
-        return interval
+                if getattr(chunk, "usage", None) or done_streaming > 5:
    def _prepare_cached_messages(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
        if (
            not self.config.enable_prompt_caching
            or not supports_prompt_caching(self.config.model_name)
            or not messages
        ):
            return messages
        if not self._is_anthropic_model():
            return messages
        cached_messages = list(messages)
        if cached_messages and cached_messages[0].get("role") == "system":
            system_message = cached_messages[0].copy()
            system_message["content"] = self._add_cache_control_to_content(
                system_message["content"]
            )
            cached_messages[0] = system_message
        total_messages = len(cached_messages)
        if total_messages > 1:
            interval = self._calculate_cache_interval(total_messages)
            cached_count = 0
            for i in range(interval, total_messages, interval):
                if cached_count >= 3:
                    break
                continue
            delta = self._get_chunk_content(chunk)
            if delta:
                accumulated += delta
                if "</function>" in accumulated or "</invoke>" in accumulated:
                    end_tag = "</function>" if "</function>" in accumulated else "</invoke>"
                    pos = accumulated.find(end_tag)
                    accumulated = accumulated[: pos + len(end_tag)]
                    yield LLMResponse(content=accumulated)
                    done_streaming = 1
                    continue
                yield LLMResponse(content=accumulated)
-                if i < len(cached_messages):
+        if chunks:
-                    message = cached_messages[i].copy()
+            self._update_usage_stats(stream_chunk_builder(chunks))
                    message["content"] = self._add_cache_control_to_content(message["content"])
                    cached_messages[i] = message
                    cached_count += 1
-        return cached_messages
+        accumulated = normalize_tool_format(accumulated)
        accumulated = fix_incomplete_tool_call(_truncate_to_first_function(accumulated))
        yield LLMResponse(
            content=accumulated,
            tool_invocations=parse_tool_invocations(accumulated),
            thinking_blocks=self._extract_thinking(chunks),
        )
    def _prepare_messages(self, conversation_history: list[dict[str, Any]]) -> list[dict[str, Any]]:
        messages = [{"role": "system", "content": self.system_prompt}]
-        identity_message = self._build_identity_message()
+        if self.agent_name:
-        if identity_message:
+            messages.append(
-            messages.append(identity_message)
+                {
-
+                    "role": "user",
-        compressed_history = list(self.memory_compressor.compress_history(conversation_history))
+                    "content": (
                        f"\n\n<agent_identity>\n"
                        f"<meta>Internal metadata: do not echo or reference.</meta>\n"
                        f"<agent_name>{self.agent_name}</agent_name>\n"
                        f"<agent_id>{self.agent_id}</agent_id>\n"
                        f"</agent_identity>\n\n"
                    ),
                }
            )
        compressed = list(self.memory_compressor.compress_history(conversation_history))
        conversation_history.clear()
-        conversation_history.extend(compressed_history)
+        conversation_history.extend(compressed)
-        messages.extend(compressed_history)
+        messages.extend(compressed)
-        return self._prepare_cached_messages(messages)
+        if messages[-1].get("role") == "assistant" and not self.config.interactive:
            messages.append({"role": "user", "content": "<meta>Continue the task.</meta>"})
-    async def _stream_and_accumulate(
+        if self._is_anthropic() and self.config.enable_prompt_caching:
-        self,
+            messages = self._add_cache_control(messages)
        messages: list[dict[str, Any]],
        scan_id: str | None,
        step_number: int,
    ) -> AsyncIterator[LLMResponse]:
        accumulated_content = ""
        chunks: list[Any] = []
-        async for chunk in self._stream_request(messages):
+        return messages
            chunks.append(chunk)
            delta = self._extract_chunk_delta(chunk)
            if delta:
                accumulated_content += delta
-                if "</function>" in accumulated_content:
+    def _build_completion_args(self, messages: list[dict[str, Any]]) -> dict[str, Any]:
-                    function_end = accumulated_content.find("</function>") + len("</function>")
+        if not self._supports_vision():
-                    accumulated_content = accumulated_content[:function_end]
+            messages = self._strip_images(messages)
-                yield LLMResponse(
+        args: dict[str, Any] = {
-                    scan_id=scan_id,
+            "model": self.config.litellm_model,
                    step_number=step_number,
                    role=StepRole.AGENT,
                    content=accumulated_content,
                    tool_invocations=None,
                )
        if chunks:
            complete_response = stream_chunk_builder(chunks)
            self._update_usage_stats(complete_response)
        accumulated_content = _truncate_to_first_function(accumulated_content)
        if "</function>" in accumulated_content:
            function_end = accumulated_content.find("</function>") + len("</function>")
            accumulated_content = accumulated_content[:function_end]
        tool_invocations = parse_tool_invocations(accumulated_content)
        # Extract thinking blocks from the complete response if available
        thinking_blocks = None
        if chunks and self._should_include_reasoning_effort():
            complete_response = stream_chunk_builder(chunks)
            if (
                hasattr(complete_response, "choices")
                and complete_response.choices
                and hasattr(complete_response.choices[0], "message")
            ):
                message = complete_response.choices[0].message
                if hasattr(message, "thinking_blocks") and message.thinking_blocks:
                    thinking_blocks = message.thinking_blocks
        yield LLMResponse(
            scan_id=scan_id,
            step_number=step_number,
            role=StepRole.AGENT,
            content=accumulated_content,
            tool_invocations=tool_invocations if tool_invocations else None,
            thinking_blocks=thinking_blocks,
        )
    def _raise_llm_error(self, e: Exception) -> None:
        error_map: list[tuple[type, str]] = [
            (litellm.RateLimitError, "Rate limit exceeded"),
            (litellm.AuthenticationError, "Invalid API key"),
            (litellm.NotFoundError, "Model not found"),
            (litellm.ContextWindowExceededError, "Context too long"),
            (litellm.ContentPolicyViolationError, "Content policy violation"),
            (litellm.ServiceUnavailableError, "Service unavailable"),
            (litellm.Timeout, "Request timed out"),
            (litellm.UnprocessableEntityError, "Unprocessable entity"),
            (litellm.InternalServerError, "Internal server error"),
            (litellm.APIConnectionError, "Connection error"),
            (litellm.UnsupportedParamsError, "Unsupported parameters"),
            (litellm.BudgetExceededError, "Budget exceeded"),
            (litellm.APIResponseValidationError, "Response validation error"),
            (litellm.JSONSchemaValidationError, "JSON schema validation error"),
            (litellm.InvalidRequestError, "Invalid request"),
            (litellm.BadRequestError, "Bad request"),
            (litellm.APIError, "API error"),
            (litellm.OpenAIError, "OpenAI error"),
        ]
        from strix.telemetry import posthog
        for error_type, message in error_map:
            if isinstance(e, error_type):
                posthog.error(f"llm_{error_type.__name__}", message)
                raise LLMRequestFailedError(f"LLM request failed: {message}", str(e)) from e
        posthog.error("llm_unknown_error", type(e).__name__)
        raise LLMRequestFailedError(f"LLM request failed: {type(e).__name__}", str(e)) from e
    async def generate(
        self,
        conversation_history: list[dict[str, Any]],
        scan_id: str | None = None,
        step_number: int = 1,
    ) -> AsyncIterator[LLMResponse]:
        messages = self._prepare_messages(conversation_history)
        last_error: Exception | None = None
        for attempt in range(MAX_RETRIES):
            try:
                async for response in self._stream_and_accumulate(messages, scan_id, step_number):
                    yield response
                return  # noqa: TRY300
            except Exception as e:  # noqa: BLE001
                last_error = e
                if not _should_retry(e) or attempt == MAX_RETRIES - 1:
                    break
                wait_time = min(RETRY_MAX, RETRY_MULTIPLIER * (2**attempt))
                wait_time = max(RETRY_MIN, wait_time)
                await asyncio.sleep(wait_time)
        if last_error:
            self._raise_llm_error(last_error)
    def _extract_chunk_delta(self, chunk: Any) -> str:
        if chunk.choices and hasattr(chunk.choices[0], "delta"):
            delta = chunk.choices[0].delta
            return getattr(delta, "content", "") or ""
        return ""
    @property
    def usage_stats(self) -> dict[str, dict[str, int | float]]:
        return {
            "total": self._total_stats.to_dict(),
            "last_request": self._last_request_stats.to_dict(),
        }
    def get_cache_config(self) -> dict[str, bool]:
        return {
            "enabled": self.config.enable_prompt_caching,
            "supported": supports_prompt_caching(self.config.model_name),
        }
    def _should_include_reasoning_effort(self) -> bool:
        if not self.config.model_name:
            return False
        try:
            return bool(supports_reasoning(model=self.config.model_name))
        except Exception:  # noqa: BLE001
            return False
    def _model_supports_vision(self) -> bool:
        if not self.config.model_name:
            return False
        try:
            return bool(supports_vision(model=self.config.model_name))
        except Exception:  # noqa: BLE001
            return False
    def _filter_images_from_messages(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
        filtered_messages = []
        for msg in messages:
            content = msg.get("content")
            updated_msg = msg
            if isinstance(content, list):
                filtered_content = []
                for item in content:
                    if isinstance(item, dict):
                        if item.get("type") == "image_url":
                            filtered_content.append(
                                {
                                    "type": "text",
                                    "text": "[Screenshot removed - model does not support "
                                    "vision. Use view_source or execute_js instead.]",
                                }
                            )
                        else:
                            filtered_content.append(item)
                    else:
                        filtered_content.append(item)
                if filtered_content:
                    text_parts = [
                        item.get("text", "") if isinstance(item, dict) else str(item)
                        for item in filtered_content
                    ]
                    all_text = all(
                        isinstance(item, dict) and item.get("type") == "text"
                        for item in filtered_content
                    )
                    if all_text:
                        updated_msg = {**msg, "content": "\n".join(text_parts)}
                    else:
                        updated_msg = {**msg, "content": filtered_content}
                else:
                    updated_msg = {**msg, "content": ""}
            filtered_messages.append(updated_msg)
        return filtered_messages
    async def _stream_request(
        self,
        messages: list[dict[str, Any]],
    ) -> AsyncIterator[Any]:
        if not self._model_supports_vision():
            messages = self._filter_images_from_messages(messages)
        completion_args: dict[str, Any] = {
            "model": self.config.model_name,
            "messages": messages,
            "timeout": self.config.timeout,
            "stream_options": {"include_usage": True},
        }
-        if _LLM_API_KEY:
+        if self.config.api_key:
-            completion_args["api_key"] = _LLM_API_KEY
+            args["api_key"] = self.config.api_key
-        if _LLM_API_BASE:
+        if self.config.api_base:
-            completion_args["api_base"] = _LLM_API_BASE
+            args["api_base"] = self.config.api_base
        if self._supports_reasoning():
            args["reasoning_effort"] = self._reasoning_effort
-        completion_args["stop"] = ["</function>"]
+        return args
-        if self._should_include_reasoning_effort():
+    def _get_chunk_content(self, chunk: Any) -> str:
-            completion_args["reasoning_effort"] = self._reasoning_effort
+        if chunk.choices and hasattr(chunk.choices[0], "delta"):
            return getattr(chunk.choices[0].delta, "content", "") or ""
        return ""
-        queue = get_global_queue()
+    def _extract_thinking(self, chunks: list[Any]) -> list[dict[str, Any]] | None:
-        self._total_stats.requests += 1
+        if not chunks or not self._supports_reasoning():
-        self._last_request_stats = RequestStats(requests=1)
+            return None
-
+        try:
-        async for chunk in queue.stream_request(completion_args):
+            resp = stream_chunk_builder(chunks)
-            yield chunk
+            if resp.choices and hasattr(resp.choices[0].message, "thinking_blocks"):
                blocks: list[dict[str, Any]] = resp.choices[0].message.thinking_blocks
                return blocks
        except Exception:  # noqa: BLE001, S110  # nosec B110
            pass
        return None
    def _update_usage_stats(self, response: Any) -> None:
        try:
            if hasattr(response, "usage") and response.usage:
-                input_tokens = getattr(response.usage, "prompt_tokens", 0)
+                input_tokens = getattr(response.usage, "prompt_tokens", 0) or 0
-                output_tokens = getattr(response.usage, "completion_tokens", 0)
+                output_tokens = getattr(response.usage, "completion_tokens", 0) or 0
                cached_tokens = 0
                cache_creation_tokens = 0
                if hasattr(response.usage, "prompt_tokens_details"):
                    prompt_details = response.usage.prompt_tokens_details
                    if hasattr(prompt_details, "cached_tokens"):
                        cached_tokens = prompt_details.cached_tokens or 0
-                if hasattr(response.usage, "cache_creation_input_tokens"):
+                cost = self._extract_cost(response)
                    cache_creation_tokens = response.usage.cache_creation_input_tokens or 0
            else:
                input_tokens = 0
                output_tokens = 0
                cached_tokens = 0
                cache_creation_tokens = 0
            try:
                cost = completion_cost(response) or 0.0
            except Exception as e:  # noqa: BLE001
                logger.warning(f"Failed to calculate cost: {e}")
                cost = 0.0
            self._total_stats.input_tokens += input_tokens
            self._total_stats.output_tokens += output_tokens
            self._total_stats.cached_tokens += cached_tokens
            self._total_stats.cache_creation_tokens += cache_creation_tokens
            self._total_stats.cost += cost
-            self._last_request_stats.input_tokens = input_tokens
+        except Exception:  # noqa: BLE001, S110  # nosec B110
-            self._last_request_stats.output_tokens = output_tokens
+            pass
            self._last_request_stats.cached_tokens = cached_tokens
            self._last_request_stats.cache_creation_tokens = cache_creation_tokens
            self._last_request_stats.cost = cost
-            if cached_tokens > 0:
+    def _extract_cost(self, response: Any) -> float:
-                logger.info(f"Cache hit: {cached_tokens} cached tokens, {input_tokens} new tokens")
+        if hasattr(response, "usage") and response.usage:
-            if cache_creation_tokens > 0:
+            direct_cost = getattr(response.usage, "cost", None)
-                logger.info(f"Cache creation: {cache_creation_tokens} tokens written to cache")
+            if direct_cost is not None:
                return float(direct_cost)
        try:
            if hasattr(response, "_hidden_params"):
                response._hidden_params.pop("custom_llm_provider", None)
            return completion_cost(response, model=self.config.canonical_model) or 0.0
        except Exception:  # noqa: BLE001
            return 0.0
-            logger.info(f"Usage stats: {self.usage_stats}")
+    def _should_retry(self, e: Exception) -> bool:
-        except Exception as e:  # noqa: BLE001
+        code = getattr(e, "status_code", None) or getattr(
-            logger.warning(f"Failed to update usage stats: {e}")
+            getattr(e, "response", None), "status_code", None
        )
        return code is None or litellm._should_retry(code)
    def _raise_error(self, e: Exception) -> None:
        from strix.telemetry import posthog
        posthog.error("llm_error", type(e).__name__)
        raise LLMRequestFailedError(f"LLM request failed: {type(e).__name__}", str(e)) from e
    def _is_anthropic(self) -> bool:
        if not self.config.model_name:
            return False
        return any(p in self.config.model_name.lower() for p in ["anthropic/", "claude"])
    def _supports_vision(self) -> bool:
        try:
            return bool(supports_vision(model=self.config.canonical_model))
        except Exception:  # noqa: BLE001
            return False
    def _supports_reasoning(self) -> bool:
        try:
            return bool(supports_reasoning(model=self.config.canonical_model))
        except Exception:  # noqa: BLE001
            return False
    def _strip_images(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
        result = []
        for msg in messages:
            content = msg.get("content")
            if isinstance(content, list):
                text_parts = []
                for item in content:
                    if isinstance(item, dict) and item.get("type") == "text":
                        text_parts.append(item.get("text", ""))
                    elif isinstance(item, dict) and item.get("type") == "image_url":
                        text_parts.append("[Image removed - model doesn't support vision]")
                result.append({**msg, "content": "\n".join(text_parts)})
            else:
                result.append(msg)
        return result
    def _add_cache_control(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
        if not messages or not supports_prompt_caching(self.config.canonical_model):
            return messages
        result = list(messages)
        if result[0].get("role") == "system":
            content = result[0]["content"]
            result[0] = {
                **result[0],
                "content": [
                    {"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}
                ]
                if isinstance(content, str)
                else content,
            }
        return result
--- a/strix/llm/memory_compressor.py
+++ b/strix/llm/memory_compressor.py
@@ -3,7 +3,7 @@ from typing import Any
 import litellm
-from strix.config import Config
+from strix.config.config import Config, resolve_llm_config
 logger = logging.getLogger(__name__)
@@ -86,12 +86,12 @@ def _extract_message_text(msg: dict[str, Any]) -> str:
 def _summarize_messages(
    messages: list[dict[str, Any]],
    model: str,
-    timeout: int = 600,
+    timeout: int = 30,
 ) -> dict[str, Any]:
    if not messages:
        empty_summary = "<context_summary message_count='0'>{text}</context_summary>"
        return {
-            "role": "assistant",
+            "role": "user",
            "content": empty_summary.format(text="No messages to summarize"),
        }
@@ -104,12 +104,18 @@ def _summarize_messages(
    conversation = "\n".join(formatted)
    prompt = SUMMARY_PROMPT_TEMPLATE.format(conversation=conversation)
    _, api_key, api_base = resolve_llm_config()
    try:
-        completion_args = {
+        completion_args: dict[str, Any] = {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "timeout": timeout,
        }
        if api_key:
            completion_args["api_key"] = api_key
        if api_base:
            completion_args["api_base"] = api_base
        response = litellm.completion(**completion_args)
        summary = response.choices[0].message.content or ""
@@ -117,7 +123,7 @@ def _summarize_messages(
            return messages[0]
        summary_msg = "<context_summary message_count='{count}'>{text}</context_summary>"
        return {
-            "role": "assistant",
+            "role": "user",
            "content": summary_msg.format(count=len(messages), text=summary),
        }
    except Exception:
@@ -148,11 +154,11 @@ class MemoryCompressor:
        self,
        max_images: int = 3,
        model_name: str | None = None,
-        timeout: int = 600,
+        timeout: int | None = None,
    ):
        self.max_images = max_images
        self.model_name = model_name or Config.get("strix_llm")
-        self.timeout = timeout
+        self.timeout = timeout or int(Config.get("strix_memory_compressor_timeout") or "120")
        if not self.model_name:
            raise ValueError("STRIX_LLM environment variable must be set and not empty")
--- a/strix/llm/request_queue.py
+++ b/strix/llm/request_queue.py
@@ -1,58 +0,0 @@
 import asyncio
 import threading
 import time
 from collections.abc import AsyncIterator
 from typing import Any
 from litellm import acompletion
 from litellm.types.utils import ModelResponseStream
 from strix.config import Config
 class LLMRequestQueue:
    def __init__(self) -> None:
        self.delay_between_requests = float(Config.get("llm_rate_limit_delay") or "4.0")
        self.max_concurrent = int(Config.get("llm_rate_limit_concurrent") or "1")
        self._semaphore = threading.BoundedSemaphore(self.max_concurrent)
        self._last_request_time = 0.0
        self._lock = threading.Lock()
    async def stream_request(
        self, completion_args: dict[str, Any]
    ) -> AsyncIterator[ModelResponseStream]:
        try:
            while not self._semaphore.acquire(timeout=0.2):
                await asyncio.sleep(0.1)
            with self._lock:
                now = time.time()
                time_since_last = now - self._last_request_time
                sleep_needed = max(0, self.delay_between_requests - time_since_last)
                self._last_request_time = now + sleep_needed
            if sleep_needed > 0:
                await asyncio.sleep(sleep_needed)
            async for chunk in self._stream_request(completion_args):
                yield chunk
        finally:
            self._semaphore.release()
    async def _stream_request(
        self, completion_args: dict[str, Any]
    ) -> AsyncIterator[ModelResponseStream]:
        response = await acompletion(**completion_args, stream=True)
        async for chunk in response:
            yield chunk
 _global_queue: LLMRequestQueue | None = None
 def get_global_queue() -> LLMRequestQueue:
    global _global_queue  # noqa: PLW0603
    if _global_queue is None:
        _global_queue = LLMRequestQueue()
    return _global_queue
--- a/strix/llm/utils.py
+++ b/strix/llm/utils.py
@@ -3,11 +3,71 @@ import re
 from typing import Any
 _INVOKE_OPEN = re.compile(r'<invoke\s+name=["\']([^"\']+)["\']>')
 _PARAM_NAME_ATTR = re.compile(r'<parameter\s+name=["\']([^"\']+)["\']>')
 _FUNCTION_CALLS_TAG = re.compile(r"</?function_calls>")
 _STRIP_TAG_QUOTES = re.compile(r"<(function|parameter)\s*=\s*([^>]*?)>")
 def normalize_tool_format(content: str) -> str:
    """Convert alternative tool-call XML formats to the expected one.
    Handles:
      <function_calls>...</function_calls>  → stripped
      <invoke name="X">                     → <function=X>
      <parameter name="X">                  → <parameter=X>
      </invoke>                             → </function>
      <function="X">                        → <function=X>
      <parameter="X">                       → <parameter=X>
    """
    if "<invoke" in content or "<function_calls" in content:
        content = _FUNCTION_CALLS_TAG.sub("", content)
        content = _INVOKE_OPEN.sub(r"<function=\1>", content)
        content = _PARAM_NAME_ATTR.sub(r"<parameter=\1>", content)
        content = content.replace("</invoke>", "</function>")
    return _STRIP_TAG_QUOTES.sub(
        lambda m: f"<{m.group(1)}={m.group(2).strip().strip(chr(34) + chr(39))}>", content
    )
 STRIX_MODEL_MAP: dict[str, str] = {
    "claude-sonnet-4.6": "anthropic/claude-sonnet-4-6",
    "claude-opus-4.6": "anthropic/claude-opus-4-6",
    "gpt-5.2": "openai/gpt-5.2",
    "gpt-5.1": "openai/gpt-5.1",
    "gpt-5.4": "openai/gpt-5.4",
    "gemini-3-pro-preview": "gemini/gemini-3-pro-preview",
    "gemini-3-flash-preview": "gemini/gemini-3-flash-preview",
    "glm-5": "openrouter/z-ai/glm-5",
    "glm-4.7": "openrouter/z-ai/glm-4.7",
 }
 def resolve_strix_model(model_name: str | None) -> tuple[str | None, str | None]:
    """Resolve a strix/ model into names for API calls and capability lookups.
    Returns (api_model, canonical_model):
    - api_model: openai/<base> for API calls (Strix API is OpenAI-compatible)
    - canonical_model: actual provider model name for litellm capability lookups
    Non-strix models return the same name for both.
    """
    if not model_name or not model_name.startswith("strix/"):
        return model_name, model_name
    base_model = model_name[6:]
    api_model = f"openai/{base_model}"
    canonical_model = STRIX_MODEL_MAP.get(base_model, api_model)
    return api_model, canonical_model
 def _truncate_to_first_function(content: str) -> str:
    if not content:
        return content
-    function_starts = [match.start() for match in re.finditer(r"<function=", content)]
+    function_starts = [
        match.start() for match in re.finditer(r"<function=|<invoke\s+name=", content)
    ]
    if len(function_starts) >= 2:
        second_function_start = function_starts[1]
@@ -18,7 +78,8 @@ def _truncate_to_first_function(content: str) -> str:
 def parse_tool_invocations(content: str) -> list[dict[str, Any]] | None:
-    content = _fix_stopword(content)
+    content = normalize_tool_format(content)
    content = fix_incomplete_tool_call(content)
    tool_invocations: list[dict[str, Any]] = []
@@ -46,16 +107,17 @@ def parse_tool_invocations(content: str) -> list[dict[str, Any]] | None:
    return tool_invocations if tool_invocations else None
-def _fix_stopword(content: str) -> str:
+def fix_incomplete_tool_call(content: str) -> str:
-    if (
+    """Fix incomplete tool calls by adding missing closing tag.
-        "<function=" in content
+
-        and content.count("<function=") == 1
+    Handles both ``<function=…>`` and ``<invoke name="…">`` formats.
-        and "</function>" not in content
+    """
-    ):
+    has_open = "<function=" in content or "<invoke " in content
-        if content.endswith("</"):
+    count_open = content.count("<function=") + content.count("<invoke ")
-            content = content.rstrip() + "function>"
+    has_close = "</function>" in content or "</invoke>" in content
-        else:
+    if has_open and count_open == 1 and not has_close:
-            content = content + "\n</function>"
+        content = content.rstrip()
        content = content + "function>" if content.endswith("</") else content + "\n</function>"
    return content
@@ -74,7 +136,8 @@ def clean_content(content: str) -> str:
    if not content:
        return ""
-    content = _fix_stopword(content)
+    content = normalize_tool_format(content)
    content = fix_incomplete_tool_call(content)
    tool_pattern = r"<function=[^>]+>.*?</function>"
    cleaned = re.sub(tool_pattern, "", content, flags=re.DOTALL)
--- a/strix/runtime/init.py
+++ b/strix/runtime/init.py
@@ -12,17 +12,32 @@ class SandboxInitializationError(Exception):
        self.details = details
 _global_runtime: AbstractRuntime | None = None
 def get_runtime() -> AbstractRuntime:
    global _global_runtime  # noqa: PLW0603
    runtime_backend = Config.get("strix_runtime_backend")
    if runtime_backend == "docker":
        from .docker_runtime import DockerRuntime
-        return DockerRuntime()
+        if _global_runtime is None:
            _global_runtime = DockerRuntime()
        return _global_runtime
    raise ValueError(
        f"Unsupported runtime backend: {runtime_backend}. Only 'docker' is supported for now."
    )
-__all__ = ["AbstractRuntime", "SandboxInitializationError", "get_runtime"]
+def cleanup_runtime() -> None:
    global _global_runtime  # noqa: PLW0603
    if _global_runtime is not None:
        _global_runtime.cleanup()
        _global_runtime = None
 __all__ = ["AbstractRuntime", "SandboxInitializationError", "cleanup_runtime", "get_runtime"]
--- a/strix/runtime/docker_runtime.py
+++ b/strix/runtime/docker_runtime.py
@@ -1,15 +1,13 @@
 import contextlib
 import logging
 import os
 import secrets
 import socket
 import time
 from concurrent.futures import ThreadPoolExecutor
 from concurrent.futures import TimeoutError as FuturesTimeoutError
 from pathlib import Path
-from typing import Any, cast
+from typing import cast
 import docker
 import httpx
 from docker.errors import DockerException, ImageNotFound, NotFound
 from docker.models.containers import Container
 from requests.exceptions import ConnectionError as RequestsConnectionError
@@ -22,10 +20,9 @@ from .runtime import AbstractRuntime, SandboxInfo
 HOST_GATEWAY_HOSTNAME = "host.docker.internal"
-DOCKER_TIMEOUT = 60  # seconds
+DOCKER_TIMEOUT = 60
-TOOL_SERVER_HEALTH_REQUEST_TIMEOUT = 5  # seconds per health check request
+CONTAINER_TOOL_SERVER_PORT = 48081
-TOOL_SERVER_HEALTH_RETRIES = 10  # number of retries for health check
+CONTAINER_CAIDO_PORT = 48080
 logger = logging.getLogger(__name__)
 class DockerRuntime(AbstractRuntime):
@@ -33,50 +30,21 @@ class DockerRuntime(AbstractRuntime):
        try:
            self.client = docker.from_env(timeout=DOCKER_TIMEOUT)
        except (DockerException, RequestsConnectionError, RequestsTimeout) as e:
            logger.exception("Failed to connect to Docker daemon")
            if isinstance(e, RequestsConnectionError | RequestsTimeout):
                raise SandboxInitializationError(
                    "Docker daemon unresponsive",
                    f"Connection timed out after {DOCKER_TIMEOUT} seconds. "
                    "Please ensure Docker Desktop is installed and running, "
                    "and try running strix again.",
                ) from e
            raise SandboxInitializationError(
                "Docker is not available",
-                "Docker is not available or not configured correctly. "
+                "Please ensure Docker Desktop is installed and running.",
                "Please ensure Docker Desktop is installed and running, "
                "and try running strix again.",
            ) from e
        self._scan_container: Container | None = None
        self._tool_server_port: int | None = None
        self._tool_server_token: str | None = None
-
+        self._caido_port: int | None = None
    def _generate_sandbox_token(self) -> str:
        return secrets.token_urlsafe(32)
    def _find_available_port(self) -> int:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(("", 0))
            return cast("int", s.getsockname()[1])
    def _exec_run_with_timeout(
        self, container: Container, cmd: str, timeout: int = DOCKER_TIMEOUT, **kwargs: Any
    ) -> Any:
        with ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(container.exec_run, cmd, **kwargs)
            try:
                return future.result(timeout=timeout)
            except FuturesTimeoutError:
                logger.exception(f"exec_run timed out after {timeout}s: {cmd[:100]}...")
                raise SandboxInitializationError(
                    "Container command timed out",
                    f"Command timed out after {timeout} seconds. "
                    "Docker may be overloaded or unresponsive. "
                    "Please ensure Docker Desktop is installed and running, "
                    "and try running strix again.",
                ) from None
    def _get_scan_id(self, agent_id: str) -> str:
        try:
            from strix.telemetry.tracer import get_global_tracer
@@ -84,129 +52,127 @@ class DockerRuntime(AbstractRuntime):
            tracer = get_global_tracer()
            if tracer and tracer.scan_config:
                return str(tracer.scan_config.get("scan_id", "default-scan"))
-        except ImportError:
+        except (ImportError, AttributeError):
-            logger.debug("Failed to import tracer, using fallback scan ID")
+            pass
        except AttributeError:
            logger.debug("Tracer missing scan_config, using fallback scan ID")
        return f"scan-{agent_id.split('-')[0]}"
    def _verify_image_available(self, image_name: str, max_retries: int = 3) -> None:
        def _validate_image(image: docker.models.images.Image) -> None:
            if not image.id or not image.attrs:
                raise ImageNotFound(f"Image {image_name} metadata incomplete")
        for attempt in range(max_retries):
            try:
                image = self.client.images.get(image_name)
-                _validate_image(image)
+                if not image.id or not image.attrs:
-            except ImageNotFound:
+                    raise ImageNotFound(f"Image {image_name} metadata incomplete")  # noqa: TRY301
            except (ImageNotFound, DockerException):
                if attempt == max_retries - 1:
                    logger.exception(f"Image {image_name} not found after {max_retries} attempts")
                    raise
                logger.warning(f"Image {image_name} not ready, attempt {attempt + 1}/{max_retries}")
                time.sleep(2**attempt)
            except DockerException:
                if attempt == max_retries - 1:
                    logger.exception(f"Failed to verify image {image_name}")
                    raise
                logger.warning(f"Docker error verifying image, attempt {attempt + 1}/{max_retries}")
                time.sleep(2**attempt)
            else:
                logger.debug(f"Image {image_name} verified as available")
                return
-    def _create_container_with_retry(self, scan_id: str, max_retries: int = 3) -> Container:
+    def _recover_container_state(self, container: Container) -> None:
-        last_exception = None
+        for env_var in container.attrs["Config"]["Env"]:
            if env_var.startswith("TOOL_SERVER_TOKEN="):
                self._tool_server_token = env_var.split("=", 1)[1]
                break
        port_bindings = container.attrs.get("NetworkSettings", {}).get("Ports", {})
        port_key = f"{CONTAINER_TOOL_SERVER_PORT}/tcp"
        if port_bindings.get(port_key):
            self._tool_server_port = int(port_bindings[port_key][0]["HostPort"])
        caido_port_key = f"{CONTAINER_CAIDO_PORT}/tcp"
        if port_bindings.get(caido_port_key):
            self._caido_port = int(port_bindings[caido_port_key][0]["HostPort"])
    def _wait_for_tool_server(self, max_retries: int = 30, timeout: int = 5) -> None:
        host = self._resolve_docker_host()
        health_url = f"http://{host}:{self._tool_server_port}/health"
        time.sleep(5)
        for attempt in range(max_retries):
            try:
                with httpx.Client(trust_env=False, timeout=timeout) as client:
                    response = client.get(health_url)
                    if response.status_code == 200:
                        data = response.json()
                        if data.get("status") == "healthy":
                            return
            except (httpx.ConnectError, httpx.TimeoutException, httpx.RequestError):
                pass
            time.sleep(min(2**attempt * 0.5, 5))
        raise SandboxInitializationError(
            "Tool server failed to start",
            "Container initialization timed out. Please try again.",
        )
    def _create_container(self, scan_id: str, max_retries: int = 2) -> Container:
        container_name = f"strix-scan-{scan_id}"
        image_name = Config.get("strix_image")
        if not image_name:
            raise ValueError("STRIX_IMAGE must be configured")
-        for attempt in range(max_retries):
+        self._verify_image_available(image_name)
        last_error: Exception | None = None
        for attempt in range(max_retries + 1):
            try:
-                self._verify_image_available(image_name)
+                with contextlib.suppress(NotFound):
-
+                    existing = self.client.containers.get(container_name)
                try:
                    existing_container = self.client.containers.get(container_name)
                    logger.warning(f"Container {container_name} already exists, removing it")
                    with contextlib.suppress(Exception):
-                        existing_container.stop(timeout=5)
+                        existing.stop(timeout=5)
-                    existing_container.remove(force=True)
+                    existing.remove(force=True)
                    time.sleep(1)
                except NotFound:
                    pass
                except DockerException as e:
                    logger.warning(f"Error checking/removing existing container: {e}")
-                caido_port = self._find_available_port()
+                self._tool_server_port = self._find_available_port()
-                tool_server_port = self._find_available_port()
+                self._caido_port = self._find_available_port()
-                tool_server_token = self._generate_sandbox_token()
+                self._tool_server_token = secrets.token_urlsafe(32)
-
+                execution_timeout = Config.get("strix_sandbox_execution_timeout") or "120"
                self._tool_server_port = tool_server_port
                self._tool_server_token = tool_server_token
                container = self.client.containers.run(
                    image_name,
                    command="sleep infinity",
                    detach=True,
                    name=container_name,
-                    hostname=f"strix-scan-{scan_id}",
+                    hostname=container_name,
                    ports={
-                        f"{caido_port}/tcp": caido_port,
+                        f"{CONTAINER_TOOL_SERVER_PORT}/tcp": self._tool_server_port,
-                        f"{tool_server_port}/tcp": tool_server_port,
+                        f"{CONTAINER_CAIDO_PORT}/tcp": self._caido_port,
                    },
                    cap_add=["NET_ADMIN", "NET_RAW"],
                    labels={"strix-scan-id": scan_id},
                    environment={
                        "PYTHONUNBUFFERED": "1",
-                        "CAIDO_PORT": str(caido_port),
+                        "TOOL_SERVER_PORT": str(CONTAINER_TOOL_SERVER_PORT),
-                        "TOOL_SERVER_PORT": str(tool_server_port),
+                        "TOOL_SERVER_TOKEN": self._tool_server_token,
-                        "TOOL_SERVER_TOKEN": tool_server_token,
+                        "STRIX_SANDBOX_EXECUTION_TIMEOUT": str(execution_timeout),
                        "HOST_GATEWAY": HOST_GATEWAY_HOSTNAME,
                    },
-                    extra_hosts=self._get_extra_hosts(),
+                    extra_hosts={HOST_GATEWAY_HOSTNAME: "host-gateway"},
                    tty=True,
                )
                self._scan_container = container
-                logger.info("Created container %s for scan %s", container.id, scan_id)
+                self._wait_for_tool_server()
                self._initialize_container(
                    container, caido_port, tool_server_port, tool_server_token
                )
            except (DockerException, RequestsConnectionError, RequestsTimeout) as e:
-                last_exception = e
+                last_error = e
-                if attempt == max_retries - 1:
+                if attempt < max_retries:
-                    logger.exception(f"Failed to create container after {max_retries} attempts")
+                    self._tool_server_port = None
-                    break
+                    self._tool_server_token = None
-
+                    self._caido_port = None
-                logger.warning(f"Container creation attempt {attempt + 1}/{max_retries} failed")
+                    time.sleep(2**attempt)
                self._tool_server_port = None
                self._tool_server_token = None
                sleep_time = (2**attempt) + (0.1 * attempt)
                time.sleep(sleep_time)
            else:
                return container
        if isinstance(last_exception, RequestsConnectionError | RequestsTimeout):
            raise SandboxInitializationError(
                "Failed to create sandbox container",
                f"Docker daemon unresponsive after {max_retries} attempts "
                f"(timed out after {DOCKER_TIMEOUT}s). "
                "Please ensure Docker Desktop is installed and running, "
                "and try running strix again.",
            ) from last_exception
        raise SandboxInitializationError(
-            "Failed to create sandbox container",
+            "Failed to create container",
-            f"Container creation failed after {max_retries} attempts: {last_exception}. "
+            f"Container creation failed after {max_retries + 1} attempts: {last_error}",
-            "Please ensure Docker Desktop is installed and running, "
+        ) from last_error
            "and try running strix again.",
        ) from last_exception
-    def _get_or_create_scan_container(self, scan_id: str) -> Container:  # noqa: PLR0912
+    def _get_or_create_container(self, scan_id: str) -> Container:
        container_name = f"strix-scan-{scan_id}"
        if self._scan_container:
@@ -218,38 +184,20 @@ class DockerRuntime(AbstractRuntime):
                self._scan_container = None
                self._tool_server_port = None
                self._tool_server_token = None
                self._caido_port = None
        try:
            container = self.client.containers.get(container_name)
            container.reload()
            if (
                "strix-scan-id" not in container.labels
                or container.labels["strix-scan-id"] != scan_id
            ):
                logger.warning(
                    f"Container {container_name} exists but missing/wrong label, updating"
                )
            if container.status != "running":
                logger.info(f"Starting existing container {container_name}")
                container.start()
                time.sleep(2)
            self._scan_container = container
-
+            self._recover_container_state(container)
            for env_var in container.attrs["Config"]["Env"]:
                if env_var.startswith("TOOL_SERVER_PORT="):
                    self._tool_server_port = int(env_var.split("=")[1])
                elif env_var.startswith("TOOL_SERVER_TOKEN="):
                    self._tool_server_token = env_var.split("=")[1]
            logger.info(f"Reusing existing container {container_name}")
        except NotFound:
            pass
        except (DockerException, RequestsConnectionError, RequestsTimeout) as e:
            logger.warning(f"Failed to get container by name {container_name}: {e}")
        else:
            return container
@@ -262,101 +210,14 @@ class DockerRuntime(AbstractRuntime):
                if container.status != "running":
                    container.start()
                    time.sleep(2)
                self._scan_container = container
-
+                self._recover_container_state(container)
                for env_var in container.attrs["Config"]["Env"]:
                    if env_var.startswith("TOOL_SERVER_PORT="):
                        self._tool_server_port = int(env_var.split("=")[1])
                    elif env_var.startswith("TOOL_SERVER_TOKEN="):
                        self._tool_server_token = env_var.split("=")[1]
                logger.info(f"Found existing container by label for scan {scan_id}")
                return container
-        except (DockerException, RequestsConnectionError, RequestsTimeout) as e:
+        except DockerException:
-            logger.warning("Failed to find existing container by label for scan %s: %s", scan_id, e)
+            pass
-        logger.info("Creating new Docker container for scan %s", scan_id)
+        return self._create_container(scan_id)
        return self._create_container_with_retry(scan_id)
    def _initialize_container(
        self, container: Container, caido_port: int, tool_server_port: int, tool_server_token: str
    ) -> None:
        logger.info("Initializing Caido proxy on port %s", caido_port)
        self._exec_run_with_timeout(
            container,
            f"bash -c 'export CAIDO_PORT={caido_port} && /usr/local/bin/docker-entrypoint.sh true'",
            detach=False,
        )
        time.sleep(5)
        result = self._exec_run_with_timeout(
            container,
            "bash -c 'source /etc/profile.d/proxy.sh && echo $CAIDO_API_TOKEN'",
            user="pentester",
        )
        caido_token = result.output.decode().strip() if result.exit_code == 0 else ""
        container.exec_run(
            f"bash -c 'source /etc/profile.d/proxy.sh && cd /app && "
            f"STRIX_SANDBOX_MODE=true CAIDO_API_TOKEN={caido_token} CAIDO_PORT={caido_port} "
            f"poetry run python strix/runtime/tool_server.py --token {tool_server_token} "
            f"--host 0.0.0.0 --port {tool_server_port} &'",
            detach=True,
            user="pentester",
        )
        time.sleep(2)
        host = self._resolve_docker_host()
        health_url = f"http://{host}:{tool_server_port}/health"
        self._wait_for_tool_server_health(health_url)
    def _wait_for_tool_server_health(
        self,
        health_url: str,
        max_retries: int = TOOL_SERVER_HEALTH_RETRIES,
        request_timeout: int = TOOL_SERVER_HEALTH_REQUEST_TIMEOUT,
    ) -> None:
        import httpx
        logger.info(f"Waiting for tool server health at {health_url}")
        for attempt in range(max_retries):
            try:
                with httpx.Client(trust_env=False, timeout=request_timeout) as client:
                    response = client.get(health_url)
                    response.raise_for_status()
                    health_data = response.json()
                    if health_data.get("status") == "healthy":
                        logger.info(
                            f"Tool server is healthy after {attempt + 1} attempt(s): {health_data}"
                        )
                        return
                    logger.warning(f"Tool server returned unexpected status: {health_data}")
            except httpx.ConnectError:
                logger.debug(
                    f"Tool server not ready (attempt {attempt + 1}/{max_retries}): "
                    f"Connection refused"
                )
            except httpx.TimeoutException:
                logger.debug(
                    f"Tool server not ready (attempt {attempt + 1}/{max_retries}): "
                    f"Request timed out"
                )
            except (httpx.RequestError, httpx.HTTPStatusError) as e:
                logger.debug(f"Tool server not ready (attempt {attempt + 1}/{max_retries}): {e}")
            sleep_time = min(2**attempt * 0.5, 5)
            time.sleep(sleep_time)
        raise SandboxInitializationError(
            "Tool server failed to start",
            "Please ensure Docker Desktop is installed and running, and try running strix again.",
        )
    def _copy_local_directory_to_container(
        self, container: Container, local_path: str, target_name: str | None = None
@@ -367,17 +228,8 @@ class DockerRuntime(AbstractRuntime):
        try:
            local_path_obj = Path(local_path).resolve()
            if not local_path_obj.exists() or not local_path_obj.is_dir():
                logger.warning(f"Local path does not exist or is not directory: {local_path_obj}")
                return
            if target_name:
                logger.info(
                    f"Copying local directory {local_path_obj} to container at "
                    f"/workspace/{target_name}"
                )
            else:
                logger.info(f"Copying local directory {local_path_obj} to container")
            tar_buffer = BytesIO()
            with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
                for item in local_path_obj.rglob("*"):
@@ -388,16 +240,12 @@ class DockerRuntime(AbstractRuntime):
            tar_buffer.seek(0)
            container.put_archive("/workspace", tar_buffer.getvalue())
            container.exec_run(
                "chown -R pentester:pentester /workspace && chmod -R 755 /workspace",
                user="root",
            )
            logger.info("Successfully copied local directory to /workspace")
        except (OSError, DockerException):
-            logger.exception("Failed to copy local directory to container")
+            pass
    async def create_sandbox(
        self,
@@ -406,7 +254,7 @@ class DockerRuntime(AbstractRuntime):
        local_sources: list[dict[str, str]] | None = None,
    ) -> SandboxInfo:
        scan_id = self._get_scan_id(agent_id)
-        container = self._get_or_create_scan_container(scan_id)
+        container = self._get_or_create_container(scan_id)
        source_copied_key = f"_source_copied_{scan_id}"
        if local_sources and not hasattr(self, source_copied_key):
@@ -414,40 +262,34 @@ class DockerRuntime(AbstractRuntime):
                source_path = source.get("source_path")
                if not source_path:
                    continue
-
+                target_name = (
-                target_name = source.get("workspace_subdir")
+                    source.get("workspace_subdir") or Path(source_path).name or f"target_{index}"
-                if not target_name:
+                )
                    target_name = Path(source_path).name or f"target_{index}"
                self._copy_local_directory_to_container(container, source_path, target_name)
            setattr(self, source_copied_key, True)
-        container_id = container.id
+        if container.id is None:
        if container_id is None:
            raise RuntimeError("Docker container ID is unexpectedly None")
-        token = existing_token if existing_token is not None else self._tool_server_token
+        token = existing_token or self._tool_server_token
        if self._tool_server_port is None or self._caido_port is None or token is None:
            raise RuntimeError("Tool server not initialized")
-        if self._tool_server_port is None or token is None:
+        host = self._resolve_docker_host()
-            raise RuntimeError("Tool server not initialized or no token available")
+        api_url = f"http://{host}:{self._tool_server_port}"
-        api_url = await self.get_sandbox_url(container_id, self._tool_server_port)
+        await self._register_agent(api_url, agent_id, token)
        await self._register_agent_with_tool_server(api_url, agent_id, token)
        return {
-            "workspace_id": container_id,
+            "workspace_id": container.id,
            "api_url": api_url,
            "auth_token": token,
            "tool_server_port": self._tool_server_port,
            "caido_port": self._caido_port,
            "agent_id": agent_id,
        }
-    async def _register_agent_with_tool_server(
+    async def _register_agent(self, api_url: str, agent_id: str, token: str) -> None:
        self, api_url: str, agent_id: str, token: str
    ) -> None:
        import httpx
        try:
            async with httpx.AsyncClient(trust_env=False) as client:
                response = await client.post(
@@ -457,54 +299,54 @@ class DockerRuntime(AbstractRuntime):
                    timeout=30,
                )
                response.raise_for_status()
-                logger.info(f"Registered agent {agent_id} with tool server")
+        except httpx.RequestError:
-        except (httpx.RequestError, httpx.HTTPStatusError) as e:
+            pass
            logger.warning(f"Failed to register agent {agent_id}: {e}")
    async def get_sandbox_url(self, container_id: str, port: int) -> str:
        try:
-            container = self.client.containers.get(container_id)
+            self.client.containers.get(container_id)
-            container.reload()
+            return f"http://{self._resolve_docker_host()}:{port}"
            host = self._resolve_docker_host()
        except NotFound:
            raise ValueError(f"Container {container_id} not found.") from None
        except DockerException as e:
            raise RuntimeError(f"Failed to get container URL for {container_id}: {e}") from e
        else:
            return f"http://{host}:{port}"
    def _resolve_docker_host(self) -> str:
        docker_host = os.getenv("DOCKER_HOST", "")
-        if not docker_host:
+        if docker_host:
-            return "127.0.0.1"
+            from urllib.parse import urlparse
        from urllib.parse import urlparse
        parsed = urlparse(docker_host)
        if parsed.scheme in ("tcp", "http", "https") and parsed.hostname:
            return parsed.hostname
            parsed = urlparse(docker_host)
            if parsed.scheme in ("tcp", "http", "https") and parsed.hostname:
                return parsed.hostname
        return "127.0.0.1"
    def _get_extra_hosts(self) -> dict[str, str]:
        return {HOST_GATEWAY_HOSTNAME: "host-gateway"}
    async def destroy_sandbox(self, container_id: str) -> None:
        logger.info("Destroying scan container %s", container_id)
        try:
            container = self.client.containers.get(container_id)
            container.stop()
            container.remove()
            logger.info("Successfully destroyed container %s", container_id)
            self._scan_container = None
            self._tool_server_port = None
            self._tool_server_token = None
            self._caido_port = None
        except (NotFound, DockerException):
            pass
-        except NotFound:
+    def cleanup(self) -> None:
-            logger.warning("Container %s not found for destruction.", container_id)
+        if self._scan_container is not None:
-        except DockerException as e:
+            container_name = self._scan_container.name
-            logger.warning("Failed to destroy container %s: %s", container_id, e)
+            self._scan_container = None
            self._tool_server_port = None
            self._tool_server_token = None
            self._caido_port = None
            if container_name is None:
                return
            import subprocess
            subprocess.Popen(  # noqa: S603
                ["docker", "rm", "-f", container_name],  # noqa: S607
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                start_new_session=True,
            )
--- a/strix/runtime/runtime.py
+++ b/strix/runtime/runtime.py
@@ -7,6 +7,7 @@ class SandboxInfo(TypedDict):
    api_url: str
    auth_token: str | None
    tool_server_port: int
    caido_port: int
    agent_id: str
@@ -27,3 +28,6 @@ class AbstractRuntime(ABC):
    @abstractmethod
    async def destroy_sandbox(self, container_id: str) -> None:
        raise NotImplementedError
    def cleanup(self) -> None:
        raise NotImplementedError
--- a/strix/runtime/tool_server.py
+++ b/strix/runtime/tool_server.py
@@ -2,11 +2,9 @@ from __future__ import annotations
 import argparse
 import asyncio
 import logging
 import os
 import signal
 import sys
 from multiprocessing import Process, Queue
 from typing import Any
 import uvicorn
@@ -23,17 +21,22 @@ parser = argparse.ArgumentParser(description="Start Strix tool server")
 parser.add_argument("--token", required=True, help="Authentication token")
 parser.add_argument("--host", default="0.0.0.0", help="Host to bind to")  # nosec
 parser.add_argument("--port", type=int, required=True, help="Port to bind to")
 parser.add_argument(
    "--timeout",
    type=int,
    default=120,
    help="Hard timeout in seconds for each request execution (default: 120)",
 )
 args = parser.parse_args()
 EXPECTED_TOKEN = args.token
 REQUEST_TIMEOUT = args.timeout
 app = FastAPI()
 security = HTTPBearer()
 security_dependency = Depends(security)
-agent_processes: dict[str, dict[str, Any]] = {}
+agent_tasks: dict[str, asyncio.Task[Any]] = {}
 agent_queues: dict[str, dict[str, Queue[Any]]] = {}
 def verify_token(credentials: HTTPAuthorizationCredentials) -> str:
@@ -65,60 +68,19 @@ class ToolExecutionResponse(BaseModel):
    error: str | None = None
-def agent_worker(_agent_id: str, request_queue: Queue[Any], response_queue: Queue[Any]) -> None:
+async def _run_tool(agent_id: str, tool_name: str, kwargs: dict[str, Any]) -> Any:
-    null_handler = logging.NullHandler()
+    from strix.tools.argument_parser import convert_arguments
-
+    from strix.tools.context import set_current_agent_id
    root_logger = logging.getLogger()
    root_logger.handlers = [null_handler]
    root_logger.setLevel(logging.CRITICAL)
    from strix.tools.argument_parser import ArgumentConversionError, convert_arguments
    from strix.tools.registry import get_tool_by_name
-    while True:
+    set_current_agent_id(agent_id)
        try:
            request = request_queue.get()
-            if request is None:
+    tool_func = get_tool_by_name(tool_name)
-                break
+    if not tool_func:
        raise ValueError(f"Tool '{tool_name}' not found")
-            tool_name = request["tool_name"]
+    converted_kwargs = convert_arguments(tool_func, kwargs)
-            kwargs = request["kwargs"]
+    return await asyncio.to_thread(tool_func, **converted_kwargs)
            try:
                tool_func = get_tool_by_name(tool_name)
                if not tool_func:
                    response_queue.put({"error": f"Tool '{tool_name}' not found"})
                    continue
                converted_kwargs = convert_arguments(tool_func, kwargs)
                result = tool_func(**converted_kwargs)
                response_queue.put({"result": result})
            except (ArgumentConversionError, ValidationError) as e:
                response_queue.put({"error": f"Invalid arguments: {e}"})
            except (RuntimeError, ValueError, ImportError) as e:
                response_queue.put({"error": f"Tool execution error: {e}"})
        except (RuntimeError, ValueError, ImportError) as e:
            response_queue.put({"error": f"Worker error: {e}"})
 def ensure_agent_process(agent_id: str) -> tuple[Queue[Any], Queue[Any]]:
    if agent_id not in agent_processes:
        request_queue: Queue[Any] = Queue()
        response_queue: Queue[Any] = Queue()
        process = Process(
            target=agent_worker, args=(agent_id, request_queue, response_queue), daemon=True
        )
        process.start()
        agent_processes[agent_id] = {"process": process, "pid": process.pid}
        agent_queues[agent_id] = {"request": request_queue, "response": response_queue}
    return agent_queues[agent_id]["request"], agent_queues[agent_id]["response"]
@app.post("/execute", response_model=ToolExecutionResponse)
@@ -127,20 +89,42 @@ async def execute_tool(
 ) -> ToolExecutionResponse:
    verify_token(credentials)
-    request_queue, response_queue = ensure_agent_process(request.agent_id)
+    agent_id = request.agent_id
-    request_queue.put({"tool_name": request.tool_name, "kwargs": request.kwargs})
+    if agent_id in agent_tasks:
        old_task = agent_tasks[agent_id]
        if not old_task.done():
            old_task.cancel()
    task = asyncio.create_task(
        asyncio.wait_for(
            _run_tool(agent_id, request.tool_name, request.kwargs), timeout=REQUEST_TIMEOUT
        )
    )
    agent_tasks[agent_id] = task
    try:
-        loop = asyncio.get_event_loop()
+        result = await task
-        response = await loop.run_in_executor(None, response_queue.get)
+        return ToolExecutionResponse(result=result)
-        if "error" in response:
+    except asyncio.CancelledError:
-            return ToolExecutionResponse(error=response["error"])
+        return ToolExecutionResponse(error="Cancelled by newer request")
        return ToolExecutionResponse(result=response.get("result"))
-    except (RuntimeError, ValueError, OSError) as e:
+    except TimeoutError:
-        return ToolExecutionResponse(error=f"Worker error: {e}")
+        return ToolExecutionResponse(error=f"Tool timed out after {REQUEST_TIMEOUT}s")
    except ValidationError as e:
        return ToolExecutionResponse(error=f"Invalid arguments: {e}")
    except (ValueError, RuntimeError, ImportError) as e:
        return ToolExecutionResponse(error=f"Tool execution error: {e}")
    except Exception as e:  # noqa: BLE001
        return ToolExecutionResponse(error=f"Unexpected error: {e}")
    finally:
        if agent_tasks.get(agent_id) is task:
            del agent_tasks[agent_id]
@app.post("/register_agent")
@@ -148,8 +132,6 @@ async def register_agent(
    agent_id: str, credentials: HTTPAuthorizationCredentials = security_dependency
 ) -> dict[str, str]:
    verify_token(credentials)
    ensure_agent_process(agent_id)
    return {"status": "registered", "agent_id": agent_id}
@@ -160,35 +142,16 @@ async def health_check() -> dict[str, Any]:
        "sandbox_mode": str(SANDBOX_MODE),
        "environment": "sandbox" if SANDBOX_MODE else "main",
        "auth_configured": "true" if EXPECTED_TOKEN else "false",
-        "active_agents": len(agent_processes),
+        "active_agents": len(agent_tasks),
-        "agents": list(agent_processes.keys()),
+        "agents": list(agent_tasks.keys()),
    }
 def cleanup_all_agents() -> None:
    for agent_id in list(agent_processes.keys()):
        try:
            agent_queues[agent_id]["request"].put(None)
            process = agent_processes[agent_id]["process"]
            process.join(timeout=1)
            if process.is_alive():
                process.terminate()
                process.join(timeout=1)
            if process.is_alive():
                process.kill()
        except (BrokenPipeError, EOFError, OSError):
            pass
        except (RuntimeError, ValueError) as e:
            logging.getLogger(__name__).debug(f"Error during agent cleanup: {e}")
 def signal_handler(_signum: int, _frame: Any) -> None:
-    signal.signal(signal.SIGPIPE, signal.SIG_IGN) if hasattr(signal, "SIGPIPE") else None
+    if hasattr(signal, "SIGPIPE"):
-    cleanup_all_agents()
+        signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    for task in agent_tasks.values():
        task.cancel()
    sys.exit(0)
@@ -199,7 +162,4 @@ signal.signal(signal.SIGTERM, signal_handler)
 signal.signal(signal.SIGINT, signal_handler)
 if __name__ == "__main__":
-    try:
+    uvicorn.run(app, host=args.host, port=args.port, log_level="info")
        uvicorn.run(app, host=args.host, port=args.port, log_level="info")
    finally:
        cleanup_all_agents()
--- a/strix/skills/README.md
+++ b/strix/skills/README.md
@@ -33,10 +33,15 @@ The skills are dynamically injected into the agent's system prompt, allowing it
 | **`/frameworks`** | Specific testing methods for popular frameworks e.g. Django, Express, FastAPI, and Next.js |
 | **`/technologies`** | Specialized techniques for third-party services such as Supabase, Firebase, Auth0, and payment gateways |
 | **`/protocols`** | Protocol-specific testing patterns for GraphQL, WebSocket, OAuth, and other communication standards |
 | **`/tooling`** | Command-line playbooks for core sandbox tools (nmap, nuclei, httpx, ffuf, subfinder, naabu, katana, sqlmap) |
 | **`/cloud`** | Cloud provider security testing for AWS, Azure, GCP, and Kubernetes environments |
 | **`/reconnaissance`** | Advanced information gathering and enumeration techniques for comprehensive attack surface mapping |
 | **`/custom`** | Community-contributed skills for specialized or industry-specific testing scenarios |
 Notable source-aware skills:
 - `source_aware_whitebox` (coordination): white-box orchestration playbook
 - `source_aware_sast` (custom): semgrep/AST/secrets/supply-chain static triage workflow
 ---
 ## 🎨 Creating New Skills
@@ -49,8 +54,9 @@ A good skill is a structured knowledge package that typically includes:
 - **Practical examples** - Working payloads, commands, or test cases with variations
 - **Validation methods** - How to confirm findings and avoid false positives
 - **Context-specific insights** - Environment and version nuances, configuration-dependent behavior, and edge cases
 - **YAML frontmatter** - `name` and `description` fields for skill metadata
-Skills use XML-style tags for structure and focus on deep, specialized knowledge that significantly enhances agent capabilities for that specific context.
+Skills focus on deep, specialized knowledge to significantly enhance agent capabilities. They are dynamically injected into agent context when needed.
 ---
--- a/strix/skills/init.py
+++ b/strix/skills/init.py
@@ -1,18 +1,29 @@
-from pathlib import Path
+import re
-from jinja2 import Environment
+from strix.utils.resource_paths import get_strix_resource_path
 _EXCLUDED_CATEGORIES = {"scan_modes", "coordination"}
 _FRONTMATTER_PATTERN = re.compile(r"^---\s*\n.*?\n---\s*\n", re.DOTALL)
 def get_available_skills() -> dict[str, list[str]]:
-    skills_dir = Path(__file__).parent
+    skills_dir = get_strix_resource_path("skills")
-    available_skills = {}
+    available_skills: dict[str, list[str]] = {}
    if not skills_dir.exists():
        return available_skills
    for category_dir in skills_dir.iterdir():
        if category_dir.is_dir() and not category_dir.name.startswith("__"):
            category_name = category_dir.name
            if category_name in _EXCLUDED_CATEGORIES:
                continue
            skills = []
-            for file_path in category_dir.glob("*.jinja"):
+            for file_path in category_dir.glob("*.md"):
                skill_name = file_path.stem
                skills.append(skill_name)
@@ -43,6 +54,30 @@ def validate_skill_names(skill_names: list[str]) -> dict[str, list[str]]:
    return {"valid": valid_skills, "invalid": invalid_skills}
 def parse_skill_list(skills: str | None) -> list[str]:
    if not skills:
        return []
    return [s.strip() for s in skills.split(",") if s.strip()]
 def validate_requested_skills(skill_list: list[str], max_skills: int = 5) -> str | None:
    if len(skill_list) > max_skills:
        return "Cannot specify more than 5 skills for an agent (use comma-separated format)"
    if not skill_list:
        return None
    validation = validate_skill_names(skill_list)
    if validation["invalid"]:
        available_skills = list(get_all_skill_names())
        return (
            f"Invalid skills: {validation['invalid']}. "
            f"Available skills: {', '.join(available_skills)}"
        )
    return None
 def generate_skills_description() -> str:
    available_skills = get_available_skills()
@@ -67,36 +102,61 @@ def generate_skills_description() -> str:
    return description
-def load_skills(skill_names: list[str], jinja_env: Environment) -> dict[str, str]:
+def _get_all_categories() -> dict[str, list[str]]:
    """Get all skill categories including internal ones (scan_modes, coordination)."""
    skills_dir = get_strix_resource_path("skills")
    all_categories: dict[str, list[str]] = {}
    if not skills_dir.exists():
        return all_categories
    for category_dir in skills_dir.iterdir():
        if category_dir.is_dir() and not category_dir.name.startswith("__"):
            category_name = category_dir.name
            skills = []
            for file_path in category_dir.glob("*.md"):
                skill_name = file_path.stem
                skills.append(skill_name)
            if skills:
                all_categories[category_name] = sorted(skills)
    return all_categories
 def load_skills(skill_names: list[str]) -> dict[str, str]:
    import logging
    logger = logging.getLogger(__name__)
    skill_content = {}
-    skills_dir = Path(__file__).parent
+    skills_dir = get_strix_resource_path("skills")
-    available_skills = get_available_skills()
+    all_categories = _get_all_categories()
    for skill_name in skill_names:
        try:
            skill_path = None
            if "/" in skill_name:
-                skill_path = f"{skill_name}.jinja"
+                skill_path = f"{skill_name}.md"
            else:
-                for category, skills in available_skills.items():
+                for category, skills in all_categories.items():
                    if skill_name in skills:
-                        skill_path = f"{category}/{skill_name}.jinja"
+                        skill_path = f"{category}/{skill_name}.md"
                        break
                if not skill_path:
-                    root_candidate = f"{skill_name}.jinja"
+                    root_candidate = f"{skill_name}.md"
                    if (skills_dir / root_candidate).exists():
                        skill_path = root_candidate
            if skill_path and (skills_dir / skill_path).exists():
-                template = jinja_env.get_template(skill_path)
+                full_path = skills_dir / skill_path
                var_name = skill_name.split("/")[-1]
-                skill_content[var_name] = template.render()
+                content = full_path.read_text(encoding="utf-8")
                content = _FRONTMATTER_PATTERN.sub("", content).lstrip()
                skill_content[var_name] = content
                logger.info(f"Loaded skill: {skill_name} -> {var_name}")
            else:
                logger.warning(f"Skill not found: {skill_name}")
--- a/strix/skills/coordination/root_agent.jinja
+++ b/strix/skills/coordination/root_agent.jinja
@@ -1,41 +0,0 @@
 <coordination_role>
 You are a COORDINATION AGENT ONLY. You do NOT perform any security testing, vulnerability assessment, or technical work yourself.
 Your ONLY responsibilities:
 1. Create specialized agents for specific security tasks
 2. Monitor agent progress and coordinate between them
 3. Compile final scan reports from agent findings
 4. Manage agent communication and dependencies
 CRITICAL RESTRICTIONS:
 - NEVER perform vulnerability testing or security assessments
 - NEVER write detailed vulnerability reports (only compile final summaries)
 - ONLY use agent_graph and finish tools for coordination
 - You can create agents throughout the scan process, depending on the task and findings, not just at the beginning!
 </coordination_role>
 <agent_management>
 BEFORE CREATING AGENTS:
 1. Analyze the target scope and break into independent tasks
 2. Check existing agents to avoid duplication
 3. Create agents with clear, specific objectives to avoid duplication
 AGENT TYPES YOU CAN CREATE:
 - Reconnaissance: subdomain enum, port scanning, tech identification, etc.
 - Vulnerability Testing: SQL injection, XSS, auth bypass, IDOR, RCE, SSRF, etc. Can be black-box or white-box.
    - Direct vulnerability testing agents to implement hierarchical workflow (per finding: discover, verify, report, fix): each one should create validation agents for findings verification, which spawn reporting agents for documentation, which create fix agents for remediation
 COORDINATION GUIDELINES:
 - Ensure clear task boundaries and success criteria
 - Terminate redundant agents when objectives overlap
 - Use message passing only when essential (requests/answers or critical handoffs); avoid routine status messages and prefer batched updates
 </agent_management>
 <final_responsibilities>
 When all agents complete:
 1. Collect findings from all agents
 2. Compile a final scan summary report
 3. Use finish tool to complete the assessment
 Your value is in orchestration, not execution.
 </final_responsibilities>
--- a/strix/skills/coordination/root_agent.md
+++ b/strix/skills/coordination/root_agent.md
@@ -0,0 +1,92 @@
 ---
 name: root-agent
 description: Orchestration layer that coordinates specialized subagents for security assessments
 ---
 # Root Agent
 Orchestration layer for security assessments. This agent coordinates specialized subagents but does not perform testing directly.
 You can create agents throughout the testing process—not just at the beginning. Spawn agents dynamically based on findings and evolving scope.
 ## Role
 - Decompose targets into discrete, parallelizable tasks
 - Spawn and monitor specialized subagents
 - Aggregate findings into a cohesive final report
 - Manage dependencies and handoffs between agents
 ## Scope Decomposition
 Before spawning agents, analyze the target:
 1. **Identify attack surfaces** - web apps, APIs, infrastructure, etc.
 2. **Define boundaries** - in-scope domains, IP ranges, excluded assets
 3. **Determine approach** - blackbox, greybox, or whitebox assessment
 4. **Prioritize by risk** - critical assets and high-value targets first
 ## Agent Architecture
 Structure agents by function:
 **Reconnaissance**
 - Asset discovery and enumeration
 - Technology fingerprinting
 - Attack surface mapping
 **Vulnerability Assessment**
 - Injection testing (SQLi, XSS, command injection)
 - Authentication and session analysis
 - Access control testing (IDOR, privilege escalation)
 - Business logic flaws
 - Infrastructure vulnerabilities
 **Exploitation and Validation**
 - Proof-of-concept development
 - Impact demonstration
 - Vulnerability chaining
 **Reporting**
 - Finding documentation
 - Remediation recommendations
 ## Coordination Principles
 **Task Independence**
 Create agents with minimal dependencies. Parallel execution is faster than sequential.
 **Clear Objectives**
 Each agent should have a specific, measurable goal. Vague objectives lead to scope creep and redundant work.
 **Avoid Duplication**
 Before creating agents:
 1. Analyze the target scope and break into independent tasks
 2. Check existing agents to avoid overlap
 3. Create agents with clear, specific objectives
 **Hierarchical Delegation**
 Complex findings warrant specialized subagents:
 - Discovery agent finds potential vulnerability
 - Validation agent confirms exploitability
 - Reporting agent documents with reproduction steps
 - Fix agent provides remediation (if needed)
 **Resource Efficiency**
 - Avoid duplicate coverage across agents
 - Terminate agents when objectives are met or no longer relevant
 - Use message passing only when essential (requests/answers, critical handoffs)
 - Prefer batched updates over routine status messages
 ## Completion
 When all agents report completion:
 1. Collect and deduplicate findings across agents
 2. Assess overall security posture
 3. Compile executive summary with prioritized recommendations
 4. Invoke finish tool with final report
--- a/strix/skills/coordination/source_aware_whitebox.md
+++ b/strix/skills/coordination/source_aware_whitebox.md
@@ -0,0 +1,68 @@
 ---
 name: source-aware-whitebox
 description: Coordination playbook for source-aware white-box testing with static triage and dynamic validation
 ---
 # Source-Aware White-Box Coordination
 Use this coordination playbook when repository source code is available.
 ## Objective
 Increase white-box coverage by combining source-aware triage with dynamic validation. Source-aware tooling is expected by default when source is available.
 ## Recommended Workflow
 1. Build a quick source map before deep exploitation, including at least one AST-structural pass (`sg` or `tree-sitter`) scoped to relevant paths.
   - For `sg` baseline, derive `sg-targets.txt` from `semgrep.json` scope first (`paths.scanned`, fallback to unique `results[].path`) and run `xargs ... sg run` on that list.
   - Only fall back to path heuristics when semgrep scope is unavailable, and record the fallback reason in the repo wiki.
 2. Run first-pass static triage to rank high-risk paths.
 3. Use triage outputs to prioritize dynamic PoC validation.
 4. Keep findings evidence-driven: no report without validation.
 5. Keep shared wiki memory current so all agents can reuse context.
 ## Source-Aware Triage Stack
 - `semgrep`: fast security-first triage and custom pattern scans
 - `ast-grep` (`sg`): structural pattern hunting and targeted repo mapping
 - `tree-sitter`: syntax-aware parsing support for symbol and route extraction
 - `gitleaks` + `trufflehog`: complementary secret detection (working tree and history coverage)
 - `trivy fs`: dependency, misconfiguration, license, and secret checks
 Coverage target per repository:
 - one `semgrep` pass
 - one AST structural pass (`sg` and/or `tree-sitter`)
 - one secrets pass (`gitleaks` and/or `trufflehog`)
 - one `trivy fs` pass
 - if any part is skipped, log the reason in the shared wiki note
 ## Agent Delegation Guidance
 - Keep child agents specialized by vulnerability/component as usual.
 - For source-heavy subtasks, prefer creating child agents with `source_aware_sast` skill.
 - Use source findings to shape payloads and endpoint selection for dynamic testing.
 ## Wiki Note Requirement (Source Map)
 When source is present, maintain one wiki note per repository and keep it current.
 Operational rules:
 - At task start, call `list_notes` with `category=wiki`, then read the selected wiki with `get_note(note_id=...)`.
 - If no repo wiki exists, create one with `create_note` and `category=wiki`.
 - Update the same wiki via `update_note`; avoid creating duplicate wiki notes for the same repo.
 - Child agents should read wiki notes first via `get_note`, then extend with new evidence from their scope.
 - Before calling `agent_finish`, each source-focused child agent should append a short delta update to the shared repo wiki (scanner outputs, route/sink map deltas, dynamic follow-ups).
 Recommended sections:
 - Architecture overview
 - Entrypoints and routing
 - AuthN/AuthZ model
 - High-risk sinks and trust boundaries
 - Static scanner summary
 - Dynamic validation follow-ups
 ## Validation Guardrails
 - Static findings are hypotheses until validated.
 - Dynamic exploitation evidence is still required before vulnerability reporting.
 - Keep scanner output concise, deduplicated, and mapped to concrete code locations.
--- a/strix/skills/custom/source_aware_sast.md
+++ b/strix/skills/custom/source_aware_sast.md
@@ -0,0 +1,167 @@
 ---
 name: source-aware-sast
 description: Practical source-aware SAST and AST playbook for semgrep, ast-grep, gitleaks, and trivy fs
 ---
 # Source-Aware SAST Playbook
 Use this skill for source-heavy analysis where static and structural signals should guide dynamic testing.
 ## Fast Start
 Run tools from repo root and store outputs in a dedicated artifact directory:
 ```bash
 mkdir -p /workspace/.strix-source-aware
 ```
 Before scanning, check shared wiki memory:
 ```text
 1) list_notes(category="wiki")
 2) get_note(note_id=...) for the selected repo wiki before analysis
 3) Reuse matching repo wiki note if present
 4) create_note(category="wiki") only if missing
 ```
 After every major source-analysis batch, update the same repo wiki note with `update_note` so other agents can reuse your latest map.
 ## Baseline Coverage Bundle (Recommended)
 Run this baseline once per repository before deep narrowing:
 ```bash
 ART=/workspace/.strix-source-aware
 mkdir -p "$ART"
 semgrep scan --config p/default --config p/golang --config p/secrets \
  --metrics=off --json --output "$ART/semgrep.json" .
 # Build deterministic AST targets from semgrep scope (no hardcoded path guessing)
 python3 - <<'PY'
 import json
 from pathlib import Path
 art = Path("/workspace/.strix-source-aware")
 semgrep_json = art / "semgrep.json"
 targets_file = art / "sg-targets.txt"
 try:
    data = json.loads(semgrep_json.read_text(encoding="utf-8"))
 except Exception:
    targets_file.write_text("", encoding="utf-8")
    raise
 scanned = data.get("paths", {}).get("scanned") or []
 if not scanned:
    scanned = sorted(
        {
            r.get("path")
            for r in data.get("results", [])
            if isinstance(r, dict) and isinstance(r.get("path"), str) and r.get("path")
        }
    )
 bounded = scanned[:4000]
 targets_file.write_text("".join(f"{p}\n" for p in bounded), encoding="utf-8")
 print(f"sg-targets: {len(bounded)}")
 PY
 xargs -r -n 200 sg run --pattern '$F($$$ARGS)' --json=stream < "$ART/sg-targets.txt" \
  > "$ART/ast-grep.json" 2> "$ART/ast-grep.log" || true
 gitleaks detect --source . --report-format json --report-path "$ART/gitleaks.json" || true
 trufflehog filesystem --no-update --json --no-verification . > "$ART/trufflehog.json" || true
 # Keep trivy focused on vuln/misconfig (secrets already covered above) and increase timeout for large repos
 trivy fs --scanners vuln,misconfig --timeout 30m --offline-scan \
  --format json --output "$ART/trivy-fs.json" . || true
 ```
 If one tool is skipped or fails, record that in the shared wiki note along with the reason.
 ## Semgrep First Pass
 Use Semgrep as the default static triage pass:
 ```bash
 # Preferred deterministic profile set (works with --metrics=off)
 semgrep scan --config p/default --config p/golang --config p/secrets \
  --metrics=off --json --output /workspace/.strix-source-aware/semgrep.json .
 # If you choose auto config, do not combine it with --metrics=off
 semgrep scan --config auto --json --output /workspace/.strix-source-aware/semgrep-auto.json .
 ```
 If diff scope is active, restrict to changed files first, then expand only when needed.
 ## AST-Grep Structural Mapping
 Use `sg` for structure-aware code hunting:
 ```bash
 # Ruleless structural pass over deterministic target list (no sgconfig.yml required)
 xargs -r -n 200 sg run --pattern '$F($$$ARGS)' --json=stream \
  < /workspace/.strix-source-aware/sg-targets.txt \
  > /workspace/.strix-source-aware/ast-grep.json 2> /workspace/.strix-source-aware/ast-grep.log || true
 ```
 Target high-value patterns such as:
 - missing auth checks near route handlers
 - dynamic command/query construction
 - unsafe deserialization or template execution paths
 - file and path operations influenced by user input
 ## Tree-Sitter Assisted Repo Mapping
 Use tree-sitter CLI for syntax-aware parsing when grep-level mapping is noisy:
 ```bash
 tree-sitter parse -q <file>
 ```
 Use outputs to improve route/symbol/sink maps for subsequent targeted scans.
 ## Secret and Supply Chain Coverage
 Detect hardcoded credentials:
 ```bash
 gitleaks detect --source . --report-format json --report-path /workspace/.strix-source-aware/gitleaks.json
 trufflehog filesystem --json . > /workspace/.strix-source-aware/trufflehog.json
 ```
 Run repository-wide dependency and config checks:
 ```bash
 trivy fs --scanners vuln,misconfig --timeout 30m --offline-scan \
  --format json --output /workspace/.strix-source-aware/trivy-fs.json . || true
 ```
 ## Converting Static Signals Into Exploits
 1. Rank candidates by impact and exploitability.
 2. Trace source-to-sink flow for top candidates.
 3. Build dynamic PoCs that reproduce the suspected issue.
 4. Report only after dynamic validation succeeds.
 ## Wiki Update Template
 Keep one wiki note per repository and update these sections:
 ```text
 ## Architecture
 ## Entrypoints
 ## AuthN/AuthZ
 ## High-Risk Sinks
 ## Static Findings Summary
 ## Dynamic Validation Follow-Ups
 ```
 Before `agent_finish`, make one final `update_note` call to capture:
 - scanner artifacts and paths
 - top validated/invalidated hypotheses
 - concrete dynamic follow-up tasks
 ## Anti-Patterns
 - Do not treat scanner output as final truth.
 - Do not spend full cycles on low-signal pattern matches.
 - Do not report source-only findings without validation evidence.
 - Do not create multiple wiki notes for the same repository when one already exists.
--- a/strix/skills/frameworks/fastapi.jinja
+++ b/strix/skills/frameworks/fastapi.jinja
@@ -1,142 +0,0 @@
 <fastapi_security_testing_guide>
 <title>FASTAPI — ADVERSARIAL TESTING PLAYBOOK</title>
 <critical>FastAPI (on Starlette) spans HTTP, WebSocket, and background tasks with powerful dependency injection and automatic OpenAPI. Security breaks where identity, authorization, and validation drift across routers, middlewares, proxies, and channels. Treat every dependency, header, and object reference as untrusted until bound to the caller and tenant.</critical>
 <surface_map>
 - ASGI stack: Starlette middlewares (CORS, TrustedHost, ProxyHeaders, Session), exception handlers, lifespan events
 - Routers/sub-apps: APIRouter with prefixes/tags, mounted apps (StaticFiles, admin subapps), `include_router`, versioned paths
 - Security and DI: `Depends`, `Security`, `OAuth2PasswordBearer`, `HTTPBearer`, scopes, per-router vs per-route dependencies
 - Models and validation: Pydantic v1/v2 models, unions/Annotated, custom validators, extra fields policy, coercion
 - Docs and schema: `/openapi.json`, `/docs`, `/redoc`, alternative docs_url/redoc_url, schema extensions
 - Files and static: `UploadFile`, `File`, `FileResponse`, `StaticFiles` mounts, template engines (`Jinja2Templates`)
 - Channels: HTTP (sync/async), WebSocket, StreamingResponse/SSE, BackgroundTasks/Task queues
 - Deployment: Uvicorn/Gunicorn, reverse proxies/CDN, TLS termination, header trust
 </surface_map>
 <methodology>
 1. Enumerate routes from OpenAPI and via crawling; diff with 404-fuzzing for hidden endpoints (`include_in_schema=False`).
 2. Build a Principal × Channel × Content-Type matrix (unauth, user, staff/admin; HTTP vs WebSocket; JSON/form/multipart) and capture baselines.
 3. For each route, identify dependencies (router-level and route-level). Attempt to satisfy security dependencies minimally, then mutate context (tokens, scopes, tenant headers) and object IDs.
 4. Compare behavior across deployments: dev/stage/prod often differ in middlewares (CORS, TrustedHost, ProxyHeaders) and docs exposure.
 </methodology>
 <high_value_targets>
 - `/openapi.json`, `/docs`, `/redoc` in production (full attack surface map; securitySchemes and server URLs)
 - Auth flows: token endpoints, session/cookie bridges, OAuth device/PKCE, scope checks
 - Admin/staff routers, feature-flagged routes, `include_in_schema=False` endpoints
 - File upload/download, import/export/report endpoints, signed URL generators
 - WebSocket endpoints carrying notifications, admin channels, or commands
 - Background job creation/fetch (`/jobs/{id}`, `/tasks/{id}/result`)
 - Mounted subapps (admin UI, storage browsers, metrics/health endpoints)
 </high_value_targets>
 <advanced_techniques>
 <openapi_and_docs>
 - Try default and alternate locations: `/openapi.json`, `/docs`, `/redoc`, `/api/openapi.json`, `/internal/openapi.json`.
 - If OpenAPI is exposed, mine: paths, parameter names, securitySchemes, scopes, servers; find endpoints hidden in UI but present in schema.
 - Schema drift: endpoints with `include_in_schema=False` won’t appear—use wordlists based on tags/prefixes and common admin/debug names.
 </openapi_and_docs>
 <dependency_injection_and_security>
 - Router vs route dependencies: routes may miss security dependencies present elsewhere; check for unprotected variants of protected actions.
 - Minimal satisfaction: `OAuth2PasswordBearer` only yields a token string—verify if any route treats token presence as auth without verification.
 - Scope checks: ensure scopes are enforced by the dependency (e.g., `Security(...)`); routes using `Depends` instead may ignore requested scopes.
 - Header/param aliasing: DI sources headers/cookies/query by name; try case variations and duplicates to influence which value binds.
 </dependency_injection_and_security>
 <auth_and_jwt>
 - Token misuse: developers may decode JWTs without verifying signature/issuer/audience; attempt unsigned/attacker-signed tokens and cross-service audiences.
 - Algorithm/key confusion: try HS/RS cross-use if verification is not pinned; inject `kid` header targeting local files/paths where custom key lookup exists.
 - Session bridges: check cookies set via SessionMiddleware or custom cookies. Attempt session fixation and forging if weak `secret_key` or predictable signing is used.
 - Device/PKCE flows: verify strict PKCE S256 and state/nonce enforcement if OAuth/OIDC is integrated.
 </auth_and_jwt>
 <cors_and_csrf>
 - CORS reflection: broad `allow_origin_regex` or mis-specified origins can permit cross-site reads; test arbitrary Origins and credentialed requests.
 - CSRF: FastAPI/Starlette lack built-in CSRF. If cookies carry auth, attempt state-changing requests via cross-site forms/XHR; validate origin header checks and same-site settings.
 </cors_and_csrf>
 <proxy_and_host_trust>
 - ProxyHeadersMiddleware: if enabled without network boundary, spoof `X-Forwarded-For/Proto` to influence auth/IP gating and secure redirects.
 - TrustedHostMiddleware absent or lax: perform Host header poisoning; attempt password reset links / absolute URL generation under attacker host.
 - Upstream/CDN cache keys: ensure Vary on Authorization/Cookie/Tenant; try cache key confusion to leak personalized responses.
 </proxy_and_host_trust>
 <static_and_uploads>
 - UploadFile.filename: attempt path traversal and control characters; verify server joins/sanitizes and enforces storage roots.
 - FileResponse/StaticFiles: confirm directory boundaries and index/auto-listing; probe symlinks and case/encoding variants.
 - Parser differentials: send JSON vs multipart for the same route to hit divergent code paths/validators.
 </static_and_uploads>
 <template_injection>
 - Jinja2 templates via `TemplateResponse`: search for unescaped injection in variables and filters. Probe with minimal expressions:
 {% raw %}- `{{7*7}}` → arithmetic confirmation
 - `{{cycler.__init__.__globals__['os'].popen('id').read()}}` for RCE in unsafe contexts{% endraw %}
 - Confirm autoescape and strict sandboxing; inspect custom filters/globals.
 </template_injection>
 <ssrf_and_outbound>
 - Endpoints fetching user-supplied URLs (imports, previews, webhooks validation): test loopback/RFC1918/IPv6, redirects, DNS rebinding, and header control.
 - Library behavior (httpx/requests): examine redirect policy, header forwarding, and protocol support; try `file://`, `ftp://`, or gopher-like shims if custom clients are used.
 </ssrf_and_outbound>
 <websockets>
 - Authenticate each connection (query/header/cookie). Attempt cross-origin handshakes and cookie-bearing WS from untrusted origins.
 - Topic naming and authorization: if using user/tenant IDs in channels, subscribe/publish to foreign IDs.
 - Message-level checks: ensure per-message authorization, not only at handshake.
 </websockets>
 <background_tasks_and_jobs>
 - BackgroundTasks that act on IDs must re-enforce ownership/tenant at execution time. Attempt to fetch/cancel others’ jobs by referencing their IDs.
 - Export/import pipelines: test job/result endpoints for IDOR and cross-tenant leaks.
 </background_tasks_and_jobs>
 <multi_app_mounting>
 - Mounted subapps (e.g., `/admin`, `/static`, `/metrics`) may bypass global middlewares. Confirm middleware parity and auth on mounts.
 </multi_app_mounting>
 </advanced_techniques>
 <bypass_techniques>
 - Content-type switching: `application/json` ↔ `application/x-www-form-urlencoded` ↔ `multipart/form-data` to traverse alternate validators/handlers.
 - Parameter duplication and case variants to exploit DI precedence.
 - Method confusion via proxies (e.g., `X-HTTP-Method-Override`) if upstream respects it while app does not.
 - Race windows around dependency-validated state transitions (issue token then mutate with parallel requests).
 </bypass_techniques>
 <special_contexts>
 <pydantic_edges>
 - Coercion: strings to ints/bools, empty strings to None; exploit truthiness and boundary conditions.
 - Extra fields: if models allow/ignore extras, sneak in control fields for downstream logic (scope/role/ownerId) that are later trusted.
 - Unions and `Annotated`: craft shapes hitting unintended branches.
 </pydantic_edges>
 <graphql_and_alt_stacks>
 - If GraphQL (Strawberry/Graphene) is mounted, validate resolver-level authorization and IDOR on node/global IDs.
 - If SQLModel/SQLAlchemy present, probe for raw query usage and row-level authorization gaps.
 </graphql_and_alt_stacks>
 </special_contexts>
 <validation>
 1. Show unauthorized data access or action with side-by-side owner vs non-owner requests (or different tenants).
 2. Demonstrate cross-channel consistency (HTTP and WebSocket) for the same rule.
 3. Include proof where proxies/headers/caches alter outcomes (Host/XFF/CORS).
 4. Provide minimal payloads confirming template/SSRF execution or token misuse, with safe or OAST-based oracles.
 5. Document exact dependency paths (router-level, route-level) that missed enforcement.
 </validation>
 <pro_tips>
 1. Always fetch `/openapi.json` first; it’s the blueprint. If hidden, brute-force likely admin/report/export routes.
 2. Trace dependencies per route; map which ones enforce auth/scopes vs merely parse input.
 3. Treat tokens returned by `OAuth2PasswordBearer` as untrusted strings—verify actual signature and claims on the server.
 4. Test CORS with arbitrary Origins and with credentials; verify preflight and actual request deltas.
 5. Add Host and X-Forwarded-* fuzzing when behind proxies; watch for redirect/absolute URL differences.
 6. For uploads, vary filename encodings, dot segments, and NUL-like bytes; verify storage paths and served URLs.
 7. Use content-type toggling to hit alternate validators and code paths.
 8. For WebSockets, test cookie-based auth, origin restrictions, and per-message authorization.
 9. Mine client bundles/env for secret paths and preview/admin flags; many teams hide routes via UI only.
 10. Keep PoCs minimal and durable (IDs, headers, small payloads) and prefer reproducible diffs over noisy payloads.
 </pro_tips>
 <remember>Authorization and validation must be enforced in the dependency graph and at the resource boundary for every path and channel. If any route, middleware, or mount skips binding subject, action, and object/tenant, expect cross-user and cross-tenant breakage.</remember>
 </fastapi_security_testing_guide>
--- a/strix/skills/frameworks/fastapi.md
+++ b/strix/skills/frameworks/fastapi.md
@@ -0,0 +1,191 @@
 ---
 name: fastapi
 description: Security testing playbook for FastAPI applications covering ASGI, dependency injection, and API vulnerabilities
 ---
 # FastAPI
 Security testing for FastAPI/Starlette applications. Focus on dependency injection flaws, middleware gaps, and authorization drift across routers and channels.
 ## Attack Surface
 **Core Components**
 - ASGI middlewares: CORS, TrustedHost, ProxyHeaders, Session, exception handlers, lifespan events
 - Routers and sub-apps: APIRouter prefixes/tags, mounted apps (StaticFiles, admin), `include_router`, versioned paths
 - Dependency injection: `Depends`, `Security`, `OAuth2PasswordBearer`, `HTTPBearer`, scopes
 **Data Handling**
 - Pydantic models: v1/v2, unions/Annotated, custom validators, extra fields policy, coercion
 - File operations: UploadFile, File, FileResponse, StaticFiles mounts
 - Templates: Jinja2Templates rendering
 **Channels**
 - HTTP (sync/async), WebSocket, SSE/StreamingResponse
 - BackgroundTasks and task queues
 **Deployment**
 - Uvicorn/Gunicorn, reverse proxies/CDN, TLS termination, header trust
 ## High-Value Targets
 - `/openapi.json`, `/docs`, `/redoc` in production (full attack surface map, securitySchemes, server URLs)
 - Auth flows: token endpoints, session/cookie bridges, OAuth device/PKCE
 - Admin/staff routers, feature-flagged routes, `include_in_schema=False` endpoints
 - File upload/download, import/export/report endpoints, signed URL generators
 - WebSocket endpoints (notifications, admin channels, commands)
 - Background job endpoints (`/jobs/{id}`, `/tasks/{id}/result`)
 - Mounted subapps (admin UI, storage browsers, metrics/health)
 ## Reconnaissance
 **OpenAPI Mining**
 ```
 GET /openapi.json
 GET /docs
 GET /redoc
 GET /api/openapi.json
 GET /internal/openapi.json
 ```
 Extract: paths, parameters, securitySchemes, scopes, servers. Endpoints with `include_in_schema=False` won't appear—fuzz based on discovered prefixes and common admin/debug names.
 **Dependency Mapping**
 For each route, identify:
 - Router-level dependencies (applied to all routes)
 - Route-level dependencies (per endpoint)
 - Which dependencies enforce auth vs just parse input
 ## Key Vulnerabilities
 ### Authentication & Authorization
 **Dependency Injection Gaps**
 - Routes missing security dependencies present on other routes
 - `Depends` used instead of `Security` (ignores scope enforcement)
 - Token presence treated as authentication without signature verification
 - `OAuth2PasswordBearer` only yields a token string—verify routes don't treat presence as auth
 **JWT Misuse**
 - Decode without verify: test unsigned tokens, attacker-signed tokens
 - Algorithm confusion: HS256/RS256 cross-use if not pinned
 - `kid` header injection for custom key lookup paths
 - Missing issuer/audience validation, cross-service token reuse
 **Session Weaknesses**
 - SessionMiddleware with weak `secret_key`
 - Session fixation via predictable signing
 - Cookie-based auth without CSRF protection
 **OAuth/OIDC**
 - Device/PKCE flows: verify strict PKCE S256 and state/nonce enforcement
 ### Access Control
 **IDOR via Dependencies**
 - Object IDs in path/query not validated against caller
 - Tenant headers trusted without binding to authenticated user
 - BackgroundTasks acting on IDs without re-validating ownership at execution time
 - Export/import pipelines with IDOR and cross-tenant leaks
 **Scope Bypass**
 - Minimal scope satisfaction (any valid token accepted)
 - Router vs route scope enforcement inconsistency
 ### Input Handling
 **Pydantic Exploitation**
 - Type coercion: strings to ints/bools, empty strings to None, truthiness edge cases
 - Extra fields: `extra = "allow"` permits injecting control fields (role, ownerId, scope)
 - Union types and `Annotated`: craft shapes hitting unintended validation branches
 **Content-Type Switching**
 ```
 application/json ↔ application/x-www-form-urlencoded ↔ multipart/form-data
 ```
 Different content types hit different validators or code paths (parser differentials).
 **Parameter Manipulation**
 - Case variations in header/cookie names
 - Duplicate parameters exploiting DI precedence
 - Method override via `X-HTTP-Method-Override` (upstream respects, app doesn't)
 ### CORS & CSRF
 **CORS Misconfiguration**
 - Overly broad `allow_origin_regex`
 - Origin reflection without validation
 - Credentialed requests with permissive origins
 - Verify preflight vs actual request deltas
 **CSRF Exposure**
 - No built-in CSRF in FastAPI/Starlette
 - Cookie-based auth without origin validation
 - Missing SameSite attribute
 ### Proxy & Host Trust
 **Header Spoofing**
 - ProxyHeadersMiddleware without network boundary: spoof `X-Forwarded-For/Proto` to influence auth/IP gating
 - Absent TrustedHostMiddleware: Host header poisoning in password reset links, absolute URL generation
 - Cache key confusion: missing Vary on Authorization/Cookie/Tenant
 ### Server-Side Vulnerabilities
 **Template Injection (Jinja2)**
 ```python
 {{7*7}}  # Arithmetic confirmation
 {{cycler.__init__.__globals__['os'].popen('id').read()}}  # RCE
 ```
 Check autoescape settings and custom filters/globals.
 **SSRF**
 - User-supplied URLs in imports, previews, webhooks validation
 - Test: loopback, RFC1918, IPv6, redirects, DNS rebinding, header control
 - Library behavior (httpx/requests): redirect policy, header forwarding, protocol support
 - Protocol smuggling: `file://`, `ftp://`, gopher-like shims if custom clients
 **File Upload**
 - Path traversal in `UploadFile.filename` with control characters
 - Missing storage root enforcement, symlink following
 - Vary filename encodings, dot segments, NUL-like bytes
 - Verify storage paths and served URLs
 ### WebSocket Security
 - Missing per-connection authentication
 - Cross-origin WebSocket without origin validation
 - Topic/channel IDOR (subscribing to other users' channels)
 - Authorization only at handshake, not per-message
 ### Mounted Apps
 Sub-apps at `/admin`, `/static`, `/metrics` may bypass global middlewares. Verify auth enforcement parity across all mounts.
 ### Alternative Stacks
 - If GraphQL (Strawberry/Graphene) is mounted: validate resolver-level authorization, IDOR on node/global IDs
 - If SQLModel/SQLAlchemy present: probe for raw query usage and row-level authorization gaps
 ## Bypass Techniques
 - Content-type switching to traverse alternate validators
 - Parameter duplication and case variants exploiting DI precedence
 - Method confusion via proxies (`X-HTTP-Method-Override`)
 - Race windows around dependency-validated state transitions (issue token then mutate with parallel requests)
 ## Testing Methodology
 1. **Enumerate** - Fetch OpenAPI, diff with 404-fuzzing for hidden endpoints
 2. **Matrix testing** - Test each route across: unauth/user/admin × HTTP/WebSocket × JSON/form/multipart
 3. **Dependency analysis** - Map which dependencies enforce auth vs parse input
 4. **Cross-environment** - Compare dev/stage/prod for middleware and docs exposure differences
 5. **Channel consistency** - Verify same authorization on HTTP and WebSocket for equivalent operations
 ## Validation Requirements
 - Side-by-side requests showing unauthorized access (owner vs non-owner, cross-tenant)
 - Cross-channel proof (HTTP and WebSocket for same rule)
 - Header/proxy manipulation showing altered outcomes (Host/XFF/CORS)
 - Minimal payloads for template injection, SSRF, token misuse with safe/OAST oracles
 - Document exact dependency paths (router-level, route-level) that missed enforcement
--- a/strix/skills/frameworks/nestjs.md
+++ b/strix/skills/frameworks/nestjs.md
@@ -0,0 +1,225 @@
 ---
 name: nestjs
 description: Security testing playbook for NestJS applications covering guards, pipes, decorators, module boundaries, and multi-transport auth
 ---
 # NestJS
 Security testing for NestJS applications. Focus on guard gaps across decorator stacks, validation pipe bypasses, module boundary leaks, and inconsistent auth enforcement across HTTP, WebSocket, and microservice transports.
 ## Attack Surface
 **Decorator Pipeline**
 - Guards: `@UseGuards`, `CanActivate`, execution context (HTTP/WS/RPC), `Reflector` metadata
 - Pipes: `ValidationPipe` (whitelist, transform, forbidNonWhitelisted), `ParseIntPipe`, custom pipes
 - Interceptors: response mapping, caching, logging, timeout — can modify request/response flow
 - Filters: exception filters that may leak information
 - Metadata: `@SetMetadata`, `@Public()`, `@Roles()`, `@Permissions()`
 **Module System**
 - `@Module` boundaries, provider scoping (DEFAULT/REQUEST/TRANSIENT)
 - Dynamic modules: `forRoot`/`forRootAsync`, global modules
 - DI container: provider overrides, custom providers
 **Controllers & Transports**
 - REST: `@Controller`, versioning (URI/Header/MediaType)
 - GraphQL: `@Resolver`, playground/sandbox exposure
 - WebSocket: `@WebSocketGateway`, gateway guards, room authorization
 - Microservices: TCP, Redis, NATS, MQTT, gRPC, Kafka — often lack HTTP-level auth
 **Data Layer**
 - TypeORM: repositories, QueryBuilder, raw queries, relations
 - Prisma: `$queryRaw`, `$queryRawUnsafe`
 - Mongoose: operator injection, `$where`, `$regex`
 **Auth & Config**
 - `@nestjs/passport` strategies, `@nestjs/jwt`, session-based auth
 - `@nestjs/config`, ConfigService, `.env` files
 - `@nestjs/throttler`, rate limiting with `@SkipThrottle`
 **API Documentation**
 - `@nestjs/swagger`: OpenAPI exposure, DTO schemas, auth schemes
 ## High-Value Targets
 - Swagger/OpenAPI endpoints in production (`/api`, `/api-docs`, `/api-json`, `/swagger`)
 - Auth endpoints: login, register, token refresh, password reset, OAuth callbacks
 - Admin controllers decorated with `@Roles('admin')` — test with user-level tokens
 - File upload endpoints using `FileInterceptor`/`FilesInterceptor`
 - WebSocket gateways sharing business logic with HTTP controllers
 - Microservice handlers (`@MessagePattern`, `@EventPattern`) — often unguarded
 - CRUD generators (`@nestjsx/crud`) with auto-generated endpoints
 - Background jobs and scheduled tasks (`@nestjs/schedule`)
 - Health/metrics endpoints (`@nestjs/terminus`, `/health`, `/metrics`)
 - GraphQL playground/sandbox in production (`/graphql`)
 ## Reconnaissance
 **Swagger Discovery**
 ```
 GET /api
 GET /api-docs
 GET /api-json
 GET /swagger
 GET /docs
 GET /v1/api-docs
 GET /api/v2/docs
 ```
 Extract: paths, parameter schemas, DTOs, auth schemes, example values. Swagger may reveal internal endpoints, deprecated routes, and admin-only paths not visible in the UI.
 **Guard Mapping**
 For each controller and method, identify:
 - Global guards (applied in `main.ts` or app module)
 - Controller-level guards (`@UseGuards` on the class)
 - Method-level guards (`@UseGuards` on individual handlers)
 - `@Public()` or `@SkipThrottle()` decorators that bypass protection
 ## Key Vulnerabilities
 ### Guard Bypass
 **Decorator Stack Gaps**
 - Guards execute: global → controller → method. A method missing `@UseGuards` when siblings have it is the #1 finding.
 - `@Public()` metadata causing global `AuthGuard` to skip enforcement — check if applied too broadly.
 - New methods added to existing controllers without inheriting the expected guard.
 **ExecutionContext Switching**
 - Guards handling only HTTP context (`getRequest()`) may fail silently on WebSocket or RPC, returning `true` by default.
 - Test same business logic through alternate transports to find context-specific bypasses.
 **Reflector Mismatches**
 - Guard reads `SetMetadata('roles', [...])` but decorator sets `'role'` (singular) — guard sees no metadata, defaults to allow.
 - `applyDecorators()` compositions accidentally overriding stricter guards with permissive ones.
 ### Validation Pipe Exploits
 **Whitelist Bypass**
 - `whitelist: true` without `forbidNonWhitelisted: true`: extra properties silently stripped but may have been processed by earlier middleware/interceptors.
 - Missing `@Type(() => ChildDto)` on nested objects: `@ValidateNested()` without `@Type` means nested payload is never validated.
 - Array elements: `@IsArray()` doesn't validate elements without `@ValidateNested({ each: true })` and `@Type`.
 **Type Coercion**
 - `transform: true` enables implicit coercion: strings → numbers, `"true"` → `true`, `"null"` → `null`.
 - Exploit truthiness assumptions in business logic downstream.
 **Conditional Validation**
 - `@ValidateIf()` and validation groups creating paths where fields skip validation entirely.
 **Missing Parse Pipes**
 - `@Param('id')` without `ParseIntPipe`/`ParseUUIDPipe` — string values reach ORM queries directly.
 ### Auth & Passport
 **JWT Strategy**
 - Check `ignoreExpiration` is false, `algorithms` is pinned (no `none` or HS/RS confusion)
 - Weak `secretOrKey` values
 - Cross-service token reuse when audience/issuer not enforced
 **Passport Strategy Issues**
 - `validate()` return value becomes `req.user` — if it returns full DB record, sensitive fields leak downstream
 - Multiple strategies (JWT + session): one may bypass restrictions of the other
 - Custom guards returning `true` for unauthenticated as "optional auth"
 **Timing Attacks**
 - Plain string comparison instead of bcrypt/argon2 in local strategy
 ### Serialization Leaks
 **Missing ClassSerializerInterceptor**
 - If not applied globally, `@Exclude()` fields (passwords, internal IDs) returned in responses.
 - `@Expose()` with groups: admin-only fields exposed when groups not enforced per-request.
 **Circular Relations**
 - Eager-loaded TypeORM/Prisma relations exposing entire object graph without careful serialization.
 ### Interceptor Abuse
 **Cache Poisoning**
 - `CacheInterceptor` without user/tenant identity in cache key — responses from one user served to another.
 - Test: authenticated request, then unauthenticated request returning cached data.
 **Response Mapping**
 - Transformation interceptors may leak internal entity fields if mapping is incomplete.
 ### Module Boundary Leaks
 **Global Module Exposure**
 - `@Global()` modules expose all providers to every module without explicit imports.
 - Sensitive services (admin operations, internal APIs) accessible from untrusted modules.
 **Config Leaks**
 - `forRoot`/`forRootAsync` configuration secrets accessible via `ConfigService` injection in any module.
 **Scope Issues**
 - Request-scoped providers (`Scope.REQUEST`) incorrectly scoped as DEFAULT (singleton) — request context leaks across concurrent requests.
 ### WebSocket Gateway
 - HTTP guards don't automatically apply to WebSocket gateways — `@UseGuards` must be explicit.
 - Authentication deferred from `handleConnection` to message handlers allows unauthenticated message sending.
 - Room/namespace authorization: users joining rooms they shouldn't access.
 - `@SubscribeMessage()` handlers relying on connection-level auth instead of per-message validation.
 ### Microservice Transport
 - `@MessagePattern`/`@EventPattern` handlers often lack guards (considered "internal").
 - If transport (Redis, NATS, Kafka) is network-accessible, messages can be injected bypassing all HTTP security.
 - `ValidationPipe` may only be configured for HTTP — microservice payloads skip validation.
 ### ORM Injection
 **TypeORM**
 - `QueryBuilder` and `.query()` with template literal interpolation → SQL injection.
 - Relations: API allowing specification of which relations to load via query params.
 **Mongoose**
 - Query operator injection: `{ password: { $gt: "" } }` via unsanitized request body.
 - `$where` and `$regex` operators from user input.
 **Prisma**
 - `$queryRaw`/`$executeRaw` with string interpolation (but not tagged template).
 - `$queryRawUnsafe` usage.
 ### Rate Limiting
 - `@SkipThrottle()` on sensitive endpoints (login, password reset, OTP).
 - In-memory throttler storage: resets on restart, doesn't work across instances.
 - Behind proxy without `trust proxy`: all requests share same IP, or header spoofable.
 ### CRUD Generators
 - Auto-generated CRUD endpoints may not inherit manual guard configurations.
 - Bulk operations (`createMany`, `updateMany`) bypassing per-entity authorization.
 - Query parameter injection in CRUD libraries: `filter`, `sort`, `join`, `select` exposing unauthorized data.
 ## Bypass Techniques
 - `@Public()` / skip-metadata applied via composed decorators at method level causing global guards to skip via `Reflector` metadata checks
 - Route param pollution: `/users/123?id=456` — which `id` wins in guards vs handlers?
 - Version routing: v1 of endpoint may still be registered without the guard added to v2
 - `X-HTTP-Method-Override` or `_method` processed by Express before guards
 - Content-type switching: `application/x-www-form-urlencoded` instead of JSON to bypass JSON-specific validation
 - Exception filter differences: guard throwing results in generic error that leaks route existence info
 ## Testing Methodology
 1. **Enumerate** — Fetch Swagger/OpenAPI, map all controllers, resolvers, and gateways
 2. **Guard audit** — Map decorator stack per method: which guards, pipes, interceptors are applied at each level
 3. **Matrix testing** — Test each endpoint across: unauth/user/admin × HTTP/WS/microservice
 4. **Validation probing** — Send extra fields, wrong types, nested objects, arrays to find pipe gaps
 5. **Transport parity** — Same operation via HTTP, WebSocket, and microservice transport
 6. **Module boundaries** — Check if providers from one module are accessible without proper imports
 7. **Serialization check** — Compare raw entity fields with API response fields
 ## Validation Requirements
 - Guard bypass: request to guarded endpoint succeeding without auth, showing guard chain break point
 - Validation bypass: payload with extra/malformed fields affecting business logic
 - Cross-transport inconsistency: same action authorized via HTTP but exploitable via WebSocket/microservice
 - Module boundary leak: accessing provider or data across unauthorized module boundaries
 - Serialization leak: response containing excluded fields (passwords, internal metadata)
 - IDOR: side-by-side requests from different users showing unauthorized data access
 - ORM injection: raw query with user-controlled input returning unauthorized data, or error-based evidence of query structure
 - Cache poisoning: response from unauthenticated or different-user request matching a prior authenticated user's cached response
--- a/strix/skills/frameworks/nextjs.jinja
+++ b/strix/skills/frameworks/nextjs.jinja
@@ -1,152 +0,0 @@
 <nextjs_security_testing_guide>
 <title>NEXT.JS — ADVERSARIAL TESTING PLAYBOOK</title>
 <critical>Modern Next.js combines multiple execution contexts (Edge, Node, RSC, client) with smart caching (ISR/RSC fetch cache), middleware, and server actions. Authorization and cache boundaries must be enforced consistently across all paths or attackers will cross tenants, leak data, or invoke privileged actions.</critical>
 <surface_map>
 - Routers: App Router (`app/`) and Pages Router (`pages/`) coexist; test both
 - Runtimes: Node.js vs Edge (V8 isolates with restricted APIs)
 - Data paths: RSC (server components), Client components, Route Handlers (`app/api/**`), API routes (`pages/api/**`)
 - Middleware: `middleware.ts`/`_middleware.ts`
 - Rendering modes: SSR, SSG, ISR, on-demand revalidation, draft/preview mode
 - Images: `next/image` optimization and remote loader
 - Auth: NextAuth.js (callbacks, CSRF/state, callbackUrl), custom JWT/session bridges
 - Server Actions: streamed POST with `Next-Action` header and action IDs
 </surface_map>
 <methodology>
 1. Inventory routes (pages + app), static vs dynamic segments, and params. Map middleware coverage and runtime per path.
 2. Capture baseline for each role (unauth, user, admin) across SSR, API routes, Route Handlers, Server Actions, and streaming data.
 3. Diff responses while toggling runtime (Edge/Node), content-type, fetch cache directives, and preview/draft mode.
 4. Probe caching and revalidation boundaries (ISR, RSC fetch, CDN) for cross-user/tenant leaks.
 </methodology>
 <high_value_targets>
 - Middleware-protected routes (auth, geo, A/B)
 - Admin/staff paths, draft/preview content, on-demand revalidate endpoints
 - RSC payloads and flight data, streamed responses (server actions)
 - Image optimizer and custom loaders, remotePatterns/domains
 - NextAuth callbacks (`/api/auth/callback/*`), sign-in providers, CSRF/state handling
 - Edge-only features (bot protection, IP gates) and their Node equivalents
 </high_value_targets>
 <advanced_techniques>
 <route_enumeration>
 - __BUILD_MANIFEST.sortedPages: Execute `console.log(__BUILD_MANIFEST.sortedPages.join('\n'))` in browser console to instantly reveal all registered routes (Pages Router and static App Router paths compiled at build time)
 - __NEXT_DATA__: Inspect `<script id="__NEXT_DATA__">` for serverside props, pageProps, buildId, and dynamic route params on current page; reveals data flow and prop structure
 - Source maps exposure: Check `/_next/static/` for exposed .map files revealing full route structure, server action IDs, API endpoints, and internal function names
 - Client bundle mining: Search main-*.js and page chunks for route definitions; grep for 'pathname:', 'href:', '__next_route__', 'serverActions', and API endpoint strings
 - Static chunk enumeration: Probe `/_next/static/chunks/pages/` and `/_next/static/chunks/app/` for build artifacts; filenames map directly to routes (e.g., `admin.js` → `/admin`)
 - Build manifest fetch: GET `/_next/static/<buildId>/_buildManifest.js` and `/_next/static/<buildId>/_ssgManifest.js` for complete route and static generation metadata
 - Sitemap/robots leakage: Check `/sitemap.xml`, `/robots.txt`, and `/sitemap-*.xml` for unintended exposure of admin/internal/preview paths
 - Server action discovery: Inspect Network tab for POST requests with `Next-Action` header; extract action IDs from response streams and client hydration data
 - Environment variable leakage: Execute `Object.keys(process.env).filter(k => k.startsWith('NEXT_PUBLIC_'))` in console to list public env vars; grep bundles for 'API_KEY', 'SECRET', 'TOKEN', 'PASSWORD' to find accidentally leaked credentials
 </route_enumeration>
 <middleware_bypass>
 - Test for CVE-class middleware bypass via `x-middleware-subrequest` crafting and `x-nextjs-data` probing. Look for 307 + `x-middleware-rewrite`/`x-nextjs-redirect` headers and attempt bypass on protected routes.
 - Attempt direct route access on Node vs Edge runtimes; confirm protection parity.
 </middleware_bypass>
 <server_actions>
 - Capture streamed POSTs containing `Next-Action` headers. Map hashed action IDs via source maps or specialized tooling to discover hidden actions.
 - Invoke actions out of UI flow and with alternate content-types; verify server-side authorization is enforced per action and not assumed from client state.
 - Try cross-tenant/object references within action payloads to expose BOLA/IDOR via server actions.
 </server_actions>
 <rsc_and_cache>
 - RSC fetch cache: probe `fetch` cache modes (force-cache, default, no-store) and revalidate tags/paths. Look for user-bound data cached without identity keys (ETag/Set-Cookie unaware).
 - Confirm that personalized data is rendered via `no-store` or properly keyed; attempt cross-user content via shared caches/CDN.
 - Inspect Flight data streams for serialized sensitive fields leaking through props.
 </rsc_and_cache>
 <isr_and_revalidation>
 - Identify ISR pages (stale-while-revalidate). Check if responses may include user-bound fragments or tenant-dependent content.
 - On-demand revalidation endpoints: look for weak secrets in URLs, referer-disclosed tokens, or unvalidated hosts triggering `revalidatePath`/`revalidateTag`.
 - Attempt header-smuggling or method variations to trigger revalidation flows.
 </isr_and_revalidation>
 <draft_preview_mode>
 - Draft/preview mode toggles via secret URLs/cookies; search for preview enable endpoints and secrets in client bundles/env leaks.
 - Try setting preview cookies from subdomains, alternate paths, or through open redirects; observe content differences and persistence.
 </draft_preview_mode>
 <next_image_ssrf>
 - Review `images.domains`/`remotePatterns` in `next.config.js`; test SSRF to internal hosts (IPv4/IPv6 variants, DNS rebinding) if patterns are broad.
 - Custom loader functions may fetch with arbitrary URLs; test protocol smuggling and redirection chains.
 - Attempt cache poisoning: craft same URL with different normalization to affect other users.
 </next_image_ssrf>
 <nextauth_pitfalls>
 - State/nonce/PKCE: validate per-provider correctness; attempt missing/relaxed checks leading to login CSRF or token mix-up.
 - Callback URL restrictions: open redirect in `callbackUrl` or mis-scoped allowed hosts; hijack sessions by forcing callbacks.
 - JWT/session bridges: audience/issuer not enforced across API routes/Route Handlers; attempt cross-service token reuse.
 </nextauth_pitfalls>
 <edge_runtime_diffs>
 - Edge runtime lacks certain Node APIs; defenses relying on Node-only modules may be skipped. Compare behavior of the same route in Edge vs Node.
 - Header trust and IP determination can differ at the edge; test auth decisions tied to `x-forwarded-*` variance.
 </edge_runtime_diffs>
 <client_and_dom>
 - Identify `dangerouslySetInnerHTML`, Markdown renderers, and user-controlled href/src attributes. Validate CSP/Trusted Types coverage for SSR/CSR/hydration.
 - Attack hydration boundaries: server vs client render mismatches can enable gadget-based XSS.
 </client_and_dom>
 <data_fetching_over_exposure>
 - getServerSideProps/getStaticProps leakage: Execute `JSON.parse(document.getElementById('__NEXT_DATA__').textContent).props.pageProps` in console to inspect all server-fetched data; look for sensitive fields (emails, tokens, internal IDs, full user objects) passed to client but not rendered in UI
 - Over-fetched database queries: Check if pageProps include entire user records, relations, or admin-only fields when only username is displayed; common when using ORM select-all patterns
 - API response pass-through: Verify if API responses are sanitized before passing to props; developers often forward entire responses including metadata, cursors, or debug info
 - Environment-dependent data: Test if staging/dev accidentally exposes more fields in props than production due to inconsistent serialization logic
 - Nested object inspection: Drill into nested props objects; look for `_metadata`, `_internal`, `__typename` (GraphQL), or framework-added fields containing sensitive context
 </data_fetching_over_exposure>
 </advanced_techniques>
 <bypass_techniques>
 - Content-type switching: `application/json` ↔ `multipart/form-data` ↔ `application/x-www-form-urlencoded` to traverse alternate code paths.
 - Method override/tunneling: `_method`, `X-HTTP-Method-Override`, GET on endpoints unexpectedly accepting writes.
 - Case/param aliasing and query duplication affecting middleware vs handler parsing.
 - Cache key confusion at CDN/proxy (lack of Vary on auth cookies/headers) to leak personalized SSR/ISR content.
 - API route path normalization: Test `/api/users` vs `/api/users/` vs `/api//users` vs `/api/./users`; middleware may normalize differently than route handlers, allowing protection bypass. Try double slashes, trailing slashes, and dot segments.
 - Parameter pollution: Send duplicate query params (`?id=1&id=2`) or array notation (`?filter[]=a&filter[]=b`) to exploit parsing differences between middleware (which may check first value) and handler (which may use last or array).
 </bypass_techniques>
 <special_contexts>
 <uploads_and_files>
 - API routes and Route Handlers handling file uploads: check MIME sniffing, Content-Disposition, stored path traversal, and public serving of user files.
 - Validate signing/scoping of any generated file URLs (short TTL, audience-bound).
 </uploads_and_files>
 <integrations_and_webhooks>
 - Webhooks that trigger revalidation/imports: require HMAC verification; test with replay and cross-tenant object IDs.
 - Analytics/AB testing flags controlled via cookies/headers; ensure they do not unlock privileged server paths.
 </integrations_and_webhooks>
 </special_contexts>
 <validation>
 1. Provide side-by-side requests for different principals showing cross-user/tenant content or actions.
 2. Prove cache boundary failure (RSC/ISR/CDN) with response diffs or ETag collisions.
 3. Demonstrate server action invocation outside UI with insufficient authorization checks.
 4. Show middleware bypass (where applicable) with explicit headers and resulting protected content.
 5. Include runtime parity checks (Edge vs Node) proving inconsistent enforcement.
 6. For route enumeration: verify discovered routes return 200/403 (deployed) not 404 (build artifacts); test with authenticated vs unauthenticated requests.
 7. For leaked credentials: test API keys with minimal read-only calls; filter placeholders (YOUR_API_KEY, demo-token); confirm keys match provider patterns (sk_live_*, pk_prod_*).
 8. For __NEXT_DATA__ over-exposure: test cross-user (User A's props should not contain User B's PII); verify exposed fields are not in DOM; validate token validity with API calls.
 9. For path normalization bypasses: show differential responses (403 vs 200 for path variants); redirects (307/308) don't count—only direct access bypasses matter.
 </validation>
 <pro_tips>
 1. Enumerate with both App and Pages routers: many apps ship a hybrid surface.
 2. Treat caching as an identity boundary—test with cookies stripped, altered, and with Vary/ETag diffs.
 3. Decode client bundles for preview/revalidate secrets, action IDs, and hidden routes.
 4. Use streaming-aware tooling to capture server actions and RSC payloads; diff flight data.
 5. For NextAuth, fuzz provider params (state, nonce, scope, callbackUrl) and verify strictness.
 6. Always retest under Edge and Node; misconfigurations often exist in only one runtime.
 7. Probe `next/image` aggressively but safely—test IPv6/obscure encodings and redirect behavior.
 8. Validate negative paths: other-user IDs, other-tenant headers/subdomains, lower roles.
 9. Focus on export/report/download endpoints; they often bypass resolver-level checks.
 10. Document minimal, reproducible PoCs; avoid noisy payloads—prefer precise diffs.
 </pro_tips>
 <remember>Next.js security breaks where identity, authorization, and caching diverge across routers, runtimes, and data paths. Bind subject, action, and object on every path, and key caches to identity and tenant explicitly.</remember>
 </nextjs_security_testing_guide>
--- a/strix/skills/frameworks/nextjs.md
+++ b/strix/skills/frameworks/nextjs.md
@@ -0,0 +1,228 @@
 ---
 name: nextjs
 description: Security testing playbook for Next.js covering App Router, Server Actions, RSC, and Edge runtime vulnerabilities
 ---
 # Next.js
 Security testing for Next.js applications. Focus on authorization drift across runtimes (Edge/Node), caching boundaries, server actions, and middleware bypass.
 ## Attack Surface
 **Routers**
 - App Router (`app/`) and Pages Router (`pages/`) often coexist
 - Route Handlers (`app/api/**`) and API routes (`pages/api/**`)
 - Middleware: `middleware.ts` at project root
 **Runtimes**
 - Node.js (full API access)
 - Edge (V8 isolates, restricted APIs)
 **Rendering & Caching**
 - SSR, SSG, ISR, on-demand revalidation
 - RSC (React Server Components) with fetch cache
 - Draft/preview mode
 **Data Paths**
 - Server Components, Client Components
 - Server Actions (streamed POST with `Next-Action` header)
 - `getServerSideProps`, `getStaticProps`
 **Integrations**
 - NextAuth.js (callbacks, CSRF, callbackUrl)
 - `next/image` optimization and remote loaders
 ## High-Value Targets
 - Middleware-protected routes (auth, geo, A/B)
 - Admin/staff paths, draft/preview content, on-demand revalidate endpoints
 - RSC payloads and flight data, streamed responses
 - Image optimizer and custom loaders, remotePatterns/domains
 - NextAuth callbacks (`/api/auth/callback/*`), sign-in providers
 - Edge-only features (bot protection, IP gates) and their Node equivalents
 ## Reconnaissance
 **Route Discovery**
 ```javascript
 // Browser console - list all routes
 console.log(__BUILD_MANIFEST.sortedPages.join('\n'))
 // Inspect server-fetched data
 JSON.parse(document.getElementById('__NEXT_DATA__').textContent).props.pageProps
 // List public environment variables
 Object.keys(process.env).filter(k => k.startsWith('NEXT_PUBLIC_'))
 ```
 **Build Artifacts**
 ```
 GET /_next/static/<buildId>/_buildManifest.js
 GET /_next/static/<buildId>/_ssgManifest.js
 GET /_next/static/chunks/pages/
 GET /_next/static/chunks/app/
 ```
 Chunk filenames map to routes (e.g., `admin.js` → `/admin`).
 **Source Maps**
 Check `/_next/static/` for exposed `.map` files revealing route structure, server action IDs, and internal functions.
 **Client Bundle Mining**
 Search main-*.js for: `pathname:`, `href:`, `__next_route__`, `serverActions`, API endpoints. Grep for `API_KEY`, `SECRET`, `TOKEN`, `PASSWORD` to find accidentally leaked credentials.
 **Server Action Discovery**
 Inspect Network tab for POST requests with `Next-Action` header. Extract action IDs from response streams and hydration data.
 **Additional Leakage**
 - `/sitemap.xml`, `/robots.txt`, `/sitemap-*.xml` for unintended admin/internal/preview paths
 - Client bundles/env for secret paths and preview/admin flags (many teams hide routes via UI only)
 ## Key Vulnerabilities
 ### Middleware Bypass
 **Known Techniques**
 - `x-middleware-subrequest` header crafting (CVE-class bypass)
 - `x-nextjs-data` probing
 - Look for 307 + `x-middleware-rewrite`/`x-nextjs-redirect` headers
 **Path Normalization**
 ```
 /api/users
 /api/users/
 /api//users
 /api/./users
 ```
 Middleware may normalize differently than route handlers. Test double slashes, trailing slashes, dot segments.
 **Parameter Pollution**
 ```
 ?id=1&id=2
 ?filter[]=a&filter[]=b
 ```
 Middleware checks first value, handler uses last or array.
 ### Server Actions
 - Invoke actions outside UI flow with alternate content-types
 - Authorization assumed from client state rather than enforced server-side
 - IDOR via object references in action payloads
 - Map action IDs from source maps to discover hidden actions
 ### RSC & Caching
 **Cache Boundary Failures**
 - User-bound data cached without identity keys (ETag/Set-Cookie unaware)
 - Personalized content served from shared cache/CDN
 - Missing `no-store` on sensitive fetches
 **Flight Data Leakage**
 Inspect streamed RSC payloads for serialized sensitive fields in props.
 **ISR Issues**
 - Stale-while-revalidate responses containing user-specific or tenant-dependent data
 - Weak secrets in on-demand revalidation endpoint URLs
 - Referer-disclosed tokens or unvalidated hosts triggering `revalidatePath`/`revalidateTag`
 - Header-smuggling or method variations to trigger revalidation
 ### Authentication
 **NextAuth Pitfalls**
 - Missing/relaxed state/nonce/PKCE per provider (login CSRF, token mix-up)
 - Open redirect in `callbackUrl` or mis-scoped allowed hosts
 - JWT audience/issuer not enforced across routes
 - Cross-service token reuse
 - Session hijacking by forcing callbacks
 **Session Boundaries**
 - Different auth enforcement between App Router and Pages Router
 - API routes vs Route Handlers authorization inconsistency
 ### Data Exposure
 **__NEXT_DATA__ Over-fetching**
 Server-fetched data passed to client but not rendered:
 - Full user objects when only username needed
 - Internal IDs, tokens, admin-only fields
 - ORM select-all patterns exposing entire records
 - API responses forwarded without sanitization (metadata, cursors, debug info)
 **Environment-Dependent Exposure**
 - Staging/dev accidentally exposes more fields than production
 - Inconsistent serialization logic across environments
 **Props Inspection**
 ```javascript
 // Check for sensitive data in page props
 JSON.parse(document.getElementById('__NEXT_DATA__').textContent).props
 ```
 Look for `_metadata`, `_internal`, `__typename` (GraphQL), nested sensitive objects.
 ### Image Optimizer SSRF
 **Remote Patterns**
 - Broad `images.domains`/`remotePatterns` in `next.config.js`
 - Test: internal hosts, IPv4/IPv6 variants, DNS rebinding
 **Custom Loaders**
 - Protocol smuggling via redirect chains
 - Cache poisoning via URL normalization differences affecting other users
 ### Runtime Divergence
 **Edge vs Node**
 - Defenses relying on Node-only modules skipped on Edge
 - Header trust differs (`x-forwarded-*` handling)
 - Same route may behave differently across runtimes
 ### Client-Side
 **XSS Vectors**
 - `dangerouslySetInnerHTML`
 - Markdown renderers
 - User-controlled href/src attributes
 - Validate CSP/Trusted Types coverage for SSR/CSR/hydration
 **Hydration Mismatches**
 Server vs client render differences can enable gadget-based XSS.
 ### Draft/Preview Mode
 - Secret URLs/cookies enabling preview
 - Preview secrets leaked in client bundles/env
 - Setting preview cookies from subdomains or via open redirects
 ## Bypass Techniques
 - Content-type switching: `application/json` ↔ `multipart/form-data` ↔ `application/x-www-form-urlencoded`
 - Method override: `_method`, `X-HTTP-Method-Override`, GET on endpoints accepting writes
 - Case/param aliasing and query duplication affecting middleware vs handler parsing
 - Cache key confusion at CDN/proxy (lack of Vary on auth cookies/headers)
 ## Testing Methodology
 1. **Enumerate** - Use `__BUILD_MANIFEST`, source maps, build artifacts, sitemap/robots to map all routes
 2. **Runtime matrix** - Test each route under Edge and Node runtimes
 3. **Role matrix** - Test as unauth/user/admin across SSR, API routes, Route Handlers, Server Actions
 4. **Cache probing** - Verify caching respects identity (strip cookies, alter Vary headers, check ETags)
 5. **Middleware validation** - Test path variants and header manipulation for bypass
 6. **Cross-router** - Compare authorization between App Router and Pages Router paths
 ## Validation Requirements
 - Side-by-side requests showing cross-user/tenant access
 - Cache boundary failure proof (response diffs, ETag collisions)
 - Server action invocation outside UI with insufficient auth
 - Middleware bypass with explicit headers showing protected content access
 - Runtime parity checks (Edge vs Node inconsistent enforcement)
 - Discovered routes verified as deployed (200/403) not just build artifacts (404)
 - Leaked credentials tested with minimal read-only calls; filter placeholders
 - `__NEXT_DATA__` exposure: verify cross-user (User A's props shouldn't contain User B's PII), confirm exposed fields not in DOM
 - Path normalization bypasses: show differential responses (403 vs 200), redirects don't count
--- a/strix/skills/protocols/graphql.jinja
+++ b/strix/skills/protocols/graphql.jinja
@@ -1,215 +0,0 @@
 <graphql_protocol_guide>
 <title>GRAPHQL — ADVANCED TESTING AND EXPLOITATION</title>
 <critical>GraphQL’s flexibility enables powerful data access, but also unique failures: field- and edge-level authorization drift, schema exposure (even with introspection off), alias/batch abuse, resolver injection, federated trust gaps, and complexity/fragment bombs. Bind subject→action→object at resolver boundaries and validate across every transport and feature flag.</critical>
 <scope>
 - Queries, mutations, subscriptions (graphql-ws, graphql-transport-ws)
 - Persisted queries/Automatic Persisted Queries (APQ)
 - Federation (Apollo/GraphQL Mesh): _service SDL and _entities
 - File uploads (GraphQL multipart request spec)
 - Relay conventions: global node IDs, connections/cursors
 </scope>
 <methodology>
 1. Fingerprint endpoint(s), transport(s), and stack (framework, plugins, gateway). Note GraphiQL/Playground exposure and CORS/credentials.
 2. Obtain multiple principals (unauth, basic, premium, admin/staff) and capture at least one valid object ID per subject.
 3. Acquire schema via introspection; if disabled, infer iteratively from errors, field suggestions, __typename probes, vocabulary brute-force.
 4. Build an Actor × Operation × Type/Field matrix. Exercise each resolver path with swapped IDs, roles, tenants, and channels (REST proxies, GraphQL HTTP, WS).
 5. Validate consistency: same authorization and validation across queries, mutations, subscriptions, batch/alias, persisted queries, and federation.
 </methodology>
 <discovery_techniques>
 <endpoint_finding>
 - Common paths: /graphql, /api/graphql, /v1/graphql, /gql
 - Probe with minimal canary:
 {% raw %}
 POST /graphql {"query":"{__typename}"}
 GET  /graphql?query={__typename}
 {% endraw %}
 - Detect GraphiQL/Playground; note if accessible cross-origin and with credentials.
 </endpoint_finding>
 <introspection_and_inference>
 - If enabled, dump full schema; otherwise:
  - Use __typename on candidate fields to confirm types
  - Abuse field suggestions and error shapes to enumerate names/args
  - Infer enums from “expected one of” errors; coerce types by providing wrong shapes
  - Reconstruct edges from pagination and connection hints (pageInfo, edges/node)
 </introspection_and_inference>
 <schema_construction>
 - Map root operations, object types, interfaces/unions, directives (@auth, @defer, @stream), and custom scalars (Upload, JSON, DateTime)
 - Identify sensitive fields: email, tokens, roles, billing, file keys, admin flags
 - Note cascade paths where child resolvers may skip auth under parent assumptions
 </schema_construction>
 </discovery_techniques>
 <exploitation_techniques>
 <authorization_and_idor>
 - Test field-level and edge-level checks, not just top-level gates. Pair owned vs foreign IDs within the same request via aliases to diff responses.
 {% raw %}
 query {
  me { id }
  a: order(id:"A_OWNER") { id total owner { id email } }
  b: order(id:"B_FOREIGN") { id total owner { id email } }
 }
 {% endraw %}
 - Probe mutations for partial updates that bypass validation (JSON Merge Patch semantics in inputs).
 - Validate node/global ID resolvers (Relay) bind to the caller; decode/replace base64 IDs and compare access.
 </authorization_and_idor>
 <batching_and_alias>
 - Alias to perform many logically separate reads in one operation; watch for per-request vs per-field auth discrepancies
 - If array batching is supported (non-standard), submit multiple operations to bypass rate limits and achieve partial failures
 {% raw %}
 query {
  u1:user(id:"1"){email}
  u2:user(id:"2"){email}
  u3:user(id:"3"){email}
 }
 {% endraw %}
 </batching_and_alias>
 <variable_and_shape_abuse>
 - Scalars vs objects vs arrays: {% raw %}{id:123}{% endraw} vs {% raw %}{id:"123"}{% endraw} vs {% raw %}{id:[123]}{% endraw}; send null/empty/0/-1 and extra object keys retained by backend
 - Duplicate keys in JSON variables: {% raw %}{"id":1,"id":2}{% endraw} (parser precedence), default argument values, coercion errors leaking field names
 </variable_and_shape_abuse>
 <cursor_and_projection>
 - Decode cursors (often base64) to manipulate offsets/IDs and skip filters
 - Abuse selection sets and fragments to force overfetching of sensitive subfields
 </cursor_and_projection>
 <file_uploads>
 - GraphQL multipart: test multiple Upload scalars, filename/path tricks, unexpected content-types, oversize chunks; verify server-side ownership/scoping for returned URLs
 </file_uploads>
 </exploitation_techniques>
 <advanced_techniques>
 <introspection_bypass>
 - Field suggestion leakage: submit near-miss names to harvest suggestions
 - Error taxonomy: different codes/messages for unknown field vs unauthorized field reveal existence
 - __typename sprinkling on edges to confirm types without schema
 </introspection_bypass>
 <defer_and_stream>
 - Use @defer and @stream to obtain partial results or subtrees hidden by parent checks; confirm server supports incremental delivery
 {% raw %}
 query @defer {
  me { id }
  ... @defer { adminPanel { secrets } }
 }
 {% endraw %}
 </defer_and_stream>
 <fragment_and_complexity_bombs>
 - Recursive fragment spreads and wide selection sets cause CPU/memory spikes; craft minimal reproducible bombs to validate cost limits
 {% raw %}
 fragment x on User { friends { ...x } }
 query { me { ...x } }
 {% endraw %}
 - Validate depth/complexity limiting, query cost analyzers, and timeouts
 </fragment_and_complexity_bombs>
 <federation>
 - Apollo Federation: query _service { sdl } if exposed; target _entities to materialize foreign objects by key without proper auth in subgraphs
 {% raw %}
 query {
  _entities(representations:[
    {__typename:"User", id:"TARGET"}
  ]) { ... on User { email roles } }
 }
 {% endraw %}
 - Look for auth done at gateway but skipped in subgraph resolvers; cross-subgraph IDOR via inconsistent ownership checks
 </federation>
 <subscriptions>
 - Check message-level authorization, not only handshake; attempt to subscribe to channels for other users/tenants; test cross-tenant event leakage
 - Abuse filter args in subscription resolvers to reference foreign IDs
 </subscriptions>
 <persisted_queries>
 - APQ hashes can be guessed/bruteforced or leaked from clients; replay privileged operations by supplying known hashes with attacker variables
 - Validate that hash→operation mapping enforces principal and operation allowlists
 </persisted_queries>
 <csrf_and_cors>
 - If cookie-auth is used and GET is accepted, test CSRF on mutations via query parameters; verify SameSite and origin checks
 - Cross-origin GraphiQL/Playground exposure with credentials can leak data via postMessage bridges
 </csrf_and_cors>
 <waf_evasion>
 - Reshape queries: comments, block strings, Unicode escapes, alias/fragment indirection, JSON variables vs inline args, GET vs POST vs application/graphql
 - Split fields across fragments and inline spreads to avoid naive signatures
 </waf_evasion>
 </advanced_techniques>
 <bypass_techniques>
 <transport_and_parsers>
 - Toggle content-types: application/json, application/graphql, multipart/form-data; try GET with query and variables params
 - HTTP/2 multiplexing and connection reuse to widen timing windows and rate limits
 </transport_and_parsers>
 <naming_and_aliasing>
 - Case/underscore variations, Unicode homoglyphs (server-dependent), aliases masking sensitive field names
 </naming_and_aliasing>
 <gateway_and_cache>
 - CDN/key confusion: responses cached without considering Authorization or variables; manipulate Vary and Accept headers
 - Redirects and 304/206 behaviors leaking partially cached GraphQL responses
 </gateway_and_cache>
 </bypass_techniques>
 <special_contexts>
 <relay>
 - node(id:…) global resolution: decode base64, swap type/id pairs, ensure per-type authorization is enforced inside resolvers
 - Connections: verify that filters (owner/tenant) apply before pagination; cursor tampering should not cross ownership boundaries
 </relay>
 <server_plugins>
 - Custom directives (@auth, @private) and plugins often annotate intent but do not enforce; verify actual checks in each resolver path
 </server_plugins>
 </special_contexts>
 <chaining_attacks>
 - GraphQL + IDOR: enumerate IDs via list fields, then fetch or mutate foreign objects
 - GraphQL + CSRF: trigger mutations cross-origin when cookies/auth are accepted without proper checks
 - GraphQL + SSRF: resolvers that fetch URLs (webhooks, metadata) abused to reach internal services
 </chaining_attacks>
 <validation>
 1. Provide paired requests (owner vs non-owner) differing only in identifiers/roles that demonstrate unauthorized access or mutation.
 2. Prove resolver-level bypass: show top-level checks present but child field/edge exposes data.
 3. Demonstrate transport parity: reproduce via HTTP and WS (subscriptions) or via persisted queries.
 4. Minimize payloads; document exact selection sets and variable shapes used.
 </validation>
 <false_positives>
 - Introspection available only on non-production/stub endpoints
 - Public fields by design with documented scopes
 - Aggregations or counts without sensitive attributes
 - Properly enforced depth/complexity and per-resolver authorization across transports
 </false_positives>
 <impact>
 - Cross-account/tenant data exposure and unauthorized state changes
 - Bypass of federation boundaries enabling lateral access across services
 - Credential/session leakage via lax CORS/CSRF around GraphiQL/Playground
 </impact>
 <pro_tips>
 1. Always diff the same operation under multiple principals with aliases in one request.
 2. Sprinkle __typename to map types quickly when schema is hidden.
 3. Attack edges: child resolvers often skip auth compared to parents.
 4. Try @defer/@stream and subscriptions to slip gated data in incremental events.
 5. Decode cursors and node IDs; assume base64 unless proven otherwise.
 6. Federation: exercise _entities with crafted representations; subgraphs frequently trust gateway auth.
 7. Persisted queries: extract hashes from clients; replay with attacker variables.
 8. Keep payloads small and structured; restructure rather than enlarge to evade WAFs.
 9. Validate defenses by code/config review where possible; don’t trust directives alone.
 10. Prove impact with role-separated, transport-separated, minimal PoCs.
 </pro_tips>
 <remember>GraphQL security is resolver security. If any resolver on the path to a field fails to bind subject, object, and action, the graph leaks. Validate every path, every transport, every environment.</remember>
 </graphql_protocol_guide>
--- a/strix/skills/protocols/graphql.md
+++ b/strix/skills/protocols/graphql.md
@@ -0,0 +1,276 @@
 ---
 name: graphql
 description: GraphQL security testing covering introspection, resolver injection, batching attacks, and authorization bypass
 ---
 # GraphQL
 Security testing for GraphQL APIs. Focus on resolver-level authorization, field/edge access control, batching abuse, and federation trust boundaries.
 ## Attack Surface
 **Operations**
 - Queries, mutations, subscriptions
 - Persisted queries / Automatic Persisted Queries (APQ)
 **Transports**
 - HTTP POST/GET with `application/json` or `application/graphql`
 - WebSocket: graphql-ws, graphql-transport-ws protocols
 - Multipart for file uploads
 **Schema Features**
 - Introspection (`__schema`, `__type`)
 - Directives: `@defer`, `@stream`, custom auth directives (@auth, @private)
 - Custom scalars: Upload, JSON, DateTime
 - Relay: global node IDs, connections/cursors, interfaces/unions
 **Architecture**
 - Federation (Apollo, GraphQL Mesh): `_service`, `_entities`
 - Gateway vs subgraph authorization boundaries
 ## Reconnaissance
 **Endpoint Discovery**
 ```
 POST /graphql         {"query":"{__typename}"}
 POST /api/graphql     {"query":"{__typename}"}
 POST /v1/graphql      {"query":"{__typename}"}
 POST /gql             {"query":"{__typename}"}
 GET  /graphql?query={__typename}
 ```
 Check for GraphiQL/Playground exposure with credentials enabled (cross-origin with cookies can leak data via postMessage bridges).
 **Schema Acquisition**
 If introspection enabled:
 ```graphql
 {__schema{types{name fields{name args{name}}}}}
 ```
 If disabled, infer schema via:
 - `__typename` probes on candidate fields
 - Field suggestion errors (submit near-miss names to harvest suggestions)
 - "Expected one of" errors revealing enum values
 - Type coercion errors exposing field structure
 - Error taxonomy: different codes for "unknown field" vs "unauthorized field" reveal existence
 **Schema Mapping**
 Map: root operations, object types, interfaces/unions, directives, custom scalars. Identify sensitive fields: email, tokens, roles, billing, API keys, admin flags, file URLs. Note cascade paths where child resolvers may skip auth under parent assumptions.
 ## Key Vulnerabilities
 ### Authorization Bypass
 **Field-Level IDOR**
 Test with aliases comparing owned vs foreign objects in single request:
 ```graphql
 query {
  own: order(id:"OWNED_ID") { id total owner { email } }
  foreign: order(id:"FOREIGN_ID") { id total owner { email } }
 }
 ```
 **Edge/Child Resolver Gaps**
 Parent resolver checks auth, child resolver assumes it's already validated:
 ```graphql
 query {
  user(id:"FOREIGN") {
    id
    privateData { secrets }  # Child may skip auth check
  }
 }
 ```
 **Relay Node Resolution**
 Decode base64 global IDs, swap type/id pairs:
 ```graphql
 query {
  node(id:"VXNlcjoxMjM=") { ... on User { email } }
 }
 ```
 Ensure per-type authorization is enforced inside resolvers. Verify connection filters (owner/tenant) apply before pagination; cursor tampering should not cross ownership boundaries.
 **Mutation Bypass**
 - Probe mutations for partial updates bypassing validation (JSON Merge Patch semantics)
 - Test mutations that accept extra fields passed to downstream logic
 ### Batching & Alias Abuse
 **Enumeration via Aliases**
 ```graphql
 query {
  u1:user(id:"1"){email}
  u2:user(id:"2"){email}
  u3:user(id:"3"){email}
 }
 ```
 Bypasses per-request rate limits; exposes per-field vs per-request auth inconsistencies.
 **Array Batching**
 If supported (non-standard), submit multiple operations to achieve partial failures and bypass limits.
 ### Input Manipulation
 **Type Confusion**
 ```
 {id: 123}      vs {id: "123"}
 {id: [123]}    vs {id: null}
 {id: 0}        vs {id: -1}
 ```
 **Duplicate Keys**
 ```json
 {"id": 1, "id": 2}
 ```
 Parser precedence varies; may bypass validation. Also test default argument values.
 **Extra Fields**
 Send unexpected keys in input objects; backends may pass them to resolvers or downstream logic.
 ### Cursor Manipulation
 Decode cursors (usually base64) to:
 - Manipulate offsets/IDs
 - Skip filters
 - Cross ownership boundaries
 ### Directive Abuse
 **@defer/@stream**
 ```graphql
 query {
  me { id }
  ... @defer { adminPanel { secrets } }
 }
 ```
 May return gated data in incremental delivery. Confirm server supports incremental delivery.
 **Custom Directives**
@auth, @private and similar directives often annotate intent but do not enforce—verify actual checks in each resolver path.
 ### Complexity Attacks
 **Fragment Bombs**
 ```graphql
 fragment x on User { friends { ...x } }
 query { me { ...x } }
 ```
 Test depth/complexity limits, query cost analyzers, timeouts.
 **Wide Selection Sets**
 Abuse selection sets and fragments to force overfetching of sensitive subfields.
 ### Federation Exploitation
 **SDL Exposure**
 ```graphql
 query { _service { sdl } }
 ```
 **Entity Materialization**
 ```graphql
 query {
  _entities(representations:[
    {__typename:"User", id:"TARGET_ID"}
  ]) { ... on User { email roles } }
 }
 ```
 Gateway may enforce auth; subgraph resolvers may not. Look for cross-subgraph IDOR via inconsistent ownership checks.
 ### Subscription Security
 - Authorization at handshake only, not per-message
 - Subscribe to other users' channels via filter args
 - Cross-tenant event leakage
 - Abuse filter args in subscription resolvers to reference foreign IDs
 ### Persisted Query Abuse
 - APQ hashes leaked from client bundles
 - Replay privileged operations with attacker variables
 - Hash bruteforce for common operations
 - Validate hash→operation mapping enforces principal and operation allowlists
 ### CORS & CSRF
 - Cookie-auth with GET queries enables CSRF on mutations via query parameters
 - GraphiQL/Playground cross-origin with credentials leaks data
 - Missing SameSite and origin validation
 ### File Uploads
 GraphQL multipart spec:
 - Multiple Upload scalars
 - Filename/path traversal tricks
 - Unexpected content-types, oversize chunks
 - Server-side ownership/scoping for returned URLs
 ## WAF Evasion
 **Query Reshaping**
 - Comments and block strings (`"""..."""`)
 - Unicode escapes
 - Alias/fragment indirection
 - JSON variables vs inline args
 - GET vs POST vs `application/graphql`
 **Fragment Splitting**
 Split fields across fragments and inline spreads to avoid naive signatures:
 ```graphql
 fragment a on User { email }
 fragment b on User { password }
 query { me { ...a ...b } }
 ```
 ## Bypass Techniques
 **Transport Switching**
 ```
 Content-Type: application/json
 Content-Type: application/graphql
 Content-Type: multipart/form-data
 GET with query params
 ```
 **Timing & Rate Limits**
 - HTTP/2 multiplexing and connection reuse to widen timing windows
 - Batching to bypass rate limits
 **Naming Tricks**
 - Case/underscore variations
 - Unicode homoglyphs (server-dependent)
 - Aliases masking sensitive field names
 **Cache Confusion**
 - CDN caching without Vary on Authorization
 - Variable manipulation affecting cache keys
 - Redirects and 304/206 behaviors leaking partial responses
 ## Testing Methodology
 1. **Fingerprint** - Identify endpoints, transports, stack (Apollo, Hasura, etc.), GraphiQL exposure
 2. **Schema mapping** - Introspection or inference to build complete type graph
 3. **Principal matrix** - Collect tokens for unauth, user, premium, admin roles with at least one valid object ID per subject
 4. **Field sweep** - Test each resolver with owned vs foreign IDs via aliases in same request
 5. **Transport parity** - Verify same auth on HTTP, WebSocket, persisted queries
 6. **Federation probe** - Test `_service` and `_entities` for subgraph auth gaps
 7. **Edge cases** - Cursors, @defer/@stream, subscriptions, file uploads
 ## Validation Requirements
 - Paired requests (owner vs non-owner) showing unauthorized access
 - Resolver-level bypass: parent checks present, child field exposes data
 - Transport parity proof: HTTP and WebSocket for same operation
 - Federation bypass: `_entities` accessing data without subgraph auth
 - Minimal payloads with exact selection sets and variable shapes
 - Document exact resolver paths that missed enforcement
--- a/strix/skills/scan_modes/deep.jinja
+++ b/strix/skills/scan_modes/deep.jinja
@@ -1,145 +0,0 @@
 <scan_mode>
 DEEP SCAN MODE - Exhaustive Security Assessment
 This mode is for thorough security reviews where finding vulnerabilities is critical.
 PHASE 1: EXHAUSTIVE RECONNAISSANCE AND MAPPING
 Spend significant effort understanding the target before exploitation.
 For whitebox (source code available):
 - Map EVERY file, module, and code path in the repository
 - Trace all entry points from HTTP handlers to database queries
 - Identify all authentication mechanisms and their implementations
 - Map all authorization checks and understand the access control model
 - Identify all external service integrations and API calls
 - Analyze all configuration files for secrets and misconfigurations
 - Review all database schemas and understand data relationships
 - Map all background jobs, cron tasks, and async processing
 - Identify all serialization/deserialization points
 - Review all file handling operations (upload, download, processing)
 - Understand the deployment model and infrastructure assumptions
 - Check all dependency versions against known CVE databases
 For blackbox (no source code):
 - Exhaustive subdomain enumeration using multiple sources and tools
 - Full port scanning to identify all services
 - Complete content discovery with multiple wordlists
 - Technology fingerprinting on all discovered assets
 - API endpoint discovery through documentation, JavaScript analysis, and fuzzing
 - Identify all parameters including hidden and rarely-used ones
 - Map all user roles by testing with different account types
 - Understand rate limiting, WAF rules, and security controls in place
 - Document the complete application architecture as understood from outside
 EXECUTION STRATEGY - HIERARCHICAL AGENT SWARM:
 After Phase 1 (Recon & Mapping) is complete:
 1. Divide the application into major components/parts (e.g., Auth System, Payment Gateway, User Profile, Admin Panel)
 2. Spawn a specialized subagent for EACH major component
 3. Each component agent must then:
   - Further subdivide its scope into subparts (e.g., Login Form, Registration API, Password Reset)
   - Spawn sub-subagents for each distinct subpart
 4. At the lowest level (specific functionality), spawn specialized agents for EACH potential vulnerability type:
   - "Auth System" → "Login Form" → "SQLi Agent", "XSS Agent", "Auth Bypass Agent"
   - This creates a massive parallel swarm covering every angle
   - Do NOT overload a single agent with multiple vulnerability types
   - Scale horizontally to maximum capacity
 PHASE 2: DEEP BUSINESS LOGIC ANALYSIS
 Understand the application deeply enough to find logic flaws:
 - CREATE A FULL STORYBOARD of all user flows and state transitions
 - Document every step of the business logic in a structured flow diagram
 - Use the application extensively as every type of user to map the full lifecycle of data
 - Document all state machines and workflows (e.g. Order Created -> Paid -> Shipped)
 - Identify trust boundaries between components
 - Map all integrations with third-party services
 - Understand what invariants the application tries to maintain
 - Identify all points where roles, privileges, or sensitive data changes hands
 - Look for implicit assumptions in the business logic
 - Consider multi-step attacks that abuse normal functionality
 PHASE 3: COMPREHENSIVE ATTACK SURFACE TESTING
 Test EVERY input vector with EVERY applicable technique.
 Input Handling - Test all parameters, headers, cookies with:
 - Multiple injection payloads (SQL, NoSQL, LDAP, XPath, Command, Template)
 - Various encodings and bypass techniques (double encoding, unicode, null bytes)
 - Boundary conditions and type confusion
 - Large payloads and buffer-related issues
 Authentication and Session:
 - Exhaustive brute force protection testing
 - Session fixation, hijacking, and prediction attacks
 - JWT/token manipulation if applicable
 - OAuth flow abuse scenarios
 - Password reset flow vulnerabilities (token leakage, reuse, timing)
 - Multi-factor authentication bypass techniques
 - Account enumeration through all possible channels
 Access Control:
 - Test EVERY endpoint for horizontal and vertical access control
 - Parameter tampering on all object references
 - Forced browsing to all discovered resources
 - HTTP method tampering
 - Test access control after session changes (logout, role change)
 File Operations:
 - Exhaustive file upload bypass testing (extension, content-type, magic bytes)
 - Path traversal on all file parameters
 - Server-side request forgery through file inclusion
 - XXE through all XML parsing points
 Business Logic:
 - Race conditions on all state-changing operations
 - Workflow bypass attempts on every multi-step process
 - Price/quantity manipulation in all transactions
 - Parallel execution attacks
 - Time-of-check to time-of-use vulnerabilities
 Advanced Attacks:
 - HTTP request smuggling if multiple proxies/servers
 - Cache poisoning and cache deception
 - Subdomain takeover on all subdomains
 - Prototype pollution in JavaScript applications
 - CORS misconfiguration exploitation
 - WebSocket security testing
 - GraphQL specific attacks if applicable
 PHASE 4: VULNERABILITY CHAINING
 Don't just find individual bugs - chain them:
 - Combine information disclosure with access control bypass
 - Chain SSRF to access internal services
 - Use low-severity findings to enable high-impact attacks
 - Look for multi-step attack paths that automated tools miss
 - Consider attacks that span multiple application components
 CHAINING PRINCIPLES (MAX IMPACT):
 - Treat every finding as a pivot: ask "What does this unlock next?" until you reach maximum privilege / maximum data exposure / maximum control
 - Prefer end-to-end exploit paths over isolated bugs: initial foothold → pivot → privilege gain → sensitive action/data
 - Cross boundaries deliberately: user → admin, external → internal, unauthenticated → authenticated, read → write, single-tenant → cross-tenant
 - Validate chains by executing the full sequence using the available tools (proxy + browser for workflows, python for automation, terminal for supporting commands)
 - When a component agent finds a potential pivot, it must message/spawn the next focused agent to continue the chain in the next component/subpart
 PHASE 5: PERSISTENT TESTING
 If initial attempts fail, don't give up:
 - Research specific technologies for known bypasses
 - Try alternative exploitation techniques
 - Look for edge cases and unusual functionality
 - Test with different client contexts
 - Revisit previously tested areas with new information
 - Consider timing-based and blind exploitation techniques
 PHASE 6: THOROUGH REPORTING
 - Document EVERY confirmed vulnerability with full details
 - Include all severity levels - even low findings may enable chains
 - Provide complete reproduction steps and PoC
 - Document remediation recommendations
 - Note areas requiring additional review beyond current scope
 MINDSET:
 - Relentless - this is about finding what others miss
 - Creative - think of unconventional attack vectors
 - Patient - real vulnerabilities often require deep investigation
 - Thorough - test every parameter, every endpoint, every edge case
 - Persistent - if one approach fails, try ten more
 - Holistic - understand how components interact to find systemic issues
 </scan_mode>
--- a/strix/skills/scan_modes/deep.md
+++ b/strix/skills/scan_modes/deep.md
@@ -0,0 +1,163 @@
 ---
 name: deep
 description: Exhaustive security assessment with maximum coverage, depth, and vulnerability chaining
 ---
 # Deep Testing Mode
 Exhaustive security assessment. Maximum coverage, maximum depth. Finding what others miss is the goal.
 ## Approach
 Thorough understanding before exploitation. Test every parameter, every endpoint, every edge case. Chain findings for maximum impact.
 ## Phase 1: Exhaustive Reconnaissance
 **Whitebox (source available)**
 - Map every file, module, and code path in the repository
 - Load and maintain shared `wiki` notes from the start (`list_notes(category="wiki")` then `get_note(note_id=...)`), then continuously update one repo note
 - Start with broad source-aware triage (`semgrep`, `ast-grep`, `gitleaks`, `trufflehog`, `trivy fs`) and use outputs to drive deep review
 - Execute at least one structural AST pass (`sg` and/or Tree-sitter) per repository and store artifacts for reuse
 - Keep AST artifacts bounded and query-driven (target relevant paths/sinks first; avoid whole-repo generic function dumps)
 - Use syntax-aware parsing (Tree-sitter tooling) to improve symbol, route, and sink extraction quality
 - Trace all entry points from HTTP handlers to database queries
 - Document all authentication mechanisms and implementations
 - Map authorization checks and access control model
 - Identify all external service integrations and API calls
 - Analyze configuration for secrets and misconfigurations
 - Review database schemas and data relationships
 - Map background jobs, cron tasks, async processing
 - Identify all serialization/deserialization points
 - Review file handling: upload, download, processing
 - Understand the deployment model and infrastructure assumptions
 - Check all dependency versions and repository risks against CVE/misconfiguration data
 - Before final completion, update the shared repo wiki with scanner summary + dynamic follow-ups
 **Blackbox (no source)**
 - Exhaustive subdomain enumeration with multiple sources and tools
 - Full port scanning across all services
 - Complete content discovery with multiple wordlists
 - Technology fingerprinting on all assets
 - API discovery via docs, JavaScript analysis, fuzzing
 - Identify all parameters including hidden and rarely-used ones
 - Map all user roles with different account types
 - Document rate limiting, WAF rules, security controls
 - Document complete application architecture as understood from outside
 ## Phase 2: Business Logic Deep Dive
 Create a complete storyboard of the application:
 - **User flows** - document every step of every workflow
 - **State machines** - map all transitions (Created → Paid → Shipped → Delivered)
 - **Trust boundaries** - identify where privilege changes hands
 - **Invariants** - what rules should the application always enforce
 - **Implicit assumptions** - what does the code assume that might be violated
 - **Multi-step attack surfaces** - where can normal functionality be abused
 - **Third-party integrations** - map all external service dependencies
 Use the application extensively as every user type to understand the full data lifecycle.
 ## Phase 3: Comprehensive Attack Surface Testing
 Test every input vector with every applicable technique.
 **Input Handling**
 - Multiple injection types: SQL, NoSQL, LDAP, XPath, command, template
 - Encoding bypasses: double encoding, unicode, null bytes
 - Boundary conditions and type confusion
 - Large payloads and buffer-related issues
 **Authentication & Session**
 - Exhaustive brute force protection testing
 - Session fixation, hijacking, prediction
 - JWT/token manipulation
 - OAuth flow abuse scenarios
 - Password reset vulnerabilities: token leakage, reuse, timing
 - MFA bypass techniques
 - Account enumeration through all channels
 **Access Control**
 - Test every endpoint for horizontal and vertical access control
 - Parameter tampering on all object references
 - Forced browsing to all discovered resources
 - HTTP method tampering (GET vs POST vs PUT vs DELETE)
 - Access control after session state changes (logout, role change)
 **File Operations**
 - Exhaustive file upload bypass: extension, content-type, magic bytes
 - Path traversal on all file parameters
 - SSRF through file inclusion
 - XXE through all XML parsing points
 **Business Logic**
 - Race conditions on all state-changing operations
 - Workflow bypass on every multi-step process
 - Price/quantity manipulation in transactions
 - Parallel execution attacks
 - TOCTOU (time-of-check to time-of-use) vulnerabilities
 **Advanced Techniques**
 - HTTP request smuggling (multiple proxies/servers)
 - Cache poisoning and cache deception
 - Subdomain takeover
 - Prototype pollution (JavaScript applications)
 - CORS misconfiguration exploitation
 - WebSocket security testing
 - GraphQL-specific attacks (introspection, batching, nested queries)
 ## Phase 4: Vulnerability Chaining
 Individual bugs are starting points. Chain them for maximum impact:
 - Combine information disclosure with access control bypass
 - Chain SSRF to reach internal services
 - Use low-severity findings to enable high-impact attacks
 - Build multi-step attack paths that automated tools miss
 - Cross component boundaries: user → admin, external → internal, read → write, single-tenant → cross-tenant
 **Chaining Principles**
 - Treat every finding as a pivot point: ask "what does this unlock next?"
 - Continue until reaching maximum privilege / maximum data exposure / maximum control
 - Prefer end-to-end exploit paths over isolated bugs: initial foothold → pivot → privilege gain → sensitive action/data
 - Validate chains by executing the full sequence (proxy + browser for workflows, python for automation)
 - When a pivot is found, spawn focused agents to continue the chain in the next component
 ## Phase 5: Persistent Testing
 When initial attempts fail:
 - Research technology-specific bypasses
 - Try alternative exploitation techniques
 - Test edge cases and unusual functionality
 - Test with different client contexts
 - Revisit areas with new information from other findings
 - Consider timing-based and blind exploitation
 - Look for logic flaws that require deep application understanding
 ## Phase 6: Comprehensive Reporting
 - Document every confirmed vulnerability with full details
 - Include all severity levels—low findings may enable chains
 - Complete reproduction steps and working PoC
 - Remediation recommendations with specific guidance
 - Note areas requiring additional review beyond current scope
 ## Agent Strategy
 After reconnaissance, decompose the application hierarchically:
 1. **Component level** - Auth System, Payment Gateway, User Profile, Admin Panel
 2. **Feature level** - Login Form, Registration API, Password Reset
 3. **Vulnerability level** - SQLi Agent, XSS Agent, Auth Bypass Agent
 Spawn specialized agents at each level. Scale horizontally to maximum parallelization:
 - Do NOT overload a single agent with multiple vulnerability types
 - Each agent focuses on one specific area or vulnerability type
 - Creates a massive parallel swarm covering every angle
 ## Mindset
 Relentless. Creative. Patient. Thorough. Persistent.
 This is about finding what others miss. Test every parameter, every endpoint, every edge case. If one approach fails, try ten more. Understand how components interact to find systemic issues.
--- a/strix/skills/scan_modes/quick.jinja
+++ b/strix/skills/scan_modes/quick.jinja
@@ -1,63 +0,0 @@
 <scan_mode>
 QUICK SCAN MODE - Rapid Security Assessment
 This mode is optimized for fast feedback. Focus on HIGH-IMPACT vulnerabilities with minimal overhead.
 PHASE 1: RAPID ORIENTATION
 - If source code is available: Focus primarily on RECENT CHANGES (git diff, new commits, modified files)
 - Identify the most critical entry points: authentication endpoints, payment flows, admin interfaces, API endpoints handling sensitive data
 - Quickly understand the tech stack and frameworks in use
 - Skip exhaustive reconnaissance - use what's immediately visible
 PHASE 2: TARGETED ATTACK SURFACE
 For whitebox (source code available):
 - Prioritize files changed in recent commits/PRs - these are most likely to contain fresh bugs
 - Look for security-sensitive patterns in diffs: auth checks, input handling, database queries, file operations
 - Trace user-controllable input in changed code paths
 - Check if security controls were modified or bypassed
 For blackbox (no source code):
 - Focus on authentication and session management
 - Test the most critical user flows only
 - Check for obvious misconfigurations and exposed endpoints
 - Skip deep content discovery - test what's immediately accessible
 PHASE 3: HIGH-IMPACT VULNERABILITY FOCUS
 Prioritize in this order:
 1. Authentication bypass and broken access control
 2. Remote code execution vectors
 3. SQL injection in critical endpoints
 4. Insecure direct object references (IDOR) in sensitive resources
 5. Server-side request forgery (SSRF)
 6. Hardcoded credentials or secrets in code
 Skip lower-priority items:
 - Extensive subdomain enumeration
 - Full directory bruteforcing
 - Information disclosure that doesn't lead to exploitation
 - Theoretical vulnerabilities without PoC
 PHASE 4: VALIDATION AND REPORTING
 - Validate only critical/high severity findings with minimal PoC
 - Report findings as you discover them - don't wait for completion
 - Focus on exploitability and business impact
 QUICK CHAINING RULE:
 - If you find ANY strong primitive (auth weakness, access control gap, injection point, internal reachability), immediately attempt a single high-impact pivot to demonstrate real impact
 - Do not stop at a low-context “maybe”; turn it into a concrete exploit sequence (even if short) that reaches privileged action or sensitive data
 OPERATIONAL GUIDELINES:
 - Use the browser tool for quick manual testing of critical flows
 - Use terminal for targeted scans with fast presets (e.g., nuclei with critical/high templates only)
 - Use proxy to inspect traffic on key endpoints
 - Skip extensive fuzzing - use targeted payloads only
 - Create subagents only for parallel high-priority tasks
 - If whitebox: file_edit tool to review specific suspicious code sections
 - Use notes tool to track critical findings only
 MINDSET:
 - Think like a time-boxed bug bounty hunter going for quick wins
 - Prioritize breadth over depth on critical areas
 - If something looks exploitable, validate quickly and move on
 - Don't get stuck - if an attack vector isn't yielding results quickly, pivot
 </scan_mode>
--- a/strix/skills/scan_modes/quick.md
+++ b/strix/skills/scan_modes/quick.md
@@ -0,0 +1,70 @@
 ---
 name: quick
 description: Time-boxed rapid assessment targeting high-impact vulnerabilities
 ---
 # Quick Testing Mode
 Time-boxed assessment focused on high-impact vulnerabilities. Prioritize breadth over depth.
 ## Approach
 Optimize for fast feedback on critical security issues. Skip exhaustive enumeration in favor of targeted testing on high-value attack surfaces.
 ## Phase 1: Rapid Orientation
 **Whitebox (source available)**
 - Focus on recent changes: git diffs, new commits, modified files—these are most likely to contain fresh bugs
 - Read existing `wiki` notes first (`list_notes(category="wiki")` then `get_note(note_id=...)`) to avoid remapping from scratch
 - Run a fast static triage on changed files first (`semgrep`, then targeted `sg` queries)
 - Run at least one lightweight AST pass (`sg` or Tree-sitter) so structural mapping is not skipped
 - Keep AST commands tightly scoped to changed or high-risk paths; avoid broad repository-wide pattern dumps
 - Run quick secret and dependency checks (`gitleaks`, `trufflehog`, `trivy fs`) scoped to changed areas when possible
 - Identify security-sensitive patterns in changed code: auth checks, input handling, database queries, file operations
 - Trace user input through modified code paths
 - Check if security controls were modified or bypassed
 - Before completion, update the shared repo wiki with what changed and what needs dynamic follow-up
 **Blackbox (no source)**
 - Map authentication and critical user flows
 - Identify exposed endpoints and entry points
 - Skip deep content discovery—test what's immediately accessible
 ## Phase 2: High-Impact Targets
 Test in priority order:
 1. **Authentication bypass** - login flaws, session issues, token weaknesses
 2. **Broken access control** - IDOR, privilege escalation, missing authorization
 3. **Remote code execution** - command injection, deserialization, SSTI
 4. **SQL injection** - authentication endpoints, search, filters
 5. **SSRF** - URL parameters, webhooks, integrations
 6. **Exposed secrets** - hardcoded credentials, API keys, config files
 Skip for quick scans:
 - Exhaustive subdomain enumeration
 - Full directory bruteforcing
 - Low-severity information disclosure
 - Theoretical issues without working PoC
 ## Phase 3: Validation
 - Confirm exploitability with minimal proof-of-concept
 - Demonstrate real impact, not theoretical risk
 - Report findings immediately as discovered
 ## Chaining
 When a strong primitive is found (auth weakness, injection point, internal access), immediately attempt one high-impact pivot to demonstrate maximum severity. Don't stop at a low-context "maybe"—turn it into a concrete exploit sequence that reaches privileged action or sensitive data.
 ## Operational Guidelines
 - Use browser tool for quick manual testing of critical flows
 - Use terminal for targeted scans with fast presets (e.g., nuclei with critical/high templates only)
 - Use proxy to inspect traffic on key endpoints
 - Skip extensive fuzzing—use targeted payloads only
 - Create subagents only for parallel high-priority tasks
 ## Mindset
 Think like a time-boxed bug bounty hunter going for quick wins. Prioritize breadth over depth on critical areas. If something looks exploitable, validate quickly and move on. Don't get stuck—if an attack vector isn't yielding results quickly, pivot.
--- a/strix/skills/scan_modes/standard.jinja
+++ b/strix/skills/scan_modes/standard.jinja
@@ -1,91 +0,0 @@
 <scan_mode>
 STANDARD SCAN MODE - Balanced Security Assessment
 This mode provides thorough coverage with a structured methodology. Balance depth with efficiency.
 PHASE 1: RECONNAISSANCE AND MAPPING
 Understanding the target is critical before exploitation. Never skip this phase.
 For whitebox (source code available):
 - Map the entire codebase structure: directories, modules, entry points
 - Identify the application architecture (MVC, microservices, monolith)
 - Understand the routing: how URLs map to handlers/controllers
 - Identify all user input vectors: forms, APIs, file uploads, headers, cookies
 - Map authentication and authorization flows
 - Identify database interactions and ORM usage
 - Review dependency manifests for known vulnerable packages
 - Understand the data model and sensitive data locations
 For blackbox (no source code):
 - Crawl the application thoroughly using browser tool - interact with every feature
 - Enumerate all endpoints, parameters, and functionality
 - Identify the technology stack through fingerprinting
 - Map user roles and access levels
 - Understand the business logic by using the application as intended
 - Document all forms, APIs, and data entry points
 - Use proxy tool to capture and analyze all traffic during exploration
 PHASE 2: BUSINESS LOGIC UNDERSTANDING
 Before testing for vulnerabilities, understand what the application DOES:
 - What are the critical business flows? (payments, user registration, data access)
 - What actions should be restricted to specific roles?
 - What data should users NOT be able to access?
 - What state transitions exist? (order pending → paid → shipped)
 - Where does money, sensitive data, or privilege flow?
 PHASE 3: SYSTEMATIC VULNERABILITY ASSESSMENT
 Test each attack surface methodically. Create focused subagents for different areas.
 Entry Point Analysis:
 - Test all input fields for injection vulnerabilities
 - Check all API endpoints for authentication and authorization
 - Verify all file upload functionality for bypass
 - Test all search and filter functionality
 - Check redirect parameters and URL handling
 Authentication and Session:
 - Test login for brute force protection
 - Check session token entropy and handling
 - Test password reset flows for weaknesses
 - Verify logout invalidates sessions
 - Test for authentication bypass techniques
 Access Control:
 - For every privileged action, test as unprivileged user
 - Test horizontal access control (user A accessing user B's data)
 - Test vertical access control (user escalating to admin)
 - Check API endpoints mirror UI access controls
 - Test direct object references with different user contexts
 Business Logic:
 - Attempt to skip steps in multi-step processes
 - Test for race conditions in critical operations
 - Try negative values, zero values, boundary conditions
 - Attempt to replay transactions
 - Test for price manipulation in e-commerce flows
 PHASE 4: EXPLOITATION AND VALIDATION
 - Every finding must have a working proof-of-concept
 - Demonstrate actual impact, not theoretical risk
 - Chain vulnerabilities when possible to show maximum impact
 - Document the full attack path from initial access to impact
 - Use python tool for complex exploit development
 CHAINING & MAX IMPACT MINDSET:
 - Always ask: "If I can do X, what does that enable me to do next?" Keep pivoting until you reach maximum privilege or maximum sensitive data access
 - Prefer complete end-to-end paths (entry point → pivot → privileged action/data) over isolated bug reports
 - Use the application as a real user would: exploit must survive the actual workflow and state transitions
 - When you discover a useful pivot (info leak, weak boundary, partial access), immediately pursue the next step rather than stopping at the first win
 PHASE 5: COMPREHENSIVE REPORTING
 - Report all confirmed vulnerabilities with clear reproduction steps
 - Include severity based on actual exploitability and business impact
 - Provide remediation recommendations
 - Document any areas that need further investigation
 MINDSET:
 - Methodical and systematic - cover the full attack surface
 - Document as you go - findings and areas tested
 - Validate everything - no assumptions about exploitability
 - Think about business impact, not just technical severity
 </scan_mode>
--- a/strix/skills/scan_modes/standard.md
+++ b/strix/skills/scan_modes/standard.md
@@ -0,0 +1,101 @@
 ---
 name: standard
 description: Balanced security assessment with systematic methodology and full attack surface coverage
 ---
 # Standard Testing Mode
 Balanced security assessment with structured methodology. Thorough coverage without exhaustive depth.
 ## Approach
 Systematic testing across the full attack surface. Understand the application before exploiting it.
 ## Phase 1: Reconnaissance
 **Whitebox (source available)**
 - Map codebase structure: modules, entry points, routing
 - Start by loading existing `wiki` notes (`list_notes(category="wiki")` then `get_note(note_id=...)`) and update one shared repo note as mapping evolves
 - Run `semgrep` first-pass triage to prioritize risky flows before deep manual review
 - Run at least one AST-structural mapping pass (`sg` and/or Tree-sitter), then use outputs for route, sink, and trust-boundary mapping
 - Keep AST output bounded to relevant paths and hypotheses; avoid whole-repo generic function dumps
 - Identify architecture pattern (MVC, microservices, monolith)
 - Trace input vectors: forms, APIs, file uploads, headers, cookies
 - Review authentication and authorization flows
 - Analyze database interactions and ORM usage
 - Check dependencies and repo risks with `trivy fs`, `gitleaks`, and `trufflehog`
 - Understand the data model and sensitive data locations
 - Before completion, update the shared repo wiki with source findings summary and dynamic validation next steps
 **Blackbox (no source)**
 - Crawl application thoroughly, interact with every feature
 - Enumerate endpoints, parameters, and functionality
 - Fingerprint technology stack
 - Map user roles and access levels
 - Capture traffic with proxy to understand request/response patterns
 ## Phase 2: Business Logic Analysis
 Before testing for vulnerabilities, understand the application:
 - **Critical flows** - payments, registration, data access, admin functions
 - **Role boundaries** - what actions are restricted to which users
 - **Data access rules** - what data should be isolated between users
 - **State transitions** - order lifecycle, account status changes
 - **Trust boundaries** - where does privilege or sensitive data flow
 ## Phase 3: Systematic Testing
 Test each attack surface methodically. Spawn focused subagents for different areas.
 **Input Validation**
 - Injection testing on all input fields (SQL, XSS, command, template)
 - File upload bypass attempts
 - Search and filter parameter manipulation
 - Redirect and URL parameter handling
 **Authentication & Session**
 - Brute force protection
 - Session token entropy and handling
 - Password reset flow analysis
 - Logout session invalidation
 - Authentication bypass techniques
 **Access Control**
 - Horizontal: user A accessing user B's resources
 - Vertical: unprivileged user accessing admin functions
 - API endpoints vs UI access control consistency
 - Direct object reference manipulation
 **Business Logic**
 - Multi-step process bypass (skip steps, reorder)
 - Race conditions on state-changing operations
 - Boundary conditions: negative values, zero, extremes
 - Transaction replay and manipulation
 ## Phase 4: Exploitation
 - Every finding requires a working proof-of-concept
 - Demonstrate actual impact, not theoretical risk
 - Chain vulnerabilities to show maximum severity
 - Document full attack path from entry to impact
 - Use python tool for complex exploit development
 ## Phase 5: Reporting
 - Document all confirmed vulnerabilities with reproduction steps
 - Severity based on exploitability and business impact
 - Remediation recommendations
 - Note areas requiring further investigation
 ## Chaining
 Always ask: "If I can do X, what does that enable next?" Keep pivoting until reaching maximum privilege or data exposure.
 Prefer complete end-to-end paths (entry point → pivot → privileged action/data) over isolated findings. Use the application as a real user would—exploit must survive actual workflow and state transitions.
 When you discover a useful pivot (info leak, weak boundary, partial access), immediately pursue the next step rather than stopping at the first win.
 ## Mindset
 Methodical and systematic. Document as you go. Validate everything—no assumptions about exploitability. Think about business impact, not just technical severity.
--- a/strix/skills/technologies/firebase_firestore.jinja
+++ b/strix/skills/technologies/firebase_firestore.jinja
@@ -1,177 +0,0 @@
 <firebase_firestore_security_guide>
 <title>FIREBASE / FIRESTORE — ADVERSARIAL TESTING AND EXPLOITATION</title>
 <critical>Most impactful findings in Firebase apps arise from weak Firestore/Realtime Database rules, Cloud Storage exposure, callable/onRequest Functions trusting client input, incorrect ID token validation, and over-trusted App Check. Treat every client-supplied field and token as untrusted. Bind subject/tenant on the server, not in the client.</critical>
 <scope>
 - Firestore (documents/collections, rules, REST/SDK)
 - Realtime Database (JSON tree, rules)
 - Cloud Storage (rules, signed URLs)
 - Auth (ID tokens, custom claims, anonymous/sign-in providers)
 - Cloud Functions (onCall/onRequest, triggers)
 - Hosting rewrites, CDN/caching, CORS
 - App Check (attestation) and its limits
 </scope>
 <methodology>
 1. Extract project config from client (apiKey, authDomain, projectId, appId, storageBucket, messagingSenderId). Identify all used Firebase products.
 2. Obtain multiple principals: unauth, anonymous (if enabled), basic user A, user B, and any staff/admin if available. Capture their ID tokens.
 3. Build Resource × Action × Principal matrix across Firestore/Realtime/Storage/Functions. Exercise every action via SDK and raw REST (googleapis) to detect parity gaps.
 4. Start from list/query paths (where allowed) to seed IDs; then swap document paths, tenants, and user IDs across principals and transports.
 </methodology>
 <architecture>
 - Firestore REST: https://firestore.googleapis.com/v1/projects/<project>/databases/(default)/documents/<path>
 - Storage REST: https://storage.googleapis.com/storage/v1/b/<bucket>
 - Auth: Google-signed ID tokens (iss accounts.google.com/securetoken.google.com/<project>), aud <project/app-id>; identity is in sub/uid.
 - Rules engines: separate for Firestore, Realtime DB, and Storage; Functions bypass rules when using Admin SDK.
 </architecture>
 <auth_and_tokens>
 - ID token verification must enforce issuer, audience (project), signature (Google JWKS), expiration, and optionally App Check binding when used.
 - Custom claims are appended by Admin SDK; client-supplied claims are ignored by Auth but may be trusted by app code if copied into docs.
 - Pitfalls:
  - Accepting any JWT with valid signature but wrong audience/project.
  - Trusting uid/account IDs from request body instead of context.auth.uid in Functions.
  - Mixing session cookies and ID tokens without verifying both paths equivalently.
 - Tests:
  - Replay tokens across environments/projects; expect strict aud/iss rejection server-side.
  - Call Functions with and without Authorization; verify identical checks on both onCall and onRequest variants.
 </auth_and_tokens>
 <firestore_rules>
 - Rules are not filters: a query must include constraints that make the rule true for all returned documents; otherwise reads fail. Do not rely on client to include where clauses correctly.
 - Prefer ownership derived from request.auth.uid and server data, not from client payload fields.
 - Common gaps:
  - allow read: if request.auth != null (any user reads all data)
  - allow write: if request.auth != null (mass write)
  - Missing per-field validation (adds isAdmin/role/tenantId fields).
  - Using client-supplied ownerId/orgId instead of enforcing doc.ownerId == request.auth.uid or membership in org.
  - Over-broad list rules on root collections; per-doc checks exist but list still leaks via queries.
 - Validation patterns:
  - Restrict writes: request.resource.data.keys().hasOnly([...]) and forbid privilege fields.
  - Enforce ownership: resource.data.ownerId == request.auth.uid && request.resource.data.ownerId == request.auth.uid
  - Org membership: exists(/databases/(default)/documents/orgs/$(org)/members/$(request.auth.uid))
 - Tests:
  - Compare results for users A/B on identical queries; diff counts and IDs.
  - Attempt cross-tenant reads: where orgId == otherOrg; try queries without org filter to confirm denial.
  - Write-path: set/patch with foreign ownerId/orgId; attempt to flip privilege flags.
 </firestore_rules>
 <firestore_queries>
 - Enumerate via REST to avoid SDK client-side constraints; try structured and REST filters.
 - Probe composite index requirements: UI-driven queries may hide missing rule coverage when indexes are enabled but rules are broad.
 - Explore collection group queries (collectionGroup) that may bypass per-collection rules if not mirrored.
 - Use startAt/endAt/in/array-contains to probe rule edges and pagination cursors for cross-tenant bleed.
 </firestore_queries>
 <realtime_database>
 - Misconfigured rules frequently expose entire JSON trees. Probe https://<project>.firebaseio.com/.json with and without auth.
 - Confirm rules for read/write use auth.uid and granular path checks; avoid .read/.write: true or auth != null at high-level nodes.
 - Attempt to write privilege-bearing nodes (roles, org membership) and observe downstream effects (e.g., Cloud Functions triggers).
 </realtime_database>
 <cloud_storage>
 - Rules parallel Firestore but apply to object paths. Common issues:
  - Public reads on sensitive buckets/paths.
  - Signed URLs with long TTL, no content-disposition controls; replayable across tenants.
  - List operations exposed: /o?prefix= enumerates object keys.
 - Tests:
  - GET gs:// paths via https endpoints without auth; verify content-type and Content-Disposition: attachment.
  - Generate and reuse signed URLs across accounts and paths; try case/URL-encoding variants.
  - Upload HTML/SVG and verify X-Content-Type-Options: nosniff; check for script execution.
 </cloud_storage>
 <cloud_functions>
 - onCall provides context.auth automatically; onRequest must verify ID tokens explicitly. Admin SDK bypasses rules; all ownership/tenant checks must be enforced in code.
 - Common gaps:
  - Trusting client uid/orgId from request body instead of context.auth.
  - Missing aud/iss verification when manually parsing tokens.
  - Over-broad CORS allowing credentialed cross-origin requests; echoing Authorization in responses.
  - Triggers (onCreate/onWrite) granting roles or issuing signed URLs solely based on document content controlled by the client.
 - Tests:
  - Call both onCall and equivalent onRequest endpoints with varied tokens and bodies; expect identical decisions.
  - Create crafted docs to trigger privilege-granting functions; verify that server re-derives subject/tenant before acting.
  - Attempt internal fetches (SSRF) via Functions to project/metadata endpoints.
 </cloud_functions>
 <app_check>
 - App Check is not a substitute for authorization. Many apps enable App Check enforcement on client SDKs but do not verify on custom backends.
 - Bypasses:
  - Unenforced paths: REST calls directly to googleapis endpoints with ID token succeed regardless of App Check.
  - Mobile reverse engineering: hook client and reuse ID token flows without attestation.
 - Tests:
  - Compare SDK vs REST behavior with/without App Check headers; confirm no elevated authorization via App Check alone.
 </app_check>
 <tenant_isolation>
 - Apps often implement multi-tenant data models (orgs/<orgId>/...). Bind tenant from server context (membership doc or custom claim), not from client payload.
 - Tests:
  - Vary org header/subdomain/query while keeping token fixed; verify server denies cross-tenant access.
  - Export/report Functions: ensure queries execute under caller scope; signed outputs must encode tenant and short TTL.
 </tenant_isolation>
 <bypass_techniques>
 - Content-type switching: JSON vs form vs multipart to hit alternate code paths in onRequest Functions.
 - Parameter/field pollution: duplicate JSON keys; last-one-wins in many parsers; attempt to sneak privilege fields.
 - Caching/CDN: Hosting rewrites or proxies that key responses without Authorization or tenant headers.
 - Race windows: write then read before background enforcements (e.g., post-write claim synchronizations) complete.
 </bypass_techniques>
 <blind_channels>
 - Firestore: use error shape, document count, and ETag/length to infer existence under partial denial.
 - Storage: length/timing differences on signed URL attempts leak validity.
 - Functions: constant-time comparisons vs variable messages reveal authorization branches.
 </blind_channels>
 <tooling_and_automation>
 - SDK + REST: httpie/curl + jq for REST; Firebase emulator and Rules Playground for rapid iteration.
 - Mobile: apktool/objection/frida to extract config and hook SDK calls; inspect network logs for endpoints and tokens.
 - Rules analysis: script rule probes for common patterns (auth != null, missing field validation, list vs get parity).
 - Functions: fuzz onRequest endpoints with varied content-types and missing/forged Authorization; verify CORS and token handling.
 - Storage: enumerate prefixes; test signed URL generation and reuse patterns.
 </tooling_and_automation>
 <reviewer_checklist>
 - Do Firestore/Realtime/Storage rules derive subject and tenant from auth, not client fields?
 - Are list/query rules aligned with per-doc checks (no broad list leaks)?
 - Are privilege-bearing fields immutable or server-only (forbidden in writes)?
 - Do Functions verify ID tokens (iss/aud/exp/signature) and re-derive identity before acting?
 - Are Admin SDK operations scoped by server-side checks (ownership/tenant)?
 - Is App Check treated as advisory, not authorization, across all paths?
 - Are Hosting/CDN cache keys bound to Authorization/tenant to prevent leaks?
 </reviewer_checklist>
 <validation>
 1. Provide owner vs non-owner Firestore queries showing unauthorized access or metadata leak.
 2. Demonstrate Cloud Storage read/write beyond intended scope (public object, signed URL reuse, or list exposure).
 3. Show a Function accepting forged/foreign identity (wrong aud/iss) or trusting client uid/orgId.
 4. Document minimal reproducible requests with roles/tokens used and observed deltas.
 </validation>
 <false_positives>
 - Public collections/objects documented and intended.
 - Rules that correctly enforce per-doc checks with matching query constraints.
 - Functions verifying tokens and ignoring client-supplied identifiers.
 - App Check enforced but not relied upon for authorization.
 </false_positives>
 <impact>
 - Cross-account and cross-tenant data exposure.
 - Unauthorized state changes via Functions or direct writes.
 - Exfiltration of PII/PHI and private files from Storage.
 - Durable privilege escalation via misused custom claims or triggers.
 </impact>
 <pro_tips>
 1. Treat apiKey as project identifier only; identity must come from verified ID tokens.
 2. Start from rules: read them, then prove gaps with diffed owner/non-owner requests.
 3. Prefer REST for parity checks; SDKs can mask errors via client-side filters.
 4. Hunt privilege fields in docs and forbid them via rules; verify immutability.
 5. Probe collectionGroup queries and list rules; many leaks live there.
 6. Functions are the authority boundary—enforce subject/tenant there even if rules exist.
 7. Keep concise PoCs: one owner vs non-owner request per surface that clearly demonstrates the unauthorized delta.
 </pro_tips>
 <remember>Authorization must hold at every layer: rules, Functions, and Storage. Bind subject and tenant from verified tokens and server data, never from client payload or UI assumptions. Any gap becomes a cross-account or cross-tenant vulnerability.</remember>
 </firebase_firestore_security_guide>
--- a/strix/skills/technologies/firebase_firestore.md
+++ b/strix/skills/technologies/firebase_firestore.md
@@ -0,0 +1,211 @@
 ---
 name: firebase-firestore
 description: Firebase/Firestore security testing covering security rules, Cloud Functions, and client-side trust issues
 ---
 # Firebase / Firestore
 Security testing for Firebase applications. Focus on Firestore/Realtime Database rules, Cloud Storage exposure, callable/onRequest Functions trusting client input, and incorrect ID token validation.
 ## Attack Surface
 **Data Stores**
 - Firestore (documents/collections, rules, REST/SDK)
 - Realtime Database (JSON tree, rules)
 - Cloud Storage (rules, signed URLs)
 **Authentication**
 - Auth ID tokens, custom claims, anonymous/sign-in providers
 - App Check attestation (and its limits)
 **Server-Side**
 - Cloud Functions (onCall/onRequest, triggers)
 - Admin SDK (bypasses rules)
 **Infrastructure**
 - Hosting rewrites, CDN/caching, CORS
 ## Architecture
 **Endpoints**
 - Firestore REST: `https://firestore.googleapis.com/v1/projects/<project>/databases/(default)/documents/<path>`
 - Realtime DB: `https://<project>.firebaseio.com/.json`
 - Storage REST: `https://storage.googleapis.com/storage/v1/b/<bucket>`
 **Auth**
 - Google-signed ID tokens (iss: `accounts.google.com` or `securetoken.google.com/<project>`)
 - Audience: `<project>` or `<app-id>`, identity in `sub`/`uid`
 - Rules engines: separate for Firestore, Realtime DB, and Storage
 - Functions bypass rules when using Admin SDK
 ## High-Value Targets
 - Firestore collections with sensitive data (users, orders, payments)
 - Realtime Database root and high-level nodes
 - Cloud Storage buckets with private files
 - Cloud Functions (especially triggers that grant roles or issue signed URLs)
 - Admin/staff routes and privilege-granting endpoints
 - Export/report functions that generate signed outputs
 ## Reconnaissance
 **Extract Project Config**
 From client bundle:
 ```javascript
 // apiKey, authDomain, projectId, appId, storageBucket, messagingSenderId
 firebase.apps[0].options
 ```
 **Obtain Principals**
 - Unauthenticated
 - Anonymous (if enabled)
 - Basic user A, user B
 - Staff/admin (if available)
 Capture ID tokens for each.
 ## Key Vulnerabilities
 ### Firestore Rules
 Rules are not filters—a query must include constraints that make the rule true for all returned documents.
 **Common Gaps**
 - `allow read: if request.auth != null` — any authenticated user reads all data
 - `allow write: if request.auth != null` — mass write access
 - Missing per-field validation (allows adding `isAdmin`/`role`/`tenantId` fields)
 - Using client-supplied `ownerId`/`orgId` instead of `resource.data.ownerId == request.auth.uid`
 - Over-broad list rules on root collections (per-doc checks exist but list still leaks)
 **Secure Patterns**
 ```javascript
 // Restrict write fields
 request.resource.data.keys().hasOnly(['field1', 'field2', 'field3'])
 // Enforce ownership
 resource.data.ownerId == request.auth.uid &&
 request.resource.data.ownerId == request.auth.uid
 // Org membership check
 exists(/databases/(default)/documents/orgs/$(org)/members/$(request.auth.uid))
 ```
 **Tests**
 - Compare results for users A/B on identical queries; diff counts and IDs
 - Cross-tenant reads: `where orgId == otherOrg`; try queries without org filter
 - Write-path: set/patch with foreign `ownerId`/`orgId`; attempt to flip privilege flags
 ### Firestore Queries
 - Use REST to avoid SDK client-side constraints
 - Probe composite index requirements (UI-driven queries may hide missing rule coverage)
 - Explore `collectionGroup` queries that may bypass per-collection rules
 - Use `startAt`/`endAt`/`in`/`array-contains` to probe rule edges and pagination cursors
 ### Realtime Database
 - Misconfigured rules frequently expose entire JSON trees
 - Probe `https://<project>.firebaseio.com/.json` with and without auth
 - Confirm rules use `auth.uid` and granular path checks
 - Avoid `.read/.write: true` or `auth != null` at high-level nodes
 - Attempt to write privilege-bearing nodes (roles, org membership)
 ### Cloud Storage
 **Common Issues**
 - Public reads on sensitive buckets/paths
 - Signed URLs with long TTL, no content-disposition controls, replayable across tenants
 - List operations exposed: `/o?prefix=` enumerates object keys
 **Tests**
 - GET gs:// paths via HTTPS without auth; verify Content-Type and `Content-Disposition: attachment`
 - Generate and reuse signed URLs across accounts and paths; try case/URL-encoding variants
 - Upload HTML/SVG and verify `X-Content-Type-Options: nosniff`; check for script execution
 ### Cloud Functions
 `onCall` provides `context.auth` automatically; `onRequest` must verify ID tokens explicitly. Admin SDK bypasses rules—all ownership/tenant checks must be in code.
 **Common Gaps**
 - Trusting client `uid`/`orgId` from request body instead of `context.auth`
 - Missing `aud`/`iss` verification when manually parsing tokens
 - Over-broad CORS allowing credentialed cross-origin requests
 - Triggers (onCreate/onWrite) granting roles based on document content controlled by client
 **Tests**
 - Call both onCall and onRequest endpoints with varied tokens; expect identical decisions
 - Create crafted docs to trigger privilege-granting functions
 - Attempt SSRF via Functions to project/metadata endpoints
 ### Auth & Token Issues
 **Verification Requirements**
 - Issuer, audience (project), signature (Google JWKS), expiration
 - Optionally App Check binding when used
 **Pitfalls**
 - Accepting any JWT with valid signature but wrong audience/project
 - Trusting `uid`/account IDs from request body instead of `context.auth.uid`
 - Mixing session cookies and ID tokens without verifying both paths equivalently
 - Custom claims copied into docs then trusted by app code
 **Tests**
 - Replay tokens across environments/projects; expect strict `aud`/`iss` rejection
 - Call Functions with and without Authorization; verify identical checks
 ### App Check
 App Check is not a substitute for authorization.
 **Bypasses**
 - REST calls directly to googleapis endpoints with ID token succeed regardless of App Check
 - Mobile reverse engineering: hook client and reuse ID token flows without attestation
 **Tests**
 - Compare SDK vs REST behavior with/without App Check headers
 - Confirm no elevated authorization via App Check alone
 ### Tenant Isolation
 Apps often implement multi-tenant data models (`orgs/<orgId>/...`). Bind tenant from server context (membership doc or custom claim), not client payload.
 **Tests**
 - Vary org header/subdomain/query while keeping token fixed; verify server denies cross-tenant access
 - Export/report Functions: ensure queries execute under caller scope
 ## Bypass Techniques
 - Content-type switching: JSON vs form vs multipart to hit alternate code paths in onRequest
 - Parameter/field pollution: duplicate JSON keys (last-one-wins in many parsers); sneak privilege fields
 - Caching/CDN: Hosting rewrites keying responses without Authorization or tenant headers
 - Race windows: write then read before background enforcements complete
 ## Blind Enumeration
 - Firestore: use error shape, document count, ETag/length to infer existence
 - Storage: length/timing differences on signed URL attempts leak validity
 - Functions: constant-time comparisons vs variable messages reveal authorization branches
 ## Testing Methodology
 1. **Extract config** - Get project config from client bundle
 2. **Obtain principals** - Collect tokens for unauth, anonymous, user A/B, admin
 3. **Build matrix** - Resource × Action × Principal across Firestore/Realtime/Storage/Functions
 4. **SDK vs REST** - Exercise every action via both to detect parity gaps
 5. **Seed IDs** - Start from list/query paths to gather document IDs
 6. **Cross-principal** - Swap document paths, tenants, and user IDs across principals
 ## Tooling
 - SDK + REST: httpie/curl + jq for REST; Firebase emulator and Rules Playground for rapid iteration
 - Rules analysis: script probes for common patterns (`auth != null`, missing field validation)
 - Functions: fuzz onRequest with varied content-types and missing/forged Authorization
 - Storage: enumerate prefixes; test signed URL generation and reuse patterns
 ## Validation Requirements
 - Owner vs non-owner Firestore queries showing unauthorized access or metadata leak
 - Cloud Storage read/write beyond intended scope (public object, signed URL reuse, list exposure)
 - Function accepting forged/foreign identity (wrong `aud`/`iss`) or trusting client `uid`/`orgId`
 - Minimal reproducible requests with roles/tokens used and observed deltas
--- a/strix/skills/technologies/supabase.jinja
+++ b/strix/skills/technologies/supabase.jinja
@@ -1,189 +0,0 @@
 <supabase_security_guide>
 <title>SUPABASE — ADVERSARIAL TESTING AND EXPLOITATION</title>
 <critical>Supabase exposes Postgres through PostgREST, Realtime, GraphQL, Storage, Auth (GoTrue), and Edge Functions. Most impactful findings come from mis-scoped Row Level Security (RLS), unsafe RPCs, leaked service_role keys, lax Storage policies, GraphQL overfetching, and Edge Functions trusting headers or tokens without binding to issuer/audience/tenant.</critical>
 <scope>
 - PostgREST: table CRUD, filters, embeddings, RPC (remote functions)
 - RLS: row ownership/tenant isolation via policies and auth.uid()
 - Storage: buckets, objects, signed URLs, public/private policies
 - Realtime: replication subscriptions, broadcast/presence channels
 - GraphQL: pg_graphql over Postgres schema with RLS interaction
 - Auth (GoTrue): JWTs, cookie/session, magic links, OAuth flows
 - Edge Functions (Deno): server-side code calling Supabase with secrets
 </scope>
 <methodology>
 1. Inventory surfaces: REST /rest/v1, Storage /storage/v1, GraphQL /graphql/v1, Realtime wss, Auth /auth/v1, Functions https://<project>.functions.supabase.co/.
 2. Obtain tokens for: unauth (anon), basic user, other user, and (if disclosed) admin/staff; enumerate anon key exposure and verify if service_role leaked anywhere.
 3. Build a Resource × Action × Principal matrix and test each via REST and GraphQL. Confirm parity across channels and content-types (json/form/multipart).
 4. Start with list/search/export endpoints to gather IDs, then attempt direct reads/writes across principals, tenants, and transports. Validate RLS and function guards.
 </methodology>
 <architecture>
 - Project endpoints: https://<ref>.supabase.co; REST at /rest/v1/<table>, RPC at /rest/v1/rpc/<fn>.
 - Headers: apikey: <anon-or-service>, Authorization: Bearer <JWT>. Anon key only identifies the project; JWT binds user context.
 - Roles: anon, authenticated; service_role bypasses RLS and must never be client-exposed.
 - auth.uid(): current user UUID claim; policies must never trust client-supplied IDs over server context.
 </architecture>
 <rls>
 - Enable RLS on every non-public table; absence or “permit-all” policies → bulk exposure.
 - Common gaps:
  - Policies check auth.uid() for read but forget UPDATE/DELETE/INSERT.
  - Missing tenant constraints (org_id/tenant_id) allow cross-tenant reads/writes.
  - Policies rely on client-provided columns (user_id in payload) instead of deriving from JWT.
  - Complex joins where the effective policy is applied after filters, enabling inference via counts or projections.
 - Tests:
  - Compare results for two users: GET /rest/v1/<table>?select=*&Prefer=count=exact; diff row counts and IDs.
  - Try cross-tenant: add &org_id=eq.<other_org> or use or=(org_id.eq.other,org_id.is.null).
  - Write-path: PATCH/DELETE single row with foreign id; INSERT with foreign owner_id then read.
 </rls>
 <postgrest_and_rest>
 - Filters: eq, neq, lt, gt, ilike, or, is, in; embed relations with select=*,profile(*); exploit embeddings to overfetch linked rows if resolvers skip per-row checks.
 - Headers to know: Prefer: return=representation (echo writes), Prefer: count=exact (exposure via counts), Accept-Profile/Content-Profile to select schema.
 - IDOR patterns: /rest/v1/<table>?select=*&id=eq.<other_id>; query alternative keys (slug, email) and composite keys.
 - Search leaks: generous LIKE/ILIKE filters + lack of RLS → mass disclosure.
 - Mass assignment: if RPC not used, PATCH can update unintended columns; verify restricted columns via database permissions/policies.
 </postgrest_and_rest>
 <rpc_functions>
 - RPC endpoints map to SQL functions. SECURITY DEFINER bypasses RLS unless carefully coded; SECURITY INVOKER respects caller.
 - Anti-patterns:
  - SECURITY DEFINER + missing owner checks → vertical/horizontal bypass.
  - set search_path left to public; function resolves unsafe objects.
  - Trusting client-supplied user_id/tenant_id rather than auth.uid().
 - Tests:
  - Call /rest/v1/rpc/<fn> as different users with foreign ids in body.
  - Remove or alter JWT entirely (Authorization: Bearer <anon>) to see if function still executes.
  - Validate that functions perform explicit ownership/tenant checks inside SQL, not only in docs.
 </rpc_functions>
 <storage>
 - Buckets: public vs private; objects live in storage.objects with RLS-like policies.
 - Find misconfigs:
  - Public buckets holding sensitive data: GET https://<ref>.supabase.co/storage/v1/object/public/<bucket>/<path>
  - Signed URLs with long TTL and no audience binding; reuse/guess tokens across tenants/paths.
  - Listing prefixes without auth: /storage/v1/object/list/<bucket>?prefix=
  - Path confusion: mixed case, URL-encoding, “..” segments rejected at UI but accepted by API.
 - Abuse vectors:
  - Content-type/XSS: upload HTML/SVG served as text/html or image/svg+xml; confirm X-Content-Type-Options: nosniff and Content-Disposition: attachment.
  - Signed URL replay across accounts/buckets if validation is lax.
 </storage>
 <realtime>
 - Endpoint: wss://<ref>.supabase.co/realtime/v1. Join channels with apikey + Authorization.
 - Risks:
  - Channel names derived from table/schema/filters leaking other users’ updates when RLS or channel guards are weak.
  - Broadcast/presence channels allowing cross-room join/publish without auth checks.
 - Tests:
  - Subscribe to public:realtime changes on protected tables; confirm row data visibility aligns with RLS.
  - Attempt joining other users’ presence/broadcast channels (e.g., room:<user_id>, org:<id>).
 </realtime>
 <graphql>
 - Endpoint: /graphql/v1 using pg_graphql with RLS. Risks:
  - Introspection reveals schema relations; ensure it’s intentional.
  - Overfetch via nested relations where field resolvers fail to re-check ownership/tenant.
  - Global node IDs (if implemented) leaked and reusable via different viewers.
 - Tests:
  - Compare REST vs GraphQL responses for the same principal and query shape.
  - Query deep nested fields and connections; verify RLS holds at each edge.
 </graphql>
 <auth_and_tokens>
 - GoTrue issues JWTs with claims (sub=uid, role, aud=authenticated). Validate on server: issuer, audience, exp, signature, and tenant context.
 - Pitfalls:
  - Storing tokens in localStorage → XSS exfiltration; refresh mismanagement leading to long-lived sessions.
  - Treating apikey as identity; it is project-scoped, not user identity.
  - Exposing service_role key in client bundle or Edge Function responses.
 - Tests:
  - Replay tokens across services; check audience/issuer pinning.
  - Try downgraded tokens (expired/other audience) against custom endpoints.
 </auth_and_tokens>
 <edge_functions>
 - Deno-based functions often initialize server-side Supabase client with service_role. Risks:
  - Trusting Authorization/apikey headers without verifying JWT against issuer/audience.
  - CORS: wildcard origins with credentials; reflected Authorization in responses.
  - SSRF via fetch; secrets exposed via error traces or logs.
 - Tests:
  - Call functions with and without Authorization; compare behavior.
  - Try foreign resource IDs in function payloads; verify server re-derives user/tenant from JWT.
  - Attempt to reach internal endpoints (metadata services, project endpoints) via function fetch.
 </edge_functions>
 <tenant_isolation>
 - Ensure every query joins or filters by tenant_id/org_id derived from JWT context, not client input.
 - Tests:
  - Change subdomain/header/path tenant selectors while keeping JWT tenant constant; look for cross-tenant data.
  - Export/report endpoints: confirm queries execute under caller scope; signed outputs must encode tenant and short TTL.
 </tenant_isolation>
 <bypass_techniques>
 - Content-type switching: application/json ↔ application/x-www-form-urlencoded ↔ multipart/form-data to hit different code paths.
 - Parameter pollution: duplicate keys in JSON/query; PostgREST chooses last/first depending on parser.
 - GraphQL+REST parity probing: protections often drift; fetch via the weaker path.
 - Race windows: parallel writes to bypass post-insert ownership updates.
 </bypass_techniques>
 <blind_channels>
 - Use Prefer: count=exact and ETag/length diffs to infer unauthorized rows.
 - Conditional requests (If-None-Match) to detect object existence without content exposure.
 - Storage signed URLs: timing/length deltas to map valid vs invalid tokens.
 </blind_channels>
 <tooling_and_automation>
 - PostgREST: httpie/curl + jq; enumerate tables with known names; fuzz filters (or=, ilike, neq, is.null).
 - GraphQL: graphql-inspector, voyager; build deep queries to test field-level enforcement; complexity/batching tests.
 - Realtime: custom ws client; subscribe to suspicious channels/tables; diff payloads per principal.
 - Storage: enumerate bucket listing APIs; script signed URL generation/use patterns.
 - Auth/JWT: jwt-cli/jose to validate audience/issuer; replay against Edge Functions.
 - Policy diffing: maintain request sets per role and compare results across releases.
 </tooling_and_automation>
 <reviewer_checklist>
 - Are all non-public tables RLS-enabled with explicit SELECT/INSERT/UPDATE/DELETE policies?
 - Do policies derive subject/tenant from JWT (auth.uid(), tenant claim) rather than client payload?
 - Do RPC functions run as SECURITY INVOKER, or if DEFINER, do they enforce ownership/tenant inside?
 - Are Storage buckets private by default, with short-lived signed URLs bound to tenant/context?
 - Does Realtime enforce RLS-equivalent filtering for subscriptions and block cross-room joins?
 - Is GraphQL parity verified with REST; are nested resolvers guarded per field?
 - Are Edge Functions verifying JWT (issuer/audience) and never exposing service_role to clients?
 - Are CDN/cache keys bound to Authorization/tenant to prevent cache leaks?
 </reviewer_checklist>
 <validation>
 1. Provide owner vs non-owner requests for REST/GraphQL showing unauthorized access (content or metadata).
 2. Demonstrate a mis-scoped RPC or Storage signed URL usable by another user/tenant.
 3. Confirm Realtime or GraphQL exposure matches missing policy checks.
 4. Document minimal reproducible requests and role contexts used.
 </validation>
 <false_positives>
 - Tables intentionally public (documented) with non-sensitive content.
 - RLS-enabled tables returning only caller-owned rows; mismatched UI not backed by API responses.
 - Signed URLs with very short TTL and audience binding.
 - Edge Functions verifying tokens and re-deriving context before acting.
 </false_positives>
 <impact>
 - Cross-account/tenant data exposure and unauthorized state changes.
 - Exfiltration of PII/PHI/PCI, financial and billing artifacts, private files.
 - Privilege escalation via RPC and Edge Functions; durable access via long-lived tokens.
 - Regulatory and contractual violations stemming from tenant isolation failures.
 </impact>
 <pro_tips>
 1. Start with /rest/v1 list/search; counts and embeddings reveal policy drift fast.
 2. Treat UUIDs and signed URLs as untrusted; validate binding to subject/tenant and TTL.
 3. Focus on RPC and Edge Functions—they often centralize business logic and skip RLS.
 4. Test GraphQL and Realtime parity with REST; differences are where vulnerabilities hide.
 5. Keep role-separated request corpora and diff responses across deployments.
 6. Never assume apikey == identity; only JWT binds subject. Prove it.
 7. Prefer concise PoCs: one request per role that clearly shows the unauthorized delta.
 </pro_tips>
 <remember>RLS must bind subject and tenant on every path, and server-side code (RPC/Edge) must re-derive identity from a verified token. Any gap in binding, audience/issuer verification, or per-field enforcement becomes a cross-account or cross-tenant vulnerability.</remember>
 </supabase_security_guide>
--- a/Show More
+++ b/Show More