Files
personas/personas/_shared/community-skills/browser-use-cloud/references/guides/subagent.md
salvacybersec 00d30e8db3 feat: add 78 community skills from skills.sh marketplace
Source repos: shadcn/ui, vercel-labs/agent-skills, coreyhaines31/marketingskills,
supabase/agent-skills, vercel-labs/next-skills, kepano/obsidian-skills,
pbakaus/impeccable, browser-use/browser-use

Categories:
- shadcn (1): shadcn UI component system
- vercel (7): react-best-practices, composition-patterns, deploy-to-vercel, etc.
- marketing (35): seo-audit, copywriting, marketing-psychology, pricing-strategy, etc.
- supabase (2): postgres-best-practices, supabase
- next.js (3): next-best-practices, next-cache-components, next-upgrade
- obsidian (5): obsidian-markdown, obsidian-cli, obsidian-bases, json-canvas, defuddle
- impeccable (21): polish, animate, critique, colorize, audit, harden, etc.
- browser-use (4): browser automation and testing

Location: personas/_shared/community-skills/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:38:21 +03:00

7.2 KiB

Guide: Browser-Use as a Subagent

Delegate entire web tasks to browser-use from your orchestrator. Task in, result out — browser-use handles all browsing autonomously.

Table of Contents


When to Use This Pattern

Your system has an orchestrator — some agent, pipeline, or workflow engine that coordinates multiple capabilities. At some point it decides "I need data from the web" or "I need to interact with a website." It delegates to browser-use, which autonomously navigates, clicks, extracts, and returns a result. The orchestrator never touches the browser.

Use subagent when:

  • You want a black box: task in → result out
  • The web task is self-contained (search, extract, fill a form)
  • You don't need action-by-action control

Use tools integration instead when:

  • Your agent needs to make individual browser decisions (click this, then check that)
  • You want your agent's reasoning loop to drive the browser

Pick Your Integration

Your agent type Best approach
CLI coding agent in sandbox (Claude Code, Codex, OpenCode, Cline, Windsurf, Cursor bg, Hermes, OpenClaw) CLI cloud passthrough
Python framework (LangChain, CrewAI, AutoGen, PydanticAI, custom) Python Agent wrapper
TypeScript/JS (Vercel AI SDK, LangChain.js, custom) Cloud SDK
MCP client (Claude Desktop, Cursor with MCP) MCP browser_task tool
Workflow engine (n8n, Make, Zapier, Temporal) or any HTTP client Cloud REST API

Shell Command Agents (CLI)

For: Agents running in sandboxes/VMs with terminal access.

The agent delegates a complete task to the cloud via CLI commands. No Python imports needed.

# 1. Set API key (once)
browser-use cloud login $BROWSER_USE_API_KEY

# 2. Fire off a task
browser-use cloud v2 POST /tasks '{"task": "Find the top HN post and return title and URL"}'
# Returns: {"id": "<task-id>", "sessionId": "<session-id>"}

# 3. Poll until done (blocks)
browser-use cloud v2 poll <task-id>

# 4. Get the result
browser-use cloud v2 GET /tasks/<task-id>
# Returns full TaskView with output, steps, outputFiles

For structured output, pass a JSON schema:

browser-use cloud v2 POST /tasks '{
  "task": "Find the CEO of OpenAI",
  "structuredOutput": "{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"},\"company\":{\"type\":\"string\"}},\"required\":[\"name\",\"company\"]}"
}'

Python Agents (Cloud SDK)

For: LangChain, CrewAI, AutoGen, PydanticAI, Semantic Kernel, or custom Python agents. Uses the Cloud SDK — no local browser needed.

from browser_use_sdk import AsyncBrowserUse
from pydantic import BaseModel

client = AsyncBrowserUse()

# Simple
async def browse(task: str) -> str:
    result = await client.run(task)
    return result.output

# Structured output
class SearchResult(BaseModel):
    title: str
    url: str

async def browse_structured(task: str) -> SearchResult:
    result = await client.run(task, output_schema=SearchResult)
    return result.output  # SearchResult instance

Multi-step with keep_alive:

session = await client.sessions.create(proxy_country_code="us")
await client.run("Log into site", session_id=str(session.id), keep_alive=True)
result = await client.run("Extract data", session_id=str(session.id))
await client.sessions.stop(str(session.id))

TypeScript/JS Agents

For: Vercel AI SDK, LangChain.js, or custom TypeScript agents.

import { BrowserUse } from "browser-use-sdk";
import { z } from "zod";

const client = new BrowserUse();

// Simple
async function browse(task: string): Promise<string> {
  const result = await client.run(task);
  return result.output;
}

// Structured
const SearchResult = z.object({
  title: z.string(),
  url: z.string(),
});

async function browseStructured(task: string) {
  const result = await client.run(task, { schema: SearchResult });
  return result.output; // { title: string, url: string }
}

Multi-step with keepAlive:

const session = await client.sessions.create({ proxyCountryCode: "us" });
await client.run("Log into site", { sessionId: session.id, keepAlive: true });
const result = await client.run("Extract data", { sessionId: session.id });
await client.sessions.stop(session.id);

MCP-Native Agents

For: Claude Desktop, Cursor with MCP enabled, any MCP client.

Cloud MCP (entire task delegation)

Add to MCP config:

{
  "mcpServers": {
    "browser-use": {
      "url": "https://api.browser-use.com/mcp",
      "headers": { "X-Browser-Use-API-Key": "YOUR_KEY" }
    }
  }
}

The agent gets a browser_task tool. It calls it with a task description, gets back the result.

Local MCP (free, open-source)

The retry_with_browser_use_agent tool delegates an entire task to the local Agent:

uvx --from 'browser-use[cli]' browser-use --mcp

HTTP / Workflow Engines

For: n8n, Make, Zapier, Temporal, serverless functions, any HTTP client.

Create task → Poll → Get result

# 1. Create task
curl -X POST https://api.browser-use.com/api/v2/tasks \
  -H "X-Browser-Use-API-Key: $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"task": "Find the top HN post and return title+URL"}'
# → {"id": "task-uuid", "sessionId": "session-uuid"}

# 2. Poll status
curl https://api.browser-use.com/api/v2/tasks/<task-id>/status \
  -H "X-Browser-Use-API-Key: $BROWSER_USE_API_KEY"
# → {"status": "finished"}

# 3. Get result
curl https://api.browser-use.com/api/v2/tasks/<task-id> \
  -H "X-Browser-Use-API-Key: $BROWSER_USE_API_KEY"
# → Full TaskView with output, steps, outputFiles

Or use webhooks for event-driven workflows (see ../features.md).


Cross-Cutting Concerns

Structured output

  • Cloud SDK Python: output_schema=MyPydanticModelresult.output (typed)
  • Cloud SDK TypeScript: { schema: ZodSchema }result.output (typed)
  • Cloud REST: "structuredOutput": "<json-schema-string>"output in response

Error handling

from browser_use_sdk import AsyncBrowserUse, BrowserUseError

try:
    result = await client.run(task, max_cost_usd=0.10)
except TimeoutError:
    pass  # Polling timed out (5 min default)
except BrowserUseError as e:
    pass  # API error

Cost control

  • Cloud v2: Per-step pricing. Use max_steps to limit.
  • Cloud v3: max_cost_usd=0.10 caps spending. Check result.total_cost_usd.

Cleanup

Always stop sessions when done:

session = await client.sessions.create(proxy_country_code="us")
try:
    result = await client.run(task, session_id=str(session.id))
finally:
    await client.sessions.stop(str(session.id))