Source repos: shadcn/ui, vercel-labs/agent-skills, coreyhaines31/marketingskills, supabase/agent-skills, vercel-labs/next-skills, kepano/obsidian-skills, pbakaus/impeccable, browser-use/browser-use Categories: - shadcn (1): shadcn UI component system - vercel (7): react-best-practices, composition-patterns, deploy-to-vercel, etc. - marketing (35): seo-audit, copywriting, marketing-psychology, pricing-strategy, etc. - supabase (2): postgres-best-practices, supabase - next.js (3): next-best-practices, next-cache-components, next-upgrade - obsidian (5): obsidian-markdown, obsidian-cli, obsidian-bases, json-canvas, defuddle - impeccable (21): polish, animate, critique, colorize, audit, harden, etc. - browser-use (4): browser automation and testing Location: personas/_shared/community-skills/ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7.2 KiB
Guide: Browser-Use as a Subagent
Delegate entire web tasks to browser-use from your orchestrator. Task in, result out — browser-use handles all browsing autonomously.
Table of Contents
- When to Use This Pattern
- Pick Your Integration
- Shell Command Agents (CLI)
- Python Agents (Cloud SDK)
- TypeScript/JS Agents
- MCP-Native Agents
- HTTP / Workflow Engines
- Cross-Cutting Concerns
When to Use This Pattern
Your system has an orchestrator — some agent, pipeline, or workflow engine that coordinates multiple capabilities. At some point it decides "I need data from the web" or "I need to interact with a website." It delegates to browser-use, which autonomously navigates, clicks, extracts, and returns a result. The orchestrator never touches the browser.
Use subagent when:
- You want a black box: task in → result out
- The web task is self-contained (search, extract, fill a form)
- You don't need action-by-action control
Use tools integration instead when:
- Your agent needs to make individual browser decisions (click this, then check that)
- You want your agent's reasoning loop to drive the browser
Pick Your Integration
| Your agent type | Best approach |
|---|---|
| CLI coding agent in sandbox (Claude Code, Codex, OpenCode, Cline, Windsurf, Cursor bg, Hermes, OpenClaw) | CLI cloud passthrough |
| Python framework (LangChain, CrewAI, AutoGen, PydanticAI, custom) | Python Agent wrapper |
| TypeScript/JS (Vercel AI SDK, LangChain.js, custom) | Cloud SDK |
| MCP client (Claude Desktop, Cursor with MCP) | MCP browser_task tool |
| Workflow engine (n8n, Make, Zapier, Temporal) or any HTTP client | Cloud REST API |
Shell Command Agents (CLI)
For: Agents running in sandboxes/VMs with terminal access.
The agent delegates a complete task to the cloud via CLI commands. No Python imports needed.
# 1. Set API key (once)
browser-use cloud login $BROWSER_USE_API_KEY
# 2. Fire off a task
browser-use cloud v2 POST /tasks '{"task": "Find the top HN post and return title and URL"}'
# Returns: {"id": "<task-id>", "sessionId": "<session-id>"}
# 3. Poll until done (blocks)
browser-use cloud v2 poll <task-id>
# 4. Get the result
browser-use cloud v2 GET /tasks/<task-id>
# Returns full TaskView with output, steps, outputFiles
For structured output, pass a JSON schema:
browser-use cloud v2 POST /tasks '{
"task": "Find the CEO of OpenAI",
"structuredOutput": "{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"},\"company\":{\"type\":\"string\"}},\"required\":[\"name\",\"company\"]}"
}'
Python Agents (Cloud SDK)
For: LangChain, CrewAI, AutoGen, PydanticAI, Semantic Kernel, or custom Python agents. Uses the Cloud SDK — no local browser needed.
from browser_use_sdk import AsyncBrowserUse
from pydantic import BaseModel
client = AsyncBrowserUse()
# Simple
async def browse(task: str) -> str:
result = await client.run(task)
return result.output
# Structured output
class SearchResult(BaseModel):
title: str
url: str
async def browse_structured(task: str) -> SearchResult:
result = await client.run(task, output_schema=SearchResult)
return result.output # SearchResult instance
Multi-step with keep_alive:
session = await client.sessions.create(proxy_country_code="us")
await client.run("Log into site", session_id=str(session.id), keep_alive=True)
result = await client.run("Extract data", session_id=str(session.id))
await client.sessions.stop(str(session.id))
TypeScript/JS Agents
For: Vercel AI SDK, LangChain.js, or custom TypeScript agents.
import { BrowserUse } from "browser-use-sdk";
import { z } from "zod";
const client = new BrowserUse();
// Simple
async function browse(task: string): Promise<string> {
const result = await client.run(task);
return result.output;
}
// Structured
const SearchResult = z.object({
title: z.string(),
url: z.string(),
});
async function browseStructured(task: string) {
const result = await client.run(task, { schema: SearchResult });
return result.output; // { title: string, url: string }
}
Multi-step with keepAlive:
const session = await client.sessions.create({ proxyCountryCode: "us" });
await client.run("Log into site", { sessionId: session.id, keepAlive: true });
const result = await client.run("Extract data", { sessionId: session.id });
await client.sessions.stop(session.id);
MCP-Native Agents
For: Claude Desktop, Cursor with MCP enabled, any MCP client.
Cloud MCP (entire task delegation)
Add to MCP config:
{
"mcpServers": {
"browser-use": {
"url": "https://api.browser-use.com/mcp",
"headers": { "X-Browser-Use-API-Key": "YOUR_KEY" }
}
}
}
The agent gets a browser_task tool. It calls it with a task description, gets back the result.
Local MCP (free, open-source)
The retry_with_browser_use_agent tool delegates an entire task to the local Agent:
uvx --from 'browser-use[cli]' browser-use --mcp
HTTP / Workflow Engines
For: n8n, Make, Zapier, Temporal, serverless functions, any HTTP client.
Create task → Poll → Get result
# 1. Create task
curl -X POST https://api.browser-use.com/api/v2/tasks \
-H "X-Browser-Use-API-Key: $BROWSER_USE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"task": "Find the top HN post and return title+URL"}'
# → {"id": "task-uuid", "sessionId": "session-uuid"}
# 2. Poll status
curl https://api.browser-use.com/api/v2/tasks/<task-id>/status \
-H "X-Browser-Use-API-Key: $BROWSER_USE_API_KEY"
# → {"status": "finished"}
# 3. Get result
curl https://api.browser-use.com/api/v2/tasks/<task-id> \
-H "X-Browser-Use-API-Key: $BROWSER_USE_API_KEY"
# → Full TaskView with output, steps, outputFiles
Or use webhooks for event-driven workflows (see ../features.md).
Cross-Cutting Concerns
Structured output
- Cloud SDK Python:
output_schema=MyPydanticModel→result.output(typed) - Cloud SDK TypeScript:
{ schema: ZodSchema }→result.output(typed) - Cloud REST:
"structuredOutput": "<json-schema-string>"→outputin response
Error handling
from browser_use_sdk import AsyncBrowserUse, BrowserUseError
try:
result = await client.run(task, max_cost_usd=0.10)
except TimeoutError:
pass # Polling timed out (5 min default)
except BrowserUseError as e:
pass # API error
Cost control
- Cloud v2: Per-step pricing. Use
max_stepsto limit. - Cloud v3:
max_cost_usd=0.10caps spending. Checkresult.total_cost_usd.
Cleanup
Always stop sessions when done:
session = await client.sessions.create(proxy_country_code="us")
try:
result = await client.run(task, session_id=str(session.id))
finally:
await client.sessions.stop(str(session.id))