Architecture

This page describes the technical architecture of Chrome ACP in detail.

Overview

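Chrome ACP consists of four main components, described under Components below. At runtime, a browser client (the Chrome extension or the web client) talks to the proxy server, which in turn talks to the ACP agent: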

graph LR
    A[Chrome Extension<br/>or Web Client] <-->|WebSocket| B[Proxy Server]
    B <-->|stdin/stdout| C[ACP Agent]

Why a Proxy Server?

Chrome extensions run in a browser sandbox and cannot spawn subprocesses. The proxy server acts as a local bridge that:

Spawns the ACP agent subprocess
Communicates with the agent via stdin/stdout (NDJSON)
Relays messages to/from browser clients via WebSocket
Exposes browser tools to agents via MCP
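
As an illustration of the NDJSON framing on the agent's stdin/stdout, here is a minimal sketch of a decoder that tolerates messages split across chunks. The helper names createNdjsonDecoder and encodeNdjson are hypothetical and are not taken from the Chrome ACP source.

```typescript
// Incrementally decodes newline-delimited JSON (NDJSON) from arbitrary chunks.
// Hypothetical helper for illustration; not part of Chrome ACP.
function createNdjsonDecoder(): (chunk: string) => unknown[] {
  let buffered = ""; // holds a trailing partial line between chunks
  return (chunk: string): unknown[] => {
    buffered += chunk;
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? ""; // the last piece is incomplete (or empty)
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  };
}

// Encodes one message as a single NDJSON line.
function encodeNdjson(message: unknown): string {
  return JSON.stringify(message) + "\n";
}
```

A process stdout stream can deliver half a line at a time, so the decoder keeps the unfinished tail and only parses complete lines.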

Components

Chrome Extension

Location: packages/chrome-extension

A Chrome Manifest V3 extension with:

Sidepanel UI - Chat interface in Chrome's side panel
Background service worker - Maintains WebSocket connection to proxy
Browser tools - browser_tabs, browser_read, browser_execute

The extension uses chrome.scripting.executeScript() with world: "MAIN" to run scripts in the page's own JavaScript context rather than in the extension's isolated world.
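
The following sketch shows one way such an injection could be built. The helper buildMainWorldInjection, the local MainWorldInjection interface, and the use of indirect eval to run a script string are assumptions made for illustration; the extension's actual implementation may differ.

```typescript
// Local stand-in for the argument accepted by chrome.scripting.executeScript().
// Only the fields used here are declared; these are not the official typings.
interface MainWorldInjection {
  target: { tabId: number };
  world: "MAIN";
  func: (code: string) => unknown;
  args: [string];
}

// Hypothetical helper: builds an injection that evaluates `script`
// in the page's own JavaScript context.
function buildMainWorldInjection(tabId: number, script: string): MainWorldInjection {
  return {
    target: { tabId },
    world: "MAIN",
    // The function is serialized and injected, so it must not close over
    // outer variables; the script text is passed through `args` instead.
    func: (code: string) => globalThis.eval(code),
    args: [script],
  };
}

// Inside the extension's service worker, the object would be handed to the real API:
//   const [{ result }] = await chrome.scripting.executeScript(
//     buildMainWorldInjection(tabId, "document.title"),
//   );
```

Code running in the main world is subject to the page's Content Security Policy, so an eval-based approach like this one may be blocked on some sites.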

Web Client

Location: packages/web-client

A Progressive Web App (PWA) served by the proxy server. Features:

Same chat UI as the extension (shared components)
No browser tool access (runs in regular web context)
Mobile-friendly for QR code connections

Proxy Server

Location: packages/proxy-server

A Node.js server built with Hono:

Serves web client at /app
WebSocket endpoint at /ws
MCP endpoint at /mcp (Streamable HTTP)
File explorer API for workspace browsing

Shared Package

Location: packages/shared

Shared code used by both chrome-extension and web-client:

ACPClient - WebSocket client for proxy communication
UI components (shadcn/ui + Vercel AI Elements)
TypeScript type definitions

WebSocket Protocol

Browser clients and the proxy server exchange JSON messages over the WebSocket connection. Each message is identified by its type.

Client → Server Messages

Type	Description
`connect`	Initial handshake (sends auth token)
`new_session`	Request new ACP session
`prompt`	Send user message with content blocks
`cancel`	Cancel current agent response
`set_session_model`	Switch AI model
`permission_response`	User response to permission request
`browser_tool_result`	Result from browser tool execution
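
The client messages in the table above could be modeled as a discriminated union, as in this sketch. Only the type values, modelId, and optionId come from this page; the token, content, and result field names are assumptions.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; data: string }
  | { type: "resource_link"; uri: string; mimeType?: string };

// Field names other than `type`, `modelId`, and `optionId` are assumptions.
type ClientMessage =
  | { type: "connect"; token: string }
  | { type: "new_session" }
  | { type: "prompt"; content: ContentBlock[] }
  | { type: "cancel" }
  | { type: "set_session_model"; modelId: string }
  | { type: "permission_response"; optionId: string }
  | { type: "browser_tool_result"; result: unknown };

// Hypothetical helper: serializes the frames a client sends to start a chat.
// A real client would wait for `connected` and `session_created` in between.
function openingMessages(token: string, firstPrompt: string): string[] {
  const messages: ClientMessage[] = [
    { type: "connect", token },
    { type: "new_session" },
    { type: "prompt", content: [{ type: "text", text: firstPrompt }] },
  ];
  return messages.map((message) => JSON.stringify(message));
}
```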

Server → Client Messages

Type	Description
`connected`	Connection confirmed
`error`	Error occurred
`session_created`	New session ready
`session_update`	Agent response chunks
`prompt_complete`	Agent finished responding
`permission_request`	Request user confirmation
`browser_tool_call`	Request browser tool execution
`model_state`	Available models and current selection
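
On the receiving side, a client might validate the type of each incoming frame and route it to a handler. This is a sketch only; parseServerMessage, dispatch, and the Handlers map are hypothetical names, and payload fields are left untyped because this page does not specify them.

```typescript
const SERVER_MESSAGE_TYPES = [
  "connected", "error", "session_created", "session_update",
  "prompt_complete", "permission_request", "browser_tool_call", "model_state",
] as const;

type ServerMessageType = (typeof SERVER_MESSAGE_TYPES)[number];

// Payload fields are not documented here, so they stay `unknown`.
interface ServerMessage {
  type: ServerMessageType;
  [key: string]: unknown;
}

// Returns null for invalid JSON, non-objects, or unrecognized types.
function parseServerMessage(raw: string): ServerMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const type = (value as { type?: unknown }).type;
  if (typeof type !== "string") return null;
  if (!(SERVER_MESSAGE_TYPES as readonly string[]).includes(type)) return null;
  return value as ServerMessage;
}

type Handlers = Partial<Record<ServerMessageType, (message: ServerMessage) => void>>;

// Routes one raw frame; returns false when nothing handled it.
function dispatch(raw: string, handlers: Handlers): boolean {
  const message = parseServerMessage(raw);
  if (message === null) return false;
  const handler = handlers[message.type];
  if (handler === undefined) return false;
  handler(message);
  return true;
}
```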

Session Update Types

A session_update message carries one of several update subtypes, distinguished by its sessionUpdate field:

type SessionUpdate =
  | { sessionUpdate: "user_message_chunk"; content: ContentBlock }
  | { sessionUpdate: "agent_message_chunk"; content: ContentBlock }
  | { sessionUpdate: "agent_thought_chunk"; content: ContentBlock }
  | { sessionUpdate: "tool_call"; toolCallId: string; title: string; status: string; ... }
  | { sessionUpdate: "tool_call_update"; toolCallId: string; ... };
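
A client could fold these updates into UI state with a small reducer. The sketch below is an assumption about how that might look: the Transcript shape, applyUpdate, the optional status field on tool_call_update, and the status strings used in examples are all hypothetical.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; data: string }
  | { type: "resource_link"; uri: string; mimeType?: string };

// Elided fields from the definition above are omitted; `status?` on
// tool_call_update is an assumption.
type SessionUpdate =
  | { sessionUpdate: "user_message_chunk"; content: ContentBlock }
  | { sessionUpdate: "agent_message_chunk"; content: ContentBlock }
  | { sessionUpdate: "agent_thought_chunk"; content: ContentBlock }
  | { sessionUpdate: "tool_call"; toolCallId: string; title: string; status: string }
  | { sessionUpdate: "tool_call_update"; toolCallId: string; status?: string };

interface Transcript {
  reply: string;                      // concatenated agent message text
  thoughts: string;                   // concatenated agent thought text
  toolStatus: Record<string, string>; // latest status per toolCallId
}

const emptyTranscript: Transcript = { reply: "", thoughts: "", toolStatus: {} };

// Non-text blocks contribute no text.
function textOf(block: ContentBlock): string {
  return block.type === "text" ? block.text : "";
}

// Pure reducer: returns a new Transcript for each incoming update.
function applyUpdate(state: Transcript, update: SessionUpdate): Transcript {
  switch (update.sessionUpdate) {
    case "agent_message_chunk":
      return { ...state, reply: state.reply + textOf(update.content) };
    case "agent_thought_chunk":
      return { ...state, thoughts: state.thoughts + textOf(update.content) };
    case "tool_call":
      return { ...state, toolStatus: { ...state.toolStatus, [update.toolCallId]: update.status } };
    case "tool_call_update":
      if (update.status === undefined) return state;
      return { ...state, toolStatus: { ...state.toolStatus, [update.toolCallId]: update.status } };
    case "user_message_chunk":
      return state; // the user's own text is not part of the agent reply
  }
}
```

Because chunks arrive incrementally, the reducer appends text rather than replacing it.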

Content Types

Messages support multiple content types:

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; data: string }  // base64
  | { type: "resource_link"; uri: string; mimeType?: string };
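
To show how the three variants can be handled exhaustively, here is a small sketch that summarizes a block for display. describeBlock and decodedSize are hypothetical helpers; the size calculation relies only on the standard base64 ratio of four characters per three bytes.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; data: string } // base64
  | { type: "resource_link"; uri: string; mimeType?: string };

// Number of bytes a padded base64 string decodes to.
function decodedSize(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return (base64.length * 3) / 4 - padding;
}

// One-line summary of a block; the switch covers every variant.
function describeBlock(block: ContentBlock): string {
  switch (block.type) {
    case "text":
      return `text (${block.text.length} chars)`;
    case "image":
      return `image ${block.mimeType} (${decodedSize(block.data)} bytes)`;
    case "resource_link":
      return `link ${block.uri}`;
  }
}
```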

Image Support

Agents can declare image support via promptCapabilities:

interface PromptCapabilities {
  audio?: boolean;
  image?: boolean;
  embeddedContext?: boolean;
}

The client checks promptCapabilities.image to enable/disable image attachments.
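
That check might look like the sketch below, which drops image blocks when the agent has not declared image support. The helper names canAttachImages and allowedBlocks are hypothetical.

```typescript
interface PromptCapabilities {
  audio?: boolean;
  image?: boolean;
  embeddedContext?: boolean;
}

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; data: string }
  | { type: "resource_link"; uri: string; mimeType?: string };

// Missing capabilities are treated as "not supported".
function canAttachImages(capabilities: PromptCapabilities | undefined): boolean {
  return capabilities?.image === true;
}

// Keeps only the blocks the agent has said it can accept.
function allowedBlocks(
  blocks: ContentBlock[],
  capabilities: PromptCapabilities | undefined,
): ContentBlock[] {
  const imagesOk = canAttachImages(capabilities);
  return blocks.filter((block) => block.type !== "image" || imagesOk);
}
```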

Permission System

Some operations require user confirmation. Each choice offered to the user is described by a PermissionOption:

interface PermissionOption {
  id: string;
  label: string;
  kind: "allow_once" | "allow_always" | "reject_once" | "reject_always";
}

Flow:

Agent requests operation → Proxy sends permission_request
UI shows permission buttons → User clicks one
Client sends permission_response with selected optionId
Proxy forwards decision → Agent continues or aborts
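
The client side of this flow can be sketched as follows. The optionId field comes from the flow above; the PermissionResponse interface name, the helper names, and the option ids in any example are hypothetical.

```typescript
interface PermissionOption {
  id: string;
  label: string;
  kind: "allow_once" | "allow_always" | "reject_once" | "reject_always";
}

interface PermissionResponse {
  type: "permission_response";
  optionId: string;
}

// Builds the response for the option the user clicked.
// Throws if the id does not match any offered option.
function buildPermissionResponse(options: PermissionOption[], chosenId: string): PermissionResponse {
  const chosen = options.find((option) => option.id === chosenId);
  if (chosen === undefined) {
    throw new Error(`Unknown permission option: ${chosenId}`);
  }
  return { type: "permission_response", optionId: chosen.id };
}

// True for the two "allow" kinds, useful for styling buttons.
function isAllow(option: PermissionOption): boolean {
  return option.kind === "allow_once" || option.kind === "allow_always";
}
```

Validating the chosen id against the offered options keeps a stale or mistyped id from reaching the agent.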

Model Selection

Agents can expose multiple models:

interface SessionModelState {
  availableModels: Array<{
    id: string;
    displayName?: string;
    provider?: string;
  }>;
  currentModelId?: string;
}

The UI shows a model selector popover. When the user switches models:

Client sends set_session_model with new modelId
Proxy forwards to agent
Agent updates session model
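
A minimal sketch of the client-side step, assuming the client validates the choice against availableModels before sending. requestModelSwitch and modelLabel are hypothetical helpers, and the model ids used in examples are placeholders.

```typescript
interface SessionModelState {
  availableModels: Array<{
    id: string;
    displayName?: string;
    provider?: string;
  }>;
  currentModelId?: string;
}

interface SetSessionModel {
  type: "set_session_model";
  modelId: string;
}

// Returns the message to send, or null when the model is unknown
// or already selected.
function requestModelSwitch(state: SessionModelState, modelId: string): SetSessionModel | null {
  if (state.currentModelId === modelId) return null;
  const known = state.availableModels.some((model) => model.id === modelId);
  return known ? { type: "set_session_model", modelId } : null;
}

// Label for the selector: displayName when present, else the id.
function modelLabel(model: { id: string; displayName?: string }): string {
  return model.displayName ?? model.id;
}
```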

File Explorer

The proxy server provides file system access for workspace browsing:

List Directory

// Request
{ type: "list_dir", path: "/some/path" }

// Response
{
  items: [
    { name: "src", type: "directory" },
    { name: "README.md", type: "file" }
  ]
}

Read File

// Request
{ type: "read_file", path: "/some/path/file.txt" }

// Response
{
  path: "/some/path/file.txt",
  content: "file contents...",
  mimeType: "text/plain"
}
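
The two request and response shapes can be exercised with a small handler over an in-memory stand-in for the file system. Everything here other than the message shapes is an assumption: the FakeFs type, handleFileRequest, the directories-first ordering, and the extension-based guessMimeType are illustrative only.

```typescript
type FileRequest =
  | { type: "list_dir"; path: string }
  | { type: "read_file"; path: string };

interface DirItem {
  name: string;
  type: "directory" | "file";
}

type FileResponse =
  | { items: DirItem[] }
  | { path: string; content: string; mimeType: string };

// In-memory stand-in: a path maps to a directory listing or to file text.
type FakeFs = Record<string, DirItem[] | string>;

// Very rough extension-based guess (assumption, not the proxy's logic).
function guessMimeType(path: string): string {
  if (path.endsWith(".txt")) return "text/plain";
  if (path.endsWith(".md")) return "text/markdown";
  if (path.endsWith(".json")) return "application/json";
  return "application/octet-stream";
}

function handleFileRequest(fs: FakeFs, request: FileRequest): FileResponse {
  const entry = fs[request.path];
  if (request.type === "list_dir") {
    if (!Array.isArray(entry)) throw new Error(`Not a directory: ${request.path}`);
    // Directories first, then alphabetical (an assumed presentation choice).
    const items = [...entry].sort((a, b) =>
      a.type === b.type ? a.name.localeCompare(b.name) : a.type === "directory" ? -1 : 1,
    );
    return { items };
  }
  if (typeof entry !== "string") throw new Error(`Not a file: ${request.path}`);
  return { path: request.path, content: entry, mimeType: guessMimeType(request.path) };
}
```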

MCP Integration

Browser tools are exposed to agents via MCP (Model Context Protocol):

Transport: Streamable HTTP at /mcp
Protocol Version: 2024-11-05

MCP Methods

Method	Description
`initialize`	Protocol handshake
`tools/list`	List available browser tools
`tools/call`	Execute a browser tool
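
MCP messages are JSON-RPC 2.0 requests, so the three methods above could be issued by an agent roughly as sketched here. createMcpRequestFactory and the client name and version passed to it are hypothetical; the params layout follows the MCP specification for the protocol version named above.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Hypothetical factory that numbers requests and fills in MCP params.
function createMcpRequestFactory(clientName: string, clientVersion: string) {
  let nextId = 1;
  const request = (method: string, params?: Record<string, unknown>): JsonRpcRequest => {
    const base: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
    return params === undefined ? base : { ...base, params };
  };
  return {
    // Protocol handshake.
    initialize: (): JsonRpcRequest =>
      request("initialize", {
        protocolVersion: "2024-11-05",
        capabilities: {},
        clientInfo: { name: clientName, version: clientVersion },
      }),
    // List available browser tools.
    listTools: (): JsonRpcRequest => request("tools/list"),
    // Execute a browser tool by name.
    callTool: (name: string, args: Record<string, unknown>): JsonRpcRequest =>
      request("tools/call", { name, arguments: args }),
  };
}
```

Each request would be sent as the body of an HTTP POST to the /mcp endpoint.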

Tool Definitions

const BROWSER_TOOLS = [
  {
    name: "browser_tabs",
    description: "List all open tabs...",
    inputSchema: { type: "object", properties: {} }
  },
  {
    name: "browser_read",
    description: "Read the content of a specific tab...",
    inputSchema: {
      type: "object",
      properties: { tabId: { type: "number" } },
      required: ["tabId"]
    }
  },
  {
    name: "browser_execute",
    description: "Execute JavaScript in a tab...",
    inputSchema: {
      type: "object",
      properties: {
        tabId: { type: "number" },
        script: { type: "string" }
      },
      required: ["tabId", "script"]
    }
  }
];
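
Given definitions like these, incoming tools/call arguments could be checked against each tool's inputSchema before being forwarded. The sketch below repeats the definitions so it is self-contained; validateToolCall and the ToolDefinition interface are hypothetical, and the check only covers required keys and primitive types.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required?: string[];
  };
}

const BROWSER_TOOLS: ToolDefinition[] = [
  { name: "browser_tabs", description: "List all open tabs...", inputSchema: { type: "object", properties: {} } },
  {
    name: "browser_read",
    description: "Read the content of a specific tab...",
    inputSchema: { type: "object", properties: { tabId: { type: "number" } }, required: ["tabId"] },
  },
  {
    name: "browser_execute",
    description: "Execute JavaScript in a tab...",
    inputSchema: {
      type: "object",
      properties: { tabId: { type: "number" }, script: { type: "string" } },
      required: ["tabId", "script"],
    },
  },
];

// Returns a list of problems; an empty list means the call looks valid.
function validateToolCall(name: string, args: Record<string, unknown>): string[] {
  const tool = BROWSER_TOOLS.find((candidate) => candidate.name === name);
  if (tool === undefined) return [`unknown tool: ${name}`];
  const problems: string[] = [];
  for (const key of tool.inputSchema.required ?? []) {
    if (!(key in args)) problems.push(`missing required argument: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const expected = tool.inputSchema.properties[key]?.type;
    // JSON Schema "number" and "string" line up with JavaScript's typeof.
    if (expected !== undefined && typeof value !== expected) {
      problems.push(`argument ${key} should be ${expected}`);
    }
  }
  return problems;
}
```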

MCP Flow

Agent                    Proxy                   Extension
  │                        │                         │
  │── tools/call ─────────►│                         │
  │   (browser_tabs)       │                         │
  │                        │── browser_tool_call ───►│
  │                        │                         │
  │                        │◄── browser_tool_result ─│
  │                        │    (tabs array)         │
  │◄── MCP response ───────│                         │
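
The proxy's role in this diagram is to hold the MCP call open until the extension replies. One way to correlate the two WebSocket messages is a map of pending calls, sketched below. The callId, name, args, and result field names and the createToolBridge helper are assumptions; this page does not document how calls are actually matched to results.

```typescript
interface BrowserToolCall {
  type: "browser_tool_call";
  callId: string; // assumed correlation id
  name: string;
  args: unknown;
}

interface BrowserToolResult {
  type: "browser_tool_result";
  callId: string; // echoes the id from the call
  result: unknown;
}

// `send` stands in for writing to the extension's WebSocket.
function createToolBridge(send: (message: BrowserToolCall) => void) {
  const pending = new Map<string, (result: unknown) => void>();
  let counter = 0;
  return {
    // Invoked when the agent issues tools/call; `onResult` completes the MCP response.
    call(name: string, args: unknown, onResult: (result: unknown) => void): string {
      counter += 1;
      const callId = `call-${counter}`;
      pending.set(callId, onResult);
      send({ type: "browser_tool_call", callId, name, args });
      return callId;
    },
    // Invoked when the extension sends browser_tool_result.
    handleResult(message: BrowserToolResult): boolean {
      const onResult = pending.get(message.callId);
      if (onResult === undefined) return false; // unknown or already completed
      pending.delete(message.callId);
      onResult(message.result);
      return true;
    },
    pendingCount(): number {
      return pending.size;
    },
  };
}
```

A production bridge would likely also need a timeout so that a disconnected extension does not leave the agent waiting indefinitely.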

Tech Stack

Component	Technology
Package Manager	Bun (workspaces)
UI Framework	React + TypeScript
Styling	Tailwind CSS
UI Components	shadcn/ui + Vercel AI Elements
Server	Hono (HTTP + WebSocket)
Build	Bun (extension), Vite (web-client)
