AI-friendly CLI to control the browser from the terminal.
browser-cli is a command-line tool that lets you control a Chrome browser directly from the terminal. It communicates with the AIPex Chrome extension through a local WebSocket daemon, giving you full programmatic access to browser tabs, page elements, screenshots, downloads, and more.
It is designed for two primary audiences:
- AI agents (Cursor, Claude Code, Codex, etc.) that need to perform browser tasks as part of coding workflows
- Developers and scripters who want to automate browser interactions from shell scripts or CI pipelines
```
browser-cli ──WebSocket──▶ aipex-daemon ──WebSocket──▶ AIPex Chrome Extension ──▶ Browser APIs
```

The daemon acts as a relay between CLI clients and the AIPex extension:

```
browser-cli #1 ──WS /cli────┐
browser-cli #2 ──WS /cli────┤── aipex-daemon (:9223) ──WS /extension──▶ AIPex Extension
MCP bridge     ──WS /bridge─┘
```

- The daemon auto-spawns on first CLI invocation and self-terminates after 30 seconds of inactivity.
- A PID file at `~/.aipex-daemon.pid` tracks the running daemon.
- A health endpoint is available at `http://localhost:9223/health`.
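
The PID file and health endpoint can be combined into a quick liveness check before a script starts issuing commands. The sketch below is illustrative only: the helper name `daemon_pid_alive` is hypothetical, and it assumes the PID file contains nothing but the numeric process ID, which the text above does not confirm.

```shell
# Sketch: decide whether the daemon recorded in a PID file is still alive.
# Assumption: the PID file holds a single numeric process ID on its first line.
daemon_pid_alive() {
  local pid_file="${1:-$HOME/.aipex-daemon.pid}"
  local pid=""

  # No readable PID file means no daemon has registered itself.
  [ -r "$pid_file" ] || return 1

  # Read the first line; tolerate a missing trailing newline.
  read -r pid < "$pid_file" || [ -n "$pid" ] || return 1

  # Reject anything that is not a plain non-negative integer.
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;
  esac

  # kill -0 sends no signal; it only checks that the process exists
  # and that the current user is allowed to signal it.
  kill -0 "$pid" 2>/dev/null
}
```

A script could then run `daemon_pid_alive && curl -fsS http://localhost:9223/health` to confirm that the process exists and that the daemon answers on its health endpoint.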
```shell
npm install -g browser-cli
```

Requirements:
- Node.js >= 18
- AIPex Chrome extension installed (Chrome Web Store or developer build)
```shell
# 1. Install
npm install -g browser-cli
# 2. Connect the AIPex extension to the daemon
#    Open Chrome → AIPex extension → Options → WebSocket URL: ws://localhost:9223/extension → Connect
# 3. Start using it
browser-cli tab list
browser-cli tab new https://example.com
browser-cli page search "button*" --tab 123
```

After installing browser-cli, connect the AIPex Chrome extension to the daemon:
- Open Chrome and click the AIPex extension icon
- Go to Options (or right-click the icon → "Extension options")
- Find the WebSocket Connection section
- Enter `ws://localhost:9223/extension`
- Click Connect
Verify the connection:
```shell
browser-cli status
```

A successful response looks like:

```json
{
  "ok": true,
  "data": {
    "status": "ok",
    "extensionConnected": true,
    "bridgeClients": 0,
    "version": "3.1.0"
  }
}
```

browser-cli organizes commands into groups. Run `browser-cli --help` for an overview, or `browser-cli <group> --help` for group-level help.

```
browser-cli <group> <command> [args] [--options]
```
Commands in the `tab` group:

| Command | Description | Example |
|---|---|---|
| `list` | List all open tabs with IDs, titles, and URLs | `browser-cli tab list` |
| `current` | Get the currently active tab | `browser-cli tab current` |
| `switch <id>` | Switch to a specific tab by ID | `browser-cli tab switch 42` |
| `new <url>` | Open a new tab with the given URL | `browser-cli tab new https://google.com` |
| `close <id>` | Close a specific tab by ID | `browser-cli tab close 42` |
| `info <id>` | Get detailed info about a specific tab | `browser-cli tab info 42` |
| `organize` | Auto-group tabs by topic using AI | `browser-cli tab organize` |
| `ungroup` | Remove all tab groups in the current window | `browser-cli tab ungroup` |

Commands in the `page` group:

| Command | Description | Example |
|---|---|---|
| `search <query> --tab <id>` | Search elements using glob/grep patterns | `browser-cli page search "button*" --tab 123` |
| `screenshot` | Capture a screenshot of the visible tab | `browser-cli page screenshot` |
| `screenshot-tab <id>` | Capture a screenshot of a specific tab | `browser-cli page screenshot-tab 123` |
| `metadata` | Get page metadata (title, description, etc.) | `browser-cli page metadata --tab 123` |
| `scroll-to <selector>` | Scroll to a DOM element by CSS selector | `browser-cli page scroll-to "#main"` |
| `highlight <selector>` | Highlight a DOM element with a drop shadow | `browser-cli page highlight "button.submit"` |
| `highlight-text <selector> <text>` | Highlight specific words within text | `browser-cli page highlight-text "p" "important"` |

Commands in the `interact` group:

| Command | Description | Example |
|---|---|---|
| `click <uid> --tab <id>` | Click an element by UID | `browser-cli interact click btn-42 --tab 123` |
| `fill <uid> <value> --tab <id>` | Fill an input element by UID | `browser-cli interact fill input-5 "hello" --tab 123` |
| `hover <uid> --tab <id>` | Hover over an element by UID | `browser-cli interact hover menu-3 --tab 123` |
| `form --tab <id> --elements <json>` | Fill multiple form elements at once | `browser-cli interact form --tab 123 --elements '[...]'` |
| `editor <uid> --tab <id>` | Get content from a code editor or textarea | `browser-cli interact editor editor-1 --tab 123` |
| `upload --tab <id> --file-path <path>` | Upload a file to a file input element | `browser-cli interact upload --tab 123 --file-path /path/to/file.pdf` |
| `computer --action <action>` | Coordinate-based mouse/keyboard interaction | `browser-cli interact computer --action left_click --coordinate "[500,300]"` |

Commands in the `download` group:

| Command | Description | Example |
|---|---|---|
| `markdown --text <content>` | Download text content as a markdown file | `browser-cli download markdown --text "# Hello"` |
| `image --data <base64>` | Download an image from base64 data | `browser-cli download image --data "data:image/png;base64,..."` |
| `chat-images --messages <json>` | Download multiple images from chat messages | `browser-cli download chat-images --messages '[...]'` |

Commands in the `intervention` group:

| Command | Description | Example |
|---|---|---|
| `list` | List all available intervention types | `browser-cli intervention list` |
| `info <type>` | Get detailed info about an intervention type | `browser-cli intervention info voice-input` |
| `request <type>` | Request human intervention | `browser-cli intervention request voice-input --reason "Need confirmation"` |
| `cancel` | Cancel the currently active intervention | `browser-cli intervention cancel` |

Commands in the `skill` group:

| Command | Description | Example |
|---|---|---|
| `list` | List all available skills | `browser-cli skill list` |
| `load <name>` | Load the main content (SKILL.md) of a skill | `browser-cli skill load my-skill` |
| `info <name>` | Get detailed info about a skill | `browser-cli skill info my-skill` |
| `run <skill> <script>` | Execute a script belonging to a skill | `browser-cli skill run my-skill scripts/init.js` |
| `ref <skill> <path>` | Read a reference document from a skill | `browser-cli skill ref my-skill references/guide.md` |
| `asset <skill> <path>` | Get an asset file from a skill | `browser-cli skill asset my-skill assets/icon.png` |

Top-level commands:

| Command | Description |
|---|---|
| `status` | Check daemon and extension connection status |
| `update` | Update browser-cli to the latest version |

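
Because `status` reports whether the extension is connected, a script can use it as a gate before running other commands. The following is a minimal sketch, not part of browser-cli: the helper name `extension_connected` is hypothetical, it assumes the status JSON is written to standard output in the shape of the sample response, and it uses a plain-text match rather than a JSON parser so that it needs nothing beyond `grep`.

```shell
# Sketch: succeed only when a status payload reports a connected extension.
# Assumption: the payload contains "extensionConnected": true, as in the
# sample response. This is a text match, not a full JSON parse.
extension_connected() {
  # Reads the JSON payload from stdin.
  grep -Eq '"extensionConnected"[[:space:]]*:[[:space:]]*true'
}
```

Under those assumptions, a script might start with `browser-cli status | extension_connected || { echo "AIPex extension is not connected" >&2; exit 1; }`. Where `jq` is available, testing the `.data.extensionConnected` field directly would be the more robust choice.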
Global options:

| Option | Description | Default |
|---|---|---|
| `--port <n>` | Daemon WebSocket port | 9223 |
| `--host <h>` | Daemon host address | 127.0.0.1 |
| `--help`, `-h` | Show help | |
| `--version`, `-v` | Show version | |

```shell
# Get all tabs
browser-cli tab list
# Search for interactive elements on a tab
browser-cli page search "{button,input,textarea,select,a}*" --tab 123
# Click an element found via search
browser-cli interact click btn-42 --tab 123
```

```shell
# Find form inputs
browser-cli page search "{input,textbox}*" --tab 123
# Fill email and password
browser-cli interact fill input-email "user@example.com" --tab 123
browser-cli interact fill input-pass "mypassword" --tab 123
# Click the login button
browser-cli page search "*[Ll]ogin*" --tab 123
browser-cli interact click btn-login --tab 123
```

```shell
# Screenshot the active tab
browser-cli page screenshot
# Screenshot a specific tab with LLM analysis
browser-cli page screenshot-tab 123 --send-to-llm true
```

```shell
#!/bin/bash
# Open a page and wait for it to load
browser-cli tab new https://example.com
sleep 2
# Get the current tab ID (-r prints the raw value without JSON quoting)
TAB_INFO=$(browser-cli tab current)
TAB_ID=$(echo "$TAB_INFO" | jq -r '.data.id')
# Search and interact
browser-cli page search "link*" --tab "$TAB_ID"
browser-cli page screenshot
```

```shell
# Fill multiple form elements at once
browser-cli interact form --tab 123 --elements '[
  {"uid": "input-name", "value": "John Doe"},
  {"uid": "input-email", "value": "john@example.com"},
  {"uid": "select-country", "value": "US"}
]'
```

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `BROWSER_CLI_WS_URL` | `ws://127.0.0.1:9223/cli` | Override the daemon WebSocket URL |
| `BROWSER_CLI_CONNECT_TIMEOUT` | `60000` | Max time (ms) to wait for daemon + extension connection |

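
As an illustration of the two environment variables, the sketch below builds a CLI endpoint URL for a non-default port and raises the connection timeout. The helper name `cli_ws_url` is hypothetical; the `/cli` path, default host, and default port are taken from the default value listed above. Whether a daemon is already listening on the chosen port is outside the scope of this sketch.

```shell
# Sketch: build the CLI endpoint URL for a daemon on a given host and port.
# The helper is illustrative; the defaults mirror the documented default URL.
cli_ws_url() {
  local host="${1:-127.0.0.1}"
  local port="${2:-9223}"
  printf 'ws://%s:%s/cli\n' "$host" "$port"
}

# Point the CLI at port 9224 and allow two minutes for the connection.
export BROWSER_CLI_WS_URL="$(cli_ws_url 127.0.0.1 9224)"
export BROWSER_CLI_CONNECT_TIMEOUT=120000
```

Commands run later in the same shell session, such as `browser-cli status`, would then pick up both values from the environment.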
- Daemon auto-spawn: When you run any `browser-cli` command, the CLI checks whether a daemon is already running. If not, it forks a detached daemon process in the background.
- Retry with backoff: If the daemon or extension is not yet ready, the CLI retries with exponential backoff (500 ms initial delay, capped at 5 s) until the connection is established or the timeout is reached.
- WebSocket relay: The daemon listens on port 9223 with three endpoints:
  - `/extension`: the AIPex Chrome extension connects here
  - `/bridge`: MCP bridge instances connect here
  - `/cli`: CLI tool calls connect here
- Tool call flow: CLI commands are translated to JSON-RPC tool calls, sent to the daemon via WebSocket, relayed to the extension, and executed in the browser. Results return through the same chain.
- Idle auto-shutdown: The daemon self-terminates after 30 seconds with no active connections (no extension, no bridge clients), keeping your system clean.
- Version auto-check: On each invocation, the CLI checks npm for newer versions in the background (cached for 24 hours). If an update is available, a notice is shown after the command completes.
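
The retry schedule described in the list above can be sketched in a few lines of shell. This is not the CLI's actual code: the function names are hypothetical, and the doubling factor is an assumption, since the text only gives the initial delay (500 ms), the cap (5 s), and the overall timeout.

```shell
# Sketch of an exponential backoff schedule: 500 ms initial, 5000 ms cap.
# Assumption: the delay doubles after each attempt.
next_delay_ms() {
  local current="$1"
  local max="${2:-5000}"
  local next="$(( current * 2 ))"
  if [ "$next" -gt "$max" ]; then
    next="$max"
  fi
  printf '%s\n' "$next"
}

# Print, one per line, the delays that fit within a total time budget (ms).
backoff_schedule() {
  local budget="${1:-60000}"
  local delay="${2:-500}"
  local elapsed=0
  while [ "$(( elapsed + delay ))" -le "$budget" ]; do
    printf '%s\n' "$delay"
    elapsed="$(( elapsed + delay ))"
    delay="$(next_delay_ms "$delay")"
  done
}
```

Running `backoff_schedule 60000` lists the waits that would fit inside the default `BROWSER_CLI_CONNECT_TIMEOUT`. Capping the delay keeps the CLI responsive once the extension connects, while the early short delays cover the common case where the daemon has just been spawned.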
browser-cli can be used as a skill in AI agent platforms. A ready-to-use SKILL.md is included in this repository.
For AI agents that support skill installation (Cursor, Claude Code, Codex, etc.):
```shell
# The agent can invoke browser-cli commands directly via shell
browser-cli tab list
browser-cli page search "button*" --tab 123
browser-cli interact click btn-42 --tab 123
```

The recommended agent workflow:
- `browser-cli tab list`: discover open tabs
- `browser-cli page search "<pattern>" --tab <id>`: find elements on the page
- `browser-cli interact click <uid> --tab <id>`: interact with found elements
- `browser-cli page screenshot`: verify the result visually
See SKILL.md for the full skill definition with trigger phrases and detailed usage instructions.
| Symptom | Likely Cause | Fix |
|---|---|---|
| `Daemon not running` | Daemon hasn't started yet | Run any command and the daemon auto-spawns, or run `browser-cli status` to check. |
| `Extension is not connected` | AIPex extension not connected to daemon | Open AIPex Options → set WebSocket URL to `ws://localhost:9223/extension` → Connect |
| Port 9223 already in use | Port conflict | Use `--port 9224` and update the extension WebSocket URL accordingly |
| `Timed out after 60s` | Extension not connected within timeout | Check that the AIPex extension is installed and connected. Increase the timeout with `BROWSER_CLI_CONNECT_TIMEOUT=120000` |
| `search_elements` returns 0 results | Page uses canvas or non-semantic HTML | Try different query patterns. Fall back to `page screenshot --send-to-llm true` + `interact computer` |
| Connection drops | Extension service worker sleep | AIPex uses keepalive pings. Reconnect from extension Options if needed |
| Commands work but results are empty | Tab ID is wrong | Run `browser-cli tab list` to get the correct tab IDs |
| `update` command fails | npm permissions | Try `sudo npm i -g browser-cli@latest` or use a Node version manager |