Version: dev

agent-browser

Overview

agent-browser is a built-in browser automation skill for deterministic, agent-friendly web interaction.

Unlike screenshot-first browser flows, it relies on accessibility-tree snapshots and ref-based element selection.

Repo path

skills/agent-browser/
└── SKILL.md

When to use it

multi-step browser workflows
complex single-page applications
deterministic element targeting
isolated sessions for repeated automation

Core workflow

Open the target page.
Capture a snapshot with interactive refs.
Read the returned JSON structure.
Interact with elements using refs such as @e2.
Re-snapshot after navigation or DOM changes.

Typical commands

agent-browser open https://example.com
agent-browser snapshot -i --json
agent-browser click @e2
agent-browser fill @e3 "text"
agent-browser wait --load networkidle

What this skill documents

Because this skill is CLI-driven, its value is primarily in SKILL.md:

navigation patterns
snapshot strategy
ref-based interactions
wait patterns
multi-session usage
state save and restore

Why it matters

This is the built-in skill to use when an agent needs reliable web automation without depending on fragile visual selectors.

agent-browser

Overview​

Repo path​

When to use it​

Core workflow​

Typical commands​

What this skill documents​

Why it matters​