agent-browser
Overview
agent-browser is a built-in browser automation skill for deterministic, agent-friendly web interaction.
Unlike screenshot-first browser flows, it relies on accessibility-tree snapshots and ref-based element selection.
Repo path
skills/agent-browser/
└── SKILL.md
When to use it
- multi-step browser workflows
- complex single-page applications
- deterministic element targeting
- isolated sessions for repeated automation
Core workflow
- Open the target page.
- Capture a snapshot with interactive refs.
- Read the returned JSON structure to find the refs of the elements you need.
- Interact with elements using refs such as @e2.
- Re-snapshot after navigation or DOM changes.
Typical commands
agent-browser open https://example.com
agent-browser snapshot -i --json
agent-browser click @e2
agent-browser fill @e3 "text"
agent-browser wait --load networkidle
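The commands above can be chained into a single script that follows the core workflow. This is a minimal sketch, not taken from SKILL.md: the `run_flow` function and the `AB` override variable are hypothetical names introduced here for illustration, and the refs `@e2` and `@e3` are placeholders, since real refs come from the snapshot output for the page in question.

```shell
#!/bin/sh
set -eu

# AB is the command to invoke. Overriding it (for example AB=echo) gives a dry run.
# This override is an assumption of the sketch, not a feature of the CLI.
AB="${AB:-agent-browser}"

# run_flow URL: open a page, snapshot it, interact, then snapshot again.
# @e2 and @e3 are placeholder refs; read the real ones from the snapshot JSON.
run_flow() {
  "$AB" open "$1"                  # open the target page
  "$AB" snapshot -i --json         # capture a snapshot with interactive refs
  "$AB" fill @e3 "text"            # interact using refs from that snapshot
  "$AB" click @e2
  "$AB" wait --load networkidle    # wait for the page to settle
  "$AB" snapshot -i --json         # re-snapshot after navigation or DOM changes
}
```

Calling `run_flow https://example.com` runs the sequence against a real page. Running it with `AB=echo` prints the commands instead of executing them, which is a cheap way to check the order before automating anything.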
What this skill documents
Because this skill is CLI-driven, most of its value lies in the guidance in SKILL.md, which covers:
- navigation patterns
- snapshot strategy
- ref-based interactions
- wait patterns
- multi-session usage
- state save and restore
Why it matters
This is the built-in skill to use when an agent needs reliable web automation without depending on fragile visual selectors.