220 lines
5.3 KiB
Markdown
220 lines
5.3 KiB
Markdown
# Snapshot and Refs
|
|
|
|
Compact element references that reduce context usage dramatically for AI agents.
|
|
|
|
**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.
|
|
|
|
## Contents
|
|
|
|
- [How Refs Work](#how-refs-work)
|
|
- [Snapshot Command](#the-snapshot-command)
|
|
- [Using Refs](#using-refs)
|
|
- [Ref Lifecycle](#ref-lifecycle)
|
|
- [Best Practices](#best-practices)
|
|
- [Ref Notation Details](#ref-notation-details)
|
|
- [Troubleshooting](#troubleshooting)
|
|
|
|
## How Refs Work
|
|
|
|
Traditional approach:
|
|
```
|
|
Full DOM/HTML → AI parses → CSS selector → Action (~3000-5000 tokens)
|
|
```
|
|
|
|
agent-browser approach:
|
|
```
|
|
Compact snapshot → @refs assigned → Direct interaction (~200-400 tokens)
|
|
```
|
|
|
|
## The Snapshot Command
|
|
|
|
```bash
|
|
# Basic snapshot (shows page structure)
|
|
agent-browser snapshot
|
|
|
|
# Interactive snapshot (-i flag) - RECOMMENDED
|
|
agent-browser snapshot -i
|
|
```
|
|
|
|
### Snapshot Output Format
|
|
|
|
```
|
|
Page: Example Site - Home
|
|
URL: https://example.com
|
|
|
|
@e1 [header]
|
|
@e2 [nav]
|
|
@e3 [a] "Home"
|
|
@e4 [a] "Products"
|
|
@e5 [a] "About"
|
|
@e6 [button] "Sign In"
|
|
|
|
@e7 [main]
|
|
@e8 [h1] "Welcome"
|
|
@e9 [form]
|
|
@e10 [input type="email"] placeholder="Email"
|
|
@e11 [input type="password"] placeholder="Password"
|
|
@e12 [button type="submit"] "Log In"
|
|
|
|
@e13 [footer]
|
|
@e14 [a] "Privacy Policy"
|
|
```
|
|
|
|
## Using Refs
|
|
|
|
Once you have refs, interact directly:
|
|
|
|
```bash
|
|
# Click the "Sign In" button
|
|
agent-browser click @e6
|
|
|
|
# Fill email input
|
|
agent-browser fill @e10 "user@example.com"
|
|
|
|
# Fill password
|
|
agent-browser fill @e11 "password123"
|
|
|
|
# Submit the form
|
|
agent-browser click @e12
|
|
```
|
|
|
|
## Ref Lifecycle
|
|
|
|
**IMPORTANT**: Refs are invalidated when the page changes!
|
|
|
|
```bash
|
|
# Get initial snapshot
|
|
agent-browser snapshot -i
|
|
# @e1 [button] "Next"
|
|
|
|
# Click triggers page change
|
|
agent-browser click @e1
|
|
|
|
# MUST re-snapshot to get new refs!
|
|
agent-browser snapshot -i
|
|
# @e1 [h1] "Page 2" ← Different element now!
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
### 1. Always Snapshot Before Interacting
|
|
|
|
```bash
|
|
# CORRECT
|
|
agent-browser open https://example.com
|
|
agent-browser snapshot -i # Get refs first
|
|
agent-browser click @e1 # Use ref
|
|
|
|
# WRONG
|
|
agent-browser open https://example.com
|
|
agent-browser click @e1 # Ref doesn't exist yet!
|
|
```
|
|
|
|
### 2. Re-Snapshot After Navigation
|
|
|
|
```bash
|
|
agent-browser click @e5 # Navigates to new page
|
|
agent-browser snapshot -i # Get new refs
|
|
agent-browser click @e1 # Use new refs
|
|
```
|
|
|
|
### 3. Re-Snapshot After Dynamic Changes
|
|
|
|
```bash
|
|
agent-browser click @e1 # Opens dropdown
|
|
agent-browser snapshot -i # See dropdown items
|
|
agent-browser click @e7 # Select item
|
|
```
|
|
|
|
### 4. Snapshot Specific Regions
|
|
|
|
For complex pages, snapshot specific areas:
|
|
|
|
```bash
|
|
# Snapshot just the form
|
|
agent-browser snapshot @e9
|
|
```
|
|
|
|
## Ref Notation Details
|
|
|
|
```
|
|
@e1 [tag type="value"] "text content" placeholder="hint"
|
|
│ │ │ │ │
|
|
│ │ │ │ └─ Additional attributes
|
|
│ │ │ └─ Visible text
|
|
│ │ └─ Key attributes shown
|
|
│ └─ HTML tag name
|
|
└─ Unique ref ID
|
|
```
|
|
|
|
### Common Patterns
|
|
|
|
```
|
|
@e1 [button] "Submit" # Button with text
|
|
@e2 [input type="email"] # Email input
|
|
@e3 [input type="password"] # Password input
|
|
@e4 [a href="/page"] "Link Text" # Anchor link
|
|
@e5 [select] # Dropdown
|
|
@e6 [textarea] placeholder="Message" # Text area
|
|
@e7 [div class="modal"] # Container (when relevant)
|
|
@e8 [img alt="Logo"] # Image
|
|
@e9 [checkbox] checked # Checked checkbox
|
|
@e10 [radio] selected # Selected radio
|
|
```
|
|
|
|
## Iframes
|
|
|
|
Snapshots automatically detect and inline iframe content. When the main-frame snapshot runs, each `Iframe` node is resolved and its child accessibility tree is included directly beneath it in the output. Refs assigned to elements inside iframes carry frame context, so interactions like `click`, `fill`, and `type` work without manually switching frames.
|
|
|
|
```bash
|
|
agent-browser snapshot -i
|
|
# @e1 [heading] "Checkout"
|
|
# @e2 [Iframe] "payment-frame"
|
|
# @e3 [input] "Card number"
|
|
# @e4 [input] "Expiry"
|
|
# @e5 [button] "Pay"
|
|
# @e6 [button] "Cancel"
|
|
|
|
# Interact with iframe elements directly using their refs
|
|
agent-browser fill @e3 "4111111111111111"
|
|
agent-browser fill @e4 "12/28"
|
|
agent-browser click @e5
|
|
```
|
|
|
|
**Key details:**
|
|
- Only one level of iframe nesting is expanded (iframes within iframes are not recursed)
|
|
- Cross-origin iframes that block accessibility tree access are silently skipped
|
|
- Empty iframes or iframes with no interactive content are omitted from the output
|
|
- To scope a snapshot to a single iframe, use `frame @ref` then `snapshot -i`
|
|
|
|
## Troubleshooting
|
|
|
|
### "Ref not found" Error
|
|
|
|
```bash
|
|
# Ref may have changed - re-snapshot
|
|
agent-browser snapshot -i
|
|
```
|
|
|
|
### Element Not Visible in Snapshot
|
|
|
|
```bash
|
|
# Scroll down to reveal element
|
|
agent-browser scroll down 1000
|
|
agent-browser snapshot -i
|
|
|
|
# Or wait for dynamic content
|
|
agent-browser wait 1000
|
|
agent-browser snapshot -i
|
|
```
|
|
|
|
### Too Many Elements
|
|
|
|
```bash
|
|
# Snapshot specific container
|
|
agent-browser snapshot @e5
|
|
|
|
# Or use get text for content-only extraction
|
|
agent-browser get text @e5
|
|
```
|