Agent Integration Guide

Overview

This guide explains how an LLM-based agent uses UIAP to understand and interact with a web application. UIAP provides the agent with structured context instead of raw HTML or screenshots.

What the Agent Receives

1. Capabilities

After session initialization, the agent receives a Capability Document describing:

Available UI roles and states
Registered actions with risk levels
Success signal types
Supported execution modes
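
As a rough illustration of the four items above, a Capability Document could be modeled as follows. This is a sketch only: the field names (`roles`, `states`, `actions`, `signalKinds`, `executionModes`), the `RiskLevel` values, and the `findAction` helper are assumptions made for this example and are not taken from the UIAP specification.

```typescript
// Hypothetical shape of a Capability Document. All field names and the
// risk-level values are assumptions for illustration.
type RiskLevel = "low" | "medium" | "high";
type ExecutionMode = "appAction" | "semanticDom" | "browserInput" | "webdriver" | "vision";

interface RegisteredAction {
  id: string; // e.g. "video.create"
  risk: RiskLevel;
}

interface CapabilityDocument {
  roles: string[]; // available UI roles
  states: string[]; // available UI states
  actions: RegisteredAction[]; // registered actions with risk levels
  signalKinds: string[]; // success signal types
  executionModes: ExecutionMode[]; // supported execution modes
}

// Look up a registered action so the agent can inspect its risk before planning.
function findAction(caps: CapabilityDocument, actionId: string): RegisteredAction | undefined {
  return caps.actions.find((action) => action.id === actionId);
}

// Example document with made-up values.
const exampleCaps: CapabilityDocument = {
  roles: ["button", "list"],
  states: ["enabled", "visible"],
  actions: [{ id: "video.create", risk: "low" }],
  signalKinds: ["route.changed", "toast.contains"],
  executionModes: ["appAction", "semanticDom"],
};

const createAction = findAction(exampleCaps, "video.create");
```

An agent would typically read this document once per session and consult it while planning, for example to avoid proposing an action that is not registered.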

2. Page Snapshots

The PageGraph is a structured representation of the current page state:

{
  "route": { "routeId": "videos.list", "url": "/videos" },
  "documents": [{
    "scopes": [{
      "id": "video.list",
      "kind": "list",
      "elements": [
        {
          "stableId": "video.new",
          "role": "button",
          "name": "New Video",
          "affordances": ["activatable"],
          "state": { "enabled": true, "visible": true },
          "defaultAction": "video.create"
        }
      ]
    }]
  }]
}
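
The snapshot above can be consumed with a few types and a filter. The sketch below mirrors only the fields visible in the example; the type names, the `actionableElements` helper, and the extra disabled element `video.delete` are assumptions added for illustration.

```typescript
// Types mirroring the fields shown in the example snapshot.
interface UiElement {
  stableId: string;
  role: string;
  name: string;
  affordances: string[];
  state: { enabled: boolean; visible: boolean };
  defaultAction?: string;
}

interface Scope {
  id: string;
  kind: string;
  elements: UiElement[];
}

interface PageGraph {
  route: { routeId: string; url: string };
  documents: { scopes: Scope[] }[];
}

// Collect every element the agent could activate right now:
// enabled, visible, and carrying the "activatable" affordance.
function actionableElements(graph: PageGraph): UiElement[] {
  const result: UiElement[] = [];
  for (const doc of graph.documents) {
    for (const scope of doc.scopes) {
      for (const el of scope.elements) {
        const activatable = el.affordances.indexOf("activatable") !== -1;
        if (el.state.enabled && el.state.visible && activatable) {
          result.push(el);
        }
      }
    }
  }
  return result;
}

const samplePage: PageGraph = {
  route: { routeId: "videos.list", url: "/videos" },
  documents: [{
    scopes: [{
      id: "video.list",
      kind: "list",
      elements: [
        {
          stableId: "video.new", role: "button", name: "New Video",
          affordances: ["activatable"],
          state: { enabled: true, visible: true },
          defaultAction: "video.create",
        },
        {
          // Hypothetical disabled element, added to show the filter at work.
          stableId: "video.delete", role: "button", name: "Delete",
          affordances: ["activatable"],
          state: { enabled: false, visible: true },
        },
      ],
    }],
  }],
};

const actionableIds = actionableElements(samplePage).map((el) => el.stableId);
```

Filtering on `state` and `affordances` lets the agent reason over a short list of candidates rather than the whole page.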

3. Deltas

After each action or state change, the agent receives incremental updates (deltas) instead of full snapshots, and applies them to its last known page state.
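
The delta wire format is not shown in this guide, so the following is only a sketch under an assumed shape: a delta carries partial state updates keyed by `stableId` plus a list of removed ids. The `ElementDelta` type and the `applyDelta` function are hypothetical.

```typescript
interface ElementState {
  enabled: boolean;
  visible: boolean;
}

// Assumed delta shape; the real UIAP delta format may differ.
interface ElementDelta {
  updated: { stableId: string; state: Partial<ElementState> }[];
  removed: string[];
}

// Return a new state map with the delta applied; the input map is not mutated.
function applyDelta(
  current: Record<string, ElementState>,
  delta: ElementDelta
): Record<string, ElementState> {
  const next: Record<string, ElementState> = { ...current };
  for (const id of delta.removed) {
    delete next[id];
  }
  for (const update of delta.updated) {
    // Assumption: an update for an unknown element starts from a hidden, disabled state.
    const previous = next[update.stableId] ?? { enabled: false, visible: false };
    next[update.stableId] = { ...previous, ...update.state };
  }
  return next;
}

const beforeDelta: Record<string, ElementState> = {
  "video.new": { enabled: true, visible: true },
};
const afterDelta = applyDelta(beforeDelta, {
  updated: [{ stableId: "video.new", state: { enabled: false } }],
  removed: [],
});
```

Keeping the function pure makes it easy to retain the previous state for comparison or rollback.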

Agent Loop

A typical agent loop follows this pattern:

1. Receive snapshot/delta
2. Understand current state
3. Plan next action based on goal
4. Send action.request
5. Wait for action.result
6. Verify success via signals
7. Repeat or complete
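
The steps above can be sketched as a small loop. Everything here is illustrative: `AgentSession`, `Planner`, `runAgentLoop`, and the `videos.detail` route id are hypothetical names, and the session is synchronous for brevity, whereas a real transport would most likely be asynchronous.

```typescript
// All names below are assumptions for illustration, not the UIAP API.
interface PageState { routeId: string }
interface ActionRequest { actionId: string }
interface ActionResult { ok: boolean; signalsMatched: boolean }

interface AgentSession {
  // Steps 1-2: latest page state (snapshot with deltas applied).
  receiveState(): PageState;
  // Steps 4-5: send action.request and wait for action.result.
  sendAction(request: ActionRequest): ActionResult;
}

// Step 3: returns the next action, or null once the goal is reached.
type Planner = (state: PageState) => ActionRequest | null;

type LoopOutcome = "completed" | "failed" | "step-limit";

function runAgentLoop(session: AgentSession, plan: Planner, maxSteps: number = 10): LoopOutcome {
  for (let step = 0; step < maxSteps; step++) {
    const state = session.receiveState();
    const request = plan(state);
    if (request === null) {
      return "completed"; // step 7: goal reached
    }
    const result = session.sendAction(request);
    // Step 6: treat a failed action or unmatched success signals as failure.
    if (!result.ok || !result.signalsMatched) {
      return "failed";
    }
  }
  return "step-limit"; // guard against looping forever
}

// Fake in-memory session used only to demonstrate the loop.
const sentActions: string[] = [];
let currentRoute = "videos.list";
const fakeSession: AgentSession = {
  receiveState: () => ({ routeId: currentRoute }),
  sendAction: (request) => {
    sentActions.push(request.actionId);
    if (request.actionId === "video.create") {
      currentRoute = "videos.detail";
    }
    return { ok: true, signalsMatched: currentRoute === "videos.detail" };
  },
};

const outcome = runAgentLoop(fakeSession, (state) =>
  state.routeId === "videos.detail" ? null : { actionId: "video.create" }
);
```

The step limit is a design choice for this sketch: it bounds the loop when the planner never reaches its goal.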

Execution Modes

UIAP supports multiple execution strategies, in order of preference:

appAction — The app executes its own business logic directly
semanticDom — The SDK interacts with DOM elements by semantic identity
browserInput — Low-level input simulation (click, type)
webdriver — External browser automation (fallback)
vision — Screenshot-based interaction (last resort)

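The agent should use the most preferred mode that the session supports and fall back to the next one only when a higher-level mode is unavailable. UIAP’s design principle: DOM-first, vision-second, computer-use-last-resort.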
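
One way to apply this preference order is to pick the first listed mode that the session reports as supported. The `MODE_PREFERENCE` constant and the `pickExecutionMode` helper are hypothetical names for this sketch; only the mode names and their order come from the list above.

```typescript
// Modes in the order of preference given above.
const MODE_PREFERENCE = ["appAction", "semanticDom", "browserInput", "webdriver", "vision"] as const;
type ExecutionMode = (typeof MODE_PREFERENCE)[number];

// Return the most preferred mode that is supported, or undefined if none is.
function pickExecutionMode(supported: readonly ExecutionMode[]): ExecutionMode | undefined {
  return MODE_PREFERENCE.find((mode) => supported.indexOf(mode) !== -1);
}

// The order of the input does not matter; the preference list decides.
const chosenMode = pickExecutionMode(["vision", "semanticDom"]);
```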

Policy Awareness

Before executing any action, the agent must check the policy response for that action and act on it:

allow — Proceed
confirm — Request user confirmation first
deny — Action is not permitted
handoff — Human must perform this step manually
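
The four responses map naturally onto a switch. In this sketch the `NextStep` values, the `resolvePolicy` function, and the `askUser` callback are assumptions made for illustration; only the four policy values come from the list above.

```typescript
type PolicyDecision = "allow" | "confirm" | "deny" | "handoff";

// Hypothetical outcomes the agent loop could act on.
type NextStep = "execute" | "skip" | "wait-for-human";

// askUser is a placeholder for whatever confirmation UI the host provides.
function resolvePolicy(decision: PolicyDecision, askUser: () => boolean): NextStep {
  switch (decision) {
    case "allow":
      return "execute";
    case "confirm":
      // Execute only if the user explicitly agrees.
      return askUser() ? "execute" : "skip";
    case "deny":
      return "skip";
    case "handoff":
      // The agent must not perform this step itself.
      return "wait-for-human";
  }
}

const declinedStep = resolvePolicy("confirm", () => false);
```

Because the switch covers every member of `PolicyDecision`, the TypeScript compiler flags the function if a new policy value is added to the type without a matching case.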

Success Verification

After executing an action, the agent verifies success through signals:

success: [
  { kind: "route.changed", pattern: "/videos/:id" },
  { kind: "toast.contains", text: "created" }
]

Because the expected signals are declared up front, success is checked deterministically against explicit conditions, not just “it looked like it worked.”
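
A minimal sketch of such a check, covering only the two signal kinds shown above. Several things are assumptions here: the `Observed` shape, the helper names, the rule that every listed signal must match, and the reading of `:id` as "exactly one path segment". The sketch also tests only the current URL against the pattern and does not verify that the route actually changed.

```typescript
type SuccessSignal =
  | { kind: "route.changed"; pattern: string }
  | { kind: "toast.contains"; text: string };

// Hypothetical record of what the agent observed after the action.
interface Observed {
  url: string;
  toasts: string[];
}

// Convert a pattern such as "/videos/:id" into a RegExp.
// A ":name" segment matches exactly one non-empty path segment.
function patternToRegExp(pattern: string): RegExp {
  const source = pattern
    .split("/")
    .map((segment) =>
      segment.charAt(0) === ":"
        ? "[^/]+"
        : segment.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
    )
    .join("/");
  return new RegExp("^" + source + "$");
}

function signalMatches(signal: SuccessSignal, seen: Observed): boolean {
  switch (signal.kind) {
    case "route.changed":
      return patternToRegExp(signal.pattern).test(seen.url);
    case "toast.contains": {
      const text = signal.text;
      return seen.toasts.some((toast) => toast.indexOf(text) !== -1);
    }
  }
}

// Assumption: success requires every listed signal to match.
function verifySuccess(signals: SuccessSignal[], seen: Observed): boolean {
  return signals.every((signal) => signalMatches(signal, seen));
}

const expectedSignals: SuccessSignal[] = [
  { kind: "route.changed", pattern: "/videos/:id" },
  { kind: "toast.contains", text: "created" },
];

const verified = verifySuccess(expectedSignals, {
  url: "/videos/42",
  toasts: ["Video created"],
});
```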