Agent Integration Guide

This guide explains how an LLM-based agent uses UIAP to understand and interact with a web application. UIAP provides the agent with structured context instead of raw HTML or screenshots.

After session initialization, the agent receives a Capability Document describing:

  • Available UI roles and states
  • Registered actions with risk levels
  • Success signal types
  • Supported execution modes

The PageGraph is a structured representation of the current page state:

{
  "route": { "routeId": "videos.list", "url": "/videos" },
  "documents": [{
    "scopes": [{
      "id": "video.list",
      "kind": "list",
      "elements": [
        {
          "stableId": "video.new",
          "role": "button",
          "name": "New Video",
          "affordances": ["activatable"],
          "state": { "enabled": true, "visible": true },
          "defaultAction": "video.create"
        }
      ]
    }]
  }]
}
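
To make the structure concrete, here is a minimal TypeScript sketch of how an agent might type this document and list the elements it can act on. The interface names and the `actionableElements` helper are assumptions inferred from the example above, not part of a published UIAP schema, and the real PageGraph may carry more fields.

```typescript
// Hypothetical types inferred from the example PageGraph; not an official schema.
interface UiElement {
  stableId: string;
  role: string;
  name: string;
  affordances: string[];
  state: { enabled: boolean; visible: boolean };
  defaultAction?: string;
}

interface Scope {
  id: string;
  kind: string;
  elements: UiElement[];
}

interface PageGraph {
  route: { routeId: string; url: string };
  documents: { scopes: Scope[] }[];
}

// Collect every element that is enabled, visible, and has a default action.
function actionableElements(graph: PageGraph): UiElement[] {
  const result: UiElement[] = [];
  for (const doc of graph.documents) {
    for (const scope of doc.scopes) {
      for (const el of scope.elements) {
        if (el.state.enabled && el.state.visible && el.defaultAction !== undefined) {
          result.push(el);
        }
      }
    }
  }
  return result;
}

// The example document from this guide, expressed as a typed value.
const examplePage: PageGraph = {
  route: { routeId: "videos.list", url: "/videos" },
  documents: [{
    scopes: [{
      id: "video.list",
      kind: "list",
      elements: [{
        stableId: "video.new",
        role: "button",
        name: "New Video",
        affordances: ["activatable"],
        state: { enabled: true, visible: true },
        defaultAction: "video.create",
      }],
    }],
  }],
};
```

Calling `actionableElements(examplePage)` on this data yields the single "New Video" button, which the agent can then target by its `stableId` rather than by a CSS selector or screen position.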

After each action or state change, the agent receives an incremental update (a delta) to the PageGraph rather than a full snapshot.
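
This guide does not specify the wire format of an incremental update, so the following is only a sketch under assumed names: a hypothetical `PageDelta` carrying upserted elements, partial state patches, and removed ids, applied to a flat index of elements keyed by `stableId`.

```typescript
// Hypothetical delta shape; the actual UIAP update format may differ.
interface ElementState { enabled: boolean; visible: boolean }
interface ElementEntry { stableId: string; role: string; name: string; state: ElementState }

interface PageDelta {
  upserted: ElementEntry[];                                        // new or fully replaced elements
  statePatches: { stableId: string; state: Partial<ElementState> }[]; // partial state changes
  removed: string[];                                               // stableIds no longer present
}

type ElementIndex = Record<string, ElementEntry>;

// Return a new index with the delta applied; the input index is not mutated.
function applyDelta(current: ElementIndex, delta: PageDelta): ElementIndex {
  const next: ElementIndex = { ...current };
  for (const id of delta.removed) {
    delete next[id];
  }
  for (const el of delta.upserted) {
    next[el.stableId] = el;
  }
  for (const patch of delta.statePatches) {
    const existing = next[patch.stableId];
    if (existing !== undefined) {
      next[patch.stableId] = { ...existing, state: { ...existing.state, ...patch.state } };
    }
  }
  return next;
}
```

Keeping the update function pure makes it straightforward for the agent to compare the state before and after an action.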

A typical agent loop follows this pattern:

1. Receive snapshot/delta
2. Understand current state
3. Plan next action based on goal
4. Send action.request
5. Wait for action.result
6. Verify success via signals
7. Repeat or complete
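
The steps above can be sketched as a small loop. Everything named here (`Session`, `Planner`, `runAgentLoop`, and the fields of `ActionResult`) is hypothetical; the guide only names the `action.request` and `action.result` messages. The sketch is synchronous to stay short, whereas a real transport would almost certainly be asynchronous.

```typescript
// Hypothetical shapes; only action.request / action.result are named by the guide.
interface ActionRequest { actionId: string; args?: Record<string, unknown> }
interface ActionResult { ok: boolean; signalsMatched: boolean }

interface Session<State> {
  nextState(): State;                                // steps 1-2: latest snapshot or delta, folded into state
  sendAction(request: ActionRequest): ActionResult;  // steps 4-5: send action.request, wait for action.result
}

// Step 3: return the next action, or null when the goal is already reached.
type Planner<State> = (state: State, goal: string) => ActionRequest | null;

type LoopOutcome = "completed" | "failed" | "step-limit";

function runAgentLoop<State>(
  session: Session<State>,
  plan: Planner<State>,
  goal: string,
  maxSteps: number = 10,
): LoopOutcome {
  for (let step = 0; step < maxSteps; step++) {
    const state = session.nextState();
    const request = plan(state, goal);
    if (request === null) {
      return "completed"; // step 7: nothing left to do
    }
    const result = session.sendAction(request);
    if (!result.ok || !result.signalsMatched) {
      return "failed"; // step 6: success signals did not confirm the action
    }
  }
  return "step-limit"; // guard against loops that never converge
}
```

The step limit is a design choice of this sketch rather than something the guide prescribes; it keeps a confused planner from issuing actions indefinitely.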

UIAP supports multiple execution strategies, in order of preference:

  1. appAction — The app executes its own business logic directly
  2. semanticDom — The SDK interacts with DOM elements by semantic identity
  3. browserInput — Low-level input simulation (click, type)
  4. webdriver — External browser automation (fallback)
  5. vision — Screenshot-based interaction (last resort)

The agent should prefer the highest mode in this list that the application supports. UIAP’s design principle is DOM-first, vision-second, computer-use-last-resort.
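
As an illustration, the preference order can be encoded as a constant and matched against whatever modes the Capability Document reports. The `pickExecutionMode` helper is a hypothetical name for this sketch, not an SDK function.

```typescript
// Execution modes in the order of preference listed in this guide.
const MODE_PREFERENCE = ["appAction", "semanticDom", "browserInput", "webdriver", "vision"] as const;
type ExecutionMode = (typeof MODE_PREFERENCE)[number];

// Return the most preferred mode that the application supports, if any.
function pickExecutionMode(supported: readonly ExecutionMode[]): ExecutionMode | undefined {
  for (const mode of MODE_PREFERENCE) {
    if (supported.indexOf(mode) !== -1) {
      return mode;
    }
  }
  return undefined;
}

// An app that supports only semanticDom and vision resolves to "semanticDom".
const chosenMode = pickExecutionMode(["vision", "semanticDom"]);
```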

Before executing any action, the agent must respect the policy response:

  • allow — Proceed
  • confirm — Request user confirmation first
  • deny — Action is not permitted
  • handoff — Human must perform this step manually
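
One way to enforce these four outcomes is a single exhaustive switch in front of every action. The `resolvePolicy` function, the `NextStep` values, and the `askUser` callback are assumptions made for this sketch; how the policy response is delivered is not described here.

```typescript
// The four policy responses listed in this guide.
type PolicyDecision = "allow" | "confirm" | "deny" | "handoff";

// Hypothetical outcomes the agent loop can act on.
type NextStep = "execute" | "skip" | "wait-for-human";

// Map a policy decision to what the agent does next.
// askUser is only invoked for "confirm" and returns true if the user approves.
function resolvePolicy(decision: PolicyDecision, askUser: () => boolean): NextStep {
  switch (decision) {
    case "allow":
      return "execute";
    case "confirm":
      return askUser() ? "execute" : "skip";
    case "deny":
      return "skip";
    case "handoff":
      return "wait-for-human";
  }
}
```

Because the switch covers every member of `PolicyDecision`, the compiler flags the function if a new decision type is added to the union without a matching case.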

After executing an action, the agent verifies success through signals:

success: [
  { kind: "route.changed", pattern: "/videos/:id" },
  { kind: "toast.contains", text: "created" }
]

This makes agent behavior deterministic and verifiable, not just “it looked like it worked.”
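
A minimal sketch of checking these two signal kinds might look like the following. Several points are assumptions: that all listed signals must hold, that `:id`-style segments match any single non-empty path segment, that toast matching is case-insensitive, and the `Observed` shape itself. The sketch also checks only that the current URL matches the pattern, not that the route differs from the previous one.

```typescript
// The two signal kinds shown in this guide; other kinds may exist.
type SuccessSignal =
  | { kind: "route.changed"; pattern: string }
  | { kind: "toast.contains"; text: string };

// Hypothetical view of what the agent observed after the action.
interface Observed { url: string; toasts: string[] }

// Match patterns like "/videos/:id": a ":name" segment matches any non-empty segment.
function matchesRoute(pattern: string, url: string): boolean {
  const patternParts = pattern.split("/");
  const urlParts = url.split("/");
  if (patternParts.length !== urlParts.length) {
    return false;
  }
  return patternParts.every((segment, i) => {
    const actual = urlParts[i];
    if (actual === undefined) {
      return false;
    }
    return segment.charAt(0) === ":" ? actual.length > 0 : segment === actual;
  });
}

function signalHolds(signal: SuccessSignal, observed: Observed): boolean {
  switch (signal.kind) {
    case "route.changed":
      return matchesRoute(signal.pattern, observed.url);
    case "toast.contains": {
      const needle = signal.text.toLowerCase();
      return observed.toasts.some((toast) => toast.toLowerCase().indexOf(needle) !== -1);
    }
  }
}

// Assumption: the action counts as successful only if every signal holds.
function verifySuccess(signals: SuccessSignal[], observed: Observed): boolean {
  return signals.every((signal) => signalHolds(signal, observed));
}
```

With the signals from the example, a page at `/videos/42` showing the toast "Video created" passes verification, while the same toast on `/videos` does not.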