
UIAP — UI Agent Protocol

The missing protocol between web applications and AI agents. UIAP lets apps tell agents what's visible, what's possible, what's risky, and how to verify success — structured, safe, and transport-agnostic.

Today’s AI agents interact with web UIs in one of three ways — all of them fragile:

  • Screenshots + vision models: slow, expensive, and unreliable. The agent has to guess what a button does.
  • Raw DOM / HTML scraping: brittle selectors and no business semantics. Selectors can break whenever a deploy changes the markup.
  • ARIA / accessibility tree: knows what an element is, but not what it does in business terms, how risky it is, or how to verify success.

There is no standard that answers: “What can I do here? Is it safe? How do I know it worked?”

UIAP closes this gap.


Semantic, not Pixel-based

Agents receive a structured PageGraph with roles, states, affordances, and stable identifiers — no screenshot parsing, no fragile CSS selectors.
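
To make the idea concrete, here is a minimal sketch of what a PageGraph could look like on the agent side. The type and field names (`PageGraph`, `GraphElement`, `affordances`, the `invoice.*` identifiers) are illustrative assumptions, not the normative UIAP schema.

```typescript
// Hypothetical shapes for illustration only; the real schema is defined
// by the UIAP Capability Model and Web Profile specs.
type Affordance = "click" | "type" | "select" | "submit";

interface GraphElement {
  id: string; // stable identifier, independent of CSS selectors
  role: string; // e.g. "button", "textbox"
  name: string; // human-readable label
  states: Record<string, boolean>; // e.g. { disabled: true }
  affordances: Affordance[];
}

interface PageGraph {
  route: string;
  elements: GraphElement[];
}

// Return the elements an agent could act on right now with the given affordance.
function actionable(graph: PageGraph, affordance: Affordance): GraphElement[] {
  return graph.elements.filter(
    (el) =>
      el.affordances.some((a) => a === affordance) &&
      el.states["disabled"] !== true,
  );
}

const graph: PageGraph = {
  route: "/invoices/42",
  elements: [
    { id: "invoice.send", role: "button", name: "Send invoice", states: { disabled: false }, affordances: ["click"] },
    { id: "invoice.delete", role: "button", name: "Delete", states: { disabled: true }, affordances: ["click"] },
    { id: "invoice.note", role: "textbox", name: "Note", states: {}, affordances: ["type"] },
  ],
};

const clickableIds = actionable(graph, "click").map((el) => el.id);
console.log(clickableIds);
```

The agent selects targets by role, state, and affordance and addresses them by `id`, so nothing in the lookup depends on markup or pixel positions.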

Action Safety Built-in

Every action declares a risk level. The SDK enforces policy evaluation, confirmation flows, and human handoff before anything irreversible happens.
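
A policy check of this kind might be sketched as follows. The four decisions (allow, confirm, deny, hand off) come from the flow described on this page; the risk scale, the `Policy` fields, and the rules inside `evaluate` are assumptions made for the example, not the SDK's actual behavior.

```typescript
// Hypothetical risk scale and policy shape; the Policy Extension spec
// defines the real vocabulary.
type RiskLevel = "low" | "medium" | "high" | "critical";
type Decision = "allow" | "confirm" | "deny" | "handoff";

interface ActionDescriptor {
  id: string;
  risk: RiskLevel;
  reversible: boolean;
}

interface Policy {
  maxAutoRisk: RiskLevel; // highest risk the agent may run without asking
  denied: string[]; // action ids the agent may never run
}

const RISK_ORDER: RiskLevel[] = ["low", "medium", "high", "critical"];

function evaluate(action: ActionDescriptor, policy: Policy): Decision {
  // Explicit deny rules win over everything else.
  if (policy.denied.indexOf(action.id) !== -1) return "deny";
  // Assumed rule: critical actions always go to a human.
  if (action.risk === "critical") return "handoff";
  const withinAutoLimit =
    RISK_ORDER.indexOf(action.risk) <= RISK_ORDER.indexOf(policy.maxAutoRisk);
  // Only reversible actions within the automatic limit run unattended.
  if (withinAutoLimit && action.reversible) return "allow";
  return "confirm";
}

const policy: Policy = { maxAutoRisk: "medium", denied: ["account.close"] };

const decisions = [
  evaluate({ id: "cart.add", risk: "low", reversible: true }, policy),
  evaluate({ id: "invoice.delete", risk: "high", reversible: false }, policy),
  evaluate({ id: "payment.submit", risk: "critical", reversible: false }, policy),
  evaluate({ id: "account.close", risk: "low", reversible: true }, policy),
];
console.log(decisions);
```

Keeping the decision a pure function of the action descriptor and the policy makes it easy to audit and to test without a browser.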

Success Verification

Actions define expected outcomes: route changes, toasts, dialog closures. Agents verify success instead of hoping it worked.
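
As a rough sketch of outcome verification, the snippet below compares the signals an action expects with the UI events observed after it ran. The signal kinds mirror the examples above (route change, toast, dialog closure); the `Signal` and `UiEvent` shapes and the matching rules are assumptions for illustration.

```typescript
// Hypothetical signal and event shapes, not the normative UIAP definitions.
type Signal =
  | { kind: "route"; path: string }
  | { kind: "toast"; text: string }
  | { kind: "dialogClosed"; dialogId: string };

interface UiEvent {
  kind: "route" | "toast" | "dialogClosed";
  value: string;
}

function matches(signal: Signal, event: UiEvent): boolean {
  switch (signal.kind) {
    case "route":
      return event.kind === "route" && event.value === signal.path;
    case "toast":
      // Assumed rule: a toast matches if it contains the expected text.
      return event.kind === "toast" && event.value.indexOf(signal.text) !== -1;
    case "dialogClosed":
      return event.kind === "dialogClosed" && event.value === signal.dialogId;
  }
}

// An action counts as verified only if every expected signal was observed.
function verify(
  expected: Signal[],
  observed: UiEvent[],
): { ok: boolean; missing: Signal[] } {
  const missing = expected.filter((s) => !observed.some((e) => matches(s, e)));
  return { ok: missing.length === 0, missing };
}

const observed: UiEvent[] = [
  { kind: "route", value: "/invoices" },
  { kind: "toast", value: "Invoice sent" },
];

const passed = verify(
  [{ kind: "route", path: "/invoices" }, { kind: "toast", text: "sent" }],
  observed,
);
const failed = verify([{ kind: "dialogClosed", dialogId: "confirm-send" }], observed);
console.log(passed.ok, failed.ok, failed.missing);
```

Returning the missing signals rather than a bare boolean gives the agent something to report or retry on.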

Transport Agnostic

Works over WebSocket, HTTP/SSE, postMessage, or any custom binding. The protocol defines the contract, not the wire.
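
One way to picture the separation between contract and wire is a small transport interface that any binding can implement. Everything named here (`Envelope`, `Transport`, `createLoopbackPair`, the `capabilities.*` message types) is a hypothetical illustration; the in-memory loopback merely stands in for a WebSocket, HTTP/SSE, or postMessage binding.

```typescript
// Hypothetical envelope and transport shapes for illustration only.
interface Envelope {
  id: string;
  type: string;
  payload: unknown;
}

type Handler = (message: Envelope) => void;

interface Transport {
  send(message: Envelope): void;
  onMessage(handler: Handler): void;
}

// Two connected endpoints that exchange JSON-serialized messages in memory.
function createLoopbackPair(): [Transport, Transport] {
  const handlersA: Handler[] = [];
  const handlersB: Handler[] = [];
  const makeSide = (own: Handler[], peer: Handler[]): Transport => ({
    send: (message) => {
      // Serialize and parse to mimic what a real wire binding would do.
      const wire = JSON.stringify(message);
      peer.forEach((handler) => handler(JSON.parse(wire) as Envelope));
    },
    onMessage: (handler) => {
      own.push(handler);
    },
  });
  return [makeSide(handlersA, handlersB), makeSide(handlersB, handlersA)];
}

const [agentSide, appSide] = createLoopbackPair();
const inbox: Envelope[] = [];

// App side: answer a capabilities request, reusing the request id.
appSide.onMessage((message) => {
  if (message.type === "capabilities.request") {
    appSide.send({
      id: message.id,
      type: "capabilities.response",
      payload: { actions: ["invoice.send"] },
    });
  }
});

// Agent side: collect whatever comes back.
agentSide.onMessage((message) => {
  inbox.push(message);
});

agentSide.send({ id: "1", type: "capabilities.request", payload: null });
console.log(inbox);
```

Code written against `Transport` does not change when the loopback is swapped for a network binding; only the two `send`/`onMessage` implementations do.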


  1. App integrates the UIAP SDK

    A few lines of code or data-uiap-* attributes instrument your UI with semantic meaning.

  2. Session starts, capabilities are exchanged

    The agent learns what the app can do: available actions, risk levels, execution modes, success signals.

  3. Agent receives a PageGraph

    A structured snapshot of the current UI state — routes, scopes, elements, states, affordances — not raw HTML.

  4. Agent plans an action, SDK checks policy

    Before execution, the policy layer decides: allow, confirm, deny, or hand off to the user.

  5. Action executes, app sends feedback

    The SDK runs the action (preferring app-native execution over DOM manipulation) and reports the result along with any state deltas.

  6. Agent verifies success via signals

    Route changed? Toast appeared? Form submitted? The agent confirms the outcome before moving to the next step.
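
The six steps above can be sketched as a single agent-side function running against a stubbed app. The `InstrumentedApp` interface, its method names, the outcome labels, and the string-encoded signals are all assumptions made for this example; they are not the UIAP SDK API.

```typescript
// Hypothetical interface standing in for an app that integrated the SDK (step 1).
type Decision = "allow" | "confirm" | "deny" | "handoff";

interface ActionResult {
  status: "done" | "failed";
  signals: string[];
}

interface InstrumentedApp {
  capabilities(): { actions: string[] }; // step 2
  pageGraph(): { route: string; availableActions: string[] }; // step 3
  checkPolicy(actionId: string): Decision; // step 4
  execute(actionId: string): ActionResult; // step 5
}

type StepOutcome = "verified" | "unverified" | "blocked" | "needs-user" | "unavailable";

function runStep(app: InstrumentedApp, actionId: string, expectedSignal: string): StepOutcome {
  // Steps 2 and 3: the action must be declared and available on the current page.
  if (app.capabilities().actions.indexOf(actionId) === -1) return "unavailable";
  if (app.pageGraph().availableActions.indexOf(actionId) === -1) return "unavailable";
  // Step 4: the policy layer decides before anything runs.
  const decision = app.checkPolicy(actionId);
  if (decision === "deny") return "blocked";
  if (decision !== "allow") return "needs-user";
  // Step 5: execute and collect feedback.
  const result = app.execute(actionId);
  // Step 6: confirm the outcome through the expected signal.
  const verified = result.status === "done" && result.signals.indexOf(expectedSignal) !== -1;
  return verified ? "verified" : "unverified";
}

const demoApp: InstrumentedApp = {
  capabilities: () => ({ actions: ["invoice.send", "invoice.delete"] }),
  pageGraph: () => ({ route: "/invoices/42", availableActions: ["invoice.send", "invoice.delete"] }),
  checkPolicy: (actionId) => (actionId === "invoice.delete" ? "confirm" : "allow"),
  execute: () => ({ status: "done", signals: ["toast:Invoice sent"] }),
};

const sendOutcome = runStep(demoApp, "invoice.send", "toast:Invoice sent");
const deleteOutcome = runStep(demoApp, "invoice.delete", "toast:Invoice deleted");
const archiveOutcome = runStep(demoApp, "invoice.archive", "toast:Invoice archived");
console.log(sendOutcome, deleteOutcome, archiveOutcome);
```

In this sketch the agent never proceeds on an action that was not declared, not permitted, or not verified, which is the ordering the steps describe.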


UIAP is a suite of 11 specifications:

| #  | Spec               | What it defines                                            |
|----|--------------------|------------------------------------------------------------|
| 1  | Core               | Message envelope, session lifecycle, errors, versioning    |
| 2  | Capability Model   | UI roles, states, affordances, actions, risk, signals      |
| 3  | Web Profile        | DOM, ARIA, PageGraph, iframes, Shadow DOM, routes          |
| 4  | Action Runtime     | Execution, verification, confirmation, result reporting    |
| 5  | Policy Extension   | Permissions, risk classes, sensitivity, audit              |
| 6  | SDK API            | Client-side JavaScript integration API                     |
| 7  | Workflow Extension | Multi-step flows, skills, recovery                         |
| 8  | Discovery Mapper   | Automatic UI element discovery and classification          |
| 9  | Authoring/Manifest | Configuration formats, validation, build/release           |
| 10 | Conformance Suite  | Test modules, harness model, evaluation rules              |
| 11 | HTTP/REST Binding  | HTTP+SSE transport binding                                 |

UIAP doesn’t replace existing standards — it fills the gap between them:

| Standard               | What it does                          | What’s missing for agents                                 |
|------------------------|---------------------------------------|-----------------------------------------------------------|
| ARIA                   | Describes roles, states, properties   | No business actions, no risk levels, no success signals   |
| MCP                    | Connects tools and knowledge to models | Doesn’t model live UI state or browser interaction       |
| Playwright / WebDriver | Automates browser interaction         | No semantic understanding, no policy, no verification     |
| AG-UI                  | Agent-to-app event protocol           | No canonical UI model, no action catalog, no safety layer |

UIAP is adapter-capable: expose capabilities as MCP tools, mirror events to AG-UI, or delegate execution to WebDriver — the protocol is the contract, not the runtime.
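
As an illustration of the adapter idea, the sketch below maps a UIAP-style action onto an MCP-style tool descriptor. MCP tools are described by a `name`, a `description`, and a JSON Schema `inputSchema`; the `UiapAction` shape, the name-mangling rule, and the way risk is surfaced in the description are assumptions for this example rather than a defined UIAP mapping.

```typescript
// Hypothetical UIAP action shape; only the MCP field names
// (name, description, inputSchema) follow the MCP tool format.
interface UiapAction {
  id: string;
  label: string;
  risk: string;
  params: Array<{ name: string; type: "string" | "number" | "boolean" }>;
}

interface McpToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
}

function toMcpTool(action: UiapAction): McpToolDescriptor {
  const properties: Record<string, { type: string }> = {};
  const required: string[] = [];
  action.params.forEach((param) => {
    properties[param.name] = { type: param.type };
    // Assumed rule: every declared parameter is required.
    required.push(param.name);
  });
  return {
    // Assumed naming rule: replace dots so the id is a conservative tool name.
    name: action.id.replace(/\./g, "_"),
    // Surface the risk level so the model sees it when choosing tools.
    description: action.label + " (risk: " + action.risk + ")",
    inputSchema: { type: "object", properties, required },
  };
}

const tool = toMcpTool({
  id: "invoice.send",
  label: "Send invoice to customer",
  risk: "medium",
  params: [{ name: "invoiceId", type: "string" }],
});
console.log(JSON.stringify(tool, null, 2));
```

An adapter like this only translates descriptions; policy checks and verification would still run on the UIAP side when the tool is invoked.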