UIAP Capability Model

Field | Value
Status | Draft
Version | 0.1
Date | 2026-03-27
Dependencies |
Editors | Patrick

The Capability Model describes what an application can semantically present, observe, and execute.

It cleanly separates:

  • Roles: what a UI entity is
  • States: what state it is in
  • Affordances: which interactions it fundamentally offers
  • Actions: which standardized requests the agent can send
  • Risk: how strictly an action must be regulated
  • Success Signals: how success can be observed

This is the difference between “there is some button” and “this is video.submit, activatable, requires confirmation, success = route changes and a toast appears”.


  1. Capability values SHOULD be stable and machine-readable.
  2. Core values are reserved and carry no prefix.
  3. Vendor-/app-specific extensions SHOULD begin with x., e.g. x.videoland.asset-card.
  4. A capability document MAY support only a subset of the core vocabulary; the actually supported values MUST be explicitly declared.
  5. Roles and affordances describe semantics, not concrete DOM structures or CSS.
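
The prefix rule above can be checked mechanically. The following is a minimal sketch, not part of the specification: `isExtensionValue` and `classifyValues` are hypothetical helper names, and the core vocabulary passed in is abbreviated for the example.

```typescript
// Hypothetical helpers; the names are not defined by the Capability Model.
const EXTENSION_PREFIX = "x.";

/** True for vendor/app extensions such as "x.videoland.asset-card". */
function isExtensionValue(value: string): boolean {
  return value.indexOf(EXTENSION_PREFIX) === 0 && value.length > EXTENSION_PREFIX.length;
}

/** Splits declared values into core values, extensions, and unknown values. */
function classifyValues(declared: string[], coreVocabulary: string[]) {
  const core: string[] = [];
  const extensions: string[] = [];
  const unknown: string[] = [];
  for (const value of declared) {
    if (coreVocabulary.indexOf(value) !== -1) core.push(value);
    else if (isExtensionValue(value)) extensions.push(value);
    else unknown.push(value); // neither core nor correctly prefixed
  }
  return { core, extensions, unknown };
}

const coreRoles = ["button", "dialog", "textbox"]; // abbreviated core list
const classified = classifyValues(
  ["button", "x.videoland.asset-card", "assetcard"],
  coreRoles
);
console.log(classified);
```

In this sketch an unprefixed, non-core value such as "assetcard" ends up in `unknown`, which a consumer could report as a violation of principle 3.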

type ProfileId = string; // e.g. "web@0.1"
type StableId = string;  // stable, app-defined target ID
type ScopeId = string;   // logical scope, e.g. "create-video-dialog"
type ActionId = string;  // e.g. "ui.activate" or "video.create"

type TargetRef =
  | { by: "stableId"; value: StableId }
  | { by: "scope"; value: ScopeId }
  | { by: "route"; value: string }
  | { by: "semantic"; role: UIRole; name?: string; scope?: ScopeId }
  | { by: "custom"; value: string };
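
To illustrate how the `TargetRef` union might be consumed, here is a small sketch that builds two references and renders them as log-friendly strings. `describeTarget` is a hypothetical helper, the `UIRole` union is abbreviated, and the sample values are illustrative.

```typescript
type StableId = string;
type ScopeId = string;
type UIRole = "button" | "dialog" | "textbox" | "custom"; // abbreviated for the example

type TargetRef =
  | { by: "stableId"; value: StableId }
  | { by: "scope"; value: ScopeId }
  | { by: "route"; value: string }
  | { by: "semantic"; role: UIRole; name?: string; scope?: ScopeId }
  | { by: "custom"; value: string };

/** Renders a TargetRef as a short string (hypothetical helper). */
function describeTarget(ref: TargetRef): string {
  if (ref.by === "semantic") {
    const label = ref.name ? `[name="${ref.name}"]` : "";
    const scope = ref.scope ? ` in ${ref.scope}` : "";
    return `semantic:${ref.role}${label}${scope}`;
  }
  // All remaining variants carry a plain string value.
  return `${ref.by}:${ref.value}`;
}

const byId: TargetRef = { by: "stableId", value: "video.submit" };
const bySemantics: TargetRef = {
  by: "semantic",
  role: "button",
  name: "Create",
  scope: "create-video-dialog",
};

console.log(describeTarget(byId));
console.log(describeTarget(bySemantics));
```

Discriminating on `by` lets the compiler verify that `role` is only read on semantic references and `value` only on the others.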

UIRole is the canonical typing of visible or interactive UI objects.

type UIRole =
  // App / Structure
  | "app"
  | "route"
  | "region"
  | "group"
  | "form"
  | "dialog"
  | "drawer"
  | "popover"
  | "tabpanel"
  // Navigation / Structured Selection
  | "link"
  | "menu"
  | "menuitem"
  | "tablist"
  | "tab"
  | "toolbar"
  | "breadcrumb"
  | "pagination"
  // Collections
  | "list"
  | "listitem"
  | "table"
  | "row"
  | "cell"
  | "grid"
  | "tree"
  | "treeitem"
  // Inputs / Controls
  | "button"
  | "textbox"
  | "textarea"
  | "searchbox"
  | "combobox"
  | "listbox"
  | "option"
  | "checkbox"
  | "radio"
  | "switch"
  | "slider"
  | "spinbutton"
  | "datepicker"
  | "timepicker"
  | "fileinput"
  | "label"
  // Feedback / Status
  | "alert"
  | "toast"
  | "status"
  | "progress"
  | "spinner"
  // Media / Special Cases
  | "image"
  | "video"
  | "audio"
  | "canvas"
  // Escape hatch
  | "custom";

  • button: activatable command
  • textbox / textarea: editable text input
  • checkbox / radio / switch: discrete state selection
  • combobox / listbox / option: selection from candidates
  • dialog / drawer / popover: temporary UI surfaces
  • toast / alert / status: observable feedback signals
  • custom: only when no core role fits

UIState describes the observable state of an element or scope.

interface UIState {
  visible?: boolean;
  enabled?: boolean;
  focusable?: boolean;
  focused?: boolean;
  editable?: boolean;
  readonly?: boolean;
  required?: boolean;
  busy?: boolean;
  loading?: boolean;
  blocked?: boolean;
  selected?: boolean;
  checked?: boolean | "mixed";
  pressed?: boolean;
  expanded?: boolean;
  open?: boolean;
  modal?: boolean;
  hovered?: boolean;
  dragged?: boolean;
  current?: false | "page" | "step" | "location" | "date" | "time";
  invalid?: false | true | "grammar" | "spelling";
  multiline?: boolean;
  hasPopup?: false | "menu" | "listbox" | "dialog" | "grid" | "tree";
  orientation?: "horizontal" | "vertical";
  textValue?: string;
  numericValue?: number;
  min?: number;
  max?: number;
  step?: number;
  placeholder?: string;
  description?: string;
  sensitive?: boolean;
  obscured?: boolean;
}

A capability document SHOULD additionally declare which state keys are actively emitted.

type UIStateKey = keyof UIState;

  • visible=false does not automatically imply enabled=false.
  • blocked=true means: formally present but currently not usable.
  • sensitive=true marks data or controls that require special policy treatment.
  • obscured=true marks content that is covered or intentionally masked.
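
The notes above imply that state keys are evaluated independently of each other. The following sketch shows one way an agent might collect the reasons an element is currently unusable. `unusableReasons` is a hypothetical helper, and treating a missing key as "not emitted" (rather than as false) is an assumption of this example.

```typescript
// Only the state keys needed for this example; the full UIState has more.
interface UIState {
  visible?: boolean;
  enabled?: boolean;
  blocked?: boolean;
}

/**
 * Assumption for this sketch: a missing key means "not emitted", so only an
 * explicit negative value counts against the element. Each key is checked on
 * its own; visible=false does not imply enabled=false.
 */
function unusableReasons(state: UIState): string[] {
  const reasons: string[] = [];
  if (state.visible === false) reasons.push("not visible");
  if (state.enabled === false) reasons.push("disabled");
  if (state.blocked === true) reasons.push("blocked");
  return reasons;
}

console.log(unusableReasons({ visible: false }));
console.log(unusableReasons({ visible: true, enabled: true, blocked: true }));
console.log(unusableReasons({ visible: true, enabled: true }));
```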

Affordances describe which type of interaction an element fundamentally offers.

type UIAffordance =
  | "read"
  | "focus"
  | "activate"
  | "edit"
  | "choose"
  | "toggle"
  | "expand"
  | "collapse"
  | "open"
  | "close"
  | "scroll"
  | "submit"
  | "dismiss"
  | "drag"
  | "drop"
  | "resize"
  | "upload"
  | "navigate"
  | "invoke";

  • activate: generic activation, e.g. button, link, card
  • edit: text or value can be modified
  • choose: selection from options
  • toggle: binary or tri-state switch
  • invoke: domain-level app action beyond a simple click
  • Affordance = what an element can offer in principle
  • Action = standardized request from the agent

A button typically has the activate affordance. The agent then sends e.g. ui.activate.
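
The relationship between affordances and actions can be sketched as a simple precondition check: before sending an action, the agent compares the action's required affordances with those the target element offers. `missingAffordances` and `ActionRequirement` are hypothetical names, and the affordance union is abbreviated.

```typescript
type UIAffordance = "read" | "focus" | "activate" | "edit" | "toggle"; // abbreviated

interface ActionRequirement {
  id: string;
  requiredAffordances?: UIAffordance[];
}

/** Returns the required affordances that the element does not offer. */
function missingAffordances(
  action: ActionRequirement,
  offered: UIAffordance[]
): UIAffordance[] {
  const required = action.requiredAffordances ?? [];
  return required.filter((affordance) => offered.indexOf(affordance) === -1);
}

const buttonAffordances: UIAffordance[] = ["read", "focus", "activate"];
const activate: ActionRequirement = { id: "ui.activate", requiredAffordances: ["activate"] };
const enterText: ActionRequirement = { id: "ui.enterText", requiredAffordances: ["edit"] };

console.log(missingAffordances(activate, buttonAffordances));
console.log(missingAffordances(enterText, buttonAffordances));
```

An empty result means the action is applicable to the element; a non-empty one names what is missing, here `edit` for `ui.enterText` on a button.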


Actions are standardized executable operations.

type PrimitiveActionType =
  | "ui.read"
  | "ui.focus"
  | "ui.highlight"
  | "ui.hover"
  | "ui.activate"
  | "ui.enterText"
  | "ui.clearText"
  | "ui.setValue"
  | "ui.choose"
  | "ui.toggle"
  | "ui.expand"
  | "ui.collapse"
  | "ui.open"
  | "ui.close"
  | "ui.scrollIntoView"
  | "ui.scroll"
  | "ui.submit"
  | "ui.upload"
  | "nav.navigate"
  | "app.invoke";

type ExecutionMode =
  | "appAction"      // direct app function / registry
  | "semanticUi"     // semantic UI target addressing
  | "inputSynthesis" // synthetic pointer/keyboard input
  | "externalDriver" // external automation driver
  | "visionAssist";  // vision-assisted fallback

interface ActionDescriptor {
  id: ActionId; // "ui.activate" or "video.create"
  kind: "primitive" | "domain";
  title?: string;
  description?: string;
  targetKinds: Array<"element" | "scope" | "route" | "entity" | "session" | "none">;
  requiredAffordances?: UIAffordance[];
  executionModes: ExecutionMode[];
  args?: ActionArgDescriptor[];
  idempotency?: "idempotent" | "conditional" | "non_idempotent";
  risk: RiskDescriptor;
  success?: SuccessSignal[];
  metadata?: Record<string, unknown>;
}

interface ActionArgDescriptor {
  name: string;
  type: "string" | "number" | "boolean" | "enum" | "object" | "array";
  required?: boolean;
  enum?: string[];
  description?: string;
}
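
As an illustration of how `ActionArgDescriptor` could be used at runtime, the sketch below validates a set of arguments against the declared descriptors. `validateArgs` and `matchesType` are hypothetical helpers, and rejecting undeclared arguments is an assumption of this example, not a rule stated by the model.

```typescript
interface ActionArgDescriptor {
  name: string;
  type: "string" | "number" | "boolean" | "enum" | "object" | "array";
  required?: boolean;
  enum?: string[];
  description?: string;
}

/** Checks a single value against the declared argument type. */
function matchesType(value: unknown, arg: ActionArgDescriptor): boolean {
  switch (arg.type) {
    case "enum":
      return typeof value === "string" && (arg.enum ?? []).indexOf(value) !== -1;
    case "array":
      return Array.isArray(value);
    case "object":
      return typeof value === "object" && value !== null && !Array.isArray(value);
    default:
      return typeof value === arg.type; // "string" | "number" | "boolean"
  }
}

/** Returns a list of problems; an empty list means the arguments are valid. */
function validateArgs(
  descriptors: ActionArgDescriptor[],
  args: Record<string, unknown>
): string[] {
  const errors: string[] = [];
  for (const descriptor of descriptors) {
    const value = args[descriptor.name];
    if (value === undefined) {
      if (descriptor.required) errors.push(`missing required argument "${descriptor.name}"`);
      continue;
    }
    if (!matchesType(value, descriptor)) {
      errors.push(`argument "${descriptor.name}" is not of type ${descriptor.type}`);
    }
  }
  // Assumption: arguments that are not declared are treated as errors.
  for (const key of Object.keys(args)) {
    if (!descriptors.some((d) => d.name === key)) errors.push(`unknown argument "${key}"`);
  }
  return errors;
}

const videoCreateArgs: ActionArgDescriptor[] = [
  { name: "title", type: "string", required: true },
  { name: "useCase", type: "string", required: false },
];

console.log(validateArgs(videoCreateArgs, { title: "Launch teaser" }));
console.log(validateArgs(videoCreateArgs, { useCase: 42 }));
```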

In addition to primitive actions, an application MAY declare domain-specific actions, e.g.:

  • video.create
  • workspace.setup
  • team.invite
  • billing.openSettings

These SHOULD be stable, dot-segmented, and named with domain semantics.
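
A naming check along these lines could be automated. The pattern below is an assumed convention for this sketch (two or more dot-separated segments, each starting with a letter); the model itself only asks for stable, dot-segmented, domain-oriented names.

```typescript
// Assumed convention: at least two segments separated by dots; each segment
// starts with a letter and contains only letters, digits, "-" or "_".
const ACTION_ID_PATTERN = /^[A-Za-z][A-Za-z0-9_-]*(\.[A-Za-z][A-Za-z0-9_-]*)+$/;

/** Hypothetical helper: true when the ID follows the assumed convention. */
function isWellFormedActionId(id: string): boolean {
  return ACTION_ID_PATTERN.test(id);
}

for (const id of ["video.create", "billing.openSettings", "createVideo", "video..create"]) {
  console.log(id, isWellFormedActionId(id));
}
```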


Risk level and risk tags are intentionally kept separate, because people otherwise tend to squeeze five entirely different risks into a single word.

RiskLevel controls executability.

type RiskLevel =
  | "safe"     // autonomously executable within granted scopes
  | "confirm"  // explicit user confirmation required
  | "blocked"; // not automatically executable

RiskTag describes the domain-level cause or class of the risk.

type RiskTag =
  | "sensitive_data"
  | "destructive"
  | "external_effect"
  | "privileged"
  | "billing"
  | "security"
  | "identity"
  | "legal"
  | "irreversible";

interface RiskDescriptor {
  level: RiskLevel;
  tags?: RiskTag[];
  reason?: string;
}

  • safe: e.g. opening a route, switching a tab, writing a text suggestion into a draft
  • confirm: e.g. creating a record, sending an invitation, saving settings
  • blocked: e.g. changing a password, triggering a payment, deleting a user, irreversible publish action
  • When multiple risk tags apply, the strictest level wins.
  • blocked MUST NOT be executed autonomously.
  • confirm SHOULD require a human confirmation point before execution.
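
The rules above can be sketched as two small functions: one that picks the strictest of several levels, and one that turns a level into an execution decision. `strictest`, `decide`, and the `Decision` type are hypothetical names, and the sketch assumes the agent refuses `blocked` actions even if the user has confirmed, leaving them to the user.

```typescript
type RiskLevel = "safe" | "confirm" | "blocked";

// Ordered from least to most strict.
const STRICTNESS: RiskLevel[] = ["safe", "confirm", "blocked"];

/** Returns the strictest of the given levels; at least one level is required. */
function strictest(first: RiskLevel, ...rest: RiskLevel[]): RiskLevel {
  let result = first;
  for (const level of rest) {
    if (STRICTNESS.indexOf(level) > STRICTNESS.indexOf(result)) result = level;
  }
  return result;
}

type Decision = "execute" | "ask_user" | "refuse";

/** Maps a risk level to what the agent does next (assumed policy for this sketch). */
function decide(level: RiskLevel, userConfirmed: boolean): Decision {
  if (level === "blocked") return "refuse"; // never executed by the agent
  if (level === "confirm" && !userConfirmed) return "ask_user";
  return "execute";
}

console.log(strictest("safe", "confirm"));
console.log(decide(strictest("safe", "confirm"), false));
console.log(decide("blocked", true));
```

Requiring at least one level in `strictest` avoids having to pick a default for an empty list, which would otherwise be a silent policy decision.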

Success Signals are observable predicates, not wishes and not prompt-based superstition.

type SuccessSignal =
  | { kind: "element.appeared"; target: TargetRef }
  | { kind: "element.disappeared"; target: TargetRef }
  | { kind: "element.state"; target: TargetRef; state: Partial<UIState> }
  | { kind: "value.equals"; target: TargetRef; value: string | number | boolean }
  | { kind: "route.changed"; pattern?: string; exact?: string }
  | { kind: "dialog.opened"; target?: TargetRef }
  | { kind: "dialog.closed"; target?: TargetRef }
  | { kind: "toast.contains"; text: string }
  | { kind: "status.contains"; text: string }
  | { kind: "validation.none"; scope?: ScopeId }
  | { kind: "collection.count"; target: TargetRef; op: "eq" | "gte" | "lte"; value: number }
  | { kind: "entity.created"; entityType: string; idPath?: string }
  | { kind: "entity.updated"; entityType: string; idPath?: string }
  | { kind: "entity.deleted"; entityType: string; idPath?: string }
  | { kind: "network.response"; urlPattern?: string; status?: number }
  | { kind: "custom"; name: string; payload?: Record<string, unknown> };

  • Signals SHOULD be declarative and observable.
  • An action descriptor SHOULD contain at least one expected success signal.
  • custom SHOULD only be used when no core signal fits.
  • For critical domain actions, multiple success signals SHOULD be combined.

Example: video.create

  • route.changed to /videos/:id
  • toast.contains = "erstellt" (German for "created")
  • optionally entity.created = "video"
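
The video.create example can be sketched as a predicate over an observed snapshot. Everything below is illustrative: the `Observation` shape, the helper names, and the interpretation of `:id` as "exactly one non-empty path segment" are assumptions of this sketch, since the model does not define a pattern syntax here. Only two signal kinds are covered.

```typescript
type SuccessSignal =
  | { kind: "route.changed"; pattern?: string; exact?: string }
  | { kind: "toast.contains"; text: string };

// Hypothetical snapshot of what the agent observed after the action.
interface Observation {
  previousRoute: string;
  currentRoute: string;
  toasts: string[];
}

/** Assumed pattern syntax: ":param" matches exactly one non-empty segment. */
function matchesRoutePattern(pattern: string, path: string): boolean {
  const patternParts = pattern.split("/");
  const pathParts = path.split("/");
  if (patternParts.length !== pathParts.length) return false;
  return patternParts.every((part, i) =>
    part.charAt(0) === ":" ? pathParts[i].length > 0 : part === pathParts[i]
  );
}

function isSatisfied(signal: SuccessSignal, seen: Observation): boolean {
  if (signal.kind === "toast.contains") {
    const text = signal.text;
    return seen.toasts.some((toast) => toast.indexOf(text) !== -1);
  }
  // route.changed: the route must differ and match exact/pattern if given.
  if (seen.currentRoute === seen.previousRoute) return false;
  if (signal.exact !== undefined) return seen.currentRoute === signal.exact;
  if (signal.pattern !== undefined) return matchesRoutePattern(signal.pattern, seen.currentRoute);
  return true;
}

/** Combined check: every expected signal must hold. */
function allSatisfied(signals: SuccessSignal[], seen: Observation): boolean {
  return signals.every((signal) => isSatisfied(signal, seen));
}

const videoCreateSignals: SuccessSignal[] = [
  { kind: "route.changed", pattern: "/videos/:id" },
  { kind: "toast.contains", text: "erstellt" },
];

const afterCreate: Observation = {
  previousRoute: "/videos/new",
  currentRoute: "/videos/42",
  toasts: ["Video erstellt"],
};

console.log(allSatisfied(videoCreateSignals, afterCreate));
```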

The capability document is the comprehensive description of active capabilities.

interface CapabilityDocument {
  modelVersion: "0.1";
  profile: ProfileId; // e.g. "web@0.1"
  roles: UIRole[];
  stateKeys: UIStateKey[];
  affordances: UIAffordance[];
  actions: ActionDescriptor[];
  riskLevels: RiskLevel[];
  riskTags?: RiskTag[];
  successSignalKinds: string[];
  metadata?: Record<string, unknown>;
}

  • roles, stateKeys, affordances, actions, riskLevels MUST be present.
  • An app MUST only declare values that are actually supported.
  • successSignalKinds SHOULD contain all kind values used in the document.
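
These rules lend themselves to a consistency check. The sketch below uses loosened, hypothetical types (`CapabilityDocumentLike`, `ActionLike`) so that incomplete documents can be represented. The required-field and `successSignalKinds` checks follow the rules above; the check that required affordances are declared is an additional assumption derived from "only declare values that are actually supported".

```typescript
// Loosened shapes for validation; all names here are hypothetical.
interface ActionLike {
  id: string;
  requiredAffordances?: string[];
  success?: { kind: string }[];
}

interface CapabilityDocumentLike {
  roles?: string[];
  stateKeys?: string[];
  affordances?: string[];
  actions?: ActionLike[];
  riskLevels?: string[];
  successSignalKinds?: string[];
}

const REQUIRED_FIELDS = ["roles", "stateKeys", "affordances", "actions", "riskLevels"] as const;

/** Returns a list of problems; an empty list means the document is consistent. */
function checkDocument(doc: CapabilityDocumentLike): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    if (!Array.isArray(doc[field])) problems.push(`missing required field "${field}"`);
  }
  const affordances = doc.affordances ?? [];
  const signalKinds = doc.successSignalKinds ?? [];
  for (const action of doc.actions ?? []) {
    for (const affordance of action.requiredAffordances ?? []) {
      if (affordances.indexOf(affordance) === -1) {
        problems.push(`action "${action.id}" requires undeclared affordance "${affordance}"`);
      }
    }
    for (const signal of action.success ?? []) {
      if (signalKinds.indexOf(signal.kind) === -1) {
        problems.push(`action "${action.id}" uses undeclared success signal kind "${signal.kind}"`);
      }
    }
  }
  return problems;
}

const incomplete: CapabilityDocumentLike = {
  roles: ["button"],
  stateKeys: ["visible"],
  affordances: ["activate"],
  actions: [
    {
      id: "video.create",
      requiredAffordances: ["invoke"],
      success: [{ kind: "route.changed" }, { kind: "toast.contains" }],
    },
  ],
  successSignalKinds: ["route.changed"],
};

console.log(checkDocument(incomplete));
```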

{
  "modelVersion": "0.1",
  "profile": "web@0.1",
  "roles": [
    "route",
    "dialog",
    "form",
    "button",
    "textbox",
    "textarea",
    "toast",
    "status"
  ],
  "stateKeys": [
    "visible",
    "enabled",
    "focused",
    "required",
    "open",
    "invalid",
    "textValue",
    "sensitive"
  ],
  "affordances": [
    "read",
    "focus",
    "activate",
    "edit",
    "submit",
    "navigate",
    "invoke"
  ],
  "actions": [
    {
      "id": "ui.activate",
      "kind": "primitive",
      "targetKinds": ["element"],
      "requiredAffordances": ["activate"],
      "executionModes": ["appAction", "semanticUi", "inputSynthesis"],
      "risk": { "level": "safe" }
    },
    {
      "id": "ui.enterText",
      "kind": "primitive",
      "targetKinds": ["element"],
      "requiredAffordances": ["edit"],
      "executionModes": ["appAction", "semanticUi", "inputSynthesis"],
      "args": [
        { "name": "text", "type": "string", "required": true }
      ],
      "risk": { "level": "safe" }
    },
    {
      "id": "video.create",
      "kind": "domain",
      "title": "Video erstellen",
      "targetKinds": ["scope"],
      "executionModes": ["appAction", "semanticUi"],
      "args": [
        { "name": "title", "type": "string", "required": true },
        { "name": "useCase", "type": "string", "required": false }
      ],
      "idempotency": "non_idempotent",
      "risk": {
        "level": "confirm",
        "tags": ["external_effect"]
      },
      "success": [
        { "kind": "route.changed", "pattern": "/videos/:id" },
        { "kind": "toast.contains", "text": "erstellt" }
      ]
    }
  ],
  "riskLevels": ["safe", "confirm", "blocked"],
  "riskTags": [
    "sensitive_data",
    "destructive",
    "external_effect",
    "privileged"
  ],
  "successSignalKinds": [
    "route.changed",
    "toast.contains",
    "element.state",
    "validation.none"
  ]
}
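
A consumer of such a document might first group the declared actions by risk level to see what it may run on its own. The sketch below parses a trimmed-down copy of the example (only `id` and `risk` are kept); `actionIdsByLevel` and `ActionSummary` are hypothetical names.

```typescript
type RiskLevel = "safe" | "confirm" | "blocked";

interface ActionSummary {
  id: string;
  risk: { level: RiskLevel };
}

// Trimmed-down copy of the example document: only the fields used here.
const documentJson = `{
  "actions": [
    { "id": "ui.activate", "risk": { "level": "safe" } },
    { "id": "ui.enterText", "risk": { "level": "safe" } },
    { "id": "video.create", "risk": { "level": "confirm", "tags": ["external_effect"] } }
  ]
}`;

/** Groups action IDs by their declared risk level. */
function actionIdsByLevel(actions: ActionSummary[]): Record<RiskLevel, string[]> {
  const grouped: Record<RiskLevel, string[]> = { safe: [], confirm: [], blocked: [] };
  for (const action of actions) {
    grouped[action.risk.level].push(action.id);
  }
  return grouped;
}

const parsed: { actions: ActionSummary[] } = JSON.parse(documentJson);
const grouped = actionIdsByLevel(parsed.actions);
console.log(grouped);
```

A production consumer would validate the parsed JSON before trusting its shape; the type annotation on `parsed` is only a compile-time claim.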

References:

  • [UIAP-CORE] UIAP Core v0.1
  • [RFC2119] Key words for use in RFCs to Indicate Requirement Levels, BCP 14

Security considerations:

  • Capability documents MAY contain security-relevant information (e.g. available admin actions). Access to capability information SHOULD be restricted to authorized agents.
  • Risk levels MUST be declared correctly; incorrect classification can lead to unintended actions.
  • Sensitive fields SHOULD be marked as such in the capability document.
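
One possible consequence of marking fields as sensitive is that their values are withheld before state is handed to an agent. The sketch below is an assumed policy, not something the model prescribes: `redactSensitive` is a hypothetical helper, and the choice of `textValue` and `numericValue` as the keys to strip is an assumption.

```typescript
// Only the state keys needed for this example.
interface UIState {
  visible?: boolean;
  sensitive?: boolean;
  textValue?: string;
  numericValue?: number;
}

// Keys assumed to carry user data in this sketch.
const VALUE_KEYS = ["textValue", "numericValue"] as const;

/** Returns a copy without value keys when the state is marked sensitive. */
function redactSensitive(state: UIState): UIState {
  if (state.sensitive !== true) return state;
  const copy: UIState = { ...state };
  for (const key of VALUE_KEYS) {
    delete copy[key];
  }
  return copy;
}

const passwordField: UIState = { visible: true, sensitive: true, textValue: "hunter2" };
const titleField: UIState = { visible: true, textValue: "Launch teaser" };

console.log(redactSensitive(passwordField));
console.log(redactSensitive(titleField));
```

The input object is left untouched, so the application can keep working with the full state while the agent only sees the redacted copy.
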
Version | Date | Changes
0.1 | 2026-03-27 | Initial draft