Network Display Protocol Design (axl-display)
Status: design — not started. Target: server in axl-sdk
(src/display/); client library hosted-build target, location TBD
when implementation starts (likely a sibling repo since axl-sdk is
freestanding UEFI, but no commitment yet).
This doc establishes the design before implementation; supersedes
the Phase R stub in ROADMAP.md.
Goal
A network protocol that lets a remote process drive a UEFI-side display server. The remote client says “fill rectangle, draw text, present buffer”; the UEFI side renders against its local GOP framebuffer. Mouse / keyboard / touch events flow the other way.
X-shaped (the client/server boundary is the network boundary by design), not Wayland-shaped (compositor-and-clients-share-memory is the wrong model when the parties are on different machines).
The protocol is extension-first: a small core covers basic remote rendering and input; everything else — including the screen-mirror (“VNC-shape”) capability — slots in as a named extension. Designed-in extensibility is the X11 lesson worth keeping; the 1980s opcode-allocation mechanics are not.
Non-goals
Not Wayland-shape. Buffer-passing over a network is the wrong bandwidth/latency profile.
Not OpenGL / GLX. UEFI has no GPU driver stack; remote-3D is not a use case worth designing for.
Not an OS-level windowing system. No multi-app coexistence on the same display; pre-boot UEFI typically has one app foregrounded.
Not a font server. Fonts ship with the server (axl-gfx’s built-in AxlFont table) or are uploaded by the client; no separate font-service protocol.
Not transport-agnostic. TCP only for v0.1 — that’s what UEFI has. Unix sockets / shared memory are not in the picture.
Layering
Remote process (Linux / macOS / Windows / another UEFI host)
│
▼
libaxl-display
│ protocol over TCP
▼
axl-display (axl-sdk)
│ │ │
▼ ▼ ▼
axl-gfx axl-input axl-loop
│
▼
UEFI GOP + Simple Pointer + Simple Text Input
Server side lives in src/display/ as a peer to src/gfx/ and
src/input/. Runs on axl-loop, draws via axl-gfx, reads input
via axl-input. Adds nothing to those substrates — same
discipline rule as AGT
(AGT-Design.md §”Substrate discipline rules”).
Client side is a normal Linux / macOS / Windows shared library — hosted code, can’t share axl-sdk’s freestanding-UEFI build. When implementation starts, the client probably lives outside axl-sdk proper (sibling repo, or its own subtree with a host toolchain); the decision is deferred until R1a actually needs it. Reference language is C; idiomatic bindings (Python, Rust, C++) sit on top of the C ABI in the same location.
Names follow X11’s role split — server / client library / reference tool — without inventing a protocol acronym:
axl-display — the UEFI server (binary: axl-display.efi)
libaxl-display — the host-side client library
axl-display-viewer — reference CLI tool that connects + opens a host window mirroring the remote framebuffer (Phase R6)
The wire format itself is “the axl-display protocol” — no acronym, mirroring how “the Redis protocol” or “the HTTP/2 protocol” is referenced without a TLA.
Toolkits that target axl-display (e.g. AGT’s hypothetical remote-
backend) sit on top of libaxl-display, not on top of the raw wire
format.
Wire protocol
Framing
Length-prefixed TLV records on a single bidirectional TCP stream. Each record:
+---------+---------+---------+---------+
| length (u32) | type-tag (u16) |
+---------+---------+---------+---------+
| flags (u16) | payload … |
+---------+---------+---------+---------+
length covers the whole record including header.
type-tag identifies the request / event / reply kind.
flags reserved for per-record metadata (compression hint, fragment indicator, reply-expected bit).
Records are sent without explicit per-record acknowledgement; the transport is request-stream + reply-stream + event-stream multiplexed onto one TCP connection, with reply records correlated to requests by a client-supplied sequence number in the payload (X11 shape).
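As a rough illustration of the 8-byte header described above, here is a minimal encode/decode sketch. The names (`AxdHeader`, `axd_header_encode`, `axd_header_decode`) are hypothetical, and little-endian byte order is an assumption made only for this example; the design says byte order is negotiated in the handshake.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the field order follows the record diagram. */
enum { AXD_HEADER_SIZE = 8 };

typedef struct {
    uint32_t length;   /* whole record, header included */
    uint16_t type_tag; /* request / event / reply kind */
    uint16_t flags;    /* reserved per-record metadata */
} AxdHeader;

/* Serialize a header into out[0..7]. Little-endian is assumed here. */
static void axd_header_encode(const AxdHeader *h, uint8_t out[AXD_HEADER_SIZE])
{
    assert(h != NULL && h->length >= AXD_HEADER_SIZE);
    out[0] = (uint8_t)(h->length);
    out[1] = (uint8_t)(h->length >> 8);
    out[2] = (uint8_t)(h->length >> 16);
    out[3] = (uint8_t)(h->length >> 24);
    out[4] = (uint8_t)(h->type_tag);
    out[5] = (uint8_t)(h->type_tag >> 8);
    out[6] = (uint8_t)(h->flags);
    out[7] = (uint8_t)(h->flags >> 8);
}

/* Parse a header. Returns 0 on success, -1 if fewer than 8 bytes are
 * available or the declared length is smaller than the header itself. */
static int axd_header_decode(const uint8_t *in, size_t n, AxdHeader *h)
{
    if (n < AXD_HEADER_SIZE)
        return -1;
    h->length = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
                ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    h->type_tag = (uint16_t)(in[4] | (in[5] << 8));
    h->flags = (uint16_t)(in[6] | (in[7] << 8));
    return h->length >= AXD_HEADER_SIZE ? 0 : -1;
}
```

A stream reader would call `axd_header_decode` once 8 bytes are buffered, then wait for `length - 8` further payload bytes before dispatching the record.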
Type-tag namespace
Type-tags are interned strings. At connection setup the client and
server exchange a list of (namespace, name) pairs they intend to
use; each side returns a u16 tag the other should use thereafter.
Tag 0 is reserved for the framework itself
(QueryExtension, ListExtensions, RegisterTag, error reply).
core/poly_fill_rect → tag 17 (assigned by server at handshake)
core/draw_text → tag 18
mirror/get_region → tag 64 (after QueryExtension("mirror"))
mirror/damage_event → tag 65
Why string-namespaced, not opcode-allocated: X11’s 1-byte opcode range (128–255 reserved for extensions) is a 1980s memory constraint. With ~16 bytes of one-time setup cost, extensions get a flat 65 535-tag space and no central allocation authority. After the handshake, the per-record cost on the wire is identical to a fixed-opcode design.
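The interning step can be sketched as a small two-way table. This is an illustration under assumptions, not the planned implementation: `AxdTagTable`, `axd_tag_intern`, and `axd_tag_name` are hypothetical names, the table is a fixed-size linear list, and tags are handed out sequentially from 1 (the tag values 17, 18, 64 shown above would come from whatever policy the real server uses).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { AXD_MAX_TAGS = 64, AXD_FIRST_TAG = 1 }; /* tag 0 belongs to the framework */

typedef struct {
    const char *names[AXD_MAX_TAGS]; /* "namespace/name" strings, caller-owned */
    uint16_t count;
} AxdTagTable;

/* Return the tag for name, assigning the next free one on first sight.
 * Returns 0 (the reserved framework tag) when the table is full. */
static uint16_t axd_tag_intern(AxdTagTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (uint16_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return (uint16_t)(AXD_FIRST_TAG + i);
    if (t->count == AXD_MAX_TAGS)
        return 0;
    t->names[t->count] = name;
    return (uint16_t)(AXD_FIRST_TAG + t->count++);
}

/* Reverse lookup for decoding incoming records; NULL if the tag is unassigned. */
static const char *axd_tag_name(const AxdTagTable *t, uint16_t tag)
{
    if (tag < AXD_FIRST_TAG || tag >= AXD_FIRST_TAG + t->count)
        return NULL;
    return t->names[tag - AXD_FIRST_TAG];
}
```

After the handshake only the u16 travels in each record header, so the string comparison cost is paid once per name.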
Request / event / reply correlation
Each request carries a u32 sequence number. Replies and errors
echo it; events carry sequence 0. This is the X11 model; clients
chain requests asynchronously and match replies as they arrive.
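On the client side, matching replies to requests could look roughly like the sketch below. All names (`AxdClient`, `axd_expect_reply`, `axd_dispatch_reply`, `axd_store_seq`) are hypothetical, and the fixed-size pending table is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { AXD_MAX_PENDING = 32 };

typedef void (*AxdReplyFn)(uint32_t seq, const void *payload, void *user);

typedef struct {
    uint32_t seq;   /* 0 marks a free slot, since events own sequence 0 */
    AxdReplyFn fn;
    void *user;
} AxdPending;

typedef struct {
    AxdPending slots[AXD_MAX_PENDING];
    uint32_t next_seq;
} AxdClient;

/* Register a reply handler and return the sequence number to put in the
 * request, or 0 if too many requests are already in flight. */
static uint32_t axd_expect_reply(AxdClient *c, AxdReplyFn fn, void *user)
{
    assert(c != NULL && fn != NULL);
    for (size_t i = 0; i < AXD_MAX_PENDING; i++) {
        if (c->slots[i].seq == 0) {
            if (++c->next_seq == 0) /* never hand out 0 on wraparound */
                c->next_seq = 1;
            c->slots[i] = (AxdPending){ c->next_seq, fn, user };
            return c->next_seq;
        }
    }
    return 0;
}

/* Route a reply or error that echoes seq. Returns 1 if a handler ran,
 * 0 if seq was 0 (an event) or matches no pending request. */
static int axd_dispatch_reply(AxdClient *c, uint32_t seq, const void *payload)
{
    if (seq == 0)
        return 0;
    for (size_t i = 0; i < AXD_MAX_PENDING; i++) {
        if (c->slots[i].seq == seq) {
            AxdPending p = c->slots[i];
            c->slots[i].seq = 0; /* free the slot before calling back */
            p.fn(seq, payload, p.user);
            return 1;
        }
    }
    return 0;
}

/* Example handler: records the sequence number it was called with. */
static void axd_store_seq(uint32_t seq, const void *payload, void *user)
{
    (void)payload;
    *(uint32_t *)user = seq;
}
```

Replies may arrive in any order relative to other requests still in flight; the table lookup is what lets the client keep sending without waiting.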
Resource model
Drawables, graphics contexts, and uploaded resources (fonts,
glyphs, pixmaps) are referenced by client-allocated 32-bit IDs.
Each session gets an ID range ((client_id << 24) | resource_id)
to prevent collisions across multiple clients. X11’s resource-ID
discipline carries over verbatim — it’s a good answer.
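The ID formula above implies an 8-bit client ID and a 24-bit per-client resource number. A minimal sketch of packing and of the server-side ownership check follows; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define AXD_RESOURCE_BITS 24u
#define AXD_RESOURCE_MASK ((1u << AXD_RESOURCE_BITS) - 1u)

/* Build a wire ID: (client_id << 24) | resource_id. */
static uint32_t axd_make_id(uint8_t client_id, uint32_t resource_id)
{
    assert(resource_id <= AXD_RESOURCE_MASK);
    return ((uint32_t)client_id << AXD_RESOURCE_BITS) | resource_id;
}

/* Extract the client ID that owns a wire ID. */
static uint8_t axd_id_owner(uint32_t id)
{
    return (uint8_t)(id >> AXD_RESOURCE_BITS);
}

/* Server-side check: a session may only allocate IDs inside its own range. */
static int axd_id_in_range(uint32_t id, uint8_t session_client_id)
{
    return axd_id_owner(id) == session_client_id;
}
```

Because the client picks IDs itself, a create request needs no reply: the server only has to reject IDs outside the session's range or already in use.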
| Resource | Maps to | Notes |
|---|---|---|
| Drawable | | The screen is drawable-id 0 |
| GC | server-side state | Color, font, clip stack |
| Font | | Built-in name OR uploaded glyphs |
| Pixmap | | Same as drawable; offscreen targets |
A graphics context (GC) is server-side mutable state: color, font,
clip, draw target. Requests don’t repeat these on every call —
clients update the GC, then send draws that reference it. Saves
bytes on the wire and mirrors axl-gfx’s native call shape.
Core drawing requests
Mechanical serialization of <axl/axl-gfx.h>, split across two
phases by complexity:
Phase R1a — minimal interactive subset:
| Request | axl-gfx call | Notes |
|---|---|---|
| | server-side | initial defaults |
| | server-side GC update | |
| | server-side GC update | built-in fonts only in R1a |
| | | batched |
| | | |
| | | full-screen flush |
R1a targets the screen drawable only; no off-screen buffers, no clipping, no full GC. Just enough to land an end-to-end “Hello world from Linux to QEMU UEFI” client demo.
Phase R1b — full drawing surface:
| Request | axl-gfx call | Notes |
|---|---|---|
| | | off-screen buffer |
| | | |
| | | |
| | | flat in protocol, stack server-side |
| | | |
| | | outlines |
| | | |
| | | reply-bearing |
| | | |
Each request is two-way only if it returns data (get_image); fire-
and-forget otherwise. Errors are asynchronous reply records that
reference the offending request’s sequence number.
Input events (Phase R2)
Server→client event records carrying serialized AxlInputEvent.
Subscription is per-session: client sends select_input(mask) once
with the event-type bitmask it wants, and the server pushes matching
events as axl-input produces them.
Subscription bits map 1:1 to AxlInputType (mouse_move, key_down,
touch_*, etc.). No coordinate translation server-side — the client
sees device-native coordinates and rescales itself, matching axl-
input’s existing contract.
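The per-session subscription amounts to a bitmask test before each push. In the sketch below the event kinds are hypothetical stand-ins for AxlInputType (the real values come from axl-input), as are the `AxdSession` type and function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event kinds standing in for AxlInputType. */
typedef enum {
    AXD_EV_MOUSE_MOVE = 0,
    AXD_EV_KEY_DOWN   = 1,
    AXD_EV_KEY_UP     = 2,
    AXD_EV_TOUCH_DOWN = 3
} AxdEventType;

/* One subscription bit per event type. */
#define AXD_EV_BIT(type) (1u << (unsigned)(type))

typedef struct {
    uint32_t input_mask; /* events this session asked for */
} AxdSession;

/* select_input(mask): replace the session's subscription. */
static void axd_select_input(AxdSession *s, uint32_t mask)
{
    assert(s != NULL);
    s->input_mask = mask;
}

/* Should an event of this type be pushed to the session? */
static int axd_wants_event(const AxdSession *s, AxdEventType type)
{
    return (s->input_mask & AXD_EV_BIT(type)) != 0;
}
```

With several sessions connected, the server would run `axd_wants_event` once per session for each event axl-input produces.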
Extension framework
A single core mechanism, used the same way by every extension:
1. Client: query_extension("mirror")
2. Server: extension_info{ name="mirror", version=1, base_tag=64, ok=true }
3. Client: register_tag("mirror/get_region") → server returns tag 64
4. Client: register_tag("mirror/damage_event") → server returns tag 65
5. Client uses tags 64/65 in subsequent records
Extensions are compiled into the server build at link time; runtime
extension loading is out of scope for v0.1 (UEFI has no dlopen).
query_extension returns ok=false when an extension isn’t
present, and clients are responsible for graceful degradation.
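Since extensions are fixed at link time, the server side of query_extension can be a lookup in a static table. The sketch below assumes a hypothetical `AxdExtensionInfo` struct mirroring the fields in the exchange above; the single table entry is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;
    uint16_t version;
    uint16_t base_tag;
    int ok; /* 0 means the extension is not present */
} AxdExtensionInfo;

/* Extensions linked into this server build (illustrative entry). */
static const AxdExtensionInfo axd_builtin_extensions[] = {
    { "mirror", 1, 64, 1 },
};

/* query_extension: always produces a reply; ok=0 when the name is unknown. */
static AxdExtensionInfo axd_query_extension(const char *name)
{
    size_t n = sizeof axd_builtin_extensions / sizeof axd_builtin_extensions[0];
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(axd_builtin_extensions[i].name, name) == 0)
            return axd_builtin_extensions[i];
    return (AxdExtensionInfo){ name, 0, 0, 0 };
}
```

A client would check `ok` before registering any of the extension's tags and fall back to core requests when it is 0.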
Worked example: mirror extension (Phase R4)
The “VNC-shape” / screen-capture use case modeled as an extension:
| Request / event | Maps to | Notes |
|---|---|---|
| | | snapshot |
| | server-side dirty tracker | |
| | server → client | dirty notification |
| | | mouse/key replay |
A client that only wants RFB-style behavior registers mirror,
never touches core drawing, and gets the screen-mirror experience.
A client that only wants remote drawing registers core, never
touches mirror, and gets the X-shape experience. Both run against
the same server simultaneously.
This is why the extension framework matters from day 1 — without it, the two use cases would be two different protocols.
Future extensions (sketches)
Listed for design pressure on the core, not committed to:
font/upload(glyphs[]) — client-side fonts uploaded as AxlGlyph arrays; server constructs an AxlFont and assigns a font-ref.
cursor/set(image, hotspot) — software cursor for displays without hardware cursor support.
shm/import_buffer(...) — local-only (Unix socket transport) zero-copy pixel transfer. Not applicable to UEFI but reserved in case the client lib someday targets a different server.
Authentication & security
UEFI is pre-boot. The threat model is not desktop X11’s; it’s closer to BMC virtual-console. Anyone who can speak to the server can inject keystrokes into the firmware console — that’s the high-value capability the protocol exposes. Network-attached input injection is the whole point of the protocol AND its primary risk.
Layered options, simplest first
Layer 0 — bind discipline (v0.1 default). Server binds to
127.0.0.1 by default. Network exposure requires explicit
--bind <iface> (e.g. --bind 0.0.0.0 or --bind eth0). When
bound to a non-loopback interface, an optional --allowlist <ip,…>
restricts accepted source IPs. No in-protocol auth at this layer —
intended for management-network / BMC-sideband deployments where the
carrier is already restricted. Defaults shape: same as PostgreSQL or
Redis (localhost-only out of the box). Operators who expose to the
network are making an explicit choice.
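The accept-time decision for Layer 0 can be sketched as below. The config struct and names are hypothetical, addresses are IPv4 in host byte order for simplicity, and exact-match entries are an assumption (the design does not say whether the allowlist accepts CIDR ranges).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from its four octets. */
#define AXD_IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

typedef struct {
    uint32_t bind_addr;        /* 127.0.0.1 unless --bind was given */
    const uint32_t *allowlist; /* NULL means no --allowlist was given */
    size_t allow_count;
} AxdListenConfig;

/* Decide whether to accept a connection from src. */
static int axd_accept_source(const AxdListenConfig *cfg, uint32_t src)
{
    assert(cfg != NULL);
    if (cfg->allowlist == NULL)
        return 1; /* no allowlist: the bind address is the only gate */
    for (size_t i = 0; i < cfg->allow_count; i++)
        if (cfg->allowlist[i] == src)
            return 1;
    return 0;
}
```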
Layer 1 — MIT-MAGIC-COOKIE-shape (Phase R5). Server generates a 128-bit random cookie at startup, stores it in NVRAM, prints it prominently on the console boot log for out-of-band retrieval, and exposes a runtime rotation primitive (admin regenerates; old cookie invalidated). Client must present a matching cookie in the connection-setup record. Defeats casual scanning, but the cookie crosses the wire in the clear, so it gives no protection against passive sniffing or active MITM; the channel itself must be trusted. Same security profile as X11 over an unencrypted socket. Cost: trivial server-side code. Recommended for network-bound deployments once R5 ships.
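The server-side cookie check is a 16-byte comparison. The sketch below compares without an early exit so that timing does not reveal how many leading bytes matched; that choice is an assumption of this example, since the design does not specify how the comparison is done. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { AXD_COOKIE_BYTES = 16 }; /* 128 bits */

/* Returns 1 when the presented cookie equals the expected one.
 * Every byte is always examined, regardless of where a mismatch occurs. */
static int axd_cookie_matches(const uint8_t expected[AXD_COOKIE_BYTES],
                              const uint8_t presented[AXD_COOKIE_BYTES])
{
    uint8_t diff = 0;
    assert(expected != NULL && presented != NULL);
    for (size_t i = 0; i < AXD_COOKIE_BYTES; i++)
        diff |= (uint8_t)(expected[i] ^ presented[i]);
    return diff == 0;
}
```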
Layer 2 — PSK challenge-response (Phase R5+ when consumer asks). Server holds a long-lived secret (provisioned via firmware setup or NVRAM). Client proves knowledge via HMAC challenge-response without sending the secret on the wire. SSH-host-key shape. Defeats passive sniffing of the cookie. Cost: HMAC-SHA-256 (axl-digest already has this) + a few hundred lines for the handshake.
Layer 3 — mTLS (deferred indefinitely). axl-tls exists (opt-
in), so this is buildable, but UEFI cert provisioning, expiration
handling, and revocation are operational nightmares. Don’t promise
this until a consumer specifically asks AND brings their own cert
management story.
Per-session resource limits
Independent of authentication: a connected client can exhaust UEFI memory by allocating drawables. v0.1 caps:
Drawables per session: 16
Total pixel memory per session: 64 MiB
GC count per session: 64
Request rate cap: 1000 records/sec (anti-busy-loop)
Limits are server-side configurable and refusal is signalled via the existing error-reply mechanism.
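A sketch of how the drawable and pixel-memory caps might be enforced when a client asks for a new drawable. The struct and function names are hypothetical; the default values are the v0.1 caps listed above. The rate cap is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t max_drawables;
    uint64_t max_pixel_bytes;
    uint32_t max_gcs;
    uint32_t max_records_per_sec;
} AxdLimits;

/* v0.1 defaults: 16 drawables, 64 MiB, 64 GCs, 1000 records/sec. */
static const AxdLimits AXD_DEFAULT_LIMITS = { 16, 64ull * 1024 * 1024, 64, 1000 };

typedef struct {
    uint32_t drawables;   /* drawables currently allocated */
    uint64_t pixel_bytes; /* pixel memory currently allocated */
} AxdUsage;

/* Account for a new w x h drawable. Returns 1 and updates usage if it
 * fits; returns 0 (the server would send an error reply) otherwise. */
static int axd_reserve_drawable(const AxdLimits *lim, AxdUsage *use,
                                uint32_t w, uint32_t h, uint32_t bytes_per_pixel)
{
    uint64_t pixels = (uint64_t)w * h;
    assert(lim != NULL && use != NULL);
    assert(use->pixel_bytes <= lim->max_pixel_bytes);
    if (bytes_per_pixel == 0 || use->drawables >= lim->max_drawables)
        return 0;
    /* Divide instead of multiplying so a huge request cannot overflow. */
    if (pixels > (lim->max_pixel_bytes - use->pixel_bytes) / bytes_per_pixel)
        return 0;
    use->drawables++;
    use->pixel_bytes += pixels * bytes_per_pixel;
    return 1;
}
```

Freeing a drawable would decrement both counters by the same amounts.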
Input injection auditing
Every key/pointer event the server synthesizes via the protocol is
emittable to axl-log (config-gated, off by default since pre-boot
log persistence is unreliable). When on, downstream tooling can
distinguish “user pressed key” from “remote client injected key” in
post-mortem analysis.
Threats considered but not mitigated in v0.1
Eavesdropping — Layers 0, 1, and 2 don’t encrypt traffic (Layer 2 only keeps the secret itself off the wire). Move to Layer 3 when there’s a consumer who needs it.
Active MITM — Layer 0/1/2 don’t authenticate the server. TLS solves it; pre-shared server fingerprint is a poorer alternative.
Compromised UEFI firmware speaking the protocol — out of scope; pre-boot firmware integrity is Secure Boot’s job, not ours.
Phased plan
Phases are sized to land independently. Each one results in something testable on its own. End-to-end interactivity (drawing + input) lands by R2 so feedback-rich iteration starts as early as possible.
Phase R0 — wire framing + handshake + framework
<axl/axl-display.h> declarations, src/display/ skeleton. All the
machinery that R1+ shouldn’t have to invent.
TLV framing (encode / decode round-trip tests)
Connection handshake — exchanges byte order, protocol-version range, and a server capability record:
max record size accepted
framebuffer dimensions + supported pixel formats
server software version
extension list (built-in)
Tag-interning mechanism (register_tag, lookup tables both ways)
Async sequence-number-correlated reply machinery
Error-reply framing — common record type, numeric code + symbolic name + sequence-number reference + extension-specific payload area. Code space partitioned by namespace (core/*, <extension>/*) so extensions don’t collide.
Graceful disconnect — explicit bye record vs. raw socket close; server distinguishes “client done” from “client crashed” for logging and per-session cleanup.
Layer-0 auth: default bind to 127.0.0.1; --bind <iface> opens to the network; optional --allowlist <ip,…> for network binds.
Server binds to an axl-loop source; one session = one TCP connection; per-session state owned by the loop source.
Test harness: tiny in-tree client that does handshake + list-extensions + clean disconnect.
No drawing yet. The win is “you can connect to a UEFI box, exchange capabilities, and disconnect cleanly — and the wire format pins all the framework concerns the later phases would otherwise re-invent.”
Phase R1a — minimal drawing (end-to-end demo)
Smallest viable serialization of axl-gfx: fill_rect, draw_text,
present_screen, plus the GC color + font setters. Screen drawable
only, no clip, no buffers.
GC server-side state (color + font, R1a subset)
Resource ID allocation + collision detection (for GCs and the font-ref namespace, even though R1a only uses built-in fonts)
Per-session resource limits (GCs, request rate)
Client-library prototype (C, hosted build): connect + sample app draws “Hello from axl-display” to a remote UEFI box
Visual proof: client running on Linux draws to a QEMU UEFI display via the protocol.
Phase R2 — input event channel
Pull input forward of full drawing — the next interactive piece is more valuable than the rest of the drawing surface.
Server-side axl-input integration; subscription via select_input(mask) bitmask
Event-record framing on the server-event side
Per-session input subscription state (multi-client safe)
Client-library callback API for event delivery
Roundtrip test: client clicks mouse, server reports the event; client presses key, server reports the keycode
Phase R1b — full drawing surface
The rest of axl-gfx not in R1a: off-screen drawables / target
switching, full clip stack, line / polyline / rect-outline,
put_image / get_image, multi-drawable present.
create_drawable / free_drawable / target
set_gc_clip flat-in-protocol → push/pop stack server-side
Remaining draw primitives + the reply-bearing get_image
Per-session resource limits extended (drawable count, pixel memory cap)
Client-library API extended; sample app demonstrates back-buffer compositing over the wire
Phase R3 — first non-trivial extension
Proves the extension framework works end-to-end with something
useful AND not as fragile as mirror. Candidate: font
extension — client uploads AxlGlyph[] + font metadata; server
constructs an AxlFont and returns a font-ref usable in
set_gc_font. Forces the framework to handle extension-allocated
resources and extension-specific replies.
Phase R4 — mirror extension (VNC-shape capability)
The future-option-1 use case as a proper extension.
mirror/get_region — wraps axl_gfx_capture
Server-side dirty-rect tracker (instrumented inside drawing primitives — fill_rect / draw_text etc. accumulate dirty regions when a mirror subscription is active; zero overhead when no subscriber). Implementation strategy (always-on cost vs. hook-pointer installed on first subscribe) is an open question.
mirror/damage_event — pushed to subscribers on present / timeout
mirror/inject_input — synthesizes AxlInputEvent via axl-input’s yet-to-be-designed injection API (Phase R4 includes adding that to axl-input; substrate-discipline-compatible because it’s a pure C primitive).
Reference client: a minimal RFB-shape viewer that connects, gets a mirror, and reports input back. Not a full RFB bridge in v0.1 — that’s downstream work.
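The hook-pointer strategy mentioned for the dirty-rect tracker could be sketched as follows. Everything here is hypothetical: the names, the global hook, and the single-rectangle "accumulator", which merely stands in for a real dirty-region structure.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, w, h; } AxdRect;
typedef void (*AxdDamageHook)(const AxdRect *r);

static AxdDamageHook axd_damage_hook;   /* NULL while nobody is subscribed */
static unsigned axd_mirror_subscribers;
static AxdRect axd_last_damage;         /* stand-in for a dirty-region accumulator */

/* The hook body the mirror extension installs. */
static void axd_record_damage(const AxdRect *r)
{
    axd_last_damage = *r;
}

/* First subscriber installs the hook. */
static void axd_mirror_subscribe(void)
{
    if (axd_mirror_subscribers++ == 0)
        axd_damage_hook = axd_record_damage;
}

/* Last unsubscribe clears the hook again. */
static void axd_mirror_unsubscribe(void)
{
    assert(axd_mirror_subscribers > 0);
    if (--axd_mirror_subscribers == 0)
        axd_damage_hook = NULL;
}

/* What a drawing primitive would call after touching pixels:
 * a single NULL test when no mirror client is subscribed. */
static void axd_note_damage(int x, int y, int w, int h)
{
    if (axd_damage_hook != NULL) {
        AxdRect r = { x, y, w, h };
        axd_damage_hook(&r);
    }
}
```

The cost with no subscriber is one pointer comparison per primitive, which is the trade-off to weigh against always-on tracking.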
Phase R5 — authentication hardening
Two layers shipped, plus operational hygiene.
Layer 1 — MIT-MAGIC-COOKIE-shape: server generates a 128-bit cookie at startup; client presents matching cookie in connection setup. Cookie stored in NVRAM and printed prominently on console boot so admins can retrieve it out-of-band.
Layer 2 — PSK challenge-response (HMAC-SHA-256): designed + stubbed. Full implementation when a consumer asks.
Session idle timeout: configurable; disconnect after N minutes of no traffic. Cookie remains valid; client reconnects.
Cookie rotation primitive: admin can regenerate at runtime; old cookie invalidated immediately.
Input-injection audit log (axl-log, opt-in): every protocol-injected AxlInputEvent is logged with session id so post-mortem can distinguish remote injection from local input.
Phase R6 — language bindings + tooling
libaxl-display 1.0 (C library polish)
Python bindings (ctypes or cffi)
Rust bindings (manual; bindgen if it cooperates)
axl-display-viewer reference CLI tool — connects + opens a window on the host display (GTK or SDL2) mirroring the remote UEFI framebuffer via the mirror extension
Scheduling vs. AGT
Phase R is independent of and parallel to AGT phases.
R does not block AGT: AGT v0.1 is local-only and doesn’t care whether anyone is also remote-driving the same display.
AGT does not block R: the protocol serializes axl-gfx, which is already shipped.
Practical sequencing: R is lower priority than Phase 1 (axlmm CPP1 validation) and Phase 2 (AGT bootstrap) because both of those have closer consumers. R picks up opportunistically or when a real consumer of remote display emerges.
Open questions
Backpressure. Client floods server with draws faster than GOP can keep up — TCP backpressure handles the network, but the server’s axl-loop dispatch can still build up. Need a server-side “high-water mark” that stops reading from the socket until the dispatch queue drains. Cleaner than per-request flow control.
Damage tracking overhead. Instrumenting every axl-gfx primitive with dirty-rect updates costs cycles even when no mirror subscriber exists. Two paths: (a) accept the cost always, (b) install a “damage hook” pointer that the mirror extension sets when first subscribed and clears on last unsubscribe. (b) is cleaner; (a) is simpler. Decide at R4 with benchmarks.
GC stacking vs. flat. X11 has flat GCs (every change is a set, no push/pop). axl-gfx has push/pop for clip. The protocol could expose either; flat is simpler but loses the clip-stack efficiency that AGT will want. Probably: flat GC for color/font, push/pop wrapper for clip via dedicated requests.
Compression. Extension hook reserved (flags field in TLV header) for per-record compression. Not in any planned phase; picked up if real-world bandwidth measurements demand it.
Multi-display. UEFI systems with multiple GOP handles (multi-output cards, multi-headed servers). v0.1 binds to the first GOP handle. Extension for multi-display selection is drafted only when a consumer needs it.
Authentication bootstrap. How does the cookie get from the UEFI server to a remote client the first time? Options: print on serial console, store in NVRAM readable by a UEFI shell helper that prints it on demand, push out-of-band via BMC. Probably need all three documented as deployment recipes.
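For the backpressure question, the high-water-mark idea can be sketched as a small state machine. The names are hypothetical, and the separate low-water mark (so reading does not flap on and off around one threshold) is an assumption added for this example; the text above only says reading resumes once the queue drains.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t queued;     /* decoded records waiting for dispatch */
    size_t high_water; /* stop reading at or above this depth */
    size_t low_water;  /* resume reading at or below this depth */
    int reading;       /* 1 while the socket is polled for input */
} AxdFlow;

/* Call after enqueueing or dispatching records. Returns 1 if the loop
 * should keep reading from the socket, 0 if it should pause. */
static int axd_flow_update(AxdFlow *f, size_t queued)
{
    assert(f != NULL && f->low_water < f->high_water);
    f->queued = queued;
    if (f->reading && queued >= f->high_water)
        f->reading = 0; /* unread bytes pile up; TCP pushes back on the client */
    else if (!f->reading && queued <= f->low_water)
        f->reading = 1;
    return f->reading;
}
```

While reading is paused the kernel receive buffer fills, the TCP window closes, and the client's sends block or fail with would-block, which is the backpressure the paragraph relies on.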