AxlCpu — CPU exceptions + feature detection

Two CPU-level facilities under one axl_cpu_* namespace:

Typed exception handling — a backend-neutral wrapper over EFI_CPU_ARCH_PROTOCOL.RegisterInterruptHandler. Register a callback for a typed AxlCpuExceptionKind and receive a layout-stable AxlCpuException register snapshot, without spelling EFI_*.

Instruction-set feature detection + SIMD dispatch: CPUID-based detection (cached) of the SSE family, FMA, and AVX/AVX2 on x86 (NEON on AArch64), plus axl_cpu_simd_tier() returning a single ordered value for kernel dispatch.

Note the firmware/UEFI specifics around SIMD state: the firmware already enables SSE state (its calling convention passes floats in XMM), so SSE/SSE3/SSE4 need no enabling — detection alone gates them. AVX adds YMM register state the firmware does not enable, so AVX/AVX2 trap with #UD until axl_cpu_enable_avx() performs the CPL0 CR4.OSXSAVE + XSETBV sequence (a UEFI app runs at CPL0, so it may). CR4/XCR0 are per-logical-processor — enable on each AP that runs AVX kernels, not just the BSP.

API Reference

Defines

AXL_CPU_ARCH_X64
AXL_CPU_ARCH_AA64
AXL_CPU_EXCEPTION_VERSION

Current AxlCpuException.version value emitted by the SDK. Bumped when the struct gains a new arm or extends an existing arm in a way that consumers care about. Pre-1.0 SDK: layout can move; consumers should test exc->version >= N before reading fields added in version N.

Typedefs

typedef void (*AxlCpuExceptionFn)(const AxlCpuException *exc, void *user)

CPU-exception callback signature.

Runs in firmware exception context — heap allocation, console I/O beyond axl_printf, and most SDK calls that allocate are unsafe. The callback typically captures register state into a pre-allocated buffer and either halts (for (;;) {}) or resumes via the firmware’s exception-return mechanism. The SDK does not return from the callback to user code.
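The capture-into-a-pre-allocated-buffer pattern described above could look roughly like the following. This is a minimal sketch, not SDK code: DemoException is a hypothetical stand-in for a few AxlCpuException base fields so the snippet builds without axl-cpu.h, and CrashRecord, g_crash, and capture_crash are illustrative names.

```c
#include <stdint.h>

/* Hypothetical stand-in for some AxlCpuException base fields, so this
 * sketch builds without axl-cpu.h. Real code uses the SDK type. */
typedef struct {
    int      kind;
    uint64_t fault_address;
    uint64_t instruction_ptr;
    uint64_t stack_ptr;
    uint64_t error_code;
} DemoException;

/* Pre-allocated crash record in static storage, so the handler never
 * touches the heap while running in exception context. */
typedef struct {
    int      valid;
    int      kind;
    uint64_t fault_address;
    uint64_t instruction_ptr;
    uint64_t stack_ptr;
    uint64_t error_code;
} CrashRecord;

static CrashRecord g_crash;

/* Copy the fields of interest into the record passed through `user`.
 * Plain stores only: no allocation and no I/O. */
static void capture_crash(const DemoException *exc, void *user)
{
    CrashRecord *rec = (CrashRecord *)user;

    rec->kind            = exc->kind;
    rec->fault_address   = exc->fault_address;
    rec->instruction_ptr = exc->instruction_ptr;
    rec->stack_ptr       = exc->stack_ptr;
    rec->error_code      = exc->error_code;
    rec->valid           = 1;  /* mark the record as populated */
}
```

A real callback would typically call something like capture_crash and then halt with for (;;) {}; the halt is left out here so the capture step can be exercised on its own.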

Enums

enum AxlCpuExceptionKind

Kinds of CPU exception a consumer can register a handler for.

axl-cpu.h:

Typed CPU exception handling. Backend-neutral abstraction over UEFI’s EFI_CPU_ARCH_PROTOCOL.RegisterInterruptHandler and the arch-tagged-union EFI_SYSTEM_CONTEXT that comes with it.

Consumers register a callback for a typed AxlCpuExceptionKind; the callback receives a layout-stable AxlCpuException with the full register snapshot translated from the architecture-specific EFI_SYSTEM_CONTEXT_* arm. No consumer code needs to spell EFI_* to monitor CPU exceptions.

Availability is gated by EFI_CPU_ARCH_PROTOCOL, which is present on conformant DXE firmwares and absent on some embedded / pre-DXE contexts. axl_cpu_register_exception returns AXL_ERR with a warning to log domain "cpu" if the protocol can’t be located; consumers handle that as “monitoring unavailable on this firmware” rather than silently going un-monitored.

static void on_crash(const AxlCpuException *exc, void *user) {
    (void)user;
    axl_printf("CRASH at 0x%lx, kind=%d\n",
               (unsigned long)exc->instruction_ptr, exc->kind);
    for (;;) {}  // halt
}

axl_cpu_register_exception(AXL_CPU_EXCEPTION_GP_FAULT, on_crash, NULL);

Per-kind availability differs by architecture; trying to register an unavailable kind returns AXL_ERR.

Values:

enumerator AXL_CPU_EXCEPTION_DIVIDE_ERROR

x64 #DE

enumerator AXL_CPU_EXCEPTION_DEBUG

x64 #DB

enumerator AXL_CPU_EXCEPTION_OVERFLOW

x64 #OF

enumerator AXL_CPU_EXCEPTION_BOUND

x64 #BR

enumerator AXL_CPU_EXCEPTION_INVALID_OPCODE

x64 #UD

enumerator AXL_CPU_EXCEPTION_DEVICE_NA

x64 #NM

enumerator AXL_CPU_EXCEPTION_DOUBLE_FAULT

x64 #DF

enumerator AXL_CPU_EXCEPTION_SEGMENT_NP

x64 #NP

enumerator AXL_CPU_EXCEPTION_STACK_FAULT

x64 #SS

enumerator AXL_CPU_EXCEPTION_GP_FAULT

x64 #GP

enumerator AXL_CPU_EXCEPTION_PAGE_FAULT

x64 #PF

enumerator AXL_CPU_EXCEPTION_FP_ERROR

x64 #MF

enumerator AXL_CPU_EXCEPTION_ALIGNMENT_CHECK

x64 #AC

enumerator AXL_CPU_EXCEPTION_SIMD

x64 #XM

enumerator AXL_CPU_EXCEPTION_SYNCHRONOUS

aa64 synchronous-exception umbrella

enumerator AXL_CPU_EXCEPTION_SERROR

aa64 SError

enumerator AXL_CPU_EXCEPTION_KIND_MAX

exclusive upper bound

enum AxlSimdTier

SIMD dispatch tier — a single ordered value naming the best usable kernel.

Monotonic: a higher value is a strict superset of the work a lower one can do, so a dispatcher picks the highest tier for which it has a kernel. AVX2 is only reported once axl_cpu_enable_avx has succeeded — querying the tier never changes CPU state on its own.

Values:

enumerator AXL_SIMD_SCALAR

no SIMD (not expected on our targets)

enumerator AXL_SIMD_BASELINE

128-bit: SSE2 (x86) or NEON (aarch64); always available

enumerator AXL_SIMD_SSE41

x86 SSE4.1 — 128-bit, richer pixel ops; detection only

enumerator AXL_SIMD_AVX2

x86 AVX2 — 256-bit; requires axl_cpu_enable_avx()
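Because the tiers are ordered, a dispatcher can walk down from the reported tier to the first one it has a kernel for. The sketch below illustrates that idea under stated assumptions: DemoSimdTier is a local mirror of AxlSimdTier (enumerator order taken from the list above, numeric values assumed), and the kernels and pick_sum_kernel are hypothetical names, not part of the SDK.

```c
#include <stddef.h>

/* Local mirror of AxlSimdTier so the sketch is self-contained. The order
 * follows the documentation; the numeric values are an assumption. */
typedef enum {
    DEMO_SIMD_SCALAR = 0,
    DEMO_SIMD_BASELINE,
    DEMO_SIMD_SSE41,
    DEMO_SIMD_AVX2,
    DEMO_SIMD_TIER_COUNT
} DemoSimdTier;

typedef int (*SumFn)(const int *v, size_t n);

/* Hypothetical kernels. Real ones would use per-tier intrinsics. */
static int sum_scalar(const int *v, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++) {
        s += v[i];
    }
    return s;
}

static int sum_baseline(const int *v, size_t n)
{
    return sum_scalar(v, n);  /* placeholder for a 128-bit kernel */
}

/* One slot per tier; NULL means no kernel was written for that tier. */
static const SumFn k_sum_kernels[DEMO_SIMD_TIER_COUNT] = {
    [DEMO_SIMD_SCALAR]   = sum_scalar,
    [DEMO_SIMD_BASELINE] = sum_baseline,
    /* SSE41 and AVX2 slots are intentionally left NULL in this sketch. */
};

/* Walk down from the reported tier to the first tier with a kernel. */
static SumFn pick_sum_kernel(DemoSimdTier tier)
{
    for (int t = (int)tier; t >= 0; t--) {
        if (k_sum_kernels[t] != NULL) {
            return k_sum_kernels[t];
        }
    }
    return sum_scalar;
}
```

In real code the argument would come from axl_cpu_simd_tier(), called after axl_cpu_enable_avx() if AVX2 kernels are wanted.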

Functions

int axl_cpu_register_exception(AxlCpuExceptionKind kind, AxlCpuExceptionFn cb, void *user)

Register an exception handler for kind.

Internally locates EFI_CPU_ARCH_PROTOCOL (cached after first call), maps kind onto the arch-specific EFI_EXCEPTION_TYPE, and registers a thunk that translates EFI_SYSTEM_CONTEXT into AxlCpuException before invoking cb.

A second axl_cpu_register_exception call for the same kind replaces the previous handler.

Returns:

AXL_OK on success; AXL_ERR if cb is NULL, kind is out of range, kind is not available on the current arch, or EFI_CPU_ARCH_PROTOCOL is not published.

int axl_cpu_unregister_exception(AxlCpuExceptionKind kind)

Unregister a previously-installed handler.

Safe to call on a kind that was never registered (no-op).

Returns:

AXL_OK on success, AXL_ERR if kind is out of range or EFI_CPU_ARCH_PROTOCOL is not published.

const AxlCpuFeatures *axl_cpu_features(void)

Query detected CPU features (cached after first call).

Pure detection — never changes CPU state. The returned pointer is to SDK-owned static storage valid for the program’s lifetime; never NULL.

Returns:

pointer to the cached feature set.

bool axl_cpu_enable_avx(void)

Enable AVX (YMM) register state so AVX/AVX2 instructions run without a #UD fault.

UEFI firmware enables SSE state (the calling convention needs XMM) but does not enable AVX state, so AVX instructions trap until a CPL0 caller sets CR4.OSXSAVE and the AVX bits in XCR0. A UEFI application runs at CPL0, so it may do this itself; this routine performs the sequence (CPUID-gated) once and is idempotent.

Per-logical-processor. CR4/XCR0 are per-CPU; code that runs AVX kernels on application processors (via MP services) must call this on each AP as well as the BSP.

No-op returning false when the CPU lacks AVX (or on non-x86), leaving axl_cpu_simd_tier at AXL_SIMD_SSE41/BASELINE.

Returns:

true if AVX is usable after the call (already-enabled counts), false if the CPU has no AVX to enable.

bool axl_cpu_enable_avx512(void)

Enable AVX-512 (opmask + ZMM) register state.

The AVX-512 counterpart to axl_cpu_enable_avx: sets CR4.OSXSAVE and the XCR0 bits for x87 + SSE + AVX and the AVX-512 state components (opmask, ZMM_Hi256, Hi16_ZMM), so EVEX-encoded AVX-512 instructions run without a #UD. Implies AVX enable. Gated on the CPU advertising AVX-512F and the XSAVE state components being supported; idempotent; per-logical-processor (same caveat as axl_cpu_enable_avx).

Note: axl_cpu_simd_tier tops out at AXL_SIMD_AVX2 — that is the widest tier AXL’s own kernels use. AVX-512 is exposed for consumers who write their own AVX-512 code: check avx512f et al. in axl_cpu_features, call this to enable, then run.

Returns:

true if AVX-512 is usable after the call (already-enabled counts), false if the CPU has no AVX-512 to enable (or non-x86).
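For readers who want to see which bits the two enable routines refer to, here is a small sketch of the XCR0 masks involved. The bit positions are the architectural ones; the helper names are illustrative, and the privileged steps themselves (writing CR4 and executing XSETBV) are deliberately omitted, so this is not the SDK's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions. */
#define CR4_OSXSAVE     (1ull << 18)  /* CR4.OSXSAVE: enables XSETBV/XGETBV */
#define XCR0_X87        (1ull << 0)
#define XCR0_SSE        (1ull << 1)   /* XMM state */
#define XCR0_AVX        (1ull << 2)   /* upper halves of YMM0..15 */
#define XCR0_OPMASK     (1ull << 5)   /* k0..k7 */
#define XCR0_ZMM_HI256  (1ull << 6)   /* upper 256 bits of ZMM0..15 */
#define XCR0_HI16_ZMM   (1ull << 7)   /* ZMM16..31 */

#define XCR0_AVX_MASK    (XCR0_X87 | XCR0_SSE | XCR0_AVX)
#define XCR0_AVX512_MASK (XCR0_AVX_MASK | XCR0_OPMASK | \
                          XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* Value to write to XCR0 to enable AVX state on top of `current`. */
static uint64_t xcr0_for_avx(uint64_t current)
{
    return current | XCR0_AVX_MASK;
}

/* Value to write to XCR0 to enable AVX-512 state (implies AVX). */
static uint64_t xcr0_for_avx512(uint64_t current)
{
    return current | XCR0_AVX512_MASK;
}

/* AVX is usable only when both SSE and AVX state are enabled. */
static bool xcr0_avx_enabled(uint64_t xcr0)
{
    return (xcr0 & (XCR0_SSE | XCR0_AVX)) == (XCR0_SSE | XCR0_AVX);
}

/* AVX-512 additionally needs opmask, ZMM_Hi256, and Hi16_ZMM. */
static bool xcr0_avx512_enabled(uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE | XCR0_AVX | XCR0_OPMASK |
                          XCR0_ZMM_HI256 | XCR0_HI16_ZMM;
    return (xcr0 & need) == need;
}
```

The idempotence noted above follows from the OR: applying either mask to a value that already contains it leaves the value unchanged.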

AxlSimdTier axl_cpu_simd_tier(void)

The best SIMD tier usable right now for kernel dispatch.

x86: AXL_SIMD_AVX2 if AVX2 is present and enabled (call axl_cpu_enable_avx first), else AXL_SIMD_SSE41 if SSE4.1 is present, else AXL_SIMD_BASELINE (SSE2, always). aarch64: AXL_SIMD_BASELINE (NEON). Does not change CPU state.

Returns:

the highest currently-usable AxlSimdTier.

struct AxlCpuException
#include <axl-cpu.h>

Architecture-neutral CPU exception context delivered to consumer callbacks.

The base fields (fault_address through error_code) are arch-neutral and always populated. The register snapshot lives in the regs union; consumers branch on arch to pick the correct arm.

ABI growth. struct_size is the byte count the SDK wrote for this instance (sizeof(AxlCpuException) at the SDK’s build time); version is AXL_CPU_EXCEPTION_VERSION. Consumers targeting a forward range of SDKs guard reads of late-added fields with

if (exc->struct_size >= offsetof(AxlCpuException, new_field) + sizeof(exc->new_field))

or if (exc->version >= <since-version>). Existing fields never move; the union arms grow append-only.
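The size guard can be wrapped in a small helper. The following sketch uses a hypothetical DemoRecord that follows the same size/version header convention; new_field and the helper names are placeholders, not real AxlCpuException members or SDK functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct using the same convention: a size/version header,
 * then fields, with new_field appended in a later version. */
typedef struct {
    uint32_t struct_size;
    uint32_t version;
    uint64_t instruction_ptr;
    uint64_t new_field;  /* illustrative late-added field */
} DemoRecord;

/* True when the producer wrote enough bytes to cover new_field. */
static bool demo_has_new_field(const DemoRecord *r)
{
    return r->struct_size >=
           offsetof(DemoRecord, new_field) + sizeof(r->new_field);
}

/* Read new_field if present, otherwise return the caller's fallback. */
static uint64_t demo_read_new_field(const DemoRecord *r, uint64_t fallback)
{
    return demo_has_new_field(r) ? r->new_field : fallback;
}
```

Checking struct_size rather than version keeps the guard tied to the bytes actually written, which is the property the read depends on.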

Field semantics:

  • fault_address — meaningful for memory-access faults (x64 #PF = CR2; aa64 sync = FAR_EL1). Zero for kinds where no fault address applies (#DE, #GP without memory, #UD, …).

  • error_code — exception-specific: x64 #PF / #GP / #DF / #NP / #SS / #AC carry an error code pushed by the CPU; for other kinds, 0. aa64 carries ESR_EL1 here so consumers can recover finer-grained classification on synchronous exceptions (EC field) without a separate accessor.
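As an illustration of what a consumer might do with error_code, the sketch below decodes the ESR_EL1 exception class on aa64 and the common x64 #PF error-code bits. The bit layouts are architectural; the helper and constant names are made up for this example and are not provided by the SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* aa64 ESR_EL1 layout: EC is bits [31:26], IL is bit 25, ISS is bits [24:0]. */
static unsigned esr_ec(uint64_t esr)
{
    return (unsigned)((esr >> 26) & 0x3F);
}

static uint32_t esr_iss(uint64_t esr)
{
    return (uint32_t)(esr & 0x1FFFFFF);
}

/* A few exception-class values (not exhaustive). */
enum {
    ESR_EC_INSN_ABORT_SAME_EL = 0x21,  /* instruction abort, same EL */
    ESR_EC_DATA_ABORT_SAME_EL = 0x25,  /* data abort, same EL */
    ESR_EC_BRK                = 0x3C   /* BRK instruction (AArch64) */
};

/* x64 #PF error-code bits pushed by the CPU. */
#define PF_PRESENT (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE   (1u << 1)  /* 0: read access, 1: write access */
#define PF_USER    (1u << 2)  /* access originated in user mode */
#define PF_RSVD    (1u << 3)  /* reserved bit set in a paging structure */
#define PF_IFETCH  (1u << 4)  /* fault on instruction fetch */

static bool pf_is_write(uint64_t error_code)
{
    return (error_code & PF_WRITE) != 0;
}

static bool pf_is_not_present(uint64_t error_code)
{
    return (error_code & PF_PRESENT) == 0;
}

static bool pf_is_ifetch(uint64_t error_code)
{
    return (error_code & PF_IFETCH) != 0;
}
```

A callback would branch on arch (and on kind for x64) before applying either decoder, since the same error_code field carries different encodings per architecture.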

Public Members

uint32_t struct_size

sizeof(AxlCpuException) as written by the SDK

uint32_t version

AXL_CPU_EXCEPTION_VERSION at emit time.

AxlCpuExceptionKind kind
int arch

AXL_CPU_ARCH_X64 / _AA64.

uint64_t fault_address

memory-fault address, or 0 if N/A

uint64_t instruction_ptr

RIP / ELR.

uint64_t stack_ptr

RSP / SP.

uint64_t frame_ptr

RBP / X29.

uint64_t error_code

exception-specific; aa64 carries ESR_EL1

uint64_t rax
uint64_t rbx
uint64_t rcx
uint64_t rdx
uint64_t rsi
uint64_t rdi
uint64_t rbp
uint64_t rsp
uint64_t r8
uint64_t r9
uint64_t r10
uint64_t r11
uint64_t r12
uint64_t r13
uint64_t r14
uint64_t r15
uint64_t rip
uint64_t rflags
uint64_t cr2

raw CR2 — meaningful only when kind == PAGE_FAULT

struct AxlCpuException x64
uint64_t x[31]

X0..X28, X29 (=FP), X30 (=LR)

uint64_t sp
uint64_t elr
uint64_t spsr
uint64_t esr

raw ESR_EL1 (also in error_code)

uint64_t far

raw FAR_EL1 (also in fault_address)

struct AxlCpuException aa64
union AxlCpuException regs
struct AxlCpuFeatures
#include <axl-cpu.h>

Detected CPU instruction-set features.

Filled once from CPUID (x86) on first query and cached. Fields for the other architecture are always false — read neon on aarch64, the x86 fields on x86. These report what the CPU can execute; for AVX, “can execute” still requires a one-time state enable (see axl_cpu_enable_avx) before the YMM registers are usable without a #UD fault.

Most consumers want axl_cpu_simd_tier (a single ordered value for kernel dispatch) rather than these individual bits.
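To make the "filled once from CPUID" step concrete, here is a sketch of how a handful of the x86 bits map onto CPUID registers. The bit positions are the architectural ones for leaf 1 ECX and leaf 7 (subleaf 0) EBX; DemoFeatures and demo_decode_features are hypothetical, cover only a subset of the fields, and are not claimed to match how the SDK fills AxlCpuFeatures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of feature flags, mirroring some AxlCpuFeatures field names. */
typedef struct {
    bool sse3, ssse3, sse41, sse42;
    bool fma, xsave, avx;
    bool avx2, avx512f;
} DemoFeatures;

static bool cpuid_bit(uint32_t reg, unsigned n)
{
    return ((reg >> n) & 1u) != 0;
}

/* leaf1_ecx: CPUID.01H:ECX
 * leaf7_ebx: CPUID.(EAX=07H, ECX=0):EBX
 * The caller is assumed to have executed CPUID and passed the raw values. */
static DemoFeatures demo_decode_features(uint32_t leaf1_ecx, uint32_t leaf7_ebx)
{
    DemoFeatures f;

    f.sse3    = cpuid_bit(leaf1_ecx, 0);
    f.ssse3   = cpuid_bit(leaf1_ecx, 9);
    f.fma     = cpuid_bit(leaf1_ecx, 12);
    f.sse41   = cpuid_bit(leaf1_ecx, 19);
    f.sse42   = cpuid_bit(leaf1_ecx, 20);
    f.xsave   = cpuid_bit(leaf1_ecx, 26);
    f.avx     = cpuid_bit(leaf1_ecx, 28);
    f.avx2    = cpuid_bit(leaf7_ebx, 5);
    f.avx512f = cpuid_bit(leaf7_ebx, 16);
    return f;
}
```

These bits only say what the CPU can execute; as noted above, avx, avx2, and avx512f still need the corresponding state enable before the instructions run without a #UD.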

Public Members

bool sse2

SSE2 (always present on x86-64; firmware enables XMM state)

bool sse3

SSE3.

bool ssse3

SSSE3 (PSHUFB — byte shuffles)

bool sse41

SSE4.1 (PMOVZX / PBLENDVB / ROUNDPS — no state enable needed)

bool sse42

SSE4.2 (also CRC32 instruction)

bool fma

FMA3 fused multiply-add.

bool xsave

XSAVE/XGETBV/XSETBV present (precondition for enabling AVX)

bool avx

AVX (256-bit) — usable only after axl_cpu_enable_avx()

bool avx2

AVX2 (256-bit integer) — usable only after enable.

bool avx512f

AVX-512 Foundation — usable only after axl_cpu_enable_avx512()

bool avx512dq

AVX-512 Doubleword/Quadword.

bool avx512bw

AVX-512 Byte/Word.

bool avx512vl

AVX-512 Vector Length extensions.

bool avx512cd

AVX-512 Conflict Detection.

bool avx512vnni

AVX-512 Vector Neural Network Instructions.

bool aes

AES-NI (AESENC/AESDEC …)

bool pclmulqdq

carry-less multiply (GHASH / GCM)

bool sha

SHA-NI (SHA1/SHA256 round instructions)

bool vaes

vectorized AES (VEX/EVEX-encoded AES on 256/512-bit)

bool vpclmulqdq

vectorized carry-less multiply

bool popcnt

POPCNT instruction.

bool bmi1

BMI1 (ANDN, BLSR, TZCNT …)

bool bmi2

BMI2 (BZHI, PDEP, PEXT, MULX …)

bool lzcnt

LZCNT (leading-zero count; ABM)

bool movbe

MOVBE (load/store with byte swap)

bool f16c

F16C (half-precision float <-> single convert)

bool adx

ADX (ADCX/ADOX multiprecision add)

bool rdrand

RDRAND (on-chip RNG)

bool rdseed

RDSEED (seed-grade on-chip RNG)

bool neon

AdvSIMD/NEON (always present on ARMv8-A baseline)

bool fp16

half-precision floating point (FEAT_FP16)

bool atomics

Large System Extensions (FEAT_LSE atomic instructions)

bool crc32

CRC32 instructions (FEAT_CRC32)

bool aes_a64

AES instructions (FEAT_AES)

bool pmull

polynomial multiply long (FEAT_PMULL — GHASH)

bool sha1

SHA1 instructions (FEAT_SHA1)

bool sha2

SHA-256 instructions (FEAT_SHA256)

bool sha512

SHA-512 instructions (FEAT_SHA512)

bool sha3

SHA3 instructions (FEAT_SHA3)

bool dotprod

dot-product instructions (FEAT_DotProd)

bool sve

Scalable Vector Extension (FEAT_SVE)