AxlCpu — CPU exceptions + feature detection
Two CPU-level facilities under one axl_cpu_* namespace:
Typed exception handling — a backend-neutral wrapper over
EFI_CPU_ARCH_PROTOCOL.RegisterInterruptHandler. Register a callback
for a typed AxlCpuExceptionKind and receive a layout-stable
AxlCpuException register snapshot, without spelling EFI_*.
Instruction-set feature detection + SIMD dispatch — CPUID-based
detection (cached) of the SSE family, FMA, and AVX/AVX2 on x86 (NEON on
AArch64), plus axl_cpu_simd_tier() returning a single ordered value
for kernel dispatch.
Note the firmware/UEFI specifics around SIMD state: the firmware
already enables SSE state (its calling convention passes floats in XMM),
so SSE/SSE3/SSE4 need no enabling — detection alone gates them. AVX adds
YMM register state the firmware does not enable, so AVX/AVX2 trap
with #UD until axl_cpu_enable_avx() performs the CPL0
CR4.OSXSAVE + XSETBV sequence (a UEFI app runs at CPL0, so it
may). CR4/XCR0 are per-logical-processor — enable on each AP
that runs AVX kernels, not just the BSP.
API Reference
Defines
-
AXL_CPU_ARCH_X64
-
AXL_CPU_ARCH_AA64
-
AXL_CPU_EXCEPTION_VERSION
Current AxlCpuException.version value emitted by the SDK. Bumped when the struct gains a new arm or extends an existing arm in a way that consumers care about. Pre-1.0 SDK: layout can move; consumers should test exc->version >= N before reading fields added in version N.
Typedefs
-
typedef void (*AxlCpuExceptionFn)(const AxlCpuException *exc, void *user)
CPU-exception callback signature.
Runs in firmware exception context — heap allocation, console I/O beyond
axl_printf, and most SDK calls that allocate are unsafe. The callback typically captures register state into a pre-allocated buffer and either halts (for (;;) {}) or resumes via the firmware’s exception-return mechanism. The SDK does not return from the callback to user code.
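The capture-then-halt pattern described above can be sketched as follows. This is a minimal illustration, not SDK code: `CrashRecord`, `g_crash`, `capture_crash` and `on_exception` are hypothetical names, and the `AxlCpuException` declared here is a cut-down stand-in that mirrors only a few documented fields so the sketch compiles on its own.

```c
#include <stdint.h>

/* Stand-in for the SDK type so the sketch compiles in isolation; real code
 * includes <axl-cpu.h> instead. Only the fields used below are mirrored. */
typedef struct {
    int      kind;
    uint64_t fault_address;
    uint64_t instruction_ptr;
    uint64_t stack_ptr;
    uint64_t error_code;
} AxlCpuException;

/* Hypothetical crash record. It lives in static storage so the handler
 * never has to allocate. */
typedef struct {
    int      valid;
    int      kind;
    uint64_t ip;
    uint64_t sp;
    uint64_t fault_address;
    uint64_t error_code;
} CrashRecord;

CrashRecord g_crash;

/* Copy the fields of interest. No allocation and no I/O, so this is the
 * kind of work that fits the exception-context restrictions. */
void capture_crash(const AxlCpuException *exc, CrashRecord *out)
{
    out->kind          = exc->kind;
    out->ip            = exc->instruction_ptr;
    out->sp            = exc->stack_ptr;
    out->fault_address = exc->fault_address;
    out->error_code    = exc->error_code;
    out->valid         = 1;   /* mark the record as filled */
}

/* Same shape as AxlCpuExceptionFn: capture into the buffer passed as
 * `user`, then halt. */
void on_exception(const AxlCpuException *exc, void *user)
{
    capture_crash(exc, (CrashRecord *)user);
    for (;;) { }              /* halt; never returns */
}
```

In real code, `&g_crash` would be passed as the `user` argument when registering `on_exception`, so the handler needs no global lookup.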
Enums
-
enum AxlCpuExceptionKind
Kinds of CPU exception a consumer can register a handler for.
axl-cpu.h:
Typed CPU exception handling. Backend-neutral abstraction over UEFI’s EFI_CPU_ARCH_PROTOCOL.RegisterInterruptHandler and the arch-tagged-union EFI_SYSTEM_CONTEXT that comes with it.
Consumers register a callback for a typed AxlCpuExceptionKind; the callback receives a layout-stable AxlCpuException with the full register snapshot translated from the architecture-specific EFI_SYSTEM_CONTEXT_* arm. No consumer code needs to spell EFI_* to monitor CPU exceptions.
Availability is gated by EFI_CPU_ARCH_PROTOCOL, which is present on conformant DXE firmwares and absent on some embedded / pre-DXE contexts. axl_cpu_register_exception returns AXL_ERR with a warning to log domain "cpu" if the protocol can’t be located; consumers handle that as “monitoring unavailable on this firmware” rather than silently going un-monitored.
    static void on_crash(const AxlCpuException *exc, void *user)
    {
        (void)user;
        axl_printf("CRASH at 0x%lx, kind=%d\n",
                   (unsigned long)exc->instruction_ptr, exc->kind);
        for (;;) {}  // halt
    }

    axl_cpu_register_exception(AXL_CPU_EXCEPTION_GP_FAULT, on_crash, NULL);
Per-kind availability differs by architecture; trying to register an unavailable kind returns AXL_ERR.
Values:
-
enumerator AXL_CPU_EXCEPTION_DIVIDE_ERROR
x64 #DE
-
enumerator AXL_CPU_EXCEPTION_DEBUG
x64 #DB
-
enumerator AXL_CPU_EXCEPTION_OVERFLOW
x64 #OF
-
enumerator AXL_CPU_EXCEPTION_BOUND
x64 #BR
-
enumerator AXL_CPU_EXCEPTION_INVALID_OPCODE
x64 #UD
-
enumerator AXL_CPU_EXCEPTION_DEVICE_NA
x64 #NM
-
enumerator AXL_CPU_EXCEPTION_DOUBLE_FAULT
x64 #DF
-
enumerator AXL_CPU_EXCEPTION_SEGMENT_NP
x64 #NP
-
enumerator AXL_CPU_EXCEPTION_STACK_FAULT
x64 #SS
-
enumerator AXL_CPU_EXCEPTION_GP_FAULT
x64 #GP
-
enumerator AXL_CPU_EXCEPTION_PAGE_FAULT
x64 #PF
-
enumerator AXL_CPU_EXCEPTION_FP_ERROR
x64 #MF
-
enumerator AXL_CPU_EXCEPTION_ALIGNMENT_CHECK
x64 #AC
-
enumerator AXL_CPU_EXCEPTION_SIMD
x64 #XM
-
enumerator AXL_CPU_EXCEPTION_SYNCHRONOUS
aa64 synchronous-exception umbrella
-
enumerator AXL_CPU_EXCEPTION_SERROR
aa64 SError
-
enumerator AXL_CPU_EXCEPTION_KIND_MAX
exclusive upper bound
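A crash handler usually wants a printable name for the kind it received. The table below is one way to map each enumerator to the mnemonic listed above. It is a sketch: `kind_names` and `exception_kind_name` are hypothetical helpers, the strings "SYNC" and "SERROR" are arbitrary labels, and the enum is redeclared here only so the block compiles without the SDK header.

```c
#include <stddef.h>

/* Stand-in for the SDK enum so the sketch compiles on its own; real code
 * includes <axl-cpu.h> and drops this declaration. */
typedef enum {
    AXL_CPU_EXCEPTION_DIVIDE_ERROR,
    AXL_CPU_EXCEPTION_DEBUG,
    AXL_CPU_EXCEPTION_OVERFLOW,
    AXL_CPU_EXCEPTION_BOUND,
    AXL_CPU_EXCEPTION_INVALID_OPCODE,
    AXL_CPU_EXCEPTION_DEVICE_NA,
    AXL_CPU_EXCEPTION_DOUBLE_FAULT,
    AXL_CPU_EXCEPTION_SEGMENT_NP,
    AXL_CPU_EXCEPTION_STACK_FAULT,
    AXL_CPU_EXCEPTION_GP_FAULT,
    AXL_CPU_EXCEPTION_PAGE_FAULT,
    AXL_CPU_EXCEPTION_FP_ERROR,
    AXL_CPU_EXCEPTION_ALIGNMENT_CHECK,
    AXL_CPU_EXCEPTION_SIMD,
    AXL_CPU_EXCEPTION_SYNCHRONOUS,
    AXL_CPU_EXCEPTION_SERROR,
    AXL_CPU_EXCEPTION_KIND_MAX
} AxlCpuExceptionKind;

/* Designated initializers key each entry by enumerator, so the table does
 * not depend on the numeric values the SDK assigns. */
static const char *const kind_names[AXL_CPU_EXCEPTION_KIND_MAX] = {
    [AXL_CPU_EXCEPTION_DIVIDE_ERROR]    = "#DE",
    [AXL_CPU_EXCEPTION_DEBUG]           = "#DB",
    [AXL_CPU_EXCEPTION_OVERFLOW]        = "#OF",
    [AXL_CPU_EXCEPTION_BOUND]           = "#BR",
    [AXL_CPU_EXCEPTION_INVALID_OPCODE]  = "#UD",
    [AXL_CPU_EXCEPTION_DEVICE_NA]       = "#NM",
    [AXL_CPU_EXCEPTION_DOUBLE_FAULT]    = "#DF",
    [AXL_CPU_EXCEPTION_SEGMENT_NP]      = "#NP",
    [AXL_CPU_EXCEPTION_STACK_FAULT]     = "#SS",
    [AXL_CPU_EXCEPTION_GP_FAULT]        = "#GP",
    [AXL_CPU_EXCEPTION_PAGE_FAULT]      = "#PF",
    [AXL_CPU_EXCEPTION_FP_ERROR]        = "#MF",
    [AXL_CPU_EXCEPTION_ALIGNMENT_CHECK] = "#AC",
    [AXL_CPU_EXCEPTION_SIMD]            = "#XM",
    [AXL_CPU_EXCEPTION_SYNCHRONOUS]     = "SYNC",
    [AXL_CPU_EXCEPTION_SERROR]          = "SERROR",
};

/* Return a short mnemonic for `kind`, or "?" for out-of-range values. */
const char *exception_kind_name(AxlCpuExceptionKind kind)
{
    if ((int)kind < 0 || kind >= AXL_CPU_EXCEPTION_KIND_MAX)
        return "?";
    return kind_names[kind] ? kind_names[kind] : "?";
}
```

The lookup is a constant-table read with no allocation, so it should be usable from inside an exception callback.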
-
enum AxlSimdTier
SIMD dispatch tier — a single ordered value naming the best usable kernel.
Monotonic: a higher value is a strict superset of the work a lower one can do, so a dispatcher picks the highest tier for which it has a kernel. AVX2 is only reported once axl_cpu_enable_avx has succeeded; querying the tier never changes CPU state on its own.
Values:
-
enumerator AXL_SIMD_SCALAR
no SIMD (not expected on our targets)
-
enumerator AXL_SIMD_BASELINE
128-bit: SSE2 (x86) or NEON (aarch64); always available
-
enumerator AXL_SIMD_SSE41
x86 SSE4.1 — 128-bit, richer pixel ops; detection only
-
enumerator AXL_SIMD_AVX2
x86 AVX2 — 256-bit; requires axl_cpu_enable_avx()
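Because the tiers are ordered, "pick the highest tier for which it has a kernel" reduces to a downward scan over a per-tier table. The sketch below assumes the enumerators are declared in the order listed above; `SumFn`, `sum_scalar`, `TIER_COUNT` and `pick_kernel_tier` are hypothetical names, and the enum is redeclared only so the block compiles without the SDK header.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the SDK enum (same order as documented); real code
 * includes <axl-cpu.h> instead. */
typedef enum {
    AXL_SIMD_SCALAR,
    AXL_SIMD_BASELINE,
    AXL_SIMD_SSE41,
    AXL_SIMD_AVX2
} AxlSimdTier;

#define TIER_COUNT (AXL_SIMD_AVX2 + 1)

/* Hypothetical kernel signature: sum the bytes of a buffer. */
typedef uint32_t (*SumFn)(const uint8_t *p, size_t n);

/* Portable fallback kernel; always present in slot AXL_SIMD_SCALAR. */
uint32_t sum_scalar(const uint8_t *p, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += p[i];
    return acc;
}

/* Walk down from `tier` and return the first tier with a kernel in the
 * table. A NULL slot means "no kernel written for this tier". */
AxlSimdTier pick_kernel_tier(const SumFn table[TIER_COUNT], AxlSimdTier tier)
{
    for (int t = (int)tier; t > (int)AXL_SIMD_SCALAR; t--) {
        if (table[t] != NULL)
            return (AxlSimdTier)t;
    }
    return AXL_SIMD_SCALAR;
}
```

In real code the `tier` argument would come from `axl_cpu_simd_tier()`, and the chosen function pointer would typically be cached once at start-up.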
Functions
-
int axl_cpu_register_exception(AxlCpuExceptionKind kind, AxlCpuExceptionFn cb, void *user)
Register an exception handler for kind.
Internally locates EFI_CPU_ARCH_PROTOCOL (cached after first call), maps kind onto the arch-specific EFI_EXCEPTION_TYPE, and registers a thunk that translates EFI_SYSTEM_CONTEXT into AxlCpuException before invoking cb.
A second axl_cpu_register_exception call for the same kind replaces the previous handler.
- Returns:
AXL_OK on success; AXL_ERR if cb is NULL, kind is out of range, kind is not available on the current arch, or EFI_CPU_ARCH_PROTOCOL is not published.
-
int axl_cpu_unregister_exception(AxlCpuExceptionKind kind)
Unregister a previously-installed handler.
Safe to call on a kind that was never registered (no-op).
- Returns:
AXL_OK on success, AXL_ERR if kind is out of range or EFI_CPU_ARCH_PROTOCOL is not published.
-
const AxlCpuFeatures *axl_cpu_features(void)
Query detected CPU features (cached after first call).
Pure detection — never changes CPU state. The returned pointer is to SDK-owned static storage valid for the program’s lifetime; never NULL.
- Returns:
pointer to the cached feature set.
-
bool axl_cpu_enable_avx(void)
Enable AVX (YMM) register state so AVX/AVX2 instructions run without a #UD fault.
UEFI firmware enables SSE state (the calling convention needs XMM) but does not enable AVX state, so AVX instructions trap until a CPL0 caller sets CR4.OSXSAVE and the AVX bits in XCR0. A UEFI application runs at CPL0, so it may do this itself; this routine performs the sequence (CPUID-gated) once and is idempotent.
Per-logical-processor. CR4/XCR0 are per-CPU; code that runs AVX kernels on application processors (via MP services) must call this on each AP as well as the BSP.
No-op returning false when the CPU lacks AVX (or on non-x86), leaving axl_cpu_simd_tier at AXL_SIMD_SSE41/BASELINE.
- Returns:
true if AVX is usable after the call (already-enabled counts), false if the CPU has no AVX to enable.
-
bool axl_cpu_enable_avx512(void)
Enable AVX-512 (opmask + ZMM) register state.
The AVX-512 counterpart to axl_cpu_enable_avx: sets CR4.OSXSAVE and the XCR0 bits for x87 + SSE + AVX and the AVX-512 state components (opmask, ZMM_Hi256, Hi16_ZMM), so EVEX-encoded AVX-512 instructions run without a #UD. Implies AVX enable. Gated on the CPU advertising AVX-512F and the XSAVE state components being supported; idempotent; per-logical-processor (same caveat as axl_cpu_enable_avx).
Note: axl_cpu_simd_tier tops out at AXL_SIMD_AVX2, which is the widest tier AXL’s own kernels use. AVX-512 is exposed for consumers who write their own AVX-512 code: check avx512f et al. in axl_cpu_features, call this to enable, then run.
- Returns:
true if AVX-512 is usable after the call (already-enabled counts), false if the CPU has no AVX-512 to enable (or non-x86).
-
AxlSimdTier axl_cpu_simd_tier(void)
The best SIMD tier usable right now for kernel dispatch.
x86: AXL_SIMD_AVX2 if AVX2 is present and enabled (call axl_cpu_enable_avx first), else AXL_SIMD_SSE41 if SSE4.1 is present, else AXL_SIMD_BASELINE (SSE2, always). aarch64: AXL_SIMD_BASELINE (NEON). Does not change CPU state.
- Returns:
the highest currently-usable AxlSimdTier.
-
struct AxlCpuException
- #include <axl-cpu.h>
Architecture-neutral CPU exception context delivered to consumer callbacks.
The base fields (fault_address through error_code) are arch-neutral and always populated. The register snapshot lives in the regs union; consumers branch on arch to pick the correct arm.
ABI growth. struct_size is the byte count the SDK wrote for this instance (sizeof(AxlCpuException) at the SDK’s build time); version is AXL_CPU_EXCEPTION_VERSION. Consumers targeting a forward range of SDKs guard reads of late-added fields with
if (exc->struct_size >= offsetof(AxlCpuException, new_field) + sizeof(exc->new_field))
or
if (exc->version >= <since-version>)
Existing fields never move; the union arms grow append-only.
Field semantics:
fault_address: meaningful for memory-access faults (x64 #PF = CR2; aa64 sync = FAR_EL1). Zero for kinds where no fault address applies (#DE, #GP without memory, #UD, …).
error_code: exception-specific. x64 #PF / #GP / #DF / #NP / #SS / #AC carry an error code pushed by the CPU; for other kinds, 0. aa64 carries ESR_EL1 here so consumers can recover finer-grained classification on synchronous exceptions (EC field) without a separate accessor.
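The struct_size guard can be wrapped in a small macro so each late-added field is checked the same way. This is only a sketch: `EXC_HAS_FIELD` and `read_new_field_or` are hypothetical helpers, `new_field` is the placeholder name used in the text rather than a real member, and the struct is a cut-down stand-in so the block compiles without the SDK header.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in layout for illustration; real code includes <axl-cpu.h>.
 * `new_field` is a placeholder for some field added in a later SDK. */
typedef struct {
    uint32_t struct_size;   /* bytes the SDK actually wrote */
    uint32_t version;
    uint64_t fault_address;
    uint64_t new_field;     /* hypothetical late-added field */
} AxlCpuException;

/* True when the instance is large enough to contain `field` in full. */
#define EXC_HAS_FIELD(exc, field) \
    ((exc)->struct_size >= offsetof(AxlCpuException, field) + sizeof((exc)->field))

/* Read the late-added field if the emitting SDK wrote it, else fall back. */
uint64_t read_new_field_or(const AxlCpuException *exc, uint64_t fallback)
{
    return EXC_HAS_FIELD(exc, new_field) ? exc->new_field : fallback;
}
```

The size check needs no knowledge of which version introduced the field, which is why it can be preferable to the version comparison when both are available.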
Public Members
-
uint32_t struct_size
sizeof(AxlCpuException) as written by the SDK
-
uint32_t version
AXL_CPU_EXCEPTION_VERSION at emit time.
-
AxlCpuExceptionKind kind
-
int arch
AXL_CPU_ARCH_X64 / _AA64.
-
uint64_t fault_address
memory-fault address, or 0 if N/A
-
uint64_t instruction_ptr
RIP / ELR.
-
uint64_t stack_ptr
RSP / SP.
-
uint64_t frame_ptr
RBP / X29.
-
uint64_t error_code
exception-specific; aa64 carries ESR_EL1
-
uint64_t rax
-
uint64_t rbx
-
uint64_t rcx
-
uint64_t rdx
-
uint64_t rsi
-
uint64_t rdi
-
uint64_t rbp
-
uint64_t rsp
-
uint64_t r8
-
uint64_t r9
-
uint64_t r10
-
uint64_t r11
-
uint64_t r12
-
uint64_t r13
-
uint64_t r14
-
uint64_t r15
-
uint64_t rip
-
uint64_t rflags
-
uint64_t cr2
raw CR2 — meaningful only when kind == PAGE_FAULT
-
struct AxlCpuException x64
-
uint64_t x[31]
X0..X28, X29 (=FP), X30 (=LR)
-
uint64_t sp
-
uint64_t elr
-
uint64_t spsr
-
uint64_t esr
raw ESR_EL1 (also in error_code)
-
uint64_t far
raw FAR_EL1 (also in fault_address)
-
struct AxlCpuException aa64
-
union AxlCpuException regs
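Once a handler has branched on arch, the value in error_code can be decoded with plain bit operations. The helpers below are a sketch with hypothetical names; they rely only on the architectural layouts (the x86 page-fault error code bits, and the EC field in bits [31:26] of the AArch64 ESR) and take the raw value, so they do not depend on the SDK header.

```c
#include <stdbool.h>
#include <stdint.h>

/* x64 #PF error code, bit 0 (P): set for a protection violation on a
 * present page, clear when the page was not present. */
bool pf_page_present(uint64_t error_code)
{
    return (error_code & 1u) != 0;
}

/* x64 #PF error code, bit 1 (W/R): set when the access was a write. */
bool pf_was_write(uint64_t error_code)
{
    return ((error_code >> 1) & 1u) != 0;
}

/* x64 #PF error code, bit 4 (I/D): set for an instruction fetch. */
bool pf_was_instruction_fetch(uint64_t error_code)
{
    return ((error_code >> 4) & 1u) != 0;
}

/* aa64: exception class (EC) is bits [31:26] of the syndrome register. */
unsigned esr_exception_class(uint64_t esr)
{
    return (unsigned)((esr >> 26) & 0x3Fu);
}
```

A handler for AXL_CPU_EXCEPTION_PAGE_FAULT would pass `exc->error_code` to the `pf_*` helpers, and a handler for AXL_CPU_EXCEPTION_SYNCHRONOUS would pass it to `esr_exception_class` to separate, for example, data aborts from other synchronous causes.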
-
struct AxlCpuFeatures
- #include <axl-cpu.h>
Detected CPU instruction-set features.
Filled once from CPUID (x86) on first query and cached. Fields for the other architecture are always false: read neon on aarch64, the x86 fields on x86. These report what the CPU can execute; for AVX, “can execute” still requires a one-time state enable (see axl_cpu_enable_avx) before the YMM registers are usable without a #UD fault.
Most consumers want axl_cpu_simd_tier (a single ordered value for kernel dispatch) rather than these individual bits.
Public Members
-
bool sse2
SSE2 (always present on x86-64; firmware enables XMM state)
-
bool sse3
SSE3.
-
bool ssse3
SSSE3 (PSHUFB — byte shuffles)
-
bool sse41
SSE4.1 (PMOVZX / PBLENDVB / ROUNDPS — no state enable needed)
-
bool sse42
SSE4.2 (also CRC32 instruction)
-
bool fma
FMA3 fused multiply-add.
-
bool xsave
XSAVE/XGETBV/XSETBV present (precondition for enabling AVX)
-
bool avx
AVX (256-bit) — usable only after axl_cpu_enable_avx()
-
bool avx2
AVX2 (256-bit integer) — usable only after enable.
-
bool avx512f
AVX-512 Foundation — usable only after axl_cpu_enable_avx512()
-
bool avx512dq
AVX-512 Doubleword/Quadword.
-
bool avx512bw
AVX-512 Byte/Word.
-
bool avx512vl
AVX-512 Vector Length extensions.
-
bool avx512cd
AVX-512 Conflict Detection.
-
bool avx512vnni
AVX-512 Vector Neural Network Instructions.
-
bool aes
AES-NI (AESENC/AESDEC …)
-
bool pclmulqdq
carry-less multiply (GHASH / GCM)
-
bool sha
SHA-NI (SHA1/SHA256 round instructions)
-
bool vaes
vectorized AES (VEX/EVEX-encoded AES on 256/512-bit)
-
bool vpclmulqdq
vectorized carry-less multiply
-
bool popcnt
POPCNT instruction.
-
bool bmi1
BMI1 (ANDN, BLSR, TZCNT …)
-
bool bmi2
BMI2 (BZHI, PDEP, PEXT, MULX …)
-
bool lzcnt
LZCNT (leading-zero count; ABM)
-
bool movbe
MOVBE (load/store with byte swap)
-
bool f16c
F16C (half-precision float <-> single convert)
-
bool adx
ADX (ADCX/ADOX multiprecision add)
-
bool rdrand
RDRAND (on-chip RNG)
-
bool rdseed
RDSEED (seed-grade on-chip RNG)
-
bool neon
AdvSIMD/NEON (always present on ARMv8-A baseline)
-
bool fp16
half-precision floating point (FEAT_FP16)
-
bool atomics
Large System Extensions (FEAT_LSE atomic instructions)
-
bool crc32
CRC32 instructions (FEAT_CRC32)
-
bool aes_a64
AES instructions (FEAT_AES)
-
bool pmull
polynomial multiply long (FEAT_PMULL — GHASH)
-
bool sha1
SHA1 instructions (FEAT_SHA1)
-
bool sha2
SHA-256 instructions (FEAT_SHA256)
-
bool sha512
SHA-512 instructions (FEAT_SHA512)
-
bool sha3
SHA3 instructions (FEAT_SHA3)
-
bool dotprod
dot-product instructions (FEAT_DotProd)
-
bool sve
Scalable Vector Extension (FEAT_SVE)
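For code that needs individual bits rather than a SIMD tier, a typical use is choosing a crypto backend. The sketch below picks an AES-GCM implementation from the AES and carry-less-multiply bits; `GcmBackend` and `pick_gcm_backend` are hypothetical names, and the struct is a cut-down stand-in holding only the four fields used so the block compiles without the SDK header.

```c
#include <stdbool.h>

/* Stand-in with only the fields this sketch reads; real code includes
 * <axl-cpu.h> and passes the pointer returned by axl_cpu_features(). */
typedef struct {
    bool aes;        /* x86 AES-NI */
    bool pclmulqdq;  /* x86 carry-less multiply */
    bool aes_a64;    /* aarch64 FEAT_AES */
    bool pmull;      /* aarch64 FEAT_PMULL */
} AxlCpuFeatures;

/* Hypothetical backend identifiers for an AES-GCM implementation. */
typedef enum {
    GCM_PORTABLE,
    GCM_X86_AESNI,
    GCM_A64_CRYPTO
} GcmBackend;

/* GCM needs both the block cipher and the GHASH multiply in hardware to
 * benefit, so each accelerated path requires both bits. Fields for the
 * other architecture are always false, so the order of the two checks
 * does not matter. */
GcmBackend pick_gcm_backend(const AxlCpuFeatures *f)
{
    if (f->aes && f->pclmulqdq)
        return GCM_X86_AESNI;
    if (f->aes_a64 && f->pmull)
        return GCM_A64_CRYPTO;
    return GCM_PORTABLE;
}
```

None of these instructions use YMM state, so under the rules described above this selection should not need axl_cpu_enable_avx; that is an inference from the text, not something the SDK documents for these bits specifically.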