Releasing AXL
Step-by-step for cutting an AXL release. The flow is opinionated and the order matters — running it out of order, or skipping the version helper, has burned us before.
TL;DR — the fast path
scripts/cut-release.sh X.Y.Z # do it
scripts/cut-release.sh X.Y.Z --dry-run # preview, change nothing
scripts/cut-release.sh automates the whole cut below and enforces the
gate: it bumps the version, dates the CHANGELOG, commits + pushes main,
waits for CI to go green on the release commit, and only then creates the
tag (a published tag can’t be cleanly re-cut — see Recovery). Then it watches
CI/Release/Docs and prints the published release. If CI fails on the release
commit it stops before tagging and prints how to recover (fix on main, then
scripts/cut-release.sh X.Y.Z --resume).
The script does not run the heavy local prerequisite gate — CI on the
release commit is the authoritative gate. That only works well if you push
small batches to main continuously so CI has already validated the code
before release day. Releasing a big pile of unpushed commits is what made v1.0.0
ship with a red CI (≈100 commits had never been through CI). The rest of this
doc is what the script does, and the manual fallback if you need it.
To validate locally before pushing (optional fail-fast), run scripts/lint.sh
(clang-tidy exactly as CI runs it) and the smoke suites below.
Prerequisites
You’re on main with a clean working tree, and git log origin/main..HEAD shows the commits to ship.
The integration suite passes locally. The quick smoke set:
./test/integration/test-axl.sh
./test/integration/test-tools.sh
./test/integration/test-tcp-echo.sh
./test/integration/test-http.sh
./test/integration/test-cpu-idle.sh
This list is a SUBSET of what CI runs: .github/workflows/ci.yml is the source of truth, and it runs more (e.g. test-input-keys-qemu.sh, test-yield-ctrlc.sh, test-axl-cc-service.sh, the clang-tidy job, both arches). Running only the five above can leave a CI-only failure undetected until the tag. Either run every *.sh step listed in ci.yml, or, better, rely on the §4b “watch CI on main before tagging” gate, which validates the exact CI matrix. Some tests are deliberately local-only (they need a QMP-pointer-injection-capable host the GitHub runners don’t provide, e.g. test-input-modifiers-qemu.sh); run those by hand before a release.
Both archs build clean against BUILD=RELEASE. Use a separate PREFIX so the RELEASE-flagged .o files don’t shadow the in-tree DEBUG cache the integration tests above reuse. The .o cache key is the .c timestamp only, not the BUILD mode, so a same-prefix RELEASE compile leaves .o files newer than the .c source and a subsequent default make reuses them with the wrong flags. Symptom: the debug: alloc fill 0xDA test fails (axl-mem.o built without -DAXL_MEM_DEBUG). scripts/install.sh uses the -release prefix internally for the same reason.
make ARCH=x64 BUILD=RELEASE PREFIX=out/native-x64-release
make ARCH=aa64 BUILD=RELEASE PREFIX=out/native-aa64-release
TLS-enabled build is green if you touched anything in src/net/ (release.yml hardcodes AXL_TLS=1 for the published packages):
AXL_TLS=1 make ARCH=x64 BUILD=RELEASE PREFIX=out/native-x64-release
clang-tidy is clean locally. The CI workflow’s lint job runs clang-tidy with WarningsAsErrors: '*'; the unit / integration suites above don’t exercise it. This step has been the post-tag failure mode TWICE in a row (v0.18.0 → v0.18.1, v0.19.0 → v0.19.1): Release published artifacts cleanly, then CI flagged a finding and required a follow-up patch release. Run it the same way CI does, before tagging, so any findings surface BEFORE the artifacts are public. Just use the wrapper:
scripts/lint.sh # bear + clang-tidy, exactly as ci.yml's lint job
Fix any errors and re-run. If the only findings are warnings (not errors), CI is fine: WarningsAsErrors: '*' in .clang-tidy only escalates the checks the config enables; the 2 warnings generated lines per file are noise.
Version skew: local-clean does NOT imply CI-clean (unless versions match). CI pins clang-tidy-21 by running the lint job inside an ubuntu:26.04 container (26.04 is the first Ubuntu whose apt ships clang-tidy-21 natively; 24.04 tops out at 20). scripts/lint.sh pins the same CT_VERSION=21, so a clean scripts/lint.sh does mean a clean CI lint when your local clang-tidy is 21. Three ways to get a matching 21:
Ubuntu 26.04+: sudo apt-get install clang-tidy-21.
EL/Fedora/macOS dev boxes: the distro’s current clang-tidy is often already 21; scripts/lint.sh detects that and uses it without warning (it only warns on a real mismatch).
Any host: run CI’s exact lint in the same image, podman run --rm -v "$PWD":/src:ro ubuntu:26.04 … (the container ci.yml’s lint job uses), so local == CI byte-for-byte regardless of distro.
This pinning exists because a newer local clang-tidy silently passed code an older CI clang-tidy flagged; v1.0.0 hit exactly that (bugprone-sizeof-expression on a correct array-of-pointers sizeof). When you intentionally move clang-tidy versions, bump it in both ci.yml (the clang-tidy-NN install + invocation, and the ubuntu:NN.NN container if the new version needs a newer base) and scripts/lint.sh’s CT_VERSION together. Either way, the §4b CI-on-main run remains the authoritative gate.
Run clang-tidy one file per process (-n1). Passing many TUs to a single clang-tidy invocation makes the path-sensitive clang-analyzer-* checks (notably security.ArrayBound) non-deterministic: the same clean tree intermittently reports spurious out-of-bounds findings that vanish when each file is analyzed alone with full budget. -n1 -P"$(nproc)" is deterministic, parallel, and still catches real bugs (they reproduce per-file). A bare xargs -0 clang-tidy (batched) is the flaky form; don’t use it.
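The one-file-per-process form can be sketched as a small wrapper. This is a minimal illustration, not the contents of scripts/lint.sh: the function name lint_per_file is hypothetical, and the checker command and its arguments are whatever the caller passes in.

```shell
# Hypothetical helper: run a checker once per file, in parallel.
# File names arrive NUL-delimited on stdin; "$@" is the checker command.
lint_per_file() {
    # -0  : NUL-delimited input, safe for any file name
    # -n1 : exactly one file per checker process (the deterministic form)
    # -P  : one worker per CPU, so per-file runs still finish quickly
    xargs -0 -n1 -P"$(nproc)" "$@"
}
```

In real use the input might come from something like git ls-files -z '*.c', with clang-tidy-21 and the project's usual arguments as the checker (both are assumptions about the tree). With GNU xargs, a non-zero exit from any per-file run makes the whole pipeline exit non-zero, so one failing file still fails the step.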
Cut the release
1. Bump the version
scripts/bump-version.sh X.Y.Z
This updates both VERSION and include/axl/axl-version.h in
one shot. Don’t edit either file by hand — the Makefile’s
check-version target compares them and refuses to build on
mismatch. If you’ve ever seen
ERROR: VERSION (X.Y.Z) != axl-version.h (P.Q.R) in CI, this is
why.
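For intuition, the comparison behind that error can be sketched as below. This is not the Makefile's actual check-version recipe. It assumes the header carries a line of the form #define AXL_VERSION_STRING "X.Y.Z"; that macro name is hypothetical, as is the function name check_version.

```shell
# Hypothetical re-implementation of a VERSION vs header consistency check.
# ASSUMPTION: the header has a line like
#   #define AXL_VERSION_STRING "1.2.3"
# The real macro name in include/axl/axl-version.h may differ.
check_version() {
    local version_file=$1 header_file=$2 file_ver header_ver
    # VERSION holds just the version string; strip the trailing newline.
    file_ver=$(tr -d '[:space:]' < "$version_file")
    # Pull the quoted value out of the #define line.
    header_ver=$(sed -n 's/^#define[[:space:]]\{1,\}AXL_VERSION_STRING[[:space:]]\{1,\}"\([^"]*\)".*/\1/p' "$header_file")
    if [ "$file_ver" != "$header_ver" ]; then
        echo "ERROR: VERSION ($file_ver) != axl-version.h ($header_ver)" >&2
        return 1
    fi
}
```

Because the two files are compared on every build, editing only one of them by hand fails immediately, which is the reason to let the helper write both.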
2. Date the CHANGELOG
CHANGELOG.md accumulates under a ## Unreleased heading during
development. At release time, rename it:
- ## Unreleased
+ ## X.Y.Z — YYYY-MM-DD
Sweep git log <prev-tag>..HEAD --oneline and confirm every
user-visible change has an entry. Headlines worth catching:
New public API (Added)
Behavior changes that callers might trip on (Changed, Migration)
Bug fixes — especially anything that was a UAF, leak, or flake (Fixed)
Build/CI/Docs/Examples changes that affect contributors
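A quick sanity check before committing can confirm the rename actually happened. The sketch below is an illustration only: the function name changelog_is_dated is hypothetical, and it assumes release headings look like the renamed line above (version, some separator, then an ISO date).

```shell
# Hypothetical pre-commit check for the CHANGELOG heading.
# Succeeds only if no "## Unreleased" heading remains and a dated
# heading for the given version exists.
changelog_is_dated() {
    local changelog=$1 version=$2 escaped
    if grep -q '^## Unreleased[[:space:]]*$' "$changelog"; then
        echo "CHANGELOG still has an Unreleased heading" >&2
        return 1
    fi
    # Escape the dots so 1.2.3 cannot match 1x2y3.
    escaped=$(printf '%s' "$version" | sed 's/\./\\./g')
    # Any separator is accepted between the version and the YYYY-MM-DD date.
    grep -Eq "^## ${escaped} .* [0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]*\$" "$changelog"
}
```

It only checks the heading; whether every user-visible change has an entry still needs the manual sweep of git log.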
3. One commit for the release metadata
git add VERSION include/axl/axl-version.h CHANGELOG.md
git commit -m "release: vX.Y.Z"
Keep this commit small. The release-cut commit is the canonical “what changed in this release” reference; if it’s noisy with code changes, the diff-against-previous-tag becomes harder to read.
4. Push main first
git push origin main
main must contain the release-metadata commit before the tag
points at it; if you tag first and then push the branch, the
release.yml workflow can race and check out the wrong commit.
4b. Wait for CI to pass on main BEFORE tagging — the load-bearing gate
Do not tag until the CI workflow is green on the branch push. The tag
re-triggers the same CI + Release + Docs workflows; if CI is red on the
branch it will be red on the tag too — except now release.yml has already
published the artifacts and gh release create has run, so the tag can no
longer be cleanly re-cut (see Recovery) and you’re forced into a patch
release. Validating on the branch first makes a red CI a 5-minute branch fix
instead of a burned version number.
scripts/watch-release-runs.sh # or: gh run watch <ci-run-id>
This matters most after a long unpushed run: if main is dozens of
commits ahead of origin/main, CI has validated none of them, and the
local prereq suite above is not a substitute — it is a strict subset of what
CI runs. v1.0.0 shipped with a red CI for exactly this reason: ~100 commits
were unpushed, and two CI-only failures (a test-input-modifiers-qemu.sh
that can’t run on headless runners, and a clang-tidy finding from a newer
local clang-tidy than CI’s) only surfaced on the tag. The fix that would
have caught both: push main and watch CI before tagging.
If CI is red on the branch, fix it on main as normal commits, let CI go
green, and only then proceed to the tag.
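The gate reduces to a three-way decision on the CI run for the release commit. The sketch below is an illustration, not what scripts/cut-release.sh contains: the function name ci_gate and its return codes are hypothetical, and the lowercase values assume the form gh run list --json status,conclusion reports.

```shell
# Hypothetical gate: decide from one CI run's status and conclusion
# whether it is safe to create the tag.
#   returns 0 = green, tag now
#   returns 1 = red, fix on main first
#   returns 2 = not finished, keep waiting
ci_gate() {
    local status=$1 conclusion=$2
    case "$status" in
        completed)
            [ "$conclusion" = "success" ] && return 0
            return 1
            ;;
        *)
            return 2
            ;;
    esac
}
```

Anything other than a completed, successful run blocks the tag, including a run that is still queued. Pushing the tag "while CI finishes" is exactly the case this is meant to prevent.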
5. Create and push the tag
git tag -a vX.Y.Z -m "vX.Y.Z
<headline paragraph from CHANGELOG>
- bullet 1
- bullet 2
- bullet 3
See CHANGELOG.md for the full list."
git push origin vX.Y.Z
The annotated tag’s body shows up in the GitHub Release page and
in gh release view vX.Y.Z. Worth a few minutes of polish.
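To seed the tag body, the release's CHANGELOG section can be pulled out mechanically and then trimmed by hand. This is a minimal sketch assuming each release starts with a "## X.Y.Z ..." heading as in step 2; the function name changelog_section is hypothetical.

```shell
# Hypothetical helper: print the body of one release's CHANGELOG section,
# i.e. the lines between its "## <version> ..." heading and the next "## ".
changelog_section() {
    local changelog=$1 version=$2
    awk -v ver="$version" '
        /^## / {
            # New release heading: print only while inside our version.
            # Appending "" forces a string comparison, never a numeric one.
            printing = (($2 "") == (ver ""))
            next
        }
        printing { print }
    ' "$changelog"
}
```

The output is raw material rather than a finished message: pick the headline paragraph and a few bullets from it, then pass the result to git tag -a (for example via -F with a file).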
6. Watch the workflows
The tag push triggers three workflows on the same commit:
CI (.github/workflows/ci.yml): full unit + integration suite across both architectures.
Release (.github/workflows/release.yml): builds .deb + .rpm via fpm (both x64 and aa64), pulls iPXE from upstream for the host-tools tarball, attaches everything to a GitHub Release on aximcode/axl-sdk-releases.
Docs (.github/workflows/docs.yml): Doxygen + Sphinx build + Cloudflare Pages deploy.
Realistic timing. On healthy GitHub-runner infrastructure the whole flow is ~4–5 minutes wall-clock (parallel across the three workflows). Verified on v0.9.0’s successful retry: Release 2m57s, Docs 1m44s, CI 4m10s. The longest individual jobs are CI’s QEMU integration tests (~4 min) and Release’s Build-tools-aa64 (~3 min, slower than x64 due to QEMU user-mode emulation of cross-tool execution). All other jobs finish in under 2 minutes.
Pathological case — bad-DNS days. When GitHub Actions runner
DNS is flaky (azure.archive.ubuntu.com mirrors), apt-get install retries can stall jobs at 30–40 minutes before
giving up. The v0.9.0 first attempt hit this — five jobs across
CI/Release/Docs all failed at exit-code 100 (apt’s “couldn’t
fetch packages”) after 18–40 minute stalls. The workflows now
write Acquire::Retries=3 and Acquire::http(s)::Timeout=15
into /etc/apt/apt.conf.d/99-axl-retry as the first action of
every install-deps step — that bounds the worst case to ~5
minutes on bad days instead of 30+. Even with the bound, plan
for up to ~15 minutes total if a re-run is needed during a
runner network blip.
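The retry bound could be written by a preamble along these lines. This is a sketch only: the workflows' exact file contents are not shown in this document, so the three option lines are an assumption, written in standard apt.conf syntax from the values quoted above. The function name and the overridable target path are hypothetical.

```shell
# Hypothetical install-deps preamble: bound apt's retry and timeout behaviour.
# The path defaults to the one named in the text; pass another for testing.
write_apt_retry_conf() {
    local target=${1:-/etc/apt/apt.conf.d/99-axl-retry}
    cat > "$target" <<'EOF'
Acquire::Retries "3";
Acquire::http::Timeout "15";
Acquire::https::Timeout "15";
EOF
}
```

On a runner this would need root (the default path is under /etc) and must run before the first apt-get call in the step, otherwise the first fetch still uses apt's defaults.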
Workflow notes:
AArch64 builds run on x86 hosted runners. The library + tests
tools cross-compile fine in 1-3 min each; QEMU emulation only kicks in when something needs to execute an aa64 binary (e.g. CI’s QEMU integration tests).
The Docs workflow occasionally hits transient apt-mirror failures even on otherwise-healthy days. Re-running just that workflow is the fix; it’s not a release blocker since artifacts come from Release, not Docs.
The default — and what to use unless you have a specific reason to stream live output:
scripts/watch-release-runs.sh
That’s it. The script polls all three workflows via GraphQL
(separate quota pool from REST) at 60-second intervals, prints
status per workflow per tick, exits rc=0 only when all three
reach SUCCESS, rc=1 if any finish non-SUCCESS. Total cost:
~60 GraphQL calls/hour. Use this. Don’t reach for gh run watch unless you have a specific reason to stream
per-step output.
CI and Release must finish green for the release to be considered shipped. Docs is best-effort — re-run later if it flakes.
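The exit-code rule the watcher follows can be sketched as a reducer over one polling tick. This illustrates the rule as described here and is not the script's source. The function name release_verdict and the "keep polling" code 2 are hypothetical, and the uppercase conclusions assume the GraphQL form (SUCCESS, FAILURE).

```shell
# Hypothetical reducer over one tick of results.
# Each argument is a workflow conclusion (SUCCESS, FAILURE, ...),
# or an empty string while that workflow has not finished.
#   returns 0 = every workflow finished with SUCCESS
#   returns 1 = at least one workflow finished non-SUCCESS
#   returns 2 = nothing failed yet, but something is still running
release_verdict() {
    local c pending=0
    for c in "$@"; do
        case "$c" in
            SUCCESS) ;;             # finished and green
            "")      pending=1 ;;   # still running
            *)       return 1 ;;    # finished red: no point waiting further
        esac
    done
    [ "$pending" -eq 0 ] && return 0
    return 2
}
```

A stricter release gate could pass only the CI and Release conclusions and treat Docs separately, matching the best-effort status Docs has above.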
Why not gh run watch?
GitHub’s REST API has a 5,000 req/hour cap per authenticated
user. gh run watch --exit-status is a polling loop disguised
as a stream — it makes ~1 request every 3 seconds. Three
parallel watchers (CI + Release + Docs) for a 30-minute
multi-arch workflow generates ~1,800 calls; add ad-hoc
gh run view / gh run list during diagnosis and you’ll
exhaust the quota mid-flight. When that happens the watcher
exits rc=1 on the rate-limit error, which is
indistinguishable from “the run failed” by exit code alone.
GitHub does NOT sell a per-account rate-limit boost; the only escape is Enterprise Cloud (15,000/hr) or a GitHub App on a multi-installation org (up to 12,500/hr).
scripts/watch-release-runs.sh sidesteps the entire issue —
GraphQL has its own 5,000-points/hr quota that REST polling
doesn’t touch, and a status query costs ~1 point. ~60 calls
per hour is well below the limit even if you forget to stop
the loop.
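The quota figures above follow from simple arithmetic, worked here with the numbers quoted in the text (a 3-second REST poll, a 30-minute workflow, three watchers, and one GraphQL query per 60-second tick). The variable names are only for illustration.

```shell
# REST: one poll every 3 s, for a 30-minute workflow, times three watchers.
poll_interval_s=3
workflow_minutes=30
watchers=3
rest_calls=$(( workflow_minutes * 60 / poll_interval_s * watchers ))

# GraphQL: one status query every 60 s, for a full hour.
graphql_calls=$(( 3600 / 60 ))

echo "REST calls:    $rest_calls of 5000 per hour"
echo "GraphQL calls: $graphql_calls of 5000 points per hour"
```

That is 1,800 REST calls against 60 GraphQL calls, so the watchers alone use over a third of the REST budget before any manual gh run view or gh run list calls are added.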
The gh run watch fallback
When you need streaming per-step output (e.g., debugging a
specific job’s failure), use gh run watch for one
workflow at a time — never three in parallel. Pick the
slowest (Release):
gh run list --commit "$(git rev-parse HEAD)" \
--json databaseId,workflowName,status --limit 5
gh run watch <release-id>
Critical caveat: gh run watch follows ALL jobs in the
workflow, not just the first one. Don’t be misled by an
early ✓ Complete job line in the output — multi-arch
workflows have several jobs, and the watcher keeps running
until the slowest one finishes.
Two corollaries:
The streamed ✓ Complete job lines are per-individual-job, NOT workflow completion. A multi-arch workflow has several jobs (Build tools x64 → Build tools aa64 → Build axl-sdk → Build host-tools → publish); seeing ✓ Complete job only means one of those finished. Don’t conflate it with the workflow conclusion.
Cross-check via GraphQL after rate-limit hits. GraphQL has a separate 5,000-points/hr quota that REST polling doesn’t touch:
gh api graphql -f query='
{
  repository(owner: "aximcode", name: "axl-sdk") {
    object(expression: "<sha>") {
      ... on Commit {
        checkSuites(first: 10) {
          nodes {
            workflowRun { workflow { name } }
            status
            conclusion
          }
        }
      }
    }
  }
}'
Returns status=COMPLETED, conclusion=SUCCESS|FAILURE for each workflow without consuming REST quota. Use this whenever gh run watch exits non-zero or gh run view 403s; it’s the authoritative source while the REST limit recovers (typically 1 hour from the first rate-limit error).
7. Confirm
The Release is published on the public sibling repo
aximcode/axl-sdk-releases, not on the private upstream — pass
--repo to gh release view:
gh release view vX.Y.Z --repo aximcode/axl-sdk-releases
Should show the release page with axl-sdk.deb, axl-sdk.rpm,
axl-sdk-tools-{x64,aa64}.tar.gz,
axl-sdk-host-tools.{tar.gz,deb}, and SHA256SUMS attached.
The .deb / .rpm packages include the full C and C++ surface
when CI builds with the ARM bare-metal toolchain cached
(scripts/install-arm-toolchain.sh). Specifically, each
package contains:
C bits (always): axl-cc driver, libaxl.a per arch, axl.h + axl/*.h headers, CRT0 objects, linker scripts, CMake config, pkg-config, JSON5 sidecars.
C++ bits (when toolchain present at build time): axl-c++ driver, libaxl-cxx.a per arch. libaxl-cxx.a doesn’t link libstdc++ so there’s no runtime-dependency escalation on the package; pure-C consumers can ignore the extra files.
CI cache invalidation: the ARM toolchain tarball is keyed on
its pinned version (14.3.rel1 at writing). Bump the pin in
scripts/install-arm-toolchain.sh to force a re-fetch. Verify
both packages contain libaxl-cxx.a after a release build:
dpkg-deb -c axl-sdk_*_amd64.deb | grep libaxl-cxx
rpm -qpl axl-sdk-*.x86_64.rpm | grep libaxl-cxx
If libaxl-cxx.a is missing, the build host either didn’t have
the ARM bare-metal toolchain available or install.sh’s
auto-detect failed. See AXL-SDK-Design.md §“C++ support” for the
toolchain story.
8. Get back on main
git checkout main # if `git status` shows detached HEAD
git status # confirm: "On branch main, working tree clean"
git tag -a and git push origin <tag> themselves don’t detach
HEAD, but a typical release session involves enough sideband
operations (rebases, fix-ups during step 5’s tag-message edit,
git checkout vX.Y.Z to spot-check the tagged tree before
pushing) that it’s worth verifying. Subsequent commits made
while detached will be invisible to git push origin main,
which is the silent-data-loss case to avoid.
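The detached-HEAD check can be made explicit instead of read off git status. This is a minimal sketch: the function name assert_on_branch is hypothetical, and it relies on git symbolic-ref exiting non-zero when HEAD is not a symbolic ref.

```shell
# Hypothetical guard: succeed only when HEAD is attached to a branch.
# With -q, git symbolic-ref fails silently on a detached HEAD.
assert_on_branch() {
    local branch
    if ! branch=$(git symbolic-ref --short -q HEAD); then
        echo "detached HEAD: run 'git checkout main' before committing" >&2
        return 1
    fi
    echo "on branch $branch"
}
```

It reports whichever branch is checked out, so the printed name still has to be main; the guard only rules out the detached case.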
Recovery: a failed release tag
If a workflow fails on a freshly-pushed tag and no GitHub Release has been created yet, it’s safe to re-cut:
Verify no release exists: gh release view vX.Y.Z --repo aximcode/axl-sdk-releases should print release not found. (Releases live on the public sibling repo, so without --repo the check is not meaningful.)
Land the fix on main as a normal commit.
Move the tag:
git tag -d vX.Y.Z
git push origin :refs/tags/vX.Y.Z
git tag -a vX.Y.Z -m "<original tag message>"
git push origin vX.Y.Z
Watch the new workflow runs (step 6) and confirm (step 7).
If a GitHub Release does exist for the tag (i.e. release.yml
ran past the gh release create step before failing), re-cutting
the same version is no longer clean: consumers may have already
downloaded the partial artifacts. Bump to the next patch version
instead.
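Picking the follow-up version is mechanical. The sketch below computes the next patch number from a burned version; the function name next_patch is hypothetical, and it assumes plain X.Y.Z versions with an optional leading v and no leading zeros.

```shell
# Hypothetical helper: print the next patch version after a burned one.
# Accepts vX.Y.Z or X.Y.Z and prints X.Y.(Z+1) without the "v".
next_patch() {
    local version=${1#v}
    local major minor patch
    # Split on dots into the three components.
    IFS=. read -r major minor patch <<EOF
$version
EOF
    echo "${major}.${minor}.$(( patch + 1 ))"
}
```

The result is what gets passed to scripts/bump-version.sh (or scripts/cut-release.sh) for the replacement release, for example 0.18.1 after a burned v0.18.0.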
Why these steps
The order is a recovery from each pitfall we’ve actually hit:
bump-version.sh exists because the Makefile’s check-version target catches stale axl-version.h on every build. Cut a release without it and the first make invocation in CI fails; v0.2.6 was re-cut for exactly this reason.
“Push main before the tag” exists because release.yml clones the tag, but the tag points to a commit that has to be reachable via the branch ref the runners pull from.
“One commit for release metadata” exists because the tag-to-tag diff is the canonical changelog input on the next release.
See also
scripts/bump-version.sh: the atomic version helper.
.github/workflows/release.yml: the deb/rpm build and publish workflow.
.github/workflows/ci.yml: the per-push integration suite that gates a release.
CHANGELOG.md: what’s shipped, by tag.