AxlPieceTree — Out-of-Core Editable Buffer
See AxlData — Data Structures for an overview of all data modules.
An out-of-core, editable text buffer for large files. The original file is never loaded whole — its bytes are read on demand through AxlFileView — while edits accumulate in an append-only add buffer. A balanced tree of pieces (spans into either source, held in an AxlRBTree — Intrusive Augmented Red-Black Tree augmented with subtree byte and newline sums) presents one logical document with O(log n) offset↔line mapping and O(log n) edits, so editing a multi-gigabyte file costs memory proportional to the edits, not the file.
This is the structure VS Code calls a “piece tree.” For a
memory-resident editable store use AxlTextBuffer (a
gap buffer) instead. Line semantics match AxlTextBuffer exactly, so
the two are interchangeable for a renderer. See
docs/AXL-PieceTree-Design.md.
Header: <axl/axl-piece-tree.h>
API Reference
Typedefs
-
typedef struct AxlPieceTree AxlPieceTree
axl-piece-tree.h:
An out-of-core, editable text buffer for large files.
The original file is never loaded whole — its bytes are read on demand through AxlFileView — while edits accumulate in an append-only add buffer. A balanced tree of pieces (spans into the original or the add buffer, held in an AxlRBTree augmented with subtree byte and newline sums) stitches the two into one logical document with O(log n) offset<->line mapping and O(log n) edits. Editing a multi-gigabyte file therefore costs memory proportional to the edits, not the file.
This is the structure VS Code calls a “piece tree”. For a memory-resident editable store (small/medium buffers, form fields) use AxlTextBuffer (a gap buffer) instead — AxlPieceTree is the large-file tool.
Byte-oriented and encoding-agnostic: ‘
’ (0x0A) is the only special byte (line delimiter). Line semantics match AxlTextBuffer exactly (a ‘
’ belongs to the line it terminates; line bounds exclude the trailing ‘
’; an empty document is one line), so the two are interchangeable for a renderer. Content is read out via axl_piece_tree_get (the document is virtual — pieces, not a contiguous buffer). Read-only on the original; never writes back until axl_piece_tree_save.
Single-threaded (UEFI).
-
typedef struct AxlPageCache AxlPageCache
Enums
-
enum AxlEol
Line-ending (EOL) style. LF / CRLF / CR are concrete terminators; AXL_EOL_MIXED is only ever returned by axl_piece_tree_detect_eol (a document mixing more than one style) and is not a valid argument to axl_piece_tree_set_eol.
Values:
-
enumerator AXL_EOL_LF
“\n” (Unix)
-
enumerator AXL_EOL_CRLF
“\r\n” (DOS / Windows)
-
enumerator AXL_EOL_CR
“\r” (classic Mac)
-
enumerator AXL_EOL_MIXED
(detect only) more than one style present
-
enumerator AXL_EOL_LF
Functions
-
AxlPieceTree *axl_piece_tree_open(const char *path, size_t page_size, size_t max_frames)
Open a file for out-of-core editing.
Opens an AxlFileView over
pathand performs a single streaming scan to index newline positions (the one O(file) step — sequential, constant-memory via the page cache — that every editor pays to show line numbers on a large file). No file content is held resident.page_size/max_framesare the AxlFileView cache parameters (0 / 0 select defaults).- Parameters:
path – file path (UTF-8)
page_size – AxlFileView page size (0 = default)
max_frames – AxlFileView resident frames (0 = default)
- Returns:
new piece tree, or NULL on open / OOM failure. Free with axl_piece_tree_free().
-
AxlPieceTree *axl_piece_tree_open_cached(const char *path, AxlPageCache *cache)
Open a file out-of-core sharing a page cache across documents.
Like axl_piece_tree_open, but the underlying AxlFileView borrows
cache(created with axl_page_cache_new_shared) instead of allocating its own frames, so one bounded frame budget is shared across many open documents (an editor with many files). The page size comes from the cache (which must be a power of two). The cache is caller-owned and NOT freed when the document is freed — free it after all documents sharing it are freed.- Parameters:
path – file path (UTF-8)
cache – shared cache to borrow (caller-owned)
- Returns:
new piece tree, or NULL on open / OOM failure. Free with axl_piece_tree_free().
-
AxlPieceTree *axl_piece_tree_new(void)
Create an empty document (no backing file; add buffer only).
- Returns:
new empty piece tree (length 0, line count 1), or NULL on OOM.
-
void axl_piece_tree_free(AxlPieceTree *pt)
Free a piece tree and all its resources. NULL-safe.
- Parameters:
pt – piece tree (NULL-safe)
-
size_t axl_piece_tree_length(const AxlPieceTree *pt)
Total byte length of the logical document.
- Parameters:
pt – piece tree
-
size_t axl_piece_tree_line_count(const AxlPieceTree *pt)
Number of lines (newline count + 1; empty document = 1).
- Parameters:
pt – piece tree
-
int axl_piece_tree_insert(AxlPieceTree *pt, size_t offset, const char *data, size_t len)
Insert
lenbytes at byteoffset.offsetis clamped to the current length. O(log pieces).- Parameters:
pt – piece tree
offset – byte offset (clamped to length)
data – bytes to insert
len – number of bytes
- Returns:
AXL_OK on success, AXL_ERR on OOM or invalid args (NULL
datawithlen> 0).
-
int axl_piece_tree_delete(AxlPieceTree *pt, size_t offset, size_t len)
Delete
lenbytes starting at byteoffset.offsetpast the end deletes nothing;lenis clamped to the document end. O(log pieces) per affected piece boundary.- Parameters:
pt – piece tree
offset – byte offset
len – number of bytes
- Returns:
AXL_OK on success (including the no-op cases), AXL_ERR on OOM (a mid-piece split may allocate).
-
size_t axl_piece_tree_get(AxlPieceTree *pt, size_t offset, size_t len, char *out, size_t cap)
Copy a logical byte range out.
Copies up to min(
len,cap) bytes of [offset, offset+len), clamped to the document length, spanning pieces and reading original bytes through AxlFileView as needed.- Parameters:
pt – piece tree
offset – byte offset
len – bytes requested
out – destination buffer
cap – capacity of
out
- Returns:
number of bytes copied.
-
char *axl_piece_tree_get_alloc(AxlPieceTree *pt, size_t offset, size_t len)
Copy a logical byte range out into a fresh NUL-terminated buffer.
Convenience over axl_piece_tree_get for the common copy-out-then-use cases (selection to clipboard, measuring a line): allocates (clamped length + 1) bytes, copies [offset, offset+len) clamped to the document, and NUL-terminates. The copy holds the raw bytes — embedded NULs are preserved but a C-string view of the result stops at the first.
- Parameters:
pt – piece tree
offset – byte offset
len – bytes requested (clamped to the document)
- Returns:
newly allocated buffer (free with axl_free; an empty range yields a 1-byte “”), or NULL on OOM / NULL
pt.
-
size_t axl_piece_tree_cp_align(AxlPieceTree *pt, size_t offset)
Round
offsetdown to the start of the UTF-8 codepoint that contains it.Returns
offsetunchanged if it already sits on a codepoint boundary (or at 0 / the document end); otherwise steps back over UTF-8 continuation bytes. Use it to snap an arbitrary byte offset (e.g. from a click-to-offset hit test) to a valid caret position. O(log n).- Parameters:
pt – piece tree
offset – byte offset
-
size_t axl_piece_tree_cp_next(AxlPieceTree *pt, size_t offset)
Offset of the next UTF-8 codepoint boundary after
offset.Advances past the codepoint containing
offset— caret “move right”. Clamped at the document end. O(log n).- Parameters:
pt – piece tree
offset – byte offset
-
size_t axl_piece_tree_cp_prev(AxlPieceTree *pt, size_t offset)
Offset of the previous UTF-8 codepoint boundary before
offset.The start of the codepoint immediately before
offset— caret “move
left”. Clamped at 0. O(log n).
- Parameters:
pt – piece tree
offset – byte offset
-
size_t axl_piece_tree_line_of_offset(const AxlPieceTree *pt, size_t offset)
The 0-based line number containing byte
offset.offsetis clamped to the document length. O(log n).- Parameters:
pt – piece tree
offset – byte offset
-
int axl_piece_tree_line_bounds(const AxlPieceTree *pt, size_t line, size_t *start, size_t *end)
Byte range [start, end) of line
line(end excludes ‘
’).
A trailing CR (
\\r) immediately before the line’s terminating ‘
’ (a CRLF pair) is also excluded, so the returned range is the line’s content without either terminator byte. O(log n).
- Parameters:
pt – piece tree
line – 0-based line number
start – [out] first byte of the line
end –
[out] one past last byte, excluding ‘
’
- Returns:
AXL_OK if
lineis valid, AXL_ERR ifline>= line count.
-
bool axl_piece_tree_find(AxlPieceTree *pt, const char *needle, size_t needle_len, size_t from_offset, uint32_t flags, AxlMatch *out)
Crash-safely write the current document to a file.
Streams every piece (reading original bytes through AxlFileView) to a temporary sibling, flushes, then replaces
pathby rename — the document is never fully materialized in memory. Same crash-safety as axl_file_write_atomic (the target is never left half-written).Search for a byte substring across the (virtual) document.
Finds
needlestarting the scan atfrom_offset. Forward (default) returns the lowest match with start >=from_offset;AXL_FIND_BACKWARDreturns the highest match with start <=from_offset. Matches spanning piece boundaries are handled.AXL_FIND_CASE_INSENSITIVEfolds ASCII case;AXL_FIND_WHOLE_WORDrequires non-word bytes (anything but[A-Za-z0-9_]) on both sides. Wrap-around is the caller’s job.Thin wrapper over axl_find_in_source (see AxlByteReader).
- Parameters:
pt – piece tree
needle – bytes to find
needle_len – length of
needlefrom_offset – where to start scanning
flags – AxlFindFlags
out – [out] match on success
- Returns:
AXL_OK on success, AXL_ERR on failure.
- Returns:
true and fills
outon a match; false if not found (orneedle_lenis 0).
-
bool axl_piece_tree_is_modified(const AxlPieceTree *pt)
Whether the document differs from the last saved state.
Save-point aware: cleared by save / open / new, set by edits, and cleared again if undo/redo returns the document to the saved state.
- Parameters:
pt – piece tree
-
int axl_piece_tree_apply_edits(AxlPieceTree *pt, const AxlEdit *edits, size_t n)
Apply a batch of edits as a single undo step.
Edits use original (pre-batch) offsets and may be given in any order; they are applied with correct offset adjustment so later edits land where the caller intended (replace-all, multi-cursor, column edit). The whole batch is one undo group.
- Parameters:
pt – piece tree
edits – edits (original coordinates)
n – number of edits
- Returns:
AXL_OK on success, AXL_ERR on OOM or invalid args (a partial batch may have applied — undo it to recover).
-
void axl_piece_tree_line_iter_init(AxlPieceTree *pt, AxlPieceLineIter *it)
Start a forward line iterator (one O(n) pass over all lines).
Cheaper than repeated axl_piece_tree_line_bounds when visiting every line (find-all, re-layout, export). Do not edit the document while iterating.
- Parameters:
pt – piece tree
it – [out] iterator
-
void axl_piece_tree_line_iter_init_at(AxlPieceTree *pt, AxlPieceLineIter *it, size_t start_line)
Start a line iterator at a given line (skip earlier lines in O(log n) instead of O(
start_line)).Like axl_piece_tree_line_iter_init but the first axl_piece_tree_line_iter_next returns line
start_line— so a renderer can iterate a viewport deep in a huge file without walking every preceding line.start_line== 0 is identical to init;start_line>= the line count leaves the iterator exhausted (next returns false).- Parameters:
pt – piece tree
it – [out] iterator
start_line – 0-based line to start at
-
bool axl_piece_tree_line_iter_next(AxlPieceLineIter *it, size_t *start, size_t *end)
Advance the line iterator.
Sets
*start/*endto the next line’s byte range (end excludes the terminating ‘
’ and a preceding CR (
\\r), matching axl_piece_tree_line_bounds).- Parameters:
it – iterator
start – [out] line start
end –
[out] line end (excludes ‘
’)
- Returns:
true if a line was returned, false at end of document.
-
int axl_piece_tree_save(AxlPieceTree *pt, const char *path)
- Parameters:
pt – piece tree
path – destination path (UTF-8)
-
AxlPieceTree *axl_piece_tree_load_encoded(const char *path, size_t page_size, size_t max_frames, AxlEncoding *out_enc, bool *out_has_bom)
Open a text file, detecting and decoding its encoding.
Sniffs the leading bytes with axl_detect_encoding. A plain UTF-8 file with no BOM opens out-of-core exactly like axl_piece_tree_open (no materialization,
page_size/max_framesare the AxlFileView cache parameters). Every other case — a UTF-8 BOM to strip, or UTF-16 LE/BE (reported as theAXL_ENC_UCS2_LE/AXL_ENC_UCS2_BEwire codes) — is read in full, transcoded to UTF-8 (surrogate-aware; BOM stripped), and held in a memory-resident document (axl_piece_tree_new).Either way the document is UTF-8 internally and starts clean (no undo history; axl_piece_tree_is_modified is false). The detected encoding and BOM presence are reported so a later axl_piece_tree_save_encoded can round-trip them.
- Parameters:
path – file path (UTF-8)
page_size – AxlFileView page size, UTF-8 path (0 = default)
max_frames – AxlFileView resident frames, UTF-8 path (0 = default)
out_enc – [out, optional] detected encoding
out_has_bom – [out, optional] whether a BOM was present
- Returns:
new piece tree, or NULL on open / read / transcode / OOM failure. Free with axl_piece_tree_free().
-
int axl_piece_tree_save_encoded(AxlPieceTree *pt, const char *path, AxlEncoding enc, bool write_bom)
Write the document to a file in a chosen encoding.
Transcodes the (UTF-8) document to
enc— one ofAXL_ENC_UTF8,AXL_ENC_UCS2_LE, orAXL_ENC_UCS2_BE(the UCS-2 codes are written as surrogate-aware UTF-16) — optionally prepending the matching BOM, then writes it crash-safely (axl_file_write_atomic). The save point is advanced (axl_piece_tree_is_modified becomes false) on success.Plain UTF-8 with
write_bomfalse is equivalent to (and delegates to) the streaming axl_piece_tree_save — it never materializes the document. Any other encoding materializes the whole document for the transcode, which is acceptable for the rare “save as a different encoding” case.- Parameters:
pt – piece tree
path – destination path (UTF-8)
enc – target wire encoding
write_bom – prepend the encoding’s BOM
- Returns:
AXL_OK on success, AXL_ERR on an unsupported
enc, transcode, write, or OOM failure.
-
AxlEol axl_piece_tree_detect_eol(AxlPieceTree *pt)
Detect the document’s line-ending style.
Scans the whole document (O(n)) and classifies its line terminators: each “\r\n” is CRLF, each lone “\n” is LF, each lone “\r” is CR. The result is that single style if the document is uniform, AXL_EOL_MIXED if more than one style is present, and AXL_EOL_LF if there are no line terminators at all (the conventional default).
- Parameters:
pt – piece tree
- Returns:
the detected AxlEol.
-
int axl_piece_tree_set_eol(AxlPieceTree *pt, AxlEol eol)
Set the line ending that axl_piece_tree_save writes.
By default (no call) save preserves the document’s bytes verbatim. Once set, every save (and axl_piece_tree_save_encoded) normalizes each line terminator — “\r\n”, a lone “\r”, or a lone “\n” — to
eolwhile streaming, so line endings convert without materializing the document.Note this does not rewrite the in-memory document; it only governs the bytes written on save.
eolmust be a concrete style — AXL_EOL_MIXED (or any out-of-range value) is rejected.- Parameters:
pt – piece tree
eol – target line ending (LF / CRLF / CR)
- Returns:
AXL_OK, or AXL_ERR for AXL_EOL_MIXED / an out-of-range
eol.
-
void axl_piece_tree_set_read_only(AxlPieceTree *pt, bool read_only)
Make the document reject content mutations.
While read-only, axl_piece_tree_insert / _delete / _apply_edits return AXL_ERR without changing the document. Reads (axl_piece_tree_get, find, line queries) and save are unaffected. Clear it to allow edits again.
This governs the editor’s own mutators only; it is independent of the file’s on-disk permissions.
- Parameters:
pt – piece tree
read_only – true to reject mutations
-
bool axl_piece_tree_is_read_only(const AxlPieceTree *pt)
Whether the document is currently read-only.
- Parameters:
pt – piece tree
-
bool axl_piece_tree_backing_changed(AxlPieceTree *pt)
Whether the backing file changed on disk since it was opened.
Compares the current size / modification time of the file this document was opened from (axl_piece_tree_open, or load_encoded’s out-of-core path) against the values captured at open. An out-of-core document reads original bytes lazily through that file, so an external change can corrupt reads — call this (e.g. when the editor regains focus) to detect it and offer a reload.
Returns false for a document with no backing file (axl_piece_tree_new, or a load that was transcoded fully into memory). Note mtime has filesystem granularity (FAT is 2 s), so a same-size change within that window can be missed.
- Parameters:
pt – piece tree
- Returns:
true if the file’s size or mtime differs from open (or the file is now missing / inaccessible); false otherwise.
-
int axl_piece_tree_undo(AxlPieceTree *pt, size_t *affected_offset, size_t *affected_len)
Undo the most recent edit (or edit group).
Reverses the last recorded insert/delete — or, if it was inside a group (see axl_piece_tree_undo_group_begin), the whole group — and moves it onto the redo stack. Reversal reuses the original / add-buffer bytes (no copy) and is atomic: on OOM the document is left unchanged.
Reports where the change landed so an editor can put the caret / selection at the edit site:
affected_offsetreceives the document offset of the reverted edit andaffected_lenthe byte length of the content now present there — non-zero when bytes were re-inserted (select that range), zero for a net deletion (collapse the caret there). For a multi-edit group it is the last sub-edit applied. Both out-params are optional (NULL to ignore) and are set to 0 on AXL_ERR.- Parameters:
pt – piece tree
affected_offset – [out, optional] offset of the change
affected_len – [out, optional] bytes now at that offset
- Returns:
AXL_OK if something was undone, AXL_ERR if there is nothing to undo (or an allocation failed).
-
int axl_piece_tree_redo(AxlPieceTree *pt, size_t *affected_offset, size_t *affected_len)
Redo the most recently undone edit (or edit group).
Reports the affected range exactly like axl_piece_tree_undo (see there):
affected_offset/affected_lenlocate the redone change for caret / selection placement; both are optional.- Parameters:
pt – piece tree
affected_offset – [out, optional] offset of the change
affected_len – [out, optional] bytes now at that offset
- Returns:
AXL_OK if something was redone, AXL_ERR if there is nothing to redo (or an allocation failed).
-
bool axl_piece_tree_can_undo(const AxlPieceTree *pt)
Whether an undo is available.
- Parameters:
pt – piece tree
-
bool axl_piece_tree_can_redo(const AxlPieceTree *pt)
Whether a redo is available.
- Parameters:
pt – piece tree
-
void axl_piece_tree_set_undo_limit(AxlPieceTree *pt, size_t max_edits)
Set how many edit records to retain.
SIZE_MAX(the default) keeps unlimited history;0disables undo (records are not kept);Nkeeps the most recent N records, dropping the oldest. Note that unlimited history means unbounded memory growth (the add buffer never shrinks and the log grows).- Parameters:
pt – piece tree
max_edits – retained record count (SIZE_MAX = unlimited)
-
void axl_piece_tree_undo_group_begin(AxlPieceTree *pt)
Begin an undo group; edits until the matching end undo together.
The explicit atomic-transaction bracket: every edit between this call and the matching axl_piece_tree_undo_group_end undoes/redoes as one step. Nestable — only the outermost begin/end pair forms the group (inner pairs are depth-counted, not separate groups). Reach for this when you apply several edits imperatively and want them atomic: a paste that deletes a selection then inserts, find-replace-all, multi-cursor edits, auto-indent-on-newline.
Three tools, three jobs — don’t confuse them:
**_undo_group_begin / _end** (this) — bracket an explicit, possibly-nested batch of imperative edits as one atomic step.
axl_piece_tree_apply_edits — when you already hold the whole batch as an AxlEdit[] (replace-all, multi-cursor): applies it as ONE group with correct offset adjustment. Simpler than bracketing by hand; prefer it when the edits are known up front.
axl_piece_tree_undo_checkpoint — the opposite model: accumulate-until-break keystroke coalescing (VS Code-style), where consecutive edits merge until the editor declares a boundary. Use for live typing, NOT for bracketing a transaction.
- Parameters:
pt – piece tree
-
void axl_piece_tree_undo_group_end(AxlPieceTree *pt)
End the current undo group (see axl_piece_tree_undo_group_begin).
- Parameters:
pt – piece tree
-
void axl_piece_tree_undo_checkpoint(AxlPieceTree *pt)
Break the undo run; subsequent edits start a new group.
Sugar for the “accumulate-until-break” editing model (no explicit begin/end bracketing): once any checkpoint has been set, consecutive edits coalesce into one undo step until the next checkpoint. The editor calls this at its chosen boundaries — a typing pause, a cursor jump, a type↔delete switch, a word/line boundary, an N-keystroke cap — to get VS Code-style “smart” grouping; the buffer supplies the mechanism, the editor the policy (the buffer has no clock or cursor of its own).
Equivalent to slicing one long group at this point. Edits made before the first checkpoint (and with no active group) still undo individually.
- Parameters:
pt – piece tree
-
struct AxlEdit
- #include <axl-piece-tree.h>
One edit for axl_piece_tree_apply_edits: delete
del_lenbytes atoffset, then insertins_lenbytes. Offsets are in the document’s ORIGINAL coordinates (before any edit in the batch).
-
struct AxlPieceLineIter
- #include <axl-piece-tree.h>
Forward line iterator (see axl_piece_tree_line_iter_init). Fields are private.
Public Members
-
AxlPieceTree *pt
-
void *node
current piece (AxlRBNode*)
-
size_t node_base
doc offset of
node'sfirst byte
-
size_t intra
bytes consumed within
node
-
size_t nl_idx
next newline index in node’s source array
-
size_t line_start
doc offset of the pending line’s start
-
size_t doc_len
-
bool done
-
AxlPieceTree *pt