Skip to content

Packed Format Design Specification

Version 1.0.0

This file documents the format used to describe a PackedTree data structure.

A packed tree is composed of two parts: an initial header containing tree metadata (e.g. tree checksum, bounding box, etc), and the actual tree structure in memory.

The basic unit of the packed format is the field, currently defined as a uint32. Thus, the entire packed tree can be considered an array of uint32, and internally is stored as a numpy array of such.

Tree Metadata

Field Bytes Description
0 0-3 Size of header in bytes (=144).
Note this also sets endianness
1 4 Field size in bytes as an int8. Currently set to 4, representing a uint32.
A negative value would represent a signed integer (i.e. -4 <=> int32)
1 5-7 Packed specification version as major - minor - patch
following semantic versioning (see semver.org), stored as 3 uint8s
2-3 8-15 Creation time as a UTC timestamp
4-20 16-83 Checksum method as a string: 'xxh3_64_intdigest'
21-22 84-91 Tree checksum computed via xxhash's xxh3_64_intdigest
23-24 92-99 Tree bounding box x position
25-26 100-107 Tree bounding box y position
27-28 108-115 Tree bounding box z position
29-30 116-123 Tree bounding box dx
31-32 124-131 Tree bounding box dy
33-34 132-139 Tree bounding box dz
35 140-143 Particle threshold

Timestamp is currently computed via

# conversion to timestamp
t = np.float64(datetime.datetime.now(tz=datetime.UTC).timestamp())
# conversion from timestamp
datetime.datetime.fromtimestamp(t, tz=datetime.UTC)

For the checksum, use xxhash's xxh3 (not part of the standard library):

from xxhash import xxh3_64_intdigest 

checksum = xxh3_64_intdigest(packed_tree)
This creates a uint64 checksum.

Tree Data Structure

Essentially we can think of the tree structure as the equivalent of numpy's structured array, except in an octree structure as opposed to the linear array structure. Thus, the bytes are organized such that all information for a node (including it's children!) are together in sequential order.

The intent is to mimic the in-memory structure of a simple tree from a language like C, where every child node pointer has been replaced by the actual bytes of the child node.

Thus, the first few bytes correspond to data attached to the root node, the next few bytes are the first child's data, followed by the first grandchild, etc., in a pre-order depth-first traversal of the tree.

Internal Node:

skip_length node_start node_end child_flag my_index level unused C0 C1 C5 C6 C7 parent_offset
uint32 uint32 uint32 uint8 uint8 uint8 uint8 uint32
5+sum([c.skip_length for c in children]) CN is present if child_flag & (2**N)=1 Which child is this (0 if root) What level node is this. root is 0 skip_length + sum([c.skip_length for c in siblings if c < self])
field 0 field 1 field 2 field 3 * * * * * field 4

For the present case, child_flag would be 227 (\(=2^0 + 2^1 + 2^5 + 2^6 + 2^7\)). If all 5 nodes were leaves, then skip_length=30.

Leaf Node:

Size = 5 fields = 20 bytes

skip_length=20 node_start node_end child_flag=0 my_index level unused parent_offset
uint32 uint32 uint32 uint8 uint8 uint8 uint8 uint32

Example:

graph TD
    a --> b
    a --> g[g*]
    b --> c
    b --> f[f*]
    c --> d
    c --> e[e*]
    g --> h
    g --> i[i*]

The packed implementation would then look like (indenting subsequent tree levels)

45, 
a_s, 
a_e, 
a_m,
    25,
    b_s,
    b_e,
    b_m,
        15,
        c_s,
        c_e,
        c_m,
            5,
            d_s,
            d_e,
            d_m,
            4,
            5,
            e_s,
            e_e,
            e_m,
            9,
        4,
        5, 
        f_s, 
        f_e, 
        f_m, 
        15,
    4,
    15, 
    g_s,
    g_e,
    g_m,
        5,
        h_s,
        h_e,
        h_m,
        4,
        5,
        i_s,
        i_e,
        i_m,
        9,
    25,
0

Here, each element is a field and \(x_{s}\), \(x_{e}\), and \(x_{m}\) are the node_start, node_end, and metadata (child_flag, my_index, level, and empty) fields for node \(x\). The starred nodes in the tree are the last sibling for each group of children.