Skip to content

PackedTrees and OpTrees

packed_tree Package

Implementation of the Packed Octree data structure

OpTree

OpTree(data, leafsize=None, compact_nodes=None, copy_data=False, balanced_tree=None, boxsize=None, *, cubes_per_side=-1, save_dataset=False)

Class to mimic the SciPy KDTree API using ParticleCubes and PackedTrees

Will provide identical API to SciPy's KDTree to the extent possible given that ParticleCubes and PackedTrees are fundamentally different. Where 1-1 matches for a requested method, argument, or functionality are not possible, raise an OpTreeError if there is nothing similar and emit an OpTreeWarning explaining the replacement otherwise.

Warning! PackedTrees are not robust against large amounts of degenerate input data! Please sanitize data prior to usage if expecting data degeneracy levels above ~100 (i.e. 100 data points with the same values). Note that multiple degenerate regions are acceptable, assuming they are sufficiently separated.

Parameters:

  • data ((array_like, shape(n, m) | MultiParticleDataset)) –

    The n m-dimensional data points to be indexed. This array is preferentially not copied and will be sorted in place, so modifying this data will result in bogus results. The data are also copied if the OpTree is built with copy_data=True. Note: OpTrees are intended to support 3-dimensional data, so m>3 is not supported. For m<3, the data is padded with zeros (e.g. [[1, 2], [3, 4]] will become [[1, 2, 0], [3, 4, 0]]). This will lead to the data being copied. Can also pass in a Dataset directly for improved creation time.

  • leafsize (positive int, default: None ) –

    The number of points at which the algorithm switches over to brute-force. Default: 400.

  • compact_nodes (bool, default: None ) –

    This parameter is irrelevant for OpTrees and is only provided to match the KDTree API.

  • copy_data (bool, default: False ) –

    If True the data is copied to protect the kd-tree against data corruption and to prevent the original data from being sorted. Default: False.

  • balanced_tree (bool, default: None ) –

    OpTrees are always split at the bounding box midpoint, so this option is only provided to match the KDTree API

  • boxsize (array_like or scalar, default: None ) –

    Provide an explicit bounding box for the data in the form [x_min, y_min, z_min, dx, dy, dz]. If len(boxsize)==3, x_min = y_min = z_min = 0. If boxsize`` is a scalar,dx = dy = dz = boxsize. Otherboxsizelengths are unsupported. SciPy'sKDTree` will impose a toroidal topology in addition; this functionality is currently unsupported.

  • cubes_per_side (int, default: -1 ) –

    Size of the top-level grid. Must be between 3 and 32 or -1 (default). The default uses the number of available threads to ensure there are more grid cells than threads.

  • save_dataset (bool, default: False ) –

    If data is a dataset, save sorted positions/indices to file. Default False

data instance-attribute

data = positions

The n data points of dimension m to be indexed. The data is only copied if the "kd-tree" is built with copy_data=True.

leafsize instance-attribute

leafsize = _DEFAULT_PARTICLE_THRESHOLD if leafsize is None else leafsize

The number of points at which the algorithm switches over to brute-force

maxs instance-attribute

maxs = box[:3] + box[3:]

The maximum value in each dimension of the n data points

mins instance-attribute

mins = box[:3]

The minimum value in each dimension of the n data points

n instance-attribute

n = len(data)

The number of data points.

size instance-attribute

size = sum((int(len(tree) / 5)) for ct in (cube_trees))

The number of nodes in the tree.

sort_index property

sort_index

Shuffle list for the original data, ie self.data = data[self.sort_index]

count_neighbors

count_neighbors(*, other, r, p=None, weights=None, cumulative=None)

Count how many nearby pairs can be formed.

Count the number of pairs (x1, x2) can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1, x2, p) <= r.

Data points on self and other are optionally weighted by the weights argument. (See below)

WARNING

Not currently implemented.

Parameters:

  • other (OpTree) –

    The other tree to draw points from, can be the same tree as self

  • r (float | NDArray) –

    The radius to produce a count for. Mulltiple radii are searched with a single tree traversal. If the count is non-cumulative (cumulative=False), r defines the edges of the bins, and must be non-decreasing

  • p (float | None, default: None ) –

    Which Minkowski p-norm to use. Default 2.0. A finite large p may cause a ValueError if overflow can occur

  • weights (tuple[float | None, float | None] | NDArray | None, default: None ) –

    If None, the pair-counting is unweighted. If given as a tuple, weights[0] is the weights of points in self, and weights[1] is the weights of points in other; either can be None to indicate the points are unweighted. If given as an array_like, weights is the weights of points in self and other. For this to make sense, self and other must be the same tree. If self and other are two different trees, a ValueError is raised. Default: None

  • cumulative (bool | None, default: None ) –

    Whether the returned counts are cumulative. When cumulative is set to False the algorithm is optimized to work with a large number of bins (>10) specified by r. When cumulative is set to True, the algorithm is optimized to work with a small number of r. Default: True

Returns:

  • result ( scalar or 1-D array ) –

    The number of pairs. For unweighted counts, the result is integer. For weighted counts, the result is float. If cumulative is False, result[i] contains the counts with (-inf if i == 0 else r[i-1]) < R <= r[i]

Raises:

query

query(x, k=1, eps=None, p=None, distance_upper_bound=None, workers=None, *, return_data_indices=None, return_sorted=None)

Query the OpTree for nearest neighbors

Parameters:

  • x (ArrayLike) –

    An array of points to query

  • k (int | Sequence[int], default: 1 ) –

    Either the number of nearest neighbors to return or a list of the kth nearest neighbors to return, starting from 1. E.g., [2,3] will return the 2nd and 3rd nearest neighbors

  • eps (float | None, default: None ) –

    Return approximate nearest neighbors; Note that this parameter is unused

  • p (int | None, default: None ) –

    The Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the Euclidean distance. infinity is the maximum-coordinate-difference distance. Currently only p=2 is supported

  • distance_upper_bound (float | None, default: None ) –

    Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.

  • workers (int | None, default: None ) –

    Number of workers to use for parallel processing. Only 1 is supported, for more, see Cubes

  • return_data_indices (bool | None, default: None ) –

    Return indices into the sorted data if True instead of into the original. Specify None to have this set by the copy_data argument used during tree construction.

  • return_sorted (bool | None, default: None ) –

    Flag to return the distances and indices in distance-sorted order. Set to False for a performance boost. Default True

Returns:

  • d ( float or array of floats ) –

    The distances to the nearest neighbors. If x has shape tuple+(self.m,), then d has shape tuple+(k,). When k==1, the last dimension of the output is squeezed. Missing neighbors are indicated with infinite distances. Hits are sorted by distance (nearest first)

  • i ( integer or array of integers ) –

    The index of each neighbor in self.data. i is the same shape as d. Missing neighbors are indicated with self.n.

Raises:

query_ball_point

query_ball_point(x, r, p=2.0, eps=None, workers=-1, *, return_sorted=False, return_length=False, return_lists=False, return_data_indices=None, strict=None)

Find all points within distance r of point(s) x.

Parameters:

  • x (array_like, shape tuple + (self.m,)) –

    The point or points to search for neighbors of.

  • r ((array_like, float)) –

    The radius of points to return, must broadcast to the length of x.

  • p (float, default: 2.0 ) –

    Which Minkowski p-norm to use. Should be in the range [1, inf]. A finite large p may cause a ValueError if overflow can occur.

  • eps (nonnegative float, default: None ) –

    Approximate search. Branches of the tree are not explored if their nearest points are further than r / (1 + eps), and branches are added in bulk if their furthest points are nearer than r * (1 + eps).

  • workers (int, default: -1 ) –

    Number of jobs to schedule for parallel processing. If -1 is given all processors are used. Default: -1. Note: SciPy's kdtree parallelizes across the number of points queried. Thus, querying on a single point gets no speed-up from parallelization. We parallelize on single point queries, thus the different default

  • return_sorted (bool, default: False ) –

    Sorts returned indices if True and does not sort them if False. If None, does not sort single point queries, but does sort multi-point queries which was the behavior before this option was added. Default False.

  • return_length (bool, default: False ) –

    Return the number of points inside the radius instead of a list of the indices. Note that this is much faster for large trees.

  • return_lists (bool, default: False ) –

    Force returning lists instead of arrays. OpTrees return arrays of indices by default, but this doesn't match the expected query_ball_point signature. To exactly match SciPy, set this to True.

  • return_data_indices (bool | None, default: None ) –

    Return indices into the sorted data if True instead of into the original. Specify None to have this set by the copy_data argument used during tree construction.

  • strict (bool | None, default: None ) –

    If False, compare only the approximate node distance. Should be significantly faster, but may include some amount of false positives. Default True

Returns:

  • results ( list or array of lists ) –

    If x is a single point, returns a list of the indices of the neighbors of x. If x is an array of points, returns an object array of shape tuple containing lists of neighbors.

Notes

If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a OpTree and using query_ball_tree.

Examples:

>>> import numpy as np
>>> from packingcubes import OpTree
>>> x, y = np.mgrid[0:5, 0:5]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = OpTree(points)
>>> sorted(tree.query_ball_point([2, 0], 1))
[5, 10, 11, 15]

Query multiple points and plot the results:

>>> import matplotlib.pyplot as plt
>>> points = np.asarray(points)
>>> plt.plot(points[:,0], points[:,1], '.')
>>> for results in tree.query_ball_point(([2, 0], [3, 3]), 1):
...     nearby_points = points[results]
...     plt.plot(nearby_points[:,0], nearby_points[:,1], 'o')
>>> plt.margins(0.1, 0.1)
>>> plt.show()

query_ball_tree

query_ball_tree(other, r, p=2, eps=None, *, strict=None, return_lists=None, return_sorted=None)

Find all pairs of points between self and other whose distance is at most r.

Parameters:

  • other (OpTree) –

    The tree containing points to search against

  • r

    The maximum distance, has to be positive

  • p (float, default: 2 ) –

    Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity

  • eps (float | None, default: None ) –

    Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.

  • strict (bool | None, default: None ) –

    If False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. Default True

  • return_lists (bool | None, default: None ) –

    Force returning lists instead of arrays. OpTrees return arrays of indices by default, but this doesn't match the expected query_ball_tree signature. For a slight performance increase, set this to False

  • return_sorted (bool | None, default: None ) –

    Force returning sorted lists. If the copy_data flag was passed during tree construction, the data used to generate the results may be in a different order than originally imposed. For example, results[0] = [5, 1, 2]. If order of output is important, set this flag to True, at a performance penalty.

Returns:

  • results ( list of lists ) –

    For each element self.data[i] of this tree, results[i] is a list of the indices of its neighbors in other.data

Raises:

query_pairs

query_pairs(r, p=2.0, *, eps=None, output_type=None, strict=None)

Find all pairs of points in self whose distance is at most r.

Parameters:

  • r (float) –

    The maximum distance, has to be positive

  • p (float, default: 2.0 ) –

    Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity

  • eps (float | None, default: None ) –

    Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.

  • output_type (str | None, default: None ) –

    Choose the output container, 'set' or 'ndarray'. Default: 'set'

  • strict (bool | None, default: None ) –

    If False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. Default True

Returns:

  • results ( set or NDArray ) –

    Set of pairs (i, j) with i<j, for which the corresponding positions are close. If output_type is 'ndarray', an ndarray is returned instead of a set.

Raises:

sparse_distance_matrix

sparse_distance_matrix(*, other, max_distance, p=None, output_type=None)

Compute a sparse distance matrix

Computes a distance matrix between two OpTrees, leaving as zero any distance greater than max_distance.

WARNING

Not currently implemented.

Parameters:

  • other (OpTree) –
  • max_distance (float) –
  • p (float | None, default: None ) –

    Which Minkowski p-norm to use. A finite large p may cause a ValueError if overflow can occur

  • output_type (str | None, default: None ) –

    Which container to use for output data. Default: "dok_matrix"

Returns:

  • result ( dok_matrix, coo_matrix, dict, or ndarray ) –

    Sparse matrix representing the results in a "dictionary of keys" format. If a dict is returned the keys are (i,j) tuples of indices. If output_type is "ndarray" a record array with fields "i", "j", and "v" is returned.

Raises:

PackedTree

PackedTree(*, dataset=None, source=None, particle_threshold=None, bounding_box=None, copy_data=False)

Bases: Octree

Public packed octree interface

This interface defines the methods for creating, manipulating, and traversing a packingcubes packed octree.

Attributes:

Note

Must provide either dataset or source. If provided source does not include metadata, must additionally provide either dataset or bounding_box.

Parameters:

  • dataset (NDArray | Dataset | None, default: None ) –

    An (N,3) array or Dataset containing particle data

  • source (Buffer | None, default: None ) –

    Pre-computed packed buffer containing this tree. Leave out to compute the tree from scratch.

  • particle_threshold (int | None, default: None ) –

    Number of particles allowed in a leaf before splitting. Defaults to octree._DEFAULT_PARTICLE_THRESHOLD

  • bounding_box (BoxLike | None, default: None ) –

    Bounding box of the tree. Required if metadata needs to be created and dataset is not provided.

    Will override the dataset bounding box.

  • copy_data (bool, default: False ) –

    If dataset is just an array, flag to copy data prior to construction. Defaults to False

metadata instance-attribute

metadata = metadata

The metadata for this packed tree

packed_form property

packed_form

Return this tree in full packed form

packed_meta property

packed_meta

Return a memoryview of the tree's packed metadata

packed_tree property

packed_tree

Return a memoryview of the tree's backing byte array

particle_threshold instance-attribute

particle_threshold = particle_threshold

The maximum leaf size before splitting, used in tree construction

__iter__

__iter__()

Iterate through all nodes of the octree.

Note that no guarantee is made of what order the nodes are traversed in

__len__

__len__()

Return the number of particles in the tree

count_neighbors

count_neighbors(*, other, r)

Count how many nearby pairs can be formed.

Parameters:

  • other (PackedTree) –

    The other tree to compare against

  • r (float) –

    The radius to produce a count for.

Returns:

  • result ( scalar or 1-d array ) –

    The number of pairs

get_closest_particle

get_closest_particle(xyz, *, check_neighbors=True)

Get nearest particle index (and distance) to point.

Parameters:

  • xyz (ArrayLike) –

    Coordinates of point to check

  • check_neighbors (bool, default: True ) –

    Flag to check whether we should look at neighbors of the smallest containing node. Default True

Returns:

  • closest_ind ( int ) –

    Absolute index of closest particle

  • closest_dist ( float ) –

    Distance to closest particle

Raises:

get_closest_particles

get_closest_particles(*, data, xyz, distance_upper_bound=None, p=2, k=1, use_data_indices=True, return_sorted=True)

Get kth nearest particle distances and indices to point.

Parameters:

  • data (DataContainer | Dataset) –

    Source of particle position data

  • xyz (NDArray) –

    Coordinates of point to check

  • distance_upper_bound (float | None, default: None ) –

    Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.

  • p (float, default: 2 ) –

    Which Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the usual Euclidean distance. Infinity is the maximum-coordinate-difference distance. Currently, only p=2 is supported.

  • k (int, default: 1 ) –

    Number of closest particles to return. Default 1

  • use_data_indices (bool, default: True ) –

    Flag to return indices into the sorted dataset (True, default) or into the shuffle list (False)

  • return_sorted (bool, default: True ) –

    Flag to return the distances and indices in distance-sorted order. Set to False for a performance boost. Default True

Returns:

  • distances ( NDArray[float] ) –

    Distances to the kth nearest neighbors. Has shape (min(N,k),), where N is the number of particles in the sphere bounded by distance_upper_bound

  • indices ( NDArray[int] ) –

    Indices in data of the kth nearest neighbors. Has same shape as distances

Raises:

get_leaves

get_leaves()

Return a list of all leaf octree nodes in depth-first order

get_node

get_node(tag)

Return the node corresponding to the provided tag or None if not found.

Parameters:

  • tag (str) –

    The tag to search for

Returns:

  • node

    Node in octree with specified tag or None if it does not exist

get_particle_index_list_in_box

get_particle_index_list_in_box(*, data, box, strict=False)

Return all particles contained within the box.

Parameters:

  • data (DataContainer | Dataset) –

    Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase

  • box (BoxLike) –

    Box to check

  • strict (bool, default: False ) –

    Flag to specify whether only particles inside box will be returned. If False (default), additional nearby particles may be included for signficantly increased performance

Returns:

  • indices ( NDArray[int]] ) –

    List of original particle indices contained within sphere

get_particle_index_list_in_sphere

get_particle_index_list_in_sphere(*, data, center, radius, strict=False)

Return all particles contained within sphere defined by center and radius.

Parameters:

  • data (DataContainer | Dataset) –

    Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase

  • center (NDArray) –

    Center point of the sphere

  • radius (float) –

    Radius of the sphere

  • strict (bool, default: False ) –

    Flag to specify whether only particles inside the sphere will be returned. If False (default), additional nearby particles may be included for signficantly increased performance

Returns:

  • indices ( NDArray[int] ) –

    List of original particle indices contained within sphere

get_particle_indices_in_box

get_particle_indices_in_box(*, box)

Return all particles contained within the box.

Parameters:

  • box (BoxLike) –

    Box to check

Returns:

  • indices ( list[tuple[int, int]] ) –

    List of particle start-stop indices contained within sphere Third element of each tuple is a flag for whether only some particles (1) among the start-stop indices are contained or all (0)

get_particle_indices_in_sphere

get_particle_indices_in_sphere(*, center, radius)

Return all particles contained within sphere defined by center and radius.

Parameters:

  • center (NDArray) –

    Center point of the sphere

  • radius (float) –

    Radius of the sphere

Returns:

  • indices ( list[tuple[int, int, int]] ) –

    List of particle start-stop indices contained within sphere Third element of each tuple is a flag for whether only some particles (1) among the start-stop indices are contained or all (0)

PackedTreeMeta

Bases: NamedTuple

The metadata for a PackedTree

bounding_box instance-attribute

bounding_box

Bounding box used when creating the tree

checksum instance-attribute

checksum

Actual checksum

checksum_method class-attribute instance-attribute

checksum_method = 'xxh3_64_intdigest'

Method for computing the checksum

creation_timestamp instance-attribute

creation_timestamp

Creation time as a UTC timestamp

field_type class-attribute instance-attribute

field_type = dtype(uint32)

Field type (e.g. uint32)

meta_size class-attribute instance-attribute

meta_size = 144

Header size in bytes

packed_spec_version class-attribute instance-attribute

packed_spec_version = '1.0.0'

Current version of the packed metadata standard in semver format

particle_threshold class-attribute instance-attribute

particle_threshold = _DEFAULT_PARTICLE_THRESHOLD

Particle threshold used when creating the tree

create_metadata

create_metadata(box, packed, particle_threshold=_DEFAULT_PARTICLE_THRESHOLD)

Create a tree metadata object from the provided info

Parameters:

  • box (BoundingBox) –

    The bounding box used to create the tree

  • packed (NDArray[uint32]) –

    The packed tree data

  • particle_threshold (int, default: _DEFAULT_PARTICLE_THRESHOLD ) –

    The particle threshold to split leaves. Defaults to optree._DEFAULT_PARTICLE_THRESHOLD

Returns:

  • TreeMeta

    A TreeMeta object containing the tree's metadata

extract_metadata

extract_metadata(source)

Extract the metadata and packed tree information from a buffer

Parameters:

  • source (Buffer) –

    A Buffer containing the packed data

Returns:

  • metadata ( TreeMeta ) –

    The tree's metadata

  • packed_tree ( NDArray[uint32] ) –

    The actual packed tree data as a numpy array

Raises:

  • ValueError

    If the metadata does not match an expected format

  • NotImplementedError

    If an unimplemented data format is specified (currently only uint32)

  • OctreeError

    If the metadata checksum does not match the computed checksum

pack_metadata

pack_metadata(metadata, packed_tree)

Pack tree metadata

Parameters:

  • metadata (TreeMeta) –

    The metadata of the tree

  • packed_tree (NDArray[uint32]) –

    The packed tree data

Returns:

  • NDArray[uint32]

    Packed metadata