PackedTrees and OpTrees¶
packed_tree Package
Implementation of the Packed Octree data structure
OpTree
¶
OpTree(data, leafsize=None, compact_nodes=None, copy_data=False, balanced_tree=None, boxsize=None, *, cubes_per_side=-1, save_dataset=False)
Class to mimic the SciPy KDTree API using ParticleCubes and PackedTrees
Will provide identical API to SciPy's KDTree to the extent possible given that ParticleCubes and PackedTrees are fundamentally different. Where 1-1 matches for a requested method, argument, or functionality are not possible, raise an OpTreeError if there is nothing similar and emit an OpTreeWarning explaining the replacement otherwise.
Warning! PackedTrees are not robust against large amounts of degenerate input data! Please sanitize data prior to usage if expecting data degeneracy levels above ~100 (i.e. 100 data points with the same values). Note that multiple degenerate regions are acceptable, assuming they are sufficiently separated.
Parameters:
-
data((array_like, shape(n, m) | MultiParticleDataset)) –The
nm-dimensional data points to be indexed. This array is preferentially not copied and will be sorted in place, so modifying this data will result in bogus results. The data are also copied if theOpTreeis built with copy_data=True. Note: OpTrees are intended to support 3-dimensional data, so m>3 is not supported. For m<3, the data is padded with zeros (e.g.[[1, 2], [3, 4]]will become[[1, 2, 0], [3, 4, 0]]). This will lead to the data being copied. Can also pass in a Dataset directly for improved creation time. -
leafsize(positive int, default:None) –The number of points at which the algorithm switches over to brute-force. Default:
400. -
compact_nodes(bool, default:None) –This parameter is irrelevant for
OpTrees and is only provided to match theKDTreeAPI. -
copy_data(bool, default:False) –If
Truethe data is copied to protect the kd-tree against data corruption and to prevent the original data from being sorted. Default:False. -
balanced_tree(bool, default:None) –OpTrees are always split at the bounding box midpoint, so this option is only provided to match theKDTreeAPI -
boxsize(array_like or scalar, default:None) –Provide an explicit bounding box for the data in the form
[x_min, y_min, z_min, dx, dy, dz]. Iflen(boxsize)==3,x_min = y_min = z_min = 0. Ifboxsize`` is a scalar,dx = dy = dz = boxsize. Otherboxsizelengths are unsupported. SciPy'sKDTree` will impose a toroidal topology in addition; this functionality is currently unsupported. -
cubes_per_side(int, default:-1) –Size of the top-level grid. Must be between 3 and 32 or -1 (default). The default uses the number of available threads to ensure there are more grid cells than threads.
-
save_dataset(bool, default:False) –If data is a dataset, save sorted positions/indices to file. Default
False
data
instance-attribute
¶
The n data points of dimension m to be indexed. The data is only copied if
the "kd-tree" is built with copy_data=True.
leafsize
instance-attribute
¶
The number of points at which the algorithm switches over to brute-force
maxs
instance-attribute
¶
The maximum value in each dimension of the n data points
size
instance-attribute
¶
The number of nodes in the tree.
sort_index
property
¶
Shuffle list for the original data, ie self.data = data[self.sort_index]
count_neighbors
¶
Count how many nearby pairs can be formed.
Count the number of pairs (x1, x2) can be formed, with x1 drawn from
self and x2 drawn from other, and where distance(x1, x2, p)
<= r.
Data points on self and other are optionally weighted by the weights
argument. (See below)
WARNING
Not currently implemented.
Parameters:
-
other(OpTree) –The other tree to draw points from, can be the same tree as self
-
r(float | NDArray) –The radius to produce a count for. Mulltiple radii are searched with a single tree traversal. If the count is non-cumulative (
cumulative=False),rdefines the edges of the bins, and must be non-decreasing -
p(float | None, default:None) –Which Minkowski p-norm to use. Default 2.0. A finite large p may cause a
ValueErrorif overflow can occur -
weights(tuple[float | None, float | None] | NDArray | None, default:None) –If None, the pair-counting is unweighted. If given as a tuple,
weights[0]is the weights of points inself, andweights[1]is the weights of points inother; either can be None to indicate the points are unweighted. If given as an array_like,weightsis the weights of points inselfandother. For this to make sense,selfandothermust be the same tree. Ifselfandotherare two different trees, aValueErroris raised. Default: None -
cumulative(bool | None, default:None) –Whether the returned counts are cumulative. When
cumulativeis set toFalsethe algorithm is optimized to work with a large number of bins (>10) specified byr. Whencumulativeis set toTrue, the algorithm is optimized to work with a small number ofr. Default: True
Returns:
-
result(scalar or 1-D array) –The number of pairs. For unweighted counts, the result is integer. For weighted counts, the result is float. If
cumulativeisFalse,result[i]contains the counts with(-inf if i == 0 else r[i-1]) < R <= r[i]
Raises:
query
¶
query(x, k=1, eps=None, p=None, distance_upper_bound=None, workers=None, *, return_data_indices=None, return_sorted=None)
Query the OpTree for nearest neighbors
Parameters:
-
x(ArrayLike) –An array of points to query
-
k(int | Sequence[int], default:1) –Either the number of nearest neighbors to return or a list of the kth nearest neighbors to return, starting from 1. E.g.,
[2,3]will return the 2nd and 3rd nearest neighbors -
eps(float | None, default:None) –Return approximate nearest neighbors; Note that this parameter is unused
-
p(int | None, default:None) –The Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the Euclidean distance. infinity is the maximum-coordinate-difference distance. Currently only p=2 is supported
-
distance_upper_bound(float | None, default:None) –Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.
-
workers(int | None, default:None) –Number of workers to use for parallel processing. Only 1 is supported, for more, see Cubes
-
return_data_indices(bool | None, default:None) –Return indices into the sorted data if True instead of into the original. Specify None to have this set by the copy_data argument used during tree construction.
-
return_sorted(bool | None, default:None) –Flag to return the distances and indices in distance-sorted order. Set to
Falsefor a performance boost. DefaultTrue
Returns:
-
d(float or array of floats) –The distances to the nearest neighbors. If
xhas shapetuple+(self.m,), thendhas shapetuple+(k,). Whenk==1, the last dimension of the output is squeezed. Missing neighbors are indicated with infinite distances. Hits are sorted by distance (nearest first) -
i(integer or array of integers) –The index of each neighbor in
self.data.iis the same shape asd. Missing neighbors are indicated withself.n.
Raises:
-
NotImplementedError–if
p!=2
query_ball_point
¶
query_ball_point(x, r, p=2.0, eps=None, workers=-1, *, return_sorted=False, return_length=False, return_lists=False, return_data_indices=None, strict=None)
Find all points within distance r of point(s) x.
Parameters:
-
x(array_like, shape tuple + (self.m,)) –The point or points to search for neighbors of.
-
r((array_like, float)) –The radius of points to return, must broadcast to the length of
x. -
p(float, default:2.0) –Which Minkowski
p-norm to use. Should be in the range[1, inf]. A finite largepmay cause aValueErrorif overflow can occur. -
eps(nonnegative float, default:None) –Approximate search. Branches of the tree are not explored if their nearest points are further than
r / (1 + eps), and branches are added in bulk if their furthest points are nearer thanr * (1 + eps). -
workers(int, default:-1) –Number of jobs to schedule for parallel processing. If -1 is given all processors are used. Default: -1. Note: SciPy's kdtree parallelizes across the number of points queried. Thus, querying on a single point gets no speed-up from parallelization. We parallelize on single point queries, thus the different default
-
return_sorted(bool, default:False) –Sorts returned indices if
Trueand does not sort them ifFalse. IfNone, does not sort single point queries, but does sort multi-point queries which was the behavior before this option was added. DefaultFalse. -
return_length(bool, default:False) –Return the number of points inside the radius instead of a list of the indices. Note that this is much faster for large trees.
-
return_lists(bool, default:False) –Force returning lists instead of arrays.
OpTrees return arrays of indices by default, but this doesn't match the expected query_ball_point signature. To exactly match SciPy, set this toTrue. -
return_data_indices(bool | None, default:None) –Return indices into the sorted data if
Trueinstead of into the original. SpecifyNoneto have this set by the copy_data argument used during tree construction. -
strict(bool | None, default:None) –If
False, compare only the approximate node distance. Should be significantly faster, but may include some amount of false positives. DefaultTrue
Returns:
-
results(list or array of lists) –If
xis a single point, returns a list of the indices of the neighbors ofx. Ifxis an array of points, returns an object array of shape tuple containing lists of neighbors.
Notes
If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a OpTree and using query_ball_tree.
Examples:
>>> import numpy as np
>>> from packingcubes import OpTree
>>> x, y = np.mgrid[0:5, 0:5]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = OpTree(points)
>>> sorted(tree.query_ball_point([2, 0], 1))
[5, 10, 11, 15]
Query multiple points and plot the results:
>>> import matplotlib.pyplot as plt
>>> points = np.asarray(points)
>>> plt.plot(points[:,0], points[:,1], '.')
>>> for results in tree.query_ball_point(([2, 0], [3, 3]), 1):
... nearby_points = points[results]
... plt.plot(nearby_points[:,0], nearby_points[:,1], 'o')
>>> plt.margins(0.1, 0.1)
>>> plt.show()
query_ball_tree
¶
Find all pairs of points between self and other whose distance is at most r.
Parameters:
-
other(OpTree) –The tree containing points to search against
-
r–The maximum distance, has to be positive
-
p(float, default:2) –Which Minkowski norm to use.
phas to meet the condition1 <= p <= infinity -
eps(float | None, default:None) –Approximate search. Branches of the tree are not explored if their nearest points are further than
r/(1+eps), and branches are added in bulk if their furthest points are nearer thanr * (1+eps). eps has to be non-negative. -
strict(bool | None, default:None) –If
False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. DefaultTrue -
return_lists(bool | None, default:None) –Force returning lists instead of arrays.
OpTrees return arrays of indices by default, but this doesn't match the expectedquery_ball_treesignature. For a slight performance increase, set this toFalse -
return_sorted(bool | None, default:None) –Force returning sorted lists. If the
copy_dataflag was passed during tree construction, the data used to generate the results may be in a different order than originally imposed. For example,results[0] = [5, 1, 2]. If order of output is important, set this flag toTrue, at a performance penalty.
Returns:
-
results(list of lists) –For each element
self.data[i]of this tree,results[i]is a list of the indices of its neighbors in other.data
Raises:
-
NotImplementedError–if p!=2
query_pairs
¶
Find all pairs of points in self whose distance is at most r.
Parameters:
-
r(float) –The maximum distance, has to be positive
-
p(float, default:2.0) –Which Minkowski norm to use.
phas to meet the condition1 <= p <= infinity -
eps(float | None, default:None) –Approximate search. Branches of the tree are not explored if their nearest points are further than
r/(1+eps), and branches are added in bulk if their furthest points are nearer thanr * (1+eps). eps has to be non-negative. -
output_type(str | None, default:None) –Choose the output container, 'set' or 'ndarray'. Default: 'set'
-
strict(bool | None, default:None) –If
False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. DefaultTrue
Returns:
-
results(set or NDArray) –Set of pairs
(i, j)withi<j, for which the corresponding positions are close. If output_type is 'ndarray', an ndarray is returned instead of a set.
Raises:
-
NotImplementedError–if p!=2
sparse_distance_matrix
¶
Compute a sparse distance matrix
Computes a distance matrix between two OpTrees, leaving as zero any distance greater than max_distance.
WARNING
Not currently implemented.
Parameters:
-
other(OpTree) – -
max_distance(float) – -
p(float | None, default:None) –Which Minkowski p-norm to use. A finite large p may cause a
ValueErrorif overflow can occur -
output_type(str | None, default:None) –Which container to use for output data. Default: "dok_matrix"
Returns:
-
result(dok_matrix, coo_matrix, dict, or ndarray) –Sparse matrix representing the results in a "dictionary of keys" format. If a dict is returned the keys are (i,j) tuples of indices. If output_type is "ndarray" a record array with fields "i", "j", and "v" is returned.
Raises:
PackedTree
¶
PackedTree(*, dataset=None, source=None, particle_threshold=None, bounding_box=None, copy_data=False)
Bases: Octree
Public packed octree interface
This interface defines the methods for creating, manipulating, and
traversing a packingcubes packed octree.
Attributes:
-
data(Dataset) –The backing dataset
-
particle_threshold(int) –The maximum leaf size before splitting, used in tree construction
Note
Must provide either dataset or source. If provided source does not include metadata, must additionally provide either dataset or bounding_box.
Parameters:
-
dataset(NDArray | Dataset | None, default:None) –An (N,3) array or Dataset containing particle data
-
source(Buffer | None, default:None) –Pre-computed packed buffer containing this tree. Leave out to compute the tree from scratch.
-
particle_threshold(int | None, default:None) –Number of particles allowed in a leaf before splitting. Defaults to
octree._DEFAULT_PARTICLE_THRESHOLD -
bounding_box(BoxLike | None, default:None) –Bounding box of the tree. Required if metadata needs to be created and dataset is not provided.
Will override the dataset bounding box.
-
copy_data(bool, default:False) –If dataset is just an array, flag to copy data prior to construction. Defaults to
False
particle_threshold
instance-attribute
¶
The maximum leaf size before splitting, used in tree construction
__iter__
¶
Iterate through all nodes of the octree.
Note that no guarantee is made of what order the nodes are traversed in
count_neighbors
¶
Count how many nearby pairs can be formed.
Parameters:
-
other(PackedTree) –The other tree to compare against
-
r(float) –The radius to produce a count for.
Returns:
-
result(scalar or 1-d array) –The number of pairs
get_closest_particle
¶
Get nearest particle index (and distance) to point.
Parameters:
-
xyz(ArrayLike) –Coordinates of point to check
-
check_neighbors(bool, default:True) –Flag to check whether we should look at neighbors of the smallest containing node. Default True
Returns:
-
closest_ind(int) –Absolute index of closest particle
-
closest_dist(float) –Distance to closest particle
Raises:
-
NotImplementedError–This function is not implemented, see get_closest_particles instead
get_closest_particles
¶
get_closest_particles(*, data, xyz, distance_upper_bound=None, p=2, k=1, use_data_indices=True, return_sorted=True)
Get kth nearest particle distances and indices to point.
Parameters:
-
data(DataContainer | Dataset) –Source of particle position data
-
xyz(NDArray) –Coordinates of point to check
-
distance_upper_bound(float | None, default:None) –Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.
-
p(float, default:2) –Which Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the usual Euclidean distance. Infinity is the maximum-coordinate-difference distance. Currently, only
p=2is supported. -
k(int, default:1) –Number of closest particles to return. Default
1 -
use_data_indices(bool, default:True) –Flag to return indices into the sorted dataset (
True, default) or into the shuffle list (False) -
return_sorted(bool, default:True) –Flag to return the distances and indices in distance-sorted order. Set to
Falsefor a performance boost. DefaultTrue
Returns:
-
distances(NDArray[float]) –Distances to the kth nearest neighbors. Has shape
(min(N,k),), whereNis the number of particles in the sphere bounded by distance_upper_bound -
indices(NDArray[int]) –Indices in data of the kth nearest neighbors. Has same shape as distances
Raises:
-
NotImplementedError–If a
pvalue of then 2 is provided
get_node
¶
Return the node corresponding to the provided tag or None if not found.
Parameters:
-
tag(str) –The tag to search for
Returns:
-
node–Node in octree with specified
tagorNoneif it does not exist
get_particle_index_list_in_box
¶
Return all particles contained within the box.
Parameters:
-
data(DataContainer | Dataset) –Dataset containing the particle positions. Pass a
DataContainerobject for a slight performance increase -
box(BoxLike) –Box to check
-
strict(bool, default:False) –Flag to specify whether only particles inside
boxwill be returned. IfFalse(default), additional nearby particles may be included for signficantly increased performance
Returns:
-
indices(NDArray[int]]) –List of original particle indices contained within sphere
get_particle_index_list_in_sphere
¶
Return all particles contained within sphere defined by center and radius.
Parameters:
-
data(DataContainer | Dataset) –Dataset containing the particle positions. Pass a
DataContainerobject for a slight performance increase -
center(NDArray) –Center point of the sphere
-
radius(float) –Radius of the sphere
-
strict(bool, default:False) –Flag to specify whether only particles inside the sphere will be returned. If
False(default), additional nearby particles may be included for signficantly increased performance
Returns:
-
indices(NDArray[int]) –List of original particle indices contained within sphere
get_particle_indices_in_box
¶
Return all particles contained within the box.
Parameters:
-
box(BoxLike) –Box to check
Returns:
-
indices(list[tuple[int, int]]) –List of particle start-stop indices contained within sphere Third element of each tuple is a flag for whether only some particles (1) among the start-stop indices are contained or all (0)
get_particle_indices_in_sphere
¶
Return all particles contained within sphere defined by center and radius.
Parameters:
-
center(NDArray) –Center point of the sphere
-
radius(float) –Radius of the sphere
Returns:
-
indices(list[tuple[int, int, int]]) –List of particle start-stop indices contained within sphere Third element of each tuple is a flag for whether only some particles (1) among the start-stop indices are contained or all (0)
PackedTreeMeta
¶
Bases: NamedTuple
The metadata for a PackedTree
checksum_method
class-attribute
instance-attribute
¶
Method for computing the checksum
packed_spec_version
class-attribute
instance-attribute
¶
Current version of the packed metadata standard in semver format
particle_threshold
class-attribute
instance-attribute
¶
Particle threshold used when creating the tree
create_metadata
¶
Create a tree metadata object from the provided info
Parameters:
-
box(BoundingBox) –The bounding box used to create the tree
-
packed(NDArray[uint32]) –The packed tree data
-
particle_threshold(int, default:_DEFAULT_PARTICLE_THRESHOLD) –The particle threshold to split leaves. Defaults to
optree._DEFAULT_PARTICLE_THRESHOLD
Returns:
-
TreeMeta–A TreeMeta object containing the tree's metadata
extract_metadata
¶
Extract the metadata and packed tree information from a buffer
Parameters:
-
source(Buffer) –A Buffer containing the packed data
Returns:
-
metadata(TreeMeta) –The tree's metadata
-
packed_tree(NDArray[uint32]) –The actual packed tree data as a numpy array
Raises:
-
ValueError–If the metadata does not match an expected format
-
NotImplementedError–If an unimplemented data format is specified (currently only uint32)
-
OctreeError–If the metadata checksum does not match the computed checksum