Reference

The packingcubes package

packingcubes aims to provide a fast, minimal-memory-usage octree implementation, specialized for use in astronomical/astrophysical contexts. It is written in pure Python, with Numba-based acceleration of the critical code paths.

packingcubes can be used from the command line, via the packcubes command, but will generally be used programmatically.

The following classes and methods represent the primary public interface of packingcubes, though we expect most users will rely on only a small portion.

Classes:

ParticleCubes –

Workhorse class. Performs parallel sorting of datasets and PackedTree creation. Expected to be the primary interface for users.
GadgetishHDF5Dataset –

Use to load HDF5 Datasets that look like Gadget-2 snapshots
InMemory –

Use to convert an in-memory array into a dataset
PackedTree –

Actual Octree implementation.
OpTree –

Use as a drop-in replacement for SciPy's KDTree. Not all functionality is implemented, but query and query_ball_point are. Tree creation and query_ball_point should be significantly faster than SciPy's for queries returning large balls (> 10_000 particles), and tree size should be substantially smaller.

Functions:

Cubes –

Intended ParticleCubes creation method. Accepts snapshot file paths, Dataset objects, and position arrays, and returns a ParticleCubes object.

GadgetishHDF5Dataset

GadgetishHDF5Dataset(*, name=None, filepath, sorted_filepath=None, particle_type=None, data_slices=None, **kwargs)

Bases: HDF5Dataset

HDF5 dataset with Gadget-2 like header

Represents an HDF5 dataset that has at least the fields from the Gadget-2 header specification.

Parameters:

filepath (str | Path) –

The path to the file
name (str | None, default: None ) –

A name for this dataset. Defaults to filepath
sorted_filepath (str | Path | None, default: None ) –

Optional file to store sorted position and shuffle-list data. Will also search for positions data from this file before searching filepath. Defaults to filepath.parent/filepath.stem + "_sorted.hdf5"
particle_type (str | None, default: None ) –

Initial particle type to (eagerly) load. Defaults to the first HDF5 group that starts with "Part".
data_slices –

A numpy slice object or dictionary of slice objects per particle type. This can be used to load only a portion of the dataset. Effectively, the dataset will be loaded as data = data[data_slice[0]:data_slice[1]:data_slice[2]]
**kwargs –

Additional arguments are discarded
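The default sorted_filepath rule and the data_slices semantics above can be sketched with the standard library. The helper names below are hypothetical (not part of packingcubes), and loading particle types that are missing from a data_slices dictionary in full is an assumption.

```python
from pathlib import Path


def default_sorted_filepath(filepath):
    """Hypothetical helper: filepath.parent / (filepath.stem + "_sorted.hdf5")."""
    filepath = Path(filepath)
    return filepath.parent / (filepath.stem + "_sorted.hdf5")


def apply_data_slice(data, data_slices, particle_type):
    """Hypothetical helper: select the part of `data` that would be loaded.

    data_slices may be None (load everything), one slice shared by all
    particle types, or a dict mapping particle types to slices.
    """
    if data_slices is None:
        return data
    if isinstance(data_slices, dict):
        # Assumption: particle types without an entry are loaded in full
        data_slice = data_slices.get(particle_type)
        if data_slice is None:
            return data
    else:
        data_slice = data_slices
    # Same as data[data_slice.start:data_slice.stop:data_slice.step]
    return data[data_slice]


print(default_sorted_filepath("snapshots/snap_000.hdf5"))
print(apply_data_slice(list(range(10)), {"PartType1": slice(0, 10, 2)}, "PartType1"))  # [0, 2, 4, 6, 8]
```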

bounding_box `property`

bounding_box

Return a copy of the bounding box for this dataset

data_container `property`

data_container

Return the DataContainer wrapping this dataset

data_slices `property` `writable`

data_slices

Slices of data to load. A value of None means load all data

extras `property`

extras

Additional sorted fields

filepath `instance-attribute`

filepath = filepath

The path to this dataset (can be empty)

index `property`

index

Return the shuffle list, creating it if necessary

name `instance-attribute`

name = name

A name for this dataset (can be empty)

particle_numbers `property`

particle_numbers

Mapping of particle types to particle counts in this dataset

particle_type `property` `writable`

particle_type

Current particle type

particle_types `property`

particle_types

List of particle types in this dataset

positions `property`

positions

Return the particle position data

sorted_filepath `property` `writable`

sorted_filepath

Path to the sorted data

__len__

__len__()

Return the number of particles in the dataset

__repr__

__repr__()

Return a string representation of this dataset

make_into_array

make_into_array(field)

Try to convert field into an array

Parameters:

field (str | NDArray | Any) –
Object to be converted into an array.

In addition to the types supported by MultiParticleDataset, accepts:
- strings naming fields in either the HDF5 file at self.filepath (unsorted) or self.sorted_filepath (sorted)

Returns:

field_arr ( NDArray ) –

An array of the values whose first dimension has the same length as positions
is_sorted ( bool ) –

Whether field_arr is already sorted

Raises:

NotImplementedError –

If we do not know how to transform type(field) into an array

process_extra_fields

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

extra (Mapping[str, Any]) –

A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass":"Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only at the time of this call; subsequent sorting will not affect their ordering.

reorder

reorder(new_order)

Impose a new order on the position data and shuffle list
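The bookkeeping behind reorder can be illustrated with NumPy. This sketch assumes that the shuffle list satisfies positions == original[index] and that reorder applies new_order to both arrays; it is not the packingcubes implementation. The last lines also show why the process_extra_fields note matters.

```python
import numpy as np


def reorder(positions, index, new_order):
    """Apply one permutation to both the positions and the shuffle list."""
    return positions[new_order], index[new_order]


rng = np.random.default_rng(0)
original = rng.random((6, 3))       # positions as read from disk
positions = original.copy()
index = np.arange(len(positions))   # shuffle list: positions == original[index]

# Successive reorders compose, and the invariant keeps holding
positions, index = reorder(positions, index, np.array([5, 4, 3, 2, 1, 0]))
positions, index = reorder(positions, index, np.array([1, 0, 2, 3, 5, 4]))
assert np.array_equal(positions, original[index])

# A field sorted once (as process_extra_fields is assumed to do) goes stale
masses = np.arange(6.0)             # per-particle field in on-disk order
mass_sorted = masses[index]
positions, index = reorder(positions, index, np.array([2, 0, 1, 3, 4, 5]))
assert not np.array_equal(mass_sorted, masses[index])
```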

save

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle positions and shuffle list to the provided file

Parameters:

output_file (str | Path | None, default: None ) –

File to save information to. Default is self.sorted_filepath
force_overwrite (bool | None, default: None ) –

Force overwriting position and index data if the output file already contains it under the specified particle type
particle_type (str | None, default: None ) –

Save positions under a different particle type than self.particle_type
fields (Collection[str] | None, default: None ) –

Collection of fields in self.extras to save in addition to self.positions and self.index
skip_positions (bool, default: False ) –

Do not save self.positions if True. Default False.
skip_index (bool, default: False ) –

Do not save self.index if True. Default False.

InMemory

InMemory(*, positions, name='', filepath='', particle_type=None, bounding_box=None, **kwargs)

Bases: MultiParticleDataset

In-memory Dataset

Class for datasets where the positions data is entirely in-memory. These datasets generally are not expected to have a name or filepath and may consist solely of positions data.

Parameters:

positions (NDArray) –

Array containing particle position data.
particle_type (str | None, default: None ) –

Particle type these positions belong to. Default is "PartTypeIM"
filepath (str, default: '' ) –

Specify a default save location if non-empty. Default is "".
**kwargs –

Additional arguments are discarded

bounding_box `property`

bounding_box

Return a copy of the bounding box for this dataset

data_container `property`

data_container

Return the DataContainer wrapping this dataset

extras `property`

extras

Additional sorted fields

filepath `instance-attribute`

filepath = filepath

The path to this dataset (can be empty)

index `property`

index

Return the shuffle list, creating it if necessary

name `instance-attribute`

name = name

A name for this dataset (can be empty)

particle_numbers `property`

particle_numbers

Number of particles of each type

particle_type `property` `writable`

particle_type

Currently selected particle type

particle_types `property`

particle_types

List of particle types in this dataset

positions `property`

positions

Return the particle position data

__len__

__len__()

Return the number of particles in the dataset

__repr__

__repr__()

Return a string representation of this dataset

make_into_array

make_into_array(field)

Try to convert field into an array

Parameters:

field (NDArray | Any) –
Object to be converted into an array.

Supported types:
- NDArrays with the same length (first dimension) as positions. Always assumed unsorted.
- (NDArray, is_sorted) tuples, where the NDArray must satisfy the same length requirement.

Returns:

field_arr ( NDArray ) –

An array of the values whose first dimension has the same length as positions
is_sorted ( bool ) –

Whether field_arr is already sorted

Raises:

NotImplementedError –

If we do not know how to transform type(field) into an array
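The supported-type rules listed above amount to a small dispatch. The following sketch is a hypothetical re-creation for illustration only (the explicit n_particles argument and the ValueError on a length mismatch are assumptions, not documented packingcubes behavior).

```python
import numpy as np


def make_into_array(field, n_particles):
    """Hypothetical re-creation of the documented InMemory conversion rules.

    Returns (field_arr, is_sorted).
    """
    if isinstance(field, np.ndarray):
        # A bare array is always assumed to be unsorted
        arr, is_sorted = field, False
    elif (
        isinstance(field, tuple)
        and len(field) == 2
        and isinstance(field[0], np.ndarray)
        and isinstance(field[1], bool)
    ):
        arr, is_sorted = field
    else:
        raise NotImplementedError(f"Cannot convert {type(field).__name__} into an array")
    if len(arr) != n_particles:
        # Assumption: the real method may report this differently
        raise ValueError("first dimension must match the number of positions")
    return arr, is_sorted
```

Note that a string field name such as "Mass" falls through to NotImplementedError here, which matches the InMemory list of supported types (strings are only added by GadgetishHDF5Dataset).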

process_extra_fields

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

extra (Mapping[str, Any]) –

A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass": mass_array})  # mass_array: NDArray with len(dataset) rows
>>> dataset.mass

Note

Attributes added via this method are sorted only at the time of this call; subsequent sorting will not affect their ordering.

reorder

reorder(new_order)

Impose a new order on the position data and shuffle list

save

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle data and shuffle-list to disk in an HDF5 file

Parameters:

output_file (str | Path | None, default: None ) –

The name of the output file. Defaults to self.filepath, which is "" unless specified at construction, in which case a ValueError is raised.
force_overwrite (bool | None, default: None ) –

Force overwriting position and index data if the output file already contains it under the specified particle type
particle_type (str | None, default: None ) –

Save positions under a different particle type than self.particle_type
fields (Collection[str] | None, default: None ) –

Collection of fields in self.extras to save in addition to self.positions and self.index
skip_positions (bool, default: False ) –

Do not save self.positions if True. Default False.
skip_index (bool, default: False ) –

Do not save self.index if True. Default False.

Raises:

ValueError –

If no output_file or the empty string ("") is specified

OpTree

OpTree(data, leafsize=None, compact_nodes=None, copy_data=False, balanced_tree=None, boxsize=None, *, cubes_per_side=-1, save_dataset=False)

Class to mimic the SciPy KDTree API using ParticleCubes and PackedTrees

Provides an API identical to SciPy's KDTree to the extent possible, given that ParticleCubes and PackedTrees are fundamentally different. Where a one-to-one match for a requested method, argument, or functionality is not possible, an OpTreeError is raised if there is no similar alternative; otherwise an OpTreeWarning is emitted explaining the substitution.

Warning! PackedTrees are not robust against large amounts of degenerate input data! Please sanitize data prior to usage if you expect degeneracy levels above ~100 (i.e. ~100 data points with identical values). Multiple degenerate regions are acceptable, provided they are sufficiently separated.
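One quick way to check the degeneracy level before building a tree is to count repeated rows with np.unique. Collapsing exact duplicates, as done here, is only one possible sanitization (it discards multiplicity, which may matter for your analysis); the ~100 threshold is taken from the warning above.

```python
import numpy as np


def max_degeneracy(positions):
    """Largest number of points that share exactly the same coordinates."""
    _, counts = np.unique(positions, axis=0, return_counts=True)
    return int(counts.max())


# 150 coincident points plus one distinct point
positions = np.vstack([np.zeros((150, 3)), [[1.0, 2.0, 3.0]]])
if max_degeneracy(positions) > 100:
    # One possible sanitization: collapse exact duplicates before building a tree
    positions = np.unique(positions, axis=0)
print(positions.shape)  # (2, 3)
```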

Parameters:

data (array_like, shape (n, m) | MultiParticleDataset) –

The n m-dimensional data points to be indexed. Where possible, this array is not copied: it is sorted in place, so modifying it afterwards will produce bogus results. The data are copied if the OpTree is built with copy_data=True. OpTrees are intended for 3-dimensional data, so m > 3 is not supported. For m < 3, the data are padded with zeros (e.g. [[1, 2], [3, 4]] becomes [[1, 2, 0], [3, 4, 0]]), which also requires a copy. A Dataset can also be passed directly for faster tree creation.
leafsize (positive int, default: None ) –

The number of points at which the algorithm switches over to brute-force. Default: 400.
compact_nodes (bool, default: None ) –

This parameter is irrelevant for OpTrees and is only provided to match the KDTree API.
copy_data (bool, default: False ) –

If True, the data are copied to protect the tree against data corruption and to prevent the original data from being sorted. Default: False.
balanced_tree (bool, default: None ) –

OpTrees are always split at the bounding box midpoint, so this option is only provided to match the KDTree API
boxsize (array_like or scalar, default: None ) –

Provide an explicit bounding box for the data in the form [x_min, y_min, z_min, dx, dy, dz]. If len(boxsize)==3, x_min = y_min = z_min = 0. If boxsize is a scalar, dx = dy = dz = boxsize. Other boxsize lengths are unsupported. SciPy's KDTree additionally imposes a toroidal topology; this functionality is currently unsupported.
cubes_per_side (int, default: -1 ) –

Size of the top-level grid. Must be between 3 and 32 or -1 (default). The default uses the number of available threads to ensure there are more grid cells than threads.
save_dataset (bool, default: False ) –

If data is a dataset, save sorted positions/indices to file. Default False
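The zero-padding and boxsize conventions described above can be sketched as follows. Both helpers are hypothetical illustrations, not packingcubes functions, and treating a scalar boxsize as a cube anchored at the origin is an assumption (the text only states dx = dy = dz = boxsize).

```python
import numpy as np


def pad_to_3d(data):
    """Zero-pad (n, m) data with m < 3 to shape (n, 3); padding makes a copy."""
    data = np.asarray(data, dtype=float)
    n, m = data.shape
    if m > 3:
        raise ValueError("only data with m <= 3 dimensions is supported")
    if m == 3:
        return data
    padded = np.zeros((n, 3))
    padded[:, :m] = data
    return padded


def normalize_boxsize(boxsize):
    """Expand boxsize into the [x_min, y_min, z_min, dx, dy, dz] form."""
    box = np.atleast_1d(np.asarray(boxsize, dtype=float))
    if box.size == 1:
        # Assumption: a scalar box is anchored at the origin
        return np.concatenate([np.zeros(3), np.repeat(box, 3)])
    if box.size == 3:
        # Only the side lengths are given: x_min = y_min = z_min = 0
        return np.concatenate([np.zeros(3), box])
    if box.size == 6:
        return box
    raise ValueError("boxsize must be a scalar or have length 3 or 6")


box = normalize_boxsize([10.0, 20.0, 30.0])
mins, maxs = box[:3], box[:3] + box[3:]   # as in the mins/maxs attributes
print(pad_to_3d([[1, 2], [3, 4]]))
```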

data `instance-attribute`

data = positions

The n data points of dimension m to be indexed. The data are copied only if the OpTree is built with copy_data=True or must be padded to 3 dimensions.

leafsize `instance-attribute`

leafsize = _DEFAULT_PARTICLE_THRESHOLD if leafsize is None else leafsize

The number of points at which the algorithm switches over to brute-force

maxs `instance-attribute`

maxs = box[:3] + box[3:]

The maximum value in each dimension of the n data points

mins `instance-attribute`

mins = box[:3]

The minimum value in each dimension of the n data points

n `instance-attribute`

n = len(data)

The number of data points.

size `instance-attribute`

size = sum((int(len(tree) / 5)) for ct in (cube_trees))

The number of nodes in the tree.

sort_index `property`

sort_index

Shuffle list for the original data, i.e. self.data = data[self.sort_index]
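The relationship self.data = data[self.sort_index] is what lets indices be translated between the sorted and original orderings (compare return_data_indices on query). This sketch uses np.argsort on the x coordinate purely as a stand-in permutation; the real tree uses its own spatial ordering.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((8, 3))               # the original input
sort_index = np.argsort(data[:, 0])     # stand-in for OpTree.sort_index
sorted_data = data[sort_index]          # plays the role of self.data

# Index into the sorted data -> row of the original input
i_sorted = 3
i_original = sort_index[i_sorted]
assert np.array_equal(sorted_data[i_sorted], data[i_original])

# Inverse permutation: row of the original input -> index into the sorted data
inverse = np.empty_like(sort_index)
inverse[sort_index] = np.arange(len(sort_index))
assert np.array_equal(sorted_data[inverse], data)
```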

count_neighbors

count_neighbors(*, other, r, p=None, weights=None, cumulative=None)

Count how many nearby pairs can be formed.

Count the number of pairs (x1, x2) that can be formed, with x1 drawn from self and x2 drawn from other, where distance(x1, x2, p) <= r.

Data points on self and other are optionally weighted by the weights argument. (See below)

WARNING

Not currently implemented.

Parameters:

other (OpTree) –

The other tree to draw points from, can be the same tree as self
r (float | NDArray) –

The radius to produce a count for. Multiple radii are searched with a single tree traversal. If the count is non-cumulative (cumulative=False), r defines the edges of the bins and must be non-decreasing.
p (float | None, default: None ) –

Which Minkowski p-norm to use. Default 2.0. A finite large p may cause a ValueError if overflow can occur
weights (tuple[float | None, float | None] | NDArray | None, default: None ) –

If None, the pair-counting is unweighted. If given as a tuple, weights[0] is the weights of points in self, and weights[1] is the weights of points in other; either can be None to indicate the points are unweighted. If given as an array_like, weights is the weights of points in self and other. For this to make sense, self and other must be the same tree. If self and other are two different trees, a ValueError is raised. Default: None
cumulative (bool | None, default: None ) –

Whether the returned counts are cumulative. When cumulative is set to False the algorithm is optimized to work with a large number of bins (>10) specified by r. When cumulative is set to True, the algorithm is optimized to work with a small number of r. Default: True

Returns:

result ( scalar or 1-D array ) –

The number of pairs. For unweighted counts, the result is integer. For weighted counts, the result is float. If cumulative is False, result[i] contains the counts with (-inf if i == 0 else r[i-1]) < R <= r[i]

Raises:

NotImplementedError –

Always, since this method is not currently implemented
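Since count_neighbors is not implemented yet, a brute-force reference can stand in for small inputs. This sketch is not part of packingcubes: it covers only the unweighted Euclidean (p=2) case, scales as O(N*M), and follows the cumulative and non-cumulative definitions given above.

```python
import math
from bisect import bisect_left


def count_neighbors_bruteforce(points_a, points_b, r, cumulative=True):
    """O(N*M) reference for the count_neighbors semantics (p=2, unweighted)."""
    scalar = isinstance(r, (int, float))
    radii = [r] if scalar else list(r)
    dists = [math.dist(a, b) for a in points_a for b in points_b]
    if cumulative:
        # counts[i]: pairs with distance <= r[i]
        counts = [sum(d <= ri for d in dists) for ri in radii]
    else:
        # counts[i]: pairs with (-inf if i == 0 else r[i-1]) < d <= r[i]
        counts = [0] * len(radii)
        for d in dists:
            k = bisect_left(radii, d)  # first i with radii[i] >= d
            if k < len(radii):
                counts[k] += 1
    return counts[0] if scalar else counts
```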

query

query(x, k=1, eps=None, p=None, distance_upper_bound=None, workers=None, *, return_data_indices=None, return_sorted=None)

Query the OpTree for nearest neighbors

Parameters:

x (ArrayLike) –

An array of points to query
k (int | Sequence[int], default: 1 ) –

Either the number of nearest neighbors to return or a list of the kth nearest neighbors to return, starting from 1. E.g., [2,3] will return the 2nd and 3rd nearest neighbors
eps (float | None, default: None ) –

Return approximate nearest neighbors. Note that this parameter is unused
p (int | None, default: None ) –

The Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the Euclidean distance. infinity is the maximum-coordinate-difference distance. Currently only p=2 is supported
distance_upper_bound (float | None, default: None ) –

Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.
workers (int | None, default: None ) –

Number of workers to use for parallel processing. Only 1 is supported; for parallel processing, see Cubes
return_data_indices (bool | None, default: None ) –

Return indices into the sorted data if True instead of into the original. Specify None to have this set by the copy_data argument used during tree construction.
return_sorted (bool | None, default: None ) –

Flag to return the distances and indices in distance-sorted order. Set to False for a performance boost. Default True

Returns:

d ( float or array of floats ) –

The distances to the nearest neighbors. If x has shape tuple+(self.m,), then d has shape tuple+(k,). When k==1, the last dimension of the output is squeezed. Missing neighbors are indicated with infinite distances. Hits are sorted by distance (nearest first)
i ( integer or array of integers ) –

The index of each neighbor in self.data. i is the same shape as d. Missing neighbors are indicated with self.n.

Raises:

NotImplementedError –

if p!=2
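For validating results on small inputs, the return conventions of query (nearest-first ordering, infinite distance and index self.n for missing neighbors) can be reproduced by brute force. This is an illustrative reference, not how OpTree searches; it handles a single query point and an integer k, and does not squeeze the k == 1 case.

```python
import math


def knn_bruteforce(data, x, k=1, distance_upper_bound=math.inf):
    """Reference k-nearest-neighbor search for a single point (p=2).

    Hits are sorted nearest first; missing neighbors are reported as
    distance inf and index len(data), following the conventions above.
    """
    candidates = ((math.dist(point, x), i) for i, point in enumerate(data))
    # Points farther than distance_upper_bound are treated as missing
    hits = sorted(c for c in candidates if c[0] <= distance_upper_bound)[:k]
    hits += [(math.inf, len(data))] * (k - len(hits))
    distances = [d for d, _ in hits]
    indices = [i for _, i in hits]
    return distances, indices
```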

query_ball_point

query_ball_point(x, r, p=2.0, eps=None, workers=-1, *, return_sorted=False, return_length=False, return_lists=False, return_data_indices=None, strict=None)

Find all points within distance r of point(s) x.

Parameters:

x (array_like, shape tuple + (self.m,)) –

The point or points to search for neighbors of.
r (array_like or float) –

The radius of points to return, must broadcast to the length of x.
p (float, default: 2.0 ) –

Which Minkowski p-norm to use. Should be in the range [1, inf]. A finite large p may cause a ValueError if overflow can occur.
eps (nonnegative float, default: None ) –

Approximate search. Branches of the tree are not explored if their nearest points are further than r / (1 + eps), and branches are added in bulk if their furthest points are nearer than r * (1 + eps).
workers (int, default: -1 ) –

Number of jobs to schedule for parallel processing. If -1 is given, all processors are used. Default: -1. Note: SciPy's KDTree parallelizes across the points queried, so a single-point query gets no speed-up from parallelization. OpTree also parallelizes single-point queries, hence the different default.
return_sorted (bool, default: False ) –

Sorts returned indices if True and does not sort them if False. If None, does not sort single point queries, but does sort multi-point queries which was the behavior before this option was added. Default False.
return_length (bool, default: False ) –

Return the number of points inside the radius instead of a list of the indices. Note that this is much faster for large trees.
return_lists (bool, default: False ) –

Force returning lists instead of arrays. OpTrees return arrays of indices by default, but this doesn't match the expected query_ball_point signature. To exactly match SciPy, set this to True.
return_data_indices (bool | None, default: None ) –

Return indices into the sorted data if True instead of into the original. Specify None to have this set by the copy_data argument used during tree construction.
strict (bool | None, default: None ) –

If False, compare only the approximate node distance. Should be significantly faster, but may include some false positives. Default True

Returns:

results ( list or array of lists ) –

If x is a single point, returns a list of the indices of the neighbors of x. If x is an array of points, returns an object array of shape tuple containing lists of neighbors.

Notes

If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in an OpTree and using query_ball_tree.

Examples:

>>> import numpy as np
>>> from packingcubes import OpTree
>>> x, y = np.mgrid[0:5, 0:5]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = OpTree(points)
>>> sorted(tree.query_ball_point([2, 0], 1))
[5, 10, 11, 15]

Query multiple points and plot the results:

>>> import matplotlib.pyplot as plt
>>> points = np.asarray(points)
>>> plt.plot(points[:,0], points[:,1], '.')
>>> for results in tree.query_ball_point(([2, 0], [3, 3]), 1):
...     nearby_points = points[results]
...     plt.plot(nearby_points[:,0], nearby_points[:,1], 'o')
>>> plt.margins(0.1, 0.1)
>>> plt.show()

query_ball_tree

query_ball_tree(other, r, p=2, eps=None, *, strict=None, return_lists=None, return_sorted=None)

Find all pairs of points between self and other whose distance is at most r.

Parameters:

other (OpTree) –

The tree containing points to search against
r –

The maximum distance, has to be positive
p (float, default: 2 ) –

Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity
eps (float | None, default: None ) –

Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.
strict (bool | None, default: None ) –

If False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. Default True
return_lists (bool | None, default: None ) –

Force returning lists instead of arrays. OpTrees return arrays of indices by default, but this doesn't match the expected query_ball_tree signature. For a slight performance increase, set this to False
return_sorted (bool | None, default: None ) –

Force returning sorted lists. If the copy_data flag was passed during tree construction, the data used to generate the results may be in a different order than originally imposed, so result lists may be unordered; for example, results[0] may be [5, 1, 2]. If output order is important, set this flag to True, at a performance penalty.

Returns:

results ( list of lists ) –

For each element self.data[i] of this tree, results[i] is a list of the indices of its neighbors in other.data

Raises:

NotImplementedError –

if p!=2

query_pairs

query_pairs(r, p=2.0, *, eps=None, output_type=None, strict=None)

Find all pairs of points in self whose distance is at most r.

Parameters:

r (float) –

The maximum distance, has to be positive
p (float, default: 2.0 ) –

Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity
eps (float | None, default: None ) –

Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.
output_type (str | None, default: None ) –

Choose the output container, 'set' or 'ndarray'. Default: 'set'
strict (bool | None, default: None ) –

If False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. Default True

Returns:

results ( set or NDArray ) –

Set of pairs (i, j) with i<j, for which the corresponding positions are close. If output_type is 'ndarray', an ndarray is returned instead of a set.

Raises:

NotImplementedError –

if p!=2
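On small inputs, the set returned by query_pairs with output_type='set' can be cross-checked against this brute-force sketch. It is a reference for the documented semantics only (p=2, O(n^2)), not the OpTree algorithm.

```python
import math
from itertools import combinations


def query_pairs_bruteforce(points, r):
    """All index pairs (i, j) with i < j whose Euclidean distance is <= r."""
    # combinations() yields (i, j) with i < j, matching the documented output
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(points), 2)
        if math.dist(a, b) <= r
    }
```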

sparse_distance_matrix

sparse_distance_matrix(*, other, max_distance, p=None, output_type=None)

Compute a sparse distance matrix

Computes a distance matrix between two OpTrees, leaving as zero any distance greater than max_distance.

WARNING

Not currently implemented.

Parameters:

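other (OpTree) –

The other tree to compute distances against
max_distance (float) –

Distances greater than this are left as zero (omitted) in the output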
p (float | None, default: None ) –

Which Minkowski p-norm to use. A finite large p may cause a ValueError if overflow can occur
output_type (str | None, default: None ) –

Which container to use for output data. Default: "dok_matrix"

Returns:

result ( dok_matrix, coo_matrix, dict, or ndarray ) –

Sparse matrix representing the results in a "dictionary of keys" format. If a dict is returned the keys are (i,j) tuples of indices. If output_type is "ndarray" a record array with fields "i", "j", and "v" is returned.

Raises:

NotImplementedError –

Always, since this method is not currently implemented
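Until sparse_distance_matrix is implemented, the output_type="dict" result can be approximated for small inputs by brute force. This sketch follows the "(i, j) tuple keys" convention described above and uses p=2 only; converting it into a dok_matrix or a record array is left out.

```python
import math


def sparse_distance_dict(points_a, points_b, max_distance):
    """Dictionary-of-keys form {(i, j): distance} for pairs within max_distance (p=2)."""
    result = {}
    for i, a in enumerate(points_a):
        for j, b in enumerate(points_b):
            d = math.dist(a, b)
            # Pairs farther apart than max_distance are simply not stored
            if d <= max_distance:
                result[(i, j)] = d
    return result
```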

PackedTree

PackedTree(*, dataset=None, source=None, particle_threshold=None, bounding_box=None, copy_data=False)

Bases: Octree

Public packed octree interface

This interface defines the methods for creating, manipulating, and traversing a packingcubes packed octree.

Attributes:

data (Dataset) –

The backing dataset
particle_threshold (int) –

The maximum leaf size before splitting, used in tree construction

Note

Either dataset or source must be provided. If the provided source does not include metadata, either dataset or bounding_box must also be provided.

Parameters:

dataset (NDArray | Dataset | None, default: None ) –

An (N,3) array or Dataset containing particle data
source (Buffer | None, default: None ) –

Pre-computed packed buffer containing this tree. Leave out to compute the tree from scratch.
particle_threshold (int | None, default: None ) –

Number of particles allowed in a leaf before splitting. Defaults to octree._DEFAULT_PARTICLE_THRESHOLD
bounding_box (BoxLike | None, default: None ) –

Bounding box of the tree. Required if metadata needs to be created and dataset is not provided. If provided, overrides the dataset bounding box.
copy_data (bool, default: False ) –

If dataset is a plain array, copy it prior to construction. Defaults to False
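To build intuition for particle_threshold, the recursive midpoint splitting an octree performs can be sketched with NumPy. This only counts the leaves of a conceptual tree over a cube (box_size as a single side length is an assumption for brevity); it does not reproduce PackedTree's packed byte layout, and the max_depth guard is a hypothetical addition that shows why heavily degenerate data (see the warning under OpTree) cannot be split.

```python
import numpy as np


def count_leaves(positions, box_min, box_size, particle_threshold, depth=0, max_depth=32):
    """Count the leaves of a midpoint-split octree over a cube.

    A node holding more than particle_threshold particles is split into 8
    octants at its midpoint; empty octants are counted as (empty) leaves.
    """
    if len(positions) <= particle_threshold:
        return 1
    if depth == max_depth:
        # Coincident points can never be separated by splitting
        raise RecursionError("too many coincident points to split further")
    half = box_size / 2
    mid = box_min + half
    # Octant code: bit 0 for x, bit 1 for y, bit 2 for z (upper half sets the bit)
    codes = ((positions >= mid) * np.array([1, 2, 4])).sum(axis=1)
    leaves = 0
    for code in range(8):
        offset = np.array([(code >> bit) & 1 for bit in range(3)]) * half
        leaves += count_leaves(
            positions[codes == code], box_min + offset, half,
            particle_threshold, depth + 1, max_depth,
        )
    return leaves
```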

metadata `instance-attribute`

metadata = metadata

The metadata for this packed tree

packed_form `property`

packed_form

Return this tree in full packed form

packed_meta `property`

packed_meta

Return a memoryview of the tree's packed metadata

packed_tree `property`

packed_tree

Return a memoryview of the tree's backing byte array

particle_threshold `instance-attribute`

particle_threshold = particle_threshold

The maximum leaf size before splitting, used in tree construction

__iter__

__iter__()

Iterate through all nodes of the octree.

Note that no guarantee is made about the order in which the nodes are traversed.

__len__

__len__()

Return the number of particles in the tree

count_neighbors

count_neighbors(*, other, r)

Count how many nearby pairs can be formed.

Parameters:

other (PackedTree) –

The other tree to compare against
r (float) –

The radius to produce a count for.

Returns:

result ( scalar or 1-d array ) –

The number of pairs

get_closest_particle

get_closest_particle(xyz, *, check_neighbors=True)

Get nearest particle index (and distance) to point.

Parameters:

xyz (ArrayLike) –

Coordinates of point to check
check_neighbors (bool, default: True ) –

Whether to also check neighbors of the smallest containing node. Default True

Returns:

closest_ind ( int ) –

Absolute index of closest particle
closest_dist ( float ) –

Distance to closest particle

Raises:

NotImplementedError –

This function is not implemented, see get_closest_particles instead

get_closest_particles

get_closest_particles(*, data, xyz, distance_upper_bound=None, p=2, k=1, use_data_indices=True, return_sorted=True)

Get the distances and indices of the k nearest particles to a point.

Parameters:

data (DataContainer | Dataset) –

Source of particle position data
xyz (NDArray) –

Coordinates of point to check
distance_upper_bound (float | None, default: None ) –

Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.
p (float, default: 2 ) –

Which Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the usual Euclidean distance. Infinity is the maximum-coordinate-difference distance. Currently, only p=2 is supported.
k (int, default: 1 ) –

Number of closest particles to return. Default 1
use_data_indices (bool, default: True ) –

Flag to return indices into the sorted dataset (True, default) or into the shuffle list (False)
return_sorted (bool, default: True ) –

Flag to return the distances and indices in distance-sorted order. Set to False for a performance boost. Default True

Returns:

distances ( NDArray[float] ) –

Distances to the kth nearest neighbors. Has shape (min(N,k),), where N is the number of particles in the sphere bounded by distance_upper_bound
indices ( NDArray[int] ) –

Indices in data of the kth nearest neighbors. Has same shape as distances

Raises:

NotImplementedError –

If a p value other than 2 is provided

get_leaves

get_leaves()

Return a list of all leaf octree nodes in depth-first order

get_node

get_node(tag)

Return the node corresponding to the provided tag or None if not found.

Parameters:

tag (str) –

The tag to search for

Returns:

node –

Node in octree with specified tag or None if it does not exist

get_particle_index_list_in_box

get_particle_index_list_in_box(*, data, box, strict=False)

Return all particles contained within the box.

Parameters:

data (DataContainer | Dataset) –

Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase
box (BoxLike) –

Box to check
strict (bool, default: False ) –

Flag to specify whether only particles inside the box will be returned. If False (default), additional nearby particles may be included for significantly increased performance

Returns:

indices ( NDArray[int] ) –

List of original particle indices contained within the box

get_particle_index_list_in_sphere

get_particle_index_list_in_sphere(*, data, center, radius, strict=False)

Return all particles contained within sphere defined by center and radius.

Parameters:

data (DataContainer | Dataset) –

Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase
center (NDArray) –

Center point of the sphere
radius (float) –

Radius of the sphere
strict (bool, default: False ) –

Flag to specify whether only particles inside the sphere will be returned. If False (default), additional nearby particles may be included for significantly increased performance

Returns:

indices ( NDArray[int] ) –

List of original particle indices contained within sphere

get_particle_indices_in_box

get_particle_indices_in_box(*, box)

Return all particles contained within the box.

Parameters:

box (BoxLike) –

Box to check

Returns:

indices ( list[tuple[int, int, int]] ) –

List of particle start-stop index ranges contained within the box. The third element of each tuple is a flag indicating whether only some (1) or all (0) of the particles in the range are contained

get_particle_indices_in_sphere

get_particle_indices_in_sphere(*, center, radius)

Return all particles contained within sphere defined by center and radius.

Parameters:

center (NDArray) –

Center point of the sphere
radius (float) –

Radius of the sphere

Returns:

indices ( list[tuple[int, int, int]] ) –

List of particle start-stop index ranges contained within the sphere. The third element of each tuple is a flag indicating whether only some (1) or all (0) of the particles in the range are contained
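The (start, stop, flag) tuples can be expanded into explicit indices, checking partially contained ranges particle by particle, which mirrors the strict flag of get_particle_index_list_in_sphere. In this hypothetical helper, treating stop as exclusive and indexing directly into the sorted positions are assumptions; mapping to original indices would additionally go through the shuffle list.

```python
import numpy as np


def ranges_to_indices(positions, ranges, center, radius, strict=True):
    """Expand (start, stop, partial) tuples into an array of indices.

    Ranges flagged 0 lie entirely inside the sphere and are taken whole.
    Ranges flagged 1 are only partly inside: with strict=True each particle
    is checked individually, otherwise the whole range is kept.
    """
    center = np.asarray(center, dtype=float)
    chunks = []
    for start, stop, partial in ranges:
        idx = np.arange(start, stop)  # assumption: stop is exclusive
        if partial and strict:
            dist = np.linalg.norm(positions[idx] - center, axis=1)
            idx = idx[dist <= radius]
        chunks.append(idx)
    return np.concatenate(chunks) if chunks else np.empty(0, dtype=np.intp)
```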

ParticleCubes

ParticleCubes(*, cube_indices, cube_boxes, cube_trees, dataset=None, **kwargs)

The cubes for a single particle type

cube_boxes `instance-attribute`

cube_boxes = cube_boxes

The bounding boxes for each cube

cube_indices `instance-attribute`

cube_indices = cube_indices

Array of cube indices into the dataset

cube_trees `instance-attribute`

cube_trees = []

The packed trees for each cube

dataset `property`

dataset

Return the Dataset attached to this object, or None

Box

Box(box, *, dataset=None, strict=False, fields=None, extras=None, save_filepath=None, save_particle_type=None)

Construct a box-shaped subdataset

Parameters:

box (BoxLike) –

The box to search in
dataset (Dataset | None, default: None ) –

Dataset containing the particle positions. Defaults to self.dataset.
strict (bool, default: False ) –

Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for significantly increased performance
fields (Collection[str] | None, default: None ) –

Subset of fields in dataset.extras to include. Specify "all" to include everything in dataset.extras. Defaults to the empty set.
extras (Mapping[str, Any] | None, default: None ) –

Additional fields to sort, add to dataset.extras, and include in the returned subdataset. See process_extra_fields for more details. Defaults to None
save_filepath (str | None, default: None ) –

If provided, save this subdataset to the specified file under the particle type given by save_particle_type.
save_particle_type (str | None, default: None ) –

Particle type to use when saving to save_filepath. Can be omitted to use the default particle type.

Returns:

InMemory –

Subdataset with the specified bounding volume and fields

Raises:

ValueError –

If fields are specified that are in neither extras nor dataset.extras.

Sphere ¶

Sphere(center, radius, *, dataset=None, strict=False, fields=None, extras=None, save_filepath=None, save_particle_type=None)

Construct a spherical subdataset

Parameters:

center (ArrayLike) –

Center point of the sphere
radius (float) –

Radius of the sphere
dataset (Dataset | None, default: None ) –

Dataset containing the particle positions. Defaults to self.dataset.
strict (bool, default: False ) –

Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for significantly increased performance
fields (Collection[str] | None, default: None ) –

Subset of fields in dataset.extras to include. Specify "all" to include everything in dataset.extras. Defaults to the empty set.
extras (Mapping[str, Any] | None, default: None ) –

Additional fields to sort, add to dataset.extras, and include in the returned subdataset. See process_extra_fields for more details. Defaults to None
save_filepath (str | None, default: None ) –

If provided, save this subdataset to the specified file under the particle type given by save_particle_type.
save_particle_type (str | None, default: None ) –

Particle type to use when saving to save_filepath. Can be omitted to use the default particle type.

Returns:

InMemory –

Subdataset with the specified bounding volume and fields

Raises:

ValueError –

If fields are specified that are in neither extras nor dataset.extras.
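Box and Sphere share the same rules for deciding which fields the returned subdataset carries: fields selects from dataset.extras ("all" meaning everything), extras contributes new fields that are always included, and a requested name found in neither raises ValueError. The sketch below restates those rules with plain dicts. resolve_fields is a hypothetical helper, not packingcubes code, and two details are assumptions: a name present in both mappings takes its value from extras, and a bare string other than "all" is not supported.

```python
from __future__ import annotations

from collections.abc import Collection, Mapping
from typing import Any


def resolve_fields(
    dataset_extras: Mapping[str, Any],
    fields: Collection[str] | None = None,
    extras: Mapping[str, Any] | None = None,
) -> dict[str, Any]:
    """Return the name -> data mapping a subdataset would carry.

    Hypothetical restatement of the documented rules.
    """
    extras = dict(extras or {})
    # extras are added to the dataset's extras; assume they win on clashes
    available = {**dataset_extras, **extras}
    if fields == "all":
        wanted = set(available)
    else:
        # default (None) is the empty set; extras are always included
        wanted = set(fields or ()) | set(extras)
    missing = wanted - set(available)
    if missing:
        raise ValueError(f"Unknown fields: {sorted(missing)}")
    return {name: available[name] for name in sorted(wanted)}


base = {"mass": [1.0, 2.0], "velocity": [0.1, 0.2]}
print(resolve_fields(base, fields=["mass"], extras={"id": [7, 8]}))
```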

get_closest_particles ¶

get_closest_particles(*, xyz, data=None, distance_upper_bound=None, p=None, k=None, return_shuffle_indices=None, return_sorted=None)

Get the distances and indices of the k nearest particles to a point.

Parameters:

xyz (NDArray) –

Coordinates of point to check
data (DataContainer | Dataset | None, default: None ) –

Source of particle position data. Defaults to self.dataset.
distance_upper_bound (float | None, default: None ) –

Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.
p (float | None, default: None ) –

Which Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the usual Euclidean distance. Infinity is the maximum-coordinate-difference distance. Currently, only p=2 is supported.
k (int | None, default: None ) –

Number of closest particles to return. Default 1
return_shuffle_indices (bool | None, default: None ) –

Flag to return the shuffle indices instead of the data indices. Default False.
return_sorted (bool | None, default: None ) –

Flag to return the distances and indices in distance-sorted order. Set to False for a performance boost. Default True

Returns:

distances ( NDArray[float] ) –

Distances to the k nearest neighbors. Has shape (min(N, k),), where N is the number of particles in the sphere bounded by distance_upper_bound
indices ( NDArray[int] ) –

Indices in data of the k nearest neighbors. Has the same shape as distances

Raises:

NotImplementedError –

If a p value greater than 2 is provided
ValueError –

If data is None and self.dataset is None
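For testing, or for very small inputs, the documented behaviour of get_closest_particles (Euclidean p=2 only, at most min(N, k) results, an optional distance_upper_bound, distance-sorted output) can be reproduced by brute force. The sketch below is such a reference in pure Python. brute_force_closest is hypothetical; its results should match a tree query only up to the ordering of equal distances, and it treats the upper bound as inclusive, which is an assumption.

```python
import heapq
import math
from typing import Sequence

Point = tuple[float, float, float]


def brute_force_closest(
    positions: Sequence[Point],
    xyz: Point,
    k: int = 1,
    distance_upper_bound: float = math.inf,
) -> tuple[list[float], list[int]]:
    """Exact k nearest neighbours (Euclidean, p=2) by checking every particle."""
    candidates = [
        (d, i)
        for i, p in enumerate(positions)
        if (d := math.dist(p, xyz)) <= distance_upper_bound  # inclusive cutoff
    ]
    # nsmallest returns at most k items, already sorted by (distance, index)
    nearest = heapq.nsmallest(k, candidates)
    return [d for d, _ in nearest], [i for _, i in nearest]


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 3.0, 3.0)]
# Only two particles lie within 1.5 of the origin, so k=3 yields min(N, k) = 2 results.
print(brute_force_closest(positions, (0.0, 0.0, 0.0), k=3, distance_upper_bound=1.5))
```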

get_particle_index_list_in_box ¶

get_particle_index_list_in_box(box, *, data=None, use_data_indices=True, strict=False)

Return all particle indices contained within the box

Parameters:

box (BoxLike) –

The box to search in
data (DataContainer | Dataset | None, default: None ) –

Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase. Defaults to self.dataset.
use_data_indices (bool, default: True ) –

Flag to return indices into the sorted dataset (True, default) or into the shuffle list (False)
strict (bool, default: False ) –

Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for significantly increased performance

Returns:

indices ( Array[int] ) –

Array of particle indices contained within the box

Raises:

ValueError –

If data is None and self.dataset is None

get_particle_index_list_in_sphere ¶

get_particle_index_list_in_sphere(center, radius, *, data=None, use_data_indices=True, strict=False)

Return all particle indices contained within the sphere

Parameters:

center (NDArray) –

Center point of the sphere
radius (float) –

Radius of the sphere
data (DataContainer | Dataset | None, default: None ) –

Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase. Defaults to self.dataset.
use_data_indices (bool, default: True ) –

Flag to return indices into the sorted dataset (True, default) or into the shuffle list (False)
strict (bool, default: False ) –

Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for significantly increased performance

Returns:

indices ( NDArray[int] ) –

Array of particle indices contained within the sphere

Raises:

ValueError –

If data is None and self.dataset is None
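use_data_indices chooses between indices into the sorted data and indices into the shuffle list recorded when the data was sorted. The sketch below illustrates the relationship under one assumed convention: the shuffle list stores, for each sorted slot j, the original (unsorted) index of the particle placed there. Under that assumption, converting between the two views is a gather (sorted to original) or a lookup in the inverse permutation (original to sorted). sort_with_shuffle and invert are hypothetical helpers, not packingcubes functions.

```python
from typing import Sequence


def sort_with_shuffle(keys: Sequence[float]) -> list[int]:
    """Return a shuffle list: shuffle[j] is the original index of the
    particle placed at sorted position j (assumed convention)."""
    return sorted(range(len(keys)), key=keys.__getitem__)


def invert(shuffle: Sequence[int]) -> list[int]:
    """Inverse permutation: inverse[orig] is the sorted position of orig."""
    inverse = [0] * len(shuffle)
    for sorted_pos, orig in enumerate(shuffle):
        inverse[orig] = sorted_pos
    return inverse


keys = [0.7, 0.1, 0.4]                # some per-particle sort key
shuffle = sort_with_shuffle(keys)     # sorted slot -> original index
sorted_hits = [0, 1]                  # e.g. indices into the sorted data
original_hits = [shuffle[j] for j in sorted_hits]
print(shuffle, original_hits)
```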

get_particle_indices_in_box ¶

get_particle_indices_in_box(box)

Return all particles contained within the box

Parameters:

box (BoxLike) –

Box to check

Returns:

indices ( Xx3 NDArray[np.int_] ) –

Array of index information. Each row describes a chunk/slice of data in the form [start, stop, partial], where partial is a flag: (1) if only some of the particles in the chunk are contained within the box, (0) if the chunk is entirely contained.

get_particle_indices_in_sphere ¶

get_particle_indices_in_sphere(center, radius)

Return all particles contained within the sphere defined by center and radius

Parameters:

center (NDArray) –

Center point of the sphere
radius (float) –

Radius of the sphere

Returns:

indices ( Xx3 NDArray[np.int_] ) –

Array of index information. Each row describes a chunk/slice of data in the form [start, stop, partial], where partial is a flag: (1) if only some of the particles in the chunk are contained within the sphere, (0) if the chunk is entirely contained.

save ¶

save(dataset, *, force_overwrite=False)

Save cubes information to specified file

Parameters:

dataset (str | Path | HDF5Dataset) –

Location to store cubes data.
force_overwrite (bool, default: False ) –

If True, overwrite any cubes data already present in dataset. Default False

Returns:

Path –

Path to the saved cubes information

Cubes ¶

Cubes(dataset=None, *, cubes_dict=None, particle_type=None, extras=None, **kwargs)

Create or load ParticleCubes objects from the provided data

As an alternative to a dataset, you can provide a dictionary containing cube data offsets, bounding boxes, and optionally PackedTrees as cube_indices, cube_boxes, and cube_trees. This can be useful when a dataset already has a natural top-level structure but does not yet have PackedTree subcomponents, for example a collection of disjoint blobs in a 3D parameter space, or a dataset that already contains an octree-like structure.

Parameters:

dataset (str | NDArray | MultiParticleDataset | None, default: None ) –

Dataset containing positional data. Will be used to create a new ParticleCubes, including sorting. Must provide either this or cubes_dict, below. Assumes strings are filepaths to GadgetishHDF5Datasets.
cubes_dict (dict[str, NDArray | list[BoundingBox] | list[NDArray | PackedTree]] | None, default: None ) –

Dictionary with 2-3 components:
1. cube_indices - contains the data offsets for each cube's particles (i.e. cube 0 is from cube_indices[0]:cube_indices[1])
2. cube_boxes - contains the BoundingBox for each cube
3. cube_trees (optional) - contains the PackedTree for each cube
particle_type (str | None, default: None ) –

The particle type to use. Unused if cubes_dict is provided. Defaults to dataset.particle_type
extras (Mapping[str, Any] | None, default: None ) –

Attach additional fields to the dataset to be sorted. Unused if cubes_dict is provided. See process_extra_fields for MultiParticleDataset or GadgetishHDF5Dataset
**kwargs –

Extra arguments passed on to InMemory/GadgetishHDF5Dataset, make_cubes, and ParticleCubes. See those for a description.

Returns:

ParticleCubes –

ParticleCubes object constructed from the dataset/dictionary

Raises:

CubesError –

If neither dataset nor cubes_dict is provided
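In a cubes_dict, cube_indices holds data offsets, so cube i owns the slice cube_indices[i]:cube_indices[i + 1] of the sorted data. The sketch below builds such offsets from per-cube particle counts with a running sum and recovers each cube's slice. It assumes the offsets array carries a trailing end offset (N + 1 entries for N cubes), which the slicing description implies but this page does not state outright; offsets_from_counts and cube_slices are hypothetical helpers.

```python
from itertools import accumulate
from typing import Sequence


def offsets_from_counts(counts: Sequence[int]) -> list[int]:
    """Prefix sums with a leading 0: N counts -> N + 1 offsets."""
    return [0, *accumulate(counts)]


def cube_slices(cube_indices: Sequence[int]) -> list[slice]:
    """Slice of the sorted data owned by each cube."""
    return [slice(a, b) for a, b in zip(cube_indices[:-1], cube_indices[1:])]


counts = [3, 0, 2]                           # particles per cube; cube 1 is empty
cube_indices = offsets_from_counts(counts)
data = list("abcde")                         # stand-in for sorted particle data
print(cube_indices, [data[s] for s in cube_slices(cube_indices)])
```

Because each cube is a contiguous slice, an empty cube simply has two equal offsets.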

make_cubes ¶

make_cubes(*, dataset, cubes_per_side=-1, cube_box=None, particle_threshold=None, particle_type=None, save_dataset=False, **kwargs)

Create a ParticleCubes from the provided dataset

Parameters:

dataset (MultiParticleDataset) –

The dataset containing particle data. Will be sorted in-place, but will not save updated positional information unless save_dataset is True
cubes_per_side (int, default: -1 ) –

Number of cubes on a side. The dataset will be divided into cubes_per_side**3 cubes, plus an additional cube to catch any remaining particles (if the cube_box is smaller than the actual data extents). Note: due to the PackedTree's packed format, each cube must contain fewer than ~4 billion particles. If cubes_per_side is too small to support this, a ValueError will be raised. The limit applies per particle type.
cube_box (BoxLike | None, default: None ) –

A box-like object (i.e. something that can convert to a (6,) ndarray) that delineates the region of data to be cubed. Any particles outside this region will fall into an overflow cube. Useful for zoom-in simulations or other datasets with sparse outer regions. Default is the data bounding box.
particle_threshold (int | None, default: None ) –

Maximum number of particles in a tree leaf node. Default is 400
particle_type (str | None, default: None ) –

Particle type to process. Default is dataset.particle_type
save_dataset (bool, default: False ) –

Whether to save the sorted dataset positions out to a file using default values for the parameters. The data will be sorted in memory either way. Default False.

Returns:

ParticleCubes –

The created ParticleCubes object

Raises:

ValueError –

If requested particle type isn't in the dataset or if too few cubes were requested for the number of particles
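make_cubes divides the cube_box into cubes_per_side**3 cubes plus one overflow cube, and each cube must hold fewer than roughly 4 billion particles. The sketch below illustrates that bookkeeping: computing a cube id from a particle's position, routing particles outside the box to the overflow id, and checking the per-cube limit. Everything specific here is an assumption rather than packingcubes' actual scheme: the row-major cube numbering, half-open bounds, a box given as separate lo/hi corners, and reading "~4 billion" as 2**32. cube_id and assign_cubes are hypothetical helpers.

```python
from collections import Counter
from typing import Sequence

Point = tuple[float, float, float]
MAX_PER_CUBE = 2**32  # assumed reading of the "~4 billion" limit


def cube_id(p: Point, lo: Point, hi: Point, n: int) -> int:
    """Row-major id in [0, n**3) for points inside [lo, hi); n**3 is overflow."""
    ijk = []
    for a in range(3):
        if not lo[a] <= p[a] < hi[a]:
            return n**3  # outside cube_box: goes to the overflow cube
        width = (hi[a] - lo[a]) / n
        # clamp guards against floating-point rounding up to n
        ijk.append(min(int((p[a] - lo[a]) / width), n - 1))
    i, j, k = ijk
    return (i * n + j) * n + k


def assign_cubes(positions: Sequence[Point], lo: Point, hi: Point, n: int) -> Counter:
    """Count particles per cube id, rejecting layouts that overfill a cube."""
    counts = Counter(cube_id(p, lo, hi, n) for p in positions)
    if max(counts.values(), default=0) >= MAX_PER_CUBE:
        raise ValueError("too few cubes for this many particles")
    return counts


lo, hi = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
positions = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (0.9, 0.9, 0.9), (1.5, 0.0, 0.0)]
print(assign_cubes(positions, lo, hi, n=2))
```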

Additional API¶

Additional module-level information can be found at the following links:

Cubes, PackedTrees & OpTrees, Data Objects, Tree Visualization

Performance¶

For the Numba-based modules, see:

DataContainers, Numba Cubes, Bounding Volumes, Numba Packed Trees

Binary Layout¶

The PackedTree format can be found here.

Command Line Interface¶

Instructions on the CLI can be found here.
