
Reference

The packingcubes package

packingcubes aims to provide a fast, minimal-memory-usage octree implementation, specialized for use in astronomical/astrophysical contexts. It is written in pure Python, with Numba-based acceleration of the critical code paths.

packingcubes can be used from the command line, via the packcubes command, but will generally be used programmatically.

The following classes and methods represent the primary public interface of packingcubes, though we expect most users will rely on only a small portion.

Classes:

  • ParticleCubes

    Workhorse class. Performs parallel sorting of datasets and PackedTree creation. Expected to be the primary interface for users.

  • GadgetishHDF5Dataset

    Use to load HDF5 Datasets that look like Gadget-2 snapshots

  • InMemory

    Use to convert an in-memory array into a dataset

  • PackedTree

    Actual Octree implementation.

  • OpTree

    Use as a drop-in replacement for SciPy's KDTree. Not all functionality is implemented, but query and query_ball_point are. Tree creation and the query_ball_point implementation should be significantly faster than SciPy's for larger (> 10_000) particle balls, and tree size should be substantially smaller.

Functions:

  • Cubes

    The intended way to create a ParticleCubes object. Accepts snapshot file paths, Dataset objects, and position arrays, and returns a ParticleCubes object.

GadgetishHDF5Dataset

GadgetishHDF5Dataset(*, name=None, filepath, sorted_filepath=None, particle_type=None, data_slices=None, **kwargs)

Bases: HDF5Dataset

HDF5 dataset with Gadget-2 like header

Represents an HDF5 dataset whose header contains at least the fields of the Gadget-2 header specification

Parameters:

  • filepath (str | Path) –

    The path to the file

  • name (str | None, default: None ) –

    A name for this dataset. Defaults to filepath

  • sorted_filepath (str | Path | None, default: None ) –

    Optional file to store sorted position and shuffle-list data. Will also search for positions data from this file before searching filepath. Defaults to filepath.parent/filepath.stem + "_sorted.hdf5"

  • particle_type (str | None, default: None ) –

    Initial particle type to (eagerly) load. Defaults to the first HDF5 group that starts with "Part".

  • data_slices

    A NumPy slice object, or a dictionary of slice objects keyed by particle type. This can be used to load only a portion of the dataset. Effectively, the dataset will be loaded as data = data[data_slice[0]:data_slice[1]:data_slice[2]] (i.e. start, stop, step)

  • **kwargs

    Additional arguments are discarded
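The sorted_filepath default and the data_slices semantics above can be sketched in a few lines. This is a minimal sketch, not the packingcubes implementation: default_sorted_filepath and apply_data_slice are hypothetical helpers, and each data_slices entry is assumed to be either a slice object or a (start, stop, step) tuple.

```python
from pathlib import Path
from typing import Sequence, Union

SliceLike = Union[slice, tuple]


def default_sorted_filepath(filepath: Union[str, Path]) -> Path:
    """Hypothetical helper: filepath.parent / (filepath.stem + "_sorted.hdf5")."""
    filepath = Path(filepath)
    return filepath.parent / (filepath.stem + "_sorted.hdf5")


def apply_data_slice(data: Sequence, data_slice: Union[SliceLike, None]) -> Sequence:
    """Hypothetical helper mimicking data[start:stop:step]; None loads everything."""
    if data_slice is None:
        return data
    if not isinstance(data_slice, slice):
        # Assumed (start, stop, step) tuple: the step lives at index 2
        data_slice = slice(*data_slice)
    return data[data_slice]


# data_slices may also be a per-particle-type dictionary (assumed layout)
data_slices = {"PartType0": (0, 6, 2), "PartType1": None}
ids = list(range(10))
```

Here apply_data_slice(ids, data_slices["PartType0"]) selects [0, 2, 4], exactly as slice(0, 6, 2) would.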

bounding_box property

bounding_box

Return a copy of the bounding box for this dataset

data_container property

data_container

Return the DataContainer wrapping this dataset

data_slices property writable

data_slices

Slices of data to load. A value of None means load all data

extras property

extras

Additional sorted fields

filepath instance-attribute

filepath = filepath

The path to this dataset (can be empty)

index property

index

Return the shuffle list, creating if necessary

name instance-attribute

name = name

A name for this dataset (can be empty)

particle_numbers property

particle_numbers

Map of particle types to numbers in this dataset

particle_type property writable

particle_type

Current particle type

particle_types property

particle_types

List of particle types in this dataset

positions property

positions

Return the particle position data

sorted_filepath property writable

sorted_filepath

Path to the sorted data

__len__

__len__()

Return the number of particles in the dataset

__repr__

__repr__()

Return a string representation of this dataset

make_into_array

make_into_array(field)

Try to convert field into an array

Parameters:

  • field (str | NDArray | Any) –

    Object to be converted into an array

    Adds supported types:

    • strings representing fields in either the HDF5 file at self.filepath (unsorted) or self.sorted_filepath (sorted)

    See MultiParticleDataset for additional supported types.

Returns:

  • field_arr ( NDArray ) –

    An array of the values with the 1st dimension having the same length as positions

  • is_sorted ( bool ) –

    If field_arr is already sorted

Raises:

process_extra_fields

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

  • extra (Mapping[str, Any]) –

    A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass":"Mass"})
>>> dataset.mass
Note

Any attributes added via this method are sorted only once, at the time of the call. Subsequent sorting will not affect the ordering of these attributes.
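The Note above follows from how a shuffle list works. The sketch below uses plain lists and hypothetical names (ToyDataset, attach, reorder); it is not the packingcubes implementation, only an illustration that an extra field is permuted once, when it is attached, and is left alone by later reorders.

```python
class ToyDataset:
    """Hypothetical stand-in: positions plus a shuffle list (index)."""

    def __init__(self, positions):
        self.positions = list(positions)
        # index[i] is the original position of the particle now stored at i
        self.index = list(range(len(self.positions)))
        self.extras = {}

    def reorder(self, new_order):
        # Permute positions and the shuffle list together
        self.positions = [self.positions[i] for i in new_order]
        self.index = [self.index[i] for i in new_order]

    def attach(self, name, unsorted_values):
        # Sort the extra field with the *current* shuffle list, once
        sorted_values = [unsorted_values[i] for i in self.index]
        self.extras[name] = sorted_values
        setattr(self, name, sorted_values)


ds = ToyDataset([30.0, 10.0, 20.0])
ds.reorder([1, 2, 0])         # positions -> [10.0, 20.0, 30.0]
ds.attach("mass", [3, 1, 2])  # mass -> [1, 2, 3], aligned with positions
ds.reorder([2, 1, 0])         # positions change again, but ds.mass keeps its old order
```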

reorder

reorder(new_order)

Impose a new order on the position data and shuffle list

save

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle positions and shuffle list to provided file

Parameters:

  • output_file (str | Path | None, default: None ) –

    File to save information to. Default is self.sorted_filepath

  • force_overwrite (bool | None, default: None ) –

    Force overwriting position and index data if the output file already contains it under the specified particle type

  • particle_type (str | None, default: None ) –

    Save positions under a different particle type than self.particle_type

  • fields (Collection[str] | None, default: None ) –

    Collection of fields in self.extras to save in addition to self.positions and self.index

  • skip_positions (bool, default: False ) –

    Do not save self.positions if True. Default False.

  • skip_index (bool, default: False ) –

    Do not save self.index if True. Default False.

InMemory

InMemory(*, positions, name='', filepath='', particle_type=None, bounding_box=None, **kwargs)

Bases: MultiParticleDataset

In-memory Dataset

Class for datasets where the positions data is entirely in-memory. These datasets generally are not expected to have a name or filepath and may consist solely of positions data.

Parameters:

  • positions (NDArray) –

    Array containing particle position data.

  • particle_type (str | None, default: None ) –

    Particle type these positions belong to. Default is "PartTypeIM"

  • filepath (str, default: '' ) –

    Specify a default save location if non-empty. Default is "".

  • **kwargs

    Additional arguments are discarded

bounding_box property

bounding_box

Return a copy of the bounding box for this dataset

data_container property

data_container

Return the DataContainer wrapping this dataset

extras property

extras

Additional sorted fields

filepath instance-attribute

filepath = filepath

The path to this dataset (can be empty)

index property

index

Return the shuffle list, creating if necessary

name instance-attribute

name = name

A name for this dataset (can be empty)

particle_numbers property

particle_numbers

Number of particles of each type

particle_type property writable

particle_type

Currently selected particle type

particle_types property

particle_types

List of particle types in this dataset

positions property

positions

Return the particle position data

__len__

__len__()

Return the number of particles in the dataset

__repr__

__repr__()

Return a string representation of this dataset

make_into_array

make_into_array(field)

Try to convert field into an array

Parameters:

  • field (NDArray | Any) –

    Object to be converted into an array.

    Supported types:

    • NDArrays with the same length (1st dimension) as positions. Always assumed unsorted.
    • (NDArray, is_sorted) tuples, where the NDArray must be like the above.

Returns:

  • field_arr ( NDArray ) –

    An array of the values with the 1st dimension having the same length as positions

  • is_sorted ( bool ) –

    If field_arr is already sorted
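A minimal sketch of the normalization described for make_into_array, assuming the two supported input forms listed above. normalize_field is a hypothetical helper working on plain sequences, and raising ValueError on a length mismatch is an assumption, since the Raises section does not name the exception.

```python
from typing import Any, Sequence, Tuple


def normalize_field(field: Any, n_particles: int) -> Tuple[Sequence, bool]:
    """Hypothetical helper returning (field_arr, is_sorted)."""
    if isinstance(field, tuple) and len(field) == 2 and isinstance(field[1], bool):
        # (array, is_sorted) tuple form
        values, is_sorted = field
    else:
        # A bare array is always assumed to be unsorted
        values, is_sorted = field, False
    if len(values) != n_particles:
        # Assumed error type; the real method may raise something else
        raise ValueError(f"expected {n_particles} entries, got {len(values)}")
    return values, is_sorted
```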

Raises:

process_extra_fields

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

  • extra (Mapping[str, Any]) –

    A mapping of names to extra fields to attach.

Examples:

>>> masses = np.ones(len(dataset))  # one value per particle
>>> dataset.process_extra_fields({"mass": masses})
>>> dataset.mass
Note

Any attributes added via this method are sorted only once, at the time of the call. Subsequent sorting will not affect the ordering of these attributes.

reorder

reorder(new_order)

Impose a new order on the position data and shuffle list

save

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle data and shuffle-list to disk in an HDF5 file

Parameters:

  • output_file (str | Path | None, default: None ) –

    The name of the output file. Defaults to self.filepath. Since self.filepath is "" unless specified at construction, omitting this argument in that case will raise a ValueError.

  • force_overwrite (bool | None, default: None ) –

    Force overwriting position and index data if the output file already contains it under the specified particle type

  • particle_type (str | None, default: None ) –

    Save positions under a different particle type than self.particle_type

  • fields (Collection[str] | None, default: None ) –

    Collection of fields in self.extras to save in addition to self.positions and self.index

  • skip_positions (bool, default: False ) –

    Do not save self.positions if True. Default False.

  • skip_index (bool, default: False ) –

    Do not save self.index if True. Default False.

Raises:

  • ValueError

    If no output_file or the empty string ("") is specified
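The output_file fallback and the ValueError described above amount to a small resolution step. This sketch uses a hypothetical helper, resolve_output_file, and is not the packingcubes code; it only shows that the error depends on the resolved path being empty, not on output_file alone.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_output_file(output_file: Optional[Union[str, Path]], default: str) -> Path:
    """Hypothetical helper: fall back to the dataset's filepath, reject empty paths."""
    chosen = default if output_file is None else output_file
    if str(chosen) == "":
        raise ValueError("No output_file given and the dataset has no filepath")
    return Path(chosen)
```

An InMemory dataset created with filepath="cubes.hdf5" could therefore call save() with no arguments, while one created with the default filepath="" must pass output_file.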

OpTree

OpTree(data, leafsize=None, compact_nodes=None, copy_data=False, balanced_tree=None, boxsize=None, *, cubes_per_side=-1, save_dataset=False)

Class to mimic the SciPy KDTree API using ParticleCubes and PackedTrees

Provides an API identical to SciPy's KDTree to the extent possible, given that ParticleCubes and PackedTrees are fundamentally different. Where no one-to-one match exists for a requested method, argument, or functionality, an OpTreeError is raised if there is nothing similar; otherwise, an OpTreeWarning is emitted explaining the replacement.

Warning! PackedTrees are not robust against large amounts of degenerate input data! Please sanitize data prior to usage if expecting data degeneracy levels above ~100 (i.e. 100 data points with the same values). Note that multiple degenerate regions are acceptable, assuming they are sufficiently separated.

Parameters:

  • data (array_like, shape (n, m) | MultiParticleDataset) –

    The n m-dimensional data points to be indexed. This array is not copied unless necessary (see below) and will be sorted in place, so modifying this data afterwards will produce bogus results. The data are copied if the OpTree is built with copy_data=True. Note: OpTrees are intended for 3-dimensional data, so m > 3 is not supported. For m < 3, the data are padded with zeros (e.g. [[1, 2], [3, 4]] becomes [[1, 2, 0], [3, 4, 0]]), which requires copying the data. A Dataset can also be passed in directly for faster tree creation.

  • leafsize (positive int, default: None ) –

    The number of points at which the algorithm switches over to brute-force. Default: 400.

  • compact_nodes (bool, default: None ) –

    This parameter is irrelevant for OpTrees and is only provided to match the KDTree API.

  • copy_data (bool, default: False ) –

    If True, the data are copied to protect the tree against data corruption and to prevent the original data from being sorted in place. Default: False.

  • balanced_tree (bool, default: None ) –

    OpTrees are always split at the bounding box midpoint, so this option is only provided to match the KDTree API

  • boxsize (array_like or scalar, default: None ) –

    Provide an explicit bounding box for the data in the form [x_min, y_min, z_min, dx, dy, dz]. If len(boxsize)==3, x_min = y_min = z_min = 0. If boxsize is a scalar, dx = dy = dz = boxsize. Other boxsize lengths are unsupported. SciPy's KDTree additionally imposes a toroidal topology; this functionality is currently unsupported.

  • cubes_per_side (int, default: -1 ) –

    Size of the top-level grid. Must be between 3 and 32 or -1 (default). The default uses the number of available threads to ensure there are more grid cells than threads.

  • save_dataset (bool, default: False ) –

    If data is a dataset, save sorted positions/indices to file. Default False
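The zero-padding of low-dimensional data and the boxsize rules above can be sketched with plain lists. pad_to_3d and normalize_boxsize are hypothetical helpers, not part of packingcubes. Two assumptions are flagged in the comments: the scalar boxsize case is taken to start at the origin (the text only fixes dx = dy = dz), and ValueError stands in for whatever error OpTree actually raises.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def pad_to_3d(points: Sequence[Sequence[Number]]) -> List[List[Number]]:
    """Zero-pad m < 3 dimensional points into new lists (i.e. a copy)."""
    padded = []
    for pt in points:
        if len(pt) > 3:
            # Assumed error type; OpTree may raise OpTreeError instead
            raise ValueError("OpTree supports at most 3 dimensions")
        padded.append(list(pt) + [0] * (3 - len(pt)))
    return padded


def normalize_boxsize(boxsize: Union[Number, Sequence[Number]]) -> List[Number]:
    """Return [x_min, y_min, z_min, dx, dy, dz] following the rules above."""
    if isinstance(boxsize, (int, float)):
        # Assumption: a scalar box is anchored at the origin
        return [0, 0, 0, boxsize, boxsize, boxsize]
    box = list(boxsize)
    if len(box) == 3:
        return [0, 0, 0] + box
    if len(box) == 6:
        return box
    raise ValueError("boxsize must be a scalar or have length 3 or 6")


box = normalize_boxsize([1.0, 2.0, 3.0, 10.0, 10.0, 10.0])
mins = box[:3]
# Elementwise min + width; with plain lists, box[:3] + box[3:] would concatenate
maxs = [lo + d for lo, d in zip(box[:3], box[3:])]
```

The mins and maxs attributes documented below are computed the same way, with NumPy arrays making the + elementwise.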

data instance-attribute

data = positions

The n data points of dimension m to be indexed. The data is only copied if the "kd-tree" is built with copy_data=True.

leafsize instance-attribute

leafsize = _DEFAULT_PARTICLE_THRESHOLD if leafsize is None else leafsize

The number of points at which the algorithm switches over to brute-force

maxs instance-attribute

maxs = box[:3] + box[3:]

The maximum value in each dimension of the n data points

mins instance-attribute

mins = box[:3]

The minimum value in each dimension of the n data points

n instance-attribute

n = len(data)

The number of data points.

size instance-attribute

size = sum(int(len(ct) / 5) for ct in cube_trees)

The number of nodes in the tree.

sort_index property

sort_index

Shuffle list for the original data, i.e., self.data = data[self.sort_index]
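The relation self.data = data[self.sort_index] is what converts indices into the sorted data back into original indices. The ordering used here (by x coordinate) is purely illustrative and is an assumption; OpTree orders particles spatially by cube and octree node. The sketch uses plain lists and only standard-library code.

```python
original = [(3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

# Illustrative sort order only: the real tree uses a spatial ordering
sort_index = sorted(range(len(original)), key=lambda i: original[i][0])
sorted_data = [original[i] for i in sort_index]  # self.data = data[self.sort_index]

# A hit at position `hit` in the sorted data is particle sort_index[hit] in the original
hit = 0
original_index = sort_index[hit]

# Inverse permutation: where does original particle i live in the sorted data?
inverse = [0] * len(sort_index)
for pos, i in enumerate(sort_index):
    inverse[i] = pos
```

This is the conversion that return_data_indices=False implies for query and query_ball_point results.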

count_neighbors

count_neighbors(*, other, r, p=None, weights=None, cumulative=None)

Count how many nearby pairs can be formed.

Count the number of pairs (x1, x2) that can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1, x2, p) <= r.

Data points on self and other are optionally weighted by the weights argument. (See below)

WARNING

Not currently implemented.

Parameters:

  • other (OpTree) –

    The other tree to draw points from, can be the same tree as self

  • r (float | NDArray) –

    The radius to produce a count for. Multiple radii are searched with a single tree traversal. If the count is non-cumulative (cumulative=False), r defines the edges of the bins and must be non-decreasing.

  • p (float | None, default: None ) –

    Which Minkowski p-norm to use. Default 2.0. A finite large p may cause a ValueError if overflow can occur

  • weights (tuple[float | None, float | None] | NDArray | None, default: None ) –

    If None, the pair-counting is unweighted. If given as a tuple, weights[0] is the weights of points in self, and weights[1] is the weights of points in other; either can be None to indicate the points are unweighted. If given as an array_like, weights is the weights of points in self and other. For this to make sense, self and other must be the same tree. If self and other are two different trees, a ValueError is raised. Default: None

  • cumulative (bool | None, default: None ) –

    Whether the returned counts are cumulative. When cumulative is set to False the algorithm is optimized to work with a large number of bins (>10) specified by r. When cumulative is set to True, the algorithm is optimized to work with a small number of r. Default: True

Returns:

  • result ( scalar or 1-D array ) –

    The number of pairs. For unweighted counts, the result is integer. For weighted counts, the result is float. If cumulative is False, result[i] contains the counts with (-inf if i == 0 else r[i-1]) < R <= r[i]
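Since count_neighbors is not yet implemented, a brute-force reference for the documented unweighted semantics may be useful for checking future results on small inputs. This is an O(n*m) sketch, not how OpTree counts pairs: count_neighbors_bruteforce is a hypothetical name, only the Euclidean norm (p=2) is handled, and weights are ignored.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def pair_distances(a: Sequence[Point], b: Sequence[Point]) -> List[float]:
    # Every ordered pair (x1, x2); self pairs are included when a is b
    return [math.dist(x1, x2) for x1 in a for x2 in b]


def count_neighbors_bruteforce(a, b, r: Sequence[float], cumulative: bool = True) -> List[int]:
    dists = sorted(pair_distances(a, b))
    # Number of pairs with distance <= each radius
    at_most = [bisect.bisect_right(dists, radius) for radius in r]
    if cumulative:
        return at_most
    # Non-cumulative: result[i] counts r[i-1] < d <= r[i] (lower edge -inf for i == 0)
    return [at_most[0]] + [at_most[i] - at_most[i - 1] for i in range(1, len(r))]
```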

Raises:

query

query(x, k=1, eps=None, p=None, distance_upper_bound=None, workers=None, *, return_data_indices=None, return_sorted=None)

Query the OpTree for nearest neighbors

Parameters:

  • x (ArrayLike) –

    An array of points to query

  • k (int | Sequence[int], default: 1 ) –

    Either the number of nearest neighbors to return or a list of the kth nearest neighbors to return, starting from 1. E.g., [2,3] will return the 2nd and 3rd nearest neighbors

  • eps (float | None, default: None ) –

    Return approximate nearest neighbors. Note that this parameter is currently unused.

  • p (int | None, default: None ) –

    The Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the Euclidean distance. infinity is the maximum-coordinate-difference distance. Currently only p=2 is supported

  • distance_upper_bound (float | None, default: None ) –

    Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.

  • workers (int | None, default: None ) –

    Number of workers to use for parallel processing. Only 1 is supported; for parallel processing, see Cubes.

  • return_data_indices (bool | None, default: None ) –

    Return indices into the sorted data if True instead of into the original. Specify None to have this set by the copy_data argument used during tree construction.

  • return_sorted (bool | None, default: None ) –

    Flag to return the distances and indices in distance-sorted order. Set to False for a performance boost. Default True

Returns:

  • d ( float or array of floats ) –

    The distances to the nearest neighbors. If x has shape tuple+(self.m,), then d has shape tuple+(k,). When k==1, the last dimension of the output is squeezed. Missing neighbors are indicated with infinite distances. Hits are sorted by distance (nearest first)

  • i ( integer or array of integers ) –

    The index of each neighbor in self.data. i is the same shape as d. Missing neighbors are indicated with self.n.
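A brute-force reference for a single query point can help validate query results on small inputs, including the list form of k and the missing-neighbor convention (infinite distance, index self.n). This is a sketch, not how OpTree searches: query_bruteforce is a hypothetical name, only p=2 is handled, the k==1 squeeze is omitted, and treating distance_upper_bound as inclusive is an assumption.

```python
import heapq
import math
from typing import List, Sequence, Tuple, Union


def query_bruteforce(
    data: Sequence[Sequence[float]],
    x: Sequence[float],
    k: Union[int, Sequence[int]] = 1,
    distance_upper_bound: float = math.inf,
) -> Tuple[List[float], List[int]]:
    """Reference for one query point: returns (d, i) lists, never squeezed."""
    ks = list(range(1, k + 1)) if isinstance(k, int) else list(k)
    hits = heapq.nsmallest(
        max(ks),
        ((math.dist(p, x), idx) for idx, p in enumerate(data)),
    )
    # Assumed inclusive bound
    hits = [(d, idx) for d, idx in hits if d <= distance_upper_bound]
    n = len(data)
    d_out, i_out = [], []
    for kth in ks:
        if kth <= len(hits):
            d_out.append(hits[kth - 1][0])
            i_out.append(hits[kth - 1][1])
        else:
            # Missing neighbors: infinite distance and index n
            d_out.append(math.inf)
            i_out.append(n)
    return d_out, i_out
```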

Raises:

query_ball_point

query_ball_point(x, r, p=2.0, eps=None, workers=-1, *, return_sorted=False, return_length=False, return_lists=False, return_data_indices=None, strict=None)

Find all points within distance r of point(s) x.

Parameters:

  • x (array_like, shape tuple + (self.m,)) –

    The point or points to search for neighbors of.

  • r ((array_like, float)) –

    The radius of points to return, must broadcast to the length of x.

  • p (float, default: 2.0 ) –

    Which Minkowski p-norm to use. Should be in the range [1, inf]. A finite large p may cause a ValueError if overflow can occur.

  • eps (nonnegative float, default: None ) –

    Approximate search. Branches of the tree are not explored if their nearest points are further than r / (1 + eps), and branches are added in bulk if their furthest points are nearer than r * (1 + eps).

  • workers (int, default: -1 ) –

    Number of jobs to schedule for parallel processing. If -1 is given, all processors are used. Default: -1. Note: SciPy's KDTree parallelizes across the queried points, so a query on a single point gets no speed-up from parallelization. OpTree also parallelizes single-point queries, hence the different default.

  • return_sorted (bool, default: False ) –

    Sorts returned indices if True and does not sort them if False. If None, does not sort single point queries, but does sort multi-point queries which was the behavior before this option was added. Default False.

  • return_length (bool, default: False ) –

    Return the number of points inside the radius instead of a list of the indices. Note that this is much faster for large trees.

  • return_lists (bool, default: False ) –

    Force returning lists instead of arrays. OpTrees return arrays of indices by default, but this doesn't match the expected query_ball_point signature. To exactly match SciPy, set this to True.

  • return_data_indices (bool | None, default: None ) –

    Return indices into the sorted data if True instead of into the original. Specify None to have this set by the copy_data argument used during tree construction.

  • strict (bool | None, default: None ) –

    If False, compare only the approximate node distance. This should be significantly faster, but may include some false positives. Default True

Returns:

  • results ( list or array of lists ) –

    If x is a single point, returns a list of the indices of the neighbors of x. If x is an array of points, returns an object array of shape tuple containing lists of neighbors.

Notes

If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in an OpTree and using query_ball_tree.

Examples:

>>> import numpy as np
>>> from packingcubes import OpTree
>>> x, y = np.mgrid[0:5, 0:5]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = OpTree(points)
>>> sorted(tree.query_ball_point([2, 0], 1))
[5, 10, 11, 15]

Query multiple points and plot the results:

>>> import matplotlib.pyplot as plt
>>> points = np.asarray(points)
>>> plt.plot(points[:,0], points[:,1], '.')
>>> for results in tree.query_ball_point(([2, 0], [3, 3]), 1):
...     nearby_points = points[results]
...     plt.plot(nearby_points[:,0], nearby_points[:,1], 'o')
>>> plt.margins(0.1, 0.1)
>>> plt.show()
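To cross-check query_ball_point on small inputs, a brute-force version with a general Minkowski p-norm is sketched below. It is a hypothetical reference (minkowski and ball_point_bruteforce are not packingcubes functions), uses only the standard library, and rebuilds the doctest grid above so the first result can be compared directly.

```python
import math
from typing import List, Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    diffs = [abs(u - v) for u, v in zip(a, b)]
    if math.isinf(p):
        return max(diffs)  # maximum-coordinate-difference distance
    return sum(d ** p for d in diffs) ** (1.0 / p)


def ball_point_bruteforce(points, x, r: float, p: float = 2.0) -> List[int]:
    # Sorted indices of all points within distance r of x (boundary included)
    return [i for i, pt in enumerate(points) if minkowski(pt, x, p) <= r]


# Same grid as the doctest: index 5 * i + j holds the point (i, j)
grid = [(i, j) for i in range(5) for j in range(5)]
```

With p=2 this reproduces [5, 10, 11, 15]; with p=math.inf the square neighborhood also picks up the diagonal points 6 and 16.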

query_ball_tree

query_ball_tree(other, r, p=2, eps=None, *, strict=None, return_lists=None, return_sorted=None)

Find all pairs of points between self and other whose distance is at most r.

Parameters:

  • other (OpTree) –

    The tree containing points to search against

  • r

    The maximum distance, has to be positive

  • p (float, default: 2 ) –

    Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity

  • eps (float | None, default: None ) –

    Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.

  • strict (bool | None, default: None ) –

    If False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. Default True

  • return_lists (bool | None, default: None ) –

    Force returning lists instead of arrays. OpTrees return arrays of indices by default, but this doesn't match the expected query_ball_tree signature. For a slight performance increase, set this to False

  • return_sorted (bool | None, default: None ) –

    Force returning sorted lists. If the copy_data flag was passed during tree construction, the data used to generate the results may be in a different order than the original, so neighbor lists can come back unsorted (e.g. results[0] = [5, 1, 2]). If output order is important, set this flag to True, at a performance penalty.

Returns:

  • results ( list of lists ) –

    For each element self.data[i] of this tree, results[i] is a list of the indices of its neighbors in other.data
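A brute-force equivalent of query_ball_tree makes the result layout concrete: one list per point of self, holding indices into other. ball_tree_bruteforce is a hypothetical reference (Euclidean only), and the unsorted lists below are made-up values showing how sorting each list makes tree output comparable when return_sorted is not set.

```python
import math
from typing import List


def ball_tree_bruteforce(self_data, other_data, r: float) -> List[List[int]]:
    """results[i] lists indices j of other_data with dist(self_data[i], other_data[j]) <= r."""
    return [
        [j for j, q in enumerate(other_data) if math.dist(pt, q) <= r]
        for pt in self_data
    ]


a = [(0.0, 0.0), (10.0, 0.0)]
b = [(0.5, 0.0), (9.0, 0.0), (0.0, 1.0)]
results = ball_tree_bruteforce(a, b, 1.0)

# Output from a tree may list neighbors in another order (hypothetical values)
unsorted_results = [[2, 0], [1]]
normalized = [sorted(lst) for lst in unsorted_results]
```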

Raises:

query_pairs

query_pairs(r, p=2.0, *, eps=None, output_type=None, strict=None)

Find all pairs of points in self whose distance is at most r.

Parameters:

  • r (float) –

    The maximum distance, has to be positive

  • p (float, default: 2.0 ) –

    Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity

  • eps (float | None, default: None ) –

    Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.

  • output_type (str | None, default: None ) –

    Choose the output container, 'set' or 'ndarray'. Default: 'set'

  • strict (bool | None, default: None ) –

    If False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. Default True

Returns:

  • results ( set or NDArray ) –

    Set of pairs (i, j) with i<j, for which the corresponding positions are close. If output_type is 'ndarray', an ndarray is returned instead of a set.

Raises:

sparse_distance_matrix

sparse_distance_matrix(*, other, max_distance, p=None, output_type=None)

Compute a sparse distance matrix

Computes a distance matrix between two OpTrees, leaving as zero any distance greater than max_distance.

WARNING

Not currently implemented.

Parameters:

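  • other (OpTree) –

    The other tree to compute distances against

  • max_distance (float) –

    The maximum distance to compute; larger distances are left as zero
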
  • p (float | None, default: None ) –

    Which Minkowski p-norm to use. A finite large p may cause a ValueError if overflow can occur

  • output_type (str | None, default: None ) –

    Which container to use for output data. Default: "dok_matrix"

Returns:

  • result ( dok_matrix, coo_matrix, dict, or ndarray ) –

    Sparse matrix representing the results in a "dictionary of keys" format. If a dict is returned the keys are (i,j) tuples of indices. If output_type is "ndarray" a record array with fields "i", "j", and "v" is returned.
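Because sparse_distance_matrix is not yet implemented, the "dictionary of keys" output can be illustrated with a brute-force sketch. sparse_distance_dict is hypothetical and Euclidean-only; keys are (i, j) index tuples and any pair farther apart than max_distance is simply absent, which stands for zero.

```python
import math
from typing import Dict, Tuple


def sparse_distance_dict(a, b, max_distance: float) -> Dict[Tuple[int, int], float]:
    # Only distances <= max_distance are stored; absent keys stand for zero
    out = {}
    for i, x1 in enumerate(a):
        for j, x2 in enumerate(b):
            d = math.dist(x1, x2)
            if d <= max_distance:
                out[(i, j)] = d
    return out
```

A dict in this shape can be copied entry by entry into a SciPy dok_matrix if the matrix form is needed.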

Raises:

PackedTree

PackedTree(*, dataset=None, source=None, particle_threshold=None, bounding_box=None, copy_data=False)

Bases: Octree

Public packed octree interface

This interface defines the methods for creating, manipulating, and traversing a packingcubes packed octree.

Attributes:

Note

Must provide either dataset or source. If provided source does not include metadata, must additionally provide either dataset or bounding_box.

Parameters:

  • dataset (NDArray | Dataset | None, default: None ) –

    An (N,3) array or Dataset containing particle data

  • source (Buffer | None, default: None ) –

    Pre-computed packed buffer containing this tree. Leave out to compute the tree from scratch.

  • particle_threshold (int | None, default: None ) –

    Number of particles allowed in a leaf before splitting. Defaults to octree._DEFAULT_PARTICLE_THRESHOLD

  • bounding_box (BoxLike | None, default: None ) –

    Bounding box of the tree. Required if metadata needs to be created and dataset is not provided.

    Will override the dataset bounding box.

  • copy_data (bool, default: False ) –

    If dataset is just an array, flag to copy data prior to construction. Defaults to False

metadata instance-attribute

metadata = metadata

The metadata for this packed tree

packed_form property

packed_form

Return this tree in full packed form

packed_meta property

packed_meta

Return a memoryview of the tree's packed metadata

packed_tree property

packed_tree

Return a memoryview of the tree's backing byte array

particle_threshold instance-attribute

particle_threshold = particle_threshold

The maximum leaf size before splitting, used in tree construction

__iter__

__iter__()

Iterate through all nodes of the octree.

No guarantee is made about the order in which the nodes are traversed.

__len__

__len__()

Return the number of particles in the tree

count_neighbors

count_neighbors(*, other, r)

Count how many nearby pairs can be formed.

Parameters:

  • other (PackedTree) –

    The other tree to compare against

  • r (float) –

    The radius to produce a count for.

Returns:

  • result ( scalar or 1-d array ) –

    The number of pairs

get_closest_particle

get_closest_particle(xyz, *, check_neighbors=True)

Get nearest particle index (and distance) to point.

Parameters:

  • xyz (ArrayLike) –

    Coordinates of point to check

  • check_neighbors (bool, default: True ) –

    Whether to also check the neighbors of the smallest containing node. Default True

Returns:

  • closest_ind ( int ) –

    Absolute index of closest particle

  • closest_dist ( float ) –

    Distance to closest particle

Raises:

get_closest_particles

get_closest_particles(*, data, xyz, distance_upper_bound=None, p=2, k=1, use_data_indices=True, return_sorted=True)

Get the distances and indices of the k nearest particles to a point.

Parameters:

  • data (DataContainer | Dataset) –

    Source of particle position data

  • xyz (NDArray) –

    Coordinates of point to check

  • distance_upper_bound (float | None, default: None ) –

    Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.

  • p (float, default: 2 ) –

    Which Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the usual Euclidean distance. Infinity is the maximum-coordinate-difference distance. Currently, only p=2 is supported.

  • k (int, default: 1 ) –

    Number of closest particles to return. Default 1

  • use_data_indices (bool, default: True ) –

    Flag to return indices into the sorted dataset (True, default) or into the shuffle list (False)

  • return_sorted (bool, default: True ) –

    Flag to return the distances and indices in distance-sorted order. Set to False for a performance boost. Default True

Returns:

  • distances ( NDArray[float] ) –

    Distances to the kth nearest neighbors. Has shape (min(N,k),), where N is the number of particles in the sphere bounded by distance_upper_bound

  • indices ( NDArray[int] ) –

    Indices in data of the kth nearest neighbors. Has same shape as distances

Raises:

get_leaves

get_leaves()

Return a list of all leaf octree nodes in depth-first order

get_node

get_node(tag)

Return the node corresponding to the provided tag or None if not found.

Parameters:

  • tag (str) –

    The tag to search for

Returns:

  • node

    Node in octree with specified tag or None if it does not exist

get_particle_index_list_in_box

get_particle_index_list_in_box(*, data, box, strict=False)

Return all particles contained within the box.

Parameters:

  • data (DataContainer | Dataset) –

    Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase

  • box (BoxLike) –

    Box to check

  • strict (bool, default: False ) –

    Flag to specify whether only particles inside the box will be returned. If False (default), additional nearby particles may be included for significantly increased performance

Returns:

  • indices ( NDArray[int] ) –

    List of original particle indices contained within the box

get_particle_index_list_in_sphere

get_particle_index_list_in_sphere(*, data, center, radius, strict=False)

Return all particles contained within sphere defined by center and radius.

Parameters:

  • data (DataContainer | Dataset) –

    Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase

  • center (NDArray) –

    Center point of the sphere

  • radius (float) –

    Radius of the sphere

  • strict (bool, default: False ) –

    Flag to specify whether only particles inside the sphere will be returned. If False (default), additional nearby particles may be included for significantly increased performance

Returns:

  • indices ( NDArray[int] ) –

    List of original particle indices contained within sphere
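The strict flag trades exactness for speed. The sketch below shows one plausible mechanism, assumed rather than taken from packingcubes: a non-strict query accepts every particle of each leaf whose box touches the sphere (hence the possible false positives), while a strict query also tests each particle's distance. The leaf layout and function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (lower corner, upper corner), hypothetical layout


def box_intersects_sphere(box: Box, center, radius: float) -> bool:
    lo, hi = box
    # Closest point of the box to the center, clamped per axis
    nearest = [min(max(c, l), h) for c, l, h in zip(center, lo, hi)]
    return math.dist(nearest, center) <= radius


def sphere_query(leaves, positions, center, radius: float, strict: bool = False) -> List[int]:
    """leaves: list of (box, particle_indices) pairs (hypothetical layout)."""
    hits = []
    for box, indices in leaves:
        if not box_intersects_sphere(box, center, radius):
            continue
        if strict:
            hits.extend(i for i in indices if math.dist(positions[i], center) <= radius)
        else:
            # Non-strict: accept the whole overlapping leaf (may add false positives)
            hits.extend(indices)
    return hits


positions = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (1.5, 0.5, 0.5)]
leaves = [(((0, 0, 0), (1, 1, 1)), [0, 1]), (((1, 0, 0), (2, 1, 1)), [2])]
```

Querying a sphere of radius 0.5 at the origin returns [0, 1] non-strictly but only [0] strictly, since particle 1 shares a leaf with particle 0 but lies outside the sphere.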

get_particle_indices_in_box

get_particle_indices_in_box(*, box)

Return all particles contained within the box.

Parameters:

  • box (BoxLike) –

    Box to check

Returns:

  • indices ( list[tuple[int, int, int]] ) –

    List of particle start-stop index ranges contained within the box. The third element of each tuple is a flag for whether only some particles (1) among the start-stop indices are contained or all (0)

get_particle_indices_in_sphere

get_particle_indices_in_sphere(*, center, radius)

Return all particles contained within sphere defined by center and radius.

Parameters:

  • center (NDArray) –

    Center point of the sphere

  • radius (float) –

    Radius of the sphere

Returns:

  • indices ( list[tuple[int, int, int]] ) –

    List of particle start-stop index ranges contained within the sphere. The third element of each tuple is a flag for whether only some particles (1) among the start-stop indices are contained or all (0)
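The (start, stop, partial) tuples can be expanded into a flat index list, which is roughly the difference between this method and get_particle_index_list_in_sphere. In this sketch expand_ranges is hypothetical, stop is assumed to be exclusive, and the resulting indices refer to the sorted positions; mapping them to original indices would go through the dataset's shuffle list (index).

```python
import math
from typing import List, Sequence, Tuple


def expand_ranges(
    ranges: Sequence[Tuple[int, int, int]],
    positions: Sequence[Sequence[float]],
    center: Sequence[float],
    radius: float,
) -> List[int]:
    """Turn (start, stop, partial) tuples into a flat list of sorted-data indices."""
    out = []
    for start, stop, partial in ranges:
        span = range(start, stop)  # assumed half-open [start, stop)
        if partial:
            # Only some particles in this range are inside: test each one
            out.extend(i for i in span if math.dist(positions[i], center) <= radius)
        else:
            # The whole range lies inside the sphere
            out.extend(span)
    return out
```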

ParticleCubes

ParticleCubes(*, cube_indices, cube_boxes, cube_trees, dataset=None, **kwargs)

The cubes for a single particle type

cube_boxes instance-attribute

cube_boxes = cube_boxes

The bounding boxes for each cube

cube_indices instance-attribute

cube_indices = cube_indices

Array of cube indices into the dataset

cube_trees instance-attribute

cube_trees = []

The packed trees for each cube

dataset property

dataset

    Return the Dataset attached to this object, or None

Box

Box(box, *, dataset=None, strict=False, fields=None, extras=None, save_filepath=None, save_particle_type=None)

Construct a box-shaped subdataset

Parameters:

  • box (BoxLike) –

    The box to search in

  • dataset (Dataset | None, default: None ) –

    Dataset containing the particle positions. Defaults to self.dataset.

  • strict (bool, default: False ) –

    Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for significantly increased performance.

  • fields (Collection[str] | None, default: None ) –

    Subset of fields in dataset.extras to include. Specify "all" to include everything in dataset.extras. Defaults to the empty set.

  • extras (Mapping[str, Any] | None, default: None ) –

    Additional fields to sort, add to dataset.extras, and include in the returned subdataset. See process_extra_fields for more details. Defaults to None

  • save_filepath (str | None, default: None ) –

    If provided, save this subdataset to the specified file, using save_particle_type as the particle type.

  • save_particle_type (str | None, default: None ) –

    Particle type under which to save the subdataset when save_filepath is provided. Can be omitted to use the default particle type.

Returns:

  • InMemory

    Subdataset with the specified bounding volume and fields

Raises:

  • ValueError

    If fields are specified that are in neither extras nor dataset.extras.
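The fields and extras rules above can be summarized in a few lines. This is a sketch of the documented behavior only: resolve_fields is hypothetical, and the exact set of names packingcubes ends up storing may differ. None selects no fields, "all" selects everything in dataset.extras, fields passed via extras are always included, and any name found in neither place raises ValueError.

```python
def resolve_fields(fields, dataset_extras, extras=None):
    """Return the names of extra fields a subdataset would carry.

    Hypothetical helper mirroring the documented rules for Box/Sphere.
    """
    extras = dict(extras or {})
    if fields is None:
        requested = set()  # default: the empty set
    elif fields == "all":
        requested = set(dataset_extras)
    else:
        requested = set(fields)
    unknown = requested - set(dataset_extras) - set(extras)
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")
    # Fields supplied through extras are always part of the subdataset
    return sorted(requested | set(extras))


dataset_extras = {"mass": None, "velocity": None}
print(resolve_fields(["mass"], dataset_extras))  # ['mass']
print(resolve_fields("all", dataset_extras, extras={"id": None}))  # ['id', 'mass', 'velocity']
```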

Sphere

Sphere(center, radius, *, dataset=None, strict=False, fields=None, extras=None, save_filepath=None, save_particle_type=None)

Construct a spherical subdataset

Parameters:

  • center (ArrayLike) –

    Center point of the sphere

  • radius (float) –

    Radius of the sphere

  • dataset (Dataset | None, default: None ) –

    Dataset containing the particle positions. Defaults to self.dataset.

  • strict (bool, default: False ) –

    Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for significantly increased performance.

  • fields (Collection[str] | None, default: None ) –

    Subset of fields in dataset.extras to include. Specify "all" to include everything in dataset.extras. Defaults to the empty set.

  • extras (Mapping[str, Any] | None, default: None ) –

    Additional fields to sort, add to dataset.extras, and include in the returned subdataset. See process_extra_fields for more details. Defaults to None

  • save_filepath (str | None, default: None ) –

    If provided, save this subdataset to the specified file, using save_particle_type as the particle type.

  • save_particle_type (str | None, default: None ) –

    Particle type under which to save the subdataset when save_filepath is provided. Can be omitted to use the default particle type.

Returns:

  • InMemory

    Subdataset with the specified bounding volume and fields

Raises:

  • ValueError

    If fields are specified that are in neither extras nor dataset.extras.

get_closest_particles

get_closest_particles(*, xyz, data=None, distance_upper_bound=None, p=None, k=None, return_shuffle_indices=None, return_sorted=None)

Get the distances to, and indices of, the k nearest particles to a point.

Parameters:

  • xyz (NDArray) –

    Coordinates of point to check

  • data (DataContainer | Dataset | None, default: None ) –

    Source of particle position data. Defaults to self.dataset.

  • distance_upper_bound (float | None, default: None ) –

    Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.

  • p (float | None, default: None ) –

    Which Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the usual Euclidean distance. Infinity is the maximum-coordinate-difference distance. Currently, only p=2 is supported.

  • k (int | None, default: None ) –

    Number of closest particles to return. Default 1

  • return_shuffle_indices (bool | None, default: None ) –

    Flag to return the shuffle indices instead of the data indices. Default False.

  • return_sorted (bool | None, default: None ) –

    Flag to return the distances and indices in distance-sorted order. Set to False for a performance boost. Default True

Returns:

  • distances ( NDArray[float] ) –

    Distances to the k nearest neighbors. Has shape (min(N, k),), where N is the number of particles in the sphere bounded by distance_upper_bound

  • indices ( NDArray[int] ) –

    Indices in data of the k nearest neighbors. Has the same shape as distances
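When checking results, a brute-force reference can be helpful. The sketch below reproduces the documented return contract (Euclidean distance only, at most min(N, k) results, optional distance_upper_bound, optional sorting) with a linear scan. It is not how the tree search works, and brute_force_closest is a hypothetical name.

```python
import numpy as np


def brute_force_closest(positions, xyz, k=1, distance_upper_bound=None, return_sorted=True):
    """Reference k-nearest-neighbour search with Euclidean (p=2) distances.

    Returns at most k (distance, index) pairs. When distance_upper_bound is
    given, only particles within that distance are considered, so fewer than
    k results may be returned.
    """
    dist = np.linalg.norm(positions - np.asarray(xyz, dtype=float), axis=1)
    idx = np.arange(len(positions))
    if distance_upper_bound is not None:
        keep = dist <= distance_upper_bound
        dist, idx = dist[keep], idx[keep]
    k = min(k, len(dist))
    if k == 0:
        return dist[:0], idx[:0]
    # argpartition finds the k smallest distances without a full sort
    part = np.argpartition(dist, k - 1)[:k]
    if return_sorted:
        part = part[np.argsort(dist[part])]
    return dist[part], idx[part]


positions = np.array([[0.0, 0, 0], [1.0, 0, 0], [3.0, 0, 0], [0.5, 0, 0]])
d, i = brute_force_closest(positions, [0.0, 0.0, 0.0], k=3, distance_upper_bound=2.0)
print(i)  # [0 3 1]
```

Skipping the final argsort is the brute-force analogue of return_sorted=False.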

get_particle_index_list_in_box

get_particle_index_list_in_box(box, *, data=None, use_data_indices=True, strict=False)

Return all particle indices contained within the box

Parameters:

  • box (BoxLike) –

    The box to search in

  • data (DataContainer | Dataset | None, default: None ) –

    Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase. Defaults to self.dataset.

  • use_data_indices (bool, default: True ) –

    Flag to return indices into the sorted dataset (True, default) or into the shuffle list (False)

  • strict (bool, default: False ) –

    Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for significantly increased performance.

Returns:

  • indices ( Array[int] ) –

    Array of particle indices contained within the box

Raises:

  • ValueError

    If data is None and self.dataset is None

get_particle_index_list_in_sphere

get_particle_index_list_in_sphere(center, radius, *, data=None, use_data_indices=True, strict=False)

Return all particle indices contained within the sphere

Parameters:

  • center (NDArray) –

    Center point of the sphere

  • radius (float) –

    Radius of the sphere

  • data (DataContainer | Dataset | None, default: None ) –

    Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase. Defaults to self.dataset.

  • use_data_indices (bool, default: True ) –

    Flag to return indices into the sorted dataset (True, default) or into the shuffle list (False)

  • strict (bool, default: False ) –

    Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for significantly increased performance.

Returns:

  • indices ( NDArray[int] ) –

    Array of particle indices contained within the sphere

Raises:

  • ValueError

    If data is None and self.dataset is None
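A sphere query on a ParticleCubes object only needs to visit cubes whose bounding boxes could overlap the sphere. A standard way to test this is to clamp the sphere center onto the box and compare the squared distance to the clamped point with radius squared. Whether packingcubes selects cubes exactly this way is an assumption, and the [xmin, ymin, zmin, xmax, ymax, zmax] box layout and helper names are hypothetical.

```python
import numpy as np


def sphere_overlaps_box(center, radius, box):
    """True if a sphere intersects an axis-aligned box.

    box is assumed (hypothetically) to be [xmin, ymin, zmin, xmax, ymax, zmax].
    """
    center = np.asarray(center, dtype=float)
    box = np.asarray(box, dtype=float)
    # Closest point of the box to the sphere center: clamp each coordinate
    closest = np.clip(center, box[:3], box[3:])
    return float(np.sum((closest - center) ** 2)) <= radius**2


def cubes_to_search(cube_boxes, center, radius):
    """Indices of cubes whose bounding boxes overlap the sphere."""
    return [i for i, box in enumerate(cube_boxes) if sphere_overlaps_box(center, radius, box)]


cube_boxes = [[0, 0, 0, 1, 1, 1], [1, 0, 0, 2, 1, 1], [5, 5, 5, 6, 6, 6]]
print(cubes_to_search(cube_boxes, center=[0.9, 0.5, 0.5], radius=0.5))  # [0, 1]
```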

get_particle_indices_in_box

get_particle_indices_in_box(box)

Return all particles contained within the box

Parameters:

  • box (BoxLike) –

    Box to check

Returns:

  • indices ( Xx3 NDArray[np.int_] ) –

    Array of index information. Each row describes a chunk/slice of data in the form [start, stop, partial], where partial is a flag - (1) if the data chunk is entirely contained within the box, (0) otherwise.

get_particle_indices_in_sphere

get_particle_indices_in_sphere(center, radius)

Return all particles contained within the sphere defined by center and radius

Parameters:

  • center (NDArray) –

    Center point of the sphere

  • radius (float) –

    Radius of the sphere

Returns:

  • indices ( Xx3 NDArray[np.int_] ) –

    Array of index information. Each row describes a chunk/slice of data in the form [start, stop, partial], where partial is a flag - (1) if the data chunk is entirely contained within the sphere, (0) otherwise.

save

save(dataset, *, force_overwrite=False)

Save cubes information to the specified file

Parameters:

  • dataset (str | Path | HDF5Dataset) –

    Location to store cubes data.

  • force_overwrite (bool, default: False ) –

    If dataset already contains cubes data, overwrite if True. Default False

Returns:

  • Path

    Path to the saved cubes information

Cubes

Cubes(dataset=None, *, cubes_dict=None, particle_type=None, extras=None, **kwargs)

Create or load ParticleCubes objects from the provided data

As an alternative to a dataset, you can provide a dictionary containing cube data offsets, bounding boxes, and optionally PackedTrees as cube_indices, cube_boxes, and cube_trees. This can be useful when a dataset already has a natural top-level structure but does not yet have PackedTree subcomponents, for example a collection of disjoint blobs in a 3D parameter space, or a dataset that already contains an octree-like structure.

Parameters:

  • dataset (str | NDArray | MultiParticleDataset | None, default: None ) –

    Dataset containing positional data. Will be used to create a new ParticleCubes, including sorting. Must provide either this or cubes_dict, below. Assumes strings are filepaths to GadgetishHDF5Datasets.

  • cubes_dict (dict[str, NDArray | list[BoundingBox] | list[NDArray | PackedTree]] | None, default: None ) –

    Dictionary with 2-3 components:

    1. cube_indices - contains the data offsets for each cube's particles (i.e. cube 0 is from cube_indices[0]:cube_indices[1])
    2. cube_boxes - contains the BoundingBox for each cube
    3. cube_trees (optional) - contains the PackedTree for each cube
  • particle_type (str | None, default: None ) –

    The particle type to use. Unused if cubes_dict is provided. Defaults to dataset.particle_type

  • extras (Mapping[str, Any] | None, default: None ) –

    Attach additional fields to the dataset to be sorted. Unused if cubes_dict is provided. See process_extra_fields for MultiParticleDataset or GadgetishHDF5Dataset

  • **kwargs

    Extra arguments passed to InMemory/GadgetishHDF5Dataset, make_cubes, and ParticleCubes. See those for a description.
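For the cubes_dict route, the particles of each cube must be contiguous in the data. The sketch below groups particles by a precomputed integer label (for instance a blob ID), sorts them so each group is contiguous, and derives offsets and per-group bounding boxes. Several details are assumptions: cube_indices is taken to have one more entry than there are cubes (so cube i spans cube_indices[i]:cube_indices[i + 1]), each box is a plain [xmin, ymin, zmin, xmax, ymax, zmax] array rather than a BoundingBox, and build_cubes_dict is hypothetical.

```python
import numpy as np


def build_cubes_dict(positions, labels):
    """Group particles by integer label into contiguous blocks.

    Returns the sorted positions and a cubes_dict-like mapping with
    cube_indices offsets and per-group [min xyz, max xyz] boxes.
    """
    labels = np.asarray(labels)
    order = np.argsort(labels, kind="stable")  # keep original order within a group
    sorted_positions = positions[order]
    # Counts come back in sorted-label order, matching the argsort above
    _, counts = np.unique(labels, return_counts=True)
    offsets = np.concatenate(([0], np.cumsum(counts)))
    boxes = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        chunk = sorted_positions[start:stop]
        boxes.append(np.concatenate((chunk.min(axis=0), chunk.max(axis=0))))
    return sorted_positions, {"cube_indices": offsets, "cube_boxes": boxes}


positions = np.array([[5.0, 5, 5], [0.0, 0, 0], [5.5, 5, 5], [0.5, 1, 0]])
sorted_pos, cubes_dict = build_cubes_dict(positions, labels=[1, 0, 1, 0])
print(cubes_dict["cube_indices"])  # [0 2 4]
```

The position data used with such a dictionary must be in the same sorted order, and the boxes may need converting to BoundingBox objects first.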

Returns:

  • ParticleCubes

    ParticleCubes object constructed from the dataset/dictionary

Raises:

  • CubesError

    If neither dataset nor cubes_dict is provided

See Also

ParticleCubes, MultiCubes

make_cubes

make_cubes(*, dataset, cubes_per_side=-1, cube_box=None, particle_threshold=None, particle_type=None, save_dataset=False, **kwargs)

Create a ParticleCubes from the provided dataset

Parameters:

  • dataset (MultiParticleDataset) –

    The dataset containing particle data. Will be sorted in-place, but will not save updated positional information unless save_dataset is True

  • cubes_per_side (int, default: -1 ) –

    Number of cubes on a side. Dataset will be divided into cubes_per_side**3 cubes, plus an additional cube to catch any remaining particles (if the cube_box is smaller than the actual data extents). Note: due to the PackedTree's packed format, cubes must contain fewer than ~4 billion particles. If cubes_per_side is too small to support this, a ValueError will be raised. The limit is per-particle-type.

  • cube_box (BoxLike | None, default: None ) –

    A box-like object (i.e. something that can convert to a (6,) ndarray) that delineates the region of data to be cubed. Any particles outside this region will fall into an overflow cube. Useful for zoom-in simulations or other datasets with sparse outer regions. Default is the data bounding box.

  • particle_threshold (int | None, default: None ) –

    Maximum number of particles in a tree leaf node. Default is 400

  • particle_type (str | None, default: None ) –

    Particle type to process. Default is dataset.particle_type

  • save_dataset (bool, default: False ) –

    Whether to save the sorted dataset positions out to a file using default values for the parameters. The data will be sorted in memory either way. Default False.

Returns:

  • ParticleCubes

    ParticleCubes object constructed from the dataset

Raises:

  • ValueError

    If requested particle type isn't in the dataset or if too few cubes were requested for the number of particles
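The cube layout described for cubes_per_side and cube_box can be pictured as a grid assignment. Each particle inside cube_box gets a cell ID in [0, cubes_per_side**3), and every other particle goes to a single overflow cube with ID cubes_per_side**3. This is an illustration, not packingcubes' sorting code. The function name, the [xmin, ymin, zmin, xmax, ymax, zmax] box layout, the row-major cell numbering, and the use of 2**32 for the "~4 billion" limit are all assumptions.

```python
import numpy as np

MAX_PER_CUBE = 2**32  # assumed stand-in for the "~4 billion" per-cube limit


def assign_cubes(positions, cubes_per_side, cube_box):
    """Return per-particle cube IDs and the offsets obtained after sorting by ID."""
    box = np.asarray(cube_box, dtype=float)
    lo, hi = box[:3], box[3:]
    n_grid = cubes_per_side**3
    # Scale positions inside cube_box to integer grid cells
    cell = np.floor((positions - lo) / (hi - lo) * cubes_per_side).astype(np.int64)
    inside = np.all((positions >= lo) & (positions <= hi), axis=1)
    # Particles exactly on the upper face belong to the last cell
    cell = np.clip(cell, 0, cubes_per_side - 1)
    ids = (cell[:, 0] * cubes_per_side + cell[:, 1]) * cubes_per_side + cell[:, 2]
    ids = np.where(inside, ids, n_grid)  # overflow cube catches everything else
    counts = np.bincount(ids, minlength=n_grid + 1)
    if counts.max() >= MAX_PER_CUBE:
        raise ValueError("cubes_per_side is too small for this many particles")
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return ids, offsets


positions = np.array(
    [[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [1.0, 1.0, 1.0], [2.0, 0.5, 0.5]]
)
ids, offsets = assign_cubes(positions, cubes_per_side=2, cube_box=[0, 0, 0, 1, 1, 1])
print(ids)  # [0 7 7 8]
```

Sorting the particles with np.argsort(ids, kind="stable") would then make each cube's particles occupy offsets[i]:offsets[i + 1].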

Additional API

Additional module-level information can be found on the following pages:

  • Cubes
  • PackedTrees & OpTrees
  • Data Objects
  • Tree Visualization

Performance

For the Numba-based modules, see

  • DataContainers
  • Numba Cubes
  • Bounding Volumes
  • Numba Packed Trees

Binary Layout

The PackedTree format can be found here.

Command Line Interface

Instructions on the CLI can be found here.