Reference¶
The packingcubes package
packingcubes aims to provide a fast, minimal-memory-usage octree
implementation, specialized for use in astronomical/astrophysical contexts.
It's written in pure python, with Numba-based
acceleration of the critical code paths.
packingcubes can be used from the command line, via the packcubes
command, but will generally be used programmatically.
The following classes and methods represent the primary public interface of
packingcubes, though we expect most users will rely on only a small
portion.
Classes:
-
ParticleCubes–Workhorse class. Performs parallel sorting of datasets and
PackedTreecreation. Expected to be the primary interface for users. -
GadgetishHDF5Dataset–Use to load HDF5 Datasets that look like Gadget-2 snapshots
-
InMemory–Use to convert an in-memory array into a dataset
-
PackedTree–Actual Octree implementation.
-
OpTree–Use as a drop-in replacement for SciPy's KDTree. Not all functionality is implemented, but
queryandquery_ball_pointare. Tree creation and thequery_ball_pointimplementation should be significantly faster than SciPy's for larger (> 10_000) particle balls, and tree size should be substantially smaller.
Functions:
-
Cubes–Intended ParticleCubes creation method. Can accept snaphsot file paths, Dataset objects, and position arrays and returns a ParticleCubes object
GadgetishHDF5Dataset
¶
GadgetishHDF5Dataset(*, name=None, filepath, sorted_filepath=None, particle_type=None, data_slices=None, **kwargs)
Bases: HDF5Dataset
HDF5 dataset with Gadget-2 like header
Represents an HDF5 dataset that at least has the fields from the Gadget-2 header specification here
Parameters:
-
filepath(str | Path) –The path to the file
-
name(str | None, default:None) –A name for this dataset. Defaults to filepath
-
sorted_filepath(str | Path | None, default:None) –Optional file to store sorted position and shuffle-list data. Will also search for positions data from this file before searching filepath. Defaults to
filepath.parent/filepath.stem + "_sorted.hdf5" -
particle_type(str | None, default:None) –Initial particle type to (eagerly) load. Defaults to the first HDF5 group that starts with "Part".
-
data_slices–A numpy slice object or dictionary of slice objects per particle type. This can be used to load only a portion of the dataset. Effectively, the dataset will be loaded as
data = data[data_slice[0]:data_slice[1]:data_slice[3]] -
**kwargs–Additional arguments are discarded
data_slices
property
writable
¶
Slices of data to load. A value of None means load all data
make_into_array
¶
Try to convert field into an array
Parameters:
-
field(str | NDArray | Any) –Object to be converted into an array
Adds supported types:
- strings representing fields in either the HDF5 file at
self.filepath(unsorted) orself.sorted_filepath(sorted)
See MultiParticleDataset for additional supported types.
- strings representing fields in either the HDF5 file at
Returns:
-
field_arr(NDArray) –An array of the values with the 1st dimension having the same length as positions
-
is_sorted(bool) –If
field_arris already sorted
Raises:
-
NotImplementedError–If we do not know how to transform
type(field)into an array
process_extra_fields
¶
Process extra fields
How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.
Parameters:
Examples:
Note
Any attributes added via this method will only be sorted now. Any subsequent sorting will not affect the ordering of these attributes
save
¶
save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)
Save sorted particle positions and shuffle list to provided file
Parameters:
-
output_file(str | Path | None, default:None) –File to save information to. Default is
self.sorted_filepath -
force_overwrite(bool | None, default:None) –Force overwriting position and index data if the output file already contains it under the specified particle type
-
particle_type(str | None, default:None) –Save positions under a different particle type than
self.particle_type -
fields(Collection[str] | None, default:None) –Collection of fields in
self.extrasto save in addition toself.positionsandself.index -
skip_positions(bool, default:False) –Do not save
self.positionsifTrue. DefaultFalse. -
skip_index(bool, default:False) –Do not save
self.indexifTrue. DefaultFalse.
InMemory
¶
Bases: MultiParticleDataset
In-memory Dataset
Class for datasets where the positions data is entirely in-memory. These datasets generally are not expected to have a name or filepath and may consist solely of positions data.
Parameters:
-
positions(NDArray) –Array containing particle position data.
-
particle_type(str | None, default:None) –Particle type these positions belong to. Default is
"PartTypeIM" -
filepath(str, default:'') –Specify a default save location if non-empty. Default is "".
-
**kwargs–Additional arguments are discarded
make_into_array
¶
Try to convert field into an array
Parameters:
-
field(NDArray | Any) –Object to be converted into an array.
Supported types:
NDArrays with the same length (1st dimension) as positions. Always assumed unsorted.- (
NDArray,is_sorted) tuples, where the NDArray must be like the above.
Returns:
-
field_arr(NDArray) –An array of the values with the 1st dimension having the same length as positions
-
is_sorted(bool) –If
field_arris already sorted
Raises:
-
NotImplementedError–If we do not know how to transform
type(field)into an array
process_extra_fields
¶
Process extra fields
How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.
Parameters:
Examples:
Note
Any attributes added via this method will only be sorted now. Any subsequent sorting will not affect the ordering of these attributes
save
¶
save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)
Save sorted particle data and shuffle-list to disk in an HDF5 file
Parameters:
-
output_file(str | Path | None, default:None) –The name of the output file. Defaults to
self.filepath. Since this is""unless specified, will raise aValueError. -
force_overwrite(bool | None, default:None) –Force overwriting position and index data if the output file already contains it under the specified particle type
-
particle_type(str | None, default:None) –Save positions under a different particle type than
self.particle_type -
fields(Collection[str] | None, default:None) –Collection of fields in
self.extrasto save in addition toself.positionsandself.index -
skip_positions(bool, default:False) –Do not save
self.positionsifTrue. DefaultFalse. -
skip_index(bool, default:False) –Do not save
self.indexifTrue. DefaultFalse.
Raises:
-
ValueError–If no output_file or the empty string (
"") is specified
OpTree
¶
OpTree(data, leafsize=None, compact_nodes=None, copy_data=False, balanced_tree=None, boxsize=None, *, cubes_per_side=-1, save_dataset=False)
Class to mimic the SciPy KDTree API using ParticleCubes and PackedTrees
Will provide identical API to SciPy's KDTree to the extent possible given that ParticleCubes and PackedTrees are fundamentally different. Where 1-1 matches for a requested method, argument, or functionality are not possible, raise an OpTreeError if there is nothing similar and emit an OpTreeWarning explaining the replacement otherwise.
Warning! PackedTrees are not robust against large amounts of degenerate input data! Please sanitize data prior to usage if expecting data degeneracy levels above ~100 (i.e. 100 data points with the same values). Note that multiple degenerate regions are acceptable, assuming they are sufficiently separated.
Parameters:
-
data((array_like, shape(n, m) | MultiParticleDataset)) –The
nm-dimensional data points to be indexed. This array is preferentially not copied and will be sorted in place, so modifying this data will result in bogus results. The data are also copied if theOpTreeis built with copy_data=True. Note: OpTrees are intended to support 3-dimensional data, so m>3 is not supported. For m<3, the data is padded with zeros (e.g.[[1, 2], [3, 4]]will become[[1, 2, 0], [3, 4, 0]]). This will lead to the data being copied. Can also pass in a Dataset directly for improved creation time. -
leafsize(positive int, default:None) –The number of points at which the algorithm switches over to brute-force. Default:
400. -
compact_nodes(bool, default:None) –This parameter is irrelevant for
OpTrees and is only provided to match theKDTreeAPI. -
copy_data(bool, default:False) –If
Truethe data is copied to protect the kd-tree against data corruption and to prevent the original data from being sorted. Default:False. -
balanced_tree(bool, default:None) –OpTrees are always split at the bounding box midpoint, so this option is only provided to match theKDTreeAPI -
boxsize(array_like or scalar, default:None) –Provide an explicit bounding box for the data in the form
[x_min, y_min, z_min, dx, dy, dz]. Iflen(boxsize)==3,x_min = y_min = z_min = 0. Ifboxsize`` is a scalar,dx = dy = dz = boxsize. Otherboxsizelengths are unsupported. SciPy'sKDTree` will impose a toroidal topology in addition; this functionality is currently unsupported. -
cubes_per_side(int, default:-1) –Size of the top-level grid. Must be between 3 and 32 or -1 (default). The default uses the number of available threads to ensure there are more grid cells than threads.
-
save_dataset(bool, default:False) –If data is a dataset, save sorted positions/indices to file. Default
False
data
instance-attribute
¶
The n data points of dimension m to be indexed. The data is only copied if
the "kd-tree" is built with copy_data=True.
leafsize
instance-attribute
¶
The number of points at which the algorithm switches over to brute-force
maxs
instance-attribute
¶
The maximum value in each dimension of the n data points
size
instance-attribute
¶
The number of nodes in the tree.
sort_index
property
¶
Shuffle list for the original data, ie self.data = data[self.sort_index]
count_neighbors
¶
Count how many nearby pairs can be formed.
Count the number of pairs (x1, x2) can be formed, with x1 drawn from
self and x2 drawn from other, and where distance(x1, x2, p)
<= r.
Data points on self and other are optionally weighted by the weights
argument. (See below)
WARNING
Not currently implemented.
Parameters:
-
other(OpTree) –The other tree to draw points from, can be the same tree as self
-
r(float | NDArray) –The radius to produce a count for. Mulltiple radii are searched with a single tree traversal. If the count is non-cumulative (
cumulative=False),rdefines the edges of the bins, and must be non-decreasing -
p(float | None, default:None) –Which Minkowski p-norm to use. Default 2.0. A finite large p may cause a
ValueErrorif overflow can occur -
weights(tuple[float | None, float | None] | NDArray | None, default:None) –If None, the pair-counting is unweighted. If given as a tuple,
weights[0]is the weights of points inself, andweights[1]is the weights of points inother; either can be None to indicate the points are unweighted. If given as an array_like,weightsis the weights of points inselfandother. For this to make sense,selfandothermust be the same tree. Ifselfandotherare two different trees, aValueErroris raised. Default: None -
cumulative(bool | None, default:None) –Whether the returned counts are cumulative. When
cumulativeis set toFalsethe algorithm is optimized to work with a large number of bins (>10) specified byr. Whencumulativeis set toTrue, the algorithm is optimized to work with a small number ofr. Default: True
Returns:
-
result(scalar or 1-D array) –The number of pairs. For unweighted counts, the result is integer. For weighted counts, the result is float. If
cumulativeisFalse,result[i]contains the counts with(-inf if i == 0 else r[i-1]) < R <= r[i]
Raises:
query
¶
query(x, k=1, eps=None, p=None, distance_upper_bound=None, workers=None, *, return_data_indices=None, return_sorted=None)
Query the OpTree for nearest neighbors
Parameters:
-
x(ArrayLike) –An array of points to query
-
k(int | Sequence[int], default:1) –Either the number of nearest neighbors to return or a list of the kth nearest neighbors to return, starting from 1. E.g.,
[2,3]will return the 2nd and 3rd nearest neighbors -
eps(float | None, default:None) –Return approximate nearest neighbors; Note that this parameter is unused
-
p(int | None, default:None) –The Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the Euclidean distance. infinity is the maximum-coordinate-difference distance. Currently only p=2 is supported
-
distance_upper_bound(float | None, default:None) –Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.
-
workers(int | None, default:None) –Number of workers to use for parallel processing. Only 1 is supported, for more, see Cubes
-
return_data_indices(bool | None, default:None) –Return indices into the sorted data if True instead of into the original. Specify None to have this set by the copy_data argument used during tree construction.
-
return_sorted(bool | None, default:None) –Flag to return the distances and indices in distance-sorted order. Set to
Falsefor a performance boost. DefaultTrue
Returns:
-
d(float or array of floats) –The distances to the nearest neighbors. If
xhas shapetuple+(self.m,), thendhas shapetuple+(k,). Whenk==1, the last dimension of the output is squeezed. Missing neighbors are indicated with infinite distances. Hits are sorted by distance (nearest first) -
i(integer or array of integers) –The index of each neighbor in
self.data.iis the same shape asd. Missing neighbors are indicated withself.n.
Raises:
-
NotImplementedError–if
p!=2
query_ball_point
¶
query_ball_point(x, r, p=2.0, eps=None, workers=-1, *, return_sorted=False, return_length=False, return_lists=False, return_data_indices=None, strict=None)
Find all points within distance r of point(s) x.
Parameters:
-
x(array_like, shape tuple + (self.m,)) –The point or points to search for neighbors of.
-
r((array_like, float)) –The radius of points to return, must broadcast to the length of
x. -
p(float, default:2.0) –Which Minkowski
p-norm to use. Should be in the range[1, inf]. A finite largepmay cause aValueErrorif overflow can occur. -
eps(nonnegative float, default:None) –Approximate search. Branches of the tree are not explored if their nearest points are further than
r / (1 + eps), and branches are added in bulk if their furthest points are nearer thanr * (1 + eps). -
workers(int, default:-1) –Number of jobs to schedule for parallel processing. If -1 is given all processors are used. Default: -1. Note: SciPy's kdtree parallelizes across the number of points queried. Thus, querying on a single point gets no speed-up from parallelization. We parallelize on single point queries, thus the different default
-
return_sorted(bool, default:False) –Sorts returned indices if
Trueand does not sort them ifFalse. IfNone, does not sort single point queries, but does sort multi-point queries which was the behavior before this option was added. DefaultFalse. -
return_length(bool, default:False) –Return the number of points inside the radius instead of a list of the indices. Note that this is much faster for large trees.
-
return_lists(bool, default:False) –Force returning lists instead of arrays.
OpTrees return arrays of indices by default, but this doesn't match the expected query_ball_point signature. To exactly match SciPy, set this toTrue. -
return_data_indices(bool | None, default:None) –Return indices into the sorted data if
Trueinstead of into the original. SpecifyNoneto have this set by the copy_data argument used during tree construction. -
strict(bool | None, default:None) –If
False, compare only the approximate node distance. Should be significantly faster, but may include some amount of false positives. DefaultTrue
Returns:
-
results(list or array of lists) –If
xis a single point, returns a list of the indices of the neighbors ofx. Ifxis an array of points, returns an object array of shape tuple containing lists of neighbors.
Notes
If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a OpTree and using query_ball_tree.
Examples:
>>> import numpy as np
>>> from packingcubes import OpTree
>>> x, y = np.mgrid[0:5, 0:5]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = OpTree(points)
>>> sorted(tree.query_ball_point([2, 0], 1))
[5, 10, 11, 15]
Query multiple points and plot the results:
>>> import matplotlib.pyplot as plt
>>> points = np.asarray(points)
>>> plt.plot(points[:,0], points[:,1], '.')
>>> for results in tree.query_ball_point(([2, 0], [3, 3]), 1):
... nearby_points = points[results]
... plt.plot(nearby_points[:,0], nearby_points[:,1], 'o')
>>> plt.margins(0.1, 0.1)
>>> plt.show()
query_ball_tree
¶
Find all pairs of points between self and other whose distance is at most r.
Parameters:
-
other(OpTree) –The tree containing points to search against
-
r–The maximum distance, has to be positive
-
p(float, default:2) –Which Minkowski norm to use.
phas to meet the condition1 <= p <= infinity -
eps(float | None, default:None) –Approximate search. Branches of the tree are not explored if their nearest points are further than
r/(1+eps), and branches are added in bulk if their furthest points are nearer thanr * (1+eps). eps has to be non-negative. -
strict(bool | None, default:None) –If
False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. DefaultTrue -
return_lists(bool | None, default:None) –Force returning lists instead of arrays.
OpTrees return arrays of indices by default, but this doesn't match the expectedquery_ball_treesignature. For a slight performance increase, set this toFalse -
return_sorted(bool | None, default:None) –Force returning sorted lists. If the
copy_dataflag was passed during tree construction, the data used to generate the results may be in a different order than originally imposed. For example,results[0] = [5, 1, 2]. If order of output is important, set this flag toTrue, at a performance penalty.
Returns:
-
results(list of lists) –For each element
self.data[i]of this tree,results[i]is a list of the indices of its neighbors in other.data
Raises:
-
NotImplementedError–if p!=2
query_pairs
¶
Find all pairs of points in self whose distance is at most r.
Parameters:
-
r(float) –The maximum distance, has to be positive
-
p(float, default:2.0) –Which Minkowski norm to use.
phas to meet the condition1 <= p <= infinity -
eps(float | None, default:None) –Approximate search. Branches of the tree are not explored if their nearest points are further than
r/(1+eps), and branches are added in bulk if their furthest points are nearer thanr * (1+eps). eps has to be non-negative. -
output_type(str | None, default:None) –Choose the output container, 'set' or 'ndarray'. Default: 'set'
-
strict(bool | None, default:None) –If
False, compare only the approximate node distance. Should be significantly faster, but may include substantial amounts of false positives. DefaultTrue
Returns:
-
results(set or NDArray) –Set of pairs
(i, j)withi<j, for which the corresponding positions are close. If output_type is 'ndarray', an ndarray is returned instead of a set.
Raises:
-
NotImplementedError–if p!=2
sparse_distance_matrix
¶
Compute a sparse distance matrix
Computes a distance matrix between two OpTrees, leaving as zero any distance greater than max_distance.
WARNING
Not currently implemented.
Parameters:
-
other(OpTree) – -
max_distance(float) – -
p(float | None, default:None) –Which Minkowski p-norm to use. A finite large p may cause a
ValueErrorif overflow can occur -
output_type(str | None, default:None) –Which container to use for output data. Default: "dok_matrix"
Returns:
-
result(dok_matrix, coo_matrix, dict, or ndarray) –Sparse matrix representing the results in a "dictionary of keys" format. If a dict is returned the keys are (i,j) tuples of indices. If output_type is "ndarray" a record array with fields "i", "j", and "v" is returned.
Raises:
PackedTree
¶
PackedTree(*, dataset=None, source=None, particle_threshold=None, bounding_box=None, copy_data=False)
Bases: Octree
Public packed octree interface
This interface defines the methods for creating, manipulating, and
traversing a packingcubes packed octree.
Attributes:
-
data(Dataset) –The backing dataset
-
particle_threshold(int) –The maximum leaf size before splitting, used in tree construction
Note
Must provide either dataset or source. If provided source does not include metadata, must additionally provide either dataset or bounding_box.
Parameters:
-
dataset(NDArray | Dataset | None, default:None) –An (N,3) array or Dataset containing particle data
-
source(Buffer | None, default:None) –Pre-computed packed buffer containing this tree. Leave out to compute the tree from scratch.
-
particle_threshold(int | None, default:None) –Number of particles allowed in a leaf before splitting. Defaults to
octree._DEFAULT_PARTICLE_THRESHOLD -
bounding_box(BoxLike | None, default:None) –Bounding box of the tree. Required if metadata needs to be created and dataset is not provided.
Will override the dataset bounding box.
-
copy_data(bool, default:False) –If dataset is just an array, flag to copy data prior to construction. Defaults to
False
particle_threshold
instance-attribute
¶
The maximum leaf size before splitting, used in tree construction
__iter__
¶
Iterate through all nodes of the octree.
Note that no guarantee is made of what order the nodes are traversed in
count_neighbors
¶
Count how many nearby pairs can be formed.
Parameters:
-
other(PackedTree) –The other tree to compare against
-
r(float) –The radius to produce a count for.
Returns:
-
result(scalar or 1-d array) –The number of pairs
get_closest_particle
¶
Get nearest particle index (and distance) to point.
Parameters:
-
xyz(ArrayLike) –Coordinates of point to check
-
check_neighbors(bool, default:True) –Flag to check whether we should look at neighbors of the smallest containing node. Default True
Returns:
-
closest_ind(int) –Absolute index of closest particle
-
closest_dist(float) –Distance to closest particle
Raises:
-
NotImplementedError–This function is not implemented, see get_closest_particles instead
get_closest_particles
¶
get_closest_particles(*, data, xyz, distance_upper_bound=None, p=2, k=1, use_data_indices=True, return_sorted=True)
Get kth nearest particle distances and indices to point.
Parameters:
-
data(DataContainer | Dataset) –Source of particle position data
-
xyz(NDArray) –Coordinates of point to check
-
distance_upper_bound(float | None, default:None) –Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.
-
p(float, default:2) –Which Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the usual Euclidean distance. Infinity is the maximum-coordinate-difference distance. Currently, only
p=2is supported. -
k(int, default:1) –Number of closest particles to return. Default
1 -
use_data_indices(bool, default:True) –Flag to return indices into the sorted dataset (
True, default) or into the shuffle list (False) -
return_sorted(bool, default:True) –Flag to return the distances and indices in distance-sorted order. Set to
Falsefor a performance boost. DefaultTrue
Returns:
-
distances(NDArray[float]) –Distances to the kth nearest neighbors. Has shape
(min(N,k),), whereNis the number of particles in the sphere bounded by distance_upper_bound -
indices(NDArray[int]) –Indices in data of the kth nearest neighbors. Has same shape as distances
Raises:
-
NotImplementedError–If a
pvalue of then 2 is provided
get_node
¶
Return the node corresponding to the provided tag or None if not found.
Parameters:
-
tag(str) –The tag to search for
Returns:
-
node–Node in octree with specified
tagorNoneif it does not exist
get_particle_index_list_in_box
¶
Return all particles contained within the box.
Parameters:
-
data(DataContainer | Dataset) –Dataset containing the particle positions. Pass a
DataContainerobject for a slight performance increase -
box(BoxLike) –Box to check
-
strict(bool, default:False) –Flag to specify whether only particles inside
boxwill be returned. IfFalse(default), additional nearby particles may be included for signficantly increased performance
Returns:
-
indices(NDArray[int]]) –List of original particle indices contained within sphere
get_particle_index_list_in_sphere
¶
Return all particles contained within sphere defined by center and radius.
Parameters:
-
data(DataContainer | Dataset) –Dataset containing the particle positions. Pass a
DataContainerobject for a slight performance increase -
center(NDArray) –Center point of the sphere
-
radius(float) –Radius of the sphere
-
strict(bool, default:False) –Flag to specify whether only particles inside the sphere will be returned. If
False(default), additional nearby particles may be included for signficantly increased performance
Returns:
-
indices(NDArray[int]) –List of original particle indices contained within sphere
get_particle_indices_in_box
¶
Return all particles contained within the box.
Parameters:
-
box(BoxLike) –Box to check
Returns:
-
indices(list[tuple[int, int]]) –List of particle start-stop indices contained within sphere Third element of each tuple is a flag for whether only some particles (1) among the start-stop indices are contained or all (0)
get_particle_indices_in_sphere
¶
Return all particles contained within sphere defined by center and radius.
Parameters:
-
center(NDArray) –Center point of the sphere
-
radius(float) –Radius of the sphere
Returns:
-
indices(list[tuple[int, int, int]]) –List of particle start-stop indices contained within sphere Third element of each tuple is a flag for whether only some particles (1) among the start-stop indices are contained or all (0)
ParticleCubes
¶
The cubes for a single particle type
cube_indices
instance-attribute
¶
Array of cube indices into the dataset
Box
¶
Box(box, *, dataset=None, strict=False, fields=None, extras=None, save_filepath=None, save_particle_type=None)
Construct a box-shaped subdataset
Parameters:
-
box(BoxLike) –The box to search in
-
dataset(Dataset | None, default:None) –Dataset containing the particle positions. Defaults to self.dataset.
-
strict(bool, default:False) –Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for signficantly increased performance
-
fields(Collection[str] | None, default:None) –Subset of fields in
dataset.extrasto include. Specify"all"to include everything indataset.extras. Defaults to the empty set. -
extras(Mapping[str, Any] | None, default:None) –Additional fields to sort, add to
dataset.extras, and include in the returned subdataset. See [process_extra_fields][] for more details. Defaults toNone -
save_filepath(str | None, default:None) –If provided, save this subdataset to the specified file with the specified particle type.
save_particle_typecan be omitted to use the default particle type. -
save_particle_type(str | None, default:None) –If provided, save this subdataset to the specified file with the specified particle type.
save_particle_typecan be omitted to use the default particle type.
Returns:
-
InMemory–Subdataset with the specified bounding volume and fields
Raises:
-
ValueError–If fields are specified that are in neither
extrasnordataset.extras.
Sphere
¶
Sphere(center, radius, *, dataset=None, strict=False, fields=None, extras=None, save_filepath=None, save_particle_type=None)
Construct a spherical subdataset
Parameters:
-
center(ArrayLike) –Center point of the sphere
-
radius(float) –Radius of the sphere
-
dataset(Dataset | None, default:None) –Dataset containing the particle positions. Defaults to self.dataset.
-
strict(bool, default:False) –Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for signficantly increased performance
-
fields(Collection[str] | None, default:None) –Subset of fields in
dataset.extrasto include. Specify"all"to include everything indataset.extras. Defaults to the empty set. -
extras(Mapping[str, Any] | None, default:None) –Additional fields to sort, add to
dataset.extras, and include in the returned subdataset. See [process_extra_fields][] for more details. Defaults toNone -
save_filepath(str | None, default:None) –If provided, save this subdataset to the specified file with the specified particle type.
save_particle_typecan be omitted to use the default particle type. -
save_particle_type(str | None, default:None) –If provided, save this subdataset to the specified file with the specified particle type.
save_particle_typecan be omitted to use the default particle type.
Returns:
-
InMemory–Subdataset with the specified bounding volume and fields
Raises:
-
ValueError–If fields are specified that are in neither
extrasnordataset.extras.
get_closest_particles
¶
get_closest_particles(*, xyz, data=None, distance_upper_bound=None, p=None, k=None, return_shuffle_indices=None, return_sorted=None)
Get kth nearest particle distances and indices to point.
Parameters:
-
xyz(NDArray) –Coordinates of point to check
-
data(DataContainer | Dataset | None, default:None) –Source of particle position data. Defaults to self.dataset.
-
distance_upper_bound(float | None, default:None) –Return only neighbors from other nodes within this distance. This is used for tree pruning, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.
-
p(float | None, default:None) –Which Minkowski p-norm to use. 1 is the sum of absolute-values distance ("Manhattan" distance). 2 is the usual Euclidean distance. Infinity is the maximum-coordinate-difference distance. Currently, only p=2 is supported.
-
k(int | None, default:None) –Number of closest particles to return. Default 1
-
return_shuffle_indices(bool | None, default:None) –Flag to return the shuffle indices instead of the data indices. Default False.
-
return_sorted(bool | None, default:None) –Flag to return the distances and indices in distance-sorted order. Set to False for a performance boost. Default True
Returns:
-
distances(NDArray[float]) –Distances to the kth nearest neighbors. Has shape (min(N,k),), where N is the number of particles in the sphere bounded by distance_upper_bound
-
indices(NDArray[int]) –Indices in data of the kth nearest neighbors. Has same shape as distances
Raises:
-
NotImplementedError–If a p value of greater than 2 is provided
-
ValueError–If
dataisNoneandself.datasetisNone
get_particle_index_list_in_box
¶
Return all particle indices contained within the box
Parameters:
-
box(BoxLike) –The box to search in
-
data(DataContainer | Dataset | None, default:None) –Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase. Defaults to self.dataset.
-
use_data_indices(bool, default:True) –Flag to return indices into the sorted dataset (True, default) or into the shuffle list (False)
-
strict(bool, default:False) –Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for signficantly increased performance
Returns:
-
indices(Array[int]) –Array of particle indices contained within shape
Raises:
-
ValueError–If
dataisNoneandself.datasetisNone
get_particle_index_list_in_sphere
¶
get_particle_index_list_in_sphere(center, radius, *, data=None, use_data_indices=True, strict=False)
Return all particle indices contained within the sphere
Parameters:
-
center(NDArray) –Center point of the sphere
-
radius(float) –Radius of the sphere
-
data(DataContainer | Dataset | None, default:None) –Dataset containing the particle positions. Pass a DataContainer object for a slight performance increase. Defaults to self.dataset.
-
use_data_indices(bool, default:True) –Flag to return indices into the sorted dataset (True, default) or into the shuffle list (False)
-
strict(bool, default:False) –Flag to specify whether only particles inside the shape will be returned. If False (default), additional nearby particles may be included for signficantly increased performance
Returns:
-
indices(NDArray[int]) –Array of particle indices contained within the sphere
Raises:
-
ValueError–If
dataisNoneandself.datasetisNone
get_particle_indices_in_box
¶
Return all particles contained within the box
Parameters:
-
box(BoxLike) –Box to check
Returns:
-
indices(Xx3 NDArray[np.int_]) –Array of index information. Each row describes a chunk/slice of data in the form
[start, stop, partial], where partial is a flag - (1) if the data chunk is entirely contained withinbox, (0) otherwise.
get_particle_indices_in_sphere
¶
Return all particles contained within the sphere defined by center and radius
Parameters:
-
center(NDArray) –Center point of the sphere
-
radius(float) –Radius of the sphere
Returns:
-
indices(Xx3 NDArray[np.int_]) –Array of index information. Each row describes a chunk/slice of data in the form
[start, stop, partial], where partial is a flag - (1) if the data chunk is entirely contained within the sphere, (0) otherwise.
Cubes
¶
Create or load ParticleCubes objects from the provided data
As an alternative to a dataset, you can provide a dictionary containing
cube data offsets, bounding boxes, and optionally PackedTrees as
cube_indices, cube_boxes, and cube_trees. This could be useful in the
case where a dataset has a natural top-level structure already, but may not
yet have PackedTree subcomponents. As an example, a collection of disjoint
blobs in a 3D parameter space, or if the dataset already contains an
octree-like structure.
Parameters:
-
dataset(str | NDArray | MultiParticleDataset | None, default:None) –Dataset containing positional data. Will be used to create a new ParticleCubes, including sorting. Must provide either this or
cubes_dict, below. Assumes strings are filepaths to GadgetishHDF5Datasets. -
cubes_dict(dict[str, NDArray | list[BoundingBox] | list[NDArray | PackedTree]] | None, default:None) –Dictionary with 2-3 components:
- cube_indices - contains the data offsets for each cube's particles
(i.e. cube 0 is from
cubes_indices[0]:cubes_indices[1]) - cube_boxes - containes the
BoundingBoxfor each cube - cube_trees (optional) - contains the
PackedTreefor each cube
- cube_indices - contains the data offsets for each cube's particles
(i.e. cube 0 is from
-
particle_type(str | None, default:None) –The particle type to use. Unused if
cubes_dictis provided. Defaults todataset.particle_type -
extras(Mapping[str, Any] | None, default:None) –Attach additional fields to the dataset to be sorted. Unused if
cubes_dictis provided. Seeprocess_extra_fieldsforMultiParticleDatasetorGadgetishHDF5Dataset -
**kwargs–Extra arguments to
InMemory/GadgetishHDF5Dataset,make_cubes, andParticleCubesfor a description.
Returns:
-
ParticleCubes–ParticleCubes object constructed from the dataset/dictionary
Raises:
-
CubesError–If neither
datasetnorcubes_dictis provided
See Also
make_cubes
¶
make_cubes(*, dataset, cubes_per_side=-1, cube_box=None, particle_threshold=None, particle_type=None, save_dataset=False, **kwargs)
Create a ParticleCubes from the provided dataset
Parameters:
-
dataset(MultiParticleDataset) –The dataset containing particle data. Will be sorted in-place, but will not save updated positional information unless
save_datasetisTrue -
cubes_per_side(int, default:-1) –Number of cubes on a side. Dataset will be divided into
cubes_per_side**3cubes, plus an additional cube to catch any remaining particles (if thecube_boxis smaller than the actual data extants). Note: due to thePackedTree's packed format, cubes must contain fewer than ~4 billion particles. Ifcubes_per_sideis too small to support this, aValueErrorwill be raised. The limit is per-particle-type. -
cube_box(BoxLike | None, default:None) –A box-like object (i.e. something that can convert to a
(6,)ndarray) that delineates the region of data to be cubed. Any particles outside this region will fall into an overflow cube. Useful for zoom-in simulations or other datasets with sparse outer regions. Default is the data bounding box. -
particle_threshold(int | None, default:None) –Maximum number of particles in a tree leaf node. Default is
400 -
particle_type(str | None, default:None) –Particle type to process. Default is
dataset.particle_type -
save_dataset(bool, default:False) –Whether to save the sorted dataset positions out to a file using default values for the parameters. The data will be sorted in memory either way. Default
False.
Returns:
-
ParticleCubes–The created ParticleCubes object
Raises:
-
ValueError–If requested particle type isn't in the dataset or if too few cubes were requested for the number of particles
Additional API¶
Additional module level information can be found at the following links
Cubes PackedTrees & OpTrees Data Objects Tree Visualization
Performance¶
For the Numba-based modules, see
DataContainers Numba Cubes Bounding Volumes Numba Packed Trees
Binary Layout¶
The PackedTree format can be found here.
Command Line Interface¶
Instructions on the CLI can be found here.