Getting Started¶
We're on PyPI:
Basic Usage¶
Searching Data¶
If you just want to find the values of a particular field in a spherical region of your dataset, the simplest way to do it is the following "one-liner":
import packingcubes
field_in_sph = packingcubes.Cubes(
positions_array, # (1)!
extras={"field_name":field_array} # (2)!
).Sphere(center, radius, fields="all").field_name
positions_arrayis a 3D array of particle position datafield_arrayis an array withNorNxMelements, whereNis the number of particles."field_name"can be whatever you want to later call it.
That's it.
Positions Only?
Position data is always included, so if that's all you need, the "one-liner" is even shorter:
1D/2D and ND matrices
For technical reasons, direct ParticleCubes creation from 1D and 2D
matrices are not supported. See
Finding Particles Within a Shape
for an example of converting 2D data to 3D. 4D and higher dimensional
matrices are not supported.
In the next couple sections, we'll briefly explain what each step does, but for more information, see the Tutorials, Recipes, and Reference sections.
Creating a Dataset and ParticleCubes object¶
The packingcubes.Cubes(...) function as used here combines three steps into one:
- Create Dataset object (an InMemory object in this case)
- Sort the positions and create a ParticleCubes object
- Sort the extra field
field_arrayand attach it to the Dataset.
Datasets are multifunctional objects whose main
purpose is to unify various forms of positions data in a common format (as
dataset.positions). It's also used to sort any extra fields via the "shuffle
list" (dataset.index), an array containing the orginal index for each sorted
position. In this example, it acts as thin wrapper on the array, but it can
also be used for loading entire snapshots.
ParticleCubes are the
main workhorses of packingcubes; they contain the cubed-octree
structure/metadata that produces all the cool features. Constructing a
ParticleCubes object performs the initial sorting of the positions data
(in parallel!) and creates the "cubes" and
"octrees".
Searching the ParticleCubes¶
The next part, .Sphere(...), searches the ParticleCubes object and finds
all particles contained within the specified spherical region. It then creates
a subdataset containing the particle positions and any additional fields
(specified with the fields parameter, "all" includes everything available
in dataset.extras) corresponding to solely those particles that are contained
(but see note about Extra Particles).
Extra particles
By default, searches on ParticleCubes are assumed to be not strict for
maximal performance. So particle containment is tested on an octree node
level, i.e. if the node overlaps the containment region, all particles in
the node are considered to be contained. This may increase the number
of returned particles by a small fraction. If this is unacceptable, several
of the search methods, including Sphere and Box, allow passing a
strict parameter, which forces packingcubes to test each position in
nodes that only partially overlap for containment. This imposes an
additional, usually minor, performance penalty.
Data wrapping
packingcubes does not (yet!) support toroidal topology for snapshots. Meaning
if the snapshot has x bounds [0, 100], a sphere with radius 10 centered
at [0, 0, 0] would only search the region from 0 to 10 and will not include
the region from 90 to 100. See
Issue #28.
Units
packingcubes does not use units internally. That means any quantities defining
a search volume must be in the same units as the positions data (including
hubble parameters, if present). You can use quantities with units, e.g.
center = [10, 10, 10] * kpc, but no conversion occurs if the positions data
doesn't match (e.g. if it is in Mpc/h). See
Issue #21.
Lastly, we access the specified field attribute of our new subdataset.
Multiline approach¶
The "one-liner" above can be broken up into multiple lines to better highlight the object pipeline:
dataset = packingcubes.InMemory(positions=positions_array)
cubes = packingcubes.Cubes(dataset)
dataset.process_extra_fields({"field_name":field_array})
sphere = cubes.Sphere(center, radius, fields="field_name")
field_in_sph = sphere.field_name
Saving a Dataset/ParticleCubes¶
Saving is simple, though it does break up the "one-liner":
cubes = packingcubes.Cubes(
positions_array,
save_dataset=True, # (1)!
sorted_filepath="path/to/output.hdf5",
extras={"field_name":field_array}
)
cubes.save("cubes_output.hdf5") # (2)!
sphere = cubes.Sphere(
center, radius,
save_filepath="sphere_output.hdf5", # (3)!
fields="all",
)
- Saves the sorted dataset (including anything in
extras) to thesorted_filepath. - Recommend using the same file as
sorted_filepath. - Should not be the same as
sorted_filepath, unless you want to overwrite everything...
Loading it is identical to loading a snapshot:
Specifying Number of Threads¶
Both the ParticleCubes creation and search operate in parallel using numba's
threading layers.
By default we automatically use all threads available, but you can set a lower
number (if e.g. your code also does work in parallel) by either setting the
NUMBA_NUM_THREADS environment variable or numba.set_num_threads(num_threads),
where num_threads <= numba.config.NUMBA_NUM_THREADS.
Substituting SciPy's KDTree¶
We have also reimplemented some of the API from the KDTree in scipy.spatial,
notably the query_ball_point method (This is also what the benchmarks compare
against). Example usage modified from the scipy.spatial.KDTree documentation:
>>> import numpy as np
>>> from packingcubes import OpTree as KDTree # this is the only change you need to make
>>> x, y = np.mgrid[0:5, 0:5]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = KDTree(points)
>>> sorted(tree.query_ball_point([2, 0], 1))
[5, 10, 11, 15]
More information can be found in the Optrees Documentation.
Command Line Interface¶
The command
will generate the ParticleCubes data structure for each particle type found in the
snapshot file and store it in the OUTPUT file, creating if necessary. In this
context, a particle type is any top-level group in the snapshot whose name starts
with the string Part. See The Command Line Interface for more details.