Data Objects
packingcubes.data_objects
Particle Datasets
Classes:
-
GadgetishHDF5Dataset–Use to load HDF5 Datasets that look like Gadget-2 snapshots
-
InMemory–Use to convert an in-memory array into a dataset
-
HDF5Dataset–Abstract dataset, use as a base for custom subclassing for loading HDF5 data (see e.g. GadgetishHDF5Dataset)
-
Dataset–Generic dataset class, use for typing
-
MultiParticleDataset–Generic dataset class with multiple particle types, use for typing
-
DataContainer–Effectively a view of a Dataset; use only within jitted code
GadgetishHDF5Dataset
GadgetishHDF5Dataset(*, name=None, filepath, sorted_filepath=None, particle_type=None, data_slices=None, **kwargs)
Bases: HDF5Dataset
HDF5 dataset with Gadget-2 like header
Represents an HDF5 dataset that has at least the fields from the Gadget-2 header specification
Parameters:
-
filepath(str | Path) –The path to the file
-
name(str | None, default:None) –A name for this dataset. Defaults to filepath
-
sorted_filepath(str | Path | None, default:None) –Optional file to store sorted position and shuffle-list data. Will also search for positions data from this file before searching filepath. Defaults to filepath.parent/filepath.stem + "_sorted.hdf5"
-
particle_type(str | None, default:None) –Initial particle type to (eagerly) load. Defaults to the first HDF5 group that starts with "Part".
-
data_slices–A numpy slice object or dictionary of slice objects per particle type. This can be used to load only a portion of the dataset. Effectively, the dataset will be loaded as data = data[data_slice[0]:data_slice[1]:data_slice[2]]
-
**kwargs–Additional arguments are discarded
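The two defaults described above can be sketched in plain Python. This only illustrates the documented rules and is not the library's code: the helper names default_sorted_filepath and default_particle_type are hypothetical, the file name is made up, and the list of group names stands in for the top-level groups of an HDF5 file.

```python
from pathlib import Path


def default_sorted_filepath(filepath):
    """Hypothetical helper: filepath.parent / (filepath.stem + "_sorted.hdf5")."""
    filepath = Path(filepath)
    return filepath.parent / (filepath.stem + "_sorted.hdf5")


def default_particle_type(group_names):
    """Hypothetical helper: first group name that starts with "Part"."""
    for group_name in group_names:
        if group_name.startswith("Part"):
            return group_name
    # Assumption: the behavior when no such group exists is not documented here.
    raise ValueError('no group name starts with "Part"')


sorted_path = default_sorted_filepath("snapshots/snap_010.hdf5")
first_type = default_particle_type(["Header", "PartType0", "PartType1"])
print(sorted_path)
print(first_type)
```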
Methods:
-
__len__–Return the number of particles in the dataset
-
__repr__–Return a string representation of this dataset
-
make_into_array–Try to convert field into an array
-
process_extra_fields–Process extra fields
-
reorder–Impose a new order on the position data and shuffle list
-
save–Save sorted particle positions and shuffle list to provided file
Attributes:
-
bounding_box(BoundingBox) –Return a copy of the bounding box for this dataset
-
data_container(DataContainer) –Return the DataContainer wrapping this dataset
-
data_slices–Slices of data to load. A value of None means load all data
-
extras(Set) –Additional sorted fields
-
filepath(Path) –The path to this dataset (can be empty)
-
index(NDArray) –Return the shuffle list, creating if necessary
-
name(str) –A name for this dataset (can be empty)
-
particle_numbers–Map of particle types to numbers in this dataset
-
particle_type–Current particle type
-
particle_types–List of particle types in this dataset
-
positions(NDArray) –Return the particle position data
-
sorted_filepath–Path to the sorted data
data_slices (property, writable)
Slices of data to load. A value of None means load all data
make_into_array
Try to convert field into an array
Parameters:
-
field(str | NDArray | Any) –Object to be converted into an array
Adds supported types:
- strings representing fields in either the HDF5 file at self.filepath (unsorted) or self.sorted_filepath (sorted)
See MultiParticleDataset for additional supported types.
Returns:
-
field_arr(NDArray) –An array of the values with the 1st dimension having the same length as positions
-
is_sorted(bool) –True if field_arr is already sorted
Raises:
-
NotImplementedError–If we do not know how to transform type(field) into an array
process_extra_fields
Process extra fields
How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.
Parameters:
Examples:
Note
Attributes added via this method are sorted only once, at the time of the call. Any subsequent sorting will not affect their ordering.
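The note can be illustrated with plain NumPy arrays. This sketch does not call packingcubes: the arrays positions, mass, and index are made-up stand-ins, and it assumes the shuffle list stores, for each sorted slot, the original row of the particle placed there.

```python
import numpy as np

# Made-up stand-ins for a dataset's unsorted positions and one extra field.
positions = np.array([[0.9, 0.1, 0.5], [0.1, 0.2, 0.3], [0.5, 0.5, 0.5]])
mass = np.array([9.0, 1.0, 5.0])

# Assumed shuffle list: sorted slot i holds original particle index[i].
# The "sort" here is simply ascending x coordinate.
index = np.argsort(positions[:, 0])
sorted_positions = positions[index]

# The extra field is sorted once, with the shuffle list as it is right now.
sorted_mass = mass[index]

# A later reorder changes the positions and the shuffle list...
new_order = np.array([2, 1, 0])
sorted_positions = sorted_positions[new_order]
index = index[new_order]

# ...but the extra field keeps its old order, so it no longer lines up.
stale = not np.array_equal(sorted_mass, mass[index])
print(stale)
```

Under these assumptions, an extra field would have to be processed again after any further sorting to stay aligned with the positions.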
save
save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)
Save sorted particle positions and shuffle list to provided file
Parameters:
-
output_file(str | Path | None, default:None) –File to save information to. Default is self.sorted_filepath
-
force_overwrite(bool | None, default:None) –Force overwriting position and index data if the output file already contains it under the specified particle type
-
particle_type(str | None, default:None) –Save positions under a different particle type than self.particle_type
-
fields(Collection[str] | None, default:None) –Collection of fields in self.extras to save in addition to self.positions and self.index
-
skip_positions(bool, default:False) –Do not save self.positions if True. Default False.
-
skip_index(bool, default:False) –Do not save self.index if True. Default False.
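As a rough sketch of how these keyword arguments interact, the hypothetical function below lists which arrays a call to save would write. It does not touch HDF5 and is not the library's implementation; the function name and the error raised for an unknown field are assumptions.

```python
def arrays_to_save(extras, fields=None, skip_positions=False, skip_index=False):
    """Hypothetical helper: names of the arrays that a save call would write."""
    names = []
    if not skip_positions:
        names.append("positions")
    if not skip_index:
        names.append("index")
    for field in fields or ():
        # Assumption: only fields already present in extras can be saved.
        if field not in extras:
            raise KeyError(field)
        names.append(field)
    return names


saved = arrays_to_save({"mass", "density"}, fields=["mass"], skip_index=True)
print(saved)
```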
InMemory
Bases: MultiParticleDataset
In-memory Dataset
Class for datasets where the positions data is entirely in-memory. These datasets generally are not expected to have a name or filepath and may consist solely of positions data.
Parameters:
-
positions(NDArray) –Array containing particle position data.
-
particle_type(str | None, default:None) –Particle type these positions belong to. Default is "PartTypeIM"
-
filepath(str, default:'') –Specify a default save location if non-empty. Default is "".
-
**kwargs–Additional arguments are discarded
Methods:
-
__len__–Return the number of particles in the dataset
-
__repr__–Return a string representation of this dataset
-
make_into_array–Try to convert field into an array
-
process_extra_fields–Process extra fields
-
reorder–Impose a new order on the position data and shuffle list
-
save–Save sorted particle data and shuffle-list to disk in an HDF5 file
Attributes:
-
bounding_box(BoundingBox) –Return a copy of the bounding box for this dataset
-
data_container(DataContainer) –Return the DataContainer wrapping this dataset
-
extras(Set) –Additional sorted fields
-
filepath(Path) –The path to this dataset (can be empty)
-
index(NDArray) –Return the shuffle list, creating if necessary
-
name(str) –A name for this dataset (can be empty)
-
particle_numbers(dict[str, int]) –Number of particles of each type
-
particle_type(str) –Currently selected particle type
-
particle_types(list[str]) –List of particle types in this dataset
-
positions(NDArray) –Return the particle position data
make_into_array
Try to convert field into an array
Parameters:
-
field(NDArray | Any) –Object to be converted into an array.
Supported types:
- NDArrays with the same length (1st dimension) as positions. Always assumed unsorted.
- (NDArray, is_sorted) tuples, where the NDArray must be like the above.
Returns:
-
field_arr(NDArray) –An array of the values with the 1st dimension having the same length as positions
-
is_sorted(bool) –True if field_arr is already sorted
Raises:
-
NotImplementedError–If we do not know how to transform type(field) into an array
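The documented contract (two supported input types, a (field_arr, is_sorted) return value, and NotImplementedError otherwise) can be sketched as a standalone function. The name make_into_array_sketch and the n_particles argument are hypothetical, and reporting a length mismatch as a ValueError is an assumption rather than documented behavior.

```python
import numpy as np


def make_into_array_sketch(field, n_particles):
    """Hypothetical stand-in: return (field_arr, is_sorted) for supported inputs."""
    if isinstance(field, tuple) and len(field) == 2:
        # (NDArray, is_sorted) tuple: the caller states whether it is sorted.
        field_arr, is_sorted = field
    elif isinstance(field, np.ndarray):
        # A bare array is always assumed unsorted.
        field_arr, is_sorted = field, False
    else:
        raise NotImplementedError(f"cannot convert {type(field)} into an array")
    if not isinstance(field_arr, np.ndarray):
        raise NotImplementedError(f"cannot convert {type(field_arr)} into an array")
    if field_arr.shape[:1] != (n_particles,):
        # Assumption: a 1st-dimension mismatch is reported as a ValueError.
        raise ValueError("field must have the same length as positions")
    return field_arr, bool(is_sorted)


velocities = np.zeros((4, 3))
arr, is_sorted = make_into_array_sketch(velocities, 4)
arr2, is_sorted2 = make_into_array_sketch((velocities, True), 4)
print(is_sorted, is_sorted2)
```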
process_extra_fields
Process extra fields
How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.
Parameters:
Examples:
Note
Attributes added via this method are sorted only once, at the time of the call. Any subsequent sorting will not affect their ordering.
save
save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)
Save sorted particle data and shuffle-list to disk in an HDF5 file
Parameters:
-
output_file(str | Path | None, default:None) –The name of the output file. Defaults to self.filepath. Since self.filepath is "" unless specified, saving without an output file then raises a ValueError.
-
force_overwrite(bool | None, default:None) –Force overwriting position and index data if the output file already contains it under the specified particle type
-
particle_type(str | None, default:None) –Save positions under a different particle type than self.particle_type
-
fields(Collection[str] | None, default:None) –Collection of fields in self.extras to save in addition to self.positions and self.index
-
skip_positions(bool, default:False) –Do not save self.positions if True. Default False.
-
skip_index(bool, default:False) –Do not save self.index if True. Default False.
Raises:
-
ValueError–If no output_file, or the empty string (""), is specified
HDF5Dataset
HDF5Dataset(*, name=None, filepath, sorted_filepath=None, particle_type=None, data_slices=None, **kwargs)
Bases: MultiParticleDataset
HDF5 Dataset
Base class for using HDF5 datasets. We will assume the entire positions array can be loaded into memory. We do not need to be able to load the entire dataset since this is for purely spatial sorting.
Note that, for simplicity, only one particle type is available at a time. Use the particle_type attribute to change the current particle type and the particle_types attribute to get a list of valid particle types.
Parameters:
-
filepath(str | Path) –The path to the file
-
name(str | None, default:None) –A name for this dataset. Defaults to filepath
-
sorted_filepath(str | Path | None, default:None) –Optional file to store sorted position and shuffle-list data. Will also search for positions data from this file before searching filepath.
-
particle_type(str | None, default:None) –Initial particle type to (eagerly) load.
-
data_slices–A numpy slice object or dictionary of slice objects per particle type. This can be used to load only a portion of the dataset. Effectively, the dataset will be loaded as data = data[data_slice[0]:data_slice[1]:data_slice[2]]. Note: this is true even if loading from the sorted data!
-
**kwargs–Additional arguments are discarded
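The effect of data_slices can be sketched with NumPy arrays and Python slice objects, where slice(start, stop, step) selects the same rows as data[start:stop:step]. The helper apply_slices and the sample arrays are hypothetical, and loading a particle type in full when it is missing from the dictionary is an assumption.

```python
import numpy as np

# Made-up position data: 10 particles per type, 3 coordinates each.
data = {
    "PartType0": np.arange(30.0).reshape(10, 3),
    "PartType1": np.arange(30.0, 60.0).reshape(10, 3),
}

# One slice for every particle type: every second particle of the first eight.
single = slice(0, 8, 2)

# Or one slice per particle type, keyed by particle type name.
per_type = {"PartType0": slice(None, 5), "PartType1": slice(2, None, 4)}


def apply_slices(data, data_slices):
    """Hypothetical helper: slice each array; None means keep all data."""
    loaded = {}
    for particle_type, array in data.items():
        if isinstance(data_slices, dict):
            # Assumption: types missing from the dictionary are loaded whole.
            data_slice = data_slices.get(particle_type)
        else:
            data_slice = data_slices
        loaded[particle_type] = array if data_slice is None else array[data_slice]
    return loaded


shapes_single = {k: v.shape for k, v in apply_slices(data, single).items()}
shapes_per_type = {k: v.shape for k, v in apply_slices(data, per_type).items()}
print(shapes_single)
print(shapes_per_type)
```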
Methods:
-
__len__–Return the number of particles in the dataset
-
__repr__–Return a string representation of this dataset
-
make_into_array–Try to convert field into an array
-
process_extra_fields–Process extra fields
-
reorder–Impose a new order on the position data and shuffle list
-
save–Save sorted particle positions and shuffle list to provided file
Attributes:
-
bounding_box(BoundingBox) –Return a copy of the bounding box for this dataset
-
data_container(DataContainer) –Return the DataContainer wrapping this dataset
-
data_slices–Slices of data to load. A value of None means load all data
-
extras(Set) –Additional sorted fields
-
filepath(Path) –The path to this dataset (can be empty)
-
index(NDArray) –Return the shuffle list, creating if necessary
-
name(str) –A name for this dataset (can be empty)
-
particle_numbers–Map of particle types to numbers in this dataset
-
particle_type–Current particle type
-
particle_types–List of particle types in this dataset
-
positions(NDArray) –Return the particle position data
-
sorted_filepath–Path to the sorted data
data_slices (property, writable)
Slices of data to load. A value of None means load all data
make_into_array
Try to convert field into an array
Parameters:
-
field(str | NDArray | Any) –Object to be converted into an array
Adds supported types:
- strings representing fields in either the HDF5 file at self.filepath (unsorted) or self.sorted_filepath (sorted)
See MultiParticleDataset for additional supported types.
Returns:
-
field_arr(NDArray) –An array of the values with the 1st dimension having the same length as positions
-
is_sorted(bool) –True if field_arr is already sorted
Raises:
-
NotImplementedError–If we do not know how to transform type(field) into an array
process_extra_fields
Process extra fields
How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.
Parameters:
Examples:
Note
Attributes added via this method are sorted only once, at the time of the call. Any subsequent sorting will not affect their ordering.
save
save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)
Save sorted particle positions and shuffle list to provided file
Parameters:
-
output_file(str | Path | None, default:None) –File to save information to. Default is self.sorted_filepath
-
force_overwrite(bool | None, default:None) –Force overwriting position and index data if the output file already contains it under the specified particle type
-
particle_type(str | None, default:None) –Save positions under a different particle type than self.particle_type
-
fields(Collection[str] | None, default:None) –Collection of fields in self.extras to save in addition to self.positions and self.index
-
skip_positions(bool, default:False) –Do not save self.positions if True. Default False.
-
skip_index(bool, default:False) –Do not save self.index if True. Default False.
MultiParticleDataset
Dataset containing multiple particle types
Multiple particle types are handled by exposing only one particle type at a time.
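A toy container shows what exposing one particle type at a time can look like. The class OneTypeAtATime is not the library class: it only borrows the attribute names listed below (particle_type, particle_types, particle_numbers, positions), and starting with the first type provided is an assumption.

```python
import numpy as np


class OneTypeAtATime:
    """Toy container (not the library class) exposing one particle type at a time."""

    def __init__(self, positions_by_type):
        self._positions_by_type = dict(positions_by_type)
        # Assumption: start with the first particle type provided.
        self._particle_type = next(iter(self._positions_by_type))

    @property
    def particle_types(self):
        return list(self._positions_by_type)

    @property
    def particle_numbers(self):
        return {name: len(pos) for name, pos in self._positions_by_type.items()}

    @property
    def particle_type(self):
        return self._particle_type

    @particle_type.setter
    def particle_type(self, name):
        if name not in self._positions_by_type:
            raise ValueError(f"unknown particle type: {name}")
        self._particle_type = name

    @property
    def positions(self):
        # Only the currently selected particle type is visible.
        return self._positions_by_type[self._particle_type]

    def __len__(self):
        return len(self.positions)


toy = OneTypeAtATime({"PartType0": np.zeros((5, 3)), "PartType1": np.ones((2, 3))})
n_first = len(toy)
toy.particle_type = "PartType1"
n_second = len(toy)
print(toy.particle_types, toy.particle_numbers, n_first, n_second)
```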
Methods:
-
__len__–Return the number of particles in the dataset
-
__repr__–Return a string representation of this dataset
-
make_into_array–Try to convert field into an array
-
process_extra_fields–Process extra fields
-
reorder–Impose a new order on the position data and shuffle list
-
save–Save this dataset to disk
Attributes:
-
bounding_box(BoundingBox) –Return a copy of the bounding box for this dataset
-
data_container(DataContainer) –Return the DataContainer wrapping this dataset
-
extras(Set) –Additional sorted fields
-
filepath(Path) –The path to this dataset (can be empty)
-
index(NDArray) –Return the shuffle list, creating if necessary
-
name(str) –A name for this dataset (can be empty)
-
particle_numbers(dict[str, int]) –Number of particles of each type
-
particle_type(str) –Currently selected particle type
-
particle_types(list[str]) –List of particle types in this dataset
-
positions(NDArray) –Return the particle position data
make_into_array
Try to convert field into an array
Parameters:
-
field(NDArray | Any) –Object to be converted into an array.
Supported types:
- NDArrays with the same length (1st dimension) as positions. Always assumed unsorted.
- (NDArray, is_sorted) tuples, where the NDArray must be like the above.
Returns:
-
field_arr(NDArray) –An array of the values with the 1st dimension having the same length as positions
-
is_sorted(bool) –True if field_arr is already sorted
Raises:
-
NotImplementedError–If we do not know how to transform type(field) into an array
process_extra_fields
Process extra fields
How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.
Parameters:
Examples:
Note
Attributes added via this method are sorted only once, at the time of the call. Any subsequent sorting will not affect their ordering.
save (abstractmethod)
save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)
Save this dataset to disk
It will be up to the subclass to decide what that means
Parameters:
-
output_file(str | Path | None, default:None) –The name of the output file. Note this field is optional because there might be an obvious default.
-
force_overwrite(bool | None, default:None) –Force overwriting position and index data if the output file already contains it under the specified particle type
-
particle_type(str | None, default:None) –Save positions under a different particle type than self.particle_type
-
fields(Collection[str] | None, default:None) –Collection of fields in self.extras to save in addition to self.positions and self.index
-
skip_positions(bool, default:False) –Do not save self.positions if True. Default False.
-
skip_index(bool, default:False) –Do not save self.index if True. Default False.
Dataset
Base class for holding particle position data and associated shuffle list
This class is intended to be the primary interface for octree access to the position data. Essentially, it abstracts where the data is stored and what it looks like, so the octrees only need to care about the position data (and, more specifically, its order).
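The relationship between the position data and the shuffle list can be sketched with NumPy. This is an illustration only: the reorder function below is a stand-in rather than the library method, and it assumes the shuffle list records, for each current row, the row it occupied in the original data.

```python
import numpy as np

original = np.array([[0.7, 0.2], [0.1, 0.9], [0.4, 0.4], [0.9, 0.8]])

# Start with the identity shuffle list: nothing has been moved yet.
positions = original.copy()
index = np.arange(len(positions))


def reorder(positions, index, new_order):
    """Apply the same permutation to the positions and the shuffle list."""
    return positions[new_order], index[new_order]


# Two successive reorders, for example from two passes of a spatial sort.
positions, index = reorder(positions, index, np.array([1, 2, 0, 3]))
positions, index = reorder(positions, index, np.array([0, 1, 3, 2]))

# The shuffle list still maps every current row back to its original row.
consistent = np.array_equal(positions, original[index])
print(index, consistent)
```

Because both arrays always receive the same permutation, any per-particle field from the original data can be brought into the current order by indexing it with the shuffle list.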
Methods:
-
__len__–Return the number of particles in the dataset
-
__repr__–Return a string representation of this dataset
-
make_into_array–Try to convert field into an array
-
process_extra_fields–Process extra fields
-
reorder–Impose a new order on the position data and shuffle list
Attributes:
-
bounding_box(BoundingBox) –Return a copy of the bounding box for this dataset
-
data_container(DataContainer) –Return the DataContainer wrapping this dataset
-
extras(Set) –Additional sorted fields
-
filepath(Path) –The path to this dataset (can be empty)
-
index(NDArray) –Return the shuffle list, creating if necessary
-
name(str) –A name for this dataset (can be empty)
-
positions(NDArray) –Return the particle position data
make_into_array
Try to convert field into an array
Parameters:
-
field(NDArray | Any) –Object to be converted into an array.
Supported types:
- NDArrays with the same length (1st dimension) as positions. Always assumed unsorted.
- (NDArray, is_sorted) tuples, where the NDArray must be like the above.
Returns:
-
field_arr(NDArray) –An array of the values with the 1st dimension having the same length as positions
-
is_sorted(bool) –True if field_arr is already sorted
Raises:
-
NotImplementedError–If we do not know how to transform type(field) into an array
process_extra_fields
Process extra fields
How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.
Parameters:
Examples:
Note
Attributes added via this method are sorted only once, at the time of the call. Any subsequent sorting will not affect their ordering.