Data Objects

packingcubes.data_objects ¶

Particle Datasets

Classes:

GadgetishHDF5Dataset –

Use to load HDF5 Datasets that look like Gadget-2 snapshots
InMemory –

Use to convert an in-memory array into a dataset
HDF5Dataset –

Abstract dataset, use as a base for custom subclassing for loading HDF5 data (see e.g. GadgetishHDF5Dataset)
Dataset –

Generic dataset class, use for typing
MultiParticleDataset –

Generic dataset class with multiple particle types, use for typing
DataContainer –

Effectively a view of a Dataset, only use within jitted code

GadgetishHDF5Dataset ¶

GadgetishHDF5Dataset(*, name=None, filepath, sorted_filepath=None, particle_type=None, data_slices=None, **kwargs)

Bases: HDF5Dataset

HDF5 dataset with Gadget-2 like header

Represents an HDF5 dataset that has at least the fields from the Gadget-2 header specification.

Parameters:

filepath (str | Path) –

The path to the file
name (str | None, default: None ) –

A name for this dataset. Defaults to filepath
sorted_filepath (str | Path | None, default: None ) –

Optional file to store sorted position and shuffle-list data. Will also search for positions data from this file before searching filepath. Defaults to filepath.parent/filepath.stem + "_sorted.hdf5"
particle_type (str | None, default: None ) –

Initial particle type to (eagerly) load. Defaults to the first HDF5 group that starts with "Part".
data_slices –

A NumPy slice object or dictionary of slice objects per particle type. This can be used to load only a portion of the dataset. Effectively, the dataset will be loaded as data = data[data_slice[0]:data_slice[1]:data_slice[2]], i.e. with the start, stop, and step of the slice.
**kwargs –

Additional arguments are discarded
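The default location for sorted_filepath described above can be reproduced with pathlib. This is a minimal sketch of the naming rule only, not the library's own code; the helper name default_sorted_filepath is hypothetical, and the parentheses around the file name are an assumption about the intended grouping.

```python
from pathlib import PurePosixPath


def default_sorted_filepath(filepath):
    """Hypothetical helper mirroring the documented default naming rule."""
    path = PurePosixPath(filepath)
    # Same directory as the input file, with "_sorted.hdf5" appended to the stem
    return path.parent / (path.stem + "_sorted.hdf5")


print(default_sorted_filepath("/data/snapshot_000.hdf5"))
# prints /data/snapshot_000_sorted.hdf5
```

PurePosixPath is used here only so the result looks the same on every platform; a regular Path behaves the same way for this rule.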

Methods:

__len__ –

Return the number of particles in the dataset
__repr__ –

Return a string representation of this dataset
make_into_array –

Try to convert field into an array
process_extra_fields –

Process extra fields
reorder –

Impose a new order on the position data and shuffle list
save –

Save sorted particle positions and shuffle list to provided file

Attributes:

bounding_box (BoundingBox) –

Return a copy of the bounding box for this dataset
data_container (DataContainer) –

Return the DataContainer wrapping this dataset
data_slices –

Slices of data to load. A value of None means load all data
extras (Set) –

Additional sorted fields
filepath (Path) –

The path to this dataset (can be empty)
index (NDArray) –

Return the shuffle list, creating if necessary
name (str) –

A name for this dataset (can be empty)
particle_numbers –

Map of particle types to numbers in this dataset
particle_type –

Current particle type
particle_types –

List of particle types in this dataset
positions (NDArray) –

Return the particle position data
sorted_filepath –

Path to the sorted data

bounding_box `property` ¶

bounding_box

Return a copy of the bounding box for this dataset

data_container `property` ¶

data_container

Return the DataContainer wrapping this dataset

data_slices `property` `writable` ¶

data_slices

Slices of data to load. A value of None means load all data

extras `property` ¶

extras

Additional sorted fields

filepath `instance-attribute` ¶

filepath = filepath

The path to this dataset (can be empty)

index `property` ¶

index

Return the shuffle list, creating if necessary

name `instance-attribute` ¶

name = name

A name for this dataset (can be empty)

particle_numbers `property` ¶

particle_numbers

Map of particle types to numbers in this dataset

particle_type `property` `writable` ¶

particle_type

Current particle type

particle_types `property` ¶

particle_types

List of particle types in this dataset

positions `property` ¶

positions

Return the particle position data

sorted_filepath `property` `writable` ¶

sorted_filepath

Path to the sorted data

__len__ ¶

__len__()

Return the number of particles in the dataset

__repr__ ¶

__repr__()

Return a string representation of this dataset

make_into_array ¶

make_into_array(field)

Try to convert field into an array

Parameters:

field (str | NDArray | Any) –
Object to be converted into an array

Adds supported types:
- strings representing fields in either the HDF5 file at self.filepath (unsorted) or self.sorted_filepath (sorted)
See MultiParticleDataset for additional supported types.

Returns:

field_arr ( NDArray ) –

An array of the values whose first dimension has the same length as positions
is_sorted ( bool ) –

Whether field_arr is already sorted

Raises:

NotImplementedError –

If we do not know how to transform type(field) into an array

process_extra_fields ¶

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

extra (Mapping[str, Any]) –

A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass":"Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting of the dataset will not affect the ordering of these attributes.

reorder ¶

reorder(new_order)

Impose a new order on the position data and shuffle list

save ¶

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle positions and shuffle list to provided file

Parameters:

output_file (str | Path | None, default: None ) –

File to save information to. Default is self.sorted_filepath
force_overwrite (bool | None, default: None ) –

Force overwriting position and index data if the output file already contains it under the specified particle type
particle_type (str | None, default: None ) –

Save positions under a different particle type than self.particle_type
fields (Collection[str] | None, default: None ) –

Collection of fields in self.extras to save in addition to self.positions and self.index
skip_positions (bool, default: False ) –

Do not save self.positions if True. Default False.
skip_index (bool, default: False ) –

Do not save self.index if True. Default False.

InMemory ¶

InMemory(*, positions, name='', filepath='', particle_type=None, bounding_box=None, **kwargs)

Bases: MultiParticleDataset

In-memory Dataset

Class for datasets where the positions data is entirely in-memory. These datasets generally are not expected to have a name or filepath and may consist solely of positions data.

Parameters:

positions (NDArray) –

Array containing particle position data.
particle_type (str | None, default: None ) –

Particle type these positions belong to. Default is "PartTypeIM"
filepath (str, default: '' ) –

Specify a default save location if non-empty. Default is "".
**kwargs –

Additional arguments are discarded

Methods:

__len__ –

Return the number of particles in the dataset
__repr__ –

Return a string representation of this dataset
make_into_array –

Try to convert field into an array
process_extra_fields –

Process extra fields
reorder –

Impose a new order on the position data and shuffle list
save –

Save sorted particle data and shuffle-list to disk in an HDF5 file

Attributes:

bounding_box (BoundingBox) –

Return a copy of the bounding box for this dataset
data_container (DataContainer) –

Return the DataContainer wrapping this dataset
extras (Set) –

Additional sorted fields
filepath (Path) –

The path to this dataset (can be empty)
index (NDArray) –

Return the shuffle list, creating if necessary
name (str) –

A name for this dataset (can be empty)
particle_numbers (dict[str, int]) –

Number of particles of each type
particle_type (str) –

Currently selected particle type
particle_types (list[str]) –

List of particle types in this dataset
positions (NDArray) –

Return the particle position data

bounding_box `property` ¶

bounding_box

Return a copy of the bounding box for this dataset

data_container `property` ¶

data_container

Return the DataContainer wrapping this dataset

extras `property` ¶

extras

Additional sorted fields

filepath `instance-attribute` ¶

filepath = filepath

The path to this dataset (can be empty)

index `property` ¶

index

Return the shuffle list, creating if necessary

name `instance-attribute` ¶

name = name

A name for this dataset (can be empty)

particle_numbers `property` ¶

particle_numbers

Number of particles of each type

particle_type `property` `writable` ¶

particle_type

Currently selected particle type

particle_types `property` ¶

particle_types

List of particle types in this dataset

positions `property` ¶

positions

Return the particle position data

__len__ ¶

__len__()

Return the number of particles in the dataset

__repr__ ¶

__repr__()

Return a string representation of this dataset

make_into_array ¶

make_into_array(field)

Try to convert field into an array

Parameters:

field (NDArray | Any) –
Object to be converted into an array.

Supported types:
- NDArrays with the same length (first dimension) as positions. Always assumed unsorted.
- (NDArray, is_sorted) tuples, where the NDArray must be like the above.

Returns:

field_arr ( NDArray ) –

An array of the values whose first dimension has the same length as positions
is_sorted ( bool ) –

Whether field_arr is already sorted

Raises:

NotImplementedError –

If we do not know how to transform type(field) into an array
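The dispatch rules listed above can be sketched in a few lines. This is a toy model under stated assumptions, not the packingcubes implementation: plain lists stand in for NDArrays, the number of particles is passed explicitly as n_particles, and raising ValueError on a length mismatch is an assumption (the documentation does not say what happens in that case).

```python
def make_into_array(field, n_particles):
    """Toy version of the documented dispatch; lists stand in for NDArrays."""
    if isinstance(field, tuple) and len(field) == 2:
        # (array, is_sorted) tuples carry their own sorted flag
        values, is_sorted = field
    else:
        # Bare arrays are always assumed unsorted
        values, is_sorted = field, False
    if not isinstance(values, list):
        raise NotImplementedError(
            f"Cannot convert {type(values).__name__} into an array"
        )
    if len(values) != n_particles:
        # Assumption: a length mismatch is reported as a ValueError
        raise ValueError("field must have the same length as positions")
    return values, bool(is_sorted)


unsorted_field = make_into_array([1.0, 2.0, 3.0], 3)
presorted_field = make_into_array(([1.0, 2.0, 3.0], True), 3)
```

In this sketch a string such as "Mass" raises NotImplementedError, which matches the base behavior; the HDF5 subclasses document strings as an additional supported type.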

process_extra_fields ¶

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

extra (Mapping[str, Any]) –

A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass":"Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting of the dataset will not affect the ordering of these attributes.

reorder ¶

reorder(new_order)

Impose a new order on the position data and shuffle list

save ¶

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle data and shuffle-list to disk in an HDF5 file

Parameters:

output_file (str | Path | None, default: None ) –

The name of the output file. Defaults to self.filepath, which is "" unless specified; in that case a ValueError is raised.
force_overwrite (bool | None, default: None ) –

Force overwriting position and index data if the output file already contains it under the specified particle type
particle_type (str | None, default: None ) –

Save positions under a different particle type than self.particle_type
fields (Collection[str] | None, default: None ) –

Collection of fields in self.extras to save in addition to self.positions and self.index
skip_positions (bool, default: False ) –

Do not save self.positions if True. Default False.
skip_index (bool, default: False ) –

Do not save self.index if True. Default False.

Raises:

ValueError –

If no output_file or the empty string ("") is specified
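The fallback and error behavior described for output_file can be sketched as follows. The helper resolve_output_file is hypothetical and only models how the target path is chosen; it writes nothing to disk, and the error message is invented for illustration.

```python
def resolve_output_file(output_file=None, filepath=""):
    """Hypothetical helper: pick the file that save() would write to."""
    # Fall back to the dataset's filepath when no output file is given
    target = filepath if output_file is None else output_file
    if str(target) == "":
        raise ValueError("No output file specified and filepath is empty")
    return str(target)


explicit = resolve_output_file("sorted.hdf5")
fallback = resolve_output_file(None, filepath="in_memory_default.hdf5")
```

Calling resolve_output_file() with no arguments raises ValueError, mirroring an InMemory dataset that was created without a filepath.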

HDF5Dataset ¶

HDF5Dataset(*, name=None, filepath, sorted_filepath=None, particle_type=None, data_slices=None, **kwargs)

Bases: MultiParticleDataset

HDF5 Dataset

Base class for using HDF5 datasets. We will assume the entire positions array can be loaded into memory. We do not need to be able to load the entire dataset since this is for purely spatial sorting.

Note that for simplicity, only one particle type is available at a time. You can use the particle_type and particle_types attributes to change particle type and get a list of valid particle types.

Parameters:

filepath (str | Path) –

The path to the file
name (str | None, default: None ) –

A name for this dataset. Defaults to filepath
sorted_filepath (str | Path | None, default: None ) –

Optional file to store sorted position and shuffle-list data. Will also search for positions data from this file before searching filepath.
particle_type (str | None, default: None ) –

Initial particle type to (eagerly) load.
data_slices –

A NumPy slice object or dictionary of slice objects per particle type. This can be used to load only a portion of the dataset. Effectively, the dataset will be loaded as data = data[data_slice[0]:data_slice[1]:data_slice[2]], i.e. with the start, stop, and step of the slice. Note that this applies even when loading from the sorted data.
**kwargs –

Additional arguments are discarded
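To illustrate the two accepted forms of data_slices, the sketch below applies either a single slice or a per-particle-type dictionary to a stand-in sequence. The helper apply_data_slices is hypothetical, a plain list stands in for the on-disk data, and loading particle types that are missing from the dictionary in full is an assumption.

```python
def apply_data_slices(data, data_slices, particle_type):
    """Hypothetical helper: select the slice for one particle type and apply it."""
    if data_slices is None:
        return data  # None means load all data
    if isinstance(data_slices, dict):
        # Assumption: types without an entry are loaded in full
        data_slice = data_slices.get(particle_type)
        return data if data_slice is None else data[data_slice]
    return data[data_slices]


particles = list(range(10))
every_other = apply_data_slices(particles, slice(0, 8, 2), "PartType0")
tail_only = apply_data_slices(particles, {"PartType1": slice(5, None)}, "PartType1")
everything = apply_data_slices(particles, None, "PartType0")
```

Here every_other is [0, 2, 4, 6] and tail_only is [5, 6, 7, 8, 9]; the same slice objects index NumPy arrays and h5py datasets in the same way.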

Methods:

__len__ –

Return the number of particles in the dataset
__repr__ –

Return a string representation of this dataset
make_into_array –

Try to convert field into an array
process_extra_fields –

Process extra fields
reorder –

Impose a new order on the position data and shuffle list
save –

Save sorted particle positions and shuffle list to provided file

Attributes:

bounding_box (BoundingBox) –

Return a copy of the bounding box for this dataset
data_container (DataContainer) –

Return the DataContainer wrapping this dataset
data_slices –

Slices of data to load. A value of None means load all data
extras (Set) –

Additional sorted fields
filepath (Path) –

The path to this dataset (can be empty)
index (NDArray) –

Return the shuffle list, creating if necessary
name (str) –

A name for this dataset (can be empty)
particle_numbers –

Map of particle types to numbers in this dataset
particle_type –

Current particle type
particle_types –

List of particle types in this dataset
positions (NDArray) –

Return the particle position data
sorted_filepath –

Path to the sorted data

bounding_box `property` ¶

bounding_box

Return a copy of the bounding box for this dataset

data_container `property` ¶

data_container

Return the DataContainer wrapping this dataset

data_slices `property` `writable` ¶

data_slices

Slices of data to load. A value of None means load all data

extras `property` ¶

extras

Additional sorted fields

filepath `instance-attribute` ¶

filepath = filepath

The path to this dataset (can be empty)

index `property` ¶

index

Return the shuffle list, creating if necessary

name `instance-attribute` ¶

name = name

A name for this dataset (can be empty)

particle_numbers `property` ¶

particle_numbers

Map of particle types to numbers in this dataset

particle_type `property` `writable` ¶

particle_type

Current particle type

particle_types `property` ¶

particle_types

List of particle types in this dataset

positions `property` ¶

positions

Return the particle position data

sorted_filepath `property` `writable` ¶

sorted_filepath

Path to the sorted data

__len__ ¶

__len__()

Return the number of particles in the dataset

__repr__ ¶

__repr__()

Return a string representation of this dataset

make_into_array ¶

make_into_array(field)

Try to convert field into an array

Parameters:

field (str | NDArray | Any) –
Object to be converted into an array

Adds supported types:
- strings representing fields in either the HDF5 file at self.filepath (unsorted) or self.sorted_filepath (sorted)
See MultiParticleDataset for additional supported types.

Returns:

field_arr ( NDArray ) –

An array of the values whose first dimension has the same length as positions
is_sorted ( bool ) –

Whether field_arr is already sorted

Raises:

NotImplementedError –

If we do not know how to transform type(field) into an array

process_extra_fields ¶

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

extra (Mapping[str, Any]) –

A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass":"Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting of the dataset will not affect the ordering of these attributes.

reorder ¶

reorder(new_order)

Impose a new order on the position data and shuffle list

save ¶

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle positions and shuffle list to provided file

Parameters:

output_file (str | Path | None, default: None ) –

File to save information to. Default is self.sorted_filepath
force_overwrite (bool | None, default: None ) –

Force overwriting position and index data if the output file already contains it under the specified particle type
particle_type (str | None, default: None ) –

Save positions under a different particle type than self.particle_type
fields (Collection[str] | None, default: None ) –

Collection of fields in self.extras to save in addition to self.positions and self.index
skip_positions (bool, default: False ) –

Do not save self.positions if True. Default False.
skip_index (bool, default: False ) –

Do not save self.index if True. Default False.

MultiParticleDataset ¶

MultiParticleDataset(*, name=None, filepath)

Bases: Dataset, ABC

Dataset containing multiple particle types

Multiple particle types are handled by exposing only one particle type at a time.

Methods:

__len__ –

Return the number of particles in the dataset
__repr__ –

Return a string representation of this dataset
make_into_array –

Try to convert field into an array
process_extra_fields –

Process extra fields
reorder –

Impose a new order on the position data and shuffle list
save –

Save this dataset to disk

Attributes:

bounding_box (BoundingBox) –

Return a copy of the bounding box for this dataset
data_container (DataContainer) –

Return the DataContainer wrapping this dataset
extras (Set) –

Additional sorted fields
filepath (Path) –

The path to this dataset (can be empty)
index (NDArray) –

Return the shuffle list, creating if necessary
name (str) –

A name for this dataset (can be empty)
particle_numbers (dict[str, int]) –

Number of particles of each type
particle_type (str) –

Currently selected particle type
particle_types (list[str]) –

List of particle types in this dataset
positions (NDArray) –

Return the particle position data

bounding_box `property` ¶

bounding_box

Return a copy of the bounding box for this dataset

data_container `property` ¶

data_container

Return the DataContainer wrapping this dataset

extras `property` ¶

extras

Additional sorted fields

filepath `instance-attribute` ¶

filepath = filepath

The path to this dataset (can be empty)

index `property` ¶

index

Return the shuffle list, creating if necessary

name `instance-attribute` ¶

name = name

A name for this dataset (can be empty)

particle_numbers `abstractmethod` `property` ¶

particle_numbers

Number of particles of each type

particle_type `abstractmethod` `property` `writable` ¶

particle_type

Currently selected particle type

particle_types `abstractmethod` `property` ¶

particle_types

List of particle types in this dataset

positions `property` ¶

positions

Return the particle position data

__len__ ¶

__len__()

Return the number of particles in the dataset

__repr__ ¶

__repr__()

Return a string representation of this dataset

make_into_array ¶

make_into_array(field)

Try to convert field into an array

Parameters:

field (NDArray | Any) –
Object to be converted into an array.

Supported types:
- NDArrays with the same length (first dimension) as positions. Always assumed unsorted.
- (NDArray, is_sorted) tuples, where the NDArray must be like the above.

Returns:

field_arr ( NDArray ) –

An array of the values whose first dimension has the same length as positions
is_sorted ( bool ) –

Whether field_arr is already sorted

Raises:

NotImplementedError –

If we do not know how to transform type(field) into an array

process_extra_fields ¶

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

extra (Mapping[str, Any]) –

A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass":"Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting of the dataset will not affect the ordering of these attributes.

reorder ¶

reorder(new_order)

Impose a new order on the position data and shuffle list

save `abstractmethod` ¶

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save this dataset to disk

It will be up to the subclass to decide what that means

Parameters:

output_file (str | Path | None, default: None ) –

The name of the output file. Note this field is optional because there might be an obvious default.
force_overwrite (bool | None, default: None ) –

Force overwriting position and index data if the output file already contains it under the specified particle type
particle_type (str | None, default: None ) –

Save positions under a different particle type than self.particle_type
fields (Collection[str] | None, default: None ) –

Collection of fields in self.extras to save in addition to self.positions and self.index
skip_positions (bool, default: False ) –

Do not save self.positions if True. Default False.
skip_index (bool, default: False ) –

Do not save self.index if True. Default False.

Dataset ¶

Dataset(*, name=None, filepath)

Base class for holding particle position data and associated shuffle list

This class is intended to be the primary interface for octree access to the position data. Essentially, it abstracts where the data is and what it looks like so the octrees only care about the position data (and more specifically, its order)

Methods:

__len__ –

Return the number of particles in the dataset
__repr__ –

Return a string representation of this dataset
make_into_array –

Try to convert field into an array
process_extra_fields –

Process extra fields
reorder –

Impose a new order on the position data and shuffle list

Attributes:

bounding_box (BoundingBox) –

Return a copy of the bounding box for this dataset
data_container (DataContainer) –

Return the DataContainer wrapping this dataset
extras (Set) –

Additional sorted fields
filepath (Path) –

The path to this dataset (can be empty)
index (NDArray) –

Return the shuffle list, creating if necessary
name (str) –

A name for this dataset (can be empty)
positions (NDArray) –

Return the particle position data

bounding_box `property` ¶

bounding_box

Return a copy of the bounding box for this dataset

data_container `property` ¶

data_container

Return the DataContainer wrapping this dataset

extras `property` ¶

extras

Additional sorted fields

filepath `instance-attribute` ¶

filepath = filepath

The path to this dataset (can be empty)

index `property` ¶

index

Return the shuffle list, creating if necessary

name `instance-attribute` ¶

name = name

A name for this dataset (can be empty)

positions `property` ¶

positions

Return the particle position data

__len__ ¶

__len__()

Return the number of particles in the dataset

__repr__ ¶

__repr__()

Return a string representation of this dataset

make_into_array ¶

make_into_array(field)

Try to convert field into an array

Parameters:

field (NDArray | Any) –
Object to be converted into an array.

Supported types:
- NDArrays with the same length (first dimension) as positions. Always assumed unsorted.
- (NDArray, is_sorted) tuples, where the NDArray must be like the above.

Returns:

field_arr ( NDArray ) –

An array of the values whose first dimension has the same length as positions
is_sorted ( bool ) –

Whether field_arr is already sorted

Raises:

NotImplementedError –

If we do not know how to transform type(field) into an array

process_extra_fields ¶

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

extra (Mapping[str, Any]) –

A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass":"Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting of the dataset will not affect the ordering of these attributes.
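The consequence of this note is easiest to see in a small model. The ToyDataset class below is not part of packingcubes; it is a hedged sketch that uses plain lists, assumes reorder gathers elements as positions[new_order], and assumes extra fields arrive in the original (unsorted) order.

```python
class ToyDataset:
    """Toy stand-in for a dataset holding positions and a shuffle list."""

    def __init__(self, positions):
        self.positions = list(positions)
        # Shuffle list: index[k] is the original row now stored at slot k
        self.index = list(range(len(self.positions)))

    def reorder(self, new_order):
        # Apply the same permutation to positions and the shuffle list
        self.positions = [self.positions[i] for i in new_order]
        self.index = [self.index[i] for i in new_order]

    def process_extra_fields(self, extra):
        for name, values in extra.items():
            # Sort the field once, using the shuffle list as it is right now
            setattr(self, name, [values[i] for i in self.index])


ds = ToyDataset([3.0, 1.0, 2.0])
ds.reorder([1, 2, 0])                      # positions become [1.0, 2.0, 3.0]
ds.process_extra_fields({"mass": [30.0, 10.0, 20.0]})
mass_when_attached = list(ds.mass)         # aligned with positions at this point
ds.reorder([2, 1, 0])                      # positions change, ds.mass does not
```

After the second reorder, ds.positions is [3.0, 2.0, 1.0] while ds.mass is still [10.0, 20.0, 30.0], so in this model extra fields would need to be attached again after any further sorting.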

reorder ¶

reorder(new_order)

Impose a new order on the position data and shuffle list
