Data Objects

packingcubes.data_objects

Particle Datasets

Classes:

GadgetishHDF5Dataset

GadgetishHDF5Dataset(*, name=None, filepath, sorted_filepath=None, particle_type=None, data_slices=None, **kwargs)

Bases: HDF5Dataset

HDF5 dataset with Gadget-2 like header

Represents an HDF5 dataset that has at least the fields defined in the Gadget-2 header specification.

Parameters:

  • filepath (str | Path) –

    The path to the file

  • name (str | None, default: None ) –

    A name for this dataset. Defaults to filepath

  • sorted_filepath (str | Path | None, default: None ) –

    Optional file in which to store sorted position and shuffle-list data. Positions data is searched for in this file before filepath. Defaults to filepath.parent / (filepath.stem + "_sorted.hdf5")

  • particle_type (str | None, default: None ) –

    Initial particle type to (eagerly) load. Defaults to the first HDF5 group that starts with "Part".

  • data_slices

    A numpy slice object or dictionary of slice objects per particle type. This can be used to load only a portion of the dataset. Effectively, the dataset will be loaded as data = data[data_slice.start:data_slice.stop:data_slice.step]

  • **kwargs

    Additional arguments are discarded
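As a minimal sketch of the default described for sorted_filepath, the path can be derived with pathlib. This assumes the documented expression is meant as filepath.parent / (filepath.stem + "_sorted.hdf5"). The helper name default_sorted_filepath and the example path are hypothetical and not part of packingcubes.

```python
from pathlib import PurePosixPath


def default_sorted_filepath(filepath):
    """Hypothetical helper mirroring the documented default."""
    path = PurePosixPath(filepath)
    # Keep the directory and append "_sorted.hdf5" to the file stem.
    return path.parent / (path.stem + "_sorted.hdf5")


sorted_path = default_sorted_filepath("/data/snapshot_010.hdf5")
print(sorted_path)
```

PurePosixPath is used here only so the sketch behaves the same on every platform.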

Methods:

  • __len__

    Return the number of particles in the dataset

  • __repr__

    Return a string representation of this dataset

  • make_into_array

    Try to convert field into an array

  • process_extra_fields

    Process extra fields

  • reorder

    Impose a new order on the position data and shuffle list

  • save

    Save sorted particle positions and shuffle list to provided file

Attributes:

bounding_box property

bounding_box

Return a copy of the bounding box for this dataset

data_container property

data_container

Return the DataContainer wrapping this dataset

data_slices property writable

data_slices

Slices of data to load. A value of None means load all data

extras property

extras

Additional sorted fields

filepath instance-attribute

filepath = filepath

The path to this dataset (can be empty)

index property

index

Return the shuffle list, creating if necessary

name instance-attribute

name = name

A name for this dataset (can be empty)

particle_numbers property

particle_numbers

Map of particle types to numbers in this dataset

particle_type property writable

particle_type

Current particle type

particle_types property

particle_types

List of particle types in this dataset

positions property

positions

Return the particle position data

sorted_filepath property writable

sorted_filepath

Path to the sorted data

__len__

__len__()

Return the number of particles in the dataset

__repr__

__repr__()

Return a string representation of this dataset

make_into_array

make_into_array(field)

Try to convert field into an array

Parameters:

  • field (str | NDArray | Any) –

    Object to be converted into an array

    Supported types added by this class:

    • strings representing fields in either the HDF5 file at self.filepath (unsorted) or self.sorted_filepath (sorted)

    See MultiParticleDataset for additional supported types.

Returns:

  • field_arr ( NDArray ) –

    An array of the values with the 1st dimension having the same length as positions

  • is_sorted ( bool ) –

    If field_arr is already sorted

process_extra_fields

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

  • extra (Mapping[str, Any]) –

    A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass": "Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting will not reorder them.

reorder

reorder(new_order)

Impose a new order on the position data and shuffle list
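A minimal sketch of what imposing a new order could look like, assuming the shuffle list maps each sorted slot back to a row of the original array and that new_order is a permutation of the current ordering. The function reorder_sketch and the sample values are illustrative only, not the library's implementation.

```python
import numpy as np


def reorder_sketch(positions, index, new_order):
    """Apply the same permutation to positions and the shuffle list."""
    new_order = np.asarray(new_order)
    return positions[new_order], index[new_order]


original = np.array([[0.9, 0.1, 0.5],
                     [0.2, 0.8, 0.3],
                     [0.5, 0.5, 0.5]])
positions = original.copy()
index = np.arange(len(positions))  # identity shuffle list to start with

# Sort by x coordinate; argsort gives the permutation to impose.
new_order = np.argsort(positions[:, 0])
positions, index = reorder_sketch(positions, index, new_order)

# The shuffle list still recovers the sorted rows from the original data.
assert np.array_equal(original[index], positions)
```

Under these assumptions, any field stored in the original order can be brought into the sorted order by indexing it with the shuffle list.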

save

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle positions and shuffle list to provided file

Parameters:

  • output_file (str | Path | None, default: None ) –

    File to save information to. Default is self.sorted_filepath

  • force_overwrite (bool | None, default: None ) –

    Force overwriting position and index data if the output file already contains it under the specified particle type

  • particle_type (str | None, default: None ) –

    Save positions under a different particle type than self.particle_type

  • fields (Collection[str] | None, default: None ) –

    Collection of fields in self.extras to save in addition to self.positions and self.index

  • skip_positions (bool, default: False ) –

    Do not save self.positions if True. Default False.

  • skip_index (bool, default: False ) –

    Do not save self.index if True. Default False.

InMemory

InMemory(*, positions, name='', filepath='', particle_type=None, bounding_box=None, **kwargs)

Bases: MultiParticleDataset

In-memory Dataset

Class for datasets where the positions data is entirely in-memory. These datasets generally are not expected to have a name or filepath and may consist solely of positions data.

Parameters:

  • positions (NDArray) –

    Array containing particle position data.

  • particle_type (str | None, default: None ) –

    Particle type these positions belong to. Default is "PartTypeIM"

  • filepath (str, default: '' ) –

    Specify a default save location if non-empty. Default is "".

  • **kwargs

    Additional arguments are discarded

Methods:

  • __len__

    Return the number of particles in the dataset

  • __repr__

    Return a string representation of this dataset

  • make_into_array

    Try to convert field into an array

  • process_extra_fields

    Process extra fields

  • reorder

    Impose a new order on the position data and shuffle list

  • save

    Save sorted particle data and shuffle-list to disk in an HDF5 file

Attributes:

bounding_box property

bounding_box

Return a copy of the bounding box for this dataset

data_container property

data_container

Return the DataContainer wrapping this dataset

extras property

extras

Additional sorted fields

filepath instance-attribute

filepath = filepath

The path to this dataset (can be empty)

index property

index

Return the shuffle list, creating if necessary

name instance-attribute

name = name

A name for this dataset (can be empty)

particle_numbers property

particle_numbers

Number of particles of each type

particle_type property writable

particle_type

Currently selected particle type

particle_types property

particle_types

List of particle types in this dataset

positions property

positions

Return the particle position data

__len__

__len__()

Return the number of particles in the dataset

__repr__

__repr__()

Return a string representation of this dataset

make_into_array

make_into_array(field)

Try to convert field into an array

Parameters:

  • field (NDArray | Any) –

    Object to be converted into an array.

    Supported types:

    • NDArrays with the same length (1st dimension) as positions. Always assumed unsorted.
    • (NDArray, is_sorted) tuples, where the NDArray must be like the above.

Returns:

  • field_arr ( NDArray ) –

    An array of the values with the 1st dimension having the same length as positions

  • is_sorted ( bool ) –

    If field_arr is already sorted
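The two supported input forms and the (field_arr, is_sorted) return value can be sketched as follows. This is a simplified stand-in: the function name make_into_array_sketch, the explicit n_particles argument, and the ValueError are assumptions, since the source does not list which exceptions are raised.

```python
import numpy as np


def make_into_array_sketch(field, n_particles):
    """Return (field_arr, is_sorted) for the two documented input forms."""
    if isinstance(field, tuple) and len(field) == 2:
        # (NDArray, is_sorted) tuple: trust the caller's flag.
        field_arr, is_sorted = np.asarray(field[0]), bool(field[1])
    else:
        # Bare array: always assumed unsorted.
        field_arr, is_sorted = np.asarray(field), False
    if field_arr.ndim == 0 or len(field_arr) != n_particles:
        # Placeholder error type; the source does not specify one.
        raise ValueError("field must have the same length as positions")
    return field_arr, is_sorted


masses = np.array([1.0, 2.0, 3.0])
arr, is_sorted = make_into_array_sketch(masses, n_particles=3)
arr2, is_sorted2 = make_into_array_sketch((masses, True), n_particles=3)
```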

process_extra_fields

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

  • extra (Mapping[str, Any]) –

    A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass": "Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting will not reorder them.
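The behavior in the note can be illustrated with a toy class. ToyDataset is hypothetical: it assumes extras is a name-to-array mapping and that unsorted input is put into sorted order by indexing with the shuffle list, neither of which is confirmed by the source.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in holding a shuffle list and attached extras."""

    def __init__(self, index):
        self.index = np.asarray(index)
        self.extras = {}

    def process_extra_fields(self, extra):
        for name, field in extra.items():
            values = np.asarray(field)
            # Unsorted input is put into the current sorted order once.
            sorted_values = values[self.index]
            self.extras[name] = sorted_values
            setattr(self, name, sorted_values)


dataset = ToyDataset(index=[2, 0, 1])
dataset.process_extra_fields({"mass": np.array([10.0, 20.0, 30.0])})
print(dataset.mass)  # values follow the shuffle list at attach time

# A later change to the shuffle list leaves dataset.mass untouched.
dataset.index = dataset.index[::-1]
print(dataset.mass)
```

In this sketch, keeping an attached field consistent after a later re-sort would mean attaching it again.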

reorder

reorder(new_order)

Impose a new order on the position data and shuffle list

save

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle data and shuffle-list to disk in an HDF5 file

Parameters:

  • output_file (str | Path | None, default: None ) –

    The name of the output file. Defaults to self.filepath. Because self.filepath is "" unless specified, omitting output_file in that case raises a ValueError.

  • force_overwrite (bool | None, default: None ) –

    Force overwriting position and index data if the output file already contains it under the specified particle type

  • particle_type (str | None, default: None ) –

    Save positions under a different particle type than self.particle_type

  • fields (Collection[str] | None, default: None ) –

    Collection of fields in self.extras to save in addition to self.positions and self.index

  • skip_positions (bool, default: False ) –

    Do not save self.positions if True. Default False.

  • skip_index (bool, default: False ) –

    Do not save self.index if True. Default False.

Raises:

  • ValueError

    If no output_file or the empty string ("") is specified
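The fallback and error condition described for output_file can be sketched as below. The helper resolve_output_file and the error message are hypothetical; only the fallback to the dataset's filepath and the ValueError on an empty target come from the text above.

```python
def resolve_output_file(output_file, default_filepath=""):
    """Hypothetical helper: pick the save target or fail on an empty one."""
    target = output_file if output_file is not None else default_filepath
    if str(target) == "":
        raise ValueError("No output file specified")
    return target


print(resolve_output_file("sorted_particles.hdf5"))
print(resolve_output_file(None, default_filepath="in_memory.hdf5"))

try:
    # Neither an explicit file nor a non-empty default is available.
    resolve_output_file(None)
except ValueError as err:
    print(f"ValueError: {err}")
```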

HDF5Dataset

HDF5Dataset(*, name=None, filepath, sorted_filepath=None, particle_type=None, data_slices=None, **kwargs)

Bases: MultiParticleDataset

HDF5 Dataset

Base class for HDF5 datasets. The entire positions array is assumed to fit in memory. The full dataset does not need to be loadable, since this class is used purely for spatial sorting.

Note that for simplicity, only one particle type is available at a time. You can use the particle_type and particle_types attributes to change particle type and get a list of valid particle types.

Parameters:

  • filepath (str | Path) –

    The path to the file

  • name (str | None, default: None ) –

    A name for this dataset. Defaults to filepath

  • sorted_filepath (str | Path | None, default: None ) –

    Optional file to store sorted position and shuffle-list data. Will also search for positions data from this file before searching filepath.

  • particle_type (str | None, default: None ) –

    Initial particle type to (eagerly) load.

  • data_slices

    A numpy slice object or dictionary of slice objects per particle type. This can be used to load only a portion of the dataset. Effectively, the dataset will be loaded as data = data[data_slice.start:data_slice.stop:data_slice.step]. Note: this is true even if loading from the sorted data!

  • **kwargs

    Additional arguments are discarded
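The two accepted forms of data_slices, a single slice or a per-type dictionary, can be sketched like this. The arrays, the helper apply_data_slices, and the choice to load types missing from the dictionary in full are assumptions made for illustration.

```python
import numpy as np

# Hypothetical per-type arrays standing in for data read from an HDF5 file.
data = {
    "PartType0": np.arange(10),
    "PartType1": np.arange(100, 105),
}


def apply_data_slices(data, data_slices):
    """Apply one slice to every type, or a per-type dict of slices."""
    result = {}
    for ptype, values in data.items():
        if isinstance(data_slices, dict):
            # Assumption: types missing from the dict are loaded in full.
            data_slice = data_slices.get(ptype, slice(None))
        else:
            data_slice = slice(None) if data_slices is None else data_slices
        result[ptype] = values[data_slice.start:data_slice.stop:data_slice.step]
    return result


# np.s_ builds an ordinary slice object from indexing syntax.
every_other = apply_data_slices(data, np.s_[0:6:2])
per_type = apply_data_slices(data, {"PartType0": slice(2, 5)})
```

Passing None leaves every array untouched, matching the documented meaning of a None value for data_slices.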

Methods:

  • __len__

    Return the number of particles in the dataset

  • __repr__

    Return a string representation of this dataset

  • make_into_array

    Try to convert field into an array

  • process_extra_fields

    Process extra fields

  • reorder

    Impose a new order on the position data and shuffle list

  • save

    Save sorted particle positions and shuffle list to provided file

Attributes:

bounding_box property

bounding_box

Return a copy of the bounding box for this dataset

data_container property

data_container

Return the DataContainer wrapping this dataset

data_slices property writable

data_slices

Slices of data to load. A value of None means load all data

extras property

extras

Additional sorted fields

filepath instance-attribute

filepath = filepath

The path to this dataset (can be empty)

index property

index

Return the shuffle list, creating if necessary

name instance-attribute

name = name

A name for this dataset (can be empty)

particle_numbers property

particle_numbers

Map of particle types to numbers in this dataset

particle_type property writable

particle_type

Current particle type

particle_types property

particle_types

List of particle types in this dataset

positions property

positions

Return the particle position data

sorted_filepath property writable

sorted_filepath

Path to the sorted data

__len__

__len__()

Return the number of particles in the dataset

__repr__

__repr__()

Return a string representation of this dataset

make_into_array

make_into_array(field)

Try to convert field into an array

Parameters:

  • field (str | NDArray | Any) –

    Object to be converted into an array

    Supported types added by this class:

    • strings representing fields in either the HDF5 file at self.filepath (unsorted) or self.sorted_filepath (sorted)

    See MultiParticleDataset for additional supported types.

Returns:

  • field_arr ( NDArray ) –

    An array of the values with the 1st dimension having the same length as positions

  • is_sorted ( bool ) –

    If field_arr is already sorted

process_extra_fields

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

  • extra (Mapping[str, Any]) –

    A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass": "Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting will not reorder them.

reorder

reorder(new_order)

Impose a new order on the position data and shuffle list

save

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save sorted particle positions and shuffle list to provided file

Parameters:

  • output_file (str | Path | None, default: None ) –

    File to save information to. Default is self.sorted_filepath

  • force_overwrite (bool | None, default: None ) –

    Force overwriting position and index data if the output file already contains it under the specified particle type

  • particle_type (str | None, default: None ) –

    Save positions under a different particle type than self.particle_type

  • fields (Collection[str] | None, default: None ) –

    Collection of fields in self.extras to save in addition to self.positions and self.index

  • skip_positions (bool, default: False ) –

    Do not save self.positions if True. Default False.

  • skip_index (bool, default: False ) –

    Do not save self.index if True. Default False.

MultiParticleDataset

MultiParticleDataset(*, name=None, filepath)

Bases: Dataset, ABC

Dataset containing multiple particle types

Multiple particle types are handled by exposing only one particle type at a time.
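The idea of exposing one particle type at a time can be sketched with a small container. ToyMultiParticle is a hypothetical illustration that borrows the attribute names listed for this class; the storage layout, the initial type selection, and the ValueError on unknown types are assumptions.

```python
class ToyMultiParticle:
    """Hypothetical container exposing one particle type at a time."""

    def __init__(self, positions_by_type):
        self._positions_by_type = positions_by_type
        # Start on the first available type (an assumption of this sketch).
        self._particle_type = next(iter(positions_by_type))

    @property
    def particle_types(self):
        return list(self._positions_by_type)

    @property
    def particle_numbers(self):
        return {k: len(v) for k, v in self._positions_by_type.items()}

    @property
    def particle_type(self):
        return self._particle_type

    @particle_type.setter
    def particle_type(self, value):
        if value not in self._positions_by_type:
            raise ValueError(f"Unknown particle type: {value}")
        self._particle_type = value

    @property
    def positions(self):
        # Only the currently selected type is visible.
        return self._positions_by_type[self._particle_type]

    def __len__(self):
        return len(self.positions)


toy = ToyMultiParticle({
    "PartType0": [(0.0, 0.0, 0.0)] * 4,
    "PartType1": [(1.0, 1.0, 1.0)] * 2,
})
n_first = len(toy)
toy.particle_type = "PartType1"
n_second = len(toy)
```

Here len and positions always refer to the selected type, while particle_numbers reports the counts for all types.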

Methods:

  • __len__

    Return the number of particles in the dataset

  • __repr__

    Return a string representation of this dataset

  • make_into_array

    Try to convert field into an array

  • process_extra_fields

    Process extra fields

  • reorder

    Impose a new order on the position data and shuffle list

  • save

    Save this dataset to disk

Attributes:

bounding_box property

bounding_box

Return a copy of the bounding box for this dataset

data_container property

data_container

Return the DataContainer wrapping this dataset

extras property

extras

Additional sorted fields

filepath instance-attribute

filepath = filepath

The path to this dataset (can be empty)

index property

index

Return the shuffle list, creating if necessary

name instance-attribute

name = name

A name for this dataset (can be empty)

particle_numbers abstractmethod property

particle_numbers

Number of particles of each type

particle_type abstractmethod property writable

particle_type

Currently selected particle type

particle_types abstractmethod property

particle_types

List of particle types in this dataset

positions property

positions

Return the particle position data

__len__

__len__()

Return the number of particles in the dataset

__repr__

__repr__()

Return a string representation of this dataset

make_into_array

make_into_array(field)

Try to convert field into an array

Parameters:

  • field (NDArray | Any) –

    Object to be converted into an array.

    Supported types:

    • NDArrays with the same length (1st dimension) as positions. Always assumed unsorted.
    • (NDArray, is_sorted) tuples, where the NDArray must be like the above.

Returns:

  • field_arr ( NDArray ) –

    An array of the values with the 1st dimension having the same length as positions

  • is_sorted ( bool ) –

    If field_arr is already sorted

process_extra_fields

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

  • extra (Mapping[str, Any]) –

    A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass": "Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting will not reorder them.

reorder

reorder(new_order)

Impose a new order on the position data and shuffle list

save abstractmethod

save(*, output_file=None, force_overwrite=None, particle_type=None, fields=None, skip_positions=False, skip_index=False)

Save this dataset to disk

It will be up to the subclass to decide what that means

Parameters:

  • output_file (str | Path | None, default: None ) –

    The name of the output file. Note this field is optional because there might be an obvious default.

  • force_overwrite (bool | None, default: None ) –

    Force overwriting position and index data if the output file already contains it under the specified particle type

  • particle_type (str | None, default: None ) –

    Save positions under a different particle type than self.particle_type

  • fields (Collection[str] | None, default: None ) –

    Collection of fields in self.extras to save in addition to self.positions and self.index

  • skip_positions (bool, default: False ) –

    Do not save self.positions if True. Default False.

  • skip_index (bool, default: False ) –

    Do not save self.index if True. Default False.

Dataset

Dataset(*, name=None, filepath)

Base class for holding particle position data and associated shuffle list

This class is intended to be the primary interface for octree access to the position data. It abstracts where the data lives and what it looks like, so the octrees only need to care about the position data (more specifically, its order).

Methods:

  • __len__

    Return the number of particles in the dataset

  • __repr__

    Return a string representation of this dataset

  • make_into_array

    Try to convert field into an array

  • process_extra_fields

    Process extra fields

  • reorder

    Impose a new order on the position data and shuffle list

Attributes:

bounding_box property

bounding_box

Return a copy of the bounding box for this dataset

data_container property

data_container

Return the DataContainer wrapping this dataset

extras property

extras

Additional sorted fields

filepath instance-attribute

filepath = filepath

The path to this dataset (can be empty)

index property

index

Return the shuffle list, creating if necessary

name instance-attribute

name = name

A name for this dataset (can be empty)

positions property

positions

Return the particle position data

__len__

__len__()

Return the number of particles in the dataset

__repr__

__repr__()

Return a string representation of this dataset

make_into_array

make_into_array(field)

Try to convert field into an array

Parameters:

  • field (NDArray | Any) –

    Object to be converted into an array.

    Supported types:

    • NDArrays with the same length (1st dimension) as positions. Always assumed unsorted.
    • (NDArray, is_sorted) tuples, where the NDArray must be like the above.

Returns:

  • field_arr ( NDArray ) –

    An array of the values with the 1st dimension having the same length as positions

  • is_sorted ( bool ) –

    If field_arr is already sorted

process_extra_fields

process_extra_fields(extra)

Process extra fields

How different types of extra fields are handled will depend on make_into_array, but the net effect will be a sorted array accessible as an attribute of this dataset instance with the name provided.

Parameters:

  • extra (Mapping[str, Any]) –

    A mapping of names to extra fields to attach.

Examples:

>>> dataset.process_extra_fields({"mass": "Mass"})
>>> dataset.mass

Note

Attributes added via this method are sorted only once, at the time they are added. Any subsequent sorting will not reorder them.

reorder

reorder(new_order)

Impose a new order on the position data and shuffle list