Dataset

class pydidas.core.Dataset(array: ArrayLike, **kwargs: dict)

Bases: ndarray

Dataset class, a subclass of a numpy.ndarray with metadata.

The Dataset creates a new ndarray object from the array-like input and provides a view of the underlying ndarray as Dataset instance. This subclass extends ndarray with additional metadata, accessible and modifiable through the respective properties:

  • axis_units : The units of the axis ranges (in str format).

  • axis_labels : The descriptive labels for all array axes (in str format).

  • axis_ranges : The data values corresponding to the respective axes indices, given in form of 1-d np.ndarrays, lists or tuples. All axis_ranges values will be internally converted to np.ndarrays. The axis_ranges keys are integers corresponding to the axis indices.

  • data_unit : The unit for the data values (in str format).

  • data_label : The label for the data values (in str format).

PLEASE NOTE:

  1. While axis metadata is preserved during operations like reshaping or transposing, units are not automatically converted. The caller is responsible for keeping the units consistent. For example, if the data_unit is meters, Dataset**2 will still carry the unit meters, which must be updated in the calling function.

  2. Metadata is not preserved when operating on two datasets. The second dataset will be interpreted as a numpy.ndarray and its metadata will be lost.

  3. The ndarray.base property of Datasets is never None because the Dataset class is a subclass of ndarray. This means that the base property will always point to an ndarray. However, Dataset views never share memory and each Dataset view will create a new memory object.
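Note 1 can be illustrated with a minimal sketch. `MiniDataset` below is a hypothetical stand-in written only for this example and is not the pydidas implementation. It assumes that metadata is carried across operations through numpy's `__array_finalize__` hook, and it shows why a squared array still reports the original unit until the caller updates it.

```python
import numpy as np


class MiniDataset(np.ndarray):
    """Hypothetical ndarray subclass with a single metadata attribute."""

    def __new__(cls, array, data_unit=""):
        obj = np.asarray(array).view(cls)
        obj.data_unit = data_unit
        return obj

    def __array_finalize__(self, obj):
        # Called for views and for results of numpy operations:
        # copy the unit from the source object, if it has one.
        if obj is None:
            return
        self.data_unit = getattr(obj, "data_unit", "")


lengths = MiniDataset([1.0, 2.0, 3.0], data_unit="m")
areas = lengths**2

# The unit string is copied verbatim; numpy knows nothing about units.
unit_before_update = areas.data_unit

# The calling code has to correct the unit itself.
areas.data_unit = "m^2"
```

The same pattern applies to any operation that changes the physical meaning of the values: the numbers are updated by numpy, the unit string is not.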

The following ndarray methods and properties are reimplemented to preserve the metadata: flatten, max, mean, min, repeat, reshape, shape, sort, squeeze, sum, take, transpose. For other numpy functions and ufuncs, metadata preservation is not guaranteed.

Parameters:
  • array (ndarray) – The data array.

  • **kwargs (dict) –

    Optional keyword arguments. Supported keywords are:

    axis_labels : Union[dict, list, tuple], optional

    The labels for the axes. The length must correspond to the array dimensions. The default is None.

    axis_ranges : Union[list, tuple, dict[int, Union[np.ndarray, list, tuple]]], optional

    The ranges for the axes. If a list or tuple is provided, it must contain one entry per array dimension. If a dictionary is provided, the keys must correspond to the axis indices. Each range must be a sequence (e.g. np.ndarray, list, or tuple) whose length matches the array dimension for that axis. Empty axis_ranges entries (e.g. None) will be converted to indices. The default is None.

    axis_units : Union[dict, list, tuple], optional

    The units for the axes. The length must correspond to the array dimensions. The default is None.

    metadata : Union[dict, None], optional

    A dictionary with metadata. The default is None.

    data_unit : str, optional

    The description of the data unit. The default is an empty string.

    data_label : str, optional

    The description of the data. The default is an empty string.
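The rules for axis_ranges described above (list, tuple or dict input, integer axis keys, None replaced by indices, one value per element along the axis) can be sketched in plain Python. The helper name `normalize_axis_ranges` is hypothetical and only mirrors the documented behaviour; the real class stores the values as np.ndarrays rather than lists.

```python
def normalize_axis_ranges(ranges, shape):
    """Return a dict {axis index: list of values} for an array of `shape`.

    Hypothetical helper for illustration only.
    """
    if ranges is None:
        ranges = {}
    if isinstance(ranges, (list, tuple)):
        if len(ranges) != len(shape):
            raise ValueError("Need exactly one range entry per array dimension.")
        ranges = dict(enumerate(ranges))

    normalized = {}
    for axis, length in enumerate(shape):
        values = ranges.get(axis)
        if values is None:
            # Empty entries fall back to the plain indices of the axis.
            values = list(range(length))
        else:
            values = list(values)
            if len(values) != length:
                raise ValueError(
                    f"Range for axis {axis} has {len(values)} values, "
                    f"but the axis has length {length}."
                )
        normalized[axis] = values
    return normalized


# A 2 x 3 array: no range for axis 0, explicit values for axis 1.
example = normalize_axis_ranges([None, (0.0, 0.5, 1.0)], shape=(2, 3))
```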

argsort(axis: int | None = -1, kind: str | None = None, order: str | list[str] | None = None, stable: bool | None = None) -> ndarray

Get the indices which would sort the Dataset.

Parameters:
  • axis (int, optional) – The axis along which to sort. The default is -1.

  • kind (str, optional) – Please see the numpy.argsort documentation for more information on the kind parameter.

  • order (Union[str, list[str]], optional) – Please see the numpy.argsort documentation for more information on the order parameter.

  • stable (bool, optional) – Please see the numpy.argsort documentation for more information on the stable parameter.

property array: ndarray

Get the raw array data of the dataset.

Returns:

The array data.

Return type:

ndarray

property axis_labels: dict

Get the axis_labels.

Returns:

The axis labels: A dictionary with keys corresponding to the dimension in the array and respective values.

Return type:

dict

property axis_ranges: dict

Get the axis ranges.

These arrays for every dimension give the range of the data (in conjunction with the units).

Returns:

The axis ranges: A dictionary with keys corresponding to the dimension in the array and respective values.

Return type:

dict

property axis_units: dict

Get the axis units.

Returns:

The axis units: A dictionary with keys corresponding to the dimension in the array and respective values.

Return type:

dict

copy(order: Literal['C', 'F', 'A', 'K'] = 'C') -> Self

Overload the generic numpy.ndarray copy method to copy metadata as well.

Parameters:

order (Literal["C", "F", "A", "K"], optional) – The memory layout. The default is “C”.

Returns:

The copied dataset.

Return type:

Dataset

property data_description: str

Get a descriptive string for the data.

Returns:

The descriptive string for the data.

Return type:

str

property data_label: str

Get the data label.

Returns:

The data label.

Return type:

str

property data_unit: str

Get the data unit.

Returns:

The data unit.

Return type:

str

flatten(order: Literal['C', 'F', 'A', 'K'] = 'C') -> Self

Clear the metadata when flattening the array.

Parameters:

order ({'C', 'F', 'A', 'K'}, optional) – ‘C’ means to flatten in row-major (C-style) order. ‘F’ means to flatten in column-major (Fortran-style) order. ‘A’ means to flatten in column-major order if a is Fortran contiguous in memory, row-major order otherwise. ‘K’ means to flatten a in the order the elements occur in memory. The default is ‘C’.

flatten_dims(*args: tuple[int], **kwargs: dict)

Flatten the specified dimensions in place in the Dataset.

This method merges the given dimensions into a single new dimension and thus reduces the dimensionality of the Dataset by len(args) - 1.

Warning: Only adjacent dimensions can be flattened. Flattening dimensions that are distributed throughout the dataset would destroy the data organisation.

Parameters:
  • *args (tuple[int]) – The tuple of the dimensions to be flattened. Each dimension must be an integer entry.

  • **kwargs (dict) –

    Additional keyword arguments. Supported keywords are:

    new_dim_label : str, optional

    The label for the new, flattened dimension. The default is ‘Flattened’.

    new_dim_unit : str, optional

    The unit for the new, flattened dimension. The default is ‘’.

    new_dim_range : Union[None, ndarray, Iterable], optional

    The new range for the flattened dimension. If None, a simple index range is used. The default is None.
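The effect of flattening adjacent dimensions on the array shape can be sketched as follows. The function `flattened_shape` is a hypothetical helper for illustration, not part of pydidas. It assumes that the given dimensions must be consecutive and ascending and that fewer than two dimensions leave the shape unchanged.

```python
from math import prod


def flattened_shape(shape, *dims):
    """Return the shape after merging the adjacent dimensions `dims`.

    Hypothetical helper for illustration only.
    """
    if len(dims) < 2:
        # Nothing to merge.
        return tuple(shape)
    if any(later - earlier != 1 for earlier, later in zip(dims, dims[1:])):
        raise ValueError("Only adjacent dimensions can be flattened.")
    first, last = dims[0], dims[-1]
    merged_length = prod(shape[first : last + 1])
    return tuple(shape[:first]) + (merged_length,) + tuple(shape[last + 1 :])


# Merging dimensions 1 and 2 of a (4, 5, 6, 7) array yields (4, 30, 7).
new_shape = flattened_shape((4, 5, 6, 7), 1, 2)
```

The merged dimension has as many entries as the product of the original lengths, so a new_dim_range passed by the caller would need that many values.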

get_axis_description(index: int) -> str

Get the description for the given axis, based on the axis label and unit.

Parameters:

index (int) – The axis index.

Returns:

The description for the given axis.

Return type:

str

get_axis_range(index: int) -> ndarray

Get a copy of the range for the specified axis.

Parameters:

index (int) – The axis index.

Returns:

The range of the axis.

Return type:

ndarray

get_description_of_point(indices: Iterable) -> str

Get the metadata description of a single point in the array.

Index values of None are interpreted as a request to skip the respective axis.

Parameters:

indices (Iterable) – The indices for each dimension.

Returns:

A string description of the selected point.

Return type:

str

get_rebinned_copy(binning: int) -> Self

Get a binned copy of the Dataset.

This method will create a binned copy and copy all axis metadata. It will also modify the ranges, if required.

Parameters:

binning (int) – The binning factor.

Returns:

The re-binned Dataset.

Return type:

Dataset
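How an axis range might be adjusted during re-binning can be sketched in plain Python. This is an assumption-laden illustration: the text only says that the ranges are modified if required, so the choices made here (averaging the range values within each bin and dropping an incomplete trailing bin) are assumptions, and `rebin_range` is a hypothetical helper.

```python
def rebin_range(values, binning):
    """Average consecutive groups of `binning` range values.

    Hypothetical helper; assumes mean binning and that a trailing,
    incomplete bin is discarded.
    """
    n_bins = len(values) // binning
    return [
        sum(values[i * binning : (i + 1) * binning]) / binning
        for i in range(n_bins)
    ]


# Seven points binned by two: three full bins, the last point is dropped.
binned = rebin_range([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2)
```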

is_axis_nonlinear(index: int, threshold: float = 0.0001) -> bool

Check if the axis range is nonlinear.

Parameters:
  • index (int) – The axis index.

  • threshold (float, optional) – The threshold for the standard deviation of the range differences. The default is 1e-4.

Returns:

True if the axis range is nonlinear, False otherwise.

Return type:

bool
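The linearity check described above compares the spread of the spacings between neighbouring range values against a threshold. A minimal sketch using only the standard library is given below. The function `is_nonlinear` is hypothetical, and whether the library normalises the standard deviation (for example by the mean spacing) is not stated in the text, so no normalisation is applied here.

```python
from statistics import pstdev


def is_nonlinear(axis_range, threshold=1e-4):
    """Return True if the spacings of `axis_range` vary by more than `threshold`.

    Hypothetical helper for illustration only.
    """
    diffs = [b - a for a, b in zip(axis_range, axis_range[1:])]
    if len(diffs) < 2:
        # Zero or one spacing cannot deviate from itself.
        return False
    return pstdev(diffs) > threshold


linear_axis = [0.0, 1.0, 2.0, 3.0]      # constant spacing
quadratic_axis = [0.0, 1.0, 4.0, 9.0]   # spacing grows with the index

linear_result = is_nonlinear(linear_axis)
quadratic_result = is_nonlinear(quadratic_axis)
```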

max(axis=None, out=None, keepdims=False, initial=<no value>, where=True)

Return the maximum along a given axis.

Refer to numpy.amax for full documentation.

See also

numpy.amax

equivalent function

mean(axis=None, dtype=None, out=None, keepdims=False, *, where=True)

Returns the average of the array elements along given axis.

Refer to numpy.mean for full documentation.

See also

numpy.mean

equivalent function

property metadata: dict

Get the dataset metadata.

Returns:

The metadata dictionary. There is no enforced structure of the dictionary.

Return type:

dict

min(axis=None, out=None, keepdims=False, initial=<no value>, where=True)

Return the minimum along a given axis.

Refer to numpy.amin for full documentation.

See also

numpy.amin

equivalent function

property property_dict: dict

Get a copy of the properties dictionary.

Returns:

A dictionary with copies of all properties.

Return type:

dict

repeat(repeats, axis: int | None = None) -> Self

Overload the generic repeat method to update the metadata.

Parameters:
  • repeats (int) – The number of repetitions.

  • axis (int, optional) – The axis along which to repeat. If None, the flattened array is returned. The default is None.

Returns:

The repeated array.

Return type:

Dataset

reshape(*new_shape: int | tuple[int], order='C')

Overload the generic reshape method to update the metadata.

Parameters:
  • new_shape (Union[int, tuple[int]]) – The new shape of the array.

  • order ({'C', 'F', 'A', 'K'}, optional) – The order of the reshaping. The default is ‘C’.

Returns:

The reshaped Dataset.

Return type:

pydidas.core.Dataset

property shape: tuple

Get the shape of the array.

Returns:

The shape of the array.

Return type:

tuple

sort(axis: int | None = -1, kind: str | None = None, order: str | list[str] | None = None, stable: bool | None = None) -> None

Sort the Dataset in place.

Parameters:
  • axis (int, optional) – The axis along which to sort. The default is -1.

  • kind (str, optional) – Please see the numpy.sort documentation for more information on the kind parameter.

  • order (Union[str, list[str]], optional) – Please see the numpy.sort documentation for more information on the order parameter.

  • stable (bool, optional) – Please see the numpy.sort documentation for more information on the stable parameter.
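The difference between argsort (returns indices, leaves the data untouched) and sort (reorders in place, returns None) is shown below on a plain numpy array. Dataset is documented as an ndarray subclass with the same signatures, but how it reorders the axis ranges is not covered by this sketch.

```python
import numpy as np

values = np.array([3.0, 1.0, 2.0])

# argsort returns the indices that would sort the array.
order = values.argsort()
unchanged = values.tolist()

# sort works in place and therefore returns None.
result = values.sort()
```

Because sort returns None, chaining calls such as `values.sort().max()` fails; use the sorted array itself after the call.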

squeeze(axis: None | int = None) -> Self

Squeeze the array and remove dimensions of length one.

Parameters:

axis (Union[None, int], optional) – The axis to be squeezed. If None, all axes of length one will be squeezed. The default is None.

Returns:

The squeezed Dataset.

Return type:

Dataset
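When dimensions of length one are removed, the per-axis metadata has to be renumbered so that the dictionary keys match the remaining axes again. The following sketch shows that bookkeeping in plain Python. `squeeze_axis_metadata` is a hypothetical helper, and the assumption that the remaining entries are simply renumbered in order is not taken from the text.

```python
def squeeze_axis_metadata(meta, shape, axis=None):
    """Drop the entries of squeezed axes and renumber the remaining keys.

    Hypothetical helper for illustration only.
    """
    kept = [
        index
        for index, length in enumerate(shape)
        if not (length == 1 and (axis is None or index == axis))
    ]
    return {new: meta[old] for new, old in enumerate(kept)}


labels = {0: "frame", 1: "y", 2: "x"}

# Shape (1, 4, 5): the leading axis of length one is removed.
squeezed_all = squeeze_axis_metadata(labels, (1, 4, 5))

# Shape (1, 4, 1): only axis 2 is squeezed when it is named explicitly.
squeezed_one = squeeze_axis_metadata(labels, (1, 4, 1), axis=2)
```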

sum(axis=None, dtype=None, out=None, keepdims=False, initial=0, where=True)

Return the sum of the array elements over the given axis.

Refer to numpy.sum for full documentation.

See also

numpy.sum

equivalent function

take(indices: int | ArrayLike, axis: int | None = None, out: ndarray | None = None, mode: Literal['raise', 'wrap', 'clip'] = 'raise') -> Self

Take elements from an array along an axis.

This method overloads the ndarray.take method to process the axis properties as well.

Parameters:
  • indices (Union[int, ArrayLike]) – The indices of the values to extract.

  • axis (int, optional) – The axis to take the data from. If None, data will be taken from the flattened array. The default is None.

  • out (ndarray, optional) – An optional output array. If None, a new array is created. The default is None.

  • mode (str, optional) – Specifies how out-of-bounds indices will behave. The default is “raise”.

Returns:

The new dataset.

Return type:

Dataset
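Processing the axis properties during take amounts to applying the same indices to the range of the selected axis. The sketch below does this by hand with plain numpy arrays. It illustrates the idea and assumes nothing about the internal implementation; the variable names are made up for this example.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
x_range = np.array([0.0, 0.5, 1.0, 1.5])  # range values for axis 1

indices = [0, 2]

# Select columns 0 and 2 from the data ...
taken = np.take(data, indices, axis=1)
# ... and the matching entries from the range of that axis.
taken_range = np.take(x_range, indices)
```

After the call, the data has shape (3, 2) and the range for axis 1 has two entries, so data and metadata stay aligned.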

transpose(*axes: tuple[int]) -> Self

Overload the generic transpose method to transpose the metadata as well.

Note that, contrary to the generic method, transpose creates a deep copy of the data rather than a view, to prevent inconsistent metadata.

Parameters:

*axes (tuple) – The axes to be transposed. If not given, the generic order is used.

Returns:

The transposed Dataset.

Return type:

pydidas.core.Dataset
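Transposing the metadata means reordering the per-axis dictionaries with the same permutation as the data. A minimal sketch in plain Python follows. `transpose_axis_metadata` is a hypothetical helper that follows the numpy convention that output axis i corresponds to input axis axes[i], and that omitting the axes reverses their order.

```python
def transpose_axis_metadata(meta, axes=None):
    """Reorder a {axis index: value} dict like numpy reorders array axes.

    Hypothetical helper for illustration only.
    """
    ndim = len(meta)
    if axes is None:
        # Default numpy behaviour: reverse the order of the axes.
        axes = tuple(reversed(range(ndim)))
    if sorted(axes) != list(range(ndim)):
        raise ValueError("axes must be a permutation of the dimensions.")
    return {new: meta[old] for new, old in enumerate(axes)}


labels = {0: "frame", 1: "y", 2: "x"}

reversed_labels = transpose_axis_metadata(labels)
swapped_labels = transpose_axis_metadata(labels, (1, 0, 2))
```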

update_axis_label(index: int, item: str)

Update a single axis label value.

Parameters:
  • index (int) – The dimension to be updated.

  • item (str) – The new label for the selected dimension.

Raises:

ValueError – If the index is not in range of the Dataset dimensions or if the item is not a string.

update_axis_range(index: int, item: ndarray | Iterable)

Update a single axis range value.

Parameters:
  • index (int) – The dimension to be updated.

  • item (Union[ndarray, Iterable]) – The new range for the selected dimension.

Raises:

ValueError – If the index is not in range of the Dataset dimensions.

update_axis_unit(index: int, item: str)

Update a single axis unit value.

Parameters:
  • index (int) – The dimension to be updated.

  • item (str) – The new unit for the selected dimension.

Raises:

ValueError – If the index is not in range of the Dataset dimensions or if the item is not a string.