sup3r.preprocessing.samplers.dual.DualSampler


class DualSampler(data: Sup3rDataset, sample_shape: tuple | None = None, batch_size: int = 16, s_enhance: int = 1, t_enhance: int = 1, feature_sets: dict | None = None, proxy_obs_kwargs: dict | None = None, mode: str = 'lazy')

Bases: Sampler

Sampler for paired (or dual) datasets. Each pair consists of low-resolution and high-resolution data, both held in a Sup3rDataset. The dataset can also include extra observation data on the same grid as the high-resolution data, with NaNs at points where no observation exists. These observations are used in an additional content loss term.
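To make the NaN convention concrete, the following is a minimal sketch of a loss term that only counts points where an observation exists. It is not sup3r's loss implementation: the function name `masked_mae`, the choice of mean absolute error, and the flattened list inputs are all assumptions made for illustration.

```python
import math


def masked_mae(pred, obs):
    """Mean absolute error over points where ``obs`` is not NaN.

    Hypothetical helper for illustration; not part of sup3r.
    """
    diffs = [abs(p - o) for p, o in zip(pred, obs) if not math.isnan(o)]
    # With no valid observations, the term contributes nothing to the loss.
    return sum(diffs) / len(diffs) if diffs else 0.0


nan = float("nan")
pred = [1.0, 2.0, 3.0, 4.0]   # model output on the high-res grid (flattened)
obs = [1.5, nan, 2.0, nan]    # sparse observations, NaN where missing
loss = masked_mae(pred, obs)  # only the first and third points are compared
```

Points without observations are dropped before averaging, so sparse coverage does not dilute the term toward zero.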

Parameters:

data (Sup3rDataset) – A Sup3rDataset instance with low-res and high-res data members.
sample_shape (tuple) – Size of arrays to sample from the high-res data. The sample shape for the low-res sampler will be determined from the enhancement factors.
s_enhance (int) – Spatial enhancement factor
t_enhance (int) – Temporal enhancement factor
feature_sets (Optional[dict]) – Optional dictionary describing how the full set of features is split between lr_features, hr_exo_features, and hr_out_features.

lr_features : list | tuple
List of feature names or patt*erns to use as low-resolution model inputs. If no entry is provided then all available features from the data will be used.

hr_out_features : list | tuple
List of feature names or patt*erns that should be output by the generative model and available as ground truth targets. If no entry is provided then all features in the high res data will be used.

hr_exo_features : list | tuple
List of feature names or patt*erns that should be available as high-resolution model inputs (like topography or observations) or to bespoke loss functions. Features used for input are injected into the model mid-network to condition output on high-resolution information. The model configuration should have the appropriate layers to use these features, e.g. Sup3rConcat for topography injection, or Sup3rObsModel or Sup3rCrossAttention for obs injection. If no entry is provided then hr_exo_features will be empty.

*To include sparse features as inputs or targets the features must have an “_obs” suffix.
proxy_obs_kwargs (dict | None) – Optional dictionary of keyword arguments to pass to the proxy observation generator. This is only used when training with proxy observations. Keys can include onshore_obs_frac, offshore_obs_frac, and perturbation_scale.

perturbation_scale : float
Scale of the perturbation added to the proxy observations. This specifies the multiplier of the noise sampled from (-standard deviation, standard deviation). The standard deviation is calculated per feature over each batch.

onshore_obs_frac : float | dict
Fraction of onshore observations to include in each batch when using proxy observations. This can be a single float or a dictionary with keys ‘spatial’ and ‘temporal’ to specify the fraction for each domain. If a dictionary is provided, the actual fraction for each batch will be sampled uniformly between the specified spatial and temporal fractions.

offshore_obs_frac : float | dict
Fraction of offshore observations to include in each batch when using proxy observations. This can be a single float or a dictionary with keys ‘spatial’ and ‘temporal’ to specify the fraction for each domain. If a dictionary is provided, the actual fraction for each batch will be sampled uniformly between the specified spatial and temporal fractions.
mode (str) – Mode for sampling data. Options are ‘lazy’ or ‘eager’. ‘eager’ mode pre-loads all data into memory as numpy arrays for faster access. ‘lazy’ mode samples directly from the underlying data object, which could be backed by dask arrays or on-disk netCDF files.
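As a rough illustration of how these parameters fit together, the sketch below assembles a keyword dictionary with the documented keys and derives a low-res sample shape from the high-res one. The feature names, the numeric values, the (rows, cols, time) axis order, and the divisibility check are assumptions for this example; `lr_sample_shape` is a hypothetical helper, not a sup3r function, and the dictionary has not been validated against the library.

```python
def lr_sample_shape(hr_sample_shape, s_enhance, t_enhance):
    """Derive a low-res sample shape from a high-res one.

    Assumes a (rows, cols, time) axis order and that each high-res
    dimension is an exact multiple of its enhancement factor.
    """
    rows, cols, steps = hr_sample_shape
    if rows % s_enhance or cols % s_enhance or steps % t_enhance:
        raise ValueError("hr_sample_shape must be divisible by the enhancement factors")
    return (rows // s_enhance, cols // s_enhance, steps // t_enhance)


# Illustrative keyword arguments; feature names and values are hypothetical.
sampler_kwargs = {
    "sample_shape": (20, 20, 24),  # high-res sample shape
    "batch_size": 16,
    "s_enhance": 4,
    "t_enhance": 6,
    "feature_sets": {
        "lr_features": ["u_10m", "v_10m"],
        "hr_out_features": ["u_10m", "v_10m"],
        "hr_exo_features": ["topography", "u_10m_obs"],  # sparse features carry "_obs"
    },
    "proxy_obs_kwargs": {
        "onshore_obs_frac": 0.1,
        "offshore_obs_frac": {"spatial": 0.05, "temporal": 0.2},
        "perturbation_scale": 0.1,
    },
    "mode": "lazy",
}

lr_shape = lr_sample_shape(
    sampler_kwargs["sample_shape"],
    sampler_kwargs["s_enhance"],
    sampler_kwargs["t_enhance"],
)
```

With a Sup3rDataset in hand, a dictionary like this would presumably be unpacked as `DualSampler(data, **sampler_kwargs)`; that call is left out here because it needs real data to run.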

Methods

`check_feature_consistency`()	Make sure features are consistent with the data and with each other.
`check_proxy_obs_consistency`()	Check that the obs features are configured correctly for proxy observations.
`check_shape_consistency`()	Make sure container shapes are compatible with enhancement factors.
`get_sample_index`([n_obs])	Get paired sample index, consisting of index for the low res sample and the index for the high res sample with the same spatiotemporal extent.
`post_init_log`([args_dict])	Log additional arguments after initialization.
`preflight`()	Perform shape and feature checks.
`wrap`(data)	Return a `Sup3rDataset` object or tuple of such.
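The idea behind a paired sample index, as described for `get_sample_index`, can be sketched as follows: pick a random window on the low-res grid, then scale its bounds by the enhancement factors to get the high-res window covering the same extent. This is a simplified stand-in rather than the library's code; `paired_sample_index`, its arguments, and the (rows, cols, time) axis order are assumptions.

```python
import random


def paired_sample_index(lr_shape, lr_sample_shape, s_enhance, t_enhance, rng):
    """Return (lr_index, hr_index) as tuples of slices with matching extent.

    Hypothetical helper; axes are assumed to be (rows, cols, time).
    """
    # Random start per axis so the window fits inside the low-res data.
    starts = [rng.randint(0, full - size) for full, size in zip(lr_shape, lr_sample_shape)]
    lr_index = tuple(slice(s, s + size) for s, size in zip(starts, lr_sample_shape))
    # Spatial axes scale by s_enhance, the time axis by t_enhance.
    factors = (s_enhance, s_enhance, t_enhance)
    hr_index = tuple(slice(sl.start * f, sl.stop * f) for sl, f in zip(lr_index, factors))
    return lr_index, hr_index


rng = random.Random(0)  # seeded for reproducibility
lr_index, hr_index = paired_sample_index((50, 50, 100), (5, 5, 4), 4, 6, rng)
```

Because the high-res slices are derived from the low-res ones, the two samples always line up, which is also why the container shapes need to be compatible with the enhancement factors.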

Attributes

`timer`
`data`	Return underlying data.
`hr_exo_features`	Get a list of exogenous high-resolution features that are only used for training e.g., mid-network high-res topo injection.
`hr_features`	List of feature names or patt*erns that the model is shown at high-resolution.
`hr_features_ind`	Get the high-resolution feature channel indices that should be included for loss calculations.
`hr_out_features`	List of feature names or patt*erns that should be output by the generative model.
`hr_sample_shape`	Shape of the data sample to select when `__next__()` is called.
`hr_source_features`	Features available natively at high-resolution.
`lr_features`	List of feature names or patt*erns to use as low-resolution model inputs.
`lr_features_ind`	Get the low-resolution feature channel indices that should be included for training.
`obs_features`	List of feature names or patt*erns that should be treated as observations.
`obs_features_ind`	Get the source feature indices in `features` for each obs feature.
`offshore_obs_frac`	Fraction of offshore observations to include in each batch when using proxy observations.
`onshore_obs_frac`	Fraction of onshore observations to include in each batch when using proxy observations.
`perturbation_scale`	Scale of the perturbation to add to the proxy observations when using proxy observations.
`sample_shape`	Shape of the data sample to select when `__next__()` is called.
`shape`	Get shape of underlying data.
`use_proxy_obs`	Whether to use proxy observations.
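One way to picture how `onshore_obs_frac`, `offshore_obs_frac`, and `perturbation_scale` could act on a single feature is sketched below. It is a guess at the general mechanism, not sup3r's proxy observation generator: the helper names are hypothetical, the noise is assumed to be uniform on (-std, std), and the data is a flat list rather than a gridded batch.

```python
import random
import statistics


def resolve_obs_frac(obs_frac, rng):
    """Return a single fraction from a float or a spatial/temporal dict."""
    if isinstance(obs_frac, dict):
        # Assumed reading: draw uniformly between the two given fractions.
        lo, hi = sorted((obs_frac["spatial"], obs_frac["temporal"]))
        return rng.uniform(lo, hi)
    return float(obs_frac)


def make_proxy_obs(values, obs_frac, perturbation_scale, rng):
    """Keep a random fraction of points, perturb them, set the rest to NaN."""
    std = statistics.pstdev(values)  # standard deviation of this feature
    n_obs = round(resolve_obs_frac(obs_frac, rng) * len(values))
    keep = set(rng.sample(range(len(values)), n_obs))
    return [
        v + perturbation_scale * rng.uniform(-std, std) if i in keep else float("nan")
        for i, v in enumerate(values)
    ]


rng = random.Random(42)
values = [float(v) for v in range(10)]  # stand-in for one feature in a batch
proxy = make_proxy_obs(values, obs_frac=0.3, perturbation_scale=0.1, rng=rng)
```

The result has the same length as the input, with NaN wherever no proxy observation was kept, matching the NaN convention used for real observation data.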

property data

Return underlying data.

Returns: Sup3rDataset
