sup3r.preprocessing.samplers.cc.DualSamplerCC


class DualSamplerCC(data: Sup3rDataset, sample_shape: tuple | None = None, batch_size: int = 16, s_enhance: int = 1, t_enhance: int = 24, feature_sets: dict | None = None, mode: str = 'lazy')

Bases: DualSampler

Special sampling of WTK or NSRDB data for climate change applications.

Note

This will always give daily / hourly data if t_enhance != 1. The number of days / hours in the samples is determined by t_enhance. For example, if t_enhance = 8 and sample_shape = (..., 24), there will be 3 days in the low-res sample: lr_sample_shape = (..., 3). If 1 < t_enhance != 24, reduce_high_res_sub_daily() is used to reduce the high-res sample shape from (..., sample_shape[2] * 24 // t_enhance) to (..., sample_shape[2]).
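The shape arithmetic in this note can be checked with a short sketch. The helper names below (`lr_time_steps`, `expanded_hourly_steps`) are hypothetical and not part of sup3r, and the divisibility check is an assumption added for illustration. Only the two formulas come from the note.

```python
def lr_time_steps(hr_time_steps: int, t_enhance: int) -> int:
    """Low-res temporal length implied by the high-res length and t_enhance."""
    # Assumption for this sketch: the high-res length must divide evenly.
    if hr_time_steps % t_enhance != 0:
        raise ValueError("hr_time_steps must be divisible by t_enhance")
    return hr_time_steps // t_enhance


def expanded_hourly_steps(hr_time_steps: int, t_enhance: int) -> int:
    """Hourly length before reduction: sample_shape[2] * 24 // t_enhance."""
    return hr_time_steps * 24 // t_enhance


# Spatial sizes are illustrative; the temporal size and t_enhance follow the note.
sample_shape = (20, 20, 24)
t_enhance = 8

n_lr = lr_time_steps(sample_shape[2], t_enhance)
n_hourly = expanded_hourly_steps(sample_shape[2], t_enhance)
print(n_lr, n_hourly)
```

With these values the low-res sample spans 3 days and the expanded hourly chunk has 72 time steps, which matches the 3 days of 24 hours described above.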

Parameters:

data (Sup3rDataset) – A Sup3rDataset instance with low-res and high-res data members
sample_shape (tuple) – Size of arrays to sample from the high-res data. The sample shape for the low-res sampler will be determined from the enhancement factors.
s_enhance (int) – Spatial enhancement factor
t_enhance (int) – Temporal enhancement factor
feature_sets (Optional[dict]) – Optional dictionary describing how the full set of features is split between lr_features, hr_exo_features, and hr_out_features.

lr_features (list | tuple)
List of feature names or patt*erns to use as low-resolution model inputs. If no entry is provided then all available features from the data will be used.

hr_out_features (list | tuple)
List of feature names or patt*erns that should be output by the generative model and available as ground truth targets. If no entry is provided then all features in lr_features will be used.

hr_exo_features (list | tuple)
List of feature names or patt*erns that should be available as high-resolution model inputs (like topography or observations) or for bespoke loss functions. Features used as inputs are injected into the model mid-network to condition output on high-resolution information. The model configuration should have the appropriate layers to use these features. e.g. Sup3rConcat for topography injection, Sup3rObsModel or Sup3rCrossAttention for obs injection. If no entry is provided then hr_exo_features will be empty.

*To include sparse features as inputs or targets the features must have an “_obs” suffix.
mode (str) – Mode for sampling data. Options are ‘lazy’ or ‘eager’. ‘eager’ mode pre-loads all data into memory as numpy arrays for faster access. ‘lazy’ mode samples directly from the underlying data object, which could be backed by dask arrays or on-disk netCDF files.
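As an illustration of the feature_sets parameter, the sketch below builds such a dictionary and applies the defaults described above: all available features for lr_features, lr_features for hr_out_features, and an empty list for hr_exo_features. The function `resolve_feature_sets` and the feature names are hypothetical and chosen for the example. Rejecting unknown keys is an assumption of this sketch, not documented sup3r behavior.

```python
def resolve_feature_sets(feature_sets, available):
    """Fill in the documented defaults for a feature_sets dictionary.

    Hypothetical helper for illustration only; not part of sup3r.
    """
    feature_sets = dict(feature_sets or {})
    allowed = {"lr_features", "hr_out_features", "hr_exo_features"}
    unknown = set(feature_sets) - allowed
    if unknown:
        # Assumption: unknown keys are treated as an error in this sketch.
        raise KeyError(f"Unexpected feature_sets keys: {sorted(unknown)}")

    # Defaults follow the parameter descriptions above.
    lr = list(feature_sets.get("lr_features", available))
    hr_out = list(feature_sets.get("hr_out_features", lr))
    hr_exo = list(feature_sets.get("hr_exo_features", []))
    return {"lr_features": lr, "hr_out_features": hr_out, "hr_exo_features": hr_exo}


# Illustrative feature names; real names depend on the dataset.
available = ["clearsky_ratio", "temperature_2m", "topography"]

explicit = resolve_feature_sets(
    {
        "lr_features": ["clearsky_ratio", "temperature_2m"],
        "hr_out_features": ["clearsky_ratio"],
        "hr_exo_features": ["topography"],
    },
    available,
)
defaults = resolve_feature_sets(None, available)
```

A dictionary shaped like the `explicit` input is what would be passed as feature_sets. The `defaults` result shows what the documented fallbacks imply when no entries are given.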

See also

DualSampler

Methods

`check_feature_consistency`()	Make sure features are consistent with the data and with each other.
`check_proxy_obs_consistency`()	Check that the obs features are configured correctly for proxy observations.
`check_shape_consistency`()	Make sure container shapes and sample shapes are compatible with enhancement factors.
`derive`(feature[, strict])	Resolve feature name to a feature in the underlying data.
`get_middle_days`(high_res, sample_shape)	Get middle chunk of high_res data that will then be reduced to day time steps.
`get_sample_index`([n_obs])	Get sample index for expanded hourly chunk which will be reduced to the given sample shape.
`post_init_log`([args_dict])	Log additional arguments after initialization.
`preflight`()	Perform shape and feature checks.
`reduce_high_res_sub_daily`(high_res[, csr_ind])	Take an hourly high-res observation and reduce the temporal axis down to lr_sample_shape[2] * t_enhance time steps, using only daylight hours on the middle part of the high res data.
`wrap`(data)	Return a `Sup3rDataset` object or tuple of such.

Attributes

`timer`
`data`	Return underlying data.
`hr_exo_features`	Get a list of exogenous high-resolution features that are only used for training e.g., mid-network high-res topo injection.
`hr_features`	List of feature names or patt*erns that the model is shown at high-resolution.
`hr_features_ind`	Get the high-resolution feature channel indices that should be included for loss calculations.
`hr_out_features`	List of feature names or patt*erns that should be output by the generative model.
`hr_sample_features`	List of feature names used in the sample index for the high-resolution training data.
`hr_sample_shape`	Shape of the data sample to select when `__next__()` is called.
`hr_source_features`	Features available natively at high-resolution.
`lr_features`	List of feature names or patt*erns to use as low-resolution model inputs.
`lr_features_ind`	Get the low-resolution feature channel indices that should be included for training.
`obs_features`	List of feature names or patt*erns that should be treated as observations.
`obs_features_ind`	Get the source feature indices in `features` for each obs feature.
`offshore_obs_frac`	Fraction of offshore observations to include in each batch when using proxy observations.
`onshore_obs_frac`	Fraction of onshore observations to include in each batch when using proxy observations.
`perturbation_scale`	Scale of the perturbation to add to the proxy observations when using proxy observations.
`sample_shape`	Shape of the data sample to select when `__next__()` is called.
`shape`	Get shape of underlying data.
`use_proxy_obs`	Whether to use proxy observations.

check_shape_consistency(): Make sure container shapes and sample shapes are compatible with enhancement factors.

reduce_high_res_sub_daily(high_res, csr_ind=0)

Take an hourly high-res observation and reduce the temporal axis down to lr_sample_shape[2] * t_enhance time steps, using only daylight hours on the middle part of the high res data.

Parameters:

high_res (Union[np.ndarray, da.core.Array]) – 5D array with dimensions (n_obs, spatial_1, spatial_2, temporal, n_features) where temporal >= 24 (set by the data handler).
csr_ind (int) – Feature index of clearsky_ratio, e.g. self.data[..., csr_ind] -> cs_ratio

Returns:

high_res (Union[np.ndarray, da.core.Array]) – 5D array with dimensions (n_obs, spatial_1, spatial_2, temporal, n_features) where temporal has been reduced down to the integer lr_sample_shape[2] * t_enhance. For example, if hr_sample_shape[2] is 9 and t_enhance = 8, 72 hourly time steps will be reduced to 9 using the center 9 daylight hours from the second day.

Note

This only has an effect when 1 < t_enhance < 24. If t_enhance = 24 there is no need for reduction since every daily time step will have 24 hourly time steps in the high_res batch data. If t_enhance = 1, the model is spatial-only, so this routine is unnecessary.

*Needs review from @grantbuster
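To make the idea of "center daylight hours" concrete, here is a simplified 1D sketch for a single day. It is not the sup3r implementation. It assumes that night-time clearsky_ratio values are NaN, which this page does not state, and the helper name `center_daylight_hours` is hypothetical.

```python
import numpy as np


def center_daylight_hours(csr_day, n_steps):
    """Indices of n_steps consecutive hours centered on the daylight period.

    csr_day: 1D array of hourly clearsky_ratio values for one day.
    Assumption for this sketch: night-time values are NaN.
    """
    daylight = np.flatnonzero(~np.isnan(csr_day))
    if daylight.size == 0:
        raise ValueError("No daylight hours found")
    center = int(round(float(daylight.mean())))
    start = center - n_steps // 2
    # Keep the window inside the day.
    start = max(0, min(start, csr_day.size - n_steps))
    return np.arange(start, start + n_steps)


# Synthetic day with daylight from hour 6 through hour 18.
csr_day = np.full(24, np.nan)
csr_day[6:19] = 0.8

idx = center_daylight_hours(csr_day, 9)
reduced = csr_day[idx]
```

Here the daylight period is centered on hour 12, so the sketch keeps hours 8 through 16. The actual method operates on the 5D high_res array and uses csr_ind to locate the clearsky_ratio channel.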

static get_middle_days(high_res, sample_shape): Get the middle chunk of high_res data that will then be reduced to day time steps. This has n_time_steps = 24 if sample_shape[-1] <= 24; otherwise n_time_steps = sample_shape[-1].
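The length rule for the middle chunk can be sketched as follows. The helper `middle_chunk` is hypothetical and works on a plain 1D time index rather than the 5D high_res array. Centering the slice on the time axis is an assumption of this sketch, and only the n_time_steps rule comes from the description above.

```python
import numpy as np


def middle_chunk(time_index, sample_shape):
    """Return the centered slice of time_index with the documented length."""
    # Documented rule: 24 steps unless the sample itself is longer than a day.
    n_time_steps = 24 if sample_shape[-1] <= 24 else sample_shape[-1]
    # Assumption: the chunk is centered on the available time steps.
    start = (len(time_index) - n_time_steps) // 2
    return time_index[start:start + n_time_steps]


hours = np.arange(72)  # three days of hourly steps

short = middle_chunk(hours, (20, 20, 9))   # sample shorter than a day
long = middle_chunk(hours, (20, 20, 48))   # sample longer than a day
```

For the 72-hour index, the 9-step sample yields the 24 hours of the second day (indices 24 through 47), while the 48-step sample yields a 48-hour chunk.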

get_sample_index(n_obs=None): Get sample index for expanded hourly chunk which will be reduced to the given sample shape.

check_feature_consistency(): Make sure features are consistent with the data and with each other.

check_proxy_obs_consistency(): Check that the obs features are configured correctly for proxy observations.

property data

Return underlying data.

Returns: Sup3rDataset
