mlclouds.autoxval.AutoXVal

class AutoXVal(sites=(0, 1, 2, 3, 4, 5, 6), val_sites=None, data_files='/projects/pxs/mlclouds/training_data/{year}_{area}_v322/mlclouds_surfrad_{area}_{year}.h5', val_data=None, config={'epochs_a': 100, 'epochs_b': 90, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 64, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'training_prep_kwargs': {'add_cloud_flag': True, 'filter_clear': False, 'filter_daylight': True, 'filter_sky_class': False, 'nan_option': 'interp', 'sza_lim': 89}, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}, shuffle_train=False, seed=None, xval=<class 'mlclouds.autoxval.XVal'>, catch_nan=False, min_train=1, save_timeseries=False)[source]

Bases: object

Run cross validation by varying both the number of sites used for training and the site used for validation.

Parameters:
  • sites (list) – Sites to use for training and validation

  • val_sites (None | int | list) – Site(s) to use for validation. Use all sites if None.

  • data_files (str | list) – Files to use for training and validation

  • val_data (ValidationData instance | None) – Validation data to use. Load from data_files if None.

  • config (dict) – Dict of XVal configuration options. See CONFIG for an example.

  • shuffle_train (bool) – Randomize the training site list before iterating over the number of training sites.

  • seed (None | int) – Seed for numpy.random. Only applied if an int is given.

  • xval (Class) – Cross validation class. Used for testing.

  • catch_nan (bool) – If True, catch loss == nan exceptions and continue the analysis.

  • min_train (int) – Minimum number of sites to use for training

  • save_timeseries (bool) – Save time series data to disk
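
The default data_files value is a template containing {year} and {area} placeholders. How AutoXVal expands the template internally is not documented here, so the following is only a minimal sketch of expanding such a template with str.format. The year and area values are hypothetical.

```python
from itertools import product

# Default template copied from the signature above.
template = ('/projects/pxs/mlclouds/training_data/{year}_{area}_v322/'
            'mlclouds_surfrad_{area}_{year}.h5')

# Hypothetical values, chosen only to illustrate the expansion.
years = (2019,)
areas = ('east', 'west')

# One file path per (year, area) combination.
files = [template.format(year=y, area=a) for y, a in product(years, areas)]

for path in files:
    print(path)
```

A list built this way matches the documented type of data_files (str | list).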

Methods

k_fold([data_files, val_data, sites, ...])

Perform k-fold validation, only train on n-1 sites

kxn_fold([data_files, val_data, sites, ...])

Perform cross validation against subsets of training sites

save_stats([path, fname])

Save validation statistics as CSV and config as JSON

save_stats(path='./stats', fname=None)[source]

Save validation statistics as CSV and config as JSON

Parameters:
  • path (str) – Directory to save the stats CSV and config JSON to

  • fname (str | None) – Filename without extension for the stats and config files. Auto-generated if None.
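
The pairing of a stats CSV and a config JSON under one base filename can be illustrated with the standard library alone. This is a sketch, not the mlclouds implementation: the function name save_stats_sketch, the fallback filename, and the row contents are all hypothetical, and the numbers are placeholders rather than real results.

```python
import csv
import json
import os
import tempfile


def save_stats_sketch(stats_rows, config, path='./stats', fname=None):
    """Write rows to <fname>.csv and config to <fname>.json under path."""
    os.makedirs(path, exist_ok=True)
    if fname is None:
        # Hypothetical fallback; the real auto-generated name is not documented here.
        fname = 'xval_stats'
    csv_path = os.path.join(path, fname + '.csv')
    json_path = os.path.join(path, fname + '.json')

    # One CSV row per validation run; columns come from the first row's keys.
    with open(csv_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(stats_rows[0].keys()))
        writer.writeheader()
        writer.writerows(stats_rows)

    # Store the config next to the stats so a run can be reproduced later.
    with open(json_path, 'w') as f:
        json.dump(config, f, indent=2)

    return csv_path, json_path


out_dir = tempfile.mkdtemp()
rows = [{'val_site': 0, 'n_train': 6, 'relative_mae': 0.25}]  # placeholder values
csv_path, json_path = save_stats_sketch(rows, {'epochs_a': 100},
                                        path=out_dir, fname='demo')
```

Keeping both files under the same base name makes it easy to match a set of statistics to the configuration that produced it.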

classmethod k_fold(data_files='/projects/pxs/mlclouds/training_data/{year}_{area}_v322/mlclouds_surfrad_{area}_{year}.h5', val_data=None, sites=(0, 1, 2, 3, 4, 5, 6), val_sites=None, config={'epochs_a': 100, 'epochs_b': 90, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 64, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'training_prep_kwargs': {'add_cloud_flag': True, 'filter_clear': False, 'filter_daylight': True, 'filter_sky_class': False, 'nan_option': 'interp', 'sza_lim': 89}, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}, seed=None, xval=<class 'mlclouds.autoxval.XVal'>, catch_nan=False, save_timeseries=False)[source]

Perform k-fold validation, training on only n-1 sites and validating on the remaining site
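
The splitting idea behind this method can be sketched in plain Python: each validation site is held out in turn and the remaining n-1 sites are used for training. The helper leave_one_site_out is hypothetical and is not part of mlclouds. It only shows the site bookkeeping, not the model training, and the actual implementation may differ.

```python
def leave_one_site_out(sites, val_sites=None):
    """Yield (train_sites, val_site) pairs, holding out one site at a time."""
    sites = list(sites)
    # Mirror the documented val_sites options: None means every site,
    # an int means a single site, a list means those sites.
    if val_sites is None:
        val_sites = sites
    elif isinstance(val_sites, int):
        val_sites = [val_sites]
    for val_site in val_sites:
        train_sites = [s for s in sites if s != val_site]
        yield train_sites, val_site


splits = list(leave_one_site_out((0, 1, 2, 3, 4, 5, 6)))
for train_sites, val_site in splits:
    print(f'validate on {val_site}, train on {train_sites}')
```

With the seven default sites this produces seven folds, each training on six sites.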

classmethod kxn_fold(data_files='/projects/pxs/mlclouds/training_data/{year}_{area}_v322/mlclouds_surfrad_{area}_{year}.h5', val_data=None, sites=(0, 1, 2, 3, 4, 5, 6), val_sites=None, config={'epochs_a': 100, 'epochs_b': 90, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 64, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'training_prep_kwargs': {'add_cloud_flag': True, 'filter_clear': False, 'filter_daylight': True, 'filter_sky_class': False, 'nan_option': 'interp', 'sza_lim': 89}, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}, shuffle_train=False, seed=None, xval=<class 'mlclouds.autoxval.XVal'>, catch_nan=False, min_train=1, save_timeseries=False)[source]

Perform cross validation against subsets of training sites
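
One plausible reading of this method, based on the min_train and shuffle_train parameters, is that for each validation site the number of training sites grows from min_train up to all remaining sites. The sketch below shows that enumeration only. The helper kxn_splits is hypothetical, the way subsets are chosen (a growing prefix of the candidate list) is an assumption, and it uses the standard-library random module where the class documentation mentions numpy.random.

```python
import random


def kxn_splits(sites, val_sites=None, min_train=1, shuffle_train=False, seed=None):
    """Yield (train_sites, val_site) pairs with a growing number of training sites."""
    sites = list(sites)
    if val_sites is None:
        val_sites = sites
    elif isinstance(val_sites, int):
        val_sites = [val_sites]
    rng = random.Random(seed)  # assumption: stand-in for seeding numpy.random
    for val_site in val_sites:
        # Never train on the site being validated.
        candidates = [s for s in sites if s != val_site]
        if shuffle_train:
            rng.shuffle(candidates)
        # Assumed subset rule: take the first n candidates for each n.
        for n in range(min_train, len(candidates) + 1):
            yield candidates[:n], val_site


splits = list(kxn_splits((0, 1, 2, 3), min_train=2))
for train_sites, val_site in splits:
    print(f'validate on {val_site}, train on {len(train_sites)} sites: {train_sites}')
```

Raising min_train reduces the number of training runs, which matters because every (training subset, validation site) pair implies a separate model fit.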