mlclouds.autoxval.AutoXVal
- class AutoXVal(sites=(0, 1, 2, 3, 4, 5, 6), val_sites=None, data_files='/projects/pxs/mlclouds/training_data/{year}_{area}_v322/mlclouds_surfrad_{area}_{year}.h5', val_data=None, config={'epochs_a': 100, 'epochs_b': 90, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 64, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'training_prep_kwargs': {'add_cloud_flag': True, 'filter_clear': False, 'filter_daylight': True, 'filter_sky_class': False, 'nan_option': 'interp', 'sza_lim': 89}, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}, shuffle_train=False, seed=None, xval=<class 'mlclouds.autoxval.XVal'>, catch_nan=False, min_train=1, save_timeseries=False)[source]
Bases: object
Run cross validation by varying both the number of sites used for training and the site used for validation.
- Parameters:
sites (list) – Sites to use for training and validation
val_sites (None | int | list) – Site(s) to use for validation, use all if None
data_files (str | list) – Files to use for training and validation
val_data (ValidationData instance | None) – Validation data to use. Load from data_files if None.
config (dict) – Dict of XVal configuration options. See CONFIG for example.
shuffle_train (bool) – Randomize training site list before iterating over # of training sites.
seed (None | int) – Seed for numpy.random if int
xval (Class) – Cross validation class. Used for testing.
catch_nan (bool) – If true, catch loss==nan exceptions and continue analysis
min_train (int) – Minimum # of sites to use for training
save_timeseries (bool) – Save time series data to disk
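The default data_files value is a template with {year} and {area} placeholders. The parameter list does not say how or where these are filled in, so the following is only a minimal sketch of expanding such a template with str.format. The helper name expand_template and the year and area values are hypothetical and are not part of mlclouds.

```python
# Default template copied from the AutoXVal signature.
DATA_FILES = ('/projects/pxs/mlclouds/training_data/{year}_{area}_v322/'
              'mlclouds_surfrad_{area}_{year}.h5')


def expand_template(template, years, areas):
    """Hypothetical helper: one file path per (year, area) pair."""
    return [template.format(year=year, area=area)
            for year in years
            for area in areas]


# The year and area values below are illustrative assumptions only.
files = expand_template(DATA_FILES, years=[2019], areas=['east', 'west'])
for path in files:
    print(path)
```

Since data_files is documented as str | list, a list built this way could presumably be passed in directly, but that is an assumption to check against the source.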
Methods
k_fold([data_files, val_data, sites, ...]) – Perform k-fold validation, only train on n-1 sites
kxn_fold([data_files, val_data, sites, ...]) – Perform cross validation against subsets of training sites
save_stats([path, fname]) – Save validation statistics csv and config as json
- save_stats(path='./stats', fname=None)[source]
Save validation statistics csv and config as json
- Parameters:
path (str) – Path to save stats csv and config json to
fname (str | None) – Filename w/o extension for stats and config files. Auto generate if None.
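As a rough illustration of the behaviour described above (a statistics csv and a config json written under one shared filename stem), here is a standalone sketch using only the standard library. It is not the mlclouds implementation: the function name save_stats_sketch, the row layout, and the timestamp-based auto-generated name are all assumptions.

```python
import csv
import json
import os
import tempfile
from datetime import datetime


def save_stats_sketch(rows, config, path='./stats', fname=None):
    """Hypothetical stand-in: write rows to <fname>.csv and config to <fname>.json."""
    os.makedirs(path, exist_ok=True)
    if fname is None:
        # Assumed naming scheme; the real auto-generated name may differ.
        fname = datetime.now().strftime('xval_%Y%m%d_%H%M%S')
    csv_path = os.path.join(path, fname + '.csv')
    json_path = os.path.join(path, fname + '.json')

    # One csv row per validation run; columns come from the first row.
    with open(csv_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    # The config is stored next to the stats so a run can be reproduced.
    with open(json_path, 'w') as f:
        json.dump(config, f, indent=2)
    return csv_path, json_path


# Illustrative data only; the real statistics columns are not documented here.
demo_rows = [{'val_site': 0, 'n_train': 6}, {'val_site': 1, 'n_train': 6}]
demo_config = {'epochs_a': 100, 'epochs_b': 90, 'learning_rate': 0.0005}
out_dir = tempfile.mkdtemp()
csv_file, json_file = save_stats_sketch(demo_rows, demo_config,
                                        path=out_dir, fname='demo')
```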
- classmethod k_fold(data_files='/projects/pxs/mlclouds/training_data/{year}_{area}_v322/mlclouds_surfrad_{area}_{year}.h5', val_data=None, sites=(0, 1, 2, 3, 4, 5, 6), val_sites=None, config={'epochs_a': 100, 'epochs_b': 90, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 64, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'training_prep_kwargs': {'add_cloud_flag': True, 'filter_clear': False, 'filter_daylight': True, 'filter_sky_class': False, 'nan_option': 'interp', 'sza_lim': 89}, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}, seed=None, xval=<class 'mlclouds.autoxval.XVal'>, catch_nan=False, save_timeseries=False)[source]
Perform k-fold validation, only train on n-1 sites
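The "train on n-1 sites" scheme can be pictured as leave-one-site-out splitting: each validation site is held out in turn and the remaining sites are used for training. The sketch below only enumerates such splits. It does not call mlclouds, and the function name leave_one_site_out and the handling of val_sites are assumptions based on the parameter descriptions above.

```python
def leave_one_site_out(sites, val_sites=None):
    """Hypothetical helper: return (train_sites, val_site) pairs."""
    sites = list(sites)
    if val_sites is None:
        # Documented behaviour: use all sites for validation if None.
        val_sites = sites
    elif isinstance(val_sites, int):
        val_sites = [val_sites]

    folds = []
    for val in val_sites:
        # Train on every site except the one held out for validation.
        train = [s for s in sites if s != val]
        folds.append((train, val))
    return folds


folds = leave_one_site_out((0, 1, 2, 3, 4, 5, 6))
for train, val in folds:
    print(f'validate on site {val}, train on sites {train}')
```

With the default seven sites this gives seven folds, each trained on six sites.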
- classmethod kxn_fold(data_files='/projects/pxs/mlclouds/training_data/{year}_{area}_v322/mlclouds_surfrad_{area}_{year}.h5', val_data=None, sites=(0, 1, 2, 3, 4, 5, 6), val_sites=None, config={'epochs_a': 100, 'epochs_b': 90, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 64, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'training_prep_kwargs': {'add_cloud_flag': True, 'filter_clear': False, 'filter_daylight': True, 'filter_sky_class': False, 'nan_option': 'interp', 'sza_lim': 89}, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}, shuffle_train=False, seed=None, xval=<class 'mlclouds.autoxval.XVal'>, catch_nan=False, min_train=1, save_timeseries=False)[source]
Perform cross validation against subsets of training sites
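One way to read "subsets of training sites", together with the min_train and shuffle_train parameters, is that for each validation site the number of training sites grows from min_train up to all remaining sites. The sketch below enumerates runs under that reading. It is an assumption about the iteration order, not the library's code: the name kxn_plan is hypothetical, and it uses Python's random module where the documentation mentions numpy.random.

```python
import random


def kxn_plan(sites, val_sites=None, min_train=1, shuffle_train=False,
             seed=None):
    """Hypothetical helper: list (val_site, train_sites) runs."""
    sites = list(sites)
    if val_sites is None:
        val_sites = sites
    elif isinstance(val_sites, int):
        val_sites = [val_sites]

    rng = random.Random(seed)  # stand-in for the seeded numpy.random
    plan = []
    for val in val_sites:
        train = [s for s in sites if s != val]
        if shuffle_train:
            # Randomize the site order before growing the training subset.
            rng.shuffle(train)
        for n in range(min_train, len(train) + 1):
            plan.append((val, train[:n]))
    return plan


small_plan = kxn_plan((0, 1, 2))
full_plan = kxn_plan((0, 1, 2, 3, 4, 5, 6))
print(len(small_plan), len(full_plan))
```

Under this reading, setting min_train to the number of sites minus one reduces the plan to the k-fold case, with one run per validation site.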