mlclouds.trainer.Trainer
- class Trainer(train_sites='all', train_files='/projects/pxs/mlclouds/training_data/{year}_{area}_v322/mlclouds_surfrad_{area}_{year}.h5', config={'epochs_a': 100, 'epochs_b': 90, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 64, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'training_prep_kwargs': {'add_cloud_flag': True, 'filter_clear': False, 'filter_daylight': True, 'filter_sky_class': False, 'nan_option': 'interp', 'sza_lim': 89}, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}, test_fraction=None, nsrdb_files=None, cache_pattern=None)[source]
Bases: object
Class to handle the training of the mlclouds phygnn model
Train PHYGNN model
- Parameters:
train_sites (‘all’ | list of int) – Surfrad gids to use for training. If ‘all’, all sites are used.
train_files (list of str | str) – File or list of files to use for training. Filenames must include the four-digit year and satellite indicator (east|west).
config (dict) – Phygnn configuration dict
test_fraction (None | float) – Fraction of the full data set to reserve for testing. Should be between 0 and 1. The test set is randomly selected and dropped from the training set. If None, no test set is reserved.
nsrdb_files (list | str) – NSRDB files including irradiance data for the training sites. These are used to compute the sky class for these locations, which is then used to filter cloud type data for false positives / negatives.
cache_pattern (str) – Optional .csv filepath pattern to save data to, e.g. ./df_{}.csv. This will be used to save self.train_data.df_raw and self.train_data.df_all_sky before they have been split into training and validation sets.
train_kwargs (dict | None) – Dictionary of keyword args for model.train_model, e.g. run_preflight, return_diagnostics, etc.
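The default train_files value is a template with {year} and {area} placeholders, and train_files also accepts a list of paths. The following is a minimal sketch of how such a list might be built with plain Python string formatting. The helper name expand_train_files and the years 2018 and 2019 are illustrative assumptions, not part of mlclouds; the area values east and west follow the parameter description above. The Trainer call is left commented out because it needs the actual training files on disk, and the test_fraction value shown is only an example.

```python
from itertools import product

# Default pattern copied from the class signature above.
TRAIN_FILES_PATTERN = (
    "/projects/pxs/mlclouds/training_data/{year}_{area}_v322/"
    "mlclouds_surfrad_{area}_{year}.h5"
)


def expand_train_files(pattern, years, areas):
    """Hypothetical helper (not part of mlclouds): fill the {year} and
    {area} placeholders for every year/area combination."""
    return [pattern.format(year=y, area=a) for y, a in product(years, areas)]


# Illustrative years; areas match the east|west satellite indicator.
train_files = expand_train_files(
    TRAIN_FILES_PATTERN, years=[2018, 2019], areas=["east", "west"]
)

for path in train_files:
    print(path)

# With the files available, the list could then be passed to the class
# using the documented parameters (values here are examples only):
#
# from mlclouds.trainer import Trainer
# trainer = Trainer(
#     train_sites="all",
#     train_files=train_files,
#     test_fraction=0.2,           # reserve 20% of the data for testing
#     cache_pattern="./df_{}.csv",  # optional .csv cache for the raw data
# )
```

Each generated path contains the four-digit year and the satellite indicator, which the train_files description lists as a requirement for filenames.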
Methods