mlclouds.trainer.Trainer

class Trainer(train_sites='all', train_files='/projects/pxs/mlclouds/training_data/{year}_{area}_v322/mlclouds_surfrad_{area}_{year}.h5', config={'epochs_a': 100, 'epochs_b': 90, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 64, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'training_prep_kwargs': {'add_cloud_flag': True, 'filter_clear': False, 'filter_daylight': True, 'filter_sky_class': False, 'nan_option': 'interp', 'sza_lim': 89}, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}, test_fraction=None, nsrdb_files=None, cache_pattern=None)

Bases: object

Class to handle the training of the mlclouds phygnn model

Train PHYGNN model

Parameters:
  • train_sites (‘all’ | list of int) – SURFRAD gids to use for training. If ‘all’, every site is used.

  • train_files (list of str | str) – File or list of files to use for training. Filenames must include the four-digit year and satellite indicator (east|west).

  • config (dict) – Phygnn configuration dict

  • test_fraction (None | float) – Fraction of the full data set to reserve for testing. Should be between 0 and 1. The test set is randomly selected and dropped from the training set. If None, no test set is reserved.

  • nsrdb_files (list | str | None) – NSRDB files including irradiance data for the training sites. These are used to compute the sky class at these locations, which is then used to filter cloud type data for false positives and false negatives.

  • cache_pattern (str) – Optional .csv filepath pattern to save data to, e.g. ./df_{}.csv. It is used to save self.train_data.df_raw and self.train_data.df_all_sky before they are split into training and validation sets.

  • train_kwargs (dict | None) – Dictionary of keyword arguments for model.train_model, e.g. run_preflight, return_diagnostics, etc.
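
The train_files and test_fraction descriptions above can be illustrated with a small standalone sketch. It does not import mlclouds and is not how Trainer is implemented. The helpers expand_train_files, parse_year_area and split_test_rows are hypothetical names introduced only for this example, and the seed argument is an assumption. Only the default path pattern is taken from the signature above.

```python
import random
import re

# Default train_files pattern, copied from the class signature above.
TRAIN_PATTERN = ('/projects/pxs/mlclouds/training_data/{year}_{area}_v322/'
                 'mlclouds_surfrad_{area}_{year}.h5')


def expand_train_files(years, areas, pattern=TRAIN_PATTERN):
    """Hypothetical helper: fill the {year} and {area} placeholders."""
    return [pattern.format(year=y, area=a) for y in years for a in areas]


def parse_year_area(filepath):
    """Hypothetical helper: read the four-digit year and the satellite
    indicator (east|west) back out of a training filename."""
    name = filepath.split('/')[-1]
    year = re.search(r'(?<!\d)\d{4}(?!\d)', name)
    area = re.search(r'east|west', name)
    if year is None or area is None:
        raise ValueError('Filename needs a year and east|west: ' + name)
    return int(year.group(0)), area.group(0)


def split_test_rows(n_rows, test_fraction, seed=0):
    """Illustrative random split, assuming rows are addressed by index.

    Returns (train_idx, test_idx). The test rows are drawn at random and
    removed from the training rows. None reserves no test set.
    """
    if test_fraction is None:
        return list(range(n_rows)), []
    if not 0 < test_fraction < 1:
        raise ValueError('test_fraction should be between 0 and 1')
    n_test = int(round(n_rows * test_fraction))
    test_idx = sorted(random.Random(seed).sample(range(n_rows), n_test))
    held_out = set(test_idx)
    train_idx = [i for i in range(n_rows) if i not in held_out]
    return train_idx, test_idx


files = expand_train_files([2019], ['east', 'west'])
train_idx, test_idx = split_test_rows(100, test_fraction=0.2)
```

A list built this way could be passed as train_files, since that parameter accepts either a single file or a list of files.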

Methods