mlclouds.data_handlers.TrainData
- class TrainData(train_files, train_sites='all', config={'epochs_a': 100, 'epochs_b': 90, 'features': ['solar_zenith_angle', 'cloud_type', 'refl_0_65um_nom', 'refl_0_65um_nom_stddev_3x3', 'refl_3_75um_nom', 'temp_3_75um_nom', 'temp_11_0um_nom', 'temp_11_0um_nom_stddev_3x3', 'cloud_probability', 'cloud_fraction', 'air_temperature', 'dew_point', 'relative_humidity', 'total_precipitable_water', 'surface_albedo'], 'hidden_layers': [{'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}, {'activation': 'relu', 'dropout': 0.1, 'units': 256}], 'learning_rate': 0.0005, 'loss_weights_a': [1, 0], 'loss_weights_b': [0.5, 0.5], 'metric': 'relative_mae', 'n_batch': 64, 'one_hot_categories': {'flag': ['clear', 'ice_cloud', 'water_cloud', 'bad_cloud']}, 'p_fun': 'p_fun_all_sky', 'p_kwargs': {'loss_terms': ['mae_ghi']}, 'phygnn_seed': 0, 'surfrad_window_minutes': 15, 'training_prep_kwargs': {'add_cloud_flag': True, 'filter_clear': False, 'filter_daylight': True, 'filter_sky_class': False, 'nan_option': 'interp', 'sza_lim': 89}, 'y_labels': ['cld_opd_dcomp', 'cld_reff_dcomp']}, test_fraction=None, nsrdb_files=None, cache_pattern=None)
Bases: object

Load and prep training data.
- Parameters:
train_files (list | str) – File or list of files to use for training. Filenames must include the four-digit year.
train_sites (‘all’ | list of int) – SURFRAD site gids to use for training. If ‘all’, use all available sites.
config (dict) – Dict of configuration options. See CONFIG for an example.
test_fraction (None | float) – Fraction of the full data set to reserve for testing, between 0 and 1. The test set is randomly selected and dropped from the training set. If None, no test set is reserved.
nsrdb_files (list) – NSRDB files containing irradiance data for the training sites. These are used to compute the sky class at each location, which is then used to filter false positives / negatives out of the cloud type data. Each filename must include a four-digit year and an east / west label.
cache_pattern (str) – File path pattern for saving training data, e.g. ./df_{}.csv. This will be used to save self.x, self.y, and self.p.
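Because train_files filenames must include the four-digit year, a loader has to recover that year from each name. The sketch below shows one way to do that; `extract_year` and its regex are hypothetical and are not the pattern TrainData itself uses. It takes the first run of exactly four digits in the file's basename, so digits in parent directory names are ignored.

```python
import os
import re

# Hypothetical helper: this regex is an assumption about how the year appears
# in a filename; it is not the pattern TrainData itself uses.
YEAR_RE = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def extract_year(fpath):
    """Return the first standalone four-digit number in the file's basename."""
    fname = os.path.basename(fpath)
    match = YEAR_RE.search(fname)
    if match is None:
        raise ValueError(f"No four-digit year found in {fname!r}")
    return int(match.group(1))


print(extract_year("/data/mlclouds_surfrad_2018.h5"))  # 2018
```

The lookarounds reject longer digit runs (such as a 5-digit gid) so they are not mistaken for a year.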
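The config default in the class signature is a plain nested dict. To change only a few options, one approach is to deep-copy a base config and override top-level keys, so nested dicts such as training_prep_kwargs are not shared with the base. This is a sketch only: `make_config` is a hypothetical helper, `DEFAULT_CONFIG` is an abbreviated subset of the signature's default, and whether TrainData fills in keys missing from a partial config is not stated here, so passing a complete dict is the safer assumption.

```python
import copy

# Abbreviated subset of the default config from the TrainData signature;
# the real default contains many more keys (features, hidden_layers, ...).
DEFAULT_CONFIG = {
    "epochs_a": 100,
    "epochs_b": 90,
    "learning_rate": 0.0005,
    "n_batch": 64,
    "training_prep_kwargs": {"filter_daylight": True, "sza_lim": 89},
}


def make_config(base, **overrides):
    """Return a deep copy of base with top-level keys replaced by overrides."""
    config = copy.deepcopy(base)
    config.update(overrides)
    return config


config = make_config(DEFAULT_CONFIG, epochs_a=10, learning_rate=1e-3)
print(config["epochs_a"], config["learning_rate"])  # 10 0.001
```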
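The test_fraction option describes a random holdout: a fraction of the data is sampled as a test set and dropped from the training set. A minimal sketch of that idea on row indices follows; `split_test` is hypothetical, and TrainData may sample differently (for example by timestamp or site rather than by row). A fixed seed is used so the split is reproducible.

```python
import random


def split_test(n_rows, test_fraction=None, seed=0):
    """Return (train_idx, test_idx) lists for a random holdout of rows.

    If test_fraction is None, every row stays in the training set.
    """
    indices = list(range(n_rows))
    if test_fraction is None:
        return indices, []
    if not 0 <= test_fraction <= 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_test = int(round(test_fraction * n_rows))
    # Sample the test rows, then drop them from the training set.
    test_idx = sorted(rng.sample(indices, n_test))
    test_set = set(test_idx)
    train_idx = [i for i in indices if i not in test_set]
    return train_idx, test_idx


train_idx, test_idx = split_test(100, test_fraction=0.2)
print(len(train_idx), len(test_idx))  # 80 20
```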
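A cache_pattern such as ./df_{}.csv contains a {} placeholder that is filled in to produce one file per saved table. The sketch below expands a pattern with Python's str.format; the `cache_paths` helper and the assumption that the placeholder is filled with the names x, y, and p are illustrative, not confirmed TrainData behavior.

```python
def cache_paths(cache_pattern, names=("x", "y", "p")):
    """Expand a pattern like './df_{}.csv' into one path per name.

    Assumption: each name is substituted directly into the '{}' placeholder.
    """
    if "{}" not in cache_pattern:
        raise ValueError("cache_pattern must contain a '{}' placeholder")
    return {name: cache_pattern.format(name) for name in names}


print(cache_paths("./df_{}.csv"))
# {'x': './df_x.csv', 'y': './df_y.csv', 'p': './df_p.csv'}
```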
Methods
cache_exists(cache_pattern) – Check if cache files for df_raw and df_all_sky exist.
load_all_data(fp_pattern) – Load all df_raw / df_all_sky from csv files.
save_all_data(fp_pattern) – Save all raw / all_sky data to disk.
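A cache check like cache_exists comes down to expanding the file pattern and testing whether every resulting file is on disk. The standalone function below is a hypothetical sketch, not the method's actual implementation; substituting the names raw and all_sky into the {} placeholder is an assumption based on the df_raw / df_all_sky description.

```python
import os


def cache_files_exist(cache_pattern, names=("raw", "all_sky")):
    """Return True only if every expanded cache file exists on disk.

    Assumption: cache files are named by substituting each entry of
    `names` into the '{}' placeholder of cache_pattern.
    """
    return all(os.path.exists(cache_pattern.format(name)) for name in names)


print(cache_files_exist("/nonexistent_dir/df_{}.csv"))  # False
```

Requiring all files (rather than any) avoids loading a partial cache where one table is missing.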