sup3r.postprocessing.collectors.nc.CollectorNC
- class CollectorNC(file_paths)
Bases: BaseCollector
Sup3r NetCDF file collection framework.
- Parameters:
file_paths (list | str) – Explicit list of str file paths that will be sorted and collected or a single string with unix-style /search/patt*ern.<ext>. Files should have non-overlapping time_index and spatial domains.
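Because file_paths can be either an explicit list or a single glob pattern, the normalization step can be sketched as follows. This is a hypothetical helper, not part of the sup3r API. It assumes that "sorted" means plain lexicographic order, which gives chronological order when chunk indices are zero padded.

```python
import glob


def resolve_file_paths(file_paths):
    """Hypothetical helper: return a sorted list of file paths.

    A single string is treated as a unix-style glob pattern
    (e.g. "/out/sup3r_*.nc"); anything else is treated as an
    explicit iterable of paths.
    """
    if isinstance(file_paths, str):
        # Expand the pattern against the filesystem
        paths = glob.glob(file_paths)
    else:
        paths = list(file_paths)
    # Assumption: lexicographic sort, which orders zero-padded chunk indices correctly
    return sorted(paths)
```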
Methods
- collect(file_paths, out_file[, features, ...]) – Collect data files from a directory into one output file.
- get_chunk_indices(file) – Get spatial and temporal chunk indices from the given file name.
- get_node_cmd(config) – Get a CLI call to collect data.
- get_time_dim_name(filepath) – Get the name of the time dimension in the given file.
- group_spatial_chunks([res_kwargs]) – Group same spatial chunks together to get a list of files with the same spatial footprint but different times.
- write_data(out_file, dsets, time_index, ...) – Write a list of datasets to out_file.
- classmethod collect(file_paths, out_file, features='all', log_level=None, log_file=None, overwrite=True, res_kwargs=None, cacher_kwargs=None)
Collect data files from a directory into one output file.
- Filename requirements:
Each file name should end with “.nc”.
- Parameters:
file_paths (list | str) – Explicit list of str file paths that will be sorted and collected or a single string with unix-style /search/patt*ern.nc.
out_file (str) – File path of final output file.
features (list | str) – List of dsets to collect. If ‘all’, then all data_vars will be collected.
log_level (str | None) – Desired log level. None will not initialize logging.
log_file (str | None) – Target log file. None logs to stdout.
overwrite (bool) – Whether to overwrite an existing output file.
res_kwargs (dict | None) – Dictionary of kwargs to pass to xarray.open_mfdataset.
cacher_kwargs (dict | None) – Dictionary of kwargs to pass to Cacher._write_single.
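The argument rules documented for collect() (".nc" inputs, features='all' meaning every data variable, and the overwrite flag) can be sketched as a pre-check step. This is a hypothetical helper, not sup3r code. Two behaviors are assumptions: that an existing out_file with overwrite=False means "do nothing", and that a single feature name given as a string is wrapped in a list.

```python
import os


def prepare_collection(file_paths, out_file, features, data_vars, overwrite=True):
    """Hypothetical pre-checks sketched from the collect() docs.

    Returns the list of features to collect, or None when the existing
    output should be left untouched (assumed behavior).
    """
    # Filename requirement: every input should end with ".nc"
    bad = [p for p in file_paths if not p.endswith(".nc")]
    if bad:
        raise ValueError(f"Expected .nc files, got: {bad}")

    # Assumption: overwrite=False with an existing output means skip collection
    if os.path.exists(out_file) and not overwrite:
        return None

    if features == "all":
        # 'all' expands to every data variable found in the inputs
        features = list(data_vars)
    elif isinstance(features, str):
        # Assumption: a single feature name may be passed as a string
        features = [features]

    missing = [f for f in features if f not in data_vars]
    if missing:
        raise KeyError(f"Features not found in inputs: {missing}")
    return list(features)
```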
- group_spatial_chunks(res_kwargs=None)
Group same spatial chunks together to get list of files with same spatial footprint but different times. Return Loader instances for each spatial chunk with combined times.
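The grouping idea can be sketched with plain file names. The real method returns Loader instances with combined times. This sketch only returns the grouped paths. It assumes (not confirmed here) that each name ends in "_{temporal_index}_{spatial_index}.nc" with zero-padded integers.

```python
import os
from collections import defaultdict


def group_by_spatial_chunk(file_paths):
    """Hypothetical sketch: map spatial index -> paths ordered by temporal index."""
    groups = defaultdict(list)
    for path in file_paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        # Assumed naming convention: ..._{temporal}_{spatial}.nc
        t_idx, s_idx = stem.split("_")[-2:]
        groups[s_idx].append((t_idx, path))
    # Zero padding makes string order match numeric order
    return {
        s_idx: [path for _, path in sorted(items)]
        for s_idx, items in sorted(groups.items())
    }
```

Each group then has one spatial footprint and a time-ordered list of files, which is what gets concatenated along time.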
- static get_chunk_indices(file)
Get spatial and temporal chunk indices from the given file name.
- Returns:
temporal_chunk_index (str) – Zero padded integer for the temporal chunk index
spatial_chunk_index (str) – Zero padded integer for the spatial chunk index
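A minimal sketch of the parsing, under the assumption that file names end in "_{temporal}_{spatial}.<ext>". The naming convention and this function body are illustrative and are not taken from the sup3r source. Both indices stay strings so their zero padding is preserved, matching the documented return types.

```python
import os


def get_chunk_indices(file):
    """Hypothetical re-implementation: return (temporal, spatial) chunk indices."""
    # Drop directory and extension, e.g. "/out/sup3r_file_000003_000012.nc"
    stem = os.path.splitext(os.path.basename(file))[0]
    temporal_chunk_index, spatial_chunk_index = stem.split("_")[-2:]
    return temporal_chunk_index, spatial_chunk_index
```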
- classmethod get_node_cmd(config)
Get a CLI call to collect data.
- Parameters:
config (dict) – sup3r collection config with all necessary args and kwargs to run data collection.
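One way to turn a config dict into a CLI call is to embed it as JSON in a `python -c` one-liner that calls collect(**config). The sketch below does this. The import path comes from this page. The command layout and the assumption that every config key is a collect() keyword are hypothetical, not the actual output of get_node_cmd.

```python
import json
import shlex


def build_node_cmd(config):
    """Hypothetical sketch: build a shell command that runs CollectorNC.collect."""
    payload = json.dumps(config)
    code = (
        "import json; "
        "from sup3r.postprocessing.collectors.nc import CollectorNC; "
        # repr() turns the JSON text into a valid Python string literal
        f"CollectorNC.collect(**json.loads({payload!r}))"
    )
    # shlex.quote protects the quotes inside the one-liner from the shell
    return "python -c " + shlex.quote(code)
```

For example, build_node_cmd({"file_paths": "./chunks/*.nc", "out_file": "out.nc"}) yields a single shell-safe string that a job scheduler could run on a node.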
- static get_time_dim_name(filepath)
Get the name of the time dimension in the given file.
- Parameters:
filepath (str) – Path to the file
- Returns:
time_key (str) – Name of the time dimension in the given file.
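The lookup can be sketched without opening a file by passing in the dataset's dimension names. In practice these would come from something like the dims of an opened xarray Dataset. The candidate names and the "time" fallback are assumptions for illustration, not documented behavior.

```python
def get_time_dim_name(dims, candidates=("time", "Time", "Times")):
    """Hypothetical sketch: return the first candidate time dimension found.

    `dims` is any container of dimension names; falling back to "time"
    when nothing matches is an assumption.
    """
    for name in candidates:
        if name in dims:
            return name
    return "time"
```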
- classmethod write_data(out_file, dsets, time_index, data_list, meta, global_attrs=None)
Write list of datasets to out_file.
- Parameters:
out_file (str) – Pre-existing H5 file output path
dsets (list) – List of datasets to write to out_file.
time_index (pd.DatetimeIndex) – Pandas datetime index to use for the file time_index.
data_list (list) – List of np.ndarray objects to write to out_file
meta (pd.DataFrame) – Full meta dataframe for the final output data.
global_attrs (dict) – Namespace of file-global attributes for the final output data.
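Because dsets and data_list are parallel lists, a quick consistency check before writing can be sketched as below. This is a hypothetical helper. It assumes each array is laid out as (len(time_index), len(meta)), which is the usual time-by-site layout for meta-indexed output but is not stated on this page.

```python
def pair_datasets(dsets, data_list, time_index, meta):
    """Hypothetical check: pair dataset names with arrays and validate shapes.

    Assumes each array has shape (len(time_index), len(meta)); works with
    any array-like exposing `.shape` (e.g. np.ndarray).
    """
    if len(dsets) != len(data_list):
        raise ValueError(
            f"{len(dsets)} dataset names but {len(data_list)} arrays"
        )
    expected = (len(time_index), len(meta))
    paired = {}
    for name, arr in zip(dsets, data_list):
        if tuple(arr.shape) != expected:
            raise ValueError(f"{name}: shape {arr.shape} != expected {expected}")
        paired[name] = arr
    return paired
```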