reVeal characterize
Execute the characterize step from a config file.
Characterize a vector grid based on specified raster and vector datasets. Outputs a new GeoPackage containing the input grid with added attributes for the user-specified characterizations.
The general structure for calling this CLI command is given below
(add --help to print help info to the terminal).
Usage

```shell
reVeal characterize [OPTIONS]
```
Options
- -c, --config_file <config_file>

**Required.** Path to the `characterize` configuration file. Below is a sample template config (JSON):

```json
{
  "execution_control": {
    "option": "local",
    "allocation": "[REQUIRED IF ON HPC]",
    "walltime": "[REQUIRED IF ON HPC]",
    "qos": "normal",
    "memory": null,
    "queue": null,
    "feature": null,
    "conda_env": null,
    "module": null,
    "sh_script": null,
    "keep_sh": false,
    "num_test_nodes": null,
    "max_workers": null
  },
  "log_directory": "./logs",
  "log_level": "INFO",
  "data_dir": "[REQUIRED]",
  "grid": "[REQUIRED]",
  "characterizations": "[REQUIRED]",
  "expressions": "[REQUIRED]"
}
```
The same template in YAML:

```yaml
execution_control:
  option: local
  allocation: '[REQUIRED IF ON HPC]'
  walltime: '[REQUIRED IF ON HPC]'
  qos: normal
  memory: null
  queue: null
  feature: null
  conda_env: null
  module: null
  sh_script: null
  keep_sh: false
  num_test_nodes: null
  max_workers: null
log_directory: ./logs
log_level: INFO
data_dir: '[REQUIRED]'
grid: '[REQUIRED]'
characterizations: '[REQUIRED]'
expressions: '[REQUIRED]'
```
The same template in TOML:

```toml
log_directory = "./logs"
log_level = "INFO"
data_dir = "[REQUIRED]"
grid = "[REQUIRED]"
characterizations = "[REQUIRED]"
expressions = "[REQUIRED]"

[execution_control]
option = "local"
allocation = "[REQUIRED IF ON HPC]"
walltime = "[REQUIRED IF ON HPC]"
qos = "normal"
keep_sh = false
```
Parameters
- `execution_control` : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({'local', 'kestrel', 'eagle', 'awspc', 'slurm', 'peregrine'}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The "slurm" option is a catch-all for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If "local", no other HPC-specific keys are required in `execution_control` (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: this should be one of {'standby', 'normal', 'high'}. Note that 'high' priority doubles the AU cost. By default, "normal".
- memory:
(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler's default memory limit. For Kestrel users: if you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job; otherwise you will be limited to the standard node memory size (250 GB).
- max_workers:
([int, NoneType], optional) Maximum number of workers to use for multiprocessing when running applicable methods in parallel. By default, None, which uses all available workers for applicable methods. Note that this value is only applied to characterizations where `max_workers` is not specified in the characterization-level configuration.
- queue:
(str, optional; PBS ONLY) HPC queue to submit the job to. Examples include: 'debug', 'short', 'batch', 'batch-h', 'long', etc. By default, None, which uses "test_queue".
- feature:
(str, optional) Additional flags for the SLURM job (e.g. "-p debug"). By default, None, which does not specify any additional flags.
- conda_env:
(str, optional) Name of the conda environment to activate. By default, None, which does not load any environments.
- module:
(str, optional) Module to load. By default, None, which does not load any modules.
- sh_script:
(str, optional) Extra shell script to run before the command call. By default, None, which does not run any scripts.
- keep_sh:
(bool, optional) Option to keep the HPC submission script on disk. Only has an effect if executing on the HPC. By default, False, which purges the submission scripts after each job is submitted.
- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.
Only the `option` key is required for local execution. For execution on the HPC, the `allocation` and `walltime` keys are also required. All other options are populated with default values, as seen above.
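As an illustration, a minimal `execution_control` block for an HPC run might look like the following (the allocation handle and walltime are placeholders; adjust them for your project):

```json
{
  "execution_control": {
    "option": "kestrel",
    "allocation": "myproject",
    "walltime": 4,
    "qos": "normal"
  }
}
```

For local execution, `{"execution_control": {"option": "local"}}` alone suffices.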
- `log_directory` : str
Path to the directory where logs should be written. The path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- `log_level` : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of the desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- `data_dir` : str
Path to parent directory containing all geospatial raster and vector datasets to be used for grid characterization.
- `grid` : str
Path to the gridded vector dataset for which characterization will be performed. Must be an existing vector polygon dataset in a format that can be opened by pyogrio. It does not strictly need to be a grid, but some functionality may not work if it is not.
- `characterizations` : dict
Characterizations to be performed. Must be a dictionary keyed by the name of the output attribute for each characterization. Each value must be another dictionary with the following keys:
- dset: String indicating the relative path within `data_dir` to the dataset to be characterized.
- method: String indicating the characterization method to be performed. Refer to `reVeal.config.characterize.VALID_CHARACTERIZATION_METHODS`.
- attribute: Attribute to summarize. Only required for certain methods. Default is None/null.
- weights_dset: String indicating the relative path within `data_dir` to the dataset to be used as weights. Only applies to characterization methods for rasters; ignored otherwise.
- neighbor_order: Integer indicating the order of neighbors to include in the characterization of each grid cell. For example, `neighbor_order = 1` would include first-order queen's-case neighbors. Optional; default is 0, which does not include neighbors.
- buffer_distance: Float indicating the buffer distance to apply in the characterization of each grid cell. Units are based on the CRS of the input grid dataset. For instance, a value of 500 in CRS EPSG:5070 would apply a buffer of 500 m to each grid cell before characterization. Optional; default is 0, which does not apply a buffer.
- parallel: Boolean indicating whether to run the characterization in parallel. This option only applies to methods flagged as `supports_parallel` in `reVeal.config.VALID_CHARACTERIZATION_METHODS`. Default is True, which runs applicable methods in parallel and has no effect on other methods. This value should only be changed to False for small input grids, where the overhead of setting up parallel processing outweighs the speedup of running operations in parallel. As a general rule of thumb, as long as the number of grid cells in your grid is an order of magnitude larger than the number of cores available, using `parallel=True` should yield improved performance.
- max_workers: Integer indicating the number of workers to use for parallel processing. Only applied to methods that support parallel processing. This input takes precedence over the top-level `max_workers` from the `execution_control` block (if any). If neither is specified, all available workers will be used for parallel processing.
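To make the structure concrete, here is a hypothetical `characterizations` entry. The output attribute name, dataset path, and method name below are illustrative only; they must match a dataset under your `data_dir` and a method listed in `reVeal.config.characterize.VALID_CHARACTERIZATION_METHODS`:

```json
{
  "characterizations": {
    "mean_slope": {
      "dset": "rasters/slope_percent.tif",
      "method": "mean",
      "neighbor_order": 0,
      "buffer_distance": 0,
      "parallel": true,
      "max_workers": 8
    }
  }
}
```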
- `expressions` : dict
Additional expressions to be calculated. Must be a dictionary keyed by the name of the output attribute for each expression. Each value must be a string indicating the expression to be calculated. Expression strings can reference one or more attributes/keys defined in the `characterizations` dictionary.
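reVeal's exact expression engine is not specified above, but conceptually each expression string is evaluated against the characterized grid's attribute table to produce a new attribute. A minimal sketch of that idea, using `pandas.DataFrame.eval` as one plausible mechanism (the attribute names `developable_area` and `total_area` are invented for illustration):

```python
import pandas as pd

# Hypothetical characterization outputs for a three-cell grid.
grid = pd.DataFrame({
    "developable_area": [10.0, 4.0, 0.0],
    "total_area": [10.0, 10.0, 10.0],
})

# Each expression is keyed by its output attribute name and may
# reference attributes produced by the characterizations step.
expressions = {
    "developable_fraction": "developable_area / total_area",
}

# Evaluate each expression and store the result as a new column.
for name, expr in expressions.items():
    grid[name] = grid.eval(expr)

print(grid["developable_fraction"].tolist())  # → [1.0, 0.4, 0.0]
```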
Note that you may remove any keys with a null value if you do not intend to update them yourself.