reVeal.cli.characterize.run

run(data_dir, grid, characterizations, expressions, out_dir, max_workers=None, _local=True)

Run grid characterization.

Characterize a vector grid based on specified raster and vector datasets. Outputs a new GeoPackage containing the input grid with added attributes for the user-specified characterizations.

Parameters:
  • data_dir (str) – Path to parent directory containing all geospatial raster and vector datasets to be used for grid characterization.

  • grid (str) – Path to gridded vector dataset for which characterization will be performed. Must be an existing vector polygon dataset in a format that can be opened by pyogrio. Does not strictly need to be a grid, but some functionality may not work if it is not.

  • characterizations (dict) – Characterizations to be performed. Must be a dictionary keyed by the name of the output attribute for each characterization. Each value must be another dictionary with the following keys:

    • dset: String indicating relative path within data_dir to dataset to be characterized.

    • method: String indicating characterization method to be performed. Refer to reVeal.config.characterize.VALID_CHARACTERIZATION_METHODS.

    • attribute: Attribute to summarize. Only required for certain methods. Default is None/null.

    • weights_dset: String indicating relative path within data_dir to dataset to be used as weights. Only applies to characterization methods for rasters; ignored otherwise.

    • neighbor_order: Integer indicating the order of neighbors to include in the characterization of each grid cell. For example, neighbor_order = 1 would include first-order queen’s case neighbors. Optional, default is 0, which does not include neighbors.

    • buffer_distance: Float indicating buffer distance to apply in the characterization of each grid cell. Units are based on the CRS of the input grid dataset. For instance, a value of 500 in CRS EPSG:5070 would apply a buffer of 500 m to each grid cell before characterization. Optional, default is 0, which does not apply a buffer.

    • parallel: Boolean indicating whether to run the characterization in parallel. This option only applies to methods specified as supports_parallel in reVeal.config.VALID_CHARACTERIZATION_METHODS. Default is True, which runs applicable methods in parallel and has no effect on other methods. Set this to False only for small input grids, where the overhead of setting up parallel processing outweighs the speedup of running operations in parallel. As a general rule of thumb, as long as the number of grid cells in your grid is at least an order of magnitude larger than the number of cores available, using parallel=True should yield improved performance.

    • max_workers: Integer indicating the number of workers to use for parallel processing. Only applied to methods that support parallel processing. This input takes precedence over the top-level max_workers from the execution_control block (if any). If neither is specified, all available workers will be used for parallel processing.
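The structure described above can be sketched as a plain Python dictionary, together with a small helper that fills in the documented defaults. This is only an illustration: the helper `with_defaults`, the dataset paths, and the method names (`"mean"`, `"feature count"`) are hypothetical placeholders, not taken from reVeal. Consult VALID_CHARACTERIZATION_METHODS for the method names that are actually accepted.

```python
# Defaults as documented above. max_workers=None means "not set at this level".
DOCUMENTED_DEFAULTS = {
    "attribute": None,
    "neighbor_order": 0,
    "buffer_distance": 0,
    "parallel": True,
    "max_workers": None,
}

# Assumption: dset and method have no documented default, so treat them as required.
REQUIRED_KEYS = ("dset", "method")


def with_defaults(characterizations):
    """Return a copy of the dictionary with documented defaults filled in."""
    filled = {}
    for out_attr, spec in characterizations.items():
        missing = [key for key in REQUIRED_KEYS if key not in spec]
        if missing:
            raise ValueError(f"{out_attr}: missing required keys {missing}")
        # Values given by the user override the defaults.
        filled[out_attr] = {**DOCUMENTED_DEFAULTS, **spec}
    return filled


# Hypothetical example: dataset paths and method names are placeholders.
characterizations = {
    "mean_slope": {
        "dset": "rasters/slope.tif",
        "method": "mean",
        "neighbor_order": 1,
    },
    "road_count": {
        "dset": "vectors/roads.gpkg",
        "method": "feature count",
        "buffer_distance": 500.0,
        "parallel": False,
    },
}

filled = with_defaults(characterizations)
```

Each top-level key (`mean_slope`, `road_count`) becomes the name of an output attribute on the grid.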

  • expressions (dict) – Additional expressions to be calculated. Must be a dictionary keyed by the name of the output attribute for each expression. Each value must be a string indicating the expression to be calculated. Expression strings can reference one or more attributes/keys defined in the characterizations dictionary.
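As an illustration of how expressions relate to characterizations, the sketch below builds a small expressions dictionary and checks that every name used in an expression is a key of the characterizations dictionary. It assumes the expression strings are valid Python-style arithmetic, which this page does not state, and it says nothing about how reVeal itself parses or evaluates them. The attribute names and both helper functions are hypothetical.

```python
import ast


def referenced_names(expression):
    """Return the set of bare names used in a Python-style expression string."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def unknown_references(expressions, characterizations):
    """Map each output attribute to the names its expression uses that are
    not keys of the characterizations dictionary."""
    known = set(characterizations)
    problems = {}
    for out_attr, expression in expressions.items():
        # Note: function names such as abs would also be reported here.
        unknown = referenced_names(expression) - known
        if unknown:
            problems[out_attr] = sorted(unknown)
    return problems


# Hypothetical attribute names; the per-characterization settings are omitted.
characterizations = {"developable_area": {}, "total_area": {}}
expressions = {
    "developable_fraction": "developable_area / total_area",
    "bad_example": "developable_area / cell_area",
}

problems = unknown_references(expressions, characterizations)
```

Here `problems` flags `bad_example`, because `cell_area` is not produced by any characterization.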

  • out_dir (str) – Output parent directory. Results will be saved to a file named “grid_char.gpkg”.

  • max_workers ([int, NoneType], optional) – Maximum number of workers to use for multiprocessing when running applicable methods in parallel. Defaults to None, in which case all available workers are used for applicable methods. Note that this value is only applied to characterizations that do not specify max_workers in their own characterization-level configuration.

  • _local (bool) – Flag indicating whether the code is being run locally or via HPC job submissions. NOTE: This is not a user-provided parameter - it is determined dynamically based on whether config[“execution_control”][“option”] == “local” (defaults to True if not specified).
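The precedence between the characterization-level and top-level max_workers settings can be sketched as follows. The function `resolve_max_workers` is hypothetical, and using `os.cpu_count()` to stand in for "all available workers" is an assumption; reVeal may determine the worker count differently.

```python
import os


def resolve_max_workers(characterization, top_level_max_workers=None):
    """Pick the worker count for one characterization.

    Order: characterization-level value, then top-level value, then all
    available workers (approximated here by the CPU count).
    """
    level = characterization.get("max_workers")
    if level is not None:
        return level
    if top_level_max_workers is not None:
        return top_level_max_workers
    # os.cpu_count() can return None, so fall back to a single worker.
    return os.cpu_count() or 1


per_characterization = resolve_max_workers({"max_workers": 2}, top_level_max_workers=8)
from_top_level = resolve_max_workers({}, top_level_max_workers=8)
from_machine = resolve_max_workers({})
```

In this sketch `per_characterization` is 2 and `from_top_level` is 8, while `from_machine` depends on the host.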