reVReports.characterizations.recast_categories

recast_categories(df, col, lkup, cell_size_sq_km=None)

Recast the JSON strings found in df[col].

This function expands the JSON strings in df[col] into new columns in the dataframe. Each key in the embedded JSON strings becomes a new column, named and scaled according to lkup and cell_size_sq_km.

Parameters:
  • df (pandas.DataFrame) – Input pandas dataframe

  • col (str) – Name of the column in df containing embedded JSON strings (e.g., '{"0": 44.3, "1": 3.7}').

  • lkup (dict) – Dictionary used to map keys in the JSON strings to new, more meaningful names. Following the example above, this might be {"0": "Grassland", "1": "Water"}. This follows the same format one could use for pandas.DataFrame.rename(columns=lkup).

  • cell_size_sq_km (int or float, optional) –

    Optional value indicating the cell size of the characterization data being recast.

    If specified, it has two effects. First, it is used to convert the JSON values to areas in square kilometers during the recast. Second, every recast column name specified in lkup gets the suffix _area_sq_km. Continuing the examples above, if cell_size_sq_km=0.0081, the value 44.3 (key "0", mapped to "Grassland") would be multiplied by 0.0081, producing a new value of 0.35883. This value would be stored in a new column named "Grassland_area_sq_km".

    If not specified (or None), no conversion to area will be applied, values from the JSON will be passed through (or filled with 0 if missing), and column names specified in lkup will be used verbatim in the output dataframe.

    By default, None.
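The per-row arithmetic and naming described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `recast_row` is a hypothetical helper, the input is assumed to be valid JSON with double-quoted keys, and keys absent from lkup are simply ignored here (the source does not say how they are handled).

```python
import json


def recast_row(json_str, lkup, cell_size_sq_km=None):
    """Hypothetical helper: recast one embedded JSON string into named values."""
    values = json.loads(json_str)
    out = {}
    for key, name in lkup.items():
        # Categories missing from the JSON are filled with 0.
        value = values.get(key, 0)
        if cell_size_sq_km is None:
            # No conversion: pass the value through under the lkup name.
            out[name] = value
        else:
            # Convert to area and add the _area_sq_km suffix.
            out[f"{name}_area_sq_km"] = value * cell_size_sq_km
    return out


lkup = {"0": "Grassland", "1": "Water"}
raw = '{"0": 44.3, "1": 3.7}'

plain = recast_row(raw, lkup)
areas = recast_row(raw, lkup, cell_size_sq_km=0.0081)
print(plain)
print(areas)
```

Without cell_size_sq_km the result keeps the names "Grassland" and "Water"; with it, the keys become "Grassland_area_sq_km" and "Water_area_sq_km", and the Grassland value is approximately 0.35883.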

Returns:

pandas.DataFrame – New pandas dataframe with additional recast columns appended to the input dataframe.

Raises:

TypeError – Raised if one or more values in df[col] are not of type str.
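One way to picture the type requirement is a check that runs before any JSON parsing. The sketch below is an assumption about the general shape of such a check, not the library's code; `check_all_str` and its error message are hypothetical. Note that missing values such as None or a float NaN are not str, so a column with gaps would fail a check like this.

```python
def check_all_str(values):
    """Hypothetical check: raise TypeError unless every value is a str."""
    bad = [i for i, v in enumerate(values) if not isinstance(v, str)]
    if bad:
        raise TypeError(
            f"Non-string values at positions {bad}; expected JSON strings."
        )


# All values are strings, so this passes silently.
check_all_str(['{"0": 44.3}', '{"1": 3.7}'])

# A missing value (None) is not a str, so this raises TypeError.
try:
    check_all_str(['{"0": 44.3}', None])
except TypeError as err:
    message = str(err)
    print(message)
```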