Data Processing with ``geoPFA``: 2D Example from Newberry Volcano, OR ===================================================================== | This tutorial demonstrates how to use the **data processing workflow** in ``geoPFA``. | You’ll learn how to: 1. Load a PFA configuration file (which defines criteria, components, and layers) 2. Read raw geospatial datasets into the working ``pfa`` dictionary 3. Process and clean these datasets into a consistent grid ready for layer combination and favorability modeling Subsequent notebooks will build upon this by combining layers, applying weights, and producing final favorability models. 1. Imports and Setup -------------------- .. code:: ipython3 # --- General imports --- from pathlib import Path import geopandas as gpd # --- geoPFA core classes --- from geopfa.data_readers import GeospatialDataReaders from geopfa.processing import Cleaners, Processing from geopfa.plotters import GeospatialDataPlotters # --- Utilities --- from rex.utilities.utilities import safe_json_load import json Define Directories ~~~~~~~~~~~~~~~~~~ .. code:: ipython3 # Get the directory where this notebook lives notebook_dir = Path(__file__).resolve().parent if "__file__" in locals() else Path.cwd() # Define the main project directory (where 'config/', 'data/', and 'notebooks/' live) project_dir = notebook_dir.parent # Define subdirectories config_dir = project_dir / "config" data_dir = project_dir / "data" # Confirm structure print("Notebook directory:", notebook_dir) print("Config directory:", config_dir) print("Data directory:", data_dir) # Quick check before proceeding for folder in [config_dir, data_dir]: if not folder.exists(): raise FileNotFoundError(f"Expected folder not found: {folder}") .. parsed-literal:: Notebook directory: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/notebooks Config directory: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/config Data directory: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data 2. Configuration and Data Input ------------------------------- With the project directories defined, the next step is to **read the configuration file** and **load all raw geospatial data** into memory. | Each PFA project is organized according to a hierarchy of criteria → components → layers, where each **layer** represents a dataset. | |dir_structure| | The configuration JSON defines: - Which layers belong to which components, and which components belong to which criteria - Layer metadata including names, coordinate reference systems (CRS), and column definitions | - Processing instructions (e.g., interpolation methods, z-measure units) | To ensure a smooth workflow: - The structure of the ``/data`` directory must exactly match those specified in the JSON (see example below). Folder names must match criteria and component names, and file names must match layer names. | - Each dataset must have consistent column names and units with its config entry. - See Configuration Instructions for more .. figure:: figs/directory_structure.png :alt: dir_structure dir_structure .. |dir_structure| image:: figs/pfa_processing.png Read configuration file ~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 # Path to configuration file pfa_path = config_dir / "newberry_superhot_config.json" # Load JSON configuration safely (handles comments and malformed JSON) pfa = safe_json_load(str(pfa_path)) # Quick check if not pfa_path.exists(): raise FileNotFoundError(f"Configuration file not found: {pfa_path}") pfa = safe_json_load(str(pfa_path)) print(f"Loaded PFA configuration from: {pfa_path}") .. parsed-literal:: Loaded PFA configuration from: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/config/newberry_superhot_config.json Load data into the ``pfa`` dictionary ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 # Define the file types to be read (others will be skipped with a warning) file_types = [".csv", ".shp"] # Gather data according to the config structure pfa = GeospatialDataReaders.gather_data(data_dir, pfa, file_types) .. parsed-literal:: criteria: geologic component: heat reading layer: density_joint_inv reading layer: mt_resistivity_joint_inv reading layer: temperature_model_500m reading layer: earthquakes reading layer: velocity_model_vs reading layer: velocity_model_vpvs component: reservoir reading layer: lineaments reading layer: density_joint_inv reading layer: mt_resistivity_joint_inv reading layer: earthquakes reading layer: velocity_model_vs reading layer: velocity_model_vpvs component: insulation reading layer: density_joint_inv reading layer: mt_resistivity_joint_inv reading layer: earthquakes reading layer: temperature_model_500m reading layer: velocity_model_vp Verify that all layers loaded correctly ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This simple diagnostic ensures that each layer entry in the ``pfa`` dictionary contains a ``"data"`` key with a valid GeoDataFrame. .. code:: ipython3 missing_data_layers = [] for criteria, crit_data in pfa.get("criteria", {}).items(): for component, comp_data in crit_data.get("components", {}).items(): for layer, layer_data in comp_data.get("layers", {}).items(): if "data" not in layer_data: missing_data_layers.append({ "criteria": criteria, "component": component, "layer": layer, }) if missing_data_layers: print("Layers missing 'data':") for item in missing_data_layers: print(f" - {item['criteria']} / {item['component']} / {item['layer']}") else: print("All layers contain 'data'.") .. parsed-literal:: All layers contain 'data'. 3. Harmonizing Data ------------------- Now that all raw layers are loaded into the ``pfa`` dictionary, we’ll: - Define a target coordinate reference system (CRS) - Constrain grid resolution (``nx`` and ``ny``) - Import supporting data such as an area outline or a well path to give context - Derive a shared spatial extent for clipping and gridding - Preview the input data layers Define the Target CRS and Import Contextual Data ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 # EPSG code for UTM Zone 10N (meters) target_crs = 26910 # Define grid resolution (number of cells in x and y) nx = 300; ny = 300 # Load the project boundary shapefile outline_path = data_dir / "supporting_data" / "national_monument_boundary" / "NNVM_bounds.shp" outline = gpd.read_file(outline_path).to_crs(target_crs) print(f"Loaded boundary: {outline_path.name}") .. parsed-literal:: Loaded boundary: NNVM_bounds.shp Unify CRS and Define Extent ~~~~~~~~~~~~~~~~~~~~~~~~~~~ CRS is defined manually using the variable ``target_crs`` defined above. Extent is defined using the defined ``extent_layer`` above, where the project extent is set to that of the specified ``extent_layer``. .. code:: ipython3 # Reproject all layers to the target CRS pfa = Cleaners.set_crs(pfa, target_crs=target_crs) # Choose one representative layer to define the spatial extent extent_layer = ( pfa["criteria"]["geologic"]["components"]["heat"]["layers"]["mt_resistivity_joint_inv"]["data"] ) extent = Cleaners.get_extent(extent_layer, dim=2) # Ensure we grab the 2d extent from 3d data # Validate the extent print(f"Extent: {extent}") if extent[0] >= extent[2] or extent[1] >= extent[3]: raise ValueError("Invalid extent: min values must be less than max values.") # Define the active criterion (geologic only in this example) criteria = "geologic" .. parsed-literal:: Extent: [624790.891073, 4825350.71118, 653145.891073, 4855310.71118] 4. Convert 3D Layers to 2D -------------------------- | Many datasets in a PFA project are generated or modeled in 3D, but 2D PFAs often only require a surface representation. | This step uses ``Processing.convert_3d_to_2d()`` to **collapse each 3D layer along the Z-dimension** into a 2D ``GeoDataFrame``. Instead of taking a slice at a given depth, this function sums over Z and each X,Y location, provinding insight into thickness of features. If you’d prefer to use a slice at a given depth, use that 2D slice as your input data layer rather than inputting the 3D model. | Layers are only flattened if: - They are marked as ``"is_3d": "yes"`` in the configuration file | - They are **not** explicitly excluded (``"needs_flattening": "no"``) To save time, each unique layer is flattened **only once** and then reused for any other component that references it. This is especially useful for large shared datasets. .. code:: ipython3 # Cache to store flattened layers and reuse them flattened_layers = {} for component, comp_data in pfa["criteria"][criteria]["components"].items(): for layer, layer_config in comp_data["layers"].items(): is_3d = layer_config.get("is_3d", None) needs_flattening = layer_config.get("needs_flattening", "yes") # Skip layers not flagged as 3D or marked "no" for flattening if not is_3d or str(needs_flattening).lower() == "no": continue # Use the layer name as the unique cache key key = layer if key in flattened_layers: print(f"Reusing flattened layer: {layer} for component: {component}") # Copy cached flattened layer into current component/layer pfa["criteria"][criteria]["components"][component]["layers"][layer] = ( flattened_layers[key].copy() ) continue print(f"Converting 3D layer: {layer} → 2D") # Perform flattening and store in PFA pfa = Processing.convert_3d_to_2d(pfa, criteria, component=component, layer=layer) # Cache the flattened result for reuse flattened_layers[key] = ( pfa["criteria"][criteria]["components"][component]["layers"][layer].copy() ) print("3D → 2D flattening complete.") .. parsed-literal:: Converting 3D layer: density_joint_inv → 2D Converting 3D layer: mt_resistivity_joint_inv → 2D Converting 3D layer: temperature_model_500m → 2D Converting 3D layer: velocity_model_vs → 2D Converting 3D layer: velocity_model_vpvs → 2D Reusing flattened layer: density_joint_inv for component: reservoir Reusing flattened layer: mt_resistivity_joint_inv for component: reservoir Reusing flattened layer: velocity_model_vs for component: reservoir Reusing flattened layer: velocity_model_vpvs for component: reservoir Reusing flattened layer: density_joint_inv for component: insulation Reusing flattened layer: mt_resistivity_joint_inv for component: insulation Reusing flattened layer: temperature_model_500m for component: insulation Converting 3D layer: velocity_model_vp → 2D 3D → 2D flattening complete. Visualize Cleaned Raw Data Layers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``geoPFA`` has built-in plotting tools to help visualize data. Some examples are shown below using ``GeospatialDataPlotters.geo_plot_3d``. #### Plot a Single Layer .. code:: ipython3 component = "heat" layer = "mt_resistivity_joint_inv" gdf = pfa["criteria"][criteria]["components"][component]["layers"][layer]["data"] col = pfa["criteria"][criteria]["components"][component]["layers"][layer]["data_col"] units = pfa["criteria"][criteria]["components"][component]["layers"][layer]["units"] title = f"{layer}: raw data" GeospatialDataPlotters.geo_plot(gdf, col, units, title, markersize=2, figsize=(5, 5)) .. image:: ./1-processing_newberry_20_0.png Plot All Layers (Optional) ^^^^^^^^^^^^^^^^^^^^^^^^^^ Loop through the full hierarchy and visualize every dataset. This is useful for verifying that CRS, units, and extents align, but can take time for large inputs. .. code:: ipython3 for criteria, crit_data in pfa["criteria"].items(): print(criteria) plotted_layers = [] for component, comp_data in crit_data["components"].items(): print(f"\t{component}") for layer, layer_data in comp_data["layers"].items(): if layer in plotted_layers: print(f"\t\t{layer} plotted already") continue gdf = layer_data["data"] col = layer_data.get("data_col", None) if layer == "faults_newberry": col = "None" # faults are geometries only units = layer_data["units"] title = f"{layer}: raw data" GeospatialDataPlotters.geo_plot(gdf, col, units, title, markersize=2, figsize=(5, 5)) plotted_layers.append(layer) .. parsed-literal:: geologic heat .. image:: ./1-processing_newberry_22_1.png .. image:: ./1-processing_newberry_22_2.png .. image:: ./1-processing_newberry_22_3.png .. image:: ./1-processing_newberry_22_4.png .. image:: ./1-processing_newberry_22_5.png .. image:: ./1-processing_newberry_22_6.png .. parsed-literal:: reservoir density_joint_inv plotted already mt_resistivity_joint_inv plotted already earthquakes plotted already .. image:: ./1-processing_newberry_22_8.png .. parsed-literal:: velocity_model_vs plotted already velocity_model_vpvs plotted already insulation density_joint_inv plotted already mt_resistivity_joint_inv plotted already earthquakes plotted already temperature_model_500m plotted already .. image:: ./1-processing_newberry_22_10.png 5. Data Processing and Preparation ---------------------------------- This step transforms raw geospatial inputs into standardized layers on a shared 2D grid for comparison and combination. | Examples: - *Permeability indicators*: distance to faults, earthquake density | - *Heat indicators*: resistivity, temperature, low density | - *Insulation indicators*: shallow resistivity, seismic velocity | Processing ensures all layers: - Are free from outliers | - Share the same CRS and spatial extent | - Are mapped to the same ``(nx, ny)`` grid | - Represent continuous 2D evidence produced from the data layer (e.g. distance from faults as opposed to fault location) — How It Works ~~~~~~~~~~~~ All 2D ``geoPFA`` processing functions follow the same pattern: 1. Take the **raw input** stored in ``pfa["criteria"][criteria]["components"][component]["layers"][layer]["data"]`` 2. Process it (e.g., interpolation, point density, distance-to-feature) onto the common 2D grid 3. Write the result to ``pfa["criteria"][criteria]["components"][component]["layers"][layer]["model"]`` Filter Outliers ~~~~~~~~~~~~~~~ Some datasets contain spurious extreme values. Use Cleaners.filter_geodataframe() to remove these outliers based on a quantile threshold. .. code:: ipython3 # Filter the upper 10% of values in selected datasets target_layers = [] for layer in target_layers: for component in ["heat", "reservoir"]: gdf = pfa["criteria"][criteria]["components"]["heat"]["layers"][layer]["data"] col = pfa["criteria"][criteria]["components"]["heat"]["layers"][layer]["data_col"] filtered = Cleaners.filter_geodataframe(gdf, col, quantile=0.9) pfa["criteria"][criteria]["components"][component]["layers"][layer]["data"] = filtered print("Outlier filtering complete.") .. parsed-literal:: Outlier filtering complete. Extrapolation ~~~~~~~~~~~~~ ``geoPFA`` can perform extrapolation over each layer to ensure every layer fills the complete extent. ``geoPFA`` uses a sparse Gaussian process regression (GPR) model to backfill missing data. Extrapolation works best when the extent of the missing data is relatively small compared to the known data. For large areas, ``geoPFA`` reverst to a linear trend across the training site to produce essentially the average spatial proess of the known data. For large datasets, extrapolation can take substanial processing time to complete. Users can improve model fitting time by reducing the ``training_size``. Datasets with relatively complete information will benefit greatly from reduced training sizes and should see little reduction in performance. An optional ``verbose`` flag will enable model assessment and reporting along with model diagnostic and final prediction plots. .. code:: ipython3 target_layers = ['velocity_model_vp','velocity_model_vs','velocity_model_vpvs'] extrapolated_layers = {} for criteria, crit_data in pfa["criteria"].items(): print(criteria) for component, comp_data in crit_data["components"].items(): print(f"\t{component}") for layer, layer_config in comp_data["layers"].items(): if layer in extrapolated_layers: print(f"\t\tReusing cached extrapolation for {layer}") pfa["criteria"][criteria]["components"][component][ "layers" ][layer] = extrapolated_layers[layer].copy() elif layer in target_layers: print(f"\t\tExtrapolating layer: {layer}") gdf = "data" col = pfa["criteria"][criteria]["components"][component]["layers"][layer]["data_col"] pfa = Processing.extrapolate_2d( pfa, criteria=criteria, component=component, layer=layer, dataset=gdf, data_col=col, training_size=0.8, verbose=False, ) extrapolated_layers[layer] = pfa["criteria"][criteria][ "components" ][component]["layers"][layer].copy() else: continue .. parsed-literal:: geologic heat Extrapolating layer: velocity_model_vs Extrapolating layer: velocity_model_vpvs reservoir Reusing cached extrapolation for velocity_model_vs Reusing cached extrapolation for velocity_model_vpvs insulation Extrapolating layer: velocity_model_vp Interpolation ~~~~~~~~~~~~~ For most datasets, data can be processed and prepared for layer combination and favorability estimation by simply interpolating data to the target grid. Here we do that only once for each input layer that has ``"processing_method": "interpolate"`` specified in the configuration file. .. code:: ipython3 method = 'linear' interpolated_layers = {} for criteria, crit_data in pfa["criteria"].items(): print(criteria) for component, comp_data in crit_data["components"].items(): print(f"\t{component}") for layer, layer_config in comp_data["layers"].items(): if layer_config.get("processing_method") == "interpolate": if layer in interpolated_layers: print(f"\t\tReusing cached interpolation for {layer}") pfa["criteria"][criteria]["components"][component]["layers"][layer]['model'] = ( interpolated_layers[layer]['model'].copy() ) pfa["criteria"][criteria]["components"][component]["layers"][layer]['model_data_col'] = ( interpolated_layers[layer]['model_data_col'] ) pfa["criteria"][criteria]["components"][component]["layers"][layer]['model_units'] = ( interpolated_layers[layer]['model_units'] ) else: print(f"\t\tInterpolating layer: {layer}") pfa = Processing.interpolate_points( pfa, criteria=criteria, component=component, layer=layer, interp_method=method, nx=nx, ny=ny, extent=extent, ) interpolated_layers[layer] = ( pfa["criteria"][criteria]["components"][component]["layers"][layer].copy() ) .. parsed-literal:: geologic heat Interpolating layer: density_joint_inv Interpolating layer: mt_resistivity_joint_inv Interpolating layer: temperature_model_500m Interpolating layer: velocity_model_vs Interpolating layer: velocity_model_vpvs reservoir Reusing cached interpolation for density_joint_inv Reusing cached interpolation for mt_resistivity_joint_inv Reusing cached interpolation for velocity_model_vs Reusing cached interpolation for velocity_model_vpvs insulation Reusing cached interpolation for density_joint_inv Reusing cached interpolation for mt_resistivity_joint_inv Reusing cached interpolation for temperature_model_500m Interpolating layer: velocity_model_vp Point Density to be Used as a Favorability Proxy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For some layers, the spatial density of points (e.g., earthquake occurrences) is an indicator of permeability or fracturing. ``geoPFA`` has two options to do this in 2D, including: - ``Processing.point_density``: Computes 2D density projected onto user-defined grids (ofter coarser than the project grid), ,and then project those onto the project grid to allow PFA computations that rely on harmonized grids. - ``Processing.weighted_distance_from_points``: Compute a weighted influence field from points over the 2D project grid, using distance to control “influence,” optionallly weighted by an attribute of the user’s choice (e.g., earthquake magnitude). In this example, we use ``Processing.weighted_distance_from_points`` because it produces smoother results and alllows us to incorporate earthquake magnitude. > *Tuning Tip:* Larger ``alpha`` values spread each equarthquake’s influence farther; smaller values make it more localized. In other words, a larger value for ``alpha`` might be preferably when wworking with relatively sparse earthquakes. .. code:: ipython3 criteria = 'geologic' layer = 'earthquakes' # Filter extreme densities for comp in ["heat", "reservoir", "insulation"]: pfa = Processing.weighted_distance_from_points(pfa, criteria=criteria, component=comp, layer=layer, extent=extent, nx=nx, ny=ny, alpha=1000) gdf = pfa["criteria"][criteria]["components"][comp]["layers"][layer]["model"].copy() col = pfa["criteria"][criteria]["components"][comp]["layers"][layer]["model_data_col"] filtered = Cleaners.filter_geodataframe(gdf, col, quantile=0.9) pfa["criteria"][criteria]["components"][comp]["layers"][layer]["model"] = filtered Process Faults ~~~~~~~~~~~~~~ | Faults often act as preferential pathways for fluid flow, making them key indicators for the *reservoir* component. | Here, we use the built-in ``process_faults()`` function to calculate a **fault-based favorability score** that increases with proximity to faults and (optionally) their intersections. | The function performs: 1. **Distance calculation** – measures distance from each grid point to nearby faults. | 2. **Intersection detection** (optional) – identifies where faults intersect and computes distance to those points. | 3. **Exponential decay weighting** – applies user-defined decay constants (``alpha_fault``, ``alpha_intersection``) to represent how influence decreases with distance. | 4. **Weighted combination** – combines both effects using ``weight_fault`` and ``weight_intersection``. *Tuning Tip:* Larger ``alpha`` values spread the fault’s influence farther; smaller values make it more localized. .. code:: ipython3 component = "reservoir" layer = "lineaments" # --- Fault processing for reservoir component --- pfa = Processing.process_faults( pfa, criteria=criteria, component=component, layer=layer, extent=extent, nx=nx, ny=ny, alpha_fault=5500.0, # decay distance for faults alpha_intersection=3500.0, # decay distance for intersections weight_fault=0.7, # relative weighting of faults vs intersections weight_intersection=0.3, use_intersections=True, # include intersection effects ) # Visualize the results # Note the plotted extent differs from the raw data when comparing # Note we now grab the processed data from its "model" locations gdf = pfa["criteria"][criteria]["components"][component]["layers"][layer]["model"] col = pfa["criteria"][criteria]["components"][component]["layers"][layer]["model_data_col"] units = pfa["criteria"][criteria]["components"][component]["layers"][layer]["model_units"] title = f"{layer}: raw data" GeospatialDataPlotters.geo_plot(gdf, col, units, title, markersize=2, figsize=(5, 5)) .. image:: ./1-processing_newberry_33_0.png Plot All Processed Layers ~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 for criteria, crit_data in pfa["criteria"].items(): print(criteria) plotted_layers = [] for component, comp_data in crit_data["components"].items(): print(f"\t{component}") for layer, layer_data in comp_data["layers"].items(): if layer in plotted_layers: continue # Skip if no processed model exists if "model" not in layer_data or layer_data["model"] is None: print(f"\t\t!! Skipping {layer}: no processed 'model' found.") continue gdf = layer_data["model"] col = layer_data.get("model_data_col") units = layer_data.get("model_units", "None") title = f"{layer}: processed model" # Plot in 3D GeospatialDataPlotters.geo_plot( gdf, col, units, title, area_outline=outline, extent=extent, markersize=0.75, figsize=(5, 5) ) plotted_layers.append(layer) print("\nFinished plotting all available processed layers.") .. parsed-literal:: geologic heat .. image:: ./1-processing_newberry_35_1.png .. image:: ./1-processing_newberry_35_2.png .. image:: ./1-processing_newberry_35_3.png .. image:: ./1-processing_newberry_35_4.png .. image:: ./1-processing_newberry_35_5.png .. image:: ./1-processing_newberry_35_6.png .. parsed-literal:: reservoir .. image:: ./1-processing_newberry_35_8.png .. parsed-literal:: insulation .. image:: ./1-processing_newberry_35_10.png .. parsed-literal:: Finished plotting all available processed layers. Save Processed Layers --------------------- It’s good practice to save processed data to disk before proceeding so processing only has to be completed once. This allows the user to make minor tweaks to the PFA, such as adjusting weights, without reprocessing all data layers. .. code:: ipython3 for criteria, crit_data in pfa["criteria"].items(): print(criteria) for component, comp_data in crit_data["components"].items(): print(f"\t{component}") for layer, layer_data in comp_data["layers"].items(): gdf = layer_data["model"] col = layer_data["model_data_col"] units = layer_data["model_units"] out_fp = data_dir / criteria / component / f"{layer}_processed.csv" gdf.to_csv(out_fp, index=False) print(f"\t\tSaved: {out_fp}") .. parsed-literal:: geologic heat Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/density_joint_inv_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/mt_resistivity_joint_inv_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/temperature_model_500m_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/earthquakes_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/velocity_model_vs_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/velocity_model_vpvs_processed.csv reservoir Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/density_joint_inv_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/mt_resistivity_joint_inv_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/earthquakes_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/lineaments_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/velocity_model_vs_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/velocity_model_vpvs_processed.csv insulation Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/density_joint_inv_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/mt_resistivity_joint_inv_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/earthquakes_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/temperature_model_500m_processed.csv Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/velocity_model_vp_processed.csv Save a “Clean” PFA Config (no GeoDataFrames) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Similarly, saving a “clean” processed PFA config allows the user to avoid repeating the processing step when making minor adjustments to the PFA. This processed config serves as the config file for the next step. .. code:: ipython3 def drop_geodataframes(data: dict) -> dict: """Recursively remove GeoDataFrame objects before saving.""" clean = {} for key, val in data.items(): if isinstance(val, gpd.GeoDataFrame): continue elif isinstance(val, dict): clean[key] = drop_geodataframes(val) else: clean[key] = val return clean pfa_nodf = drop_geodataframes(pfa) out_json = config_dir / "newberry_superhot_processed_config.json" with open(out_json, "w") as f: json.dump(pfa_nodf, f, indent=4) print(f"Processed PFA configuration saved to: {out_json}") .. parsed-literal:: Processed PFA configuration saved to: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/config/newberry_superhot_processed_config.json -------------- Next Steps: Layer Combination and Favorability Modeling ------------------------------------------------------- This concludes **Notebook 1 – Data Processing with ``geoPFA``**. | In this notebook, we: - Loaded, cleaned, and standardized all input datasets | - Processed each layer (filtering, interpolation, density, distance, transformations) | - Saved the resulting **processed layers** to disk You’re now ready to combine these layers into component and criteria favorability models. | If you skip running this notebook, don’t worry — the **supplementary example data** includes all intermediate processed files. | You can start directly with **Notebook 2 – Layer Combination and Favorability Modeling**, which loads those preprocessed layers automatically.