Data Processing with ``geoPFA``: 2D Example from Newberry Volcano, OR
=====================================================================

| This tutorial demonstrates how to use the **data processing workflow**
  in ``geoPFA``.
| You’ll learn how to:

1. Load a PFA configuration file (which defines criteria, components,
   and layers)
2. Read raw geospatial datasets into the working ``pfa`` dictionary
3. Process and clean these datasets into a consistent grid ready for
   layer combination and favorability modeling

Subsequent notebooks will build upon this by combining layers, applying
weights, and producing final favorability models.

1. Imports and Setup
--------------------

.. code:: ipython3

    # --- General imports ---
    from pathlib import Path
    import geopandas as gpd
    
    # --- geoPFA core classes ---
    from geopfa.data_readers import GeospatialDataReaders
    from geopfa.processing import Cleaners, Processing
    from geopfa.plotters import GeospatialDataPlotters
    
    # --- Utilities ---
    from rex.utilities.utilities import safe_json_load
    import json

Define Directories
~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    # Get the directory where this notebook lives
    notebook_dir = Path(__file__).resolve().parent if "__file__" in locals() else Path.cwd()
    
    # Define the main project directory (where 'config/', 'data/', and 'notebooks/' live)
    project_dir = notebook_dir.parent
    
    # Define subdirectories
    config_dir = project_dir / "config"
    data_dir = project_dir / "data"
    
    # Confirm structure
    print("Notebook directory:", notebook_dir)
    print("Config directory:", config_dir)
    print("Data directory:", data_dir)
    
    # Quick check before proceeding
    for folder in [config_dir, data_dir]:
        if not folder.exists():
            raise FileNotFoundError(f"Expected folder not found: {folder}")


.. parsed-literal::

    Notebook directory: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/notebooks
    Config directory: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/config
    Data directory: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data


2. Configuration and Data Input
-------------------------------

With the project directories defined, the next step is to **read the
configuration file** and **load all raw geospatial data** into memory.

| Each PFA project is organized according to a hierarchy of criteria →
  components → layers, where each **layer** represents a dataset.
| |dir_structure|

| The configuration JSON defines: - Which layers belong to which
  components, and which components belong to which criteria - Layer
  metadata including names, coordinate reference systems (CRS), and
  column definitions
| - Processing instructions (e.g., interpolation methods, z-measure
  units)

| To ensure a smooth workflow: - The structure of the ``/data``
  directory must exactly match those specified in the JSON (see example
  below). Folder names must match criteria and component names, and file
  names must match layer names.
| - Each dataset must have consistent column names and units with its
  config entry. - See Configuration Instructions for more

.. figure:: figs/directory_structure.png
   :alt: dir_structure

   dir_structure

.. |dir_structure| image:: figs/pfa_processing.png

Read configuration file
~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    # Path to configuration file
    pfa_path = config_dir / "newberry_superhot_config.json"
    
    # Load JSON configuration safely (handles comments and malformed JSON)
    pfa = safe_json_load(str(pfa_path))
    
    # Quick check
    if not pfa_path.exists():
        raise FileNotFoundError(f"Configuration file not found: {pfa_path}")
    
    pfa = safe_json_load(str(pfa_path))
    print(f"Loaded PFA configuration from: {pfa_path}")


.. parsed-literal::

    Loaded PFA configuration from: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/config/newberry_superhot_config.json


Load data into the ``pfa`` dictionary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    # Define the file types to be read (others will be skipped with a warning)
    file_types = [".csv", ".shp"]
    
    # Gather data according to the config structure
    pfa = GeospatialDataReaders.gather_data(data_dir, pfa, file_types)


.. parsed-literal::

    criteria: geologic
    	 component: heat
    		 reading layer: density_joint_inv
    		 reading layer: mt_resistivity_joint_inv
    		 reading layer: temperature_model_500m
    		 reading layer: earthquakes
    		 reading layer: velocity_model_vs
    		 reading layer: velocity_model_vpvs
    	 component: reservoir
    		 reading layer: lineaments
    		 reading layer: density_joint_inv
    		 reading layer: mt_resistivity_joint_inv
    		 reading layer: earthquakes
    		 reading layer: velocity_model_vs
    		 reading layer: velocity_model_vpvs
    	 component: insulation
    		 reading layer: density_joint_inv
    		 reading layer: mt_resistivity_joint_inv
    		 reading layer: earthquakes
    		 reading layer: temperature_model_500m
    		 reading layer: velocity_model_vp


Verify that all layers loaded correctly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This simple diagnostic ensures that each layer entry in the ``pfa``
dictionary contains a ``"data"`` key with a valid GeoDataFrame.

.. code:: ipython3

    missing_data_layers = []
    
    for criteria, crit_data in pfa.get("criteria", {}).items():
        for component, comp_data in crit_data.get("components", {}).items():
            for layer, layer_data in comp_data.get("layers", {}).items():
                if "data" not in layer_data:
                    missing_data_layers.append({
                        "criteria": criteria,
                        "component": component,
                        "layer": layer,
                    })
    
    if missing_data_layers:
        print("Layers missing 'data':")
        for item in missing_data_layers:
            print(f" - {item['criteria']} / {item['component']} / {item['layer']}")
    else:
        print("All layers contain 'data'.")


.. parsed-literal::

    All layers contain 'data'.


3. Harmonizing Data
-------------------

Now that all raw layers are loaded into the ``pfa`` dictionary, we’ll: -
Define a target coordinate reference system (CRS) - Constrain grid
resolution (``nx`` and ``ny``) - Import supporting data such as an area
outline or a well path to give context - Derive a shared spatial extent
for clipping and gridding - Preview the input data layers

Define the Target CRS and Import Contextual Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    # EPSG code for UTM Zone 10N (meters)
    target_crs = 26910
    # Define grid resolution (number of cells in x and y)
    nx = 300; ny = 300
    
    # Load the project boundary shapefile
    outline_path = data_dir / "supporting_data" / "national_monument_boundary" / "NNVM_bounds.shp"
    outline = gpd.read_file(outline_path).to_crs(target_crs)
    
    print(f"Loaded boundary: {outline_path.name}")


.. parsed-literal::

    Loaded boundary: NNVM_bounds.shp


Unify CRS and Define Extent
~~~~~~~~~~~~~~~~~~~~~~~~~~~

CRS is defined manually using the variable ``target_crs`` defined above.
Extent is defined using the defined ``extent_layer`` above, where the
project extent is set to that of the specified ``extent_layer``.

.. code:: ipython3

    # Reproject all layers to the target CRS
    pfa = Cleaners.set_crs(pfa, target_crs=target_crs)
    
    # Choose one representative layer to define the spatial extent
    extent_layer = (
        pfa["criteria"]["geologic"]["components"]["heat"]["layers"]["mt_resistivity_joint_inv"]["data"]
    )
    extent = Cleaners.get_extent(extent_layer, dim=2)  # Ensure we grab the 2d extent from 3d data
    
    # Validate the extent
    print(f"Extent: {extent}")
    if extent[0] >= extent[2] or extent[1] >= extent[3]:
        raise ValueError("Invalid extent: min values must be less than max values.")
    
    # Define the active criterion (geologic only in this example)
    criteria = "geologic"


.. parsed-literal::

    Extent: [624790.891073, 4825350.71118, 653145.891073, 4855310.71118]


4. Convert 3D Layers to 2D
--------------------------

| Many datasets in a PFA project are generated or modeled in 3D, but 2D
  PFAs often only require a surface representation.
| This step uses ``Processing.convert_3d_to_2d()`` to **collapse each 3D
  layer along the Z-dimension** into a 2D ``GeoDataFrame``.

Instead of taking a slice at a given depth, this function sums over Z
and each X,Y location, provinding insight into thickness of features. If
you’d prefer to use a slice at a given depth, use that 2D slice as your
input data layer rather than inputting the 3D model.

| Layers are only flattened if: - They are marked as ``"is_3d": "yes"``
  in the configuration file
| - They are **not** explicitly excluded (``"needs_flattening": "no"``)

To save time, each unique layer is flattened **only once** and then
reused for any other component that references it. This is especially
useful for large shared datasets.

.. code:: ipython3

    # Cache to store flattened layers and reuse them
    flattened_layers = {}
    
    for component, comp_data in pfa["criteria"][criteria]["components"].items():
        for layer, layer_config in comp_data["layers"].items():
            is_3d = layer_config.get("is_3d", None)
            needs_flattening = layer_config.get("needs_flattening", "yes")
    
            # Skip layers not flagged as 3D or marked "no" for flattening
            if not is_3d or str(needs_flattening).lower() == "no":
                continue
    
            # Use the layer name as the unique cache key
            key = layer
    
            if key in flattened_layers:
                print(f"Reusing flattened layer: {layer} for component: {component}")
                # Copy cached flattened layer into current component/layer
                pfa["criteria"][criteria]["components"][component]["layers"][layer] = (
                    flattened_layers[key].copy()
                )
                continue
    
            print(f"Converting 3D layer: {layer} → 2D")
    
            # Perform flattening and store in PFA
            pfa = Processing.convert_3d_to_2d(pfa, criteria, component=component, layer=layer)
    
            # Cache the flattened result for reuse
            flattened_layers[key] = (
                pfa["criteria"][criteria]["components"][component]["layers"][layer].copy()
            )
    
    print("3D → 2D flattening complete.")


.. parsed-literal::

    Converting 3D layer: density_joint_inv → 2D
    Converting 3D layer: mt_resistivity_joint_inv → 2D
    Converting 3D layer: temperature_model_500m → 2D
    Converting 3D layer: velocity_model_vs → 2D
    Converting 3D layer: velocity_model_vpvs → 2D
    Reusing flattened layer: density_joint_inv for component: reservoir
    Reusing flattened layer: mt_resistivity_joint_inv for component: reservoir
    Reusing flattened layer: velocity_model_vs for component: reservoir
    Reusing flattened layer: velocity_model_vpvs for component: reservoir
    Reusing flattened layer: density_joint_inv for component: insulation
    Reusing flattened layer: mt_resistivity_joint_inv for component: insulation
    Reusing flattened layer: temperature_model_500m for component: insulation
    Converting 3D layer: velocity_model_vp → 2D
    3D → 2D flattening complete.


Visualize Cleaned Raw Data Layers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``geoPFA`` has built-in plotting tools to help visualize data. Some
examples are shown below using ``GeospatialDataPlotters.geo_plot_3d``.
#### Plot a Single Layer

.. code:: ipython3

    component = "heat"
    layer = "mt_resistivity_joint_inv"
    
    gdf = pfa["criteria"][criteria]["components"][component]["layers"][layer]["data"]
    col = pfa["criteria"][criteria]["components"][component]["layers"][layer]["data_col"]
    units = pfa["criteria"][criteria]["components"][component]["layers"][layer]["units"]
    title = f"{layer}: raw data"
    
    GeospatialDataPlotters.geo_plot(gdf, col, units, title, markersize=2, figsize=(5, 5))


.. image:: ./1-processing_newberry_20_0.png


Plot All Layers (Optional)
^^^^^^^^^^^^^^^^^^^^^^^^^^

Loop through the full hierarchy and visualize every dataset. This is
useful for verifying that CRS, units, and extents align, but can take
time for large inputs.

.. code:: ipython3

    for criteria, crit_data in pfa["criteria"].items():
        print(criteria)
        plotted_layers = []
        for component, comp_data in crit_data["components"].items():
            print(f"\t{component}")
            for layer, layer_data in comp_data["layers"].items():
                if layer in plotted_layers:
                    print(f"\t\t{layer} plotted already")
                    continue
    
                gdf = layer_data["data"]
                col = layer_data.get("data_col", None)
                if layer == "faults_newberry":
                    col = "None"  # faults are geometries only
    
                units = layer_data["units"]
                title = f"{layer}: raw data"
    
                GeospatialDataPlotters.geo_plot(gdf, col, units, title, markersize=2, figsize=(5, 5))
                plotted_layers.append(layer)


.. parsed-literal::

    geologic
    	heat


.. image:: ./1-processing_newberry_22_1.png


.. image:: ./1-processing_newberry_22_2.png


.. image:: ./1-processing_newberry_22_3.png


.. image:: ./1-processing_newberry_22_4.png


.. image:: ./1-processing_newberry_22_5.png


.. image:: ./1-processing_newberry_22_6.png


.. parsed-literal::

    	reservoir
    		density_joint_inv plotted already
    		mt_resistivity_joint_inv plotted already
    		earthquakes plotted already


.. image:: ./1-processing_newberry_22_8.png


.. parsed-literal::

    		velocity_model_vs plotted already
    		velocity_model_vpvs plotted already
    	insulation
    		density_joint_inv plotted already
    		mt_resistivity_joint_inv plotted already
    		earthquakes plotted already
    		temperature_model_500m plotted already


.. image:: ./1-processing_newberry_22_10.png


5. Data Processing and Preparation
----------------------------------

This step transforms raw geospatial inputs into standardized layers on a
shared 2D grid for comparison and combination.

| Examples: - *Permeability indicators*: distance to faults, earthquake
  density
| - *Heat indicators*: resistivity, temperature, low density
| - *Insulation indicators*: shallow resistivity, seismic velocity

| Processing ensures all layers: - Are free from outliers
| - Share the same CRS and spatial extent
| - Are mapped to the same ``(nx, ny)`` grid
| - Represent continuous 2D evidence produced from the data layer
  (e.g. distance from faults as opposed to fault location) —

How It Works
~~~~~~~~~~~~

All 2D ``geoPFA`` processing functions follow the same pattern:

1. Take the **raw input** stored in
   ``pfa["criteria"][criteria]["components"][component]["layers"][layer]["data"]``
2. Process it (e.g., interpolation, point density, distance-to-feature)
   onto the common 2D grid
3. Write the result to
   ``pfa["criteria"][criteria]["components"][component]["layers"][layer]["model"]``

Filter Outliers
~~~~~~~~~~~~~~~

Some datasets contain spurious extreme values. Use
Cleaners.filter_geodataframe() to remove these outliers based on a
quantile threshold.

.. code:: ipython3

    # Filter the upper 10% of values in selected datasets
    target_layers = []
    
    for layer in target_layers:
        for component in ["heat", "reservoir"]:
            gdf = pfa["criteria"][criteria]["components"]["heat"]["layers"][layer]["data"]
            col = pfa["criteria"][criteria]["components"]["heat"]["layers"][layer]["data_col"]
    
            filtered = Cleaners.filter_geodataframe(gdf, col, quantile=0.9)
            pfa["criteria"][criteria]["components"][component]["layers"][layer]["data"] = filtered
    
    print("Outlier filtering complete.")


.. parsed-literal::

    Outlier filtering complete.


Extrapolation
~~~~~~~~~~~~~

``geoPFA`` can perform extrapolation over each layer to ensure every
layer fills the complete extent. ``geoPFA`` uses a sparse Gaussian
process regression (GPR) model to backfill missing data. Extrapolation
works best when the extent of the missing data is relatively small
compared to the known data. For large areas, ``geoPFA`` reverst to a
linear trend across the training site to produce essentially the average
spatial proess of the known data.

For large datasets, extrapolation can take substanial processing time to
complete. Users can improve model fitting time by reducing the
``training_size``. Datasets with relatively complete information will
benefit greatly from reduced training sizes and should see little
reduction in performance. An optional ``verbose`` flag will enable model
assessment and reporting along with model diagnostic and final
prediction plots.

.. code:: ipython3

    target_layers = ['velocity_model_vp','velocity_model_vs','velocity_model_vpvs']
    extrapolated_layers = {}
    
    for criteria, crit_data in pfa["criteria"].items():
        print(criteria)
        for component, comp_data in crit_data["components"].items():
            print(f"\t{component}")
            for layer, layer_config in comp_data["layers"].items():
                if layer in extrapolated_layers:
                    print(f"\t\tReusing cached extrapolation for {layer}")
                    pfa["criteria"][criteria]["components"][component][
                        "layers"
                    ][layer] = extrapolated_layers[layer].copy()
                elif layer in target_layers:
                    print(f"\t\tExtrapolating layer: {layer}")
        
                    gdf = "data"
                    col = pfa["criteria"][criteria]["components"][component]["layers"][layer]["data_col"]
        
                    pfa = Processing.extrapolate_2d(
                        pfa,
                        criteria=criteria,
                        component=component,
                        layer=layer,
                        dataset=gdf,
                        data_col=col,
                        training_size=0.8,
                        verbose=False,
                    )
        
                    extrapolated_layers[layer] = pfa["criteria"][criteria][
                        "components"
                    ][component]["layers"][layer].copy()
                else: 
                    continue


.. parsed-literal::

    geologic
    	heat
    		Extrapolating layer: velocity_model_vs
    		Extrapolating layer: velocity_model_vpvs
    	reservoir
    		Reusing cached extrapolation for velocity_model_vs
    		Reusing cached extrapolation for velocity_model_vpvs
    	insulation
    		Extrapolating layer: velocity_model_vp


Interpolation
~~~~~~~~~~~~~

For most datasets, data can be processed and prepared for layer
combination and favorability estimation by simply interpolating data to
the target grid. Here we do that only once for each input layer that has
``"processing_method": "interpolate"`` specified in the configuration
file.

.. code:: ipython3

    method = 'linear'
    
    interpolated_layers = {}
    
    for criteria, crit_data in pfa["criteria"].items():
        print(criteria)
        for component, comp_data in crit_data["components"].items():
            print(f"\t{component}")
            for layer, layer_config in comp_data["layers"].items():
                if layer_config.get("processing_method") == "interpolate":
                    if layer in interpolated_layers:
                        print(f"\t\tReusing cached interpolation for {layer}")
                        pfa["criteria"][criteria]["components"][component]["layers"][layer]['model'] = (
                            interpolated_layers[layer]['model'].copy()
                        )
                        pfa["criteria"][criteria]["components"][component]["layers"][layer]['model_data_col'] = (
                            interpolated_layers[layer]['model_data_col']
                        )
                        pfa["criteria"][criteria]["components"][component]["layers"][layer]['model_units'] = (
                            interpolated_layers[layer]['model_units']
                        )
                    else:
                        print(f"\t\tInterpolating layer: {layer}")
                        pfa = Processing.interpolate_points(
                            pfa,
                            criteria=criteria,
                            component=component,
                            layer=layer,
                            interp_method=method,
                            nx=nx,
                            ny=ny,
                            extent=extent,
                        )
                        interpolated_layers[layer] = (
                            pfa["criteria"][criteria]["components"][component]["layers"][layer].copy()
                        )


.. parsed-literal::

    geologic
    	heat
    		Interpolating layer: density_joint_inv
    		Interpolating layer: mt_resistivity_joint_inv
    		Interpolating layer: temperature_model_500m
    		Interpolating layer: velocity_model_vs
    		Interpolating layer: velocity_model_vpvs
    	reservoir
    		Reusing cached interpolation for density_joint_inv
    		Reusing cached interpolation for mt_resistivity_joint_inv
    		Reusing cached interpolation for velocity_model_vs
    		Reusing cached interpolation for velocity_model_vpvs
    	insulation
    		Reusing cached interpolation for density_joint_inv
    		Reusing cached interpolation for mt_resistivity_joint_inv
    		Reusing cached interpolation for temperature_model_500m
    		Interpolating layer: velocity_model_vp


Point Density to be Used as a Favorability Proxy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For some layers, the spatial density of points (e.g., earthquake
occurrences) is an indicator of permeability or fracturing. ``geoPFA``
has two options to do this in 2D, including: -
``Processing.point_density``: Computes 2D density projected onto
user-defined grids (ofter coarser than the project grid), ,and then
project those onto the project grid to allow PFA computations that rely
on harmonized grids. - ``Processing.weighted_distance_from_points``:
Compute a weighted influence field from points over the 2D project grid,
using distance to control “influence,” optionallly weighted by an
attribute of the user’s choice (e.g., earthquake magnitude).

In this example, we use ``Processing.weighted_distance_from_points``
because it produces smoother results and alllows us to incorporate
earthquake magnitude. > *Tuning Tip:* Larger ``alpha`` values spread
each equarthquake’s influence farther; smaller values make it more
localized. In other words, a larger value for ``alpha`` might be
preferably when wworking with relatively sparse earthquakes.

.. code:: ipython3

    criteria = 'geologic'
    layer = 'earthquakes'
    # Filter extreme densities
    for comp in ["heat", "reservoir", "insulation"]:
        pfa = Processing.weighted_distance_from_points(pfa, criteria=criteria, component=comp, 
                                   layer=layer, extent=extent, nx=nx, ny=ny, alpha=1000)
        gdf = pfa["criteria"][criteria]["components"][comp]["layers"][layer]["model"].copy()
        col = pfa["criteria"][criteria]["components"][comp]["layers"][layer]["model_data_col"]
        filtered = Cleaners.filter_geodataframe(gdf, col, quantile=0.9)
        pfa["criteria"][criteria]["components"][comp]["layers"][layer]["model"] = filtered

Process Faults
~~~~~~~~~~~~~~

| Faults often act as preferential pathways for fluid flow, making them
  key indicators for the *reservoir* component.
| Here, we use the built-in ``process_faults()`` function to calculate a
  **fault-based favorability score** that increases with proximity to
  faults and (optionally) their intersections.

| The function performs: 1. **Distance calculation** – measures distance
  from each grid point to nearby faults.
| 2. **Intersection detection** (optional) – identifies where faults
  intersect and computes distance to those points.
| 3. **Exponential decay weighting** – applies user-defined decay
  constants (``alpha_fault``, ``alpha_intersection``) to represent how
  influence decreases with distance.
| 4. **Weighted combination** – combines both effects using
  ``weight_fault`` and ``weight_intersection``.

   *Tuning Tip:* Larger ``alpha`` values spread the fault’s influence
   farther; smaller values make it more localized.

.. code:: ipython3

    component = "reservoir"
    layer = "lineaments"
    
    # --- Fault processing for reservoir component ---
    pfa = Processing.process_faults(
        pfa,
        criteria=criteria,
        component=component,
        layer=layer,
        extent=extent,
        nx=nx,
        ny=ny,
        alpha_fault=5500.0,        # decay distance for faults
        alpha_intersection=3500.0, # decay distance for intersections
        weight_fault=0.7,          # relative weighting of faults vs intersections
        weight_intersection=0.3,
        use_intersections=True,    # include intersection effects
    )
    
    #  Visualize the results
    #  Note the plotted extent differs from the raw data when comparing
    
    # Note we now grab the processed data from its "model" locations
    gdf = pfa["criteria"][criteria]["components"][component]["layers"][layer]["model"]
    col = pfa["criteria"][criteria]["components"][component]["layers"][layer]["model_data_col"]
    units = pfa["criteria"][criteria]["components"][component]["layers"][layer]["model_units"]
    title = f"{layer}: raw data"
    
    GeospatialDataPlotters.geo_plot(gdf, col, units, title, markersize=2, figsize=(5, 5))


.. image:: ./1-processing_newberry_33_0.png


Plot All Processed Layers
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    for criteria, crit_data in pfa["criteria"].items():
        print(criteria)
        plotted_layers = []
    
        for component, comp_data in crit_data["components"].items():
            print(f"\t{component}")
    
            for layer, layer_data in comp_data["layers"].items():
                if layer in plotted_layers:
                    continue
    
                # Skip if no processed model exists
                if "model" not in layer_data or layer_data["model"] is None:
                    print(f"\t\t!! Skipping {layer}: no processed 'model' found.")
                    continue
    
                gdf = layer_data["model"]
                col = layer_data.get("model_data_col")
                units = layer_data.get("model_units", "None")
                title = f"{layer}: processed model"
    
                # Plot in 3D
                GeospatialDataPlotters.geo_plot(
                    gdf, col, units, title, area_outline=outline, extent=extent, markersize=0.75, figsize=(5, 5)
                )
    
                plotted_layers.append(layer)
    
    print("\nFinished plotting all available processed layers.")


.. parsed-literal::

    geologic
    	heat


.. image:: ./1-processing_newberry_35_1.png


.. image:: ./1-processing_newberry_35_2.png


.. image:: ./1-processing_newberry_35_3.png


.. image:: ./1-processing_newberry_35_4.png


.. image:: ./1-processing_newberry_35_5.png


.. image:: ./1-processing_newberry_35_6.png


.. parsed-literal::

    	reservoir


.. image:: ./1-processing_newberry_35_8.png


.. parsed-literal::

    	insulation


.. image:: ./1-processing_newberry_35_10.png


.. parsed-literal::

    
    Finished plotting all available processed layers.


Save Processed Layers
---------------------

It’s good practice to save processed data to disk before proceeding so
processing only has to be completed once. This allows the user to make
minor tweaks to the PFA, such as adjusting weights, without reprocessing
all data layers.

.. code:: ipython3

    for criteria, crit_data in pfa["criteria"].items():
        print(criteria)
        for component, comp_data in crit_data["components"].items():
            print(f"\t{component}")
            for layer, layer_data in comp_data["layers"].items():
                gdf = layer_data["model"]
                col = layer_data["model_data_col"]
                units = layer_data["model_units"]
    
                out_fp = data_dir / criteria / component / f"{layer}_processed.csv"
                gdf.to_csv(out_fp, index=False)
                print(f"\t\tSaved: {out_fp}")


.. parsed-literal::

    geologic
    	heat
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/density_joint_inv_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/mt_resistivity_joint_inv_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/temperature_model_500m_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/earthquakes_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/velocity_model_vs_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/heat/velocity_model_vpvs_processed.csv
    	reservoir
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/density_joint_inv_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/mt_resistivity_joint_inv_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/earthquakes_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/lineaments_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/velocity_model_vs_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/reservoir/velocity_model_vpvs_processed.csv
    	insulation
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/density_joint_inv_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/mt_resistivity_joint_inv_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/earthquakes_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/temperature_model_500m_processed.csv
    		Saved: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/data/geologic/insulation/velocity_model_vp_processed.csv


Save a “Clean” PFA Config (no GeoDataFrames)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similarly, saving a “clean” processed PFA config allows the user to
avoid repeating the processing step when making minor adjustments to the
PFA. This processed config serves as the config file for the next step.

.. code:: ipython3

    def drop_geodataframes(data: dict) -> dict:
        """Recursively remove GeoDataFrame objects before saving."""
        clean = {}
        for key, val in data.items():
            if isinstance(val, gpd.GeoDataFrame):
                continue
            elif isinstance(val, dict):
                clean[key] = drop_geodataframes(val)
            else:
                clean[key] = val
        return clean
    
    pfa_nodf = drop_geodataframes(pfa)
    out_json = config_dir / "newberry_superhot_processed_config.json"
    with open(out_json, "w") as f:
     json.dump(pfa_nodf, f, indent=4)
    print(f"Processed PFA configuration saved to: {out_json}")


.. parsed-literal::

    Processed PFA configuration saved to: /Users/ntaverna/Documents/DEEPEN/geoPFA/examples/Newberry/2D/config/newberry_superhot_processed_config.json


--------------

Next Steps: Layer Combination and Favorability Modeling
-------------------------------------------------------

This concludes **Notebook 1 – Data Processing with ``geoPFA``**.

| In this notebook, we: - Loaded, cleaned, and standardized all input
  datasets
| - Processed each layer (filtering, interpolation, density, distance,
  transformations)
| - Saved the resulting **processed layers** to disk

You’re now ready to combine these layers into component and criteria
favorability models.

| If you skip running this notebook, don’t worry — the **supplementary
  example data** includes all intermediate processed files.
| You can start directly with **Notebook 2 – Layer Combination and
  Favorability Modeling**, which loads those preprocessed layers
  automatically.