compass.extraction.small_wind.parse.StructuredSmallWindPermittedUseDistrictsParser#

class StructuredSmallWindPermittedUseDistrictsParser(llm_service, usage_tracker=None, **kwargs)[source]#

Bases: StructuredSmallWindParser

LLM-based utility for extracting permitted-use districts from ordinance text

Purpose:

Extract structured ordinance data from text.

Responsibilities:
  1. Extract ordinance values into structured format by executing a decision-tree-based chain-of-thought prompt on the text for each value to be extracted.

Key Relationships:

Uses a JSONFromTextLLMCaller for LLM queries and multiple AsyncDecisionTree instances to guide the extraction of individual values.

Parameters:
  • llm_service (Service) – LLM service used for queries.

  • usage_tracker (UsageTracker, optional) – Tracker instance used to monitor token usage during LLM calls. By default, None.

  • **kwargs

    Keyword arguments to be passed to the underlying service processing function (i.e. llm_service.call(**kwargs)). Should not contain the following keys:

    • usage_sub_label

    • messages

    These arguments are provided by this caller object.
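The restriction on reserved keys can be checked before the parser is constructed. The following is a minimal sketch, not part of the library: the helper name `validate_llm_kwargs` and the example keys `temperature` and `timeout` are hypothetical, and which keyword arguments are actually accepted depends on the configured `llm_service`.

```python
# Keys documented above as being supplied by the caller object itself.
RESERVED_KEYS = frozenset({"usage_sub_label", "messages"})


def validate_llm_kwargs(kwargs):
    """Return a copy of ``kwargs``, raising if it contains a reserved key.

    Hypothetical helper for illustration; not provided by the library.
    """
    clashes = sorted(RESERVED_KEYS.intersection(kwargs))
    if clashes:
        raise ValueError(f"Reserved keys must not be passed: {clashes}")
    return dict(kwargs)


# ``temperature`` and ``timeout`` are assumed example keys only.
llm_kwargs = validate_llm_kwargs({"temperature": 0, "timeout": 60})

# The validated mapping could then be forwarded to the constructor:
# parser = StructuredSmallWindPermittedUseDistrictsParser(
#     llm_service, usage_tracker=None, **llm_kwargs
# )
```

Validating up front keeps a key clash from surfacing later, inside an LLM call.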

Methods

parse(text)

Parse text and extract permitted use districts data

Attributes

IN_LABEL

Identifier for text ingested by this class

OUT_LABEL

Identifier for structured ordinance data output by this class

TASK_ID

ID to use for this extraction for linking with LLM configs

IN_LABEL = 'districts_text'#

Identifier for text ingested by this class

OUT_LABEL = 'permitted_district_values'#

Identifier for structured ordinance data output by this class

TASK_ID = 'data_extraction'#

ID to use for this extraction for linking with LLM configs

async parse(text)[source]#

Parse text and extract permitted use districts data

Parameters:

text (str) – Permitted-use districts text, which may or may not contain information about allowed uses in one or more districts.

Returns:

pandas.DataFrame or None – DataFrame containing the allowed-use district names parsed from the text, or None if no small wind energy system is found in the text.
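Because `parse` is a coroutine and may return None, callers need to await it and handle the empty case. The sketch below illustrates that calling pattern with a hypothetical stand-in class, `StubDistrictsParser`, so it runs without an LLM service. The stub's keyword check, the `district` column name, the sample district, and the use of `IN_LABEL` and `OUT_LABEL` as dictionary keys are all assumptions made for illustration. The real class returns a pandas.DataFrame, not a list.

```python
import asyncio


class StubDistrictsParser:
    """Hypothetical stand-in mimicking only the documented interface."""

    IN_LABEL = "districts_text"
    OUT_LABEL = "permitted_district_values"

    async def parse(self, text):
        # The real parser queries an LLM; this stub uses a keyword check.
        if "wind" not in text.lower():
            return None
        # The real parser returns a pandas.DataFrame; a list of dicts
        # with an assumed "district" field stands in for it here.
        return [{"district": "A-1"}]


async def run_parser(parser, data):
    """Read text under IN_LABEL and store any result under OUT_LABEL."""
    result = await parser.parse(data[parser.IN_LABEL])
    if result is None:
        # Nothing was extracted, so the data is returned unchanged.
        return data
    return {**data, parser.OUT_LABEL: result}


found = asyncio.run(
    run_parser(
        StubDistrictsParser(),
        {"districts_text": "Small wind energy systems are permitted in A-1."},
    )
)
missing = asyncio.run(
    run_parser(
        StubDistrictsParser(),
        {"districts_text": "Fences shall not exceed six feet in height."},
    )
)
```

Here `found` gains a `permitted_district_values` entry while `missing` is left unchanged, mirroring the documented None return when no small wind energy system is found.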