compass.extraction.small_wind.ordinance.SmallWindOrdinanceTextExtractor#

class SmallWindOrdinanceTextExtractor(llm_service, usage_tracker=None, **kwargs)#

Bases: PromptBasedTextExtractor

Extract succinct ordinance text from input

Parameters:
  • llm_service (Service) – LLM service used for queries.

  • usage_tracker (UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.

  • **kwargs

    Keyword arguments to be passed to the underlying service processing function (i.e. llm_service.call(**kwargs)). Should not contain the following keys:

    • usage_sub_label

    • messages

    These arguments are provided by this caller object.
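The reserved-key rule above can be checked before the extractor is built. The following is a minimal sketch, not part of the library: `check_llm_kwargs` and `RESERVED_KEYS` are hypothetical names, and `temperature` and `timeout` are illustrative keyword arguments that may or may not be accepted by a given LLM service.

```python
# Keys the documentation says are supplied by the caller object itself.
RESERVED_KEYS = {"usage_sub_label", "messages"}


def check_llm_kwargs(kwargs):
    """Hypothetical helper: reject kwargs that clash with reserved keys."""
    clashes = RESERVED_KEYS & set(kwargs)
    if clashes:
        raise ValueError(f"Reserved keys passed in kwargs: {sorted(clashes)}")
    return kwargs


# Illustrative values only; the accepted keys depend on the LLM service.
llm_kwargs = check_llm_kwargs({"temperature": 0, "timeout": 300})

# With a real service instance, the validated kwargs would then be forwarded:
# extractor = SmallWindOrdinanceTextExtractor(llm_service, **llm_kwargs)
```

Validating up front gives a clear error at construction time rather than a duplicate-keyword error at the first LLM call.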

Methods

call(sys_msg, content[, usage_sub_label])

Call LLM

Attributes

FORMATTING_PROMPT

Prompt component instructing model to preserve text structure

IN_LABEL

Identifier for collected text ingested by this class

OUTPUT_PROMPT

Prompt component instructing model output guidelines

OUT_LABEL

PROMPTS

Dicts defining the prompts for ordinance text extraction

SYSTEM_MESSAGE

System message for text extraction LLM calls

TASK_DESCRIPTION

Task description to show in progress bar

TASK_ID

ID to use for this extraction for linking with LLM configs

parsers

Iterable of parsers provided by this extractor

IN_LABEL = 'relevant_text'#

Identifier for collected text ingested by this class

FORMATTING_PROMPT = '## Formatting & Structure ##:\n- **Preserve _all_ section titles, headers, and numberings** for reference.\n- **Maintain the original wording, formatting, and structure** to ensure accuracy.'#

Prompt component instructing model to preserve text structure

OUTPUT_PROMPT = "## Output Handling ##:\n- This is a strict extraction task act like a text filter, **not** a summarizer or writer.\n- Do not add, explain, reword, or summarize anything.\n- The output must be a **copy-paste** of the original excerpt. **Absolutely no paraphrasing or rewriting.**\n- The output must consist **only** of contiguous or discontiguous verbatim blocks copied from the input.\n- The only allowed change is to remove irrelevant sections of text. You can remove irrelevant text from within sections, but you cannot add any new text or modify the text you keep in any way.\n- If **no relevant text** is found, return the response: 'No relevant text.'"#

Prompt component instructing model output guidelines

SYSTEM_MESSAGE = 'You are a text extraction assistant. Your job is to extract only verbatim, **unmodified** excerpts from the provided text. Do not interpret or paraphrase. Do not summarize. Only return exactly copied segments that match the specified scope. If the relevant content appears within a table, return the entire table, including headers and footers, exactly as formatted.'#

System message for text extraction LLM calls

async call(sys_msg, content, usage_sub_label=LLMUsageCategory.DEFAULT)#

Call LLM

Parameters:
  • sys_msg (str) – The LLM system message.

  • content (str) – Your chat message for the LLM.

  • usage_sub_label (str, optional) – Label to store token usage under. By default, "default".

Returns:

str or None – The LLM response, as a string, or None if something went wrong during the call.
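Because `call` is a coroutine that may return None, callers probably want to await it and handle the failure case explicitly. The sketch below assumes only the documented signature; `StubExtractor` and `extract_or_fallback` are hypothetical stand-ins so the example runs without an LLM service.

```python
import asyncio


class StubExtractor:
    """Hypothetical stand-in that mimics the documented ``call`` signature."""

    async def call(self, sys_msg, content, usage_sub_label="default"):
        # Pretend the call fails (returns None) when there is no content.
        return content if content.strip() else None


async def extract_or_fallback(extractor, sys_msg, content):
    """Await the LLM call and substitute an empty string on failure."""
    response = await extractor.call(sys_msg, content)
    if response is None:
        return ""
    return response


system_message = "You are a text extraction assistant."
ok = asyncio.run(
    extract_or_fallback(StubExtractor(), system_message, "Sec. 4.2 Wind towers")
)
failed = asyncio.run(extract_or_fallback(StubExtractor(), system_message, "   "))
```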

property parsers#

Iterable of parsers provided by this extractor

Yields:
  • name (str) – Name describing the type of text output by the parser.

  • parser (callable) – Async function that takes a text_chunks input and outputs parsed text.
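One way to consume `parsers` is to iterate over it and await each parser in turn. This sketch assumes the property yields `(name, parser)` pairs, which the documentation suggests but does not state outright; `StubExtractor`, `keep_wind`, and `run_parsers` are hypothetical, and the stub filters text locally instead of querying an LLM.

```python
import asyncio


class StubExtractor:
    """Hypothetical stand-in exposing a ``parsers`` property."""

    @property
    def parsers(self):
        async def keep_wind(text_chunks):
            # Toy parser: keep only chunks that mention wind.
            return "\n".join(c for c in text_chunks if "wind" in c.lower())

        yield "wind_energy_systems_text", keep_wind


async def run_parsers(extractor, text_chunks):
    """Await every parser and collect its output under the parser's name."""
    outputs = {}
    for name, parser in extractor.parsers:
        outputs[name] = await parser(text_chunks)
    return outputs


chunks = ["Wind turbines shall not exceed 100 feet.", "Fence height rules."]
outputs = asyncio.run(run_parsers(StubExtractor(), chunks))
```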

TASK_DESCRIPTION = 'Extracting small wind ordinance text'#

Task description to show in progress bar

TASK_ID = 'ordinance_text_extraction'#

ID to use for this extraction for linking with LLM configs

PROMPTS = [{'key': 'wind_energy_systems_text', 'out_fn': '{jurisdiction} Wind Ordinance.txt', 'prompt': '# CONTEXT #\nWe want to reduce the provided excerpt to only contain information about **wind energy systems**. The extracted text will be used for structured data extraction, so it must be both **comprehensive** (retaining all relevant details) and **focused** (excluding unrelated content), with **zero rewriting or paraphrasing**. Ensure that all retained information is **directly applicable to wind energy systems** while preserving full context and accuracy.\n\n# OBJECTIVE #\nExtract all text **pertaining to wind energy systems** from the provided excerpt.\n\n# RESPONSE #\nFollow these guidelines carefully:\n\n1. ## Scope of Extraction ##:\n- Include all text that pertains to **wind energy systems**.\n- Explicitly include any text related to **bans or prohibitions** on wind energy systems.\n- Explicitly include any text related to the adoption or enactment date of the ordinance (if any).\n\n2. ## Exclusions ##:\n- Do **not** include text that does not pertain to wind energy systems.\n\n3. {FORMATTING_PROMPT}\n\n4. {OUTPUT_PROMPT}'}, {'key': 'cleaned_text_for_extraction', 'out_fn': '{jurisdiction} Small Wind Ordinance.txt', 'prompt': '# CONTEXT #\nWe want to reduce the provided excerpt to only contain information about **small, medium, or non-commercial wind energy systems**. The extracted text will be used for structured data extraction, so it must be both **comprehensive** (retaining all relevant details) and **focused** (excluding unrelated content), with **zero rewriting or paraphrasing**. Ensure that all retained information is **directly applicable** to small, medium, or non-commercial wind energy systems while preserving full context and accuracy.\n\n# OBJECTIVE #\nExtract all text **pertaining to small, medium or non-commercial wind energy systems** from the provided excerpt.\n\n# RESPONSE #\nFollow these guidelines carefully:\n\n1. ## Scope of Extraction ##:\n- Include all text that pertains to **small, medium, or non-commercial wind energy systems**, even if they are referred to by different names such as: Small wind energy turbines (swet), non-commercial wind energy systems, on-site wind energy systems, distributed wind energy systems, medium wind energy systems (mwes), agricultural wind energy systems (awes), residential wind energy systems, small wind turbines (swt), or similar\n- Explicitly include any text related to **bans or prohibitions** on small, medium, or non-commercial wind energy systems.\n- Explicitly include any text related to the adoption or enactment date of the ordinance (if any).\n- **Retain all relevant technical, design, operational, safety, environmental, and infrastructure-related provisions** that apply to the topic, such as (but not limited to):\n    - Compliance with legal or regulatory standards.\n    - Site, structural, or design specifications.\n    - Environmental impact considerations.\n    - Safety and risk mitigation measures.\n    - Infrastructure, implementation, operation, and maintenance details.\n    - All other **closely related provisions**.\n\n2. ## Exclusions ##:\n- Do **not** include text that explicitly applies **only** to private, micro, personal, building-mounted or large, utility-scale, for-sale, commercial wind energy systems.\n- Do **not** include text that does not pertain at all to wind energy systems.\n\n3.{FORMATTING_PROMPT}\n\n4. {OUTPUT_PROMPT}'}]#

Dicts defining the prompts for ordinance text extraction
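Each entry in PROMPTS carries `{FORMATTING_PROMPT}` and `{OUTPUT_PROMPT}` placeholders in its prompt and a `{jurisdiction}` placeholder in `out_fn`. The sketch below shows how such templates could be filled with `str.format`. This is an assumption about the mechanism, not a description of the library's internals, and the prompt strings are shortened stand-ins for the real class attributes.

```python
# Shortened stand-ins for the class attributes documented above.
FORMATTING_PROMPT = "## Formatting & Structure ##:\n- Preserve all section titles."
OUTPUT_PROMPT = "## Output Handling ##:\n- Copy the text verbatim."

# Abbreviated copy of one PROMPTS entry (the real prompt is much longer).
entry = {
    "key": "cleaned_text_for_extraction",
    "out_fn": "{jurisdiction} Small Wind Ordinance.txt",
    "prompt": "# RESPONSE #\n3. {FORMATTING_PROMPT}\n\n4. {OUTPUT_PROMPT}",
}

# Fill the shared prompt components into the template.
prompt = entry["prompt"].format(
    FORMATTING_PROMPT=FORMATTING_PROMPT, OUTPUT_PROMPT=OUTPUT_PROMPT
)

# "Example County" is an illustrative jurisdiction name.
out_fn = entry["out_fn"].format(jurisdiction="Example County")
```

Keeping the formatting and output rules in separate attributes lets both prompts share one copy of each, so a subclass could override a single component without rewriting every prompt.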