compass.extraction.solar.ordinance.SolarOrdinanceTextExtractor
- class SolarOrdinanceTextExtractor(llm_service, usage_tracker=None, **kwargs)
Bases: PromptBasedTextExtractor
Extract succinct ordinance text from input
- Parameters:
llm_service (Service) – LLM service used for queries.
usage_tracker (UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.
**kwargs – Keyword arguments to be passed to the underlying service processing function (i.e. llm_service.call(**kwargs)). Should not contain the following keys:
usage_sub_label
messages
These arguments are provided by this caller object.
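The restriction on `**kwargs` can be checked before constructing the extractor. The following is a minimal sketch, not part of the library: `check_extractor_kwargs` is a hypothetical helper, and `temperature` and `timeout` are placeholder keyword names chosen only for illustration.

```python
# Keys the parameter description says the caller object supplies itself.
RESERVED_KEYS = {"usage_sub_label", "messages"}


def check_extractor_kwargs(kwargs):
    """Hypothetical helper: reject kwargs that clash with reserved keys."""
    clashes = sorted(RESERVED_KEYS & set(kwargs))
    if clashes:
        raise ValueError(f"kwargs must not contain reserved keys: {clashes}")
    return kwargs


# Placeholder keyword names; the real service may accept different ones.
safe = check_extractor_kwargs({"temperature": 0, "timeout": 30})
```

Passing a dict that contains `messages` or `usage_sub_label` to this helper raises `ValueError`.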
Methods
call(sys_msg, content[, usage_sub_label]) – Call LLM
Attributes
FORMATTING_PROMPT – Prompt component instructing model to preserve text structure
IN_LABEL – Identifier for collected text ingested by this class
OUTPUT_PROMPT – Prompt component instructing model output guidelines
OUT_LABEL
PROMPTS – Dicts defining the prompts for ordinance text extraction
SYSTEM_MESSAGE – System message for text extraction LLM calls
TASK_DESCRIPTION – Task description to show in progress bar
TASK_ID – ID to use for this extraction for linking with LLM configs
parsers – Iterable of parsers provided by this extractor
- IN_LABEL = 'relevant_text'
Identifier for collected text ingested by this class
- FORMATTING_PROMPT = '## Formatting & Structure ##:\n- **Preserve _all_ section titles, headers, and numberings** for reference.\n- **Maintain the original wording, formatting, and structure** to ensure accuracy.'#
Prompt component instructing model to preserve text structure
- OUTPUT_PROMPT = "## Output Handling ##:\n- This is a strict extraction task — act like a text filter, **not** a summarizer or writer.\n- Do not add, explain, reword, or summarize anything.\n- The output must be a **copy-paste** of the original excerpt. **Absolutely no paraphrasing or rewriting.**\n- The output must consist **only** of contiguous or discontiguous verbatim blocks copied from the input.\n- The only allowed change is to remove irrelevant sections of text. You can remove irrelevant text from within sections, but you cannot add any new text or modify the text you keep in any way.\n- If **no relevant text** is found, return the response: 'No relevant text.'"#
Prompt component instructing model output guidelines
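The output rules above (verbatim blocks only, or the literal response `No relevant text.`) suggest a simple post-check on model output. The sketch below is an illustration under stated assumptions, not something the library is documented to do: `is_verbatim_extraction` is a hypothetical helper, and the line-by-line substring test is a coarse heuristic that ignores ordering and whitespace changes inside a line.

```python
# Sentinel taken from the OUTPUT_PROMPT text.
NO_RELEVANT_TEXT = "No relevant text."


def is_verbatim_extraction(source, output):
    """Hypothetical check: every non-empty output line occurs in source."""
    if output.strip() == NO_RELEVANT_TEXT:
        return True
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return all(line in source for line in lines)


source = (
    "Sec. 1 Solar farms require permits.\n"
    "Sec. 2 Wind turbines.\n"
    "Sec. 3 Solar setbacks are 50 ft."
)
kept = "Sec. 1 Solar farms require permits.\nSec. 3 Solar setbacks are 50 ft."
ok = is_verbatim_extraction(source, kept)
paraphrased_ok = is_verbatim_extraction(source, "Solar farms need permits.")
```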
- SYSTEM_MESSAGE = 'You are a text extraction assistant. Your job is to extract only verbatim, **unmodified** excerpts from the provided text. Do not interpret or paraphrase. Do not summarize. Only return exactly copied segments that match the specified scope. If the relevant content appears within a table, return the entire table, including headers and footers, exactly as formatted.'#
System message for text extraction LLM calls
- async call(sys_msg, content, usage_sub_label=LLMUsageCategory.DEFAULT)
Call LLM
- property parsers
Iterable of parsers provided by this extractor
- Yields:
name (str) – Name describing the type of text output by the parser.
parser (callable()) – Async function that takes a text_chunks input and outputs parsed text.
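Because `parsers` yields `(name, parser)` pairs whose parser is an async function of `text_chunks`, a caller can loop over the property and await each parser in turn. The sketch below uses a stand-in class so it runs without the library or an LLM: `StubExtractor`, `_echo_parser`, and `run_parsers` are hypothetical, and reusing the `PROMPTS` key as the yielded name is an assumption.

```python
import asyncio


class StubExtractor:
    """Hypothetical stand-in; not the real SolarOrdinanceTextExtractor."""

    async def _echo_parser(self, text_chunks):
        # A real parser would query the LLM; this one only joins the chunks.
        return "\n".join(text_chunks)

    @property
    def parsers(self):
        # Assumed name: the 'key' value from PROMPTS.
        yield "cleaned_text_for_extraction", self._echo_parser


async def run_parsers(extractor, text_chunks):
    """Await each yielded parser and collect its output by name."""
    results = {}
    for name, parser in extractor.parsers:
        results[name] = await parser(text_chunks)
    return results


chunks = ["Section 1. Solar farms.", "Section 2. Setbacks."]
results = asyncio.run(run_parsers(StubExtractor(), chunks))
```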
- TASK_DESCRIPTION = 'Extracting solar ordinance text'
Task description to show in progress bar
- TASK_ID = 'ordinance_text_extraction'
ID to use for this extraction for linking with LLM configs
- PROMPTS = [{'key': 'cleaned_text_for_extraction', 'out_fn': '{jurisdiction} Utility Scale Solar Ordinance.txt', 'prompt': '# CONTEXT #\nWe want to reduce the provided excerpt to only contain information about **solar energy systems**. The extracted text will be used for structured data extraction, so it must be both **comprehensive** (retaining all relevant details) and **focused** (excluding unrelated content), with **zero rewriting or paraphrasing**. Ensure that all retained information is **directly applicable to solar energy systems** while preserving full context and accuracy.\n\n# OBJECTIVE #\nExtract all text **pertaining to solar energy systems** from the provided excerpt.\n\n# RESPONSE #\nFollow these guidelines carefully:\n\n1. ## Scope of Extraction ##:\n- Include **all** text that pertains to** solar energy systems**, even if they are referred to by different names such as: Solar panels, solar energy conversion systems (secs), solar energy facilities (sef), solar energy farms (sef), solar farms (sf), utility-scale solar energy systems (uses), commercial solar energy systems (cses), ground-mounted solar energy systems (gses), alternate energy systems (aes), commercial energy production systems (cepcs), or similar\n- Explicitly include any text related to **bans or prohibitions** on solar energy systems.\n- Explicitly include any text related to the adoption or enactment date of the ordinance (if any).\n\n2. ## Exclusions ##:\n- Do **not** include text that does not pertain to solar energy systems.\n\n3. {FORMATTING_PROMPT}\n\n4. {OUTPUT_PROMPT}'}]#
Dicts defining the prompts for ordinance text extraction
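The `prompt` and `out_fn` values contain `{FORMATTING_PROMPT}`, `{OUTPUT_PROMPT}`, and `{jurisdiction}` placeholders. One plausible way to fill them is `str.format`; whether the library does exactly this is an assumption. The strings below are abbreviated stand-ins for the real class attributes, and "Example County" is an invented jurisdiction name.

```python
# Abbreviated stand-ins for the class attributes documented above.
FORMATTING_PROMPT = "## Formatting & Structure ##:\n- (abbreviated)"
OUTPUT_PROMPT = "## Output Handling ##:\n- (abbreviated)"

prompt_spec = {
    "key": "cleaned_text_for_extraction",
    "out_fn": "{jurisdiction} Utility Scale Solar Ordinance.txt",
    "prompt": (
        "# OBJECTIVE #\n(abbreviated)\n\n"
        "3. {FORMATTING_PROMPT}\n\n4. {OUTPUT_PROMPT}"
    ),
}

# Fill the prompt template with the two shared prompt components.
prompt = prompt_spec["prompt"].format(
    FORMATTING_PROMPT=FORMATTING_PROMPT,
    OUTPUT_PROMPT=OUTPUT_PROMPT,
)

# Fill the output file name with a jurisdiction (invented for illustration).
out_fn = prompt_spec["out_fn"].format(jurisdiction="Example County")
```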