compass.plugin.ordinance.PromptBasedTextExtractor

class PromptBasedTextExtractor(llm_service, usage_tracker=None, **kwargs)

Bases: LLMCaller, BaseTextExtractor, ABC

Text extractor based on a chain of prompts

Parameters:
  • llm_service (Service) – LLM service used for queries.

  • usage_tracker (UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.

  • **kwargs

    Keyword arguments to be passed to the underlying service processing function (i.e. llm_service.call(**kwargs)). Should not contain the following keys:

    • usage_sub_label

    • messages

    These arguments are provided by this caller object.
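Because usage_sub_label and messages are supplied by the caller object itself, passing them through **kwargs would clash. A minimal sketch of that constraint; the "temperature" key is an invented example of an argument forwarded verbatim to llm_service.call(**kwargs):

```python
# Hypothetical construction kwargs; "temperature" is invented for
# illustration and is forwarded verbatim to llm_service.call(**kwargs)
service_kwargs = {"temperature": 0.0}

# These keys are provided by the caller object, so they must not
# appear in **kwargs
RESERVED = {"usage_sub_label", "messages"}


def validate_kwargs(kwargs):
    """Reject kwargs that would clash with caller-supplied arguments."""
    clashes = RESERVED & set(kwargs)
    if clashes:
        raise ValueError(f"Reserved keys passed in kwargs: {sorted(clashes)}")
    return kwargs
```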

Methods

call(sys_msg, content[, usage_sub_label])

Call LLM

Attributes

FORMATTING_PROMPT

Prompt component instructing model to preserve text structure

IN_LABEL

Identifier for text ingested by this class

OUTPUT_PROMPT

Prompt component instructing model output guidelines

OUT_LABEL

Identifier for final text extracted by this class

PROMPTS

List of dicts defining the prompts for text extraction

SYSTEM_MESSAGE

System message for text extraction LLM calls

TASK_DESCRIPTION

Task description to show in progress bar

TASK_ID

ID to use for this extraction for linking with LLM configs

parsers

Iterable of parsers provided by this extractor

SYSTEM_MESSAGE = 'You are a text extraction assistant. Your job is to extract only verbatim, **unmodified** excerpts from the provided text. Do not interpret or paraphrase. Do not summarize. Only return exactly copied segments that match the specified scope. If the relevant content appears within a table, return the entire table, including headers and footers, exactly as formatted.'

System message for text extraction LLM calls

FORMATTING_PROMPT = '## Formatting & Structure ##:\n- **Preserve _all_ section titles, headers, and numberings** for reference.\n- **Maintain the original wording, formatting, and structure** to ensure accuracy.'

Prompt component instructing model to preserve text structure

OUTPUT_PROMPT = "## Output Handling ##:\n- This is a strict extraction task act like a text filter, **not** a summarizer or writer.\n- Do not add, explain, reword, or summarize anything.\n- The output must be a **copy-paste** of the original excerpt. **Absolutely no paraphrasing or rewriting.**\n- The output must consist **only** of contiguous or discontiguous verbatim blocks copied from the input.\n- The only allowed change is to remove irrelevant sections of text. You can remove irrelevant text from within sections, but you cannot add any new text or modify the text you keep in any way.\n- If **no relevant text** is found, return the response: 'No relevant text.'"

Prompt component instructing model output guidelines

abstract property PROMPTS

List of dicts defining the prompts for text extraction

Each dict in the list should have the following keys:

  • prompt: [REQUIRED] The text extraction prompt to use for the extraction. The prompt may use placeholders, which will be filled in with the corresponding class attributes when the prompt is applied.

  • key: [OPTIONAL] A string identifier for the text extracted by this prompt. If not provided, a default key "extracted_text_{i}" will be used, where {i} is the index of the prompt in the list. The value of this key from the last dictionary in the input list will be used as this extractor’s OUT_LABEL, which is typically used to link the extracted text to the appropriate parser via the parser’s IN_LABEL. All key values should be unique across all prompts in the chain.

  • out_fn: [OPTIONAL] A file name template that will be used to write the extracted text to a file. The template can include the placeholder {jurisdiction}, which will be replaced with the full jurisdiction name. If not provided, the extracted text will not be written to a file. This is primarily intended for debugging and analysis purposes, and is not required for the extraction process itself.

The prompts will be applied in the order they appear in the list, with the output text from each prompt being fed as input to the next prompt in the chain. The final output of the last prompt will be the output of the extractor.

Type:

list
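For illustration, a concrete subclass might define PROMPTS along these lines. The prompt text, key, and file name below are invented; the helper applies the documented default-key rule ("extracted_text_{i}") and shows how the last prompt's key serves as the extractor's OUT_LABEL:

```python
# Hypothetical PROMPTS value for a concrete subclass; prompt text, key,
# and out_fn are invented for illustration only
PROMPTS = [
    {
        # "key" omitted -> defaults to "extracted_text_0"
        "prompt": "Extract any text describing permitting requirements.",
    },
    {
        "prompt": "From the text below, keep only numeric requirements.",
        "key": "numeric_requirements",
        "out_fn": "{jurisdiction}_numeric.txt",  # optional debug output
    },
]


def resolve_keys(prompts):
    """Apply the documented default-key rule: "extracted_text_{i}"."""
    return [p.get("key", f"extracted_text_{i}") for i, p in enumerate(prompts)]


keys = resolve_keys(PROMPTS)
out_label = keys[-1]  # the last prompt's key becomes the extractor's OUT_LABEL
```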

property parsers

Iterable of parsers provided by this extractor

Yields:
  • name (str) – Name describing the type of text output by the parser.

  • parser (callable) – Async function that takes a text_chunks input and outputs parsed text.
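A sketch of how these (name, parser) pairs might be consumed. The parser below is a stand-in assuming only the documented contract (an async callable over a text_chunks input); a real extractor would yield its own parsers:

```python
import asyncio


async def cleaned_text_parser(text_chunks):
    """Stand-in async parser; a real one would extract structured text."""
    return " ".join(chunk.strip() for chunk in text_chunks)


def iter_parsers():
    # A concrete extractor's `parsers` property would yield pairs like this
    yield "cleaned_text", cleaned_text_parser


async def run_all(text_chunks):
    # Consume every (name, parser) pair and collect the results by name
    return {name: await parser(text_chunks) for name, parser in iter_parsers()}


parsed = asyncio.run(run_all(["first chunk ", " second chunk"]))
```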

abstract property IN_LABEL

Identifier for text ingested by this class

Type:

str

abstract property OUT_LABEL

Identifier for final text extracted by this class

Type:

str

TASK_DESCRIPTION = 'Condensing text for extraction'

Task description to show in progress bar

TASK_ID = 'text_extraction'

ID to use for this extraction for linking with LLM configs

async call(sys_msg, content, usage_sub_label=LLMUsageCategory.DEFAULT)

Call LLM

Parameters:
  • sys_msg (str) – The LLM system message.

  • content (str) – Your chat message for the LLM.

  • usage_sub_label (str, optional) – Label to store token usage under. By default, "default".

Returns:

str or None – The LLM response, as a string, or None if something went wrong during the call.
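Because call() may return None on failure, callers should guard the result before using it. A minimal sketch with a stub standing in for the extractor (the real class requires an LLM service to construct):

```python
import asyncio


class StubExtractor:
    """Stand-in mimicking call()'s documented contract: returns the LLM
    response as a string, or None if something went wrong."""

    async def call(self, sys_msg, content, usage_sub_label="default"):
        # sys_msg would steer a real LLM; this stub just tags the content
        if not content:
            return None
        return f"[{usage_sub_label}] {content}"


async def extract(extractor, text):
    response = await extractor.call(
        sys_msg="You are a text extraction assistant.",
        content=text,
    )
    # Guard against a failed call before using the response
    return response if response is not None else "No relevant text."


result = asyncio.run(extract(StubExtractor(), "Sample ordinance text"))
```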