compass.plugin.ordinance.PromptBasedTextCollector#

class PromptBasedTextCollector(*args, **kwargs)[source]#

Bases: JSONFromTextLLMCaller, BaseTextCollector, ABC

Text extractor based on a chain of prompts

Parameters:
  • llm_service (Service) – LLM service used for queries.

  • usage_tracker (UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.

  • **kwargs

    Keyword arguments to be passed to the underlying service processing function (i.e. llm_service.call(**kwargs)). Should not contain the following keys:

    • usage_sub_label

    • messages

    These arguments are provided by this caller object.

Methods

call(sys_msg, content[, usage_sub_label])

Call LLM for structured data retrieval

check_chunk(chunk_parser, ind)

Check the chunk at index ind to see if it contains ordinance text

Attributes

OUT_LABEL

Identifier for text collected by this class

PROMPTS

List of dicts defining the prompts for text extraction

relevant_text

Combined ordinance text from the individual chunks

abstract property PROMPTS#

List of dicts defining the prompts for text extraction

Each dict in the list should have the following keys:

  • prompt: [REQUIRED] The text filter prompt to use to determine if a chunk of text is relevant for the current extraction task. The prompt must instruct the LLM to return a dictionary (as JSON) with at least one key that outputs the filter decision. The prompt may use the following placeholders, which will be filled in with the corresponding class attributes when the prompt is applied:

    • "{key}": The key corresponding to this prompt.

  • key: [REQUIRED] A string identifier for the key in the output JSON dictionary that represents the LLM filter decision (True if the text chunk should be kept, and False otherwise).

  • label: [OPTIONAL] A string label describing the type of relevant text this prompt is looking for (e.g. “wind energy conversion system ordinance text”). This is only used for logging purposes and does not affect the extraction process itself. If not provided, this will default to “collector step {i}”.

The prompts will be applied in the order they appear in the list, with the output text from each prompt being fed as input to the next prompt in the chain. If any of the filter decisions return False, the text will be discarded and not passed to subsequent prompts. The final output of the last prompt will determine whether or not the chunk of text being evaluated is kept as relevant text for extraction.

Type:

list
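The structure and chaining behavior described above can be sketched as follows. Everything in this example (the prompt wording, keys, labels, and the ask_llm stand-in) is illustrative and not taken from the library:

```python
# Illustrative PROMPTS definition; prompt wording, keys, and labels are
# assumptions for demonstration, not taken from the library.
EXAMPLE_PROMPTS = [
    {
        "prompt": (
            "Return your answer as a dictionary in JSON format with the key "
            '"{key}" set to true if the text mentions wind energy '
            "regulations, and false otherwise."
        ),
        "key": "mentions_wind_energy",
        "label": "wind energy mention filter",  # optional; logging only
    },
    {
        "prompt": (
            "Return your answer as a dictionary in JSON format with the key "
            '"{key}" set to true if the text contains enforceable ordinance '
            "requirements (e.g. setbacks or height limits), and false "
            "otherwise."
        ),
        "key": "contains_requirements",
        # no "label" key: the class would default to "collector step {i}"
    },
]


def run_filter_chain(prompts, text, ask_llm):
    """Apply the prompts in order; a False decision discards the text.

    ``ask_llm(sys_msg, content)`` is a stand-in for the real LLM call and
    is expected to return a dict parsed from the LLM's JSON answer.
    """
    for spec in prompts:
        sys_msg = spec["prompt"].format(key=spec["key"])
        decision = ask_llm(sys_msg, text)
        if not decision.get(spec["key"], False):
            return None  # chunk filtered out; skip remaining prompts
    return text  # survived every filter: kept as relevant text
```

Note how a missing or False value for a prompt's key at any step short-circuits the chain, matching the discard behavior described above.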

property relevant_text#

Combined ordinance text from the individual chunks

Type:

str

async check_chunk(chunk_parser, ind)[source]#

Check the chunk at index ind to see if it contains ordinance text

Parameters:
  • chunk_parser (ParseChunksWithMemory) – Instance that contains a parse_from_ind method.

  • ind (int) – Index of the chunk to check.

Returns:

bool – Boolean flag indicating whether or not the text in the chunk contains large wind energy conversion system ordinance text.
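Since check_chunk is a coroutine, callers await it once per chunk index. A minimal usage sketch follows; the driver function name is an assumption, not part of the API:

```python
import asyncio

# Hypothetical driver: loops over chunk indices and keeps those flagged as
# containing ordinance text. `collector` is assumed to expose the async
# `check_chunk(chunk_parser, ind)` method documented above.
async def find_relevant_chunks(collector, chunk_parser, n_chunks):
    hits = []
    for ind in range(n_chunks):
        if await collector.check_chunk(chunk_parser, ind):
            hits.append(ind)
    return hits
```

This sketch checks chunks sequentially; if the underlying LLM service supports concurrent calls, the per-chunk coroutines could instead be scheduled together with asyncio.gather.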

abstract property OUT_LABEL#

Identifier for text collected by this class

Type:

str

async call(sys_msg, content, usage_sub_label=LLMUsageCategory.DEFAULT)#

Call LLM for structured data retrieval

Parameters:
  • sys_msg (str) – The LLM system message. If this text does not contain the instruction text “Return your answer as a dictionary in JSON format”, it will be added.

  • content (str) – LLM call content (typically some text to extract info from).

  • usage_sub_label (str, optional) – Label to store token usage under. By default, "default".

Returns:

dict – Dictionary containing the LLM-extracted features. Dictionary may be empty if there was an error during the LLM call.
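The system-message guard described for sys_msg can be sketched as a simple containment check. This is an assumption about how such a guard might behave, not the library's actual implementation:

```python
# Instruction sentence quoted in the docs for `call`.
JSON_INSTRUCTION = "Return your answer as a dictionary in JSON format"


def ensure_json_instruction(sys_msg):
    """Append the JSON-format instruction if the system message lacks it.

    Sketch only: the real class may phrase or position this differently.
    """
    if JSON_INSTRUCTION in sys_msg:
        return sys_msg
    return f"{sys_msg} {JSON_INSTRUCTION}."
```

Because the returned dictionary may be empty after an LLM error, callers should read keys defensively, e.g. with response.get(key, default), rather than indexing directly.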