compass.plugin.ordinance.PromptBasedTextCollector#
- class PromptBasedTextCollector(*args, **kwargs)[source]#
Bases:
JSONFromTextLLMCaller, BaseTextCollector, ABC
Text extractor based on a chain of prompts
- Parameters:
llm_service (Service) – LLM service used for queries.
usage_tracker (UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.
**kwargs – Keyword arguments to be passed to the underlying service processing function (i.e. llm_service.call(**kwargs)). Should not contain the following keys:
usage_sub_label
messages
These arguments are provided by this caller object.
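As an illustration of the kwargs contract above, the sketch below shows a hypothetical guard (not library code) that rejects the reserved keys before they are forwarded to llm_service.call(**kwargs):

```python
# Illustrative sketch of the kwargs contract: user-supplied keyword
# arguments are forwarded to llm_service.call(**kwargs), so they must not
# collide with the keys the caller object supplies itself. This helper is
# hypothetical, not part of the library.
RESERVED_KEYS = {"usage_sub_label", "messages"}

def validate_service_kwargs(kwargs):
    """Reject kwargs that the caller object reserves for itself."""
    clash = RESERVED_KEYS & set(kwargs)
    if clash:
        raise ValueError(f"kwargs must not contain: {sorted(clash)}")
    return kwargs
```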
Methods
call(sys_msg, content[, usage_sub_label]) – Call LLM for structured data retrieval
check_chunk(chunk_parser, ind) – Check a chunk at a given ind to see if it contains ordinance text
Attributes
Identifier for text collected by this class
List of dicts defining the prompts for text extraction
Combined ordinance text from the individual chunks
- abstract property PROMPTS#
List of dicts defining the prompts for text extraction
Each dict in the list should have the following keys:
prompt: [REQUIRED] The text filter prompt to use to determine if a chunk of text is relevant for the current extraction task. The prompt must instruct the LLM to return a dictionary (as JSON) with at least one key that outputs the filter decision. The prompt may use the following placeholders, which will be filled in with the corresponding class attributes when the prompt is applied:
"{key}": The key corresponding to this prompt.
key: [REQUIRED] A string identifier for the key in the output JSON dictionary that represents the LLM filter decision (True if the text chunk should be kept, and False otherwise).
label: [OPTIONAL] A string label describing the type of relevant text this prompt is looking for (e.g. "wind energy conversion system ordinance text"). This is only used for logging purposes and does not affect the extraction process itself. If not provided, this will default to "collector step {i}".
The prompts will be applied in the order they appear in the list, with the output text from each prompt being fed as input to the next prompt in the chain. If any of the filter decisions return False, the text will be discarded and not passed to subsequent prompts. The final output of the last prompt will determine whether or not the chunk of text being evaluated is kept as relevant text for extraction.
- Type:
list
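For concreteness, a minimal sketch of a PROMPTS definition following the spec above; the prompt wording, keys, and labels are hypothetical examples, not values taken from the library:

```python
# Hypothetical PROMPTS list following the spec above; the wording and keys
# are illustrative only.
PROMPTS = [
    {
        "prompt": (
            "Determine whether the following text is part of a wind energy "
            'ordinance. Return a JSON dictionary with the boolean key "{key}".'
        ),
        "key": "is_ordinance_text",
        "label": "wind energy conversion system ordinance text",
    },
    {
        "prompt": (
            "Determine whether the text sets quantitative requirements. "
            'Return a JSON dictionary with the boolean key "{key}".'
        ),
        "key": "has_requirements",
        # no "label" here, so it would default to "collector step {i}"
    },
]

# Placeholders such as "{key}" are filled from the dict itself when applied:
filled = PROMPTS[0]["prompt"].format(key=PROMPTS[0]["key"])
```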
- async check_chunk(chunk_parser, ind)[source]#
Check a chunk at a given ind to see if it contains ordinance text
- Parameters:
chunk_parser (ParseChunksWithMemory) – Instance that contains a parse_from_ind method.
ind (int) – Index of the chunk to check.
- Returns:
bool – Boolean flag indicating whether or not the text in the chunk contains large wind energy conversion system ordinance text.
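The contract can be sketched with a toy stand-in: a parser-like object exposing parse_from_ind goes in, a bool comes out. The stub parser and the keyword test below are purely illustrative and replace the real LLM filter chain:

```python
import asyncio

class StubChunkParser:
    """Illustrative stand-in for ParseChunksWithMemory."""

    def __init__(self, chunks):
        self.chunks = chunks

    def parse_from_ind(self, ind):
        return self.chunks[ind]

async def check_chunk(chunk_parser, ind):
    """Return True if the chunk at `ind` looks like ordinance text.

    A simple keyword test stands in for the real LLM prompt chain.
    """
    text = chunk_parser.parse_from_ind(ind)
    return "setback" in text.lower()

parser = StubChunkParser(
    ["General zoning definitions.", "Turbine setback: 1.1x tip height."]
)
flag = asyncio.run(check_chunk(parser, 1))
```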
- async call(sys_msg, content, usage_sub_label=LLMUsageCategory.DEFAULT)#
Call LLM for structured data retrieval
- Parameters:
sys_msg (str) – The LLM system message. If this text does not contain the instruction text "Return your answer as a dictionary in JSON format", it will be added.
content (str) – LLM call content (typically some text to extract info from).
usage_sub_label (str, optional) – Label to store token usage under. By default, "default".
- Returns:
dict – Dictionary containing the LLM-extracted features. Dictionary may be empty if there was an error during the LLM call.
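The sys_msg handling documented above can be sketched as follows; ensure_json_instruction is a hypothetical helper mirroring the described behavior, not the library's actual implementation (which lives in JSONFromTextLLMCaller):

```python
# Hypothetical helper mirroring the documented sys_msg behavior: the JSON
# instruction sentence is appended only when it is missing.
JSON_INSTRUCTION = "Return your answer as a dictionary in JSON format"

def ensure_json_instruction(sys_msg: str) -> str:
    if JSON_INSTRUCTION not in sys_msg:
        sys_msg = f"{sys_msg}\n\n{JSON_INSTRUCTION}."
    return sys_msg
```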