compass.plugin.one_shot.components.SchemaOrdinanceParser

class SchemaOrdinanceParser(llm_service, usage_tracker=None, **kwargs)

Bases: SchemaOutputLLMCaller, BaseParser

Base class for parsing structured data

Parameters:

llm_service (Service) – LLM service used for queries.
usage_tracker (UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.
**kwargs –
Keyword arguments to be passed to the underlying service processing function (i.e. llm_service.call(**kwargs)). Should not contain the following keys:
- usage_sub_label
- messages
These arguments are provided by this caller object.

Methods

`call`(sys_msg, content, response_format[, ...])	Call LLM for structured data retrieval
`parse`(text)	Parse text and extract structured data

Attributes

`DATA_TYPE_SHORT_DESC`	Optional short description of the type of data being extracted
`IN_LABEL`	Identifier for text ingested by this class
`OUT_LABEL`	Identifier for final structured data output
`QUALITATIVE_FEATURES`	Lowercase feature names of qualitative features
`SCHEMA`	Extraction schema
`SYSTEM_PROMPT`	System prompt to use for parsing structured data with an LLM
`TASK_ID`	ID to use for this extraction for linking with LLM configs

DATA_TYPE_SHORT_DESC = None

Optional short description of the type of data being extracted

Examples

“wind energy ordinance”
“solar energy ordinance”
“water rights”
“resource management plan geothermal restriction”

SYSTEM_PROMPT = 'You are a legal scholar extracting structured data from {desc}documents. Follow all instructions in the schema descriptions carefully.'

System prompt to use for parsing structured data with an LLM
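In the prompt template, the {desc} placeholder runs straight into the word "documents", so whatever is substituted presumably has to carry its own trailing space. The following is a minimal sketch of that substitution, assuming the placeholder is filled with str.format from DATA_TYPE_SHORT_DESC. The helper build_system_prompt is hypothetical and is not part of the class.

```python
# Template copied from the SYSTEM_PROMPT attribute documented above.
SYSTEM_PROMPT = (
    "You are a legal scholar extracting structured data from {desc}documents. "
    "Follow all instructions in the schema descriptions carefully."
)


def build_system_prompt(data_type_short_desc=None):
    """Hypothetical helper: fill ``{desc}`` from an optional description."""
    # Assumption: an unset description collapses to an empty string, while a
    # set one gets a trailing space so it does not run into "documents".
    desc = f"{data_type_short_desc.strip()} " if data_type_short_desc else ""
    return SYSTEM_PROMPT.format(desc=desc)


with_desc = build_system_prompt("wind energy ordinance")
without_desc = build_system_prompt()
```

With a description the prompt reads "... from wind energy ordinance documents."; without one it reads "... from documents.".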

abstract property SCHEMA

Extraction schema

Type: dict

abstract property QUALITATIVE_FEATURES

Lowercase feature names of qualitative features

Type: set

abstract property IN_LABEL

Identifier for text ingested by this class

Type: str

abstract property OUT_LABEL

Identifier for final structured data output

Type: str
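SCHEMA, QUALITATIVE_FEATURES, IN_LABEL and OUT_LABEL are abstract, so a concrete subclass has to supply all four. The sketch below shows one way that could look. It uses a stand-in base class so that it runs without the compass package installed, and it omits the llm_service constructor argument. The subclass name, schema contents, labels and feature names are all illustrative assumptions.

```python
from abc import ABC, abstractmethod


class ParserStandIn(ABC):
    """Stand-in for SchemaOrdinanceParser; NOT the real compass class."""

    DATA_TYPE_SHORT_DESC = None

    @property
    @abstractmethod
    def SCHEMA(self):
        """Extraction schema (dict)."""

    @property
    @abstractmethod
    def QUALITATIVE_FEATURES(self):
        """Lowercase names of qualitative features (set)."""

    @property
    @abstractmethod
    def IN_LABEL(self):
        """Identifier for ingested text (str)."""

    @property
    @abstractmethod
    def OUT_LABEL(self):
        """Identifier for structured output (str)."""


class WindOrdinanceParser(ParserStandIn):
    """Hypothetical concrete parser; every value below is illustrative."""

    DATA_TYPE_SHORT_DESC = "wind energy ordinance"
    # Plain class attributes are enough to override the abstract properties.
    SCHEMA = {
        "type": "object",
        "properties": {"feature": {"type": "string"}},
        "required": ["feature"],
    }
    QUALITATIVE_FEATURES = {"color", "lighting"}
    IN_LABEL = "cleaned_wind_text"
    OUT_LABEL = "wind_ordinance_values"


parser = WindOrdinanceParser()
```

Leaving any of the four attributes undefined makes instantiation fail with a TypeError, which is the usual behaviour of abstract properties in Python.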

TASK_ID = 'data_extraction'

ID to use for this extraction for linking with LLM configs

async call(sys_msg, content, response_format, usage_sub_label=LLMUsageCategory.DEFAULT)

Call LLM for structured data retrieval

Parameters:

sys_msg (str) – The LLM system message. If this text does not contain the instruction text “Return your answer as a dictionary in JSON format”, it will be added.
content (str) – LLM call content (typically some text to extract info from).
usage_sub_label (str, optional) – Label to store token usage under. By default, "default".
response_format (dict) –
Dictionary specifying the expected response format. This will be passed to the underlying LLM service (e.g. OpenAI) and should be formatted according to that service’s specifications for structured output. For example, for OpenAI GPT models, this should be a dictionary with the following keys:
- type: Should be set to “json_schema” to indicate that the expected output is structured JSON.
- json_schema: A dictionary describing the expected output. This should include the following keys:
  - name: A string name for this response format (e.g. “extracted_features”).
  - strict: A boolean indicating whether the LLM must adhere strictly to the provided schema. If True, the LLM is instructed to include only the keys specified in the schema field. If False, the LLM may include additional keys.
  - schema: A dictionary, formatted according to the JSON Schema specification, defining the expected structure of the output JSON object. For example, it may specify that the output is an object with certain required properties and give the expected data type of each property.
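As an illustration of the response_format layout described above, here is a minimal sketch of a dictionary in the OpenAI-style "json_schema" shape. The property names (feature, value, units) are hypothetical and were chosen only for the example. The "additionalProperties": False entry and the fully populated "required" list reflect what OpenAI's strict mode expects of a schema.

```python
import json

# Hypothetical response format; only the top-level keys (type, json_schema,
# name, strict, schema) come from the description above.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "extracted_features",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "feature": {"type": "string"},
                "value": {"type": ["number", "null"]},
                "units": {"type": ["string", "null"]},
            },
            "required": ["feature", "value", "units"],
            "additionalProperties": False,
        },
    },
}

# The structure must be JSON-serializable to be sent to an LLM service.
serialized = json.dumps(response_format)
```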

Returns:

dict – Dictionary containing the LLM-extracted features. Dictionary may be empty if there was an error during the LLM call.
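The sys_msg parameter is described as having the JSON instruction text added when it is missing. The sketch below shows one way such a check could work. The function name and the way the instruction is appended (a newline separator and a trailing period) are assumptions, not the library's actual implementation.

```python
# Instruction text quoted in the sys_msg parameter description.
JSON_INSTRUCTION = "Return your answer as a dictionary in JSON format"


def ensure_json_instruction(sys_msg):
    """Hypothetical helper: append the JSON instruction only if absent."""
    if JSON_INSTRUCTION in sys_msg:
        return sys_msg
    # Assumption: the instruction goes on its own line at the end.
    return f"{sys_msg}\n{JSON_INSTRUCTION}."


augmented = ensure_json_instruction("Extract all setback requirements.")
unchanged = ensure_json_instruction(augmented)
```

Applying the helper twice leaves the message unchanged, so the instruction appears only once.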

async parse(text)

Parse text and extract structured data

Parameters:

text (str) – Text which may or may not contain information relevant to the current extraction.

Returns:

pandas.DataFrame or None – DataFrame containing structured extracted data. Can also be None if no relevant values can be parsed from the text.
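Because parse is a coroutine and may return None, callers need to await it and filter out empty results. The sketch below shows that pattern with a stub parser that stands in for a real subclass. The stub returns a list of row dictionaries instead of a pandas.DataFrame so that the example has no third-party dependencies, and its keyword check is purely illustrative.

```python
import asyncio


class StubParser:
    """Stand-in with the same ``parse`` contract; NOT the real class."""

    async def parse(self, text):
        # Illustrative rule only: text that never mentions a setback
        # yields no structured data.
        if "setback" not in text.lower():
            return None
        # Placeholder for the DataFrame a real parser would return.
        return [{"feature": "setback", "value": 500, "units": "feet"}]


async def extract_all(parser, texts):
    """Parse all texts concurrently and drop the ones that returned None."""
    results = await asyncio.gather(*(parser.parse(t) for t in texts))
    return [result for result in results if result is not None]


texts = ["Setback: 500 feet from property lines.", "Meeting minutes."]
kept = asyncio.run(extract_all(StubParser(), texts))
```

The explicit `is not None` comparison matters with real output: testing a pandas.DataFrame for truthiness raises a ValueError.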