compass.extraction.water.plugin.TexasWaterRightsExtractor

class TexasWaterRightsExtractor(jurisdiction, model_configs, usage_tracker=None)

COMPASS Texas water rights extraction plugin

Parameters:

jurisdiction (Jurisdiction) – Jurisdiction for which extraction is being performed.
model_configs (dict) – Dictionary where keys are LLMTasks and values are LLMConfig instances to be used for those tasks.
usage_tracker (UsageTracker, optional) – Usage tracker instance used to record the cost of LLM calls. By default, None.

Methods

`filter_docs`(extraction_context)	Filter down candidate documents before parsing
`get_heuristic`()	Get a BaseHeuristic instance with a check() method
`get_query_templates`()	Get a list of search engine query templates for extraction
`get_website_keywords`()	Get a dict of website search keyword scores
`parse_docs_for_structured_data`(...)	Parse documents to extract structured data/information
`record_usage`()	Persist usage tracking data when a tracker is available
`save_structured_data`(doc_infos, out_dir)	Write extracted water rights data to disk

Attributes

`IDENTIFIER`	Identifier for extraction task
`JURISDICTION_DATA_FP`	Path to the Texas water district names file (tx_water_districts.csv)

IDENTIFIER = 'tx water rights'

Identifier for extraction task

JURISDICTION_DATA_FP = PosixPath('/home/runner/work/COMPASS/COMPASS/compass/data/tx_water_districts.csv')

Path to the Texas water district names file (tx_water_districts.csv)

async get_query_templates()

Get a list of search engine query templates for extraction

Query templates can contain the placeholder {jurisdiction}, which is replaced with the full jurisdiction name when the search engine query is built.
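The placeholder substitution can be pictured with plain `str.format`. This is only a sketch: the template strings, the `fill_templates` helper, and the jurisdiction name below are hypothetical and are not taken from the plugin, whose actual templates are not listed on this page.

```python
# Hypothetical templates; the real plugin's templates are not shown here.
QUERY_TEMPLATES = [
    "{jurisdiction} groundwater conservation district rules",
    "{jurisdiction} water well permit requirements",
]


def fill_templates(templates, jurisdiction_name):
    """Substitute the full jurisdiction name into each query template."""
    return [t.format(jurisdiction=jurisdiction_name) for t in templates]


# "Example County" is a made-up jurisdiction name used for illustration.
queries = fill_templates(QUERY_TEMPLATES, "Example County")
for query in queries:
    print(query)
```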

async get_website_keywords()

Get a dict of website search keyword scores

The dictionary maps keywords to scores. Links that match higher-scoring keywords are prioritized when a website is scraped for a document.
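One way such a keyword-to-score mapping could be used to rank links is sketched below. The keywords, the scores, and the additive scoring rule in `score_link` are assumptions made for illustration; the page does not describe how COMPASS combines the scores.

```python
def score_link(link_text, keyword_scores):
    """Sum the scores of all keywords found in the link text (assumed rule)."""
    text = link_text.lower()
    return sum(score for kw, score in keyword_scores.items() if kw in text)


# Hypothetical keyword scores, not the plugin's actual values.
keyword_scores = {"rules": 10, "permit": 5, "water": 1}

links = ["About the board", "District Rules (PDF)", "Well permit application"]

# Highest-scoring links come first and would be visited first.
ranked = sorted(links, key=lambda link: score_link(link, keyword_scores), reverse=True)
print(ranked)
```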

async get_heuristic()

Get a BaseHeuristic instance with a check() method

The check() method should accept a string of text and return True if the text passes the heuristic check and False otherwise.
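The `check()` contract described above can be sketched with a small stand-in class. `KeywordHeuristic` and its term list are hypothetical; this class does not subclass the real `BaseHeuristic`, whose interface beyond `check()` is not documented on this page.

```python
class KeywordHeuristic:
    """Hypothetical stand-in that mimics only the documented check() contract."""

    def __init__(self, required_terms):
        self.required_terms = [term.lower() for term in required_terms]

    def check(self, text):
        """Return True if any required term appears in the text, else False."""
        lowered = text.lower()
        return any(term in lowered for term in self.required_terms)


# Assumed terms, chosen only for this example.
heuristic = KeywordHeuristic(["groundwater", "well permit"])

print(heuristic.check("Rules of the Groundwater Conservation District"))
print(heuristic.check("Parks and recreation schedule"))
```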

async filter_docs(extraction_context)

Filter down candidate documents before parsing

Parameters:

extraction_context (ExtractionContext) – Context containing candidate documents to be filtered. Set the .documents attribute of this object to be the iterable of documents that should be kept for parsing.

Returns:

ExtractionContext – Context with filtered down documents.
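The filtering contract (narrow `.documents`, then return the same context) might look like the sketch below. `FakeContext` is a hypothetical stand-in for `ExtractionContext`, documents are plain strings here for simplicity, and the "contains the word water" rule is an assumption, not the plugin's actual filter.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Hypothetical stand-in exposing only a .documents attribute."""

    documents: list = field(default_factory=list)


async def filter_docs(extraction_context):
    """Keep only documents that mention water (illustrative rule)."""
    kept = [doc for doc in extraction_context.documents if "water" in doc.lower()]
    # Per the documented contract, the kept documents are written back
    # to the .documents attribute of the same context object.
    extraction_context.documents = kept
    return extraction_context


context = FakeContext(documents=["Water well spacing rules", "Road maintenance plan"])
filtered = asyncio.run(filter_docs(context))
print(filtered.documents)
```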

async parse_docs_for_structured_data(extraction_context)

Parse documents to extract structured data/information

Parameters:

extraction_context (ExtractionContext) – Context containing candidate documents to parse.

Returns:

ExtractionContext or None – Context with extracted data/information stored in the .attrs dictionary, or None if no data was extracted.
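Because the method may return None, callers need to handle both outcomes. The sketch below shows one way to do that. The context object is a `SimpleNamespace` stand-in, the `"structured_data"` key is a hypothetical name (the page does not say which keys are stored in `.attrs`), and the "permit" matching rule is an assumption.

```python
import asyncio
from types import SimpleNamespace


async def parse_docs_for_structured_data(extraction_context):
    """Hypothetical parser: store matching rows in .attrs, or return None."""
    rows = [doc for doc in extraction_context.documents if "permit" in doc.lower()]
    if not rows:
        return None  # nothing extracted
    # "structured_data" is an assumed key name used only in this sketch.
    extraction_context.attrs["structured_data"] = rows
    return extraction_context


def summarize(result):
    """Describe the outcome, covering the documented None case."""
    if result is None:
        return "no data extracted"
    return f"{len(result.attrs['structured_data'])} row(s) extracted"


empty = SimpleNamespace(documents=["Meeting minutes"], attrs={})
full = SimpleNamespace(documents=["Well permit limits: 5 acre-feet"], attrs={})

empty_summary = summarize(asyncio.run(parse_docs_for_structured_data(empty)))
full_summary = summarize(asyncio.run(parse_docs_for_structured_data(full)))
print(empty_summary)
print(full_summary)
```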

classmethod save_structured_data(doc_infos, out_dir)

Write extracted water rights data to disk

Parameters:

doc_infos (list of dict) –
List of dictionaries containing the following keys:
- "jurisdiction": An initialized Jurisdiction object representing the jurisdiction that was extracted.
- "ord_db_fp": A path to the extracted structured data stored on disk, or None if no data was extracted.
out_dir (path-like) – Path to the output directory for the data.

Returns:

int – Number of unique water rights districts for which information was found and written.
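The shape of `doc_infos` and one plausible way to arrive at the returned count are sketched below. Everything here is illustrative: plain strings stand in for initialized Jurisdiction objects, the file paths are made up, and the counting rule in `count_districts_with_data` (unique jurisdictions whose "ord_db_fp" is not None) is an assumption rather than the method's confirmed behavior.

```python
from pathlib import Path


def count_districts_with_data(doc_infos):
    """Count unique jurisdictions that have a non-None 'ord_db_fp' (assumed rule)."""
    found = {
        info["jurisdiction"]
        for info in doc_infos
        if info["ord_db_fp"] is not None
    }
    return len(found)


# Strings stand in for Jurisdiction objects; paths are hypothetical.
doc_infos = [
    {"jurisdiction": "District A", "ord_db_fp": Path("out/district_a.csv")},
    {"jurisdiction": "District A", "ord_db_fp": Path("out/district_a_2.csv")},
    {"jurisdiction": "District B", "ord_db_fp": None},
]

num_districts = count_districts_with_data(doc_infos)
print(num_districts)
```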

async record_usage()

Persist usage tracking data when a tracker is available