compass.extraction.water.plugin.TexasWaterRightsExtractor
- class TexasWaterRightsExtractor(jurisdiction, model_configs, usage_tracker=None)
Bases: BaseExtractionPlugin
COMPASS Texas water rights extraction plugin
- Parameters:
jurisdiction (Jurisdiction) – Jurisdiction for which extraction is being performed.
model_configs (dict) – Dictionary where keys are LLMTasks and values are LLMConfig instances to be used for those tasks.
usage_tracker (UsageTracker, optional) – Usage tracker instance that can be used to record the LLM call cost. By default, None.
Methods
filter_docs(extraction_context[, ...]) – Filter down candidate documents before parsing
get_heuristic() – Get a BaseHeuristic instance with a check() method
get_query_templates() – Get a list of search engine query templates for extraction
get_website_keywords() – Get a dict of website search keyword scores
parse_docs_for_structured_data(extraction_context) – Parse documents to extract structured data/information
record_usage() – Persist usage tracking data when a tracker is available
save_structured_data(doc_infos, out_dir) – Write extracted water rights data to disk
Attributes
Identifier for extraction task
JURISDICTION_DATA_FP – Path to Texas GCW names
- JURISDICTION_DATA_FP = PosixPath('/home/runner/work/COMPASS/COMPASS/compass/data/tx_water_districts.csv')
Path to Texas GCW names
- async get_query_templates()
Get a list of search engine query templates for extraction
Query templates can contain the placeholder {jurisdiction}, which will be replaced with the full jurisdiction name during the search engine query.
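As a rough illustration of the placeholder substitution described above, the sketch below fills {jurisdiction} using str.format. The template strings and the jurisdiction name are invented for the example, and whether COMPASS itself performs the substitution with str.format is an assumption.

```python
# Hypothetical templates; the real ones are returned by get_query_templates().
templates = [
    "{jurisdiction} well permit rules",
    "water rights regulations {jurisdiction}",
]


def fill_templates(query_templates, jurisdiction_name):
    """Substitute the full jurisdiction name into each query template."""
    return [t.format(jurisdiction=jurisdiction_name) for t in query_templates]


# Made-up jurisdiction name, used only to show the substitution.
queries = fill_templates(templates, "Example Groundwater Conservation District")
for query in queries:
    print(query)
```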
- async get_website_keywords()
Get a dict of website search keyword scores
Dictionary mapping keywords to scores that indicate links which should be prioritized when performing a website scrape for a document.
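One way to picture how such a keyword-to-score mapping could prioritize links is sketched below. The keywords, scores, and link texts are hypothetical, and summing the scores of matching keywords is an assumption made for illustration; the source does not say how COMPASS combines them.

```python
# Hypothetical scores; the real mapping is returned by get_website_keywords().
keyword_scores = {"rules": 10, "permit": 5, "well": 2}


def score_link(link_text, scores):
    """Sum the scores of every keyword that appears in the link text."""
    text = link_text.lower()
    return sum(score for keyword, score in scores.items() if keyword in text)


links = [
    "Board meeting minutes",
    "Well permit application",
    "District Rules (PDF)",
]

# Highest-scoring links first, so they would be visited first in a scrape.
ranked = sorted(links, key=lambda link: score_link(link, keyword_scores), reverse=True)
print(ranked)
```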
- async get_heuristic()
Get a BaseHeuristic instance with a check() method
The check() method should accept a string of text and return True if the text passes the heuristic check and False otherwise.
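The check() contract described above can be sketched with a small stand-in class. KeywordHeuristic and its term list are hypothetical and do not subclass the real BaseHeuristic; the sketch only shows the accept-a-string, return-a-bool shape.

```python
class KeywordHeuristic:
    """Hypothetical stand-in for a heuristic object exposing check()."""

    def __init__(self, required_terms):
        # Store lowercase terms so matching is case-insensitive.
        self.required_terms = [term.lower() for term in required_terms]

    def check(self, text):
        """Return True if any required term occurs in the text, else False."""
        lowered = text.lower()
        return any(term in lowered for term in self.required_terms)


heuristic = KeywordHeuristic(["groundwater", "water well"])
passes = heuristic.check("Rules of the Groundwater Conservation District")
fails = heuristic.check("Minutes of the parks committee")
print(passes, fails)
```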
- async filter_docs(extraction_context, need_jurisdiction_verification=True)
Filter down candidate documents before parsing
- Parameters:
extraction_context (ExtractionContext) – Context containing candidate documents to be filtered. Set the .documents attribute of this object to be the iterable of documents that should be kept for parsing.
need_jurisdiction_verification (bool, optional) – Whether to verify that documents pertain to the correct jurisdiction. By default, True.
- Returns:
ExtractionContext – Context with filtered down documents.
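The filtering contract (assign the kept documents to .documents, then return the same context) might look like the sketch below. FakeContext is a hypothetical stand-in that models only the .documents attribute, and keep_matching_docs with its predicate argument is illustrative rather than part of the COMPASS API.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Hypothetical stand-in for ExtractionContext; only .documents is modeled."""

    documents: list = field(default_factory=list)


async def keep_matching_docs(extraction_context, keep):
    """Keep only documents for which keep(doc) is true and return the context."""
    extraction_context.documents = [
        doc for doc in extraction_context.documents if keep(doc)
    ]
    return extraction_context


ctx = FakeContext(documents=["groundwater rules", "parks budget"])
filtered = asyncio.run(keep_matching_docs(ctx, lambda doc: "groundwater" in doc))
print(filtered.documents)
```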
- async parse_docs_for_structured_data(extraction_context)
Parse documents to extract structured data/information
- Parameters:
extraction_context (ExtractionContext) – Context containing candidate documents to parse.
- Returns:
ExtractionContext or None – Context with extracted data/information stored in the .attrs dictionary, or None if no data was extracted.
- classmethod save_structured_data(doc_infos, out_dir)
Write extracted water rights data to disk
- Parameters:
doc_infos (list) – List of dictionaries containing the following keys:
"jurisdiction": An initialized Jurisdiction object representing the jurisdiction that was extracted.
"ord_db_fp": A path to the extracted structured data stored on disk, or None if no data was extracted.
out_dir (path-like) – Path to the output directory for the data.
- Returns:
int – Number of unique water rights districts that information was found/written for.
- async record_usage()
Persist usage tracking data when a tracker is available