Glossary#
- analysis run#
Complete invocation of compass process that ingests a configuration file, processes jurisdictions, and writes results to the run directory.
- clean directory#
Intermediate folder storing cleaned ordinance text used for LLM prompting during feature extraction.
- clean text file#
Plain-text excerpt derived from ordinance documents that isolates relevant sections for prompts and validation.
- compass process#
CLI command that executes the end-to-end pipeline using the inputs defined in the configuration file.
- configuration file#
JSON or JSON5 document that declares inputs, model assignments, concurrency, and output directories for a run.
- decision tree#
Hierarchical rubric of questions and outcomes that organizes how ordinance features are extracted and validated.
- decision tree prompt#
Structured prompt template that guides the LLM through branching questions to extract quantitative and qualitative ordinance data.
- extraction pipeline#
Crawlers, parsers, and feature detectors that transform raw ordinance text into structured records.
- INFRA-COMPASS#
End-to-end pipeline that discovers, parses, and validates energy infrastructure ordinances with LLM tooling.
- jurisdiction#
County or municipality defined in the jurisdiction CSV that frames the geographic scope of an analysis run.
- jurisdiction CSV#
Input spreadsheet whose County and State columns list the locations processed in a run.
- LLM#
Large Language Model that interprets ordinance text, classifies features, and answers structured extraction questions.
- llm cost tracker#
Runtime utility that multiplies token usage by configured pricing to report estimated spend per model.
- llm service#
Abstraction over providers such as OpenAI or Azure OpenAI that enforces authentication, rate limits, and retry policies.
- llm service rate limit#
Configuration value that caps tokens per minute for a model to avoid provider throttling.
- llm task#
Logical label assigned to prompt templates that maps to a specific model entry within the configuration.
- location#
Combination of county and state identifiers that maps to one row in the jurisdiction CSV and produces a single output bundle.
- location file log#
Per-location structured log that aggregates runtime diagnostics and JSON exception summaries.
- location manifest#
JSON metadata file emitted per location summarizing source documents, extraction status, and validation outcomes.
- log directory#
Folder defined by log_dir that stores run-level logs, prompt archives, and timing summaries.
- OCR#
Optical Character Recognition stage powered by pytesseract that converts scanned ordinance PDFs into searchable text.
- ordinance#
Legal text that governs energy infrastructure within a jurisdiction and feeds the extraction workflows.
- ordinance document#
Source PDF or HTML retrieved during crawling that contains the legal language for the targeted technology.
- ordinance file directory#
Folder defined by ordinance_file_dir that caches downloaded ordinance PDFs and HTML files.
- out directory#
Root folder defined by out_dir where structured results, cleaned text, and logs for each run are written.
- Pixi#
Environment manager used to install dependencies, run tasks, and maintain reproducible shells for COMPASS.
- Playwright#
Browser automation framework used to crawl web portals and download ordinance documents reliably.
- pytesseract#
Python wrapper for the Tesseract OCR engine used to enable text extraction from scanned ordinance documents.
- rate limiter#
Token-based throttle that keeps LLM requests within provider quotas while maximizing throughput.
- structured record#
Tabular representation of ordinance features, thresholds, and metadata exported for downstream analysis.
- technology#
tech configuration key that defines the target infrastructure domain, such as solar or wind.
- text splitter#
Utility that chunks ordinance text into overlapping segments sized for LLM context windows.
- validation pipeline#
Post-processing stage that verifies extracted features, resolves conflicts, and confirms location metadata.
- web search#
Search-and-crawl phase that discovers ordinance links using providers such as Tavily, DuckDuckGo Search, or custom engines.
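
Two of the entries above, text splitter and llm cost tracker, describe small computations that may be easier to picture in code. The following is a minimal sketch only, not the COMPASS implementation: the function names, parameters, and prices are hypothetical, and chunk sizes are measured in characters for simplicity, whereas a real splitter would more likely count tokens.

```python
def split_text(text, chunk_size=1000, overlap=200):
    """Chunk text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_price_per_million, completion_price_per_million):
    """Multiply token usage by configured pricing (assumed per-million rates)."""
    total = (prompt_tokens * prompt_price_per_million
             + completion_tokens * completion_price_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Each chunk repeats the last 2 characters of the previous one.
    print(split_text("abcdefghij", chunk_size=4, overlap=2))
    # Prices here are made-up placeholders, not real provider rates.
    print(estimate_cost(1000, 500, 2.0, 8.0))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is no longer than the overlap, at the cost of sending some text to the LLM twice. That duplication is one reason a cost estimate based on actual token usage is useful alongside the splitter.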