compass.validation.location.JurisdictionWebsiteValidator#

class JurisdictionWebsiteValidator(browser_semaphore=None, file_loader_kwargs=None, **kwargs)[source]#

Bases: object

Validate whether a website is the primary jurisdiction portal

Notes

The validator stores the initialization arguments so they can be reused across many documents without reconfiguration.

Parameters:
  • browser_semaphore (asyncio.Semaphore, optional) – Semaphore constraining concurrent Playwright usage. None applies no concurrency limit. Default is None.

  • file_loader_kwargs (dict, optional) – Keyword arguments passed to elm.web.file_loader.AsyncWebFileLoader. Default is None.

  • **kwargs – Additional keyword arguments cached for downstream LLM calls triggered during validation.

Methods

check(url, jurisdiction)

Determine whether a website serves as a jurisdiction's portal

Attributes

WEB_PAGE_CHECK_SYSTEM_MESSAGE

System message for main jurisdiction website validation calls

WEB_PAGE_CHECK_SYSTEM_MESSAGE = 'You are an expert data analyst that examines website text to determine if the website is the main website for a given jurisdiction. Only ever answer based on the information from the website itself.'#

System message for main jurisdiction website validation calls

async check(url, jurisdiction)[source]#

Determine whether a website serves as a jurisdiction’s portal

The validator first performs an inexpensive URL classification before downloading page content. Only when the URL fails the initial check does it fetch and inspect the page text using a generic LLM caller.

Parameters:
Returns:

boolTrue when either the URL quick check or the full page evaluation indicates the site is the official main website for the jurisdiction.

Raises:

compass.exceptions.COMPASSError – Propagated from BaseLLMCaller if configured to raise on LLM failures.

Examples

>>> validator = JurisdictionWebsiteValidator()
>>> await validator.check("https://county.gov", jurisdiction)
True