compass.scripts.download.find_jurisdiction_website

async find_jurisdiction_website(jurisdiction, model_configs, file_loader_kwargs=None, search_semaphore=None, browser_semaphore=None, usage_tracker=None, url_ignore_substrings=None, **kwargs)

Search for the main landing page of a given jurisdiction.

This function submits two pre-determined queries based on the jurisdiction name, prioritizing official landing pages. Additional kwargs (for example, alternate search engines) can be supplied to fine-tune behavior.

Parameters:
  • jurisdiction (Jurisdiction) – Jurisdiction instance representing the jurisdiction to find the main webpage for.

  • model_configs (dict) – Dictionary of LLMConfig instances. Should have at minimum a “default” key that is used as a fallback for all tasks.

  • file_loader_kwargs (dict, optional) – Dictionary of keyword-argument pairs used to initialize elm.web.file_loader.AsyncWebFileLoader. If present, the “pw_launch_kwargs” key in this dictionary is also used to initialize the elm.web.search.google.PlaywrightGoogleLinkSearch used for the Google URL search. By default, None.

  • search_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently for submitting search engine queries. If None, no limits are applied. By default, None.

  • browser_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently. If None, no limits are applied. By default, None.

  • usage_tracker (UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.

  • url_ignore_substrings (list of str, optional) – URL substrings that should be excluded from search results. Substrings are applied case-insensitively. By default, None.

  • **kwargs – Additional arguments forwarded to elm.web.search.run.search_with_fallback().
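The case-insensitive matching described for url_ignore_substrings can be pictured with the minimal sketch below. This is an illustration only, not the library's implementation: the helper name filter_urls and the example URLs are hypothetical, and the actual filtering logic inside the search routine may differ.

```python
def filter_urls(urls, ignore_substrings=None):
    """Drop URLs that contain any ignored substring (case-insensitive).

    Hypothetical helper for illustration; not part of COMPASS or ELM.
    """
    if not ignore_substrings:
        # Nothing to ignore: keep every URL unchanged
        return list(urls)

    # Lowercase both sides so the comparison ignores case
    lowered = [sub.lower() for sub in ignore_substrings]
    return [
        url for url in urls
        if not any(sub in url.lower() for sub in lowered)
    ]


# Made-up candidate URLs for a fictional jurisdiction
candidates = [
    "https://www.examplecounty.gov/",
    "https://en.WIKIPEDIA.org/wiki/Example_County",
    "https://www.facebook.com/examplecounty",
]
kept = filter_urls(candidates, ["wikipedia", "Facebook.com"])
print(kept)
```

Only the first URL survives here, since the other two each contain one of the ignored substrings once case is disregarded.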

Returns:

str or None – URL for the jurisdiction website, if found; None otherwise.
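
A call might look like the sketch below, which looks up several jurisdictions concurrently while sharing one pair of semaphores. It assumes the caller already has Jurisdiction instances and a model_configs dictionary with a “default” key; the wrapper name lookup_websites, the max_browsers value, and the ignored substrings are illustrative choices, not library defaults.

```python
import asyncio


async def lookup_websites(jurisdictions, model_configs, max_browsers=2):
    """Return (jurisdiction, url_or_None) pairs for each jurisdiction.

    Hypothetical wrapper; only find_jurisdiction_website and its keyword
    arguments come from the documentation above.
    """
    # Deferred import so this sketch can be loaded without COMPASS installed
    from compass.scripts.download import find_jurisdiction_website

    # Create the semaphores once so every lookup shares the same limits
    search_semaphore = asyncio.Semaphore(max_browsers)
    browser_semaphore = asyncio.Semaphore(max_browsers)

    tasks = [
        find_jurisdiction_website(
            jurisdiction,
            model_configs,
            search_semaphore=search_semaphore,
            browser_semaphore=browser_semaphore,
            # Example substrings only; choose ones that suit your search
            url_ignore_substrings=["wikipedia", "facebook.com"],
        )
        for jurisdiction in jurisdictions
    ]
    urls = await asyncio.gather(*tasks)

    # A None entry means no website was found for that jurisdiction
    return list(zip(jurisdictions, urls))
```

Because the function returns None when no website is found, callers should check each result before using it as a URL.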