compass.scripts.download.download_jurisdiction_ordinance_using_search_engine

async download_jurisdiction_ordinance_using_search_engine(question_templates, jurisdiction, num_urls=5, file_loader_kwargs=None, search_semaphore=None, browser_semaphore=None, url_ignore_substrings=None, **kwargs)

Download the ordinance document(s) for a single jurisdiction.

Parameters:
  • question_templates (sequence of str) – Query templates that will be formatted with the jurisdiction name before submission to the search engine.

  • jurisdiction (Jurisdiction) – Location object representing the jurisdiction.

  • num_urls (int, optional) – Number of unique Google search result URLs to check for an ordinance document. By default, 5.

  • file_loader_kwargs (dict, optional) – Dictionary of keyword-argument pairs used to initialize elm.web.file_loader.AsyncWebFileLoader. If present, the “pw_launch_kwargs” entry is also used to initialize the elm.web.search.google.PlaywrightGoogleLinkSearch instance that performs the Google URL search. By default, None.

  • search_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently to submit search engine queries. If this input is None, browser_semaphore is used in its place (i.e. searches and file downloads are limited by the same semaphore). By default, None.

  • browser_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently to download content from the web. If None, no limits are applied. By default, None.

  • url_ignore_substrings (list of str, optional) – URL substrings that should be excluded from search results. Substrings are applied case-insensitively. By default, None.

  • **kwargs – Additional keyword arguments forwarded to elm.web.search.run.web_search_links_as_docs(). Common entries include usage_tracker for logging LLM usage and extra Playwright configuration.
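The behavior described for question_templates and url_ignore_substrings can be illustrated with a minimal standalone sketch. It is not the library's implementation: the helper names build_queries and keep_url are hypothetical, the "{jurisdiction}" placeholder name is an assumption about how templates are written, and the jurisdiction name and URLs are made up for illustration.

```python
def build_queries(question_templates, jurisdiction_name):
    """Format each template with the jurisdiction name.

    Assumes templates use a ``{jurisdiction}`` placeholder (an assumption,
    not confirmed by the documentation above).
    """
    return [t.format(jurisdiction=jurisdiction_name) for t in question_templates]


def keep_url(url, url_ignore_substrings=None):
    """Return True unless the URL contains an ignored substring.

    Matching is case-insensitive, as the parameter description states.
    """
    lowered = url.lower()
    return not any(sub.lower() in lowered for sub in url_ignore_substrings or [])


# Hypothetical inputs, for illustration only
templates = [
    "{jurisdiction} wind energy zoning ordinance",
    "filetype:pdf {jurisdiction} zoning ordinance",
]
queries = build_queries(templates, "Example County, Indiana")

urls = [
    "https://example.gov/Ordinance.pdf",
    "https://www.WIKIPEDIA.org/wiki/Zoning",
]
kept = [u for u in urls if keep_url(u, ["wikipedia"])]
```

Here the second URL is dropped even though its casing differs from the ignore substring, which is the effect of applying substrings case-insensitively.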

Returns:

list or None – List of BaseDocument instances possibly containing ordinance information, or None if no ordinance document was found.
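Because the return value is either a list or None, callers may want to normalize it before iterating. The sketch below shows one way to do that, along with the documented semaphore fallback. The names collect_documents, resolve_search_semaphore, and _stub_download are hypothetical; download_fn stands in for this function so the snippet runs without compass installed, and the stub's behavior is an assumption made purely for demonstration.

```python
import asyncio


def resolve_search_semaphore(search_semaphore, browser_semaphore):
    """Mirror the documented fallback: when no search semaphore is given,
    searches share the browser semaphore (which may itself be None)."""
    return browser_semaphore if search_semaphore is None else search_semaphore


async def collect_documents(download_fn, question_templates, jurisdiction,
                            max_browsers=4):
    """Call a download coroutine and normalize its ``list or None`` result."""
    # One semaphore; per the docs it also limits searches when
    # search_semaphore is left as None.
    browser_semaphore = asyncio.Semaphore(max_browsers)
    docs = await download_fn(
        question_templates,
        jurisdiction,
        num_urls=5,
        browser_semaphore=browser_semaphore,
        url_ignore_substrings=["wikipedia"],
    )
    # Treat "no ordinance document found" (None) as an empty list
    return docs or []


async def _stub_download(question_templates, jurisdiction, **kwargs):
    """Hypothetical stand-in that simulates finding no documents."""
    return None


result = asyncio.run(
    collect_documents(_stub_download, ["{jurisdiction} ordinance"], "Example County")
)
```

In real use, download_fn would be download_jurisdiction_ordinance_using_search_engine and jurisdiction would be a Jurisdiction object rather than a plain string.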

Notes

Requires the TempFileCachePB service to be running.