compass.pipeline.collection.steps.ElmWebsiteCrawlStep

class ElmWebsiteCrawlStep

Bases: CollectionStep

Concrete Strategy for ELM-based website crawling

Methods

collect(workflow)

Collect documents based on an ELM website crawl

Attributes

STEP_NAME

Identifier for step

STEP_NAME = 'website_search_elm'

Identifier for step

async collect(workflow)

Collect documents based on an ELM website crawl

Parameters:

workflow (compass.pipeline.jurisdiction.SingleJurisdictionRun) – The workflow for the jurisdiction being processed, which may or may not have website search enabled. If website search is not enabled, this function returns an empty list. If website search is enabled, this function first attempts to find and validate a jurisdiction website based on the workflow’s configuration (the website may come from user input or from an automatic search); if no website can be found or validated, it gives up on website document collection for this jurisdiction and returns an empty list. Otherwise, it crawls the validated jurisdiction website for documents using ELM and returns a list of the documents collected from the crawl. If any errors are encountered during the crawl, this function logs the error and returns an empty list.

Returns:

list – List of documents collected from crawling the jurisdiction website using ELM, with jurisdiction verification enabled based on the workflow’s configuration. An empty list is returned if website search is not enabled, if no jurisdiction website can be found or validated, or if an error is encountered during the crawl.
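The control flow described for collect can be sketched as below. This is only an illustration of the documented early-return behavior, not the actual implementation: FakeWorkflow, its fields, find_and_validate_website, crawl_with_elm, and collect_sketch are all hypothetical stand-ins invented for this example and are not part of compass or ELM.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class FakeWorkflow:
    """Hypothetical stand-in for SingleJurisdictionRun (not the real class)."""

    website_search_enabled: bool = True
    website: Optional[str] = None  # pretend result of website lookup
    crawl_should_fail: bool = False  # lets the sketch simulate a crawl error


async def find_and_validate_website(workflow):
    """Hypothetical helper: return a validated URL, or None if none is found."""
    return workflow.website


async def crawl_with_elm(url, workflow):
    """Hypothetical helper standing in for the ELM-based crawl."""
    if workflow.crawl_should_fail:
        raise RuntimeError("simulated crawl failure")
    return [f"document from {url}"]


async def collect_sketch(workflow):
    """Mirror the documented branches: every failure path yields []."""
    # Website search disabled: nothing to collect.
    if not workflow.website_search_enabled:
        return []

    # Try to find and validate a website before giving up.
    url = await find_and_validate_website(workflow)
    if url is None:
        return []

    # Crawl; on any error, log it and fall back to an empty list.
    try:
        return await crawl_with_elm(url, workflow)
    except Exception:
        logger.exception("Website crawl failed for %s", url)
        return []
```

Under these assumptions, a caller never has to handle an exception from the step and can treat an empty list uniformly as "no website documents for this jurisdiction", for example `docs = asyncio.run(collect_sketch(FakeWorkflow(website="https://example.gov")))`.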