compass.pipeline.collection.steps.CompassWebsiteCrawlStep
- class CompassWebsiteCrawlStep
Bases: CollectionStep
Concrete strategy for COMPASS-based website crawling
Methods
collect(workflow) – Collect documents based on a COMPASS website crawl
Attributes
STEP_NAME – Identifier for step
- STEP_NAME = 'website_search_compass'
Identifier for step
- async collect(workflow)
Collect documents based on a COMPASS website crawl
- Parameters:
workflow (compass.pipeline.jurisdiction.SingleJurisdictionRun) – The workflow for the jurisdiction being processed, which may or may not have website search enabled. If website search is not enabled, this function returns an empty list. If website search is enabled, this function first attempts to find and validate a jurisdiction website based on the workflow’s configuration; if no website can be found or validated, it gives up on website document collection for this jurisdiction and returns an empty list. If a jurisdiction website is found and validated (either from user input or through automatic search), this function crawls that website for documents using COMPASS and returns the list of documents collected from the crawl. If any error is encountered during the crawl, this function logs the error and returns an empty list.
- Returns:
list – List of documents collected from crawling the jurisdiction website using COMPASS, with jurisdiction verification enabled based on the workflow’s configuration. An empty list is returned if website search is not enabled, if no jurisdiction website can be found or validated, or if an error is encountered during the crawl.
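The outcomes described above can be sketched as a small decision flow. This is an illustrative sketch only, not the library's implementation: `FakeWorkflow`, its attributes, `collect_sketch`, and the `find_website` and `crawl` callables are all hypothetical stand-ins invented for the example, and none of them is part of the COMPASS API.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class FakeWorkflow:
    """Hypothetical stand-in for SingleJurisdictionRun (not the real class)."""

    website_search_enabled: bool
    website: Optional[str] = None  # assumed: a user-supplied website URL


async def collect_sketch(workflow, find_website, crawl):
    """Reproduce the documented outcomes of ``collect`` with stand-ins."""
    if not workflow.website_search_enabled:
        return []  # website search disabled: nothing to collect

    # Use the user-supplied website, else fall back to automatic search
    website = workflow.website or await find_website(workflow)
    if website is None:
        return []  # no website could be found or validated

    try:
        return await crawl(website)
    except Exception:
        # Crawl errors are logged and turned into an empty result
        logger.exception("Website crawl failed for %s", website)
        return []


# Hypothetical helpers used only to exercise the sketch
async def no_website(workflow):
    return None


async def found_website(workflow):
    return "https://example.gov"


async def fake_crawl(url):
    return [f"{url}/ordinance.pdf"]


async def failing_crawl(url):
    raise RuntimeError("crawl failed")


if __name__ == "__main__":
    enabled = FakeWorkflow(website_search_enabled=True)
    print(asyncio.run(collect_sketch(enabled, found_website, fake_crawl)))
```

In every failure path the caller receives an empty list rather than an exception, so an empty result alone does not say whether the search was disabled, no website was found, or the crawl failed; the log is the place to look for the last case.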