compass.pipeline.collection.base.DocumentCollection
- class DocumentCollection(workflow)
Bases: object
Workflow object that applies a fixed pipeline of steps
- Parameters:
workflow (compass.pipeline.jurisdiction.SingleJurisdictionRun) – The workflow for the jurisdiction being processed, which may or may not have website search enabled. The workflow is passed to each collection step, which may use it to access jurisdiction information and other relevant data, and to determine whether website search is enabled.
Methods
execute(*[, eager_extract, relative_to]) – Run the fixed collection sequence
- async execute(*, eager_extract=False, relative_to=None)
Run the fixed collection sequence
The document collection has a well-defined order:
1. Process any/all known local documents
2. Process any/all known document URLs
3. Search engine-based search for ordinance documents
4. Jurisdiction website crawl-based search for ordinance documents
Users can disable any of these steps via the workflow configuration.
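The fixed ordering with optional steps can be sketched as follows. This is a minimal illustration, not the library's implementation: `STEP_ORDER`, the step names, and `run_steps` are hypothetical, and the real configuration keys used to disable steps are not shown on this page.

```python
import asyncio

# Hypothetical step names mirroring the documented order.
STEP_ORDER = ("local_docs", "known_urls", "search_engine", "website_crawl")


async def run_steps(enabled):
    """Run the enabled steps in the fixed order; return found docs per step."""
    collected = {}
    for name in STEP_ORDER:
        if name not in enabled:
            continue  # step disabled in the (hypothetical) configuration
        await asyncio.sleep(0)  # stand-in for real asynchronous I/O
        collected[name] = [f"{name}-doc"]
    return collected


# The order of the result follows STEP_ORDER, not the order of `enabled`.
result = asyncio.run(run_steps({"website_crawl", "local_docs"}))
print(list(result))  # ['local_docs', 'website_crawl']
```

Disabling a step only removes it from the sequence; the relative order of the remaining steps does not change.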
- Parameters:
eager_extract (bool, optional) – Option to apply extraction as soon as any documents are found. If the extraction returns any structured data, subsequent steps are skipped for that jurisdiction. By default, False.
relative_to (path-like, optional) – Optional directory that should be the root of all relative paths. By default, None.
- Returns:
dict or None – If eager_extract is False, a dictionary containing collection information and metadata. If eager_extract is True, the result of the extraction workflow if any structured data was extracted, or None if no structured data was extracted from any of the collected documents.
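The three possible return outcomes can be illustrated with a toy stand-in. The sketch below does not call the real class: `execute_sketch`, `make_step`, the step names, and the `extract` callable are all assumptions made for illustration, and only the return behavior described above is modeled.

```python
import asyncio


async def execute_sketch(steps, extract, *, eager_extract=False):
    """Toy model of the documented return behavior of ``execute``.

    ``steps`` is an ordered list of ``(name, async_callable)`` pairs; each
    callable returns a list of documents. ``extract`` maps documents to
    structured data, or to something falsy if nothing was extracted.
    """
    collected = {}
    for name, step in steps:
        docs = await step()
        collected[name] = docs
        if eager_extract and docs:
            structured = extract(docs)
            if structured:
                return structured  # remaining steps are skipped
    # Without eager extraction, return everything collected; with it,
    # reaching this point means no structured data was found.
    return None if eager_extract else collected


visited = []  # records which steps actually ran


def make_step(name, docs):
    async def step():
        visited.append(name)
        return docs
    return step


steps = [
    ("local_docs", make_step("local_docs", [])),
    ("known_urls", make_step("known_urls", ["ordinance.pdf"])),
    ("search_engine", make_step("search_engine", ["other.pdf"])),
]

eager = asyncio.run(
    execute_sketch(steps, lambda docs: {"source": docs[0]}, eager_extract=True)
)
print(eager, visited)  # {'source': 'ordinance.pdf'} ['local_docs', 'known_urls']
```

Callers that pass `eager_extract=True` should therefore be prepared for a `None` result, which signals that no structured data was extracted.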