compass.pipeline.data_classes.KnownSourcesInput

class KnownSourcesInput(known_local_docs=None, known_doc_urls=None)

Bases: object

Value object holding known local document and known URL inputs.

Parameters:
  • known_local_docs (dict or path-like, optional) – A dictionary where keys are the jurisdiction codes (as strings) and values are lists of dictionaries containing information about each local document. Each document dictionary should contain at least the key "source_fp" pointing to the full local document path. Additional keys are copied onto the loaded document as attributes. This input can also be a path to a JSON file containing the same mapping. By default, None.

  • known_doc_urls (dict or path-like, optional) – A dictionary where keys are the jurisdiction codes (as strings) and values are lists of dictionaries containing information about each known URL to check. Each URL dictionary should contain at least the key "source", whose value is the known document URL. Additional keys are copied onto the loaded document as attributes. This input can also be a path to a JSON file containing the same mapping. By default, None.
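
The two mappings described above share one shape: jurisdiction code (string) to a list of per-document dictionaries. The sketch below builds both with the standard library only, checks the required keys, and writes one mapping to a JSON file, which is the alternative input form. The jurisdiction code, file path, URL, the extra "note" key, and the helpers `has_required_key` and `write_and_reload` are all hypothetical and chosen for illustration; they are not part of the library.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical values for illustration only: the jurisdiction code,
# the file path, the URL, and the extra "note" key are all made up.
known_local_docs = {
    "18001": [
        {"source_fp": "/data/ordinances/example_county.pdf", "note": "scanned"},
    ],
}
known_doc_urls = {
    "18001": [
        {"source": "https://example.com/ordinance.pdf"},
    ],
}


def has_required_key(mapping, required_key):
    """Return True if all codes are strings and every entry has the key."""
    return all(
        isinstance(code, str) and all(required_key in doc for doc in docs)
        for code, docs in mapping.items()
    )


def write_and_reload(mapping, directory, name):
    """Write the mapping to a JSON file and read it back (hypothetical helper)."""
    json_fp = Path(directory) / name
    json_fp.write_text(json.dumps(mapping, indent=2))
    return json.loads(json_fp.read_text())


local_ok = has_required_key(known_local_docs, "source_fp")
urls_ok = has_required_key(known_doc_urls, "source")

with tempfile.TemporaryDirectory() as tmp_dir:
    reloaded = write_and_reload(known_local_docs, tmp_dir, "known_local_docs.json")

# Going by the signature documented above, either the dictionaries or the
# JSON path would then be passed as `known_local_docs` / `known_doc_urls`.
# That call is not made here because it requires the compass package.
```

Using string keys for the jurisdiction codes matters for the JSON form: JSON object keys are always strings, so a mapping written this way reads back unchanged.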

Methods