compass.pipeline.collection.dedupe.DocumentDeDuplicator

class DocumentDeDuplicator

Bases: object

Domain service for deduplicating collected documents.

Methods

add_docs(docs, *, step_name, jurisdiction_name)

Add documents to the collection mapping.

Attributes

values

Deduplicated collected documents.

add_docs(docs, *, step_name, jurisdiction_name)

Add documents to the collection mapping. step_name and jurisdiction_name are keyword-only arguments.

Parameters:
  • docs (list) – Collected document objects to add to the internal de-duplicated mapping.

  • step_name (str) – Identifier for the collection step that produced the documents.

  • jurisdiction_name (str) – Full jurisdiction name to attach to documents that do not already include one.
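The parameter descriptions above do not state how duplicates are detected or what is recorded for step_name. The following is a minimal, self-contained sketch of how a deduplicator with this interface could behave; it is not the compass implementation. Everything beyond the add_docs signature and the values property is a hypothetical assumption for illustration: the Doc class, its attrs dictionary, the "source", "jurisdiction_name", and "collection_steps" keys, the choice of source URL (falling back to text) as the duplicate key, and the DocumentDeDuplicatorSketch name.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """HYPOTHETICAL stand-in for a collected document (not the compass type)."""

    text: str
    attrs: dict[str, Any] = field(default_factory=dict)


class DocumentDeDuplicatorSketch:
    """HYPOTHETICAL deduplicator sketch; keys and attribute names are assumptions."""

    def __init__(self) -> None:
        # Maps a duplicate key to the first document seen with that key.
        # dicts preserve insertion order, so ``values`` is first-seen order.
        self._docs: dict[str, Doc] = {}

    @staticmethod
    def _key(doc: Doc) -> str:
        # ASSUMPTION: identity is the source URL, else the full text.
        return doc.attrs.get("source") or doc.text

    def add_docs(self, docs: list[Doc], *, step_name: str,
                 jurisdiction_name: str) -> None:
        for doc in docs:
            # Attach the jurisdiction only if the document lacks one.
            doc.attrs.setdefault("jurisdiction_name", jurisdiction_name)
            # Keep the first document seen for this key; later ones are dropped.
            kept = self._docs.setdefault(self._key(doc), doc)
            # ASSUMPTION: record every step that produced this document,
            # including steps that only produced a duplicate of it.
            steps = kept.attrs.setdefault("collection_steps", [])
            if step_name not in steps:
                steps.append(step_name)

    @property
    def values(self) -> list[Doc]:
        """Deduplicated collected documents, in first-seen order."""
        return list(self._docs.values())


# Usage: the same PDF found by two collection steps is kept only once.
dedup = DocumentDeDuplicatorSketch()
a = Doc("Setback: 500 ft", {"source": "https://example.com/a.pdf"})
b = Doc("Setback: 500 ft", {"source": "https://example.com/a.pdf"})
c = Doc("Noise: 45 dBA", {"source": "https://example.com/c.pdf",
                          "jurisdiction_name": "Other County, Texas"})
dedup.add_docs([a], step_name="search", jurisdiction_name="Example County, Texas")
dedup.add_docs([b, c], step_name="crawl", jurisdiction_name="Example County, Texas")
print(len(dedup.values))  # 2
print(a.attrs["collection_steps"])  # ['search', 'crawl']
```

Keeping the first-seen document makes values deterministic regardless of how many steps rediscover the same source; a real implementation might instead compare text checksums or merge metadata from the discarded duplicates.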

property values

Deduplicated collected documents.