compass.extraction.context.ExtractionContext

class ExtractionContext(documents=None, attrs=None)

Bases: object

Context for extraction operations supporting multiple documents

This class provides a Document-compatible interface for extraction workflows that involve one or more source documents. It tracks chunk-level provenance, recording which document each text chunk originated from, while remaining compatible with existing extraction functions that expect Document-like objects.

Parameters:
  • documents (sequence of elm.web.document.BaseDocument, optional) – One or more source documents contributing to this context. For single-document workflows (solar, wind), pass a list with one document. For multi-document workflows (water rights), pass all contributing documents. By default, None.

  • attrs (dict, optional) – Context-level attributes for extraction metadata (jurisdiction, tech type, etc.). By default, None.
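The interface described above can be illustrated with a small stand-in. This is a minimal sketch, not the COMPASS implementation: StubDoc and SketchContext are hypothetical names, the separators used to join text are assumptions, and provenance is tracked per page here rather than per chunk to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class StubDoc:
    """Hypothetical stand-in for a BaseDocument-like object."""

    pages: list
    attrs: dict = field(default_factory=dict)

    @property
    def text(self):
        # Assumption: a document's text is its pages joined by newlines
        return "\n".join(self.pages)


class SketchContext:
    """Illustrative multi-document wrapper (not the real class)."""

    def __init__(self, documents=None, attrs=None):
        self.documents = list(documents or [])
        self.attrs = attrs or {}

    @property
    def num_documents(self):
        return len(self.documents)

    @property
    def pages(self):
        # Flatten the pages of every document, preserving document order
        return [page for doc in self.documents for page in doc.pages]

    @property
    def text(self):
        # Assumption: documents are separated by a blank line
        return "\n\n".join(doc.text for doc in self.documents)

    def page_provenance(self):
        """Return the source document index for each concatenated page."""
        return [i for i, doc in enumerate(self.documents) for _ in doc.pages]


# Single-document workflow: a list with one document
solar = SketchContext([StubDoc(["Setback: 50 ft"])], attrs={"tech": "solar"})

# Multi-document workflow: all contributing documents
water = SketchContext([StubDoc(["Rule A", "Rule B"]), StubDoc(["Rule C"])])
```

Because the wrapper exposes text and pages just like a single document, code written against a Document-like object can consume either context unchanged.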

Methods

mark_doc_as_data_source(doc[, out_fn_stem])

Mark a document as a data source for extraction

multi_doc_context([attr_text_key])

Get concatenated text representation of documents

Attributes

data_docs

List of documents that contributed to extraction

documents

List of documents that might contain relevant info

num_documents

Number of source documents in this context

pages

Concatenated pages from all documents

text

Concatenated text from all documents

property text

Concatenated text from all documents

Type:

str

property pages

Concatenated pages from all documents

Type:

list

property num_documents

Number of source documents in this context

Type:

int

property documents

List of documents that might contain relevant info

Type:

list

property data_docs

List of documents that contributed to extraction

Type:

list

async mark_doc_as_data_source(doc, out_fn_stem=None)

Mark a document as a data source for extraction

Parameters:
  • doc (elm.web.document.BaseDocument) – Document to add as a data source

  • out_fn_stem (str, optional) – Optional output filename stem for this document. If provided, the document file will be moved from the temporary directory to the output directory with this filename stem and appropriate file suffix. By default, None.
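Since this method is a coroutine, it must be awaited. The sketch below imitates the documented behavior with a stand-in, under several assumptions: SketchTracker, the out_dir argument, the "tmp_path" key, and the returned destination path are all hypothetical and are not taken from the COMPASS source.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


class SketchTracker:
    """Hypothetical stand-in that records data-source documents."""

    def __init__(self, out_dir):
        self.out_dir = Path(out_dir)
        self.data_docs = []

    async def mark_doc_as_data_source(self, doc, out_fn_stem=None):
        self.data_docs.append(doc)
        if out_fn_stem is None:
            return None  # nothing to move
        src = Path(doc["tmp_path"])  # assumed location of the temp file
        # Keep the original file suffix, swap in the requested stem
        dest = self.out_dir / f"{out_fn_stem}{src.suffix}"
        # Run the blocking move in a worker thread (Python 3.9+)
        await asyncio.to_thread(shutil.move, str(src), str(dest))
        return dest


async def demo():
    with tempfile.TemporaryDirectory() as tmp, \
            tempfile.TemporaryDirectory() as out:
        src = Path(tmp) / "download.pdf"
        src.write_text("placeholder")
        tracker = SketchTracker(out)
        dest = await tracker.mark_doc_as_data_source(
            {"tmp_path": src}, out_fn_stem="county_ordinance"
        )
        return dest.name, dest.exists(), src.exists(), len(tracker.data_docs)


result = asyncio.run(demo())
```

Inside an already running event loop, call the method with await directly instead of asyncio.run.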

multi_doc_context(attr_text_key=None)

Get concatenated text representation of documents

This method creates a concatenated text representation of the documents in this context. If attr_text_key is given, each document's text is taken from that key of its .attrs dictionary instead of from the full document text.

Parameters:

attr_text_key (str, optional) – Key to look up in each document’s .attrs dictionary to get the text to concatenate. If None, the full document text is used for concatenation. By default, None.

Returns:

str – Concatenated text representation of the documents in this context.
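The two modes described above can be sketched as follows. This is an illustration only: concat_docs is a hypothetical helper, the blank-line separator is an assumption, and skipping documents that lack the key is one possible policy rather than the documented behavior.

```python
from types import SimpleNamespace


def concat_docs(docs, attr_text_key=None, sep="\n\n"):
    """Join document text, optionally reading it from doc.attrs."""
    parts = []
    for doc in docs:
        if attr_text_key is None:
            # Default mode: use the full document text
            parts.append(doc.text)
        else:
            # Keyed mode: use the text stored under the given attrs key
            value = doc.attrs.get(attr_text_key)
            if value:  # assumption: skip documents without this key
                parts.append(value)
    return sep.join(parts)


# Stand-in documents exposing only .text and .attrs
docs = [
    SimpleNamespace(text="full A", attrs={"cleaned_text": "A"}),
    SimpleNamespace(text="full B", attrs={}),
]

full = concat_docs(docs)
keyed = concat_docs(docs, attr_text_key="cleaned_text")
```

Reading from an attrs key is useful when an earlier step has stored a filtered or cleaned version of each document's text.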