compass.extraction.context.ExtractionContext
- class ExtractionContext(documents=None, attrs=None)
Bases: object
Context for extraction operations supporting multiple documents.
This class provides a Document-compatible interface for extraction workflows that may involve one or more source documents. It tracks chunk-level provenance to identify which document each text chunk originated from, while maintaining compatibility with existing extraction functions that expect Document-like objects.
- Parameters:
documents (sequence of elm.web.document.BaseDocument, optional) – One or more source documents contributing to this context. For single-document workflows (solar, wind), pass a list with one document. For multi-document workflows (water rights), pass all contributing documents.
attrs (dict, optional) – Context-level attributes for extraction metadata (jurisdiction, tech type, etc.). By default, None.
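The chunk-level provenance idea described above can be illustrated with a small stand-in. This is only a sketch under stated assumptions: it does not import COMPASS or ELM, and `FakeDoc`, `SketchContext`, `num_documents`, `pages`, `text`, and `source_of` are hypothetical names chosen for illustration. The real class may track provenance quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a document: a name plus its page texts."""
    name: str
    pages: list
    attrs: dict = field(default_factory=dict)


class SketchContext:
    """Illustrative only; NOT the library's ExtractionContext."""

    def __init__(self, documents=None, attrs=None):
        self.documents = list(documents or [])
        self.attrs = attrs or {}

    @property
    def num_documents(self):
        # Hypothetical name for "number of source documents"
        return len(self.documents)

    @property
    def pages(self):
        # Pages from every document, concatenated in document order
        return [page for doc in self.documents for page in doc.pages]

    @property
    def text(self):
        # One string built from all pages of all documents
        return "\n".join(self.pages)

    def source_of(self, chunk_index):
        """Return the document that contributed the page at chunk_index."""
        if chunk_index < 0:
            raise IndexError("chunk index out of range")
        for doc in self.documents:
            if chunk_index < len(doc.pages):
                return doc
            chunk_index -= len(doc.pages)
        raise IndexError("chunk index out of range")


# Single-document workflow: a list with one document
single = SketchContext([FakeDoc("solar.pdf", ["s1"])])

# Multi-document workflow: all contributing documents plus context attrs
multi = SketchContext(
    [FakeDoc("a.pdf", ["a1", "a2"]), FakeDoc("b.pdf", ["b1"])],
    attrs={"jurisdiction": "Example County"},
)
```

In this sketch each page is treated as one chunk, so `multi.source_of(2)` resolves to the second document because the first two chunks belong to `a.pdf`.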
Methods
mark_doc_as_data_source(doc[, out_fn_stem]) – Mark a document as a data source for extraction
multi_doc_context([attr_text_key]) – Get concatenated text representation of documents
Attributes
List of documents that contributed to extraction
List of documents that might contain relevant info
Number of source documents in this context
Concatenated pages from all documents
Concatenated text from all documents
- async mark_doc_as_data_source(doc, out_fn_stem=None)
Mark a document as a data source for extraction
- Parameters:
doc (elm.web.document.BaseDocument) – Document to add as a data source.
out_fn_stem (str, optional) – Optional output filename stem for this document. If provided, the document file will be moved from the temporary directory to the output directory with this filename stem and appropriate file suffix. By default, None.
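The file-move behavior described for out_fn_stem can be sketched with the standard library alone. The helper below is a hypothetical illustration, not the method's actual implementation: `move_with_stem`, the directory layout, and the file names are all assumptions. It only shows the general pattern of keeping the original suffix while swapping in a new stem, and of skipping the move when no stem is given.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


async def move_with_stem(src, out_dir, out_fn_stem=None):
    """Hypothetical helper: move src into out_dir under a new stem.

    Returns the destination path, or None if no stem was given
    (in which case the file is left where it is).
    """
    if out_fn_stem is None:
        return None
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original suffix (e.g. ".pdf") and replace only the stem
    dest = out_dir / f"{out_fn_stem}{src.suffix}"
    # shutil.move blocks, so run it in a worker thread
    await asyncio.to_thread(shutil.move, str(src), str(dest))
    return dest


async def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp_file = Path(tmp) / "download_0001.pdf"
        tmp_file.write_bytes(b"%PDF-")
        dest = await move_with_stem(
            tmp_file, Path(tmp) / "out", "example_county_ordinance"
        )
        skipped = await move_with_stem(dest, Path(tmp) / "other")
        return dest.name, dest.exists(), tmp_file.exists(), skipped


result = asyncio.run(demo())
```

Running the blocking move in a thread keeps the coroutine from stalling the event loop, which matters when many documents are processed concurrently.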