compass.plugin.interface.BaseTextCollector

class BaseTextCollector(llm_service, usage_tracker=None, **kwargs)

Bases: BaseLLMCaller, ABC

Base class for text collectors that gather relevant text

Parameters:
  • llm_service (Service) – LLM service used for queries.

  • usage_tracker (UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.

  • **kwargs

    Keyword arguments to be passed to the underlying service processing function (i.e. llm_service.call(**kwargs)). Should not contain the following keys:

    • usage_sub_label

    • messages

    These arguments are provided by this caller object.

Methods

check_chunk(chunk_parser, ind)

Check if a text chunk is relevant for extraction

Attributes

OUT_LABEL

Identifier for text collected by this class

relevant_text

Combined relevant text from the individual chunks

abstract property OUT_LABEL

Identifier for text collected by this class

Type:

str

abstract property relevant_text

Combined relevant text from the individual chunks

Type:

str
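As a rough illustration of how a subclass might satisfy the two abstract properties above, the sketch below uses a hypothetical stand-in base class (CollectorInterface) instead of the real BaseTextCollector, so that it runs without an LLM service. The class names, the label value, and the _chunks attribute are assumptions made for this example only.

```python
from abc import ABC, abstractmethod


class CollectorInterface(ABC):
    """Hypothetical stand-in that mirrors only the two abstract properties."""

    @property
    @abstractmethod
    def OUT_LABEL(self):
        """Identifier for text collected by this class"""

    @property
    @abstractmethod
    def relevant_text(self):
        """Combined relevant text from the individual chunks"""


class NoiseTextCollector(CollectorInterface):
    """Example collector (hypothetical name and label)."""

    # A plain class attribute is enough to override an abstract property
    OUT_LABEL = "noise_ordinance_text"

    def __init__(self):
        self._chunks = {}  # maps chunk index -> chunk text judged relevant

    @property
    def relevant_text(self):
        # Combine the stored chunks in document order
        return "\n".join(self._chunks[i] for i in sorted(self._chunks))


collector = NoiseTextCollector()
collector._chunks[1] = "Limits apply at the property line."
collector._chunks[0] = "Noise shall not exceed 45 dBA."
combined = collector.relevant_text
```

Keying the stored chunks by index is one possible design: it keeps the combined text in document order even if chunks are checked out of order.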

abstractmethod async check_chunk(chunk_parser, ind)

Check if a text chunk is relevant for extraction

You should validate chunks like so:

is_correct_kind_of_text = await chunk_parser.parse_from_ind(
    ind,
    key="my_unique_validation_key",
    llm_call_callback=my_async_llm_call_function,
)

where key is unique to this particular validation (the chunk parser uses it to cache the validation result in its memory) and my_async_llm_call_function is an async function that takes a key and a text chunk and returns a boolean indicating whether the chunk passes the validation. You can call chunk_parser.parse_from_ind as many times as you want within this method, but use a different key for each validation.

Parameters:
  • chunk_parser (ParseChunksWithMemory) – Instance that contains a parse_from_ind method.

  • ind (int) – Index of the chunk to check.

Returns:

bool – Boolean flag indicating whether or not the text in the chunk contains information relevant to the extraction task.

See also

parse_from_ind()

Method used to parse text from a chunk with memory of prior chunk validations.
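The following is a minimal sketch of a check_chunk implementation that runs two validations, each under its own key. It is not the library's implementation. StubChunkParser is a hypothetical stand-in for ParseChunksWithMemory that only caches results per chunk and key, and its text_chunks and memory attributes are assumptions. The two callbacks use keyword tests in place of real LLM calls, and WindSetbackCollector does not subclass BaseTextCollector, so the example runs without an LLM service.

```python
import asyncio


class StubChunkParser:
    """Hypothetical stand-in for ParseChunksWithMemory (not the real class)."""

    def __init__(self, text_chunks):
        self.text_chunks = text_chunks
        self.memory = [{} for _ in text_chunks]  # one result cache per chunk

    async def parse_from_ind(self, ind, key, llm_call_callback):
        # Simplified: reuse the cached result for this chunk and key if present
        if key not in self.memory[ind]:
            self.memory[ind][key] = await llm_call_callback(
                key, self.text_chunks[ind]
            )
        return self.memory[ind][key]


async def mentions_wind(key, text_chunk):
    # Placeholder for an LLM query; a keyword test keeps the sketch runnable
    return "wind" in text_chunk.lower()


async def mentions_setback(key, text_chunk):
    # Second placeholder validation, registered under a different key
    return "setback" in text_chunk.lower()


class WindSetbackCollector:
    """Sketch only; a real collector would subclass BaseTextCollector."""

    def __init__(self):
        self.kept = {}  # chunk index -> text that passed both validations

    async def check_chunk(self, chunk_parser, ind):
        # Each validation gets its own key so cached results do not collide
        is_wind = await chunk_parser.parse_from_ind(
            ind, key="mentions_wind", llm_call_callback=mentions_wind
        )
        if not is_wind:
            return False
        is_setback = await chunk_parser.parse_from_ind(
            ind, key="mentions_setback", llm_call_callback=mentions_setback
        )
        if is_setback:
            self.kept[ind] = chunk_parser.text_chunks[ind]
        return is_setback


async def main():
    parser = StubChunkParser(
        ["Wind turbine setback: 500 ft.", "Wind speeds vary.", "Parking rules."]
    )
    collector = WindSetbackCollector()
    flags = [await collector.check_chunk(parser, i) for i in range(3)]
    return flags, collector.kept, parser.memory


flags, kept, memory = asyncio.run(main())
```

Returning early after the first failed validation skips the second callback for chunks that are already ruled out, which would save an LLM call in a real collector.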