compass.validation.content.TextKindValidator

class TextKindValidator

Bases: ABC

Base class for a text kind validator

This class is in charge of parsing text in chunks and ultimately (after some number of chunks have been parsed) determining whether the text is the right ‘kind’ of text for a given extraction.
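
Here is a minimal sketch of the chunk-by-chunk flow described above. It assumes a caller that feeds chunks to the validator in order and stops as soon as is_correct_kind_of_text becomes True. Everything in it is hypothetical and for illustration only. ListChunkParser is a bare stand-in for ParseChunksWithMemory with no memory. keyword_check replaces a real async LLM call with a keyword match. MinHitsValidator and validate_document are invented names. In real code the validator would subclass TextKindValidator.

```python
import asyncio


class ListChunkParser:
    """Hypothetical stand-in for ParseChunksWithMemory (no memory/caching)."""

    def __init__(self, text_chunks):
        self.text_chunks = text_chunks

    async def parse_from_ind(self, ind, key, llm_call_callback):
        # Simply forward the chunk text to the callback.
        return await llm_call_callback(key, self.text_chunks[ind])


async def keyword_check(key, text_chunk):
    """Hypothetical stand-in for an async LLM call: a keyword match."""
    return "setback" in text_chunk.lower()


class MinHitsValidator:
    """Hypothetical validator that accepts the text once `min_hits` chunks pass."""

    def __init__(self, min_hits=2):
        self.min_hits = min_hits
        self._hits = 0

    @property
    def is_correct_kind_of_text(self):
        return self._hits >= self.min_hits

    async def check_chunk(self, chunk_parser, ind):
        passed = await chunk_parser.parse_from_ind(
            ind, key="mentions_setbacks", llm_call_callback=keyword_check
        )
        self._hits += int(passed)
        return passed


async def validate_document(validator, chunk_parser):
    """Feed chunks in order and stop as soon as the validator is satisfied.

    Returns (decision, number_of_chunks_checked).
    """
    for ind in range(len(chunk_parser.text_chunks)):
        await validator.check_chunk(chunk_parser, ind)
        if validator.is_correct_kind_of_text:
            return True, ind + 1
    return False, len(chunk_parser.text_chunks)


async def main():
    chunks = [
        "Table of contents",
        "Wind turbine setback: 1.1 times total height",
        "Noise limit: 45 dBA",
        "Setback from public roads: 500 ft",
    ]
    return await validate_document(MinHitsValidator(min_hits=2), ListChunkParser(chunks))
```

With these chunks the decision is reached after the fourth chunk, which is the second one that mentions setbacks. A document that never reaches min_hits is rejected only after every chunk has been checked.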

Methods

check_chunk(chunk_parser, ind)

Check a chunk to see if it contains the right kind of text

Attributes

is_correct_kind_of_text

True if text is a good fit for extraction

abstract property is_correct_kind_of_text

True if text is a good fit for extraction

Type:

bool
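
Because is_correct_kind_of_text is declared as an abstract property, it is read as an attribute (validator.is_correct_kind_of_text, with no call), and every concrete subclass must override it. The sketch below uses a local stand-in that mirrors the documented interface, not the COMPASS source. It shows Python's ABC machinery rejecting a subclass that forgets the property. Incomplete, AlwaysAccept, and can_instantiate are hypothetical names.

```python
from abc import ABC, abstractmethod


class TextKindValidator(ABC):
    """Local stand-in mirroring the documented interface (not the COMPASS source)."""

    @property
    @abstractmethod
    def is_correct_kind_of_text(self):
        """bool: True if text is a good fit for extraction."""

    @abstractmethod
    async def check_chunk(self, chunk_parser, ind):
        """Check a chunk to see if it contains the right kind of text."""


class Incomplete(TextKindValidator):
    # Implements check_chunk but forgets the abstract property.
    async def check_chunk(self, chunk_parser, ind):
        return False


class AlwaysAccept(TextKindValidator):
    # Hypothetical trivial validator: every document is accepted.
    @property
    def is_correct_kind_of_text(self):
        return True

    async def check_chunk(self, chunk_parser, ind):
        return True


def can_instantiate(cls):
    """Return True if cls() succeeds, False if the ABC machinery blocks it."""
    try:
        cls()
    except TypeError:
        return False
    return True
```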

abstractmethod async check_chunk(chunk_parser, ind)

Check a chunk to see if it contains the right kind of text

You should validate chunks like so:

is_correct_kind_of_text = await chunk_parser.parse_from_ind(
    ind,
    key="my_unique_validation_key",
    llm_call_callback=my_async_llm_call_function,
)

where “key” is unique to this particular validation (it is used to cache the validation result in the chunk parser’s memory), and my_async_llm_call_function is an async function that takes a key and a text chunk and returns a boolean indicating whether the text chunk passes the validation. You can call chunk_parser.parse_from_ind as many times as needed within this method, but each validation must use its own unique key.
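
The following sketch illustrates the caching and unique-key conventions described above. It assumes a simplified behavior for parse_from_ind: return the cached result for a (chunk index, key) pair, otherwise await the callback and store its result. The real ParseChunksWithMemory may do more than this. MemoryChunkParser, make_fake_llm_call, and WindSetbackValidator are hypothetical names. The fake callback matches keywords instead of calling an LLM.

```python
import asyncio


class MemoryChunkParser:
    """Hypothetical, simplified stand-in for ParseChunksWithMemory."""

    def __init__(self, text_chunks):
        self.text_chunks = text_chunks
        # One cache dict per chunk, mapping validation key -> bool result.
        self.memory = [{} for _ in text_chunks]

    async def parse_from_ind(self, ind, key, llm_call_callback):
        cache = self.memory[ind]
        if key not in cache:
            # Cache miss: run the (normally expensive) LLM callback once.
            cache[key] = await llm_call_callback(key, self.text_chunks[ind])
        return cache[key]


def make_fake_llm_call(call_log):
    """Build a hypothetical async callback that matches keywords instead of calling an LLM."""
    keywords = {"mentions_wind_energy": "wind", "mentions_setbacks": "setback"}

    async def fake_llm_call(key, text_chunk):
        call_log.append(key)  # record every "LLM" invocation
        return keywords[key] in text_chunk.lower()

    return fake_llm_call


class WindSetbackValidator:
    """Hypothetical validator; in real code it would subclass TextKindValidator."""

    def __init__(self, llm_call_callback):
        self.llm_call_callback = llm_call_callback
        self._found = False

    @property
    def is_correct_kind_of_text(self):
        return self._found

    async def check_chunk(self, chunk_parser, ind):
        # Two separate validations, each cached under its own unique key.
        is_wind = await chunk_parser.parse_from_ind(
            ind, key="mentions_wind_energy", llm_call_callback=self.llm_call_callback
        )
        has_setback = await chunk_parser.parse_from_ind(
            ind, key="mentions_setbacks", llm_call_callback=self.llm_call_callback
        )
        passed = is_wind and has_setback
        self._found = self._found or passed
        return passed


async def main():
    call_log = []
    parser = MemoryChunkParser(["Wind turbine setback: 500 ft", "Solar panel rules"])
    validator = WindSetbackValidator(make_fake_llm_call(call_log))
    first = await validator.check_chunk(parser, 0)
    again = await validator.check_chunk(parser, 0)  # both answers come from memory
    return first, again, validator.is_correct_kind_of_text, len(call_log)
```

Checking chunk 0 a second time reuses both cached results, so the fake callback runs only twice in total. In this sketch, reusing one key for two different questions would make the second check silently return the first answer from memory, which is why each validation needs its own key.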

Parameters:
  • chunk_parser (ParseChunksWithMemory) – Instance that contains a parse_from_ind method.

  • ind (int) – Index of the chunk to check.

Returns:

bool – Boolean flag indicating whether or not the text in the chunk is the right kind of text for the extraction.

See also

ParseChunksWithMemory.parse_from_ind

Method used to parse text from a chunk with memory of prior chunk validations.