compass.validation.content.parse_by_chunks
- async parse_by_chunks(chunk_parser, heuristic, legal_text_validator=None, callbacks=None, min_chunks_to_process=3)
Stream text chunks through heuristic and legal validators.
This method iterates over the chunks one at a time and passes each chunk to the callback parsers only if it clears the heuristic check and, when a legal_text_validator is provided, the legal text check. If min_chunks_to_process chunks fail the legal text check, parsing is aborted.
- Parameters:
chunk_parser (ParseChunksWithMemory) – Instance that contains the attributes text_chunks and num_to_recall. The chunks in the text_chunks attribute will be iterated over.
heuristic (Heuristic) – Instance of Heuristic with a check method. This should be a fast check meant to quickly dispose of chunks of text. Any chunk that fails this check will NOT be passed to the callback parsers.
legal_text_validator (LegalTextValidator, optional) – Instance of LegalTextValidator that can be used to validate each chunk for legal text. If not provided, the legal text check will be skipped. By default, None.
callbacks (list, optional) – List of async callbacks that take a chunk_parser and index as inputs and return a boolean determining whether the text chunk was parsed successfully or not. By default, None, which does not use any callbacks.
min_chunks_to_process (int, optional) – Minimum number of chunks to process before aborting due to text not being legal. By default, 3.
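As an illustration of the callback contract described for callbacks, the following is a minimal sketch of one async callback. The name mentions_setback, the keyword test, and the SimpleNamespace stand-in for a chunk parser are all hypothetical; only the (chunk_parser, index) -> bool shape and the text_chunks and num_to_recall attributes come from the parameter descriptions.

```python
import asyncio
from types import SimpleNamespace


async def mentions_setback(chunk_parser, ind):
    """Hypothetical callback: report whether chunk `ind` mentions a setback."""
    text = chunk_parser.text_chunks[ind]
    # A real callback would typically store whatever it extracts;
    # this one only reports success or failure.
    return "setback" in text.lower()


# Stand-in for a chunk parser; only the two documented attributes are modeled.
fake_parser = SimpleNamespace(
    text_chunks=["Setback: 500 ft from roads.", "Office hours are 9 to 5."],
    num_to_recall=2,
)

# Call the callback directly on each chunk to show its return values.
results = [asyncio.run(mentions_setback(fake_parser, i)) for i in range(2)]
print(results)
```

The callback is called directly here only to show its return values; in normal use it would be passed in the callbacks list.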
Notes
This coroutine only orchestrates validation. Callbacks are responsible for persisting any extracted results. Callback futures are awaited concurrently and share the same task name as the caller to simplify tracing within structured logging.
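The flow described on this page can be sketched in a self-contained way as follows. This is a simplified illustration, not the library's implementation: toy_parse_by_chunks, ToyChunkParser, KeywordHeuristic, and the is_legal coroutine are hypothetical stand-ins, and the order of the two checks as well as the exact abort rule are assumptions drawn from the wording of the description. It uses asyncio.gather to show one way callbacks for a chunk could be awaited concurrently.

```python
import asyncio


class KeywordHeuristic:
    """Hypothetical stand-in for a Heuristic: a cheap keyword test."""

    def check(self, text):
        return "setback" in text.lower()


class ToyChunkParser:
    """Hypothetical stand-in exposing text_chunks and num_to_recall."""

    def __init__(self, text_chunks, num_to_recall=2):
        self.text_chunks = text_chunks
        self.num_to_recall = num_to_recall


async def toy_parse_by_chunks(chunk_parser, heuristic, is_legal=None,
                              callbacks=None, min_chunks_to_process=3):
    """Simplified illustration of the documented flow (assumed details)."""
    callbacks = callbacks or []
    failed_legal = 0
    for ind, text in enumerate(chunk_parser.text_chunks):
        # Optional legal-text check; abort once enough chunks have failed.
        if is_legal is not None and not await is_legal(text):
            failed_legal += 1
            if failed_legal >= min_chunks_to_process:
                return
            continue
        # Fast heuristic: chunks that fail never reach the callbacks.
        if not heuristic.check(text):
            continue
        # Await every callback for this chunk concurrently.
        await asyncio.gather(*(cb(chunk_parser, ind) for cb in callbacks))


async def main():
    seen = []

    async def record(chunk_parser, ind):
        # Callbacks own persistence; here the index is simply recorded.
        seen.append(ind)
        return True

    async def is_legal(text):
        return "ordinance" in text.lower()

    parser = ToyChunkParser([
        "Ordinance 12: wind turbine setback is 500 ft.",
        "Ordinance 13: permitted hours of operation.",
        "Contact the clerk's office for permits.",
        "Ordinance 14: solar setback from property lines.",
    ])
    await toy_parse_by_chunks(parser, KeywordHeuristic(), is_legal, [record])
    return seen


seen_indices = asyncio.run(main())
print(seen_indices)
```

In this toy run only the chunks that pass both checks reach the callback. The orchestrating coroutine returns nothing, so any results have to be kept by the callbacks themselves, which matches the note above.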