compass.services.cpu.OCRPDFLoader

class OCRPDFLoader(**kwargs)

Bases: PDFLoader

Loader service for OCR

Parameters:

**kwargs – Keyword arguments to pass to concurrent.futures.ProcessPoolExecutor. By default, no arguments are passed.
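
As a rough illustration of how constructor keyword arguments can reach the executor, the sketch below uses a hypothetical stand-in class (`PoolBackedLoader`, which is not part of compass). It stores the keyword arguments and forwards them to `concurrent.futures.ProcessPoolExecutor` when resources are acquired. `max_workers` is a real `ProcessPoolExecutor` parameter; the class layout and attribute names are assumptions about the pattern, not the library's actual implementation.

```python
from concurrent.futures import ProcessPoolExecutor


class PoolBackedLoader:
    """Hypothetical stand-in, NOT the compass implementation."""

    def __init__(self, **kwargs):
        # Keep the keyword arguments until the pool is actually created.
        self._pool_kwargs = kwargs
        self.pool = None

    def acquire_resources(self):
        # Forward the stored keyword arguments to the executor.
        self.pool = ProcessPoolExecutor(**self._pool_kwargs)

    def release_resources(self):
        # Wait for any running work to finish, then drop the pool.
        self.pool.shutdown(wait=True)
        self.pool = None


loader = PoolBackedLoader(max_workers=2)
loader.acquire_resources()
pool_type = type(loader.pool).__name__
loader.release_resources()
```

Deferring pool creation until `acquire_resources()` keeps construction cheap and pairs every pool with an explicit shutdown.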

Methods

acquire_resources()

Open process pool and temp directory

call(*args, **kwargs)

Call the service

process(fn, pdf_bytes, **kwargs)

Execute a PDF parsing function in the process pool

process_using_futures(fut, *args, **kwargs)

Process a call to the service

release_resources()

Shut down process pool and clean up temp directory

Attributes

MAX_CONCURRENT_JOBS

Max number of concurrent job submissions.

can_process

Always True (limiting is handled by asyncio)

name

Service name used to pull the correct queue object

MAX_CONCURRENT_JOBS = 10000

Max number of concurrent job submissions.

acquire_resources()

Open process pool and temp directory

async classmethod call(*args, **kwargs)

Call the service

Parameters:
  • *args – Positional arguments to be passed to the underlying service processing function.

  • **kwargs – Keyword arguments to be passed to the underlying service processing function.

Returns:

object – A response object from the underlying service.

property can_process

Always True (limiting is handled by asyncio)

Type:

bool

property name

Service name used to pull the correct queue object

Type:

str

async process(fn, pdf_bytes, **kwargs)

Execute a PDF parsing function in the process pool

Parameters:
  • fn (callable) – Callable executed inside the process pool. Receives pdf_bytes as the first argument.

  • pdf_bytes (bytes) – Raw PDF payload forwarded to fn.

  • **kwargs – Additional keyword arguments passed to fn.

Returns:

Any – Result returned by fn after execution.
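
One common way to run a parsing function in a process pool from a coroutine is `loop.run_in_executor`. The sketch below shows that pattern under stated assumptions: `run_in_pool` and `count_page_markers` are hypothetical names invented for this illustration, and the compass implementation may differ.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def count_page_markers(pdf_bytes, marker=b"/Type /Page"):
    """Hypothetical parsing function: count occurrences of a byte marker."""
    return pdf_bytes.count(marker)


async def run_in_pool(pool, fn, pdf_bytes, **kwargs):
    """Run ``fn(pdf_bytes, **kwargs)`` in ``pool`` without blocking the loop."""
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional arguments, so keyword
    # arguments are bound with functools.partial first.
    return await loop.run_in_executor(pool, partial(fn, pdf_bytes, **kwargs))


async def main():
    fake_pdf = b"/Type /Page ... /Type /Page"  # placeholder bytes, not a real PDF
    with ProcessPoolExecutor(max_workers=1) as pool:
        return await run_in_pool(pool, count_page_markers, fake_pdf)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the callable and its arguments are pickled and sent to a worker process, `fn` should be defined at module level and `pdf_bytes` should stay plain `bytes`.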

async process_using_futures(fut, *args, **kwargs)

Process a call to the service

The result is communicated by updating fut.

Parameters:
  • fut (asyncio.Future) – A future object that should get the result of the processing operation. If the processing function returns answer, this method should call fut.set_result(answer).

  • **kwargs – Keyword arguments to be passed to the underlying processing function.
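
The future-based contract described above can be sketched with plain asyncio. In this minimal example, `resolve_future_with` and `shout` are hypothetical helpers. Passing the processing function explicitly and forwarding exceptions through `fut.set_exception` are assumptions made for illustration, not documented behavior of this method.

```python
import asyncio


async def resolve_future_with(fut, fn, *args, **kwargs):
    """Hypothetical sketch: run ``fn`` and report the outcome through ``fut``."""
    try:
        answer = await fn(*args, **kwargs)
    except Exception as error:
        # Assumption: failures are handed to whoever awaits the future.
        fut.set_exception(error)
    else:
        fut.set_result(answer)


async def shout(text, suffix="!"):
    """Toy processing function used only for this illustration."""
    return text.upper() + suffix


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # The caller schedules the work, then awaits the future for the result.
    task = asyncio.create_task(
        resolve_future_with(fut, shout, "done", suffix="?")
    )
    result = await fut
    await task
    return result


result = asyncio.run(main())
```

Communicating through a future lets the caller wait on a single object while the work itself is scheduled and executed elsewhere.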

release_resources()

Shut down process pool and clean up temp directory