Runs Google Document AI PDF batch processing on a list of blobs.
docai_parse(
self,
blobs: Sequence[Blob],
*,
gcs_output_path: Optional[str] = None,
processor_name: Optional[str] = None,
batch_size: int = 1000,
field_mask: Optional[str] = None,
**process_options_kwargs: Any = {}
) -> List[Operation]

Document AI has a limit of 1000 files per batch, so larger inputs must be split into multiple requests.
Batch processing is an asynchronous long-running operation, and results are stored in an output GCS bucket.
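
The splitting described above can be sketched in plain Python. This is only an illustration of the batching idea, not the library's actual implementation; the helper name `split_into_batches` and the example bucket paths are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # Slicing past the end is safe, so the last chunk may be shorter.
        yield list(items[start:start + batch_size])


# Hypothetical stand-ins for blobs: 2500 made-up GCS paths.
blob_paths = [f"gs://example-bucket/doc-{i}.pdf" for i in range(2500)]
batches = list(split_into_batches(blob_paths, batch_size=1000))
batch_sizes = [len(batch) for batch in batches]
```

With 2500 inputs and a batch size of 1000, this produces three batches, and each batch would correspond to one batch request.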
| Name | Type | Description |
|---|---|---|
| blobs* | Sequence[Blob] | List of blobs to process. |
| gcs_output_path | Optional[str] | Default: None. Path (folder) on GCS to store results. |
| processor_name | Optional[str] | Default: None. Name of a Document AI processor. |
| batch_size | int | Default: 1000. Number of documents per batch. |
| field_mask | Optional[str] | Default: None. Comma-separated list of fields to include in the Document AI response. Suggested: |
| process_options_kwargs | Any | Default: {}. Optional parameters to pass to the Document AI processors. |
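
Because the method returns a list of long-running operations rather than parsed documents, the caller usually has to wait for them to finish before reading results from the output bucket. The following is a minimal polling sketch that assumes only that each operation exposes a `done()` method, as `google.api_core.operation.Operation` does. The helper `wait_until_done` and the `FakeOperation` class are hypothetical and exist only so the example runs without cloud credentials.

```python
import time
from typing import Protocol, Sequence


class SupportsDone(Protocol):
    """Anything with a done() method, e.g. a long-running operation object."""

    def done(self) -> bool: ...


def wait_until_done(
    operations: Sequence[SupportsDone],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> bool:
    """Poll until every operation reports done; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if all(op.done() for op in operations):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


class FakeOperation:
    """Hypothetical stand-in that reports done after a fixed number of polls."""

    def __init__(self, polls_until_done: int) -> None:
        self._remaining = polls_until_done

    def done(self) -> bool:
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


finished = wait_until_done(
    [FakeOperation(2), FakeOperation(0)], poll_interval=0.01, timeout=5.0
)
timed_out = wait_until_done(
    [FakeOperation(10**9)], poll_interval=0.01, timeout=0.05
)
```

In real use, the list returned by `docai_parse` would take the place of the fake operations, and the poll interval and timeout would be tuned to the size of the batch.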