Runs Google Document AI PDF Batch Processing on a list of blobs.
docai_parse(
    self,
    blobs: Sequence[Blob],
    *,
    gcs_output_path: Optional[str] = None,
    processor_name: Optional[str] = None,
    batch_size: int = 1000,
    enable_native_pdf_parsing: bool = True,
    field_mask: Optional[str] = None
) -> List[Operation]

Document AI has a limit of 1000 files per batch, so larger inputs need to be split into multiple requests. Batch processing is an asynchronous long-running operation, and results are stored in an output GCS bucket.
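
The splitting described above can be illustrated with a minimal sketch. The helper name `split_into_batches`, the constant `MAX_BATCH_SIZE`, and the `gs://example-bucket/...` URIs are hypothetical and exist only for this example; this is not necessarily how `docai_parse` chunks its input internally.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Per-batch file limit described above (hypothetical constant name).
MAX_BATCH_SIZE = 1000


def split_into_batches(
    items: Sequence[T], batch_size: int = MAX_BATCH_SIZE
) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items (hypothetical helper)."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder URIs standing in for real blobs.
blob_uris = [f"gs://example-bucket/doc-{i}.pdf" for i in range(2500)]
batches = list(split_into_batches(blob_uris))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [1000, 1000, 500]
```

With 2500 inputs and the default `batch_size`, this would correspond to three separate batch requests, each of which produces its own long-running operation.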
| Name | Type | Description |
|---|---|---|
| blobs* | Sequence[Blob] | A list of blobs to be parsed. |
| gcs_output_path | Optional[str] | Default: None. A path (folder) on GCS to store results. |
| processor_name | Optional[str] | Default: None. Name of a Document AI processor. |
| batch_size | int | Default: 1000. Number of documents per batch. |
| enable_native_pdf_parsing | bool | Default: True. A config option for the parser. |
| field_mask | Optional[str] | Default: None. A comma-separated list of fields to include in the Document AI response. Suggested: "text,pages.pageNumber,pages.layout". |