Parses a list of blobs lazily.
```python
batch_parse(
    self,
    blobs: Sequence[Blob],
    gcs_output_path: Optional[str] = None,
    timeout_sec: int = 3600,
    check_in_interval_sec: int = 60,
) -> Iterator[Document]
```

This is a long-running operation. A recommended way is to decouple parsing from creating LangChain Documents:
```python
operations = parser.docai_parse(blobs, gcs_path)
parser.is_running(operations)
```

You can get the operation names and save them:

```python
names = [op.operation.name for op in operations]
```

And when all operations are finished, you can use their results:

```python
operations = parser.operations_from_names(operation_names)
results = parser.get_results(operations)
docs = parser.parse_from_results(results)
```
| Name | Type | Description |
|---|---|---|
| blobs* | Sequence[Blob] | A list of blobs to parse. |
| gcs_output_path | Optional[str] | Default: `None`. A path on Google Cloud Storage to store parsing results. |
| timeout_sec | int | Default: `3600`. A timeout to wait for Document AI to complete, in seconds. |
| check_in_interval_sec | int | Default: `60`. An interval to wait until the next check whether parsing operations have completed, in seconds. |
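To make the roles of `timeout_sec` and `check_in_interval_sec` concrete, here is a minimal sketch of the polling loop such a long-running call performs. The helper name `wait_for_operations` and the `is_running` callback are hypothetical stand-ins for illustration, not part of the parser's API:

```python
import time


def wait_for_operations(operations, is_running,
                        timeout_sec=3600, check_in_interval_sec=60):
    """Poll until all operations finish or the timeout elapses.

    `operations` stands in for Document AI long-running operations;
    `is_running` is a callback returning True while any is in progress.
    """
    elapsed = 0
    while is_running(operations):
        if elapsed >= timeout_sec:
            raise TimeoutError(
                f"operations still running after {timeout_sec} s"
            )
        # Sleep between status checks instead of polling continuously.
        time.sleep(check_in_interval_sec)
        elapsed += check_in_interval_sec
    return operations
```

Raising the interval reduces API calls at the cost of noticing completion later; the timeout bounds how long the caller blocks overall.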