| Name | Type | Description |
|---|---|---|
| api_key | str | Default: None. The API key for accessing the Upstage API. If None, it is fetched from the environment variable UPSTAGE_API_KEY. |
| base_url | str | Default: DOCUMENT_PARSE_BASE_URL. The base URL for accessing the Upstage API. |
| model | str | Default: DOCUMENT_PARSE_DEFAULT_MODEL ("document-parse"). The model to use for document parsing. |
| split | SplitType | Default: 'none'. The type of splitting to apply. "none" means no splitting. |
| ocr | OCRMode | Default: 'auto'. Controls how OCR is used to extract text from images in the document. With "force", OCR is used to extract text from an image. With "auto", text is extracted from a PDF; an error occurs if the input is not in PDF format. |
| output_format | OutputFormat | Default: 'html'. The format of the inference results. |
| coordinates | bool | Default: True. Whether to include the coordinates of the OCR in the output. |
| base64_encoding | List[Category] | Default: None. The categories of elements to be encoded in base64. |

Upstage Document Parse Parser.

To use, you should have the environment variable UPSTAGE_API_KEY set with your API key, or pass it as a named parameter to the constructor.

Example:

```python
from langchain_upstage import UpstageDocumentParseParser

loader = UpstageDocumentParseParser(split="page", output_format="text")
```
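Two of the rules above are easy to trip over: the API key falls back to UPSTAGE_API_KEY when it is not passed explicitly, and ocr="auto" raises an error for non-PDF input. The following is a minimal sketch of a pre-flight step that handles both before the parser is constructed. The helper names (`resolve_api_key`, `choose_ocr_mode`, `build_parser_kwargs`) are hypothetical and not part of langchain_upstage, and detecting a PDF by file extension is an assumption made for illustration only.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key, else fall back to the UPSTAGE_API_KEY variable."""
    key = api_key or os.environ.get("UPSTAGE_API_KEY")
    if not key:
        raise ValueError(
            "Pass api_key or set the UPSTAGE_API_KEY environment variable."
        )
    return key


def choose_ocr_mode(file_path: str) -> str:
    """Pick "auto" for PDFs and "force" for anything else.

    Assumption: the file extension is a good enough signal of the input type.
    """
    return "auto" if Path(file_path).suffix.lower() == ".pdf" else "force"


def build_parser_kwargs(file_path: str, api_key: Optional[str] = None) -> dict:
    """Assemble keyword arguments using parameter names from the table above."""
    return {
        "api_key": resolve_api_key(api_key),
        "ocr": choose_ocr_mode(file_path),
        "split": "page",
        "output_format": "text",
    }
```

The resulting dictionary can then be unpacked into the constructor, for example `UpstageDocumentParseParser(**build_parser_kwargs("scan.png"))`. Keeping the checks in plain functions means a missing key or an unsuitable OCR mode is caught locally, before any request is sent.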