| Name | Type | Description |
|---|---|---|
| `file_path`* | `Union[str, Path, List[str], List[Path]]` | The path to the document to be loaded. |
| `split` | `SplitType` | Default: `'none'` |
| `api_key` | `str` | Default: `None` |
| `base_url` | `str` | Default: `DOCUMENT_PARSE_BASE_URL` |
| `model` | `str` | Default: `DOCUMENT_PARSE_DEFAULT_MODEL` |
| `ocr` | `OCRMode` | Default: `'auto'` |
| `output_format` | `OutputFormat` | Default: `'html'` |
| `coordinates` | `bool` | Default: `True` |
| `base64_encoding` | `List[Category]` | Default: `None` |

Upstage Document Parse Loader.

To use it, set the environment variable `UPSTAGE_API_KEY` to your API key, or pass the key as a named parameter to the constructor.

Example:

    from langchain_upstage import UpstageDocumentParseLoader

    file_path = "/PATH/TO/YOUR/FILE.pdf"
    loader = UpstageDocumentParseLoader(
        file_path, split="page", output_format="text"
    )

`split`: The type of splitting to be applied.

`api_key`: The API key for accessing the Upstage API. Defaults to `None`, in which case it is fetched from the environment variable `UPSTAGE_API_KEY`.
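
The fallback described for the API key can be sketched in a few lines. This is only an illustration of the documented behaviour, not the loader's actual implementation: `resolve_api_key` is a hypothetical helper, and raising `ValueError` when no key is found is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else use UPSTAGE_API_KEY."""
    if api_key is not None:
        # A key passed as a named parameter takes precedence.
        return api_key
    env_key = os.environ.get("UPSTAGE_API_KEY")
    if not env_key:
        # Assumption: a missing key is reported as a ValueError.
        raise ValueError(
            "No API key given and UPSTAGE_API_KEY is not set in the environment."
        )
    return env_key
```

Calling `resolve_api_key()` with no argument returns the value of `UPSTAGE_API_KEY` when it is set, while `resolve_api_key("my-key")` returns the explicit key unchanged.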

`base_url`: The base URL for accessing the Upstage API.

`model`: The model to be used for the document parse. Defaults to `'document-parse'`.

`ocr`: Controls how OCR is used to extract text from images in the document. If the value is `'force'`, OCR is used to extract text from an image. If the value is `'auto'`, text is extracted from a PDF; an error occurs if the value is `'auto'` and the input is not in PDF format.
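
Because `'auto'` fails on non-PDF input, it can help to check the mode before sending a file. The sketch below is a client-side illustration only: `check_ocr_mode` is a hypothetical helper, it assumes `'auto'` and `'force'` are the only accepted modes, and it guesses the format from the file extension, which may differ from how the service detects PDFs.

```python
from pathlib import Path
from typing import Union

# Assumption: only the two modes described above are accepted.
VALID_OCR_MODES = ("auto", "force")


def check_ocr_mode(file_path: Union[str, Path], ocr: str = "auto") -> str:
    """Hypothetical pre-flight check for the ocr parameter."""
    if ocr not in VALID_OCR_MODES:
        raise ValueError(f"ocr must be one of {VALID_OCR_MODES}, got {ocr!r}")
    # Assumption: the extension is a good enough proxy for the file format.
    is_pdf = Path(file_path).suffix.lower() == ".pdf"
    if ocr == "auto" and not is_pdf:
        raise ValueError("ocr='auto' requires a PDF input; use ocr='force' instead.")
    return ocr
```

For an image such as `scan.png`, passing `ocr="force"` passes the check, whereas the default `'auto'` raises before any request is made.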

`output_format`: Format of the inference results.

`coordinates`: Whether to include the coordinates of the OCR in the output.

`base64_encoding`: The category of the elements to be encoded in base64.