langchain-upstage
langchain_upstage
ChatUpstage
Bases: BaseChatOpenAI
ChatUpstage chat model.
To use, you should have the environment variable UPSTAGE_API_KEY set with your
API key or pass it as a named parameter to the constructor.
| METHOD | DESCRIPTION |
|---|---|
| validate_environment | Validate that the API key and Python package exist in the environment. |
| get_token_ids | Get the tokens present in the text. |
| get_num_tokens_from_messages | Calculate the number of tokens for the Solar model. |
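A minimal usage sketch for the class, assuming the `langchain-upstage` package is installed and `UPSTAGE_API_KEY` is set in the environment. The `ask_solar` helper and its `chat_model` parameter are hypothetical; the parameter exists only so the function can be exercised offline with a stand-in object.

```python
from typing import Any, Optional


def ask_solar(prompt: str, chat_model: Optional[Any] = None) -> str:
    """Send one prompt to a chat model and return the reply text."""
    if chat_model is None:
        # Imported lazily so the sketch can be read and tested without the package.
        from langchain_upstage import ChatUpstage

        # With no arguments, the key is taken from UPSTAGE_API_KEY.
        chat_model = ChatUpstage()
    # `invoke` returns a message object; its text lives in `.content`.
    return chat_model.invoke(prompt).content
```

Calling `ask_solar("Hello")` with no second argument would issue a real API request, so it needs a valid key and network access.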
model_name
class-attribute
instance-attribute
Model name to use.
upstage_api_key
class-attribute
instance-attribute
upstage_api_key: SecretStr = Field(
default_factory=secret_from_env(
"UPSTAGE_API_KEY",
error_message="You must specify an api key. You can pass it an argument as `api_key=...` or set the environment variable `UPSTAGE_API_KEY`.",
),
alias="api_key",
)
Automatically inferred from the environment variable UPSTAGE_API_KEY if not provided.
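The lookup order implied by this field (an explicit `api_key` argument first, then the `UPSTAGE_API_KEY` environment variable, otherwise an error) can be sketched in plain Python. This is an illustrative stand-in rather than the library's implementation: `resolve_api_key` and its `env` parameter are hypothetical names.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else fall back to UPSTAGE_API_KEY."""
    if api_key:
        return api_key
    # Hypothetical `env` parameter: lets the lookup be exercised without
    # touching the real process environment.
    source = os.environ if env is None else env
    value = source.get("UPSTAGE_API_KEY")
    if not value:
        raise ValueError(
            "You must specify an api key: pass `api_key=...` or set "
            "the environment variable `UPSTAGE_API_KEY`."
        )
    return value
```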
upstage_api_base
class-attribute
instance-attribute
upstage_api_base: str | None = Field(
default_factory=from_env(
"UPSTAGE_API_BASE", default="https://api.upstage.ai/v1/solar"
),
alias="base_url",
)
Base URL path for API requests.
Leave blank if not using a proxy or service emulator.
openai_api_key
class-attribute
instance-attribute
The OpenAI API key is not supported for Upstage. Use upstage_api_key instead.
openai_api_base
class-attribute
instance-attribute
The OpenAI API base is not supported for Upstage. Use upstage_api_base instead.
openai_organization
class-attribute
instance-attribute
The OpenAI organization is not supported for Upstage.
tiktoken_model_name
class-attribute
instance-attribute
tiktoken_model_name: str | None = None
Tiktoken is not supported for Upstage.
tokenizer_name
class-attribute
instance-attribute
tokenizer_name: str | None = 'upstage/solar-pro2-tokenizer'
Hugging Face tokenizer name.
The Solar tokenizer is publicly available on Hugging Face: https://huggingface.co/upstage/solar-pro-tokenizer
default_headers
class-attribute
instance-attribute
Add trace headers.
UpstageDocumentParseLoader
Bases: BaseLoader
Upstage Document Parse Loader.
To use, you should have the environment variable UPSTAGE_API_KEY
set with your API key or pass it as a named parameter to the constructor.
| METHOD | DESCRIPTION |
|---|---|
| aload | Load data into Document objects. |
| load_and_split | Load Document and split into chunks. |
| alazy_load | A lazy loader for Document. |
| __init__ | Initializes an instance of the Upstage document parse loader. |
| load | Loads and parses the document using the … |
| lazy_load | Lazily loads and parses the document using the … |
| merge_and_split | Merges the page content and metadata of multiple documents into a single document, or splits the documents using a custom splitter. |
aload
async
load_and_split
load_and_split(text_splitter: TextSplitter | None = None) -> list[Document]
Load Document and split into chunks. Chunks are returned as Document.
Danger
Do not override this method. It should be considered to be deprecated!
| PARAMETER | DESCRIPTION |
|---|---|
| text_splitter | TYPE: TextSplitter \| None |
| RAISES | DESCRIPTION |
|---|---|
| ImportError | If … |
| RETURNS | DESCRIPTION |
|---|---|
| list[Document] | List of Document objects. |
alazy_load
async
alazy_load() -> AsyncIterator[Document]
A lazy loader for Document.
| YIELDS | DESCRIPTION |
|---|---|
| AsyncIterator[Document] | The Document objects. |
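Because `alazy_load` returns an async iterator, its results are consumed with `async for`. The sketch below uses a stand-in loader (`FakeLoader`, a hypothetical class that yields plain strings instead of Document objects) so that it runs without network access; a real loader instance would be iterated the same way.

```python
import asyncio
from typing import AsyncIterator, List


class FakeLoader:
    """Hypothetical stand-in that mimics the `alazy_load` interface."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages

    async def alazy_load(self) -> AsyncIterator[str]:
        for page in self.pages:
            # Yield control to the event loop, as an I/O-bound loader would.
            await asyncio.sleep(0)
            yield page


async def collect(loader: FakeLoader) -> List[str]:
    """Drain the async iterator one item at a time."""
    results = []
    async for doc in loader.alazy_load():
        results.append(doc)
    return results


docs = asyncio.run(collect(FakeLoader(["page 1", "page 2"])))
```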
__init__
__init__(
file_path: str | Path | list[str] | list[Path],
split: SplitType = "none",
api_key: str | None = None,
base_url: str = DOCUMENT_PARSE_BASE_URL,
model: str = DOCUMENT_PARSE_DEFAULT_MODEL,
chart_recognition: bool = True,
ocr: OCR = "auto",
output_format: OutputFormat = "html",
coordinates: bool = True,
base64_encoding: list[Category] | None = None,
)
Initializes an instance of the Upstage document parse loader.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | The path to the document to be loaded. TYPE: str \| Path \| list[str] \| list[Path] |
| split | The type of splitting to be applied. TYPE: SplitType |
| api_key | The API key for accessing the Upstage API. Defaults to None, in which case it is fetched from the environment variable UPSTAGE_API_KEY. TYPE: str \| None |
| base_url | The base URL for accessing the Upstage API. TYPE: str |
| model | The model to be used for the document parse. Defaults to "document-parse". TYPE: str |
| ocr | Extract text from images in the document using OCR. If the value is "force", OCR is used to extract text from an image. If the value is "auto", text is extracted from a PDF (an error occurs if the value is "auto" and the input is not in PDF format). TYPE: OCR |
| output_format | Format of the inference results. TYPE: OutputFormat |
| coordinates | Whether to include the coordinates of the OCR in the output. TYPE: bool |
| base64_encoding | The category of the elements to be encoded in base64. TYPE: list[Category] \| None |
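As a rough illustration of the constructor, the sketch below builds a loader with the defaults from the signature spelled out explicitly. It assumes the `langchain-upstage` package is installed and an API key is available in the environment; `build_loader` and its `loader_cls` parameter are hypothetical, the latter existing only so the call can be checked offline with a stand-in.

```python
from typing import Any, Callable, Optional


def build_loader(
    file_path: str,
    loader_cls: Optional[Callable[..., Any]] = None,
) -> Any:
    """Construct a document parse loader with the documented defaults."""
    if loader_cls is None:
        # Imported lazily so the sketch can be read and tested without the package.
        from langchain_upstage import UpstageDocumentParseLoader

        loader_cls = UpstageDocumentParseLoader
    return loader_cls(
        file_path,
        split="none",           # default: no splitting
        ocr="auto",             # default OCR mode
        output_format="html",   # default output format
        coordinates=True,       # default: include OCR coordinates
    )
```

With a real key, `build_loader("report.pdf").load()` would parse the file; `report.pdf` is a placeholder path.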
load
lazy_load
merge_and_split
Merges the page content and metadata of multiple documents into a single document, or splits the documents using a custom splitter.
| PARAMETER | DESCRIPTION |
|---|---|
| documents | A list of Document objects. |
| splitter | An optional splitter object that implements the … |
| RETURNS | DESCRIPTION |
|---|---|
| list[Document] | A list of Document objects. If a splitter is provided, the documents are split and a list of Document objects is returned. |
| RAISES | DESCRIPTION |
|---|---|
| AssertionError | If a splitter is provided but it does not implement the … |
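The merge-or-split behavior described for `merge_and_split` can be sketched with a small stand-in type. This is not the library's code: `Doc` is a hypothetical substitute for Document, and the space-joined content, the "later keys win" metadata merge, and the `split_documents` method name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Minimal stand-in for a Document (hypothetical)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def merge_and_split(documents: List[Doc], splitter: Optional[Any] = None) -> List[Doc]:
    if splitter is None:
        # Merge: join the page contents and fold all metadata into one dict.
        # Joining with a space and letting later keys win are assumptions.
        merged = " ".join(d.page_content for d in documents)
        metadata: Dict[str, Any] = {}
        for d in documents:
            metadata.update(d.metadata)
        return [Doc(page_content=merged, metadata=metadata)]
    # Assumed method name: the splitter must expose `split_documents`.
    assert hasattr(splitter, "split_documents"), "splitter must implement split_documents"
    return splitter.split_documents(documents)
```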
UpstageDocumentParseParser
Bases: BaseBlobParser
Upstage Document Parse Parser.
To use, you should have the environment variable UPSTAGE_API_KEY
set with your API key or pass it as a named parameter to the constructor.
| METHOD | DESCRIPTION |
|---|---|
| parse | Eagerly parse the blob into a Document or list of Document objects. |
| __init__ | Initializes an instance of the Upstage class. |
| lazy_parse | Lazily parses a document and yields Document objects based on the specified split type. |
parse
Eagerly parse the blob into a Document or list of Document objects.
This is a convenience method for interactive development environments.
Production applications should favor the lazy_parse method instead.
Subclasses should generally not override this parse method.
| PARAMETER | DESCRIPTION |
|---|---|
| blob | TYPE: Blob |
| RETURNS | DESCRIPTION |
|---|---|
| list[Document] | List of Document objects. |
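The relationship between the eager and lazy entry points can be illustrated with a stand-in parser. `ToyParser` and its line-based splitting are hypothetical; the point of the sketch is that an eager `parse` can simply materialize what `lazy_parse` yields, which is why the lazy form is the better fit for large inputs.

```python
from typing import Iterator, List


class ToyParser:
    """Hypothetical parser: treats each non-empty line as one document."""

    def lazy_parse(self, blob: str) -> Iterator[str]:
        for line in blob.splitlines():
            if line.strip():
                # Each document is produced only when the caller asks for it.
                yield line.strip()

    def parse(self, blob: str) -> List[str]:
        # Eager convenience wrapper: exhausts the generator into a list.
        return list(self.lazy_parse(blob))


parser = ToyParser()
first = next(parser.lazy_parse("alpha\n\nbeta\n"))  # stops after the first document
everything = parser.parse("alpha\n\nbeta\n")
```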
__init__
__init__(
api_key: str | None = None,
base_url: str = DOCUMENT_PARSE_BASE_URL,
model: str = DOCUMENT_PARSE_DEFAULT_MODEL,
split: SplitType = "none",
chart_recognition: bool = True,
ocr: OCR = "auto",
output_format: OutputFormat = "html",
coordinates: bool = True,
base64_encoding: list[Category] | None = None,
)
Initializes an instance of the Upstage class.
| PARAMETER | DESCRIPTION |
|---|---|
| api_key | The API key for accessing the Upstage API. Defaults to None, in which case it is fetched from the environment variable UPSTAGE_API_KEY. TYPE: str \| None |
| base_url | The base URL for accessing the Upstage API. TYPE: str |
| model | The model to be used for the document parse. Defaults to "document-parse". TYPE: str |
| split | The type of splitting to be applied. Defaults to "none" (no splitting). TYPE: SplitType |
| ocr | Extract text from images in the document using OCR. If the value is "force", OCR is used to extract text from an image. If the value is "auto", text is extracted from a PDF (an error occurs if the value is "auto" and the input is not in PDF format). TYPE: OCR |
| output_format | Format of the inference results. TYPE: OutputFormat |
| coordinates | Whether to include the coordinates of the OCR in the output. TYPE: bool |
| base64_encoding | The category of the elements to be encoded in base64. TYPE: list[Category] \| None |
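Given the rule stated for `ocr` ("auto" requires PDF input, while "force" runs OCR on images), a caller might pick the mode from the file extension before constructing the parser. `choose_ocr_mode` is a hypothetical helper, and detecting PDFs by extension alone is a simplifying assumption.

```python
from pathlib import Path


def choose_ocr_mode(file_path: str) -> str:
    """Return "auto" for PDF files and "force" for everything else."""
    # Assumption: the file extension reliably indicates the format.
    if Path(file_path).suffix.lower() == ".pdf":
        return "auto"
    return "force"
```

The returned string could then be passed as the `ocr` argument of the constructor.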
lazy_parse
Lazily parses a document and yields Document objects based on the specified split type.
| PARAMETER | DESCRIPTION |
|---|---|
| blob | The input document blob to parse. TYPE: Blob |
| is_batch | Whether to parse the document in batches. Defaults to … |
| YIELDS | DESCRIPTION |
|---|---|
| Document | The parsed document object. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If an invalid split type is provided. |
UpstageEmbeddings
Bases: BaseModel, Embeddings
UpstageEmbeddings embedding model.
To use, set the environment variable UPSTAGE_API_KEY with your API key or
pass it as a named parameter to the constructor.
| METHOD | DESCRIPTION |
|---|---|
| build_extra | Build extra kwargs from additional params that were passed in. |
| validate_environment | Validate that the API key and Python package exist in the environment. |
| embed_documents | Embed a list of document texts using the passage model. |
| embed_query | Embed query text using the query model. |
| aembed_documents | Asynchronously embed a list of document texts using the passage model. |
| aembed_query | Asynchronously embed query text using the query model. |
model
class-attribute
instance-attribute
Embeddings model name to use. Do not add suffixes like -query and -passage.
Instead, use the base name, for example 'solar-embedding-1-large'.
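One way to picture why suffixes should be left off: the document and query methods each select their own model variant from the base name. The sketch below assumes the variants are named by appending `-passage` and `-query` to the base name; that naming scheme and the `variant_name` helper are assumptions for illustration, not confirmed library behavior.

```python
def variant_name(base_model: str, kind: str) -> str:
    """Derive the passage or query variant name from a base model name."""
    if kind not in ("passage", "query"):
        raise ValueError(f"unknown kind: {kind!r}")
    if base_model.endswith(("-passage", "-query")):
        # Mirrors the documented advice: pass the base name only.
        raise ValueError("pass the base model name without a -query/-passage suffix")
    return f"{base_model}-{kind}"
```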
dimensions
class-attribute
instance-attribute
dimensions: int | None = None
The number of dimensions the resulting output embeddings should have.
Not yet supported.
upstage_api_key
class-attribute
instance-attribute
upstage_api_key: SecretStr = Field(
default_factory=secret_from_env(
"UPSTAGE_API_KEY",
error_message="You must specify an api key. You can pass it an argument as `api_key=...` or set the environment variable `UPSTAGE_API_KEY`.",
),
alias="api_key",
)
Automatically inferred from the environment variable UPSTAGE_API_KEY if not provided.
upstage_api_base
class-attribute
instance-attribute
upstage_api_base: str | None = Field(
default_factory=from_env(
"UPSTAGE_API_BASE", default="https://api.upstage.ai/v1/solar"
),
alias="base_url",
)
Endpoint URL to use.
embedding_ctx_length
class-attribute
instance-attribute
embedding_ctx_length: int = 4096
The maximum number of tokens to embed at once.
Not yet supported.
allowed_special
class-attribute
instance-attribute
Not yet supported.
disallowed_special
class-attribute
instance-attribute
Not yet supported.
chunk_size
class-attribute
instance-attribute
chunk_size: int = 1000
Maximum number of texts to embed in each batch.
Not yet supported.
max_retries
class-attribute
instance-attribute
max_retries: int = 2
Maximum number of retries to make when generating.
request_timeout
class-attribute
instance-attribute
Timeout for requests to the Upstage embedding API. Can be a float, httpx.Timeout, or None.
show_progress_bar
class-attribute
instance-attribute
show_progress_bar: bool = False
Whether to show a progress bar when embedding.
Not yet supported.
model_kwargs
class-attribute
instance-attribute
Holds any model parameters valid for the create call that are not explicitly specified.
skip_empty
class-attribute
instance-attribute
skip_empty: bool = False
Whether to skip empty strings when embedding or raise an error. Defaults to not skipping.
Not yet supported.
default_headers
class-attribute
instance-attribute
Add trace headers.
http_client
class-attribute
instance-attribute
http_client: Any | None = None
Optional httpx.Client.
Only used for sync invocations.
Must specify http_async_client as well if you'd like a custom client for async
invocations.
http_async_client
class-attribute
instance-attribute
http_async_client: Any | None = None
Optional httpx.AsyncClient.
Only used for async invocations.
Must specify http_client as well if you'd like a custom client for sync
invocations.
build_extra
classmethod
Build extra kwargs from additional params that were passed in.
validate_environment
validate_environment() -> Self
Validate that the API key and Python package exist in the environment.
embed_documents
embed_query
aembed_documents
async
UpstagePrebuiltInformationExtraction
UpstagePrebuiltInformationExtraction Information extraction model.
To use, set the environment variable UPSTAGE_API_KEY with your API key or
pass it as a named parameter to the constructor.
UpstageUniversalInformationExtraction
UpstageUniversalInformationExtraction Information extraction model.
To use, set the environment variable UPSTAGE_API_KEY with your API key or
pass it as a named parameter to the constructor.