langchain-upstage

langchain_upstage

ChatUpstage

Bases: BaseChatOpenAI

ChatUpstage chat model.

To use, you should have the environment variable UPSTAGE_API_KEY set with your API key or pass it as a named parameter to the constructor.

Example
from langchain_upstage import ChatUpstage

model = ChatUpstage()
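
The example above only constructs the model. A minimal usage sketch follows, assuming UPSTAGE_API_KEY is set and the Upstage API is reachable. The helper names build_messages and ask are hypothetical. The invoke call and the (role, content) tuple form come from the general LangChain chat model interface, which ChatUpstage is assumed to inherit through BaseChatOpenAI.

```python
def build_messages(system_prompt: str, question: str) -> list[tuple[str, str]]:
    """Build a prompt as (role, content) tuples, a form LangChain chat models accept."""
    return [("system", system_prompt), ("human", question)]


def ask(question: str, system_prompt: str = "You are a concise assistant.") -> str:
    """Send one question to ChatUpstage and return the reply text.

    Requires the langchain-upstage package, a valid UPSTAGE_API_KEY,
    and network access.
    """
    # Imported lazily so that defining this helper does not require the package.
    from langchain_upstage import ChatUpstage

    model = ChatUpstage()  # default model; the key is read from UPSTAGE_API_KEY
    reply = model.invoke(build_messages(system_prompt, question))
    return str(reply.content)
```

Calling ask("What is a tokenizer?") would issue one request; nothing is sent until the function is called.
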
METHOD DESCRIPTION
validate_environment

Validate that the API key and Python package exist in the environment.

get_token_ids

Get the token IDs for the text.

get_num_tokens_from_messages

Calculate the number of tokens for the Solar model.

model_name class-attribute instance-attribute

model_name: str = Field(default='solar-mini', alias='model')

Model name to use.

upstage_api_key class-attribute instance-attribute

upstage_api_key: SecretStr = Field(
    default_factory=secret_from_env(
        "UPSTAGE_API_KEY",
        error_message="You must specify an api key. You can pass it an argument as `api_key=...` or set the environment variable `UPSTAGE_API_KEY`.",
    ),
    alias="api_key",
)

Automatically inferred from the environment variable UPSTAGE_API_KEY if not provided.

upstage_api_base class-attribute instance-attribute

upstage_api_base: str | None = Field(
    default_factory=from_env(
        "UPSTAGE_API_BASE", default="https://api.upstage.ai/v1/solar"
    ),
    alias="base_url",
)

Base URL path for API requests.

Leave unset to use the default endpoint; override it only when using a proxy or service emulator.
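
The two field definitions above suggest a resolution order: an explicit argument wins, then the environment variable, then (for the base URL only) the built-in default. The following is a minimal sketch of that order using only the standard library. It is not the library's implementation; resolve_upstage_settings is a hypothetical name, and raising ValueError for a missing key is an assumption.

```python
import os
from typing import Optional

# Default shown in the upstage_api_base field definition.
DEFAULT_API_BASE = "https://api.upstage.ai/v1/solar"


def resolve_upstage_settings(
    api_key: Optional[str] = None, base_url: Optional[str] = None
) -> dict[str, str]:
    """Resolve the API key and base URL: argument first, then environment."""
    key = api_key if api_key is not None else os.environ.get("UPSTAGE_API_KEY")
    if key is None:
        # Assumption: a missing key is reported as a ValueError.
        raise ValueError(
            "You must specify an api key. Pass `api_key=...` or set the "
            "environment variable `UPSTAGE_API_KEY`."
        )
    if base_url is None:
        # Falls back to the default when UPSTAGE_API_BASE is not set.
        base_url = os.environ.get("UPSTAGE_API_BASE", DEFAULT_API_BASE)
    return {"api_key": key, "base_url": base_url}
```
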

openai_api_key class-attribute instance-attribute

openai_api_key: SecretStr | None = Field(default=None)

The OpenAI API key is not supported for Upstage. Use upstage_api_key instead.

openai_api_base class-attribute instance-attribute

openai_api_base: str | None = Field(default=None)

The OpenAI API base is not supported for Upstage. Use upstage_api_base instead.

openai_organization class-attribute instance-attribute

openai_organization: str | None = Field(default=None)

The OpenAI organization is not supported for Upstage.

tiktoken_model_name class-attribute instance-attribute

tiktoken_model_name: str | None = None

Tiktoken is not supported for Upstage.

tokenizer_name class-attribute instance-attribute

tokenizer_name: str | None = 'upstage/solar-pro2-tokenizer'

Hugging Face tokenizer name.

The Solar tokenizer is publicly available on Hugging Face: https://huggingface.co/upstage/solar-pro-tokenizer

default_headers class-attribute instance-attribute

default_headers: Mapping[str, str] | None = DEFAULT_HEADERS

Add trace headers.

validate_environment

validate_environment() -> Self

Validate that the API key and Python package exist in the environment.

get_token_ids

get_token_ids(text: str) -> list[int]

Get the token IDs for the text.

get_num_tokens_from_messages

get_num_tokens_from_messages(
    messages: Sequence[BaseMessage], tools: Sequence[Any] | None = None
) -> int

Calculate the number of tokens for the Solar model.

UpstageDocumentParseLoader

Bases: BaseLoader

Upstage Document Parse Loader.

To use, you should have the environment variable UPSTAGE_API_KEY set with your API key or pass it as a named parameter to the constructor.

Example
from langchain_upstage import UpstageDocumentParseLoader

file_path = "/PATH/TO/YOUR/FILE.pdf"
loader = UpstageDocumentParseLoader(
    file_path, split="page", output_format="text"
)
METHOD DESCRIPTION
aload

Load data into Document objects.

load_and_split

Load Document and split into chunks. Chunks are returned as Document.

alazy_load

A lazy loader for Document.

__init__

Initializes an instance of the Upstage document parse loader.

load

Loads and parses the document using the UpstageDocumentParseParser.

lazy_load

Lazily loads and parses the document using the UpstageDocumentParseParser.

merge_and_split

Merges the page content and metadata of multiple documents into a single document, or splits the documents using a custom splitter.

aload async

aload() -> list[Document]

Load data into Document objects.

RETURNS DESCRIPTION
list[Document]

The documents.

load_and_split

load_and_split(text_splitter: TextSplitter | None = None) -> list[Document]

Load Document and split into chunks. Chunks are returned as Document.

Danger

Do not override this method. It should be considered to be deprecated!

PARAMETER DESCRIPTION
text_splitter

TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

TYPE: TextSplitter | None DEFAULT: None

RAISES DESCRIPTION
ImportError

If langchain-text-splitters is not installed and no text_splitter is provided.

RETURNS DESCRIPTION
list[Document]

List of Document.

alazy_load async

alazy_load() -> AsyncIterator[Document]

A lazy loader for Document.

YIELDS DESCRIPTION
AsyncIterator[Document]

The Document objects.

__init__

__init__(
    file_path: str | Path | list[str] | list[Path],
    split: SplitType = "none",
    api_key: str | None = None,
    base_url: str = DOCUMENT_PARSE_BASE_URL,
    model: str = DOCUMENT_PARSE_DEFAULT_MODEL,
    chart_recognition: bool = True,
    ocr: OCR = "auto",
    output_format: OutputFormat = "html",
    coordinates: bool = True,
    base64_encoding: list[Category] | None = None,
)

Initializes an instance of the Upstage document parse loader.

PARAMETER DESCRIPTION
file_path

The path to the document to be loaded.

TYPE: str | Path | list[str] | list[Path]

split

The type of splitting to be applied.

TYPE: SplitType DEFAULT: 'none'

api_key

The API key for accessing the Upstage API. Defaults to None, in which case it will be fetched from the environment variable UPSTAGE_API_KEY.

TYPE: str | None DEFAULT: None

base_url

The base URL for accessing the Upstage API.

TYPE: str DEFAULT: DOCUMENT_PARSE_BASE_URL

model

The model to be used for the document parse. Defaults to 'document-parse'.

TYPE: str DEFAULT: DOCUMENT_PARSE_DEFAULT_MODEL

ocr

Extract text from images in the document using OCR.

If the value is 'force', OCR is used to extract text from an image.

If the value is 'auto', text is extracted from a PDF. (An error will occur if the value is 'auto' and the input is NOT in PDF format)

TYPE: OCR DEFAULT: 'auto'

output_format

Format of the inference results.

TYPE: OutputFormat DEFAULT: 'html'

coordinates

Whether to include the coordinates of the OCR in the output.

TYPE: bool DEFAULT: True

base64_encoding

The category of the elements to be encoded in base64.

TYPE: list[Category] | None DEFAULT: None
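
The ocr description states that 'auto' raises an error for inputs that are not PDFs. One way to avoid that is to pick the mode from the file extension before constructing the loader. This is a minimal sketch under that reading; choose_ocr_mode is a hypothetical helper, and judging the format by file suffix alone is an assumption.

```python
from pathlib import Path
from typing import Union


def choose_ocr_mode(file_path: Union[str, Path]) -> str:
    """Return 'auto' for PDF inputs and 'force' for everything else."""
    # Assumption: the suffix reliably reflects the file format.
    is_pdf = Path(file_path).suffix.lower() == ".pdf"
    return "auto" if is_pdf else "force"
```

The result could then be passed as the ocr argument, for example UpstageDocumentParseLoader(file_path, ocr=choose_ocr_mode(file_path)).
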

load

load() -> list[Document]

Loads and parses the document using the UpstageDocumentParseParser.

RETURNS DESCRIPTION
list[Document]

Document objects representing the parsed layout analysis.

lazy_load

lazy_load() -> Iterator[Document]

Lazily loads and parses the document using the UpstageDocumentParseParser.

RETURNS DESCRIPTION
Iterator[Document]

An iterator of Document objects representing the parsed layout analysis.

merge_and_split

merge_and_split(
    documents: list[Document], splitter: object | None = None
) -> list[Document]

Merges the page content and metadata of multiple documents into a single document, or splits the documents using a custom splitter.

PARAMETER DESCRIPTION
documents

A list of Document objects to be merged and split.

TYPE: list[Document]

splitter

An optional splitter object that implements the split_documents method. If provided, the documents will be split using this splitter. Defaults to None, in which case the documents are merged.

TYPE: object | None DEFAULT: None

RETURNS DESCRIPTION
list[Document]

A list of Document objects. If no splitter is provided, a single Document object is returned with the merged content and combined metadata. If a splitter is provided, the documents are split and a list of Document objects is returned.

RAISES DESCRIPTION
AssertionError

If a splitter is provided but it does not implement the split_documents method.
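
The documented contract of merge_and_split can be sketched with a stand-in document type. This is an illustration, not the library's code: Doc and merge_and_split_sketch are hypothetical names, and both the space separator and the way metadata values are collected into per-key lists are assumptions, since the source only says the content is merged and the metadata combined.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Stand-in for a LangChain Document (hypothetical, for illustration)."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def merge_and_split_sketch(
    documents: list[Doc], splitter: Optional[object] = None
) -> list[Doc]:
    if splitter is None:
        # Assumption: contents are joined with a space and metadata
        # values are collected into one list per key.
        merged_content = " ".join(doc.page_content for doc in documents)
        merged_metadata: dict[str, list[Any]] = {}
        for doc in documents:
            for key, value in doc.metadata.items():
                merged_metadata.setdefault(key, []).append(value)
        return [Doc(merged_content, merged_metadata)]
    # Documented contract: the splitter must implement split_documents.
    assert hasattr(splitter, "split_documents"), (
        "splitter must implement split_documents"
    )
    return list(splitter.split_documents(documents))
```
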

UpstageDocumentParseParser

Bases: BaseBlobParser

Upstage Document Parse Parser.

To use, you should have the environment variable UPSTAGE_API_KEY set with your API key or pass it as a named parameter to the constructor.

Example
from langchain_upstage import UpstageDocumentParseParser

parser = UpstageDocumentParseParser(split="page", output_format="text")
METHOD DESCRIPTION
parse

Eagerly parse the blob into a Document or list of Document objects.

__init__

Initializes an instance of the Upstage document parse parser.

lazy_parse

Lazily parses a document and yields Document objects based on the specified split type.

parse

parse(blob: Blob) -> list[Document]

Eagerly parse the blob into a Document or list of Document objects.

This is a convenience method for interactive development environments.

Production applications should favor the lazy_parse method instead.

Subclasses should generally not override this parse method.

PARAMETER DESCRIPTION
blob

Blob instance

TYPE: Blob

RETURNS DESCRIPTION
list[Document]

List of Document objects
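
The relationship between the eager and lazy variants can be illustrated with a stand-in parser. The sketch assumes that parse simply materializes everything lazy_parse yields, which matches the description above; LineParser is a hypothetical class that treats each line of a string as one document and is unrelated to the Upstage API.

```python
from typing import Iterator


class LineParser:
    """Hypothetical stand-in parser: each line of the input is one document."""

    def lazy_parse(self, blob: str) -> Iterator[str]:
        # Yields one item at a time, so callers can stream or stop early.
        for line in blob.splitlines():
            yield line

    def parse(self, blob: str) -> list[str]:
        # Eager variant: collects everything lazy_parse would yield.
        return list(self.lazy_parse(blob))
```

With the lazy variant, next(LineParser().lazy_parse(text)) produces the first document without processing the rest, which is why it tends to suit large inputs better.
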

__init__

__init__(
    api_key: str | None = None,
    base_url: str = DOCUMENT_PARSE_BASE_URL,
    model: str = DOCUMENT_PARSE_DEFAULT_MODEL,
    split: SplitType = "none",
    chart_recognition: bool = True,
    ocr: OCR = "auto",
    output_format: OutputFormat = "html",
    coordinates: bool = True,
    base64_encoding: list[Category] | None = None,
)

Initializes an instance of the Upstage document parse parser.

PARAMETER DESCRIPTION
api_key

The API key for accessing the Upstage API. Defaults to None, in which case it will be fetched from the environment variable UPSTAGE_API_KEY.

TYPE: str | None DEFAULT: None

base_url

The base URL for accessing the Upstage API.

TYPE: str DEFAULT: DOCUMENT_PARSE_BASE_URL

model

The model to be used for the document parse. Defaults to "document-parse".

TYPE: str DEFAULT: DOCUMENT_PARSE_DEFAULT_MODEL

split

The type of splitting to be applied. Defaults to "none" (no splitting).

TYPE: SplitType DEFAULT: 'none'

ocr

Extract text from images in the document using OCR. If the value is "force", OCR is used to extract text from an image. If the value is "auto", text is extracted from a PDF. (An error will occur if the value is "auto" and the input is NOT in PDF format)

TYPE: OCR DEFAULT: 'auto'

output_format

Format of the inference results.

TYPE: OutputFormat DEFAULT: 'html'

coordinates

Whether to include the coordinates of the OCR in the output.

TYPE: bool DEFAULT: True

base64_encoding

The category of the elements to be encoded in base64.

TYPE: list[Category] | None DEFAULT: None

lazy_parse

lazy_parse(blob: Blob, is_batch: bool = False) -> Iterator[Document]

Lazily parses a document and yields Document objects based on the specified split type.

PARAMETER DESCRIPTION
blob

The input document blob to parse.

TYPE: Blob

is_batch

Whether to parse the document in batches. Defaults to False (single-page parsing).

TYPE: bool DEFAULT: False

YIELDS DESCRIPTION
Document

The parsed document object.

TYPE: Document

RAISES DESCRIPTION
ValueError

If an invalid split type is provided.
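
How a split type might drive what gets yielded can be sketched as follows. Only the two split values that appear on this page ("none" and "page") are handled; the library may accept others. The element dictionaries with "page" and "text" keys, the space separator, and the name split_elements are all assumptions made for illustration.

```python
from typing import Iterator


def split_elements(elements: list[dict], split: str = "none") -> Iterator[str]:
    """Yield text per the split type; raise ValueError for unknown types."""
    if split == "none":
        # No splitting: everything becomes a single document.
        yield " ".join(e["text"] for e in elements)
    elif split == "page":
        # Group element texts by page number, then yield pages in order.
        pages: dict[int, list[str]] = {}
        for e in elements:
            pages.setdefault(e["page"], []).append(e["text"])
        for page in sorted(pages):
            yield " ".join(pages[page])
    else:
        raise ValueError(f"Invalid split type: {split}")
```

Because this is a generator, the ValueError surfaces when iteration starts rather than when the function is called.
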

UpstageEmbeddings

Bases: BaseModel, Embeddings

UpstageEmbeddings embedding model.

To use, set the environment variable UPSTAGE_API_KEY with your API key or pass it as a named parameter to the constructor.

Example
from langchain_upstage import UpstageEmbeddings

model = UpstageEmbeddings(model='solar-embedding-1-large')
METHOD DESCRIPTION
build_extra

Build extra kwargs from additional params that were passed in.

validate_environment

Validate that the API key and Python package exist in the environment.

embed_documents

Embed a list of document texts using the passage model.

embed_query

Embed query text using the query model.

aembed_documents

Embed a list of document texts using the passage model asynchronously.

aembed_query

Embed query text using the query model asynchronously.

model class-attribute instance-attribute

model: str = Field(...)

Embeddings model name to use. Do not add suffixes like -query and -passage.

Instead, use 'solar-embedding-1-large' for example.

dimensions class-attribute instance-attribute

dimensions: int | None = None

The number of dimensions the resulting output embeddings should have.

Not yet supported.

upstage_api_key class-attribute instance-attribute

upstage_api_key: SecretStr = Field(
    default_factory=secret_from_env(
        "UPSTAGE_API_KEY",
        error_message="You must specify an api key. You can pass it an argument as `api_key=...` or set the environment variable `UPSTAGE_API_KEY`.",
    ),
    alias="api_key",
)

Automatically inferred from the environment variable UPSTAGE_API_KEY if not provided.

upstage_api_base class-attribute instance-attribute

upstage_api_base: str | None = Field(
    default_factory=from_env(
        "UPSTAGE_API_BASE", default="https://api.upstage.ai/v1/solar"
    ),
    alias="base_url",
)

Endpoint URL to use.

embedding_ctx_length class-attribute instance-attribute

embedding_ctx_length: int = 4096

The maximum number of tokens to embed at once.

Not yet supported.

allowed_special class-attribute instance-attribute

allowed_special: Literal['all'] | set[str] = set()

Not yet supported.

disallowed_special class-attribute instance-attribute

disallowed_special: Literal['all'] | set[str] | Sequence[str] = 'all'

Not yet supported.

chunk_size class-attribute instance-attribute

chunk_size: int = 1000

Maximum number of texts to embed in each batch.

Not yet supported.
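
Since chunk_size is marked as not yet supported, a caller who needs bounded batches could split the input before calling embed_documents. A minimal sketch of that idea, where batched is a hypothetical helper and not part of langchain_upstage:

```python
from typing import Iterator, Sequence


def batched(texts: Sequence[str], chunk_size: int = 1000) -> Iterator[list[str]]:
    """Yield consecutive slices of texts with at most chunk_size items each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(texts), chunk_size):
        yield list(texts[start:start + chunk_size])
```

Each yielded list could then be passed to embed_documents in turn and the results concatenated.
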

max_retries class-attribute instance-attribute

max_retries: int = 2

Maximum number of retries to make when generating.

request_timeout class-attribute instance-attribute

request_timeout: float | tuple[float, float] | Any | None = Field(
    default=None, alias="timeout"
)

Timeout for requests to Upstage embedding API. Can be float, httpx.Timeout or None.

show_progress_bar class-attribute instance-attribute

show_progress_bar: bool = False

Whether to show a progress bar when embedding.

Not yet supported.

model_kwargs class-attribute instance-attribute

model_kwargs: dict[str, Any] = Field(default_factory=dict)

Holds any model parameters valid for the create call that are not explicitly specified.

skip_empty class-attribute instance-attribute

skip_empty: bool = False

Whether to skip empty strings when embedding or raise an error. Defaults to not skipping.

Not yet supported.

default_headers class-attribute instance-attribute

default_headers: Mapping[str, str] | None = DEFAULT_HEADERS

Add trace headers.

http_client class-attribute instance-attribute

http_client: Any | None = None

Optional httpx.Client.

Only used for sync invocations.

Must specify http_async_client as well if you'd like a custom client for async invocations.

http_async_client class-attribute instance-attribute

http_async_client: Any | None = None

Optional httpx.AsyncClient.

Only used for async invocations.

Must specify http_client as well if you'd like a custom client for sync invocations.

build_extra classmethod

build_extra(values: dict[str, Any]) -> Any

Build extra kwargs from additional params that were passed in.

validate_environment

validate_environment() -> Self

Validate that the API key and Python package exist in the environment.

embed_documents

embed_documents(texts: list[str]) -> list[list[float]]

Embed a list of document texts using the passage model.

PARAMETER DESCRIPTION
texts

The list of texts to embed.

TYPE: list[str]

RETURNS DESCRIPTION
list[list[float]]

List of embeddings, one for each text.

embed_query

embed_query(text: str) -> list[float]

Embed query text using the query model.

PARAMETER DESCRIPTION
text

The text to embed.

TYPE: str

RETURNS DESCRIPTION
list[float]

Embedding for the text.
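
The vectors returned by embed_documents and embed_query are plain lists of floats, so they can be compared without further dependencies. A minimal sketch follows, assuming cosine similarity is the desired measure (the source does not prescribe one). cosine_similarity and rank_documents are hypothetical helper names, not part of langchain_upstage.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: list[float], doc_vecs: list[list[float]]
) -> list[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

In practice query_vec would come from embed_query and doc_vecs from embed_documents on the same model.
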

aembed_documents async

aembed_documents(texts: list[str]) -> list[list[float]]

Embed a list of document texts using the passage model asynchronously.

PARAMETER DESCRIPTION
texts

The list of texts to embed.

TYPE: list[str]

RETURNS DESCRIPTION
list[list[float]]

List of embeddings, one for each text.

aembed_query async

aembed_query(text: str) -> list[float]

Embed query text using the query model asynchronously.

PARAMETER DESCRIPTION
text

The text to embed.

TYPE: str

RETURNS DESCRIPTION
list[float]

Embedding for the text.
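
Because aembed_query is a coroutine, several queries can be embedded concurrently with asyncio.gather. The sketch below works with any object exposing an aembed_query(text) coroutine. FakeEmbedder is a hypothetical stand-in so the example runs offline; a real run would pass an UpstageEmbeddings instance instead.

```python
import asyncio


class FakeEmbedder:
    """Hypothetical stand-in with the same aembed_query signature."""

    async def aembed_query(self, text: str) -> list[float]:
        await asyncio.sleep(0)  # yield control, as a network call would
        return [float(len(text))]


async def embed_queries_concurrently(
    embedder, queries: list[str]
) -> list[list[float]]:
    """Embed all queries concurrently; results keep the input order."""
    results = await asyncio.gather(
        *(embedder.aembed_query(query) for query in queries)
    )
    return list(results)
```

A call such as asyncio.run(embed_queries_concurrently(embedder, queries)) drives the coroutine from synchronous code.
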

UpstagePrebuiltInformationExtraction

Upstage prebuilt information extraction model.

To use, set the environment variable UPSTAGE_API_KEY with your API key or pass it as a named parameter to the constructor.

Example
from langchain_upstage import UpstagePrebuiltInformationExtraction

model = UpstagePrebuiltInformationExtraction(model='receipt-extraction')

UpstageUniversalInformationExtraction

Upstage universal information extraction model.

To use, set the environment variable UPSTAGE_API_KEY with your API key or pass it as a named parameter to the constructor.

Example
from langchain_upstage import UpstageUniversalInformationExtraction

model = UpstageUniversalInformationExtraction(model='information-extract')