Client

client ¶

Client for interacting with the LangSmith API.

Use the client to customize API keys / workspace connections, SSL certs, etc. for tracing.

Also used to create, read, update, and delete LangSmith resources such as runs (~trace spans), datasets, examples (~records), feedback (~metrics), projects (tracer sessions/groups), etc.

For detailed API documentation, visit the LangSmith docs.

FUNCTION	DESCRIPTION
`close_session`	Close the session.
`convert_prompt_to_openai_format`	Convert a prompt to OpenAI format.
`convert_prompt_to_anthropic_format`	Convert a prompt to Anthropic format.
`dump_model`	Dump model depending on pydantic version.
`prep_obj_for_push`	Format the object so its Prompt Hub compatible.

Client ¶

Client for interacting with the LangSmith API.

METHOD	DESCRIPTION
`__init__`	Initialize a `Client` instance.
`__repr__`	Return a string representation of the instance with a link to the URL.
`request_with_retries`	Send a request with retries.
`upload_dataframe`	Upload a dataframe as individual examples to the LangSmith API.
`upload_csv`	Upload a CSV file to the LangSmith API.
`create_run`	Persist a run to the LangSmith API.
`batch_ingest_runs`	Batch ingest/upsert multiple runs in the Langsmith system.
`multipart_ingest`	Batch ingest/upsert multiple runs in the Langsmith system.
`update_run`	Update a run in the LangSmith API.
`flush_compressed_traces`	Force flush the currently buffered compressed runs.
`flush`	Flush either queue or compressed buffer, depending on mode.
`read_run`	Read a run from the LangSmith API.
`list_runs`	List runs from the LangSmith API.
`get_run_stats`	Get aggregate statistics over queried runs.
`get_run_url`	Get the URL for a run.
`share_run`	Get a share link for a run.
`unshare_run`	Delete share link for a run.
`read_run_shared_link`	Retrieve the shared link for a specific run.
`run_is_shared`	Get share state for a run.
`read_shared_run`	Get shared runs.
`list_shared_runs`	Get shared runs.
`read_dataset_shared_schema`	Retrieve the shared schema of a dataset.
`share_dataset`	Get a share link for a dataset.
`unshare_dataset`	Delete share link for a dataset.
`read_shared_dataset`	Get shared datasets.
`list_shared_examples`	Get shared examples.
`list_shared_projects`	List shared projects.
`create_project`	Create a project on the LangSmith API.
`update_project`	Update a LangSmith project.
`read_project`	Read a project from the LangSmith API.
`has_project`	Check if a project exists.
`get_test_results`	Read the record-level information from an experiment into a Pandas DF.
`list_projects`	List projects from the LangSmith API.
`delete_project`	Delete a project from LangSmith.
`create_dataset`	Create a dataset in the LangSmith API.
`has_dataset`	Check whether a dataset exists in your tenant.
`read_dataset`	Read a dataset from the LangSmith API.
`diff_dataset_versions`	Get the difference between two versions of a dataset.
`read_dataset_openai_finetuning`	Download a dataset in OpenAI Jsonl format and load it as a list of dicts.
`list_datasets`	List the datasets on the LangSmith API.
`delete_dataset`	Delete a dataset from the LangSmith API.
`update_dataset_tag`	Update the tags of a dataset.
`list_dataset_versions`	List dataset versions.
`read_dataset_version`	Get dataset version by `as_of` or exact tag.
`clone_public_dataset`	Clone a public dataset to your own langsmith tenant.
`create_llm_example`	Add an example (row) to an LLM-type dataset.
`create_chat_example`	Add an example (row) to a Chat-type dataset.
`create_example_from_run`	Add an example (row) to a dataset from a run.
`update_examples_multipart`	Update examples using multipart.
`upload_examples_multipart`	Upload examples using multipart.
`upsert_examples_multipart`	Upsert examples.
`create_examples`	Create examples in a dataset.
`create_example`	Create a dataset example in the LangSmith API.
`read_example`	Read an example from the LangSmith API.
`list_examples`	Retrieve the example rows of the specified dataset.
`index_dataset`	Enable dataset indexing. Examples are indexed by their inputs.
`sync_indexed_dataset`	Sync dataset index.
`similar_examples`	Retrieve the dataset examples whose inputs best match the current inputs.
`update_example`	Update a specific example.
`update_examples`	Update multiple examples.
`delete_example`	Delete an example by ID.
`delete_examples`	Delete multiple examples by ID.
`list_dataset_splits`	Get the splits for a dataset.
`update_dataset_splits`	Update the splits for a dataset.
`evaluate_run`	Evaluate a run.
`aevaluate_run`	Evaluate a run asynchronously.
`create_feedback`	Create feedback for a run.
`update_feedback`	Update a feedback in the LangSmith API.
`read_feedback`	Read a feedback from the LangSmith API.
`list_feedback`	List the feedback objects on the LangSmith API.
`delete_feedback`	Delete a feedback by ID.
`create_feedback_from_token`	Create feedback from a presigned token or URL.
`create_presigned_feedback_token`	Create a pre-signed URL to send feedback data to.
`create_presigned_feedback_tokens`	Create a pre-signed URL to send feedback data to.
`list_presigned_feedback_tokens`	List the feedback ingest tokens for a run.
`list_feedback_formulas`	List feedback formulas.
`get_feedback_formula_by_id`	Get a feedback formula by ID.
`create_feedback_formula`	Create a feedback formula.
`update_feedback_formula`	Update a feedback formula.
`delete_feedback_formula`	Delete a feedback formula by ID.
`list_annotation_queues`	List the annotation queues on the LangSmith API.
`create_annotation_queue`	Create an annotation queue on the LangSmith API.
`read_annotation_queue`	Read an annotation queue with the specified `queue_id`.
`update_annotation_queue`	Update an annotation queue with the specified `queue_id`.
`delete_annotation_queue`	Delete an annotation queue with the specified `queue_id`.
`add_runs_to_annotation_queue`	Add runs to an annotation queue with the specified `queue_id`.
`delete_run_from_annotation_queue`	Delete a run from an annotation queue with the specified `queue_id` and `run_id`.
`get_run_from_annotation_queue`	Get a run from an annotation queue at the specified index.
`create_comparative_experiment`	Create a comparative experiment on the LangSmith API.
`arun_on_dataset`	Asynchronously run the Chain or language model on a dataset.
`run_on_dataset`	Run the Chain or language model on a dataset.
`like_prompt`	Like a prompt.
`unlike_prompt`	Unlike a prompt.
`list_prompts`	List prompts with pagination.
`get_prompt`	Get a specific prompt by its identifier.
`create_prompt`	Create a new prompt.
`create_commit`	Create a commit for an existing prompt.
`update_prompt`	Update a prompt's metadata.
`delete_prompt`	Delete a prompt.
`pull_prompt_commit`	Pull a prompt object from the LangSmith API.
`list_prompt_commits`	List commits for a given prompt.
`pull_prompt`	Pull a prompt and return it as a LangChain `PromptTemplate`.
`push_prompt`	Push a prompt to the LangSmith API.
`cleanup`	Manually trigger cleanup of the background thread.
`evaluate`	Evaluate a target system on a given dataset.
`aevaluate`	Evaluate an async target system on a given dataset.
`get_experiment_results`	Get results for an experiment, including experiment session aggregated stats and experiment runs for each dataset example.
`generate_insights`	Generate Insights over your agent chat histories.
`poll_insights`	Poll the status of an Insights report.

api_key `property` `writable` ¶

api_key: str | None

Return the API key used for authentication.

workspace_id `property` `writable` ¶

workspace_id: str | None

Return the workspace ID used for API requests.

info `property` ¶

info: LangSmithInfo

Get the information about the LangSmith API.

RETURNS	DESCRIPTION
`LangSmithInfo`	The information about the LangSmith API, or `None` if the API is not available.

init ¶

__init__(
    api_url: str | None = None,
    *,
    api_key: str | None = None,
    retry_config: Retry | None = None,
    timeout_ms: int | tuple[int, int] | None = None,
    web_url: str | None = None,
    session: Session | None = None,
    auto_batch_tracing: bool = True,
    anonymizer: Callable[[dict], dict] | None = None,
    hide_inputs: Callable[[dict], dict] | bool | None = None,
    hide_outputs: Callable[[dict], dict] | bool | None = None,
    hide_metadata: Callable[[dict], dict] | bool | None = None,
    process_buffered_run_ops: Callable[[Sequence[dict]], Sequence[dict]] | None = None,
    run_ops_buffer_size: int | None = None,
    run_ops_buffer_timeout_ms: float | None = None,
    info: dict | LangSmithInfo | None = None,
    api_urls: dict[str, str] | None = None,
    otel_tracer_provider: TracerProvider | None = None,
    otel_enabled: bool | None = None,
    tracing_sampling_rate: float | None = None,
    workspace_id: str | None = None,
    max_batch_size_bytes: int | None = None,
    headers: dict[str, str] | None = None,
) -> None

Initialize a Client instance.

PARAMETER	DESCRIPTION
`api_url`	URL for the LangSmith API. Defaults to the `LANGCHAIN_ENDPOINT` environment variable or `https://api.smith.langchain.com` if not set. TYPE: `str \| None` DEFAULT: `None`
`api_key`	API key for the LangSmith API. Defaults to the `LANGCHAIN_API_KEY` environment variable. TYPE: `str \| None` DEFAULT: `None`
`retry_config`	Retry configuration for the `HTTPAdapter`. TYPE: `Retry \| None` DEFAULT: `None`
`timeout_ms`	Timeout for the `HTTPAdapter`. Can also be a 2-tuple of `(connect timeout, read timeout)` to set them separately. TYPE: `int \| Tuple[int, int] \| None` DEFAULT: `None`
`web_url`	URL for the LangSmith web app. Default is auto-inferred from the `ENDPOINT`. TYPE: `str \| None` DEFAULT: `None`
`session`	The session to use for requests. If `None`, a new session will be created. TYPE: `Session \| None` DEFAULT: `None`
`auto_batch_tracing`	Whether to automatically batch tracing. TYPE: `bool, default=True` DEFAULT: `True`
`anonymizer`	A function applied for masking serialized run inputs and outputs, before sending to the API. TYPE: `Callable[[dict], dict] \| None` DEFAULT: `None`
`hide_inputs`	Whether to hide run inputs when tracing with this client. If `True`, hides the entire inputs. If a function, applied to all run inputs when creating runs. TYPE: `Callable[[dict], dict] \| bool \| None` DEFAULT: `None`
`hide_outputs`	Whether to hide run outputs when tracing with this client. If `True`, hides the entire outputs. If a function, applied to all run outputs when creating runs. TYPE: `Callable[[dict], dict] \| bool \| None` DEFAULT: `None`
`hide_metadata`	Whether to hide run metadata when tracing with this client. If `True`, hides the entire metadata. If a function, applied to all run metadata when creating runs. TYPE: `Callable[[dict], dict] \| bool \| None` DEFAULT: `None`
`process_buffered_run_ops`	A function applied to buffered run operations that allows for modification of the raw run dicts before they are converted to multipart and compressed. Useful specifically for high throughput tracing where you need to apply a rate-limited API or other costly process to the runs before they are sent to the API. Note that the buffer will only flush automatically when `run_ops_buffer_size` is reached or a new run is added to the buffer after `run_ops_buffer_timeout_ms` has elapsed - it will not flush outside of these conditions unless you manually call `client.flush()`, so be sure to do this before your code exits. TYPE: `Callable[[Sequence[dict]], Sequence[dict]] \| None` DEFAULT: `None`
`run_ops_buffer_size`	Maximum number of run operations to collect in the buffer before applying `process_buffered_run_ops` and sending to the API. Required when `process_buffered_run_ops` is provided. TYPE: `int \| None` DEFAULT: `None`
`run_ops_buffer_timeout_ms`	Maximum time in milliseconds to wait before flushing the run ops buffer when new runs are added. Defaults to `5000`. Only used when `process_buffered_run_ops` is provided. TYPE: `int \| None` DEFAULT: `None`
`info`	The information about the LangSmith API. If not provided, it will be fetched from the API. TYPE: `dict \| LangSmithInfo \| None` DEFAULT: `None`
`api_urls`	A dictionary of write API URLs and their corresponding API keys. Useful for multi-tenant setups. Data is only read from the first URL in the dictionary. However, ONLY Runs are written (`POST` and `PATCH`) to all URLs in the dictionary. Feedback, sessions, datasets, examples, annotation queues and evaluation results are only written to the first. TYPE: `Dict[str, str] \| None` DEFAULT: `None`
`otel_tracer_provider`	Optional tracer provider for OpenTelemetry integration. If not provided, a LangSmith-specific tracer provider will be used. TYPE: `TracerProvider \| None` DEFAULT: `None`
`tracing_sampling_rate`	The sampling rate for tracing. If provided, overrides the `LANGCHAIN_TRACING_SAMPLING_RATE` environment variable. Should be a float between `0` and `1`, where `1` means trace everything and `0` means trace nothing. TYPE: `float \| None` DEFAULT: `None`
`workspace_id`	The workspace ID. Required for org-scoped API keys. TYPE: `str \| None` DEFAULT: `None`
`max_batch_size_bytes`	The maximum size of a batch of runs in bytes. If not provided, the default is set by the server. TYPE: `int \| None` DEFAULT: `None`
`headers`	Additional HTTP headers to include in all requests. These headers will be merged with the default headers (User-Agent, Accept, x-api-key, etc.). Custom headers will not override the default required headers. TYPE: `Dict[str, str] \| None` DEFAULT: `None`

RAISES	DESCRIPTION
`LangSmithUserError`	If the API key is not provided when using the hosted service.
`LangSmithUserError`	If both `api_url` and `api_urls` are provided.

repr ¶

__repr__() -> str

Return a string representation of the instance with a link to the URL.

RETURNS	DESCRIPTION
`str`	The string representation of the instance. TYPE: `str`

request_with_retries ¶

request_with_retries(
    method: Literal["GET", "POST", "PUT", "PATCH", "DELETE"],
    pathname: str,
    *,
    request_kwargs: Mapping | None = None,
    stop_after_attempt: int = 1,
    retry_on: Sequence[type[BaseException]] | None = None,
    to_ignore: Sequence[type[BaseException]] | None = None,
    handle_response: Callable[[Response, int], Any] | None = None,
    _context: str = "",
    **kwargs: Any,
) -> Response

Send a request with retries.

PARAMETER	DESCRIPTION
`method`	The HTTP request method. TYPE: `str`
`pathname`	The pathname of the request URL. Will be appended to the API URL. TYPE: `str`
`request_kwargs`	Additional request parameters. TYPE: `Mapping` DEFAULT: `None`
`stop_after_attempt`	The number of attempts to make. TYPE: `int, default=1` DEFAULT: `1`
`retry_on`	The exceptions to retry on. In addition to: `[LangSmithConnectionError, LangSmithAPIError]`. TYPE: `Sequence[Type[BaseException]] \| None` DEFAULT: `None`
`to_ignore`	The exceptions to ignore / pass on. TYPE: `Sequence[Type[BaseException]] \| None` DEFAULT: `None`
`handle_response`	A function to handle the response and return whether to continue retrying. TYPE: `Callable[[Response, int], Any] \| None` DEFAULT: `None`
`_context`	The context of the request. TYPE: `str, default=""` DEFAULT: `''`
`**kwargs`	Additional keyword arguments to pass to the request. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Response`	The response object.

RAISES	DESCRIPTION
`LangSmithAPIError`	If a server error occurs.
`LangSmithUserError`	If the request fails.
`LangSmithConnectionError`	If a connection error occurs.
`LangSmithError`	If the request fails.

upload_dataframe ¶

upload_dataframe(
    df: DataFrame,
    name: str,
    input_keys: Sequence[str],
    output_keys: Sequence[str],
    *,
    description: str | None = None,
    data_type: DataType | None = kv,
) -> Dataset

Upload a dataframe as individual examples to the LangSmith API.

PARAMETER	DESCRIPTION
`df`	The dataframe to upload. TYPE: `DataFrame`
`name`	The name of the dataset. TYPE: `str`
`input_keys`	The input keys. TYPE: `Sequence[str]`
`output_keys`	The output keys. TYPE: `Sequence[str]`
`description`	The description of the dataset. TYPE: `str \| None` DEFAULT: `None`
`data_type`	The data type of the dataset. TYPE: `DataType \| None` DEFAULT: `kv`

RETURNS	DESCRIPTION
`Dataset`	The uploaded dataset. TYPE: `Dataset`

RAISES	DESCRIPTION
`ValueError`	If the `csv_file` is not a `str` or `tuple`.

Example

from langsmith import Client
import os
import pandas as pd

client = Client()

df = pd.read_parquet("path/to/your/myfile.parquet")
input_keys = ["column1", "column2"]  # replace with your input column names
output_keys = ["output1", "output2"]  # replace with your output column names

dataset = client.upload_dataframe(
    df=df,
    input_keys=input_keys,
    output_keys=output_keys,
    name="My Parquet Dataset",
    description="Dataset created from a parquet file",
    data_type="kv",  # The default
)

upload_csv ¶

upload_csv(
    csv_file: str | tuple[str, BytesIO],
    input_keys: Sequence[str],
    output_keys: Sequence[str],
    *,
    name: str | None = None,
    description: str | None = None,
    data_type: DataType | None = kv,
) -> Dataset

Upload a CSV file to the LangSmith API.

PARAMETER	DESCRIPTION
`csv_file`	The CSV file to upload. If a string, it should be the path. If a tuple, it should be a tuple containing the filename and a `BytesIO` object. TYPE: `str \| Tuple[str, BytesIO]`
`input_keys`	The input keys. TYPE: `Sequence[str]`
`output_keys`	The output keys. TYPE: `Sequence[str]`
`name`	The name of the dataset. TYPE: `str \| None` DEFAULT: `None`
`description`	The description of the dataset. TYPE: `str \| None` DEFAULT: `None`
`data_type`	The data type of the dataset. TYPE: `DataType \| None` DEFAULT: `kv`

RETURNS	DESCRIPTION
`Dataset`	The uploaded dataset. TYPE: `Dataset`

RAISES	DESCRIPTION
`ValueError`	If the `csv_file` is not a string or tuple.

Example

from langsmith import Client
import os

client = Client()

csv_file = "path/to/your/myfile.csv"
input_keys = ["column1", "column2"]  # replace with your input column names
output_keys = ["output1", "output2"]  # replace with your output column names

dataset = client.upload_csv(
    csv_file=csv_file,
    input_keys=input_keys,
    output_keys=output_keys,
    name="My CSV Dataset",
    description="Dataset created from a CSV file",
    data_type="kv",  # The default
)

create_run ¶

create_run(
    name: str,
    inputs: dict[str, Any],
    run_type: RUN_TYPE_T,
    *,
    project_name: str | None = None,
    revision_id: str | None = None,
    dangerously_allow_filesystem: bool = False,
    api_key: str | None = None,
    api_url: str | None = None,
    **kwargs: Any,
) -> None

Persist a run to the LangSmith API.

PARAMETER	DESCRIPTION
`name`	The name of the run. TYPE: `str`
`inputs`	The input values for the run. TYPE: `Dict[str, Any]`
`run_type`	The type of the run, such as tool, chain, llm, retriever, embedding, prompt, or parser. TYPE: `str`
`project_name`	The project name of the run. TYPE: `str \| None` DEFAULT: `None`
`revision_id`	The revision ID of the run. TYPE: `UUID \| str \| None` DEFAULT: `None`
`api_key`	The API key to use for this specific run. TYPE: `str \| None` DEFAULT: `None`
`api_url`	The API URL to use for this specific run. TYPE: `str \| None` DEFAULT: `None`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`None`	None

RAISES	DESCRIPTION
`LangSmithUserError`	If the API key is not provided when using the hosted service.

Example

from langsmith import Client
import datetime
from uuid import uuid4

client = Client()

run_id = uuid4()
client.create_run(
    id=run_id,
    project_name=project_name,
    name="test_run",
    run_type="llm",
    inputs={"prompt": "hello world"},
    outputs={"generation": "hi there"},
    start_time=datetime.datetime.now(datetime.timezone.utc),
    end_time=datetime.datetime.now(datetime.timezone.utc),
    hide_inputs=True,
    hide_outputs=True,
)

batch_ingest_runs ¶

batch_ingest_runs(
    create: Sequence[Run | RunLikeDict | dict] | None = None,
    update: Sequence[Run | RunLikeDict | dict] | None = None,
    *,
    pre_sampled: bool = False,
) -> None

Batch ingest/upsert multiple runs in the Langsmith system.

PARAMETER	DESCRIPTION
`create`	A sequence of `Run` objects or equivalent dictionaries representing runs to be created / posted. TYPE: `Sequence[Run \| RunLikeDict] \| None` DEFAULT: `None`
`update`	A sequence of `Run` objects or equivalent dictionaries representing runs that have already been created and should be updated / patched. TYPE: `Sequence[Run \| RunLikeDict] \| None` DEFAULT: `None`
`pre_sampled`	Whether the runs have already been subject to sampling, and therefore should not be sampled again. TYPE: `bool, default=False` DEFAULT: `False`

RAISES	DESCRIPTION
`LangsmithAPIError`	If there is an error in the API request.

RETURNS	DESCRIPTION
`None`	None

Note

The run objects MUST contain the dotted_order and trace_id fields to be accepted by the API.

Example

from langsmith import Client
import datetime
from uuid import uuid4

client = Client()
_session = "__test_batch_ingest_runs"
trace_id = uuid4()
trace_id_2 = uuid4()
run_id_2 = uuid4()
current_time = datetime.datetime.now(datetime.timezone.utc).strftime(
    "%Y%m%dT%H%M%S%fZ"
)
later_time = (
    datetime.datetime.now(datetime.timezone.utc) + timedelta(seconds=1)
).strftime("%Y%m%dT%H%M%S%fZ")

runs_to_create = [
    {
        "id": str(trace_id),
        "session_name": _session,
        "name": "run 1",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id)}",
        "trace_id": str(trace_id),
        "inputs": {"input1": 1, "input2": 2},
        "outputs": {"output1": 3, "output2": 4},
    },
    {
        "id": str(trace_id_2),
        "session_name": _session,
        "name": "run 3",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id_2)}",
        "trace_id": str(trace_id_2),
        "inputs": {"input1": 1, "input2": 2},
        "error": "error",
    },
    {
        "id": str(run_id_2),
        "session_name": _session,
        "name": "run 2",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id)}."
        f"{later_time}{str(run_id_2)}",
        "trace_id": str(trace_id),
        "parent_run_id": str(trace_id),
        "inputs": {"input1": 5, "input2": 6},
    },
]
runs_to_update = [
    {
        "id": str(run_id_2),
        "dotted_order": f"{current_time}{str(trace_id)}."
        f"{later_time}{str(run_id_2)}",
        "trace_id": str(trace_id),
        "parent_run_id": str(trace_id),
        "outputs": {"output1": 4, "output2": 5},
    },
]

client.batch_ingest_runs(create=runs_to_create, update=runs_to_update)

multipart_ingest ¶

multipart_ingest(
    create: Sequence[Run | RunLikeDict | dict] | None = None,
    update: Sequence[Run | RunLikeDict | dict] | None = None,
    *,
    pre_sampled: bool = False,
    dangerously_allow_filesystem: bool = False,
) -> None

Batch ingest/upsert multiple runs in the Langsmith system.

PARAMETER	DESCRIPTION
`create`	A sequence of `Run` objects or equivalent dictionaries representing runs to be created / posted. TYPE: `Sequence[Run \| RunLikeDict] \| None` DEFAULT: `None`
`update`	A sequence of `Run` objects or equivalent dictionaries representing runs that have already been created and should be updated / patched. TYPE: `Sequence[Run \| RunLikeDict] \| None` DEFAULT: `None`
`pre_sampled`	Whether the runs have already been subject to sampling, and therefore should not be sampled again. TYPE: `bool, default=False` DEFAULT: `False`

RAISES	DESCRIPTION
`LangsmithAPIError`	If there is an error in the API request.

Note

The run objects MUST contain the dotted_order and trace_id fields to be accepted by the API.

Example

from langsmith import Client
import datetime
from uuid import uuid4

client = Client()
_session = "__test_batch_ingest_runs"
trace_id = uuid4()
trace_id_2 = uuid4()
run_id_2 = uuid4()
current_time = datetime.datetime.now(datetime.timezone.utc).strftime(
    "%Y%m%dT%H%M%S%fZ"
)
later_time = (
    datetime.datetime.now(datetime.timezone.utc) + timedelta(seconds=1)
).strftime("%Y%m%dT%H%M%S%fZ")

runs_to_create = [
    {
        "id": str(trace_id),
        "session_name": _session,
        "name": "run 1",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id)}",
        "trace_id": str(trace_id),
        "inputs": {"input1": 1, "input2": 2},
        "outputs": {"output1": 3, "output2": 4},
    },
    {
        "id": str(trace_id_2),
        "session_name": _session,
        "name": "run 3",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id_2)}",
        "trace_id": str(trace_id_2),
        "inputs": {"input1": 1, "input2": 2},
        "error": "error",
    },
    {
        "id": str(run_id_2),
        "session_name": _session,
        "name": "run 2",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id)}."
        f"{later_time}{str(run_id_2)}",
        "trace_id": str(trace_id),
        "parent_run_id": str(trace_id),
        "inputs": {"input1": 5, "input2": 6},
    },
]
runs_to_update = [
    {
        "id": str(run_id_2),
        "dotted_order": f"{current_time}{str(trace_id)}."
        f"{later_time}{str(run_id_2)}",
        "trace_id": str(trace_id),
        "parent_run_id": str(trace_id),
        "outputs": {"output1": 4, "output2": 5},
    },
]

client.multipart_ingest(create=runs_to_create, update=runs_to_update)

update_run ¶

update_run(
    run_id: ID_TYPE,
    *,
    name: str | None = None,
    run_type: RUN_TYPE_T | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    error: str | None = None,
    inputs: dict | None = None,
    outputs: dict | None = None,
    events: Sequence[dict] | None = None,
    extra: dict | None = None,
    tags: list[str] | None = None,
    attachments: Attachments | None = None,
    dangerously_allow_filesystem: bool = False,
    reference_example_id: str | UUID | None = None,
    api_key: str | None = None,
    api_url: str | None = None,
    **kwargs: Any,
) -> None

Update a run in the LangSmith API.

PARAMETER	DESCRIPTION
`run_id`	The ID of the run to update. TYPE: `UUID \| str`
`name`	The name of the run. TYPE: `str \| None` DEFAULT: `None`
`run_type`	The type of the run (e.g., llm, chain, tool). TYPE: `str \| None` DEFAULT: `None`
`start_time`	The start time of the run. TYPE: `datetime \| None` DEFAULT: `None`
`end_time`	The end time of the run. TYPE: `datetime \| None` DEFAULT: `None`
`error`	The error message of the run. TYPE: `str \| None` DEFAULT: `None`
`inputs`	The input values for the run. TYPE: `Dict \| None` DEFAULT: `None`
`outputs`	The output values for the run. TYPE: `Dict \| None` DEFAULT: `None`
`events`	The events for the run. TYPE: `Sequence[dict] \| None` DEFAULT: `None`
`extra`	The extra information for the run. TYPE: `Dict \| None` DEFAULT: `None`
`tags`	The tags for the run. TYPE: `List[str] \| None` DEFAULT: `None`
`attachments`	A dictionary of attachments to add to the run. The keys are the attachment names, and the values are Attachment objects containing the data and mime type. TYPE: `Dict[str, Attachment] \| None` DEFAULT: `None`
`reference_example_id`	ID of the example that was the source of the run inputs. Used for runs that were part of an experiment. TYPE: `str \| UUID \| None` DEFAULT: `None`
`api_key`	The API key to use for this specific run. TYPE: `str \| None` DEFAULT: `None`
`api_url`	The API URL to use for this specific run. TYPE: `str \| None` DEFAULT: `None`
`**kwargs`	Kwargs are ignored. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`None`	None

Examples:

from langsmith import Client
import datetime
from uuid import uuid4

client = Client()
project_name = "__test_update_run"

start_time = datetime.datetime.now()
revision_id = uuid4()
run: dict = dict(
    id=uuid4(),
    name="test_run",
    run_type="llm",
    inputs={"text": "hello world"},
    project_name=project_name,
    api_url=os.getenv("LANGCHAIN_ENDPOINT"),
    start_time=start_time,
    extra={"extra": "extra"},
    revision_id=revision_id,
)
# Create the run
client.create_run(**run)
run["outputs"] = {"output": ["Hi"]}
run["extra"]["foo"] = "bar"
run["name"] = "test_run_updated"
# Update the run
client.update_run(run["id"], **run)

flush_compressed_traces ¶

flush_compressed_traces(attempts: int = 3) -> None

Force flush the currently buffered compressed runs.

flush ¶

flush() -> None

Flush either queue or compressed buffer, depending on mode.

read_run ¶

read_run(run_id: ID_TYPE, load_child_runs: bool = False) -> Run

Read a run from the LangSmith API.

PARAMETER	DESCRIPTION
`run_id`	The ID of the run to read. TYPE: `UUID \| str`
`load_child_runs`	Whether to load nested child runs. TYPE: `bool, default=False` DEFAULT: `False`

RETURNS	DESCRIPTION
`Run`	The run read from the LangSmith API. TYPE: `Run`

Examples:

from langsmith import Client

# Existing run
run_id = "your-run-id"

client = Client()
stored_run = client.read_run(run_id)

list_runs ¶

list_runs(
    *,
    project_id: ID_TYPE | Sequence[ID_TYPE] | None = None,
    project_name: str | Sequence[str] | None = None,
    run_type: str | None = None,
    trace_id: ID_TYPE | None = None,
    reference_example_id: ID_TYPE | None = None,
    query: str | None = None,
    filter: str | None = None,
    trace_filter: str | None = None,
    tree_filter: str | None = None,
    is_root: bool | None = None,
    parent_run_id: ID_TYPE | None = None,
    start_time: datetime | None = None,
    error: bool | None = None,
    run_ids: Sequence[ID_TYPE] | None = None,
    select: Sequence[str] | None = None,
    limit: int | None = None,
    **kwargs: Any,
) -> Iterator[Run]

List runs from the LangSmith API.

PARAMETER	DESCRIPTION
`project_id`	The ID(s) of the project to filter by. TYPE: `UUID \| str, Sequence[UUID \| str] \| None` DEFAULT: `None`
`project_name`	The name(s) of the project to filter by. TYPE: `str \| Sequence[str] \| None` DEFAULT: `None`
`run_type`	The type of the runs to filter by. TYPE: `str \| None` DEFAULT: `None`
`trace_id`	The ID of the trace to filter by. TYPE: `UUID \| str \| None` DEFAULT: `None`
`reference_example_id`	The ID of the reference example to filter by. TYPE: `UUID \| str \| None` DEFAULT: `None`
`query`	The query string to filter by. TYPE: `str \| None` DEFAULT: `None`
`filter`	The filter string to filter by. TYPE: `str \| None` DEFAULT: `None`
`trace_filter`	Filter to apply to the ROOT run in the trace tree. This is meant to be used in conjunction with the regular `filter` parameter to let you filter runs by attributes of the root run within a trace. TYPE: `str \| None` DEFAULT: `None`
`tree_filter`	Filter to apply to OTHER runs in the trace tree, including sibling and child runs. This is meant to be used in conjunction with the regular `filter` parameter to let you filter runs by attributes of any run within a trace. TYPE: `str \| None` DEFAULT: `None`
`is_root`	Whether to filter by root runs. TYPE: `bool \| None` DEFAULT: `None`
`parent_run_id`	The ID of the parent run to filter by. TYPE: `UUID \| str \| None` DEFAULT: `None`
`start_time`	The start time to filter by. TYPE: `datetime \| None` DEFAULT: `None`
`error`	Whether to filter by error status. TYPE: `bool \| None` DEFAULT: `None`
`run_ids`	The IDs of the runs to filter by. TYPE: `Sequence[UUID \| str] \| None` DEFAULT: `None`
`select`	The fields to select. TYPE: `Sequence[str] \| None` DEFAULT: `None`
`limit`	The maximum number of runs to return. TYPE: `int \| None` DEFAULT: `None`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

YIELDS	DESCRIPTION
`Run`	The runs.

Examples:

# List all runs in a project
project_runs = client.list_runs(project_name="<your_project>")

# List LLM and Chat runs in the last 24 hours
todays_llm_runs = client.list_runs(
    project_name="<your_project>",
    start_time=datetime.now() - timedelta(days=1),
    run_type="llm",
)

# List root traces in a project
root_runs = client.list_runs(project_name="<your_project>", is_root=1)

# List runs without errors
correct_runs = client.list_runs(project_name="<your_project>", error=False)

# List runs and only return their inputs/outputs (to speed up the query)
input_output_runs = client.list_runs(
    project_name="<your_project>", select=["inputs", "outputs"]
)

# List runs by run ID
run_ids = [
    "a36092d2-4ad5-4fb4-9c0d-0dba9a2ed836",
    "9398e6be-964f-4aa4-8ae9-ad78cd4b7074",
]
selected_runs = client.list_runs(id=run_ids)

# List all "chain" type runs that took more than 10 seconds and had
# `total_tokens` greater than 5000
chain_runs = client.list_runs(
    project_name="<your_project>",
    filter='and(eq(run_type, "chain"), gt(latency, 10), gt(total_tokens, 5000))',
)

# List all runs called "extractor" whose root of the trace was assigned feedback "user_score" score of 1
good_extractor_runs = client.list_runs(
    project_name="<your_project>",
    filter='eq(name, "extractor")',
    trace_filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 1))',
)

# List all runs that started after a specific timestamp and either have "error" not equal to null or a "Correctness" feedback score equal to 0
complex_runs = client.list_runs(
    project_name="<your_project>",
    filter='and(gt(start_time, "2023-07-15T12:34:56Z"), or(neq(error, null), and(eq(feedback_key, "Correctness"), eq(feedback_score, 0.0))))',
)

# List all runs where `tags` include "experimental" or "beta" and `latency` is greater than 2 seconds
tagged_runs = client.list_runs(
    project_name="<your_project>",
    filter='and(or(has(tags, "experimental"), has(tags, "beta")), gt(latency, 2))',
)

get_run_stats ¶

get_run_stats(
    *,
    id: list[ID_TYPE] | None = None,
    trace: ID_TYPE | None = None,
    parent_run: ID_TYPE | None = None,
    run_type: str | None = None,
    project_names: list[str] | None = None,
    project_ids: list[ID_TYPE] | None = None,
    reference_example_ids: list[ID_TYPE] | None = None,
    start_time: str | None = None,
    end_time: str | None = None,
    error: bool | None = None,
    query: str | None = None,
    filter: str | None = None,
    trace_filter: str | None = None,
    tree_filter: str | None = None,
    is_root: bool | None = None,
    data_source_type: str | None = None,
) -> dict[str, Any]

Get aggregate statistics over queried runs.

Takes in similar query parameters to list_runs and returns statistics based on the runs that match the query.

PARAMETER	DESCRIPTION
`id`	List of run IDs to filter by. TYPE: `List[UUID \| str] \| None` DEFAULT: `None`
`trace`	Trace ID to filter by. TYPE: `UUID \| str \| None` DEFAULT: `None`
`parent_run`	Parent run ID to filter by. TYPE: `UUID \| str \| None` DEFAULT: `None`
`run_type`	Run type to filter by. TYPE: `str \| None` DEFAULT: `None`
`project_names`	List of project names to filter by. TYPE: `List[str] \| None` DEFAULT: `None`
`project_ids`	List of project IDs to filter by. TYPE: `List[UUID \| str] \| None` DEFAULT: `None`
`reference_example_ids`	List of reference example IDs to filter by. TYPE: `List[UUID \| str] \| None` DEFAULT: `None`
`start_time`	Start time to filter by. TYPE: `str \| None` DEFAULT: `None`
`end_time`	End time to filter by. TYPE: `str \| None` DEFAULT: `None`
`error`	Filter by error status. TYPE: `bool \| None` DEFAULT: `None`
`query`	Query string to filter by. TYPE: `str \| None` DEFAULT: `None`
`filter`	Filter string to apply. TYPE: `str \| None` DEFAULT: `None`
`trace_filter`	Trace filter string to apply. TYPE: `str \| None` DEFAULT: `None`
`tree_filter`	Tree filter string to apply. TYPE: `str \| None` DEFAULT: `None`
`is_root`	Filter by root run status. TYPE: `bool \| None` DEFAULT: `None`
`data_source_type`	Data source type to filter by. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict[str, Any]`	Dict[str, Any]: A dictionary containing the run statistics.

get_run_url ¶

get_run_url(
    *, run: RunBase, project_name: str | None = None, project_id: ID_TYPE | None = None
) -> str

Get the URL for a run.

Not recommended for use within your agent runtime. More for use interacting with runs after the fact for data analysis or ETL workloads.

PARAMETER	DESCRIPTION
`run`	The run. TYPE: `RunBase`
`project_name`	The name of the project. TYPE: `str \| None` DEFAULT: `None`
`project_id`	The ID of the project. TYPE: `UUID \| str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	The URL for the run. TYPE: `str`

share_run ¶

share_run(run_id: ID_TYPE, *, share_id: ID_TYPE | None = None) -> str

Get a share link for a run.

PARAMETER	DESCRIPTION
`run_id`	The ID of the run to share. TYPE: `UUID \| str`
`share_id`	Custom share ID. If not provided, a random UUID will be generated. TYPE: `UUID \| str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	The URL of the shared run. TYPE: `str`

unshare_run ¶

unshare_run(run_id: ID_TYPE) -> None

Delete share link for a run.

PARAMETER	DESCRIPTION
`run_id`	The ID of the run to unshare. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`None`	None

read_run_shared_link ¶

read_run_shared_link(run_id: ID_TYPE) -> str | None

Retrieve the shared link for a specific run.

PARAMETER	DESCRIPTION
`run_id`	The ID of the run. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`str \| None`	Optional[str]: The shared link for the run, or None if the link is not
`str \| None`	available.

run_is_shared ¶

run_is_shared(run_id: ID_TYPE) -> bool

Get share state for a run.

PARAMETER	DESCRIPTION
`run_id`	The ID of the run. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`bool`	True if the run is shared, False otherwise. TYPE: `bool`

read_shared_run ¶

read_shared_run(share_token: ID_TYPE | str, run_id: ID_TYPE | None = None) -> Run

Get shared runs.

PARAMETER	DESCRIPTION
`share_token`	The share token or URL of the shared run. TYPE: `UUID \| str`
`run_id`	The ID of the specific run to retrieve. If not provided, the full shared run will be returned. TYPE: `UUID \| str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Run`	The shared run. TYPE: `Run`

list_shared_runs ¶

list_shared_runs(
    share_token: ID_TYPE | str, run_ids: list[str] | None = None
) -> Iterator[Run]

Get shared runs.

PARAMETER	DESCRIPTION
`share_token`	The share token or URL of the shared run. TYPE: `UUID \| str`
`run_ids`	A list of run IDs to filter the results by. TYPE: `List[str] \| None` DEFAULT: `None`

YIELDS	DESCRIPTION
`Run`	A shared run.

read_dataset_shared_schema ¶

read_dataset_shared_schema(
    dataset_id: ID_TYPE | None = None, *, dataset_name: str | None = None
) -> DatasetShareSchema

Retrieve the shared schema of a dataset.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset. Either `dataset_id` or `dataset_name` must be given. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset. Either `dataset_id` or `dataset_name` must be given. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`DatasetShareSchema`	ls_schemas.DatasetShareSchema: The shared schema of the dataset.

RAISES	DESCRIPTION
`ValueError`	If neither `dataset_id` nor `dataset_name` is given.

share_dataset ¶

share_dataset(
    dataset_id: ID_TYPE | None = None, *, dataset_name: str | None = None
) -> DatasetShareSchema

Get a share link for a dataset.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset. Either `dataset_id` or `dataset_name` must be given. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset. Either `dataset_id` or `dataset_name` must be given. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`DatasetShareSchema`	ls_schemas.DatasetShareSchema: The shared schema of the dataset.

RAISES	DESCRIPTION
`ValueError`	If neither `dataset_id` nor `dataset_name` is given.

unshare_dataset ¶

unshare_dataset(dataset_id: ID_TYPE) -> None

Delete share link for a dataset.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset to unshare. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`None`	None

read_shared_dataset ¶

read_shared_dataset(share_token: str) -> Dataset

Get shared datasets.

PARAMETER	DESCRIPTION
`share_token`	The share token or URL of the shared dataset. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`Dataset`	The shared dataset. TYPE: `Dataset`

list_shared_examples ¶

list_shared_examples(
    share_token: str,
    *,
    example_ids: list[ID_TYPE] | None = None,
    limit: int | None = None,
) -> Iterator[Example]

Get shared examples.

PARAMETER	DESCRIPTION
`share_token`	The share token or URL of the shared dataset. TYPE: `UUID \| str`
`example_ids`	The IDs of the examples to filter by. TYPE: `List[UUID, str] \| None` DEFAULT: `None`
`limit`	Maximum number of examples to return, by default None. TYPE: `int \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Iterator[Example]`	List[ls_schemas.Example]: The list of shared examples.

list_shared_projects ¶

list_shared_projects(
    *,
    dataset_share_token: str,
    project_ids: list[ID_TYPE] | None = None,
    name: str | None = None,
    name_contains: str | None = None,
    limit: int | None = None,
) -> Iterator[TracerSessionResult]

List shared projects.

PARAMETER	DESCRIPTION
`dataset_share_token`	The share token of the dataset. TYPE: `str`
`project_ids`	List of project IDs to filter the results, by default None. TYPE: `List[UUID \| str] \| None` DEFAULT: `None`
`name`	Name of the project to filter the results, by default None. TYPE: `str \| None` DEFAULT: `None`
`name_contains`	Substring to search for in project names, by default None. TYPE: `str \| None` DEFAULT: `None`
`limit`	Maximum number of projects to return, by default None. TYPE: `int \| None` DEFAULT: `None`

YIELDS	DESCRIPTION
`TracerSessionResult`	The shared projects.

create_project ¶

create_project(
    project_name: str,
    *,
    description: str | None = None,
    metadata: dict | None = None,
    upsert: bool = False,
    project_extra: dict | None = None,
    reference_dataset_id: ID_TYPE | None = None,
) -> TracerSession

Create a project on the LangSmith API.

PARAMETER	DESCRIPTION
`project_name`	The name of the project. TYPE: `str`
`project_extra`	Additional project information. TYPE: `dict \| None` DEFAULT: `None`
`metadata`	Additional metadata to associate with the project. TYPE: `dict \| None` DEFAULT: `None`
`description`	The description of the project. TYPE: `str \| None` DEFAULT: `None`
`upsert`	Whether to update the project if it already exists. TYPE: `bool, default=False` DEFAULT: `False`
`reference_dataset_id`	The ID of the reference dataset to associate with the project. TYPE: `Optional[Union[UUID, str]` DEFAULT: `None`

RETURNS	DESCRIPTION
`TracerSession`	The created project. TYPE: `TracerSession`

update_project ¶

update_project(
    project_id: ID_TYPE,
    *,
    name: str | None = None,
    description: str | None = None,
    metadata: dict | None = None,
    project_extra: dict | None = None,
    end_time: datetime | None = None,
) -> TracerSession

Update a LangSmith project.

PARAMETER	DESCRIPTION
`project_id`	The ID of the project to update. TYPE: `UUID \| str`
`name`	The new name to give the project. This is only valid if the project has been assigned an end_time, meaning it has been completed/closed. TYPE: `str \| None` DEFAULT: `None`
`description`	The new description to give the project. TYPE: `str \| None` DEFAULT: `None`
`metadata`	Additional metadata to associate with the project. TYPE: `dict \| None` DEFAULT: `None`
`project_extra`	Additional project information. TYPE: `dict \| None` DEFAULT: `None`
`end_time`	The time the project was completed. TYPE: `datetime \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`TracerSession`	The updated project. TYPE: `TracerSession`

read_project ¶

read_project(
    *,
    project_id: str | None = None,
    project_name: str | None = None,
    include_stats: bool = False,
) -> TracerSessionResult

Read a project from the LangSmith API.

PARAMETER	DESCRIPTION
`project_id`	The ID of the project to read. TYPE: `str \| None` DEFAULT: `None`
`project_name`	The name of the project to read. Only one of project_id or project_name may be given. TYPE: `str \| None` DEFAULT: `None`
`include_stats`	Whether to include a project's aggregate statistics in the response. TYPE: `bool, default=False` DEFAULT: `False`

RETURNS	DESCRIPTION
`TracerSessionResult`	The project. TYPE: `TracerSessionResult`

has_project ¶

has_project(project_name: str, *, project_id: str | None = None) -> bool

Check if a project exists.

PARAMETER	DESCRIPTION
`project_name`	The name of the project to check for. TYPE: `str`
`project_id`	The ID of the project to check for. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`bool`	Whether the project exists. TYPE: `bool`

get_test_results ¶

get_test_results(
    *, project_id: ID_TYPE | None = None, project_name: str | None = None
) -> DataFrame

Read the record-level information from an experiment into a Pandas DF.

Note

This will fetch whatever data exists in the DB. Results are not immediately available in the DB upon evaluation run completion.

Feedback score values will be returned as an average across all runs for the experiment. Non-numeric feedback scores will be omitted.

PARAMETER	DESCRIPTION
`project_id`	The ID of the project. TYPE: `UUID \| str \| None` DEFAULT: `None`
`project_name`	The name of the project. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame`	pd.DataFrame: A dataframe containing the test results.

list_projects ¶

list_projects(
    project_ids: list[ID_TYPE] | None = None,
    name: str | None = None,
    name_contains: str | None = None,
    reference_dataset_id: ID_TYPE | None = None,
    reference_dataset_name: str | None = None,
    reference_free: bool | None = None,
    include_stats: bool | None = None,
    dataset_version: str | None = None,
    limit: int | None = None,
    metadata: dict[str, Any] | None = None,
) -> Iterator[TracerSessionResult]

List projects from the LangSmith API.

PARAMETER	DESCRIPTION
`project_ids`	A list of project IDs to filter by, by default None TYPE: `List[UUID \| str] \| None` DEFAULT: `None`
`name`	The name of the project to filter by, by default None TYPE: `str \| None` DEFAULT: `None`
`name_contains`	A string to search for in the project name, by default None TYPE: `str \| None` DEFAULT: `None`
`reference_dataset_id`	A dataset ID to filter by, by default None TYPE: `List[UUID \| str] \| None` DEFAULT: `None`
`reference_dataset_name`	The name of the reference dataset to filter by, by default None TYPE: `str \| None` DEFAULT: `None`
`reference_free`	Whether to filter for only projects not associated with a dataset. TYPE: `bool \| None` DEFAULT: `None`
`limit`	The maximum number of projects to return, by default None TYPE: `int \| None` DEFAULT: `None`
`metadata`	Metadata to filter by. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`

YIELDS	DESCRIPTION
`TracerSessionResult`	The projects.

RAISES	DESCRIPTION
`ValueError`	If both reference_dataset_id and reference_dataset_name are given.

delete_project ¶

delete_project(
    *, project_name: str | None = None, project_id: str | None = None
) -> None

Delete a project from LangSmith.

PARAMETER	DESCRIPTION
`project_name`	The name of the project to delete. TYPE: `str \| None` DEFAULT: `None`
`project_id`	The ID of the project to delete. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`None`	None

RAISES	DESCRIPTION
`ValueError`	If neither project_name or project_id is provided.

create_dataset ¶

create_dataset(
    dataset_name: str,
    *,
    description: str | None = None,
    data_type: DataType = kv,
    inputs_schema: dict[str, Any] | None = None,
    outputs_schema: dict[str, Any] | None = None,
    transformations: list[DatasetTransformation] | None = None,
    metadata: dict | None = None,
) -> Dataset

Create a dataset in the LangSmith API.

PARAMETER	DESCRIPTION
`dataset_name`	The name of the dataset. TYPE: `str`
`description`	The description of the dataset. TYPE: `str \| None` DEFAULT: `None`
`data_type`	The data type of the dataset. TYPE: `DataType, default=DataType.kv` DEFAULT: `kv`
`inputs_schema`	The schema definition for the inputs of the dataset. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`
`outputs_schema`	The schema definition for the outputs of the dataset. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`
`transformations`	A list of transformations to apply to the dataset. TYPE: `List[DatasetTransformation] \| None` DEFAULT: `None`
`metadata`	Additional metadata to associate with the dataset. TYPE: `dict \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Dataset`	The created dataset. TYPE: `Dataset`

RAISES	DESCRIPTION
`HTTPError`	If the request to create the dataset fails.

has_dataset ¶

has_dataset(
    *, dataset_name: str | None = None, dataset_id: ID_TYPE | None = None
) -> bool

Check whether a dataset exists in your tenant.

PARAMETER	DESCRIPTION
`dataset_name`	The name of the dataset to check. TYPE: `str \| None` DEFAULT: `None`
`dataset_id`	The ID of the dataset to check. TYPE: `UUID \| str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`bool`	Whether the dataset exists. TYPE: `bool`

read_dataset ¶

read_dataset(
    *, dataset_name: str | None = None, dataset_id: ID_TYPE | None = None
) -> Dataset

Read a dataset from the LangSmith API.

PARAMETER	DESCRIPTION
`dataset_name`	The name of the dataset to read. TYPE: `str \| None` DEFAULT: `None`
`dataset_id`	The ID of the dataset to read. TYPE: `UUID \| str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Dataset`	The dataset. TYPE: `Dataset`

diff_dataset_versions ¶

diff_dataset_versions(
    dataset_id: ID_TYPE | None = None,
    *,
    dataset_name: str | None = None,
    from_version: str | datetime,
    to_version: str | datetime,
) -> DatasetDiffInfo

Get the difference between two versions of a dataset.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset. TYPE: `str \| None` DEFAULT: `None`
`from_version`	The starting version for the diff. TYPE: `str \| datetime`
`to_version`	The ending version for the diff. TYPE: `str \| datetime`

RETURNS	DESCRIPTION
`DatasetDiffInfo`	The difference between the two versions of the dataset. TYPE: `DatasetDiffInfo`

Examples:

# Get the difference between two tagged versions of a dataset
from_version = "prod"
to_version = "dev"
diff = client.diff_dataset_versions(
    dataset_name="my-dataset",
    from_version=from_version,
    to_version=to_version,
)

# Get the difference between two timestamped versions of a dataset
from_version = datetime.datetime(2024, 1, 1)
to_version = datetime.datetime(2024, 2, 1)
diff = client.diff_dataset_versions(
    dataset_name="my-dataset",
    from_version=from_version,
    to_version=to_version,
)

read_dataset_openai_finetuning ¶

read_dataset_openai_finetuning(
    dataset_id: ID_TYPE | None = None, *, dataset_name: str | None = None
) -> list

Download a dataset in OpenAI Jsonl format and load it as a list of dicts.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset to download. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset to download. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`list`	list[dict]: The dataset loaded as a list of dicts.

RAISES	DESCRIPTION
`ValueError`	If neither dataset_id nor dataset_name is provided.

list_datasets ¶

list_datasets(
    *,
    dataset_ids: list[ID_TYPE] | None = None,
    data_type: str | None = None,
    dataset_name: str | None = None,
    dataset_name_contains: str | None = None,
    metadata: dict[str, Any] | None = None,
    limit: int | None = None,
) -> Iterator[Dataset]

List the datasets on the LangSmith API.

PARAMETER	DESCRIPTION
`dataset_ids`	A list of dataset IDs to filter the results by. TYPE: `List[UUID \| str] \| None` DEFAULT: `None`
`data_type`	The data type of the datasets to filter the results by. TYPE: `str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset to filter the results by. TYPE: `str \| None` DEFAULT: `None`
`dataset_name_contains`	A substring to search for in the dataset names. TYPE: `str \| None` DEFAULT: `None`
`metadata`	A dictionary of metadata to filter the results by. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`
`limit`	The maximum number of datasets to return. TYPE: `int \| None` DEFAULT: `None`

YIELDS	DESCRIPTION
`Dataset`	The datasets.

delete_dataset ¶

delete_dataset(
    *, dataset_id: ID_TYPE | None = None, dataset_name: str | None = None
) -> None

Delete a dataset from the LangSmith API.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset to delete. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset to delete. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`None`	None

update_dataset_tag ¶

update_dataset_tag(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    as_of: datetime,
    tag: str,
) -> None

Update the tags of a dataset.

If the tag is already assigned to a different version of this dataset, the tag will be moved to the new version. The as_of parameter is used to determine which version of the dataset to apply the new tags to. It must be an exact version of the dataset to succeed. You can use the read_dataset_version method to find the exact version to apply the tags to.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset to update. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset to update. TYPE: `str \| None` DEFAULT: `None`
`as_of`	The timestamp of the dataset to apply the new tags to. TYPE: `datetime`
`tag`	The new tag to apply to the dataset. TYPE: `str`

RETURNS	DESCRIPTION
`None`	None

Examples:

dataset_name = "my-dataset"
# Get the version of a dataset <= a given timestamp
dataset_version = client.read_dataset_version(
    dataset_name=dataset_name, as_of=datetime.datetime(2024, 1, 1)
)
# Assign that version a new tag
client.update_dataset_tags(
    dataset_name="my-dataset",
    as_of=dataset_version.as_of,
    tag="prod",
)

list_dataset_versions ¶

list_dataset_versions(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    search: str | None = None,
    limit: int | None = None,
) -> Iterator[DatasetVersion]

List dataset versions.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset. TYPE: `str \| None` DEFAULT: `None`
`search`	The search query. TYPE: `str \| None` DEFAULT: `None`
`limit`	The maximum number of versions to return. TYPE: `int \| None` DEFAULT: `None`

YIELDS	DESCRIPTION
`DatasetVersion`	The dataset versions.

read_dataset_version ¶

read_dataset_version(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    as_of: datetime | None = None,
    tag: str | None = None,
) -> DatasetVersion

Get dataset version by as_of or exact tag.

Ues this to resolve the nearest version to a given timestamp or for a given tag.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset. TYPE: `ID_TYPE \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset. TYPE: `str \| None` DEFAULT: `None`
`as_of`	The timestamp of the dataset to retrieve. TYPE: `datetime \| None` DEFAULT: `None`
`tag`	The tag of the dataset to retrieve. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`DatasetVersion`	The dataset version. TYPE: `DatasetVersion`

Examples:

# Get the latest version of a dataset
client.read_dataset_version(dataset_name="my-dataset", tag="latest")

# Get the version of a dataset <= a given timestamp
client.read_dataset_version(
    dataset_name="my-dataset",
    as_of=datetime.datetime(2024, 1, 1),
)


# Get the version of a dataset with a specific tag
client.read_dataset_version(dataset_name="my-dataset", tag="prod")

clone_public_dataset ¶

clone_public_dataset(
    token_or_url: str,
    *,
    source_api_url: str | None = None,
    dataset_name: str | None = None,
) -> Dataset

Clone a public dataset to your own langsmith tenant.

This operation is idempotent. If you already have a dataset with the given name, this function will do nothing.

PARAMETER	DESCRIPTION
`token_or_url`	The token of the public dataset to clone. TYPE: `str`
`source_api_url`	The URL of the langsmith server where the data is hosted. Defaults to the API URL of your current client. TYPE: `str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset to create in your tenant. Defaults to the name of the public dataset. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Dataset`	The cloned dataset. TYPE: `Dataset`

create_llm_example ¶

create_llm_example(
    prompt: str,
    generation: str | None = None,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    created_at: datetime | None = None,
) -> Example

Add an example (row) to an LLM-type dataset.

PARAMETER	DESCRIPTION
`prompt`	The input prompt for the example. TYPE: `str`
`generation`	The output generation for the example. TYPE: `str \| None` DEFAULT: `None`
`dataset_id`	The ID of the dataset. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset. TYPE: `str \| None` DEFAULT: `None`
`created_at`	The creation timestamp of the example. TYPE: `datetime \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Example`	The created example TYPE: `Example`

create_chat_example ¶

create_chat_example(
    messages: list[Mapping[str, Any] | BaseMessageLike],
    generations: Mapping[str, Any] | BaseMessageLike | None = None,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    created_at: datetime | None = None,
) -> Example

Add an example (row) to a Chat-type dataset.

PARAMETER	DESCRIPTION
`messages`	The input messages for the example. TYPE: `List[Mapping[str, Any] \| BaseMessageLike]`
`generations`	The output messages for the example. TYPE: `Mapping[str, Any] \| BaseMessageLike \| None` DEFAULT: `None`
`dataset_id`	The ID of the dataset. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset. TYPE: `str \| None` DEFAULT: `None`
`created_at`	The creation timestamp of the example. TYPE: `datetime \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Example`	The created example TYPE: `Example`

create_example_from_run ¶

create_example_from_run(
    run: Run,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    created_at: datetime | None = None,
) -> Example

Add an example (row) to a dataset from a run.

PARAMETER	DESCRIPTION
`run`	The run to create an example from. TYPE: `Run`
`dataset_id`	The ID of the dataset. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset. TYPE: `str \| None` DEFAULT: `None`
`created_at`	The creation timestamp of the example. TYPE: `datetime \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Example`	The created example TYPE: `Example`

update_examples_multipart ¶

update_examples_multipart(
    *,
    dataset_id: ID_TYPE,
    updates: list[ExampleUpdate] | None = None,
    dangerously_allow_filesystem: bool = False,
) -> UpsertExamplesResponse

Update examples using multipart.

.. deprecated:: 0.3.9

Use Client.update_examples instead. Will be removed in 0.4.0.

upload_examples_multipart ¶

upload_examples_multipart(
    *,
    dataset_id: ID_TYPE,
    uploads: list[ExampleCreate] | None = None,
    dangerously_allow_filesystem: bool = False,
) -> UpsertExamplesResponse

Upload examples using multipart.

.. deprecated:: 0.3.9

Use Client.create_examples instead. Will be removed in 0.4.0.

upsert_examples_multipart ¶

upsert_examples_multipart(
    *,
    upserts: list[ExampleUpsertWithAttachments] | None = None,
    dangerously_allow_filesystem: bool = False,
) -> UpsertExamplesResponse

Upsert examples.

.. deprecated:: 0.3.9

Use Client.create_examples and Client.update_examples instead. Will be
removed in 0.4.0.

create_examples ¶

create_examples(
    *,
    dataset_name: str | None = None,
    dataset_id: ID_TYPE | None = None,
    examples: Sequence[ExampleCreate | dict] | None = None,
    dangerously_allow_filesystem: bool = False,
    max_concurrency: Annotated[int, Field(ge=1, le=3)] = 1,
    **kwargs: Any,
) -> UpsertExamplesResponse | dict[str, Any]

Create examples in a dataset.

PARAMETER	DESCRIPTION
`dataset_name`	The name of the dataset to create the examples in. Must specify exactly one of dataset_name or dataset_id. TYPE: `str \| None` DEFAULT: `None`
`dataset_id`	The ID of the dataset to create the examples in. Must specify exactly one of dataset_name or dataset_id TYPE: `UUID \| str \| None` DEFAULT: `None`
`examples`	The examples to create. TYPE: `Sequence[ExampleCreate \| dict]` DEFAULT: `None`
`dangerously_allow_filesystem`	Whether to allow uploading files from the filesystem. TYPE: `bool` DEFAULT: `False`
`**kwargs`	Legacy keyword args. Should not be specified if 'examples' is specified. inputs (Sequence[Mapping[str, Any]]): The input values for the examples. outputs (Optional[Sequence[Optional[Mapping[str, Any]]]]): The output values for the examples. metadata (Optional[Sequence[Optional[Mapping[str, Any]]]]): The metadata for the examples. splits (Optional[Sequence[Optional[str \| List[str]]]]): The splits for the examples, which are divisions of your dataset such as 'train', 'test', or 'validation'. source_run_ids (Optional[Sequence[Optional[Union[UUID, str]]]]): The IDs of the source runs associated with the examples. ids (Optional[Sequence[Union[UUID, str]]]): The IDs of the examples. TYPE: `Any` DEFAULT: `{}`

RAISES	DESCRIPTION
`ValueError`	If 'examples' and legacy args are both provided.

RETURNS	DESCRIPTION
`UpsertExamplesResponse \| dict[str, Any]`	The LangSmith JSON response. Includes 'count' and 'example_ids'.

Behavior changed in langsmith 0.3.11

Updated to take argument 'examples', a single list where each element is the full example to create. This should be used instead of the legacy 'inputs', 'outputs', etc. arguments which split each examples attributes across arguments.

Updated to support creating examples with attachments.

Example

from langsmith import Client

client = Client()

dataset = client.create_dataset("agent-qa")

examples = [
    {
        "inputs": {"question": "what's an agent"},
        "outputs": {"answer": "an agent is..."},
        "metadata": {"difficulty": "easy"},
    },
    {
        "inputs": {
            "question": "can you explain the agent architecture in this diagram?"
        },
        "outputs": {"answer": "this diagram shows..."},
        "attachments": {"diagram": {"mime_type": "image/png", "data": b"..."}},
        "metadata": {"difficulty": "medium"},
    },
    # more examples...
]

response = client.create_examples(dataset_name="agent-qa", examples=examples)
# -> {"example_ids": [...

create_example ¶

create_example(
    inputs: Mapping[str, Any] | None = None,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    created_at: datetime | None = None,
    outputs: Mapping[str, Any] | None = None,
    metadata: Mapping[str, Any] | None = None,
    split: str | list[str] | None = None,
    example_id: ID_TYPE | None = None,
    source_run_id: ID_TYPE | None = None,
    use_source_run_io: bool = False,
    use_source_run_attachments: list[str] | None = None,
    attachments: Attachments | None = None,
) -> Example

Create a dataset example in the LangSmith API.

Examples are rows in a dataset, containing the inputs and expected outputs (or other reference information) for a model or chain.

PARAMETER	DESCRIPTION
`inputs`	The input values for the example. TYPE: `Mapping[str, Any]` DEFAULT: `None`
`dataset_id`	The ID of the dataset to create the example in. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset to create the example in. TYPE: `str \| None` DEFAULT: `None`
`created_at`	The creation timestamp of the example. TYPE: `datetime \| None` DEFAULT: `None`
`outputs`	The output values for the example. TYPE: `Mapping[str, Any] \| None` DEFAULT: `None`
`metadata`	The metadata for the example. TYPE: `Mapping[str, Any] \| None` DEFAULT: `None`
`split`	The splits for the example, which are divisions of your dataset such as 'train', 'test', or 'validation'. TYPE: `str \| List[str] \| None` DEFAULT: `None`
`example_id`	The ID of the example to create. If not provided, a new example will be created. TYPE: `UUID \| str \| None` DEFAULT: `None`
`source_run_id`	The ID of the source run associated with this example. TYPE: `UUID \| str \| None` DEFAULT: `None`
`use_source_run_io`	Whether to use the inputs, outputs, and attachments from the source run. TYPE: `bool` DEFAULT: `False`
`use_source_run_attachments`	Which attachments to use from the source run. If use_source_run_io is True, all attachments will be used regardless of this param. TYPE: `List[str] \| None` DEFAULT: `None`
`attachments`	The attachments for the example. TYPE: `Attachments \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Example`	The created example. TYPE: `Example`

read_example ¶

read_example(example_id: ID_TYPE, *, as_of: datetime | None = None) -> Example

Read an example from the LangSmith API.

PARAMETER	DESCRIPTION
`example_id`	The ID of the example to read. TYPE: `UUID \| str`
`as_of`	The dataset version tag OR timestamp to retrieve the example as of. Response examples will only be those that were present at the time of the tagged (or timestamped) version. TYPE: `datetime \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Example`	The example. TYPE: `Example`

list_examples ¶

list_examples(
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    example_ids: Sequence[ID_TYPE] | None = None,
    as_of: datetime | str | None = None,
    splits: Sequence[str] | None = None,
    inline_s3_urls: bool = True,
    *,
    offset: int = 0,
    limit: int | None = None,
    metadata: dict | None = None,
    filter: str | None = None,
    include_attachments: bool = False,
    **kwargs: Any,
) -> Iterator[Example]

Retrieve the example rows of the specified dataset.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset to filter by. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset to filter by. TYPE: `str \| None` DEFAULT: `None`
`example_ids`	The IDs of the examples to filter by. TYPE: `Optional[Sequence[Union[UUID, str]]` DEFAULT: `None`
`as_of`	The dataset version tag OR timestamp to retrieve the examples as of. Response examples will only be those that were present at the time of the tagged (or timestamped) version. TYPE: `datetime \| str \| None` DEFAULT: `None`
`splits`	A list of dataset splits, which are divisions of your dataset such as 'train', 'test', or 'validation'. Returns examples only from the specified splits. TYPE: `Sequence[str] \| None` DEFAULT: `None`
`inline_s3_urls`	Whether to inline S3 URLs. TYPE: `bool, default=True` DEFAULT: `True`
`offset`	The offset to start from. Defaults to 0. TYPE: `int, default=0` DEFAULT: `0`
`limit`	The maximum number of examples to return. TYPE: `int \| None` DEFAULT: `None`
`metadata`	A dictionary of metadata to filter by. TYPE: `dict \| None` DEFAULT: `None`
`filter`	A structured filter string to apply to the examples. TYPE: `str \| None` DEFAULT: `None`
`include_attachments`	Whether to include the attachments in the response. TYPE: `bool, default=False` DEFAULT: `False`
`**kwargs`	Additional keyword arguments are ignored. TYPE: `Any` DEFAULT: `{}`

YIELDS	DESCRIPTION
`Example`	The examples.

Examples:

List all examples for a dataset:

from langsmith import Client

client = Client()

# By Dataset ID
examples = client.list_examples(
    dataset_id="c9ace0d8-a82c-4b6c-13d2-83401d68e9ab"
)
# By Dataset Name
examples = client.list_examples(dataset_name="My Test Dataset")

List examples by id

example_ids = [
    "734fc6a0-c187-4266-9721-90b7a025751a",
    "d6b4c1b9-6160-4d63-9b61-b034c585074f",
    "4d31df4e-f9c3-4a6e-8b6c-65701c2fed13",
]
examples = client.list_examples(example_ids=example_ids)

List examples by metadata

examples = client.list_examples(
    dataset_name=dataset_name, metadata={"foo": "bar"}
)

List examples by structured filter

examples = client.list_examples(
    dataset_name=dataset_name,
    filter='and(not(has(metadata, \'{"foo": "bar"}\')), exists(metadata, "tenant_id"))',
)

index_dataset ¶

index_dataset(*, dataset_id: ID_TYPE, tag: str = 'latest', **kwargs: Any) -> None

Enable dataset indexing. Examples are indexed by their inputs.

This enables searching for similar examples by inputs with client.similar_examples().

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset to index. TYPE: `UUID \| str`
`tag`	The version of the dataset to index. If 'latest' then any updates to the dataset (additions, updates, deletions of examples) will be reflected in the index. TYPE: `str \| None` DEFAULT: `'latest'`
`**kwargs`	Additional keyword arguments to pass as part of request body. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`None`	None

sync_indexed_dataset ¶

sync_indexed_dataset(*, dataset_id: ID_TYPE, **kwargs: Any) -> None

Sync dataset index.

This already happens automatically every 5 minutes, but you can call this to force a sync.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset to sync. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`None`	None

similar_examples ¶

similar_examples(
    inputs: dict,
    /,
    *,
    limit: int,
    dataset_id: ID_TYPE,
    filter: str | None = None,
    **kwargs: Any,
) -> list[ExampleSearch]

Retrieve the dataset examples whose inputs best match the current inputs.

Note

Must have few-shot indexing enabled for the dataset. See client.index_dataset().

PARAMETER	DESCRIPTION
`inputs`	The inputs to use as a search query. Must match the dataset input schema. Must be JSON serializable. TYPE: `dict`
`limit`	The maximum number of examples to return. TYPE: `int`
`dataset_id`	The ID of the dataset to search over. TYPE: `UUID \| str`
`filter`	A filter string to apply to the search results. Uses the same syntax as the `filter` parameter in `list_runs()`. Only a subset of operations are supported. For example, you can use `and(eq(metadata.some_tag, 'some_value'), neq(metadata.env, 'dev'))` to filter only examples where some_tag has some_value, and the environment is not dev. TYPE: `str \| None` DEFAULT: `None`
`**kwargs`	Additional keyword arguments to pass as part of request body. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[ExampleSearch]`	list[ExampleSearch]: List of ExampleSearch objects.

Examples:

from langsmith import Client

client = Client()
client.similar_examples(
    {"question": "When would i use the runnable generator"},
    limit=3,
    dataset_id="...",
)

[
    ExampleSearch(
        inputs={
            "question": "How do I cache a Chat model? What caches can I use?"
        },
        outputs={
            "answer": "You can use LangChain's caching layer for Chat Models. This can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times, and speed up your application.\n\nfrom langchain.cache import InMemoryCache\nlangchain.llm_cache = InMemoryCache()\n\n# The first time, it is not yet in cache, so it should take longer\nllm.predict('Tell me a joke')\n\nYou can also use SQLite Cache which uses a SQLite database:\n\nrm .langchain.db\n\nfrom langchain.cache import SQLiteCache\nlangchain.llm_cache = SQLiteCache(database_path=\".langchain.db\")\n\n# The first time, it is not yet in cache, so it should take longer\nllm.predict('Tell me a joke') \n"
        },
        metadata=None,
        id=UUID("b2ddd1c4-dff6-49ae-8544-f48e39053398"),
        dataset_id=UUID("01b6ce0f-bfb6-4f48-bbb8-f19272135d40"),
    ),
    ExampleSearch(
        inputs={"question": "What's a runnable lambda?"},
        outputs={
            "answer": "A runnable lambda is an object that implements LangChain's `Runnable` interface and runs a callbale (i.e., a function). Note the function must accept a single argument."
        },
        metadata=None,
        id=UUID("f94104a7-2434-4ba7-8293-6a283f4860b4"),
        dataset_id=UUID("01b6ce0f-bfb6-4f48-bbb8-f19272135d40"),
    ),
    ExampleSearch(
        inputs={"question": "Show me how to use RecursiveURLLoader"},
        outputs={
            "answer": 'The RecursiveURLLoader comes from the langchain.document_loaders.recursive_url_loader module. Here\'s an example of how to use it:\n\nfrom langchain.document_loaders.recursive_url_loader import RecursiveUrlLoader\n\n# Create an instance of RecursiveUrlLoader with the URL you want to load\nloader = RecursiveUrlLoader(url="https://example.com")\n\n# Load all child links from the URL page\nchild_links = loader.load()\n\n# Print the child links\nfor link in child_links:\n    print(link)\n\nMake sure to replace "https://example.com" with the actual URL you want to load. The load() method returns a list of child links found on the URL page. You can iterate over this list to access each child link.'
        },
        metadata=None,
        id=UUID("0308ea70-a803-4181-a37d-39e95f138f8c"),
        dataset_id=UUID("01b6ce0f-bfb6-4f48-bbb8-f19272135d40"),
    ),
]

update_example ¶

update_example(
    example_id: ID_TYPE,
    *,
    inputs: dict[str, Any] | None = None,
    outputs: Mapping[str, Any] | None = None,
    metadata: dict | None = None,
    split: str | list[str] | None = None,
    dataset_id: ID_TYPE | None = None,
    attachments_operations: AttachmentsOperations | None = None,
    attachments: Attachments | None = None,
) -> dict[str, Any]

Update a specific example.

PARAMETER	DESCRIPTION
`example_id`	The ID of the example to update. TYPE: `UUID \| str`
`inputs`	The input values to update. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`
`outputs`	The output values to update. TYPE: `Mapping[str, Any] \| None` DEFAULT: `None`
`metadata`	The metadata to update. TYPE: `Dict \| None` DEFAULT: `None`
`split`	The dataset split to update, such as 'train', 'test', or 'validation'. TYPE: `str \| List[str] \| None` DEFAULT: `None`
`dataset_id`	The ID of the dataset to update. TYPE: `UUID \| str \| None` DEFAULT: `None`
`attachments_operations`	The attachments operations to perform. TYPE: `AttachmentsOperations \| None` DEFAULT: `None`
`attachments`	The attachments to add to the example. TYPE: `Attachments \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict[str, Any]`	Dict[str, Any]: The updated example.

update_examples ¶

update_examples(
    *,
    dataset_name: str | None = None,
    dataset_id: ID_TYPE | None = None,
    updates: Sequence[ExampleUpdate | dict] | None = None,
    dangerously_allow_filesystem: bool = False,
    **kwargs: Any,
) -> dict[str, Any]

Update multiple examples.

Examples are expected to all be part of the same dataset.

PARAMETER	DESCRIPTION
`dataset_name`	The name of the dataset to update. Should specify exactly one of 'dataset_name' or 'dataset_id'. TYPE: `str \| None` DEFAULT: `None`
`dataset_id`	The ID of the dataset to update. Should specify exactly one of 'dataset_name' or 'dataset_id'. TYPE: `UUID \| str \| None` DEFAULT: `None`
`updates`	The example updates. Overwrites any specified fields and does not update any unspecified fields. TYPE: `Sequence[ExampleUpdate \| dict] \| None` DEFAULT: `None`
`dangerously_allow_filesystem`	Whether to allow using filesystem paths as attachments. TYPE: `bool` DEFAULT: `False`
`**kwargs`	Legacy keyword args. Should not be specified if 'updates' is specified. example_ids (Sequence[UUID \| str]): The IDs of the examples to update. inputs (Sequence[dict \| None] \| None): The input values for the examples. outputs (Sequence[dict \| None] \| None): The output values for the examples. metadata (Sequence[dict \| None] \| None): The metadata for the examples. splits (Sequence[str \| list[str] \| None] \| None): The splits for the examples, which are divisions of your dataset such as 'train', 'test', or 'validation'. attachments_operations (Sequence[AttachmentsOperations \| None] \| None): The operations to perform on the attachments. dataset_ids (Sequence[UUID \| str] \| None): The IDs of the datasets to move the examples to. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`dict[str, Any]`	The LangSmith JSON response. Includes 'message', 'count', and 'example_ids'.

Behavior changed in langsmith 0.3.9

Updated to ...

Example

from langsmith import Client

client = Client()

dataset = client.create_dataset("agent-qa")

examples = [
    {
        "inputs": {"question": "what's an agent"},
        "outputs": {"answer": "an agent is..."},
        "metadata": {"difficulty": "easy"},
    },
    {
        "inputs": {
            "question": "can you explain the agent architecture in this diagram?"
        },
        "outputs": {"answer": "this diagram shows..."},
        "attachments": {"diagram": {"mime_type": "image/png", "data": b"..."}},
        "metadata": {"difficulty": "medium"},
    },
    # more examples...
]

response = client.create_examples(dataset_name="agent-qa", examples=examples)
example_ids = response["example_ids"]

updates = [
    {
        "id": example_ids[0],
        "inputs": {"question": "what isn't an agent"},
        "outputs": {"answer": "an agent is not..."},
    },
    {
        "id": example_ids[1],
        "attachments_operations": [
            {"rename": {"diagram": "agent_diagram"}, "retain": []}
        ],
    },
]
response = client.update_examples(dataset_name="agent-qa", updates=updates)
# -> {"example_ids": [...

delete_example ¶

delete_example(example_id: ID_TYPE) -> None

Delete an example by ID.

PARAMETER	DESCRIPTION
`example_id`	The ID of the example to delete. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`None`	None

delete_examples ¶

delete_examples(example_ids: Sequence[ID_TYPE]) -> None

Delete multiple examples by ID.

Parameters¶

example_ids : Sequence[ID_TYPE] The IDs of the examples to delete.

list_dataset_splits ¶

list_dataset_splits(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    as_of: str | datetime | None = None,
) -> list[str]

Get the splits for a dataset.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset. TYPE: `str \| None` DEFAULT: `None`
`as_of`	The version of the dataset to retrieve splits for. Can be a timestamp or a string tag. Defaults to "latest". TYPE: `str \| datetime \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[str]`	List[str]: The names of this dataset's splits.

update_dataset_splits ¶

update_dataset_splits(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    split_name: str,
    example_ids: list[ID_TYPE],
    remove: bool = False,
) -> None

Update the splits for a dataset.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset to update. TYPE: `UUID \| str \| None` DEFAULT: `None`
`dataset_name`	The name of the dataset to update. TYPE: `str \| None` DEFAULT: `None`
`split_name`	The name of the split to update. TYPE: `str`
`example_ids`	The IDs of the examples to add to or remove from the split. TYPE: `List[UUID \| str]`
`remove`	If True, remove the examples from the split. If False, add the examples to the split. TYPE: `bool \| None` DEFAULT: `False`

RETURNS	DESCRIPTION
`None`	None

evaluate_run ¶

evaluate_run(
    run: Run | RunBase | str | UUID,
    evaluator: RunEvaluator,
    *,
    source_info: dict[str, Any] | None = None,
    reference_example: Example | str | dict | UUID | None = None,
    load_child_runs: bool = False,
) -> EvaluationResult

Evaluate a run.

PARAMETER	DESCRIPTION
`run`	The run to evaluate. TYPE: `Run \| RunBase \| str \| UUID`
`evaluator`	The evaluator to use. TYPE: `RunEvaluator`
`source_info`	Additional information about the source of the evaluation to log as feedback metadata. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`
`reference_example`	The example to use as a reference for the evaluation. If not provided, the run's reference example will be used. TYPE: `Example \| str \| dict \| UUID \| None` DEFAULT: `None`
`load_child_runs`	Whether to load child runs when resolving the run ID. TYPE: `bool, default=False` DEFAULT: `False`

RETURNS	DESCRIPTION
`Feedback`	The feedback object created by the evaluation. TYPE: `EvaluationResult`

aevaluate_run `async` ¶

aevaluate_run(
    run: Run | str | UUID,
    evaluator: RunEvaluator,
    *,
    source_info: dict[str, Any] | None = None,
    reference_example: Example | str | dict | UUID | None = None,
    load_child_runs: bool = False,
) -> EvaluationResult

Evaluate a run asynchronously.

PARAMETER	DESCRIPTION
`run`	The run to evaluate. TYPE: `Run \| str \| UUID`
`evaluator`	The evaluator to use. TYPE: `RunEvaluator`
`source_info`	Additional information about the source of the evaluation to log as feedback metadata. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`
`reference_example`	The example to use as a reference for the evaluation. If not provided, the run's reference example will be used. TYPE: `Example \| str \| dict \| UUID \| None` DEFAULT: `None`
`load_child_runs`	Whether to load child runs when resolving the run ID. TYPE: `bool, default=False` DEFAULT: `False`

RETURNS	DESCRIPTION
`EvaluationResult`	The evaluation result object created by the evaluation. TYPE: `EvaluationResult`

create_feedback ¶

create_feedback(
    run_id: ID_TYPE | None = None,
    key: str = "unnamed",
    *,
    score: float | int | bool | None = None,
    value: str | dict | None = None,
    trace_id: ID_TYPE | None = None,
    correction: dict | None = None,
    comment: str | None = None,
    source_info: dict[str, Any] | None = None,
    feedback_source_type: FeedbackSourceType | str = API,
    source_run_id: ID_TYPE | None = None,
    feedback_id: ID_TYPE | None = None,
    feedback_config: FeedbackConfig | None = None,
    stop_after_attempt: int = 10,
    project_id: ID_TYPE | None = None,
    comparative_experiment_id: ID_TYPE | None = None,
    feedback_group_id: ID_TYPE | None = None,
    extra: dict | None = None,
    error: bool | None = None,
    **kwargs: Any,
) -> Feedback

Create feedback for a run.

Note

To enable feedback to be batch uploaded in the background you must specify trace_id. We highly encourage this for latency-sensitive environments.

PARAMETER	DESCRIPTION
`key`	The name of the feedback metric. TYPE: `str` DEFAULT: `'unnamed'`
`score`	The score to rate this run on the metric or aspect. TYPE: `float \| int \| bool \| None` DEFAULT: `None`
`value`	The display value or non-numeric value for this feedback. TYPE: `float \| int \| bool \| str \| dict \| None` DEFAULT: `None`
`run_id`	The ID of the run to provide feedback for. At least one of run_id, trace_id, or project_id must be specified. TYPE: `UUID \| str \| None` DEFAULT: `None`
`trace_id`	The ID of the trace (i.e. root parent run) of the run to provide feedback for (specified by run_id). If run_id and trace_id are the same, only trace_id needs to be specified. NOTE: trace_id is required feedback ingestion to be batched and backgrounded. TYPE: `UUID \| str \| None` DEFAULT: `None`
`correction`	The proper ground truth for this run. TYPE: `dict \| None` DEFAULT: `None`
`comment`	A comment about this feedback, such as a justification for the score or chain-of-thought trajectory for an LLM judge. TYPE: `str \| None` DEFAULT: `None`
`source_info`	Information about the source of this feedback. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`
`feedback_source_type`	The type of feedback source, such as model (for model-generated feedback) or API. TYPE: `FeedbackSourceType \| str` DEFAULT: `API`
`source_run_id`	The ID of the run that generated this feedback, if a "model" type. TYPE: `UUID \| str \| None` DEFAULT: `None`
`feedback_id`	The ID of the feedback to create. If not provided, a random UUID will be generated. TYPE: `UUID \| str \| None` DEFAULT: `None`
`feedback_config`	The configuration specifying how to interpret feedback with this key. Examples include continuous (with min/max bounds), categorical, or freeform. TYPE: `FeedbackConfig \| None` DEFAULT: `None`
`stop_after_attempt`	The number of times to retry the request before giving up. TYPE: `int, default=10` DEFAULT: `10`
`project_id`	The ID of the project (or experiment) to provide feedback on. This is used for creating summary metrics for experiments. Cannot specify run_id or trace_id if project_id is specified, and vice versa. TYPE: `UUID \| str \| None` DEFAULT: `None`
`comparative_experiment_id`	If this feedback was logged as a part of a comparative experiment, this associates the feedback with that experiment. TYPE: `UUID \| str \| None` DEFAULT: `None`
`feedback_group_id`	When logging preferences, ranking runs, or other comparative feedback, this is used to group feedback together. TYPE: `UUID \| str \| None` DEFAULT: `None`
`extra`	Metadata for the feedback. TYPE: `Dict \| None` DEFAULT: `None`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Feedback`	The created feedback object. TYPE: `Feedback`

Example

from langsmith import trace, traceable, Client


@traceable
def foo(x):
    return {"y": x * 2}


@traceable
def bar(y):
    return {"z": y - 1}


client = Client()

inputs = {"x": 1}
with trace(name="foobar", inputs=inputs) as root_run:
    result = foo(**inputs)
    result = bar(**result)
    root_run.outputs = result
    trace_id = root_run.id
    child_runs = root_run.child_runs

# Provide feedback for a trace (a.k.a. a root run)
client.create_feedback(
    key="user_feedback",
    score=1,
    trace_id=trace_id,
)

# Provide feedback for a child run
foo_run_id = [run for run in child_runs if run.name == "foo"][0].id
client.create_feedback(
    key="correctness",
    score=0,
    run_id=foo_run_id,
    # trace_id= is optional but recommended to enable batched and backgrounded
    # feedback ingestion.
    trace_id=trace_id,
)

update_feedback ¶

update_feedback(
    feedback_id: ID_TYPE,
    *,
    score: float | int | bool | None = None,
    value: float | int | bool | str | dict | None = None,
    correction: dict | None = None,
    comment: str | None = None,
) -> None

Update a feedback in the LangSmith API.

PARAMETER	DESCRIPTION
`feedback_id`	The ID of the feedback to update. TYPE: `UUID \| str`
`score`	The score to update the feedback with. TYPE: `float \| int \| bool \| None` DEFAULT: `None`
`value`	The value to update the feedback with. TYPE: `float \| int \| bool \| str \| dict \| None` DEFAULT: `None`
`correction`	The correction to update the feedback with. TYPE: `dict \| None` DEFAULT: `None`
`comment`	The comment to update the feedback with. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`None`	None

read_feedback ¶

read_feedback(feedback_id: ID_TYPE) -> Feedback

Read a feedback from the LangSmith API.

PARAMETER	DESCRIPTION
`feedback_id`	The ID of the feedback to read. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`Feedback`	The feedback. TYPE: `Feedback`

list_feedback ¶

list_feedback(
    *,
    run_ids: Sequence[ID_TYPE] | None = None,
    feedback_key: Sequence[str] | None = None,
    feedback_source_type: Sequence[FeedbackSourceType] | None = None,
    limit: int | None = None,
    **kwargs: Any,
) -> Iterator[Feedback]

List the feedback objects on the LangSmith API.

PARAMETER	DESCRIPTION
`run_ids`	The IDs of the runs to filter by. TYPE: `Sequence[UUID \| str] \| None` DEFAULT: `None`
`feedback_key`	The feedback key(s) to filter by. Examples: 'correctness' The query performs a union of all feedback keys. TYPE: `Sequence[str] \| None` DEFAULT: `None`
`feedback_source_type`	The type of feedback source, such as model or API. TYPE: `Sequence[FeedbackSourceType] \| None` DEFAULT: `None`
`limit`	The maximum number of feedback to return. TYPE: `int \| None` DEFAULT: `None`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

YIELDS	DESCRIPTION
`Feedback`	The feedback objects.

delete_feedback ¶

delete_feedback(feedback_id: ID_TYPE) -> None

Delete a feedback by ID.

PARAMETER	DESCRIPTION
`feedback_id`	The ID of the feedback to delete. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`None`	None

create_feedback_from_token ¶

create_feedback_from_token(
    token_or_url: str | UUID,
    score: float | int | bool | None = None,
    *,
    value: float | int | bool | str | dict | None = None,
    correction: dict | None = None,
    comment: str | None = None,
    metadata: dict | None = None,
) -> None

Create feedback from a presigned token or URL.

PARAMETER	DESCRIPTION
`token_or_url`	The token or URL from which to create feedback. TYPE: `str \| UUID`
`score`	The score of the feedback. TYPE: `float \| int \| bool \| None` DEFAULT: `None`
`value`	The value of the feedback. TYPE: `float \| int \| bool \| str \| dict \| None` DEFAULT: `None`
`correction`	The correction of the feedback. TYPE: `dict \| None` DEFAULT: `None`
`comment`	The comment of the feedback. TYPE: `str \| None` DEFAULT: `None`
`metadata`	Additional metadata for the feedback. TYPE: `dict \| None` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If the source API URL is invalid.

RETURNS	DESCRIPTION
`None`	None

create_presigned_feedback_token ¶

create_presigned_feedback_token(
    run_id: ID_TYPE,
    feedback_key: str,
    *,
    expiration: datetime | timedelta | None = None,
    feedback_config: FeedbackConfig | None = None,
    feedback_id: ID_TYPE | None = None,
) -> FeedbackIngestToken

Create a pre-signed URL to send feedback data to.

This is useful for giving browser-based clients a way to upload feedback data directly to LangSmith without accessing the API key.

PARAMETER	DESCRIPTION
`run_id`	The ID of the run. TYPE: `UUID \| str`
`feedback_key`	The key of the feedback to create. TYPE: `str`
`expiration`	The expiration time of the pre-signed URL. Either a datetime or a timedelta offset from now. Default to 3 hours. TYPE: `datetime \| timedelta \| None` DEFAULT: `None`
`feedback_config`	If creating a feedback_key for the first time, this defines how the metric should be interpreted, such as a continuous score (w/ optional bounds), or distribution over categorical values. TYPE: `FeedbackConfig \| None` DEFAULT: `None`
`feedback_id`	The ID of the feedback to create. If not provided, a new feedback will be created. TYPE: `Optional[Union[UUID, str]` DEFAULT: `None`

RETURNS	DESCRIPTION
`FeedbackIngestToken`	The pre-signed URL for uploading feedback data. TYPE: `FeedbackIngestToken`

create_presigned_feedback_tokens ¶

create_presigned_feedback_tokens(
    run_id: ID_TYPE,
    feedback_keys: Sequence[str],
    *,
    expiration: datetime | timedelta | None = None,
    feedback_configs: Sequence[FeedbackConfig | None] | None = None,
) -> Sequence[FeedbackIngestToken]

Create a pre-signed URL to send feedback data to.

This is useful for giving browser-based clients a way to upload feedback data directly to LangSmith without accessing the API key.

PARAMETER	DESCRIPTION
`run_id`	The ID of the run. TYPE: `UUID \| str`
`feedback_keys`	The key of the feedback to create. TYPE: `Sequence[str]`
`expiration`	The expiration time of the pre-signed URL. Either a datetime or a timedelta offset from now. Default to 3 hours. TYPE: `datetime \| timedelta \| None` DEFAULT: `None`
`feedback_configs`	If creating a feedback_key for the first time, this defines how the metric should be interpreted, such as a continuous score (w/ optional bounds), or distribution over categorical values. TYPE: `Sequence[FeedbackConfig \| None] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Sequence[FeedbackIngestToken]`	Sequence[FeedbackIngestToken]: The pre-signed URL for uploading feedback data.

list_presigned_feedback_tokens ¶

list_presigned_feedback_tokens(
    run_id: ID_TYPE, *, limit: int | None = None
) -> Iterator[FeedbackIngestToken]

List the feedback ingest tokens for a run.

PARAMETER	DESCRIPTION
`run_id`	The ID of the run to filter by. TYPE: `UUID \| str`
`limit`	The maximum number of tokens to return. TYPE: `int \| None` DEFAULT: `None`

YIELDS	DESCRIPTION
`FeedbackIngestToken`	The feedback ingest tokens.

list_feedback_formulas ¶

list_feedback_formulas(
    *,
    dataset_id: ID_TYPE | None = None,
    session_id: ID_TYPE | None = None,
    limit: int | None = None,
    offset: int = 0,
) -> Iterator[FeedbackFormula]

List feedback formulas.

PARAMETER	DESCRIPTION
`dataset_id`	The ID of the dataset to filter by. TYPE: `UUID \| str \| None` DEFAULT: `None`
`session_id`	The ID of the session to filter by. TYPE: `UUID \| str \| None` DEFAULT: `None`
`limit`	The maximum number of feedback formulas to return. TYPE: `int \| None` DEFAULT: `None`
`offset`	The starting offset for pagination. TYPE: `int` DEFAULT: `0`

YIELDS	DESCRIPTION
`FeedbackFormula`	The feedback formulas.

get_feedback_formula_by_id ¶

get_feedback_formula_by_id(feedback_formula_id: ID_TYPE) -> FeedbackFormula

Get a feedback formula by ID.

PARAMETER	DESCRIPTION
`feedback_formula_id`	The ID of the feedback formula to retrieve. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`FeedbackFormula`	The requested feedback formula.

create_feedback_formula ¶

create_feedback_formula(
    *,
    feedback_key: str,
    aggregation_type: Literal["sum", "avg"],
    formula_parts: Sequence[FeedbackFormulaWeightedVariable | dict],
    dataset_id: ID_TYPE | None = None,
    session_id: ID_TYPE | None = None,
) -> FeedbackFormula

Create a feedback formula.

PARAMETER	DESCRIPTION
`feedback_key`	The feedback key for the formula. TYPE: `str`
`aggregation_type`	The aggregation type to use when combining parts. TYPE: `Literal['sum', 'avg']`
`formula_parts`	The weighted feedback keys included in the formula. TYPE: `Sequence[FeedbackFormulaWeightedVariable \| dict]`
`dataset_id`	The dataset to scope the formula to. TYPE: `UUID \| str \| None` DEFAULT: `None`
`session_id`	The session to scope the formula to. TYPE: `UUID \| str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`FeedbackFormula`	The created feedback formula.

update_feedback_formula ¶

update_feedback_formula(
    feedback_formula_id: ID_TYPE,
    *,
    feedback_key: str,
    aggregation_type: Literal["sum", "avg"],
    formula_parts: Sequence[FeedbackFormulaWeightedVariable | dict],
) -> FeedbackFormula

Update a feedback formula.

PARAMETER	DESCRIPTION
`feedback_formula_id`	The ID of the feedback formula to update. TYPE: `UUID \| str`
`feedback_key`	The feedback key for the formula. TYPE: `str`
`aggregation_type`	The aggregation type to use when combining parts. TYPE: `Literal['sum', 'avg']`
`formula_parts`	The weighted feedback keys included in the formula. TYPE: `Sequence[FeedbackFormulaWeightedVariable \| dict]`

RETURNS	DESCRIPTION
`FeedbackFormula`	The updated feedback formula.

delete_feedback_formula ¶

delete_feedback_formula(feedback_formula_id: ID_TYPE) -> None

Delete a feedback formula by ID.

PARAMETER	DESCRIPTION
`feedback_formula_id`	The ID of the feedback formula to delete. TYPE: `UUID \| str`

list_annotation_queues ¶

list_annotation_queues(
    *,
    queue_ids: list[ID_TYPE] | None = None,
    name: str | None = None,
    name_contains: str | None = None,
    limit: int | None = None,
) -> Iterator[AnnotationQueue]

List the annotation queues on the LangSmith API.

PARAMETER	DESCRIPTION
`queue_ids`	The IDs of the queues to filter by. TYPE: `List[UUID \| str] \| None` DEFAULT: `None`
`name`	The name of the queue to filter by. TYPE: `str \| None` DEFAULT: `None`
`name_contains`	The substring that the queue name should contain. TYPE: `str \| None` DEFAULT: `None`
`limit`	The maximum number of queues to return. TYPE: `int \| None` DEFAULT: `None`

YIELDS	DESCRIPTION
`AnnotationQueue`	The annotation queues.

create_annotation_queue ¶

create_annotation_queue(
    *,
    name: str,
    description: str | None = None,
    queue_id: ID_TYPE | None = None,
    rubric_instructions: str | None = None,
) -> AnnotationQueueWithDetails

Create an annotation queue on the LangSmith API.

PARAMETER	DESCRIPTION
`name`	The name of the annotation queue. TYPE: `str`
`description`	The description of the annotation queue. TYPE: `str \| None` DEFAULT: `None`
`queue_id`	The ID of the annotation queue. TYPE: `UUID \| str \| None` DEFAULT: `None`
`rubric_instructions`	The rubric instructions for the annotation queue. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`AnnotationQueue`	The created annotation queue object. TYPE: `AnnotationQueueWithDetails`

read_annotation_queue ¶

read_annotation_queue(queue_id: ID_TYPE) -> AnnotationQueue

Read an annotation queue with the specified queue_id.

PARAMETER	DESCRIPTION
`queue_id`	The ID of the annotation queue to read. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`AnnotationQueue`	The annotation queue object. TYPE: `AnnotationQueue`

update_annotation_queue ¶

update_annotation_queue(
    queue_id: ID_TYPE,
    *,
    name: str,
    description: str | None = None,
    rubric_instructions: str | None = None,
) -> None

Update an annotation queue with the specified queue_id.

PARAMETER	DESCRIPTION
`queue_id`	The ID of the annotation queue to update. TYPE: `UUID \| str`
`name`	The new name for the annotation queue. TYPE: `str`
`description`	The new description for the annotation queue. TYPE: `str \| None` DEFAULT: `None`
`rubric_instructions`	The new rubric instructions for the annotation queue. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`None`	None

delete_annotation_queue ¶

delete_annotation_queue(queue_id: ID_TYPE) -> None

Delete an annotation queue with the specified queue_id.

PARAMETER	DESCRIPTION
`queue_id`	The ID of the annotation queue to delete. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`None`	None

add_runs_to_annotation_queue ¶

add_runs_to_annotation_queue(queue_id: ID_TYPE, *, run_ids: list[ID_TYPE]) -> None

Add runs to an annotation queue with the specified queue_id.

PARAMETER	DESCRIPTION
`queue_id`	The ID of the annotation queue. TYPE: `UUID \| str`
`run_ids`	The IDs of the runs to be added to the annotation queue. TYPE: `List[UUID \| str]`

RETURNS	DESCRIPTION
`None`	None

delete_run_from_annotation_queue ¶

delete_run_from_annotation_queue(queue_id: ID_TYPE, *, run_id: ID_TYPE) -> None

Delete a run from an annotation queue with the specified queue_id and run_id.

PARAMETER	DESCRIPTION
`queue_id`	The ID of the annotation queue. TYPE: `UUID \| str`
`run_id`	The ID of the run to be added to the annotation queue. TYPE: `UUID \| str`

RETURNS	DESCRIPTION
`None`	None

get_run_from_annotation_queue ¶

get_run_from_annotation_queue(
    queue_id: ID_TYPE, *, index: int
) -> RunWithAnnotationQueueInfo

Get a run from an annotation queue at the specified index.

PARAMETER	DESCRIPTION
`queue_id`	The ID of the annotation queue. TYPE: `UUID \| str`
`index`	The index of the run to retrieve. TYPE: `int`

RETURNS	DESCRIPTION
`RunWithAnnotationQueueInfo`	The run at the specified index. TYPE: `RunWithAnnotationQueueInfo`

RAISES	DESCRIPTION
`LangSmithNotFoundError`	If the run is not found at the given index.
`LangSmithError`	For other API-related errors.

create_comparative_experiment ¶

create_comparative_experiment(
    name: str,
    experiments: Sequence[ID_TYPE],
    *,
    reference_dataset: ID_TYPE | None = None,
    description: str | None = None,
    created_at: datetime | None = None,
    metadata: dict[str, Any] | None = None,
    id: ID_TYPE | None = None,
) -> ComparativeExperiment

Create a comparative experiment on the LangSmith API.

These experiments compare 2 or more experiment results over a shared dataset.

PARAMETER	DESCRIPTION
`name`	The name of the comparative experiment. TYPE: `str`
`experiments`	The IDs of the experiments to compare. TYPE: `Sequence[UUID \| str]`
`reference_dataset`	The ID of the dataset these experiments are compared on. TYPE: `UUID \| str \| None` DEFAULT: `None`
`description`	The description of the comparative experiment. TYPE: `str \| None` DEFAULT: `None`
`created_at`	The creation time of the comparative experiment. TYPE: `datetime \| None` DEFAULT: `None`
`metadata`	Additional metadata for the comparative experiment. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`
`id`	The ID of the comparative experiment. TYPE: `UUID \| str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`ComparativeExperiment`	The created comparative experiment object. TYPE: `ComparativeExperiment`

arun_on_dataset `async` ¶

arun_on_dataset(
    dataset_name: str,
    llm_or_chain_factory: Any,
    *,
    evaluation: Any | None = None,
    concurrency_level: int = 5,
    project_name: str | None = None,
    project_metadata: dict[str, Any] | None = None,
    dataset_version: datetime | str | None = None,
    verbose: bool = False,
    input_mapper: Callable[[dict], Any] | None = None,
    revision_id: str | None = None,
    **kwargs: Any,
) -> dict[str, Any]

Asynchronously run the Chain or language model on a dataset.

.. deprecated:: 0.1.0

This method is deprecated. Use :func:langsmith.aevaluate instead.

run_on_dataset ¶

run_on_dataset(
    dataset_name: str,
    llm_or_chain_factory: Any,
    *,
    evaluation: Any | None = None,
    concurrency_level: int = 5,
    project_name: str | None = None,
    project_metadata: dict[str, Any] | None = None,
    dataset_version: datetime | str | None = None,
    verbose: bool = False,
    input_mapper: Callable[[dict], Any] | None = None,
    revision_id: str | None = None,
    **kwargs: Any,
) -> dict[str, Any]

Run the Chain or language model on a dataset.

.. deprecated:: 0.1.0

This method is deprecated. Use :func:langsmith.aevaluate instead.

like_prompt ¶

like_prompt(prompt_identifier: str) -> dict[str, int]

Like a prompt.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`dict[str, int]`	Dict[str, int]: A dictionary with the key 'likes' and the count of likes as the value.

unlike_prompt ¶

unlike_prompt(prompt_identifier: str) -> dict[str, int]

Unlike a prompt.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`dict[str, int]`	Dict[str, int]: A dictionary with the key 'likes' and the count of likes as the value.

list_prompts ¶

list_prompts(
    *,
    limit: int = 100,
    offset: int = 0,
    is_public: bool | None = None,
    is_archived: bool | None = False,
    sort_field: PromptSortField = updated_at,
    sort_direction: Literal["desc", "asc"] = "desc",
    query: str | None = None,
) -> ListPromptsResponse

List prompts with pagination.

PARAMETER	DESCRIPTION
`limit`	The maximum number of prompts to return. Defaults to 100. TYPE: `int, default=100` DEFAULT: `100`
`offset`	The number of prompts to skip. Defaults to 0. TYPE: `int, default=0` DEFAULT: `0`
`is_public`	Filter prompts by if they are public. TYPE: `bool \| None` DEFAULT: `None`
`is_archived`	Filter prompts by if they are archived. TYPE: `bool \| None` DEFAULT: `False`
`sort_field`	The field to sort by. Defaults to "updated_at". TYPE: `PromptSortField` DEFAULT: `updated_at`
`sort_direction`	The order to sort by. Defaults to "desc". TYPE: `Literal["desc", "asc"], default="desc"` DEFAULT: `'desc'`
`query`	Filter prompts by a search query. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`ListPromptsResponse`	A response object containing TYPE: `ListPromptsResponse`
`ListPromptsResponse`	the list of prompts.

get_prompt ¶

get_prompt(prompt_identifier: str) -> Prompt | None

Get a specific prompt by its identifier.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt. The identifier should be in the format "prompt_name" or "owner/prompt_name". TYPE: `str`

RETURNS	DESCRIPTION
`Prompt \| None`	Optional[Prompt]: The prompt object.

RAISES	DESCRIPTION
`HTTPError`	If the prompt is not found or another error occurs.

create_prompt ¶

create_prompt(
    prompt_identifier: str,
    *,
    description: str | None = None,
    readme: str | None = None,
    tags: Sequence[str] | None = None,
    is_public: bool = False,
) -> Prompt

Create a new prompt.

Does not attach prompt object, just creates an empty prompt.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt. The identifier should be in the formatof owner/name:hash, name:hash, owner/name, or name TYPE: `str`
`description`	A description of the prompt. TYPE: `str \| None` DEFAULT: `None`
`readme`	A readme for the prompt. TYPE: `str \| None` DEFAULT: `None`
`tags`	A list of tags for the prompt. TYPE: `Sequence[str] \| None` DEFAULT: `None`
`is_public`	Whether the prompt should be public. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`Prompt`	The created prompt object. TYPE: `Prompt`

RAISES	DESCRIPTION
`ValueError`	If the current tenant is not the owner.
`HTTPError`	If the server request fails.

create_commit ¶

create_commit(
    prompt_identifier: str, object: Any, *, parent_commit_hash: str | None = None
) -> str

Create a commit for an existing prompt.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt. TYPE: `str`
`object`	The LangChain object to commit. TYPE: `Any`
`parent_commit_hash`	The hash of the parent commit. Defaults to latest commit. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	The url of the prompt commit. TYPE: `str`

RAISES	DESCRIPTION
`HTTPError`	If the server request fails.
`ValueError`	If the prompt does not exist.

update_prompt ¶

update_prompt(
    prompt_identifier: str,
    *,
    description: str | None = None,
    readme: str | None = None,
    tags: Sequence[str] | None = None,
    is_public: bool | None = None,
    is_archived: bool | None = None,
) -> dict[str, Any]

Update a prompt's metadata.

To update the content of a prompt, use push_prompt or create_commit instead.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt to update. TYPE: `str`
`description`	New description for the prompt. TYPE: `str \| None` DEFAULT: `None`
`readme`	New readme for the prompt. TYPE: `str \| None` DEFAULT: `None`
`tags`	New list of tags for the prompt. TYPE: `Sequence[str] \| None` DEFAULT: `None`
`is_public`	New public status for the prompt. TYPE: `bool \| None` DEFAULT: `None`
`is_archived`	New archived status for the prompt. TYPE: `bool \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict[str, Any]`	Dict[str, Any]: The updated prompt data as returned by the server.

RAISES	DESCRIPTION
`ValueError`	If the prompt_identifier is empty.
`HTTPError`	If the server request fails.

delete_prompt ¶

delete_prompt(prompt_identifier: str) -> None

Delete a prompt.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt to delete. TYPE: `str`

RETURNS	DESCRIPTION
`bool`	True if the prompt was successfully deleted, False otherwise. TYPE: `None`

RAISES	DESCRIPTION
`ValueError`	If the current tenant is not the owner of the prompt.

pull_prompt_commit ¶

pull_prompt_commit(
    prompt_identifier: str, *, include_model: bool | None = False
) -> PromptCommit

Pull a prompt object from the LangSmith API.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`PromptCommit`	The prompt object. TYPE: `PromptCommit`

RAISES	DESCRIPTION
`ValueError`	If no commits are found for the prompt.

list_prompt_commits ¶

list_prompt_commits(
    prompt_identifier: str,
    *,
    limit: int | None = None,
    offset: int = 0,
    include_model: bool = False,
) -> Iterator[ListedPromptCommit]

List commits for a given prompt.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt in the format 'owner/repo_name'. TYPE: `str`
`limit`	The maximum number of commits to return. If None, returns all commits. TYPE: `int \| None` DEFAULT: `None`
`offset`	The number of commits to skip before starting to return results. TYPE: `int, default=0` DEFAULT: `0`
`include_model`	Whether to include the model information in the commit data. TYPE: `bool, default=False` DEFAULT: `False`

YIELDS	DESCRIPTION
`ListedPromptCommit`	A ListedPromptCommit object for each commit.

Note

This method uses pagination to retrieve commits. It will make multiple API calls if necessary to retrieve all commits or up to the specified limit.

pull_prompt ¶

pull_prompt(prompt_identifier: str, *, include_model: bool | None = False) -> Any

Pull a prompt and return it as a LangChain PromptTemplate.

This method requires langchain-core.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt. TYPE: `str`
`include_model`	Whether to include the model information in the prompt data. TYPE: `bool \| None` DEFAULT: `False`

RETURNS	DESCRIPTION
`Any`	The prompt object in the specified format. TYPE: `Any`

push_prompt ¶

push_prompt(
    prompt_identifier: str,
    *,
    object: Any | None = None,
    parent_commit_hash: str = "latest",
    is_public: bool | None = None,
    description: str | None = None,
    readme: str | None = None,
    tags: Sequence[str] | None = None,
) -> str

Push a prompt to the LangSmith API.

Can be used to update prompt metadata or prompt content.

If the prompt does not exist, it will be created. If the prompt exists, it will be updated.

PARAMETER	DESCRIPTION
`prompt_identifier`	The identifier of the prompt. TYPE: `str`
`object`	The LangChain object to push. TYPE: `Any \| None` DEFAULT: `None`
`parent_commit_hash`	The parent commit hash. Defaults to "latest". TYPE: `str` DEFAULT: `'latest'`
`is_public`	Whether the prompt should be public. If None (default), the current visibility status is maintained for existing prompts. For new prompts, None defaults to private. Set to True to make public, or False to make private. TYPE: `bool \| None` DEFAULT: `None`
`description`	A description of the prompt. Defaults to an empty string. TYPE: `str \| None` DEFAULT: `None`
`readme`	A readme for the prompt. Defaults to an empty string. TYPE: `str \| None` DEFAULT: `None`
`tags`	A list of tags for the prompt. Defaults to an empty list. TYPE: `Sequence[str] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	The URL of the prompt. TYPE: `str`

cleanup ¶

cleanup() -> None

Manually trigger cleanup of the background thread.

evaluate ¶

evaluate(
    target: TARGET_T | Runnable | EXPERIMENT_T | tuple[EXPERIMENT_T, EXPERIMENT_T],
    /,
    data: DATA_T | None = None,
    evaluators: Sequence[EVALUATOR_T] | Sequence[COMPARATIVE_EVALUATOR_T] | None = None,
    summary_evaluators: Sequence[SUMMARY_EVALUATOR_T] | None = None,
    metadata: dict | None = None,
    experiment_prefix: str | None = None,
    description: str | None = None,
    max_concurrency: int | None = 0,
    num_repetitions: int = 1,
    blocking: bool = True,
    experiment: EXPERIMENT_T | None = None,
    upload_results: bool = True,
    error_handling: Literal["log", "ignore"] = "log",
    **kwargs: Any,
) -> ExperimentResults | ComparativeExperimentResults

Evaluate a target system on a given dataset.

PARAMETER	DESCRIPTION
`target`	The target system or experiment(s) to evaluate. Can be a function that takes a `dict` and returns a `dict`, a langchain `Runnable`, an existing experiment ID, or a two-tuple of experiment IDs. TYPE: `TARGET_T \| Runnable \| EXPERIMENT_T \| Tuple[EXPERIMENT_T, EXPERIMENT_T]`
`data`	The dataset to evaluate on. Can be a dataset name, a list of examples, or a generator of examples. TYPE: `DATA_T` DEFAULT: `None`
`evaluators`	A list of evaluators to run on each example. The evaluator signature depends on the target type. Default to None. TYPE: `Sequence[EVALUATOR_T] \| Sequence[COMPARATIVE_EVALUATOR_T] \| None` DEFAULT: `None`
`summary_evaluators`	A list of summary evaluators to run on the entire dataset. Should not be specified if comparing two existing experiments. TYPE: `Sequence[SUMMARY_EVALUATOR_T] \| None` DEFAULT: `None`
`metadata`	Metadata to attach to the experiment. TYPE: `dict \| None` DEFAULT: `None`
`experiment_prefix`	A prefix to provide for your experiment name. TYPE: `str \| None` DEFAULT: `None`
`description`	A free-form text description for the experiment. TYPE: `str \| None` DEFAULT: `None`
`max_concurrency`	The maximum number of concurrent evaluations to run. If `None` then no limit is set. If `0` then no concurrency. TYPE: `Optional[int], default=0` DEFAULT: `0`
`blocking`	Whether to block until the evaluation is complete. TYPE: `bool, default=True` DEFAULT: `True`
`num_repetitions`	The number of times to run the evaluation. Each item in the dataset will be run and evaluated this many times. Defaults to 1. TYPE: `int, default=1` DEFAULT: `1`
`experiment`	An existing experiment to extend. If provided, `experiment_prefix` is ignored. For advanced usage only. Should not be specified if target is an existing experiment or two-tuple fo experiments. TYPE: `EXPERIMENT_T \| None` DEFAULT: `None`
`upload_results`	Whether to upload the results to LangSmith. TYPE: `bool, default=True` DEFAULT: `True`
`error_handling`	How to handle individual run errors. `'log'` will trace the runs with the error message as part of the experiment, `'ignore'` will not count the run as part of the experiment at all. TYPE: `str, default="log"` DEFAULT: `'log'`
`**kwargs`	Additional keyword arguments to pass to the evaluator. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`ExperimentResults`	If target is a function, Runnable, or existing experiment. TYPE: `ExperimentResults \| ComparativeExperimentResults`
`ComparativeExperimentResults`	If target is a two-tuple of existing experiments. TYPE: `ExperimentResults \| ComparativeExperimentResults`

Examples:

Prepare the dataset:

from langsmith import Client

client = Client()
dataset = client.clone_public_dataset(
    "https://smith.langchain.com/public/419dcab2-1d66-4b94-8901-0357ead390df/d"
)
dataset_name = "Evaluate Examples"

Basic usage:

def accuracy(outputs: dict, reference_outputs: dict) -> dict:
    # Row-level evaluator for accuracy.
    pred = outputs["response"]
    expected = reference_outputs["answer"]
    return {"score": expected.lower() == pred.lower()}

def precision(outputs: list[dict], reference_outputs: list[dict]) -> dict:
    # Experiment-level evaluator for precision.
    # TP / (TP + FP)
    predictions = [out["response"].lower() for out in outputs]
    expected = [ref["answer"].lower() for ref in reference_outputs]
    # yes and no are the only possible answers
    tp = sum([p == e for p, e in zip(predictions, expected) if p == "yes"])
    fp = sum([p == "yes" and e == "no" for p, e in zip(predictions, expected)])
    return {"score": tp / (tp + fp)}


def predict(inputs: dict) -> dict:
    # This can be any function or just an API call to your app.
    return {"response": "Yes"}


results = client.evaluate(
    predict,
    data=dataset_name,
    evaluators=[accuracy],
    summary_evaluators=[precision],
    experiment_prefix="My Experiment",
    description="Evaluating the accuracy of a simple prediction model.",
    metadata={
        "my-prompt-version": "abcd-1234",
    },
)

Evaluating over only a subset of the examples

experiment_name = results.experiment_name
examples = client.list_examples(dataset_name=dataset_name, limit=5)
results = client.evaluate(
    predict,
    data=examples,
    evaluators=[accuracy],
    summary_evaluators=[precision],
    experiment_prefix="My Experiment",
    description="Just testing a subset synchronously.",
)

Streaming each prediction to more easily + eagerly debug.

results = client.evaluate(
    predict,
    data=dataset_name,
    evaluators=[accuracy],
    summary_evaluators=[precision],
    description="I don't even have to block!",
    blocking=False,
)
for i, result in enumerate(results):  # doctest: +ELLIPSIS
    pass

Using the evaluate API with an off-the-shelf LangChain evaluator:

from langsmith.evaluation import LangChainStringEvaluator
from langchain.chat_models import init_chat_model


def prepare_criteria_data(run: Run, example: Example):
    return {
        "prediction": run.outputs["output"],
        "reference": example.outputs["answer"],
        "input": str(example.inputs),
    }


results = client.evaluate(
    predict,
    data=dataset_name,
    evaluators=[
        accuracy,
        LangChainStringEvaluator("embedding_distance"),
        LangChainStringEvaluator(
            "labeled_criteria",
            config={
                "criteria": {
                    "usefulness": "The prediction is useful if it is correct"
                    " and/or asks a useful followup question."
                },
                "llm": init_chat_model("gpt-4o"),
            },
            prepare_data=prepare_criteria_data,
        ),
    ],
    description="Evaluating with off-the-shelf LangChain evaluators.",
    summary_evaluators=[precision],
)

View the evaluation results for experiment:... Evaluating a LangChain object:

from langchain_core.runnables import chain as as_runnable


@as_runnable
def nested_predict(inputs):
    return {"response": "Yes"}


@as_runnable
def lc_predict(inputs):
    return nested_predict.invoke(inputs)


results = client.evaluate(
    lc_predict,
    data=dataset_name,
    evaluators=[accuracy],
    description="This time we're evaluating a LangChain object.",
    summary_evaluators=[precision],
)

Comparative evaluation:

results = client.evaluate(
    # The target is a tuple of the experiment IDs to compare
    target=(
        "12345678-1234-1234-1234-123456789012",
        "98765432-1234-1234-1234-123456789012",
    ),
    evaluators=[accuracy],
    summary_evaluators=[precision],
)

Evaluate an existing experiment:

results = client.evaluate(
    # The target is the ID of the experiment we are evaluating
    target="12345678-1234-1234-1234-123456789012",
    evaluators=[accuracy],
    summary_evaluators=[precision],
)

Added in langsmith 0.2.0

aevaluate `async` ¶

aevaluate(
    target: ATARGET_T | AsyncIterable[dict] | Runnable | str | UUID | TracerSession,
    /,
    data: DATA_T | AsyncIterable[Example] | Iterable[Example] | None = None,
    evaluators: Sequence[EVALUATOR_T | AEVALUATOR_T] | None = None,
    summary_evaluators: Sequence[SUMMARY_EVALUATOR_T] | None = None,
    metadata: dict | None = None,
    experiment_prefix: str | None = None,
    description: str | None = None,
    max_concurrency: int | None = 0,
    num_repetitions: int = 1,
    blocking: bool = True,
    experiment: TracerSession | str | UUID | None = None,
    upload_results: bool = True,
    error_handling: Literal["log", "ignore"] = "log",
    **kwargs: Any,
) -> AsyncExperimentResults

Evaluate an async target system on a given dataset.

PARAMETER	DESCRIPTION
`target`	The target system or experiment(s) to evaluate. Can be an async function that takes a `dict` and returns a `dict`, a langchain `Runnable`, an existing experiment ID, or a two-tuple of experiment IDs. TYPE: `ATARGET_T \| AsyncIterable[dict] \| Runnable \| str \| UUID \| TracerSession`
`data`	The dataset to evaluate on. Can be a dataset name, a list of examples, an async generator of examples, or an async iterable of examples. TYPE: `DATA_T \| AsyncIterable[Example]` DEFAULT: `None`
`evaluators`	A list of evaluators to run on each example. TYPE: `Sequence[EVALUATOR_T] \| None` DEFAULT: `None`
`summary_evaluators`	A list of summary evaluators to run on the entire dataset. TYPE: `Sequence[SUMMARY_EVALUATOR_T] \| None` DEFAULT: `None`
`metadata`	Metadata to attach to the experiment. TYPE: `dict \| None` DEFAULT: `None`
`experiment_prefix`	A prefix to provide for your experiment name. TYPE: `str \| None` DEFAULT: `None`
`description`	A description of the experiment. TYPE: `str \| None` DEFAULT: `None`
`max_concurrency`	The maximum number of concurrent evaluations to run. If `None` then no limit is set. If `0` then no concurrency. TYPE: `Optional[int], default=0` DEFAULT: `0`
`num_repetitions`	The number of times to run the evaluation. Each item in the dataset will be run and evaluated this many times. Defaults to 1. TYPE: `int, default=1` DEFAULT: `1`
`blocking`	Whether to block until the evaluation is complete. TYPE: `bool, default=True` DEFAULT: `True`
`experiment`	An existing experiment to extend. If provided, `experiment_prefix` is ignored. For advanced usage only. TYPE: `TracerSession \| None` DEFAULT: `None`
`upload_results`	Whether to upload the results to LangSmith. TYPE: `bool, default=True` DEFAULT: `True`
`error_handling`	How to handle individual run errors. `'log'` will trace the runs with the error message as part of the experiment, `'ignore'` will not count the run as part of the experiment at all. TYPE: `str, default="log"` DEFAULT: `'log'`
`**kwargs`	Additional keyword arguments to pass to the evaluator. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`AsyncExperimentResults`	An async iterator over the experiment results.

Environment

LANGSMITH_TEST_CACHE: If set, API calls will be cached to disk to save time and cost during testing.

Recommended to commit the cache files to your repository for faster CI/CD runs.

Requires the 'langsmith[vcr]' package to be installed.

Examples:

Prepare the dataset:

import asyncio
from langsmith import Client

client = Client()
dataset = client.clone_public_dataset(
    "https://smith.langchain.com/public/419dcab2-1d66-4b94-8901-0357ead390df/d"
)
dataset_name = "Evaluate Examples"

Basic usage:

def accuracy(outputs: dict, reference_outputs: dict) -> dict:
    # Row-level evaluator for accuracy.
    pred = outputs["resposen"]
    expected = reference_outputs["answer"]
    return {"score": expected.lower() == pred.lower()}


def precision(outputs: list[dict], reference_outputs: list[dict]) -> dict:
    # Experiment-level evaluator for precision.
    # TP / (TP + FP)
    predictions = [out["response"].lower() for out in outputs]
    expected = [ref["answer"].lower() for ref in reference_outputs]
    # yes and no are the only possible answers
    tp = sum([p == e for p, e in zip(predictions, expected) if p == "yes"])
    fp = sum([p == "yes" and e == "no" for p, e in zip(predictions, expected)])
    return {"score": tp / (tp + fp)}


async def apredict(inputs: dict) -> dict:
    # This can be any async function or just an API call to your app.
    await asyncio.sleep(0.1)
    return {"response": "Yes"}


results = asyncio.run(
    client.aevaluate(
        apredict,
        data=dataset_name,
        evaluators=[accuracy],
        summary_evaluators=[precision],
        experiment_prefix="My Experiment",
        description="Evaluate the accuracy of the model asynchronously.",
        metadata={
            "my-prompt-version": "abcd-1234",
        },
    )
)

Evaluating over only a subset of the examples using an async generator:

async def example_generator():
    examples = client.list_examples(dataset_name=dataset_name, limit=5)
    for example in examples:
        yield example


results = asyncio.run(
    client.aevaluate(
        apredict,
        data=example_generator(),
        evaluators=[accuracy],
        summary_evaluators=[precision],
        experiment_prefix="My Subset Experiment",
        description="Evaluate a subset of examples asynchronously.",
    )
)

Streaming each prediction to more easily + eagerly debug.

results = asyncio.run(
    client.aevaluate(
        apredict,
        data=dataset_name,
        evaluators=[accuracy],
        summary_evaluators=[precision],
        experiment_prefix="My Streaming Experiment",
        description="Streaming predictions for debugging.",
        blocking=False,
    )
)


async def aenumerate(iterable):
    async for elem in iterable:
        print(elem)


asyncio.run(aenumerate(results))

Running without concurrency:

results = asyncio.run(
    client.aevaluate(
        apredict,
        data=dataset_name,
        evaluators=[accuracy],
        summary_evaluators=[precision],
        experiment_prefix="My Experiment Without Concurrency",
        description="This was run without concurrency.",
        max_concurrency=0,
    )
)

Using Async evaluators:

async def helpfulness(outputs: dict) -> dict:
    # Row-level evaluator for helpfulness.
    await asyncio.sleep(5)  # Replace with your LLM API call
    return {"score": outputs["output"] == "Yes"}


results = asyncio.run(
    client.aevaluate(
        apredict,
        data=dataset_name,
        evaluators=[helpfulness],
        summary_evaluators=[precision],
        experiment_prefix="My Helpful Experiment",
        description="Applying async evaluators example.",
    )
)

Evaluate an existing experiment:

results = asyncio.run(
    client.aevaluate(
        # The target is the ID of the experiment we are evaluating
        target="419dcab2-1d66-4b94-8901-0357ead390df",
        evaluators=[accuracy, helpfulness],
        summary_evaluators=[precision],
    )
)

Added in langsmith 0.2.0

get_experiment_results ¶

get_experiment_results(
    name: str | None = None,
    project_id: UUID | None = None,
    preview: bool = False,
    comparative_experiment_id: UUID | None = None,
    filters: dict[UUID, list[str]] | None = None,
    limit: int | None = None,
) -> ExperimentResults

Get results for an experiment, including experiment session aggregated stats and experiment runs for each dataset example.

Experiment results may not be available immediately after the experiment is created.

PARAMETER	DESCRIPTION
`name`	The experiment name. TYPE: `str \| None` DEFAULT: `None`
`project_id`	Experiment's tracing project id, also called session_id, can be found in the url of the LS experiment page TYPE: `UUID \| None` DEFAULT: `None`
`preview`	Whether to return lightweight preview data only. When True, fetches inputs_preview/outputs_preview summaries instead of full inputs/outputs from S3 storage. Faster and less bandwidth. TYPE: `bool` DEFAULT: `False`
`comparative_experiment_id`	Optional comparative experiment UUID for pairwise comparison experiment results. TYPE: `UUID \| None` DEFAULT: `None`
`filters`	Optional filters to apply to results TYPE: `dict[UUID, list[str]] \| None` DEFAULT: `None`
`limit`	Maximum number of results to return TYPE: `int \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`ExperimentResults`	ExperimentResults with: - feedback_stats: Combined feedback statistics including session-level feedback - run_stats: Aggregated run statistics (latency, tokens, cost, etc.) - examples_with_runs: Iterator of ExampleWithRuns

RAISES	DESCRIPTION
`ValueError`	If project not found for the given session_id

Example

client = Client()
results = client.get_experiment_results(
    project_id="037ae90f-f297-4926-b93c-37d8abf6899f",
)
for example_with_runs in results["examples_with_runs"]:
    print(example_with_runs.dict())

# Access aggregated experiment statistics
print(f"Total runs: {results['run_stats']['run_count']}")
print(f"Total cost: {results['run_stats']['total_cost']}")
print(f"P50 latency: {results['run_stats']['latency_p50']}")

# Access feedback statistics
print(f"Feedback stats: {results['feedback_stats']}")

generate_insights ¶

generate_insights(
    *,
    chat_histories: list[list[dict]],
    instructions: str = DEFAULT_INSTRUCTIONS,
    name: str | None = None,
    model: Literal["openai", "anthropic"] | None = None,
    openai_api_key: str | None = None,
    anthropic_api_key: str | None = None,
) -> InsightsReport

Generate Insights over your agent chat histories.

Note

Only available to Plus and higher tier LangSmith users.
Insights Agent uses user's model API key. The cost of the report grows linearly with the number of chat histories you upload and the size of each history. For more see insights.
This method will upload your chat histories as traces to LangSmith.
If you pass in a model API key this will be set as a workspace secret meaning it will be usedin for evaluators and the playground.

PARAMETER	DESCRIPTION
`chat_histories`	A list of chat histories. Each chat history should be a list of messages. We recommend formatting these as OpenAI messages with a "role" and "content" key. Max length 1000 items. TYPE: `list[list[dict]]`
`instructions`	Instructions for the Insights agent. Should focus on what your agent does and what types of insights you want to generate. TYPE: `str` DEFAULT: `DEFAULT_INSTRUCTIONS`
`name`	Name for the generated Insights report. TYPE: `str \| None` DEFAULT: `None`
`model`	Whether to use OpenAI or Anthropic models. This will impact the cost of generating the Insights Report. TYPE: `Literal['openai', 'anthropic'] \| None` DEFAULT: `None`
`openai_api_key`	OpenAI API key to use. Only needed if you have not already stored this in LangSmith as a workspace secret. TYPE: `str \| None` DEFAULT: `None`
`anthropic_api_key`	Anthropic API key to use. Only needed if you have not already stored this in LangSmith as a workspace secret. TYPE: `str \| None` DEFAULT: `None`

Example

import os
from langsmith import Client

client = client()

chat_histories = [
    [
        {"role": "user", "content": "how are you"},
        {"role": "assistant", "content": "good!"},
    ],
    [
        {"role": "user", "content": "do you like art"},
        {"role": "assistant", "content": "only Tarkovsky"},
    ],
]

report = client.generate_insights(
    chat_histories=chat_histories,
    name="Conversation Topics",
    instructions="What are the high-level topics of conversations users are having with the assistant?",
    openai_api_key=os.environ["OPENAI_API_KEY"],
)

# client.poll_insights(report=report)

poll_insights ¶

poll_insights(
    *,
    report: InsightsReport | None = None,
    id: str | UUID | None = None,
    project_id: str | UUID | None = None,
    rate: int = 30,
    timeout: int = 30 * 60,
    verbose: bool = False,
) -> InsightsReport

Poll the status of an Insights report.

PARAMETER	DESCRIPTION
`report`	THe InsightsReport. TYPE: `InsightsReport \| None` DEFAULT: `None`
`id`	The Insights report ID. Should only specify if 'report' is not specified. TYPE: `str \| UUID \| None` DEFAULT: `None`
`project_id`	The Tracing project ID. Should only specify if 'report' is not specified. TYPE: `str \| UUID \| None` DEFAULT: `None`

close_session ¶

close_session(session: Session) -> None

Close the session.

PARAMETER	DESCRIPTION
`session`	The session to close. TYPE: `Session`

convert_prompt_to_openai_format ¶

convert_prompt_to_openai_format(
    messages: Any, model_kwargs: dict[str, Any] | None = None
) -> dict

Convert a prompt to OpenAI format.

Requires the langchain_openai package to be installed.

PARAMETER	DESCRIPTION
`messages`	The messages to convert. TYPE: `Any`
`model_kwargs`	Model configuration arguments including `stop` and any other required arguments. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict`	The prompt in OpenAI format. TYPE: `dict`

RAISES	DESCRIPTION
`ImportError`	If the `langchain_openai` package is not installed.
`LangSmithError`	If there is an error during the conversion process.

convert_prompt_to_anthropic_format ¶

convert_prompt_to_anthropic_format(
    messages: Any, model_kwargs: dict[str, Any] | None = None
) -> dict

Convert a prompt to Anthropic format.

Requires the langchain_anthropic package to be installed.

PARAMETER	DESCRIPTION
`messages`	The messages to convert. TYPE: `Any`
`model_kwargs`	Model configuration arguments including `model_name` and `stop`. TYPE: `Dict[str, Any] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict`	The prompt in Anthropic format. TYPE: `dict`

dump_model ¶

dump_model(model) -> dict[str, Any]

Dump model depending on pydantic version.

prep_obj_for_push ¶

prep_obj_for_push(obj: Any) -> Any

Format the object so its Prompt Hub compatible.

Client

client ¶

Client ¶

api_key property writable ¶

workspace_id property writable ¶

info property ¶

__init__ ¶

__repr__ ¶

request_with_retries ¶

upload_dataframe ¶

upload_csv ¶

create_run ¶

batch_ingest_runs ¶

multipart_ingest ¶

update_run ¶

flush_compressed_traces ¶

flush ¶

read_run ¶

list_runs ¶

get_run_stats ¶

get_run_url ¶

share_run ¶

unshare_run ¶

read_run_shared_link ¶

run_is_shared ¶

read_shared_run ¶

list_shared_runs ¶

read_dataset_shared_schema ¶

share_dataset ¶

unshare_dataset ¶

read_shared_dataset ¶

list_shared_examples ¶

list_shared_projects ¶

create_project ¶

update_project ¶

read_project ¶

has_project ¶

get_test_results ¶

list_projects ¶

delete_project ¶

create_dataset ¶

has_dataset ¶

read_dataset ¶

diff_dataset_versions ¶

read_dataset_openai_finetuning ¶

list_datasets ¶

delete_dataset ¶

update_dataset_tag ¶

list_dataset_versions ¶

read_dataset_version ¶

clone_public_dataset ¶

create_llm_example ¶

create_chat_example ¶

create_example_from_run ¶

update_examples_multipart ¶

upload_examples_multipart ¶

upsert_examples_multipart ¶

create_examples ¶

create_example ¶

read_example ¶

list_examples ¶

index_dataset ¶

sync_indexed_dataset ¶

similar_examples ¶

update_example ¶

update_examples ¶

delete_example ¶

delete_examples ¶

Parameters¶

list_dataset_splits ¶

update_dataset_splits ¶

evaluate_run ¶

aevaluate_run async ¶

create_feedback ¶

update_feedback ¶

read_feedback ¶

list_feedback ¶

delete_feedback ¶

create_feedback_from_token ¶

create_presigned_feedback_token ¶

api_key `property` `writable` ¶

workspace_id `property` `writable` ¶

info `property` ¶

init ¶

repr ¶

aevaluate_run `async` ¶

arun_on_dataset `async` ¶

aevaluate `async` ¶