
Client

client

Client for interacting with the LangSmith API.

Use the client to customize API keys / workspace connections, SSL certs, etc. for tracing.

Also used to create, read, update, and delete LangSmith resources such as runs (~trace spans), datasets, examples (~records), feedback (~metrics), projects (tracer sessions/groups), etc.

For detailed API documentation, visit the LangSmith docs.

FUNCTION DESCRIPTION
close_session

Close the session.

convert_prompt_to_openai_format

Convert a prompt to OpenAI format.

convert_prompt_to_anthropic_format

Convert a prompt to Anthropic format.

dump_model

Dump model depending on pydantic version.

prep_obj_for_push

Format the object so it is Prompt Hub compatible.

Client

Client for interacting with the LangSmith API.

METHOD DESCRIPTION
__init__

Initialize a Client instance.

__repr__

Return a string representation of the instance with a link to the URL.

request_with_retries

Send a request with retries.

upload_dataframe

Upload a dataframe as individual examples to the LangSmith API.

upload_csv

Upload a CSV file to the LangSmith API.

create_run

Persist a run to the LangSmith API.

batch_ingest_runs

Batch ingest/upsert multiple runs in the LangSmith system.

multipart_ingest

Batch ingest/upsert multiple runs in the LangSmith system.

update_run

Update a run in the LangSmith API.

flush_compressed_traces

Force flush the currently buffered compressed runs.

flush

Flush either queue or compressed buffer, depending on mode.

read_run

Read a run from the LangSmith API.

list_runs

List runs from the LangSmith API.

get_run_stats

Get aggregate statistics over queried runs.

get_run_url

Get the URL for a run.

share_run

Get a share link for a run.

unshare_run

Delete share link for a run.

read_run_shared_link

Retrieve the shared link for a specific run.

run_is_shared

Get share state for a run.

read_shared_run

Get shared runs.

list_shared_runs

Get shared runs.

read_dataset_shared_schema

Retrieve the shared schema of a dataset.

share_dataset

Get a share link for a dataset.

unshare_dataset

Delete share link for a dataset.

read_shared_dataset

Get shared datasets.

list_shared_examples

Get shared examples.

list_shared_projects

List shared projects.

create_project

Create a project on the LangSmith API.

update_project

Update a LangSmith project.

read_project

Read a project from the LangSmith API.

has_project

Check if a project exists.

get_test_results

Read the record-level information from an experiment into a pandas DataFrame.

list_projects

List projects from the LangSmith API.

delete_project

Delete a project from LangSmith.

create_dataset

Create a dataset in the LangSmith API.

has_dataset

Check whether a dataset exists in your tenant.

read_dataset

Read a dataset from the LangSmith API.

diff_dataset_versions

Get the difference between two versions of a dataset.

read_dataset_openai_finetuning

Download a dataset in OpenAI Jsonl format and load it as a list of dicts.

list_datasets

List the datasets on the LangSmith API.

delete_dataset

Delete a dataset from the LangSmith API.

update_dataset_tag

Update the tags of a dataset.

list_dataset_versions

List dataset versions.

read_dataset_version

Get dataset version by as_of or exact tag.

clone_public_dataset

Clone a public dataset to your own LangSmith tenant.

create_llm_example

Add an example (row) to an LLM-type dataset.

create_chat_example

Add an example (row) to a Chat-type dataset.

create_example_from_run

Add an example (row) to a dataset from a run.

update_examples_multipart

Update examples using multipart.

upload_examples_multipart

Upload examples using multipart.

upsert_examples_multipart

Upsert examples.

create_examples

Create examples in a dataset.

create_example

Create a dataset example in the LangSmith API.

read_example

Read an example from the LangSmith API.

list_examples

Retrieve the example rows of the specified dataset.

index_dataset

Enable dataset indexing. Examples are indexed by their inputs.

sync_indexed_dataset

Sync dataset index.

similar_examples

Retrieve the dataset examples whose inputs best match the current inputs.

update_example

Update a specific example.

update_examples

Update multiple examples.

delete_example

Delete an example by ID.

delete_examples

Delete multiple examples by ID.

list_dataset_splits

Get the splits for a dataset.

update_dataset_splits

Update the splits for a dataset.

evaluate_run

Evaluate a run.

aevaluate_run

Evaluate a run asynchronously.

create_feedback

Create feedback for a run.

update_feedback

Update a feedback in the LangSmith API.

read_feedback

Read a feedback from the LangSmith API.

list_feedback

List the feedback objects on the LangSmith API.

delete_feedback

Delete a feedback by ID.

create_feedback_from_token

Create feedback from a presigned token or URL.

create_presigned_feedback_token

Create a pre-signed URL to send feedback data to.

create_presigned_feedback_tokens

Create a pre-signed URL to send feedback data to.

list_presigned_feedback_tokens

List the feedback ingest tokens for a run.

list_feedback_formulas

List feedback formulas.

get_feedback_formula_by_id

Get a feedback formula by ID.

create_feedback_formula

Create a feedback formula.

update_feedback_formula

Update a feedback formula.

delete_feedback_formula

Delete a feedback formula by ID.

list_annotation_queues

List the annotation queues on the LangSmith API.

create_annotation_queue

Create an annotation queue on the LangSmith API.

read_annotation_queue

Read an annotation queue with the specified queue_id.

update_annotation_queue

Update an annotation queue with the specified queue_id.

delete_annotation_queue

Delete an annotation queue with the specified queue_id.

add_runs_to_annotation_queue

Add runs to an annotation queue with the specified queue_id.

delete_run_from_annotation_queue

Delete a run from an annotation queue with the specified queue_id and run_id.

get_run_from_annotation_queue

Get a run from an annotation queue at the specified index.

create_comparative_experiment

Create a comparative experiment on the LangSmith API.

arun_on_dataset

Asynchronously run the Chain or language model on a dataset.

run_on_dataset

Run the Chain or language model on a dataset.

like_prompt

Like a prompt.

unlike_prompt

Unlike a prompt.

list_prompts

List prompts with pagination.

get_prompt

Get a specific prompt by its identifier.

create_prompt

Create a new prompt.

create_commit

Create a commit for an existing prompt.

update_prompt

Update a prompt's metadata.

delete_prompt

Delete a prompt.

pull_prompt_commit

Pull a prompt object from the LangSmith API.

list_prompt_commits

List commits for a given prompt.

pull_prompt

Pull a prompt and return it as a LangChain PromptTemplate.

push_prompt

Push a prompt to the LangSmith API.

cleanup

Manually trigger cleanup of the background thread.

evaluate

Evaluate a target system on a given dataset.

aevaluate

Evaluate an async target system on a given dataset.

get_experiment_results

Get results for an experiment, including experiment session aggregated stats and experiment runs for each dataset example.

generate_insights

Generate Insights over your agent chat histories.

poll_insights

Poll the status of an Insights report.

api_key property writable

api_key: str | None

Return the API key used for authentication.

workspace_id property writable

workspace_id: str | None

Return the workspace ID used for API requests.

info property

Get the information about the LangSmith API.

RETURNS DESCRIPTION
LangSmithInfo

The information about the LangSmith API, or None if the API is not available.

__init__

__init__(
    api_url: str | None = None,
    *,
    api_key: str | None = None,
    retry_config: Retry | None = None,
    timeout_ms: int | tuple[int, int] | None = None,
    web_url: str | None = None,
    session: Session | None = None,
    auto_batch_tracing: bool = True,
    anonymizer: Callable[[dict], dict] | None = None,
    hide_inputs: Callable[[dict], dict] | bool | None = None,
    hide_outputs: Callable[[dict], dict] | bool | None = None,
    hide_metadata: Callable[[dict], dict] | bool | None = None,
    process_buffered_run_ops: Callable[[Sequence[dict]], Sequence[dict]] | None = None,
    run_ops_buffer_size: int | None = None,
    run_ops_buffer_timeout_ms: float | None = None,
    info: dict | LangSmithInfo | None = None,
    api_urls: dict[str, str] | None = None,
    otel_tracer_provider: TracerProvider | None = None,
    otel_enabled: bool | None = None,
    tracing_sampling_rate: float | None = None,
    workspace_id: str | None = None,
    max_batch_size_bytes: int | None = None,
    headers: dict[str, str] | None = None,
) -> None

Initialize a Client instance.

PARAMETER DESCRIPTION
api_url

URL for the LangSmith API. Defaults to the LANGCHAIN_ENDPOINT environment variable or https://api.smith.langchain.com if not set.

TYPE: str | None DEFAULT: None

api_key

API key for the LangSmith API. Defaults to the LANGCHAIN_API_KEY environment variable.

TYPE: str | None DEFAULT: None

retry_config

Retry configuration for the HTTPAdapter.

TYPE: Retry | None DEFAULT: None

timeout_ms

Timeout for the HTTPAdapter.

Can also be a 2-tuple of (connect timeout, read timeout) to set them separately.

TYPE: int | Tuple[int, int] | None DEFAULT: None

web_url

URL for the LangSmith web app. Default is auto-inferred from the ENDPOINT.

TYPE: str | None DEFAULT: None

session

The session to use for requests.

If None, a new session will be created.

TYPE: Session | None DEFAULT: None

auto_batch_tracing

Whether to automatically batch tracing.

TYPE: bool, default=True DEFAULT: True

anonymizer

A function applied for masking serialized run inputs and outputs, before sending to the API.

TYPE: Callable[[dict], dict] | None DEFAULT: None

hide_inputs

Whether to hide run inputs when tracing with this client.

If True, hides the entire inputs.

If a function, applied to all run inputs when creating runs.

TYPE: Callable[[dict], dict] | bool | None DEFAULT: None

hide_outputs

Whether to hide run outputs when tracing with this client.

If True, hides the entire outputs.

If a function, applied to all run outputs when creating runs.

TYPE: Callable[[dict], dict] | bool | None DEFAULT: None

hide_metadata

Whether to hide run metadata when tracing with this client.

If True, hides the entire metadata.

If a function, applied to all run metadata when creating runs.

TYPE: Callable[[dict], dict] | bool | None DEFAULT: None

process_buffered_run_ops

A function applied to buffered run operations that allows for modification of the raw run dicts before they are converted to multipart and compressed.

Useful specifically for high throughput tracing where you need to apply a rate-limited API or other costly process to the runs before they are sent to the API.

Note that the buffer only flushes automatically when run_ops_buffer_size is reached, or when a new run is added to the buffer after run_ops_buffer_timeout_ms has elapsed. It will not flush outside of these conditions unless you manually call client.flush(), so be sure to do so before your code exits.

TYPE: Callable[[Sequence[dict]], Sequence[dict]] | None DEFAULT: None

run_ops_buffer_size

Maximum number of run operations to collect in the buffer before applying process_buffered_run_ops and sending to the API.

Required when process_buffered_run_ops is provided.

TYPE: int | None DEFAULT: None

run_ops_buffer_timeout_ms

Maximum time in milliseconds to wait before flushing the run ops buffer when new runs are added.

Defaults to 5000.

Only used when process_buffered_run_ops is provided.

TYPE: float | None DEFAULT: None

info

The information about the LangSmith API.

If not provided, it will be fetched from the API.

TYPE: dict | LangSmithInfo | None DEFAULT: None

api_urls

A dictionary of write API URLs and their corresponding API keys.

Useful for multi-tenant setups. Data is only read from the first URL in the dictionary. Runs are the only resource written (POST and PATCH) to all URLs in the dictionary; feedback, sessions, datasets, examples, annotation queues, and evaluation results are written only to the first.

TYPE: Dict[str, str] | None DEFAULT: None

otel_tracer_provider

Optional tracer provider for OpenTelemetry integration.

If not provided, a LangSmith-specific tracer provider will be used.

TYPE: TracerProvider | None DEFAULT: None

tracing_sampling_rate

The sampling rate for tracing.

If provided, overrides the LANGCHAIN_TRACING_SAMPLING_RATE environment variable.

Should be a float between 0 and 1, where 1 means trace everything and 0 means trace nothing.

TYPE: float | None DEFAULT: None

workspace_id

The workspace ID.

Required for org-scoped API keys.

TYPE: str | None DEFAULT: None

max_batch_size_bytes

The maximum size of a batch of runs in bytes.

If not provided, the default is set by the server.

TYPE: int | None DEFAULT: None

headers

Additional HTTP headers to include in all requests. These headers will be merged with the default headers (User-Agent, Accept, x-api-key, etc.). Custom headers will not override the default required headers.

TYPE: Dict[str, str] | None DEFAULT: None

RAISES DESCRIPTION
LangSmithUserError

If the API key is not provided when using the hosted service.

LangSmithUserError

If both api_url and api_urls are provided.
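
Example

A minimal sketch of constructing a client with an explicit API key, separate connect/read timeouts, and an input-masking callable; the key and masking function below are illustrative placeholders.

from langsmith import Client

client = Client(
    api_key="YOUR_API_KEY",  # placeholder; defaults to LANGCHAIN_API_KEY if omitted
    timeout_ms=(5_000, 30_000),  # 5s connect timeout, 30s read timeout
    hide_inputs=lambda inputs: {"redacted": True},  # mask all run inputs before upload
)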

__repr__

__repr__() -> str

Return a string representation of the instance with a link to the URL.

RETURNS DESCRIPTION
str

The string representation of the instance.

TYPE: str

request_with_retries

request_with_retries(
    method: Literal["GET", "POST", "PUT", "PATCH", "DELETE"],
    pathname: str,
    *,
    request_kwargs: Mapping | None = None,
    stop_after_attempt: int = 1,
    retry_on: Sequence[type[BaseException]] | None = None,
    to_ignore: Sequence[type[BaseException]] | None = None,
    handle_response: Callable[[Response, int], Any] | None = None,
    _context: str = "",
    **kwargs: Any,
) -> Response

Send a request with retries.

PARAMETER DESCRIPTION
method

The HTTP request method.

TYPE: str

pathname

The pathname of the request URL. Will be appended to the API URL.

TYPE: str

request_kwargs

Additional request parameters.

TYPE: Mapping DEFAULT: None

stop_after_attempt

The number of attempts to make.

TYPE: int, default=1 DEFAULT: 1

retry_on

The exceptions to retry on.

In addition to: [LangSmithConnectionError, LangSmithAPIError].

TYPE: Sequence[Type[BaseException]] | None DEFAULT: None

to_ignore

The exceptions to ignore / pass on.

TYPE: Sequence[Type[BaseException]] | None DEFAULT: None

handle_response

A function to handle the response and return whether to continue retrying.

TYPE: Callable[[Response, int], Any] | None DEFAULT: None

_context

The context of the request.

TYPE: str, default="" DEFAULT: ''

**kwargs

Additional keyword arguments to pass to the request.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Response

The response object.

RAISES DESCRIPTION
LangSmithAPIError

If a server error occurs.

LangSmithUserError

If the request fails.

LangSmithConnectionError

If a connection error occurs.

LangSmithError

If the request fails.
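
Example

A sketch of issuing a raw GET request against the service metadata endpoint with up to three attempts; the pathname is appended to the configured API URL.

from langsmith import Client

client = Client()

response = client.request_with_retries(
    "GET",
    "/info",
    stop_after_attempt=3,
)
print(response.json())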

upload_dataframe

upload_dataframe(
    df: DataFrame,
    name: str,
    input_keys: Sequence[str],
    output_keys: Sequence[str],
    *,
    description: str | None = None,
    data_type: DataType | None = kv,
) -> Dataset

Upload a dataframe as individual examples to the LangSmith API.

PARAMETER DESCRIPTION
df

The dataframe to upload.

TYPE: DataFrame

name

The name of the dataset.

TYPE: str

input_keys

The input keys.

TYPE: Sequence[str]

output_keys

The output keys.

TYPE: Sequence[str]

description

The description of the dataset.

TYPE: str | None DEFAULT: None

data_type

The data type of the dataset.

TYPE: DataType | None DEFAULT: kv

RETURNS DESCRIPTION
Dataset

The uploaded dataset.

TYPE: Dataset

RAISES DESCRIPTION
ValueError

If the csv_file is not a str or tuple.

Example
from langsmith import Client
import os
import pandas as pd

client = Client()

df = pd.read_parquet("path/to/your/myfile.parquet")
input_keys = ["column1", "column2"]  # replace with your input column names
output_keys = ["output1", "output2"]  # replace with your output column names

dataset = client.upload_dataframe(
    df=df,
    input_keys=input_keys,
    output_keys=output_keys,
    name="My Parquet Dataset",
    description="Dataset created from a parquet file",
    data_type="kv",  # The default
)

upload_csv

upload_csv(
    csv_file: str | tuple[str, BytesIO],
    input_keys: Sequence[str],
    output_keys: Sequence[str],
    *,
    name: str | None = None,
    description: str | None = None,
    data_type: DataType | None = kv,
) -> Dataset

Upload a CSV file to the LangSmith API.

PARAMETER DESCRIPTION
csv_file

The CSV file to upload.

If a string, it should be the path.

If a tuple, it should be a tuple containing the filename and a BytesIO object.

TYPE: str | Tuple[str, BytesIO]

input_keys

The input keys.

TYPE: Sequence[str]

output_keys

The output keys.

TYPE: Sequence[str]

name

The name of the dataset.

TYPE: str | None DEFAULT: None

description

The description of the dataset.

TYPE: str | None DEFAULT: None

data_type

The data type of the dataset.

TYPE: DataType | None DEFAULT: kv

RETURNS DESCRIPTION
Dataset

The uploaded dataset.

TYPE: Dataset

RAISES DESCRIPTION
ValueError

If the csv_file is not a string or tuple.

Example
from langsmith import Client
import os

client = Client()

csv_file = "path/to/your/myfile.csv"
input_keys = ["column1", "column2"]  # replace with your input column names
output_keys = ["output1", "output2"]  # replace with your output column names

dataset = client.upload_csv(
    csv_file=csv_file,
    input_keys=input_keys,
    output_keys=output_keys,
    name="My CSV Dataset",
    description="Dataset created from a CSV file",
    data_type="kv",  # The default
)

create_run

create_run(
    name: str,
    inputs: dict[str, Any],
    run_type: RUN_TYPE_T,
    *,
    project_name: str | None = None,
    revision_id: str | None = None,
    dangerously_allow_filesystem: bool = False,
    api_key: str | None = None,
    api_url: str | None = None,
    **kwargs: Any,
) -> None

Persist a run to the LangSmith API.

PARAMETER DESCRIPTION
name

The name of the run.

TYPE: str

inputs

The input values for the run.

TYPE: Dict[str, Any]

run_type

The type of the run, such as tool, chain, llm, retriever, embedding, prompt, or parser.

TYPE: str

project_name

The project name of the run.

TYPE: str | None DEFAULT: None

revision_id

The revision ID of the run.

TYPE: UUID | str | None DEFAULT: None

api_key

The API key to use for this specific run.

TYPE: str | None DEFAULT: None

api_url

The API URL to use for this specific run.

TYPE: str | None DEFAULT: None

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
None

None

RAISES DESCRIPTION
LangSmithUserError

If the API key is not provided when using the hosted service.

Example
from langsmith import Client
import datetime
from uuid import uuid4

client = Client()
project_name = "__test_create_run"

run_id = uuid4()
client.create_run(
    id=run_id,
    project_name=project_name,
    name="test_run",
    run_type="llm",
    inputs={"prompt": "hello world"},
    outputs={"generation": "hi there"},
    start_time=datetime.datetime.now(datetime.timezone.utc),
    end_time=datetime.datetime.now(datetime.timezone.utc),
    hide_inputs=True,
    hide_outputs=True,
)

batch_ingest_runs

batch_ingest_runs(
    create: Sequence[Run | RunLikeDict | dict] | None = None,
    update: Sequence[Run | RunLikeDict | dict] | None = None,
    *,
    pre_sampled: bool = False,
) -> None

Batch ingest/upsert multiple runs in the LangSmith system.

PARAMETER DESCRIPTION
create

A sequence of Run objects or equivalent dictionaries representing runs to be created / posted.

TYPE: Sequence[Run | RunLikeDict] | None DEFAULT: None

update

A sequence of Run objects or equivalent dictionaries representing runs that have already been created and should be updated / patched.

TYPE: Sequence[Run | RunLikeDict] | None DEFAULT: None

pre_sampled

Whether the runs have already been subject to sampling, and therefore should not be sampled again.

TYPE: bool, default=False DEFAULT: False

RAISES DESCRIPTION
LangSmithAPIError

If there is an error in the API request.

RETURNS DESCRIPTION
None

None

Note

The run objects MUST contain the dotted_order and trace_id fields to be accepted by the API.

Example
from langsmith import Client
import datetime
from datetime import timedelta
from uuid import uuid4

client = Client()
_session = "__test_batch_ingest_runs"
trace_id = uuid4()
trace_id_2 = uuid4()
run_id_2 = uuid4()
current_time = datetime.datetime.now(datetime.timezone.utc).strftime(
    "%Y%m%dT%H%M%S%fZ"
)
later_time = (
    datetime.datetime.now(datetime.timezone.utc) + timedelta(seconds=1)
).strftime("%Y%m%dT%H%M%S%fZ")

runs_to_create = [
    {
        "id": str(trace_id),
        "session_name": _session,
        "name": "run 1",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id)}",
        "trace_id": str(trace_id),
        "inputs": {"input1": 1, "input2": 2},
        "outputs": {"output1": 3, "output2": 4},
    },
    {
        "id": str(trace_id_2),
        "session_name": _session,
        "name": "run 3",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id_2)}",
        "trace_id": str(trace_id_2),
        "inputs": {"input1": 1, "input2": 2},
        "error": "error",
    },
    {
        "id": str(run_id_2),
        "session_name": _session,
        "name": "run 2",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id)}."
        f"{later_time}{str(run_id_2)}",
        "trace_id": str(trace_id),
        "parent_run_id": str(trace_id),
        "inputs": {"input1": 5, "input2": 6},
    },
]
runs_to_update = [
    {
        "id": str(run_id_2),
        "dotted_order": f"{current_time}{str(trace_id)}."
        f"{later_time}{str(run_id_2)}",
        "trace_id": str(trace_id),
        "parent_run_id": str(trace_id),
        "outputs": {"output1": 4, "output2": 5},
    },
]

client.batch_ingest_runs(create=runs_to_create, update=runs_to_update)

multipart_ingest

multipart_ingest(
    create: Sequence[Run | RunLikeDict | dict] | None = None,
    update: Sequence[Run | RunLikeDict | dict] | None = None,
    *,
    pre_sampled: bool = False,
    dangerously_allow_filesystem: bool = False,
) -> None

Batch ingest/upsert multiple runs in the LangSmith system.

PARAMETER DESCRIPTION
create

A sequence of Run objects or equivalent dictionaries representing runs to be created / posted.

TYPE: Sequence[Run | RunLikeDict] | None DEFAULT: None

update

A sequence of Run objects or equivalent dictionaries representing runs that have already been created and should be updated / patched.

TYPE: Sequence[Run | RunLikeDict] | None DEFAULT: None

pre_sampled

Whether the runs have already been subject to sampling, and therefore should not be sampled again.

TYPE: bool, default=False DEFAULT: False

RAISES DESCRIPTION
LangSmithAPIError

If there is an error in the API request.

Note

The run objects MUST contain the dotted_order and trace_id fields to be accepted by the API.

Example
from langsmith import Client
import datetime
from datetime import timedelta
from uuid import uuid4

client = Client()
_session = "__test_batch_ingest_runs"
trace_id = uuid4()
trace_id_2 = uuid4()
run_id_2 = uuid4()
current_time = datetime.datetime.now(datetime.timezone.utc).strftime(
    "%Y%m%dT%H%M%S%fZ"
)
later_time = (
    datetime.datetime.now(datetime.timezone.utc) + timedelta(seconds=1)
).strftime("%Y%m%dT%H%M%S%fZ")

runs_to_create = [
    {
        "id": str(trace_id),
        "session_name": _session,
        "name": "run 1",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id)}",
        "trace_id": str(trace_id),
        "inputs": {"input1": 1, "input2": 2},
        "outputs": {"output1": 3, "output2": 4},
    },
    {
        "id": str(trace_id_2),
        "session_name": _session,
        "name": "run 3",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id_2)}",
        "trace_id": str(trace_id_2),
        "inputs": {"input1": 1, "input2": 2},
        "error": "error",
    },
    {
        "id": str(run_id_2),
        "session_name": _session,
        "name": "run 2",
        "run_type": "chain",
        "dotted_order": f"{current_time}{str(trace_id)}."
        f"{later_time}{str(run_id_2)}",
        "trace_id": str(trace_id),
        "parent_run_id": str(trace_id),
        "inputs": {"input1": 5, "input2": 6},
    },
]
runs_to_update = [
    {
        "id": str(run_id_2),
        "dotted_order": f"{current_time}{str(trace_id)}."
        f"{later_time}{str(run_id_2)}",
        "trace_id": str(trace_id),
        "parent_run_id": str(trace_id),
        "outputs": {"output1": 4, "output2": 5},
    },
]

client.multipart_ingest(create=runs_to_create, update=runs_to_update)

update_run

update_run(
    run_id: ID_TYPE,
    *,
    name: str | None = None,
    run_type: RUN_TYPE_T | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    error: str | None = None,
    inputs: dict | None = None,
    outputs: dict | None = None,
    events: Sequence[dict] | None = None,
    extra: dict | None = None,
    tags: list[str] | None = None,
    attachments: Attachments | None = None,
    dangerously_allow_filesystem: bool = False,
    reference_example_id: str | UUID | None = None,
    api_key: str | None = None,
    api_url: str | None = None,
    **kwargs: Any,
) -> None

Update a run in the LangSmith API.

PARAMETER DESCRIPTION
run_id

The ID of the run to update.

TYPE: UUID | str

name

The name of the run.

TYPE: str | None DEFAULT: None

run_type

The type of the run (e.g., llm, chain, tool).

TYPE: str | None DEFAULT: None

start_time

The start time of the run.

TYPE: datetime | None DEFAULT: None

end_time

The end time of the run.

TYPE: datetime | None DEFAULT: None

error

The error message of the run.

TYPE: str | None DEFAULT: None

inputs

The input values for the run.

TYPE: Dict | None DEFAULT: None

outputs

The output values for the run.

TYPE: Dict | None DEFAULT: None

events

The events for the run.

TYPE: Sequence[dict] | None DEFAULT: None

extra

The extra information for the run.

TYPE: Dict | None DEFAULT: None

tags

The tags for the run.

TYPE: List[str] | None DEFAULT: None

attachments

A dictionary of attachments to add to the run. The keys are the attachment names, and the values are Attachment objects containing the data and mime type.

TYPE: Dict[str, Attachment] | None DEFAULT: None

reference_example_id

ID of the example that was the source of the run inputs. Used for runs that were part of an experiment.

TYPE: str | UUID | None DEFAULT: None

api_key

The API key to use for this specific run.

TYPE: str | None DEFAULT: None

api_url

The API URL to use for this specific run.

TYPE: str | None DEFAULT: None

**kwargs

Kwargs are ignored.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
None

None

Examples:

from langsmith import Client
import datetime
import os
from uuid import uuid4

client = Client()
project_name = "__test_update_run"

start_time = datetime.datetime.now()
revision_id = uuid4()
run: dict = dict(
    id=uuid4(),
    name="test_run",
    run_type="llm",
    inputs={"text": "hello world"},
    project_name=project_name,
    api_url=os.getenv("LANGCHAIN_ENDPOINT"),
    start_time=start_time,
    extra={"extra": "extra"},
    revision_id=revision_id,
)
# Create the run
client.create_run(**run)
run["outputs"] = {"output": ["Hi"]}
run["extra"]["foo"] = "bar"
run["name"] = "test_run_updated"
# Update the run
client.update_run(run["id"], **run)

flush_compressed_traces

flush_compressed_traces(attempts: int = 3) -> None

Force flush the currently buffered compressed runs.

flush

flush() -> None

Flush either queue or compressed buffer, depending on mode.
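
Example

A sketch of flushing pending traces before a short-lived process exits, which is especially important when process_buffered_run_ops is configured.

from langsmith import Client

client = Client()
# ... create or update runs here ...
client.flush()  # send anything still queued or buffered before exiting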

read_run

read_run(run_id: ID_TYPE, load_child_runs: bool = False) -> Run

Read a run from the LangSmith API.

PARAMETER DESCRIPTION
run_id

The ID of the run to read.

TYPE: UUID | str

load_child_runs

Whether to load nested child runs.

TYPE: bool, default=False DEFAULT: False

RETURNS DESCRIPTION
Run

The run read from the LangSmith API.

TYPE: Run

Examples:

from langsmith import Client

# Existing run
run_id = "your-run-id"

client = Client()
stored_run = client.read_run(run_id)

list_runs

list_runs(
    *,
    project_id: ID_TYPE | Sequence[ID_TYPE] | None = None,
    project_name: str | Sequence[str] | None = None,
    run_type: str | None = None,
    trace_id: ID_TYPE | None = None,
    reference_example_id: ID_TYPE | None = None,
    query: str | None = None,
    filter: str | None = None,
    trace_filter: str | None = None,
    tree_filter: str | None = None,
    is_root: bool | None = None,
    parent_run_id: ID_TYPE | None = None,
    start_time: datetime | None = None,
    error: bool | None = None,
    run_ids: Sequence[ID_TYPE] | None = None,
    select: Sequence[str] | None = None,
    limit: int | None = None,
    **kwargs: Any,
) -> Iterator[Run]

List runs from the LangSmith API.

PARAMETER DESCRIPTION
project_id

The ID(s) of the project to filter by.

TYPE: UUID | str, Sequence[UUID | str] | None DEFAULT: None

project_name

The name(s) of the project to filter by.

TYPE: str | Sequence[str] | None DEFAULT: None

run_type

The type of the runs to filter by.

TYPE: str | None DEFAULT: None

trace_id

The ID of the trace to filter by.

TYPE: UUID | str | None DEFAULT: None

reference_example_id

The ID of the reference example to filter by.

TYPE: UUID | str | None DEFAULT: None

query

The query string to filter by.

TYPE: str | None DEFAULT: None

filter

The filter string to filter by.

TYPE: str | None DEFAULT: None

trace_filter

Filter to apply to the ROOT run in the trace tree. This is meant to be used in conjunction with the regular filter parameter to let you filter runs by attributes of the root run within a trace.

TYPE: str | None DEFAULT: None

tree_filter

Filter to apply to OTHER runs in the trace tree, including sibling and child runs. This is meant to be used in conjunction with the regular filter parameter to let you filter runs by attributes of any run within a trace.

TYPE: str | None DEFAULT: None

is_root

Whether to filter by root runs.

TYPE: bool | None DEFAULT: None

parent_run_id

The ID of the parent run to filter by.

TYPE: UUID | str | None DEFAULT: None

start_time

The start time to filter by.

TYPE: datetime | None DEFAULT: None

error

Whether to filter by error status.

TYPE: bool | None DEFAULT: None

run_ids

The IDs of the runs to filter by.

TYPE: Sequence[UUID | str] | None DEFAULT: None

select

The fields to select.

TYPE: Sequence[str] | None DEFAULT: None

limit

The maximum number of runs to return.

TYPE: int | None DEFAULT: None

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

YIELDS DESCRIPTION
Run

The runs.

Examples:

from datetime import datetime, timedelta

from langsmith import Client

client = Client()

# List all runs in a project
project_runs = client.list_runs(project_name="<your_project>")

# List LLM and Chat runs in the last 24 hours
todays_llm_runs = client.list_runs(
    project_name="<your_project>",
    start_time=datetime.now() - timedelta(days=1),
    run_type="llm",
)

# List root traces in a project
root_runs = client.list_runs(project_name="<your_project>", is_root=True)

# List runs without errors
correct_runs = client.list_runs(project_name="<your_project>", error=False)

# List runs and only return their inputs/outputs (to speed up the query)
input_output_runs = client.list_runs(
    project_name="<your_project>", select=["inputs", "outputs"]
)

# List runs by run ID
run_ids = [
    "a36092d2-4ad5-4fb4-9c0d-0dba9a2ed836",
    "9398e6be-964f-4aa4-8ae9-ad78cd4b7074",
]
selected_runs = client.list_runs(id=run_ids)

# List all "chain" type runs that took more than 10 seconds and had
# `total_tokens` greater than 5000
chain_runs = client.list_runs(
    project_name="<your_project>",
    filter='and(eq(run_type, "chain"), gt(latency, 10), gt(total_tokens, 5000))',
)

# List all runs called "extractor" whose root of the trace was assigned feedback "user_score" score of 1
good_extractor_runs = client.list_runs(
    project_name="<your_project>",
    filter='eq(name, "extractor")',
    trace_filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 1))',
)

# List all runs that started after a specific timestamp and either have "error" not equal to null or a "Correctness" feedback score equal to 0
complex_runs = client.list_runs(
    project_name="<your_project>",
    filter='and(gt(start_time, "2023-07-15T12:34:56Z"), or(neq(error, null), and(eq(feedback_key, "Correctness"), eq(feedback_score, 0.0))))',
)

# List all runs where `tags` include "experimental" or "beta" and `latency` is greater than 2 seconds
tagged_runs = client.list_runs(
    project_name="<your_project>",
    filter='and(or(has(tags, "experimental"), has(tags, "beta")), gt(latency, 2))',
)

get_run_stats

get_run_stats(
    *,
    id: list[ID_TYPE] | None = None,
    trace: ID_TYPE | None = None,
    parent_run: ID_TYPE | None = None,
    run_type: str | None = None,
    project_names: list[str] | None = None,
    project_ids: list[ID_TYPE] | None = None,
    reference_example_ids: list[ID_TYPE] | None = None,
    start_time: str | None = None,
    end_time: str | None = None,
    error: bool | None = None,
    query: str | None = None,
    filter: str | None = None,
    trace_filter: str | None = None,
    tree_filter: str | None = None,
    is_root: bool | None = None,
    data_source_type: str | None = None,
) -> dict[str, Any]

Get aggregate statistics over queried runs.

Takes in similar query parameters to list_runs and returns statistics based on the runs that match the query.

PARAMETER DESCRIPTION
id

List of run IDs to filter by.

TYPE: List[UUID | str] | None DEFAULT: None

trace

Trace ID to filter by.

TYPE: UUID | str | None DEFAULT: None

parent_run

Parent run ID to filter by.

TYPE: UUID | str | None DEFAULT: None

run_type

Run type to filter by.

TYPE: str | None DEFAULT: None

project_names

List of project names to filter by.

TYPE: List[str] | None DEFAULT: None

project_ids

List of project IDs to filter by.

TYPE: List[UUID | str] | None DEFAULT: None

reference_example_ids

List of reference example IDs to filter by.

TYPE: List[UUID | str] | None DEFAULT: None

start_time

Start time to filter by.

TYPE: str | None DEFAULT: None

end_time

End time to filter by.

TYPE: str | None DEFAULT: None

error

Filter by error status.

TYPE: bool | None DEFAULT: None

query

Query string to filter by.

TYPE: str | None DEFAULT: None

filter

Filter string to apply.

TYPE: str | None DEFAULT: None

trace_filter

Trace filter string to apply.

TYPE: str | None DEFAULT: None

tree_filter

Tree filter string to apply.

TYPE: str | None DEFAULT: None

is_root

Filter by root run status.

TYPE: bool | None DEFAULT: None

data_source_type

Data source type to filter by.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
dict[str, Any]

Dict[str, Any]: A dictionary containing the run statistics.
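
Example

A sketch of aggregating statistics over the "llm" runs in a single project; the project name is a placeholder.

from langsmith import Client

client = Client()

stats = client.get_run_stats(
    project_names=["<your_project>"],
    run_type="llm",
)
print(stats)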

get_run_url

get_run_url(
    *, run: RunBase, project_name: str | None = None, project_id: ID_TYPE | None = None
) -> str

Get the URL for a run.

Not recommended for use within your agent runtime; it is intended more for interacting with runs after the fact, for data analysis or ETL workloads.

PARAMETER DESCRIPTION
run

The run.

TYPE: RunBase

project_name

The name of the project.

TYPE: str | None DEFAULT: None

project_id

The ID of the project.

TYPE: UUID | str | None DEFAULT: None

RETURNS DESCRIPTION
str

The URL for the run.

TYPE: str
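
Example

A sketch of building a web app URL for a previously stored run; the run ID and project name are placeholders.

from langsmith import Client

client = Client()

run = client.read_run("your-run-id")
url = client.get_run_url(run=run, project_name="<your_project>")
print(url)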

share_run

share_run(run_id: ID_TYPE, *, share_id: ID_TYPE | None = None) -> str

Get a share link for a run.

PARAMETER DESCRIPTION
run_id

The ID of the run to share.

TYPE: UUID | str

share_id

Custom share ID. If not provided, a random UUID will be generated.

TYPE: UUID | str | None DEFAULT: None

RETURNS DESCRIPTION
str

The URL of the shared run.

TYPE: str
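
Example

A sketch of publishing and later revoking a public share link for a run; the run ID is a placeholder.

from langsmith import Client

client = Client()

shared_url = client.share_run("your-run-id")
print(shared_url)

# Revoke the link when it is no longer needed.
client.unshare_run("your-run-id")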

unshare_run

unshare_run(run_id: ID_TYPE) -> None

Delete share link for a run.

PARAMETER DESCRIPTION
run_id

The ID of the run to unshare.

TYPE: UUID | str

RETURNS DESCRIPTION
None

None

read_run_shared_link

read_run_shared_link(run_id: ID_TYPE) -> str | None

Retrieve the shared link for a specific run.

PARAMETER DESCRIPTION
run_id

The ID of the run.

TYPE: UUID | str

RETURNS DESCRIPTION
str | None

The shared link for the run, or None if the link is not available.

run_is_shared

run_is_shared(run_id: ID_TYPE) -> bool

Get share state for a run.

PARAMETER DESCRIPTION
run_id

The ID of the run.

TYPE: UUID | str

RETURNS DESCRIPTION
bool

True if the run is shared, False otherwise.

TYPE: bool

read_shared_run

read_shared_run(share_token: ID_TYPE | str, run_id: ID_TYPE | None = None) -> Run

Get shared runs.

PARAMETER DESCRIPTION
share_token

The share token or URL of the shared run.

TYPE: UUID | str

run_id

The ID of the specific run to retrieve. If not provided, the full shared run will be returned.

TYPE: UUID | str | None DEFAULT: None

RETURNS DESCRIPTION
Run

The shared run.

TYPE: Run

list_shared_runs

list_shared_runs(
    share_token: ID_TYPE | str, run_ids: list[str] | None = None
) -> Iterator[Run]

Get shared runs.

PARAMETER DESCRIPTION
share_token

The share token or URL of the shared run.

TYPE: UUID | str

run_ids

A list of run IDs to filter the results by.

TYPE: List[str] | None DEFAULT: None

YIELDS DESCRIPTION
Run

A shared run.

read_dataset_shared_schema

read_dataset_shared_schema(
    dataset_id: ID_TYPE | None = None, *, dataset_name: str | None = None
) -> DatasetShareSchema

Retrieve the shared schema of a dataset.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset. Either dataset_id or dataset_name must be given.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset. Either dataset_id or dataset_name must be given.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
DatasetShareSchema

ls_schemas.DatasetShareSchema: The shared schema of the dataset.

RAISES DESCRIPTION
ValueError

If neither dataset_id nor dataset_name is given.

share_dataset

share_dataset(
    dataset_id: ID_TYPE | None = None, *, dataset_name: str | None = None
) -> DatasetShareSchema

Get a share link for a dataset.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset. Either dataset_id or dataset_name must be given.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset. Either dataset_id or dataset_name must be given.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
DatasetShareSchema

ls_schemas.DatasetShareSchema: The shared schema of the dataset.

RAISES DESCRIPTION
ValueError

If neither dataset_id nor dataset_name is given.
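
Example

A sketch of creating a public share link for a dataset identified by name; the dataset name is a placeholder.

from langsmith import Client

client = Client()

share_schema = client.share_dataset(dataset_name="my-dataset")
print(share_schema)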

unshare_dataset

unshare_dataset(dataset_id: ID_TYPE) -> None

Delete share link for a dataset.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset to unshare.

TYPE: UUID | str

RETURNS DESCRIPTION
None

None

read_shared_dataset

read_shared_dataset(share_token: str) -> Dataset

Get shared datasets.

PARAMETER DESCRIPTION
share_token

The share token or URL of the shared dataset.

TYPE: UUID | str

RETURNS DESCRIPTION
Dataset

The shared dataset.

TYPE: Dataset

list_shared_examples

list_shared_examples(
    share_token: str,
    *,
    example_ids: list[ID_TYPE] | None = None,
    limit: int | None = None,
) -> Iterator[Example]

Get shared examples.

PARAMETER DESCRIPTION
share_token

The share token or URL of the shared dataset.

TYPE: UUID | str

example_ids

The IDs of the examples to filter by.

TYPE: List[UUID | str] | None DEFAULT: None

limit

Maximum number of examples to return, by default None.

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
Iterator[Example]

List[ls_schemas.Example]: The list of shared examples.

list_shared_projects

list_shared_projects(
    *,
    dataset_share_token: str,
    project_ids: list[ID_TYPE] | None = None,
    name: str | None = None,
    name_contains: str | None = None,
    limit: int | None = None,
) -> Iterator[TracerSessionResult]

List shared projects.

PARAMETER DESCRIPTION
dataset_share_token

The share token of the dataset.

TYPE: str

project_ids

List of project IDs to filter the results, by default None.

TYPE: List[UUID | str] | None DEFAULT: None

name

Name of the project to filter the results, by default None.

TYPE: str | None DEFAULT: None

name_contains

Substring to search for in project names, by default None.

TYPE: str | None DEFAULT: None

limit

Maximum number of projects to return, by default None.

TYPE: int | None DEFAULT: None

YIELDS DESCRIPTION
TracerSessionResult

The shared projects.

create_project

create_project(
    project_name: str,
    *,
    description: str | None = None,
    metadata: dict | None = None,
    upsert: bool = False,
    project_extra: dict | None = None,
    reference_dataset_id: ID_TYPE | None = None,
) -> TracerSession

Create a project on the LangSmith API.

PARAMETER DESCRIPTION
project_name

The name of the project.

TYPE: str

project_extra

Additional project information.

TYPE: dict | None DEFAULT: None

metadata

Additional metadata to associate with the project.

TYPE: dict | None DEFAULT: None

description

The description of the project.

TYPE: str | None DEFAULT: None

upsert

Whether to update the project if it already exists.

TYPE: bool, default=False DEFAULT: False

reference_dataset_id

The ID of the reference dataset to associate with the project.

TYPE: UUID | str | None DEFAULT: None

RETURNS DESCRIPTION
TracerSession

The created project.

TYPE: TracerSession
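
Example

A sketch of creating a tracing project with a description and custom metadata; the project name and metadata values are placeholders.

from langsmith import Client

client = Client()

project = client.create_project(
    "my-agent-experiments",
    description="Traces from local agent experiments",
    metadata={"team": "research"},
)
print(project.id)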

update_project

update_project(
    project_id: ID_TYPE,
    *,
    name: str | None = None,
    description: str | None = None,
    metadata: dict | None = None,
    project_extra: dict | None = None,
    end_time: datetime | None = None,
) -> TracerSession

Update a LangSmith project.

PARAMETER DESCRIPTION
project_id

The ID of the project to update.

TYPE: UUID | str

name

The new name to give the project. This is only valid if the project has been assigned an end_time, meaning it has been completed/closed.

TYPE: str | None DEFAULT: None

description

The new description to give the project.

TYPE: str | None DEFAULT: None

metadata

Additional metadata to associate with the project.

TYPE: dict | None DEFAULT: None

project_extra

Additional project information.

TYPE: dict | None DEFAULT: None

end_time

The time the project was completed.

TYPE: datetime | None DEFAULT: None

RETURNS DESCRIPTION
TracerSession

The updated project.

TYPE: TracerSession

read_project

read_project(
    *,
    project_id: str | None = None,
    project_name: str | None = None,
    include_stats: bool = False,
) -> TracerSessionResult

Read a project from the LangSmith API.

PARAMETER DESCRIPTION
project_id

The ID of the project to read.

TYPE: str | None DEFAULT: None

project_name

The name of the project to read. Only one of project_id or project_name may be given.

TYPE: str | None DEFAULT: None

include_stats

Whether to include a project's aggregate statistics in the response.

TYPE: bool, default=False DEFAULT: False

RETURNS DESCRIPTION
TracerSessionResult

The project.

TYPE: TracerSessionResult
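
Example

A sketch of reading a project by name, including its aggregate statistics; the project name is a placeholder.

from langsmith import Client

client = Client()

project = client.read_project(
    project_name="my-agent-experiments",
    include_stats=True,
)
print(project.name)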

has_project

has_project(project_name: str, *, project_id: str | None = None) -> bool

Check if a project exists.

PARAMETER DESCRIPTION
project_name

The name of the project to check for.

TYPE: str

project_id

The ID of the project to check for.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
bool

Whether the project exists.

TYPE: bool

get_test_results

get_test_results(
    *, project_id: ID_TYPE | None = None, project_name: str | None = None
) -> DataFrame

Read the record-level information from an experiment into a pandas DataFrame.

Note

This will fetch whatever data exists in the DB. Results are not immediately available in the DB upon evaluation run completion.

Feedback score values will be returned as an average across all runs for the experiment. Non-numeric feedback scores will be omitted.

PARAMETER DESCRIPTION
project_id

The ID of the project.

TYPE: UUID | str | None DEFAULT: None

project_name

The name of the project.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A dataframe containing the test results.

list_projects

list_projects(
    project_ids: list[ID_TYPE] | None = None,
    name: str | None = None,
    name_contains: str | None = None,
    reference_dataset_id: ID_TYPE | None = None,
    reference_dataset_name: str | None = None,
    reference_free: bool | None = None,
    include_stats: bool | None = None,
    dataset_version: str | None = None,
    limit: int | None = None,
    metadata: dict[str, Any] | None = None,
) -> Iterator[TracerSessionResult]

List projects from the LangSmith API.

PARAMETER DESCRIPTION
project_ids

A list of project IDs to filter by, by default None.

TYPE: List[UUID | str] | None DEFAULT: None

name

The name of the project to filter by, by default None.

TYPE: str | None DEFAULT: None

name_contains

A string to search for in the project name, by default None.

TYPE: str | None DEFAULT: None

reference_dataset_id

A dataset ID to filter by, by default None.

TYPE: UUID | str | None DEFAULT: None

reference_dataset_name

The name of the reference dataset to filter by, by default None.

TYPE: str | None DEFAULT: None

reference_free

Whether to filter for only projects not associated with a dataset.

TYPE: bool | None DEFAULT: None

limit

The maximum number of projects to return, by default None.

TYPE: int | None DEFAULT: None

metadata

Metadata to filter by.

TYPE: Dict[str, Any] | None DEFAULT: None

YIELDS DESCRIPTION
TracerSessionResult

The projects.

RAISES DESCRIPTION
ValueError

If both reference_dataset_id and reference_dataset_name are given.
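
Example

A sketch of listing up to ten projects whose names contain a substring; the search string is a placeholder.

from langsmith import Client

client = Client()

for project in client.list_projects(name_contains="experiment", limit=10):
    print(project.name)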

delete_project

delete_project(
    *, project_name: str | None = None, project_id: str | None = None
) -> None

Delete a project from LangSmith.

PARAMETER DESCRIPTION
project_name

The name of the project to delete.

TYPE: str | None DEFAULT: None

project_id

The ID of the project to delete.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
None

None

RAISES DESCRIPTION
ValueError

If neither project_name nor project_id is provided.

create_dataset

create_dataset(
    dataset_name: str,
    *,
    description: str | None = None,
    data_type: DataType = kv,
    inputs_schema: dict[str, Any] | None = None,
    outputs_schema: dict[str, Any] | None = None,
    transformations: list[DatasetTransformation] | None = None,
    metadata: dict | None = None,
) -> Dataset

Create a dataset in the LangSmith API.

PARAMETER DESCRIPTION
dataset_name

The name of the dataset.

TYPE: str

description

The description of the dataset.

TYPE: str | None DEFAULT: None

data_type

The data type of the dataset.

TYPE: DataType, default=DataType.kv DEFAULT: kv

inputs_schema

The schema definition for the inputs of the dataset.

TYPE: Dict[str, Any] | None DEFAULT: None

outputs_schema

The schema definition for the outputs of the dataset.

TYPE: Dict[str, Any] | None DEFAULT: None

transformations

A list of transformations to apply to the dataset.

TYPE: List[DatasetTransformation] | None DEFAULT: None

metadata

Additional metadata to associate with the dataset.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Dataset

The created dataset.

TYPE: Dataset

RAISES DESCRIPTION
HTTPError

If the request to create the dataset fails.
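
Example

A sketch of creating a key-value dataset with a JSON Schema constraint on example inputs; the dataset name and schema are placeholders.

from langsmith import Client

client = Client()

dataset = client.create_dataset(
    "qa-examples",
    description="Question/answer pairs for regression testing",
    inputs_schema={
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
)
print(dataset.id)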

has_dataset

has_dataset(
    *, dataset_name: str | None = None, dataset_id: ID_TYPE | None = None
) -> bool

Check whether a dataset exists in your tenant.

PARAMETER DESCRIPTION
dataset_name

The name of the dataset to check.

TYPE: str | None DEFAULT: None

dataset_id

The ID of the dataset to check.

TYPE: UUID | str | None DEFAULT: None

RETURNS DESCRIPTION
bool

Whether the dataset exists.

TYPE: bool

read_dataset

read_dataset(
    *, dataset_name: str | None = None, dataset_id: ID_TYPE | None = None
) -> Dataset

Read a dataset from the LangSmith API.

PARAMETER DESCRIPTION
dataset_name

The name of the dataset to read.

TYPE: str | None DEFAULT: None

dataset_id

The ID of the dataset to read.

TYPE: UUID | str | None DEFAULT: None

RETURNS DESCRIPTION
Dataset

The dataset.

TYPE: Dataset

diff_dataset_versions

diff_dataset_versions(
    dataset_id: ID_TYPE | None = None,
    *,
    dataset_name: str | None = None,
    from_version: str | datetime,
    to_version: str | datetime,
) -> DatasetDiffInfo

Get the difference between two versions of a dataset.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset.

TYPE: str | None DEFAULT: None

from_version

The starting version for the diff.

TYPE: str | datetime

to_version

The ending version for the diff.

TYPE: str | datetime

RETURNS DESCRIPTION
DatasetDiffInfo

The difference between the two versions of the dataset.

TYPE: DatasetDiffInfo

Examples:

# Get the difference between two tagged versions of a dataset
from_version = "prod"
to_version = "dev"
diff = client.diff_dataset_versions(
    dataset_name="my-dataset",
    from_version=from_version,
    to_version=to_version,
)

# Get the difference between two timestamped versions of a dataset
from_version = datetime.datetime(2024, 1, 1)
to_version = datetime.datetime(2024, 2, 1)
diff = client.diff_dataset_versions(
    dataset_name="my-dataset",
    from_version=from_version,
    to_version=to_version,
)

read_dataset_openai_finetuning

read_dataset_openai_finetuning(
    dataset_id: ID_TYPE | None = None, *, dataset_name: str | None = None
) -> list

Download a dataset in OpenAI Jsonl format and load it as a list of dicts.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset to download.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset to download.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
list

list[dict]: The dataset loaded as a list of dicts.

RAISES DESCRIPTION
ValueError

If neither dataset_id nor dataset_name is provided.

list_datasets

list_datasets(
    *,
    dataset_ids: list[ID_TYPE] | None = None,
    data_type: str | None = None,
    dataset_name: str | None = None,
    dataset_name_contains: str | None = None,
    metadata: dict[str, Any] | None = None,
    limit: int | None = None,
) -> Iterator[Dataset]

List the datasets on the LangSmith API.

PARAMETER DESCRIPTION
dataset_ids

A list of dataset IDs to filter the results by.

TYPE: List[UUID | str] | None DEFAULT: None

data_type

The data type of the datasets to filter the results by.

TYPE: str | None DEFAULT: None

dataset_name

The name of the dataset to filter the results by.

TYPE: str | None DEFAULT: None

dataset_name_contains

A substring to search for in the dataset names.

TYPE: str | None DEFAULT: None

metadata

A dictionary of metadata to filter the results by.

TYPE: Dict[str, Any] | None DEFAULT: None

limit

The maximum number of datasets to return.

TYPE: int | None DEFAULT: None

YIELDS DESCRIPTION
Dataset

The datasets.
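
Example

A sketch of iterating over datasets whose names contain a substring; the search string is a placeholder.

from langsmith import Client

client = Client()

for dataset in client.list_datasets(dataset_name_contains="qa", limit=5):
    print(dataset.name, dataset.id)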

delete_dataset

delete_dataset(
    *, dataset_id: ID_TYPE | None = None, dataset_name: str | None = None
) -> None

Delete a dataset from the LangSmith API.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset to delete.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset to delete.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
None

None

update_dataset_tag

update_dataset_tag(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    as_of: datetime,
    tag: str,
) -> None

Update the tags of a dataset.

If the tag is already assigned to a different version of this dataset, the tag will be moved to the new version. The as_of parameter is used to determine which version of the dataset to apply the new tags to. It must be an exact version of the dataset to succeed. You can use the read_dataset_version method to find the exact version to apply the tags to.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset to update.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset to update.

TYPE: str | None DEFAULT: None

as_of

The timestamp of the dataset to apply the new tags to.

TYPE: datetime

tag

The new tag to apply to the dataset.

TYPE: str

RETURNS DESCRIPTION
None

None

Examples:

dataset_name = "my-dataset"
# Get the version of a dataset <= a given timestamp
dataset_version = client.read_dataset_version(
    dataset_name=dataset_name, as_of=datetime.datetime(2024, 1, 1)
)
# Assign that version a new tag
client.update_dataset_tag(
    dataset_name="my-dataset",
    as_of=dataset_version.as_of,
    tag="prod",
)

list_dataset_versions

list_dataset_versions(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    search: str | None = None,
    limit: int | None = None,
) -> Iterator[DatasetVersion]

List dataset versions.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset.

TYPE: str | None DEFAULT: None

search

The search query.

TYPE: str | None DEFAULT: None

limit

The maximum number of versions to return.

TYPE: int | None DEFAULT: None

YIELDS DESCRIPTION
DatasetVersion

The dataset versions.

read_dataset_version

read_dataset_version(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    as_of: datetime | None = None,
    tag: str | None = None,
) -> DatasetVersion

Get dataset version by as_of or exact tag.

Use this to resolve the nearest version to a given timestamp or for a given tag.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset.

TYPE: str | None DEFAULT: None

as_of

The timestamp of the dataset to retrieve.

TYPE: datetime | None DEFAULT: None

tag

The tag of the dataset to retrieve.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
DatasetVersion

The dataset version.

TYPE: DatasetVersion

Examples:

# Get the latest version of a dataset
client.read_dataset_version(dataset_name="my-dataset", tag="latest")

# Get the version of a dataset <= a given timestamp
client.read_dataset_version(
    dataset_name="my-dataset",
    as_of=datetime.datetime(2024, 1, 1),
)


# Get the version of a dataset with a specific tag
client.read_dataset_version(dataset_name="my-dataset", tag="prod")

clone_public_dataset

clone_public_dataset(
    token_or_url: str,
    *,
    source_api_url: str | None = None,
    dataset_name: str | None = None,
) -> Dataset

Clone a public dataset to your own LangSmith tenant.

This operation is idempotent. If you already have a dataset with the given name, this function will do nothing.

PARAMETER DESCRIPTION
token_or_url

The token of the public dataset to clone.

TYPE: str

source_api_url

The URL of the LangSmith server where the data is hosted. Defaults to the API URL of your current client.

TYPE: str | None DEFAULT: None

dataset_name

The name of the dataset to create in your tenant. Defaults to the name of the public dataset.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
Dataset

The cloned dataset.

TYPE: Dataset
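
Example

A sketch of cloning a publicly shared dataset into your own workspace; the share URL and dataset name are placeholders.

from langsmith import Client

client = Client()

cloned = client.clone_public_dataset(
    "https://smith.langchain.com/public/<share-token>/d",
    dataset_name="my-cloned-dataset",
)
print(cloned.id)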

create_llm_example

create_llm_example(
    prompt: str,
    generation: str | None = None,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    created_at: datetime | None = None,
) -> Example

Add an example (row) to an LLM-type dataset.

PARAMETER DESCRIPTION
prompt

The input prompt for the example.

TYPE: str

generation

The output generation for the example.

TYPE: str | None DEFAULT: None

dataset_id

The ID of the dataset.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset.

TYPE: str | None DEFAULT: None

created_at

The creation timestamp of the example.

TYPE: datetime | None DEFAULT: None

RETURNS DESCRIPTION
Example

The created example

TYPE: Example

create_chat_example

create_chat_example(
    messages: list[Mapping[str, Any] | BaseMessageLike],
    generations: Mapping[str, Any] | BaseMessageLike | None = None,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    created_at: datetime | None = None,
) -> Example

Add an example (row) to a Chat-type dataset.

PARAMETER DESCRIPTION
messages

The input messages for the example.

TYPE: List[Mapping[str, Any] | BaseMessageLike]

generations

The output messages for the example.

TYPE: Mapping[str, Any] | BaseMessageLike | None DEFAULT: None

dataset_id

The ID of the dataset.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset.

TYPE: str | None DEFAULT: None

created_at

The creation timestamp of the example.

TYPE: datetime | None DEFAULT: None

RETURNS DESCRIPTION
Example

The created example.

TYPE: Example
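
Example (illustrative sketch; assumes a Chat-type dataset named "my-chat-dataset" already exists; the role/content dict shape shown here is an assumption about the accepted mapping format):

from langsmith import Client

client = Client()

# Add a chat transcript and the expected assistant reply to the dataset.
example = client.create_chat_example(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
    ],
    generations={"role": "assistant", "content": "4"},
    dataset_name="my-chat-dataset",
)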

create_example_from_run

create_example_from_run(
    run: Run,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    created_at: datetime | None = None,
) -> Example

Add an example (row) to a dataset from a run.

PARAMETER DESCRIPTION
run

The run to create an example from.

TYPE: Run

dataset_id

The ID of the dataset.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset.

TYPE: str | None DEFAULT: None

created_at

The creation timestamp of the example.

TYPE: datetime | None DEFAULT: None

RETURNS DESCRIPTION
Example

The created example.

TYPE: Example
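
Example (illustrative sketch; the run ID is a placeholder and "my-dataset" is assumed to exist):

from langsmith import Client

client = Client()

# Read an existing run and turn its inputs/outputs into a dataset example.
run = client.read_run("<run-id>")
example = client.create_example_from_run(run, dataset_name="my-dataset")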

update_examples_multipart

update_examples_multipart(
    *,
    dataset_id: ID_TYPE,
    updates: list[ExampleUpdate] | None = None,
    dangerously_allow_filesystem: bool = False,
) -> UpsertExamplesResponse

Update examples using multipart.

.. deprecated:: 0.3.9

Use Client.update_examples instead. Will be removed in 0.4.0.

upload_examples_multipart

upload_examples_multipart(
    *,
    dataset_id: ID_TYPE,
    uploads: list[ExampleCreate] | None = None,
    dangerously_allow_filesystem: bool = False,
) -> UpsertExamplesResponse

Upload examples using multipart.

.. deprecated:: 0.3.9

Use Client.create_examples instead. Will be removed in 0.4.0.

upsert_examples_multipart

upsert_examples_multipart(
    *,
    upserts: list[ExampleUpsertWithAttachments] | None = None,
    dangerously_allow_filesystem: bool = False,
) -> UpsertExamplesResponse

Upsert examples.

.. deprecated:: 0.3.9

Use Client.create_examples and Client.update_examples instead. Will be
removed in 0.4.0.

create_examples

create_examples(
    *,
    dataset_name: str | None = None,
    dataset_id: ID_TYPE | None = None,
    examples: Sequence[ExampleCreate | dict] | None = None,
    dangerously_allow_filesystem: bool = False,
    max_concurrency: Annotated[int, Field(ge=1, le=3)] = 1,
    **kwargs: Any,
) -> UpsertExamplesResponse | dict[str, Any]

Create examples in a dataset.

PARAMETER DESCRIPTION
dataset_name

The name of the dataset to create the examples in. Must specify exactly one of dataset_name or dataset_id.

TYPE: str | None DEFAULT: None

dataset_id

The ID of the dataset to create the examples in. Must specify exactly one of dataset_name or dataset_id.

TYPE: UUID | str | None DEFAULT: None

examples

The examples to create.

TYPE: Sequence[ExampleCreate | dict] DEFAULT: None

dangerously_allow_filesystem

Whether to allow uploading files from the filesystem.

TYPE: bool DEFAULT: False

**kwargs

Legacy keyword args. Should not be specified if 'examples' is specified.

  • inputs (Sequence[Mapping[str, Any]]): The input values for the examples.
  • outputs (Optional[Sequence[Optional[Mapping[str, Any]]]]): The output values for the examples.
  • metadata (Optional[Sequence[Optional[Mapping[str, Any]]]]): The metadata for the examples.
  • splits (Optional[Sequence[Optional[str | List[str]]]]): The splits for the examples, which are divisions of your dataset such as 'train', 'test', or 'validation'.
  • source_run_ids (Optional[Sequence[Optional[Union[UUID, str]]]]): The IDs of the source runs associated with the examples.
  • ids (Optional[Sequence[Union[UUID, str]]]): The IDs of the examples.

TYPE: Any DEFAULT: {}

RAISES DESCRIPTION
ValueError

If 'examples' and legacy args are both provided.

RETURNS DESCRIPTION
UpsertExamplesResponse | dict[str, Any]

The LangSmith JSON response. Includes 'count' and 'example_ids'.

Behavior changed in langsmith 0.3.11

Updated to take argument 'examples', a single list where each element is the full example to create. This should be used instead of the legacy 'inputs', 'outputs', etc. arguments, which split each example's attributes across arguments.

Updated to support creating examples with attachments.

Example
from langsmith import Client

client = Client()

dataset = client.create_dataset("agent-qa")

examples = [
    {
        "inputs": {"question": "what's an agent"},
        "outputs": {"answer": "an agent is..."},
        "metadata": {"difficulty": "easy"},
    },
    {
        "inputs": {
            "question": "can you explain the agent architecture in this diagram?"
        },
        "outputs": {"answer": "this diagram shows..."},
        "attachments": {"diagram": {"mime_type": "image/png", "data": b"..."}},
        "metadata": {"difficulty": "medium"},
    },
    # more examples...
]

response = client.create_examples(dataset_name="agent-qa", examples=examples)
# -> {"example_ids": [...

create_example

create_example(
    inputs: Mapping[str, Any] | None = None,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    created_at: datetime | None = None,
    outputs: Mapping[str, Any] | None = None,
    metadata: Mapping[str, Any] | None = None,
    split: str | list[str] | None = None,
    example_id: ID_TYPE | None = None,
    source_run_id: ID_TYPE | None = None,
    use_source_run_io: bool = False,
    use_source_run_attachments: list[str] | None = None,
    attachments: Attachments | None = None,
) -> Example

Create a dataset example in the LangSmith API.

Examples are rows in a dataset, containing the inputs and expected outputs (or other reference information) for a model or chain.

PARAMETER DESCRIPTION
inputs

The input values for the example.

TYPE: Mapping[str, Any] DEFAULT: None

dataset_id

The ID of the dataset to create the example in.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset to create the example in.

TYPE: str | None DEFAULT: None

created_at

The creation timestamp of the example.

TYPE: datetime | None DEFAULT: None

outputs

The output values for the example.

TYPE: Mapping[str, Any] | None DEFAULT: None

metadata

The metadata for the example.

TYPE: Mapping[str, Any] | None DEFAULT: None

split

The splits for the example, which are divisions of your dataset such as 'train', 'test', or 'validation'.

TYPE: str | List[str] | None DEFAULT: None

example_id

The ID of the example to create. If not provided, a new example will be created.

TYPE: UUID | str | None DEFAULT: None

source_run_id

The ID of the source run associated with this example.

TYPE: UUID | str | None DEFAULT: None

use_source_run_io

Whether to use the inputs, outputs, and attachments from the source run.

TYPE: bool DEFAULT: False

use_source_run_attachments

Which attachments to use from the source run. If use_source_run_io is True, all attachments will be used regardless of this param.

TYPE: List[str] | None DEFAULT: None

attachments

The attachments for the example.

TYPE: Attachments | None DEFAULT: None

RETURNS DESCRIPTION
Example

The created example.

TYPE: Example
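
Example (illustrative sketch; assumes a dataset named "my-dataset" already exists):

from langsmith import Client

client = Client()

example = client.create_example(
    inputs={"question": "What is LangSmith?"},
    outputs={"answer": "A platform for tracing and evaluating LLM applications."},
    metadata={"source": "docs"},
    split="train",
    dataset_name="my-dataset",
)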

read_example

read_example(example_id: ID_TYPE, *, as_of: datetime | None = None) -> Example

Read an example from the LangSmith API.

PARAMETER DESCRIPTION
example_id

The ID of the example to read.

TYPE: UUID | str

as_of

The dataset version tag OR timestamp to retrieve the example as of. Response examples will only be those that were present at the time of the tagged (or timestamped) version.

TYPE: datetime | None DEFAULT: None

RETURNS DESCRIPTION
Example

The example.

TYPE: Example
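
Example (illustrative sketch; the example ID is a placeholder):

from datetime import datetime

from langsmith import Client

client = Client()

# Read the example as it existed in the dataset version as of Jan 1, 2024.
example = client.read_example(
    "<example-id>",
    as_of=datetime(2024, 1, 1),
)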

list_examples

list_examples(
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    example_ids: Sequence[ID_TYPE] | None = None,
    as_of: datetime | str | None = None,
    splits: Sequence[str] | None = None,
    inline_s3_urls: bool = True,
    *,
    offset: int = 0,
    limit: int | None = None,
    metadata: dict | None = None,
    filter: str | None = None,
    include_attachments: bool = False,
    **kwargs: Any,
) -> Iterator[Example]

Retrieve the example rows of the specified dataset.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset to filter by.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset to filter by.

TYPE: str | None DEFAULT: None

example_ids

The IDs of the examples to filter by.

TYPE: Sequence[UUID | str] | None DEFAULT: None

as_of

The dataset version tag OR timestamp to retrieve the examples as of. Response examples will only be those that were present at the time of the tagged (or timestamped) version.

TYPE: datetime | str | None DEFAULT: None

splits

A list of dataset splits, which are divisions of your dataset such as 'train', 'test', or 'validation'. Returns examples only from the specified splits.

TYPE: Sequence[str] | None DEFAULT: None

inline_s3_urls

Whether to inline S3 URLs.

TYPE: bool, default=True DEFAULT: True

offset

The offset to start from. Defaults to 0.

TYPE: int, default=0 DEFAULT: 0

limit

The maximum number of examples to return.

TYPE: int | None DEFAULT: None

metadata

A dictionary of metadata to filter by.

TYPE: dict | None DEFAULT: None

filter

A structured filter string to apply to the examples.

TYPE: str | None DEFAULT: None

include_attachments

Whether to include the attachments in the response.

TYPE: bool, default=False DEFAULT: False

**kwargs

Additional keyword arguments are ignored.

TYPE: Any DEFAULT: {}

YIELDS DESCRIPTION
Example

The examples.

Examples:

List all examples for a dataset:

from langsmith import Client

client = Client()

# By Dataset ID
examples = client.list_examples(
    dataset_id="c9ace0d8-a82c-4b6c-13d2-83401d68e9ab"
)
# By Dataset Name
examples = client.list_examples(dataset_name="My Test Dataset")

List examples by id

example_ids = [
    "734fc6a0-c187-4266-9721-90b7a025751a",
    "d6b4c1b9-6160-4d63-9b61-b034c585074f",
    "4d31df4e-f9c3-4a6e-8b6c-65701c2fed13",
]
examples = client.list_examples(example_ids=example_ids)

List examples by metadata

examples = client.list_examples(
    dataset_name=dataset_name, metadata={"foo": "bar"}
)

List examples by structured filter

examples = client.list_examples(
    dataset_name=dataset_name,
    filter='and(not(has(metadata, \'{"foo": "bar"}\')), exists(metadata, "tenant_id"))',
)

index_dataset

index_dataset(*, dataset_id: ID_TYPE, tag: str = 'latest', **kwargs: Any) -> None

Enable dataset indexing. Examples are indexed by their inputs.

This enables searching for similar examples by inputs with client.similar_examples().

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset to index.

TYPE: UUID | str

tag

The version of the dataset to index. If 'latest' then any updates to the dataset (additions, updates, deletions of examples) will be reflected in the index.

TYPE: str | None DEFAULT: 'latest'

**kwargs

Additional keyword arguments to pass as part of request body.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
None

None
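
Example (illustrative sketch; the dataset ID is a placeholder; pairs with client.similar_examples() further down):

from langsmith import Client

client = Client()

# Enable few-shot indexing on the latest version of the dataset.
client.index_dataset(dataset_id="<dataset-id>", tag="latest")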

sync_indexed_dataset

sync_indexed_dataset(*, dataset_id: ID_TYPE, **kwargs: Any) -> None

Sync dataset index.

This already happens automatically every 5 minutes, but you can call this to force a sync.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset to sync.

TYPE: UUID | str

RETURNS DESCRIPTION
None

None

similar_examples

similar_examples(
    inputs: dict,
    /,
    *,
    limit: int,
    dataset_id: ID_TYPE,
    filter: str | None = None,
    **kwargs: Any,
) -> list[ExampleSearch]

Retrieve the dataset examples whose inputs best match the current inputs.

Note

Must have few-shot indexing enabled for the dataset. See client.index_dataset().

PARAMETER DESCRIPTION
inputs

The inputs to use as a search query. Must match the dataset input schema. Must be JSON serializable.

TYPE: dict

limit

The maximum number of examples to return.

TYPE: int

dataset_id

The ID of the dataset to search over.

TYPE: UUID | str

filter

A filter string to apply to the search results. Uses the same syntax as the filter parameter in list_runs(). Only a subset of operations are supported.

For example, you can use and(eq(metadata.some_tag, 'some_value'), neq(metadata.env, 'dev')) to filter only examples where some_tag has some_value, and the environment is not dev.

TYPE: str | None DEFAULT: None

**kwargs

Additional keyword arguments to pass as part of request body.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[ExampleSearch]

list[ExampleSearch]: List of ExampleSearch objects.

Examples:

from langsmith import Client

client = Client()
client.similar_examples(
    {"question": "When would i use the runnable generator"},
    limit=3,
    dataset_id="...",
)
[
    ExampleSearch(
        inputs={
            "question": "How do I cache a Chat model? What caches can I use?"
        },
        outputs={
            "answer": "You can use LangChain's caching layer for Chat Models. This can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times, and speed up your application.\n\nfrom langchain.cache import InMemoryCache\nlangchain.llm_cache = InMemoryCache()\n\n# The first time, it is not yet in cache, so it should take longer\nllm.predict('Tell me a joke')\n\nYou can also use SQLite Cache which uses a SQLite database:\n\nrm .langchain.db\n\nfrom langchain.cache import SQLiteCache\nlangchain.llm_cache = SQLiteCache(database_path=\".langchain.db\")\n\n# The first time, it is not yet in cache, so it should take longer\nllm.predict('Tell me a joke') \n"
        },
        metadata=None,
        id=UUID("b2ddd1c4-dff6-49ae-8544-f48e39053398"),
        dataset_id=UUID("01b6ce0f-bfb6-4f48-bbb8-f19272135d40"),
    ),
    ExampleSearch(
        inputs={"question": "What's a runnable lambda?"},
        outputs={
            "answer": "A runnable lambda is an object that implements LangChain's `Runnable` interface and runs a callbale (i.e., a function). Note the function must accept a single argument."
        },
        metadata=None,
        id=UUID("f94104a7-2434-4ba7-8293-6a283f4860b4"),
        dataset_id=UUID("01b6ce0f-bfb6-4f48-bbb8-f19272135d40"),
    ),
    ExampleSearch(
        inputs={"question": "Show me how to use RecursiveURLLoader"},
        outputs={
            "answer": 'The RecursiveURLLoader comes from the langchain.document_loaders.recursive_url_loader module. Here\'s an example of how to use it:\n\nfrom langchain.document_loaders.recursive_url_loader import RecursiveUrlLoader\n\n# Create an instance of RecursiveUrlLoader with the URL you want to load\nloader = RecursiveUrlLoader(url="https://example.com")\n\n# Load all child links from the URL page\nchild_links = loader.load()\n\n# Print the child links\nfor link in child_links:\n    print(link)\n\nMake sure to replace "https://example.com" with the actual URL you want to load. The load() method returns a list of child links found on the URL page. You can iterate over this list to access each child link.'
        },
        metadata=None,
        id=UUID("0308ea70-a803-4181-a37d-39e95f138f8c"),
        dataset_id=UUID("01b6ce0f-bfb6-4f48-bbb8-f19272135d40"),
    ),
]

update_example

update_example(
    example_id: ID_TYPE,
    *,
    inputs: dict[str, Any] | None = None,
    outputs: Mapping[str, Any] | None = None,
    metadata: dict | None = None,
    split: str | list[str] | None = None,
    dataset_id: ID_TYPE | None = None,
    attachments_operations: AttachmentsOperations | None = None,
    attachments: Attachments | None = None,
) -> dict[str, Any]

Update a specific example.

PARAMETER DESCRIPTION
example_id

The ID of the example to update.

TYPE: UUID | str

inputs

The input values to update.

TYPE: Dict[str, Any] | None DEFAULT: None

outputs

The output values to update.

TYPE: Mapping[str, Any] | None DEFAULT: None

metadata

The metadata to update.

TYPE: Dict | None DEFAULT: None

split

The dataset split to update, such as 'train', 'test', or 'validation'.

TYPE: str | List[str] | None DEFAULT: None

dataset_id

The ID of the dataset to update.

TYPE: UUID | str | None DEFAULT: None

attachments_operations

The attachments operations to perform.

TYPE: AttachmentsOperations | None DEFAULT: None

attachments

The attachments to add to the example.

TYPE: Attachments | None DEFAULT: None

RETURNS DESCRIPTION
dict[str, Any]

Dict[str, Any]: The updated example.
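
Example (illustrative sketch; the example ID is a placeholder):

from langsmith import Client

client = Client()

# Overwrite the inputs, metadata, and split of an existing example.
client.update_example(
    "<example-id>",
    inputs={"question": "what isn't an agent?"},
    metadata={"difficulty": "hard"},
    split=["train"],
)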

update_examples

update_examples(
    *,
    dataset_name: str | None = None,
    dataset_id: ID_TYPE | None = None,
    updates: Sequence[ExampleUpdate | dict] | None = None,
    dangerously_allow_filesystem: bool = False,
    **kwargs: Any,
) -> dict[str, Any]

Update multiple examples.

Examples are expected to all be part of the same dataset.

PARAMETER DESCRIPTION
dataset_name

The name of the dataset to update. Should specify exactly one of 'dataset_name' or 'dataset_id'.

TYPE: str | None DEFAULT: None

dataset_id

The ID of the dataset to update. Should specify exactly one of 'dataset_name' or 'dataset_id'.

TYPE: UUID | str | None DEFAULT: None

updates

The example updates. Overwrites any specified fields and does not update any unspecified fields.

TYPE: Sequence[ExampleUpdate | dict] | None DEFAULT: None

dangerously_allow_filesystem

Whether to allow using filesystem paths as attachments.

TYPE: bool DEFAULT: False

**kwargs

Legacy keyword args. Should not be specified if 'updates' is specified.

  • example_ids (Sequence[UUID | str]): The IDs of the examples to update.
  • inputs (Sequence[dict | None] | None): The input values for the examples.
  • outputs (Sequence[dict | None] | None): The output values for the examples.
  • metadata (Sequence[dict | None] | None): The metadata for the examples.
  • splits (Sequence[str | list[str] | None] | None): The splits for the examples, which are divisions of your dataset such as 'train', 'test', or 'validation'.
  • attachments_operations (Sequence[AttachmentsOperations | None] | None): The operations to perform on the attachments.
  • dataset_ids (Sequence[UUID | str] | None): The IDs of the datasets to move the examples to.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
dict[str, Any]

The LangSmith JSON response. Includes 'message', 'count', and 'example_ids'.

Behavior changed in langsmith 0.3.9

Updated to ...

Example
from langsmith import Client

client = Client()

dataset = client.create_dataset("agent-qa")

examples = [
    {
        "inputs": {"question": "what's an agent"},
        "outputs": {"answer": "an agent is..."},
        "metadata": {"difficulty": "easy"},
    },
    {
        "inputs": {
            "question": "can you explain the agent architecture in this diagram?"
        },
        "outputs": {"answer": "this diagram shows..."},
        "attachments": {"diagram": {"mime_type": "image/png", "data": b"..."}},
        "metadata": {"difficulty": "medium"},
    },
    # more examples...
]

response = client.create_examples(dataset_name="agent-qa", examples=examples)
example_ids = response["example_ids"]

updates = [
    {
        "id": example_ids[0],
        "inputs": {"question": "what isn't an agent"},
        "outputs": {"answer": "an agent is not..."},
    },
    {
        "id": example_ids[1],
        "attachments_operations": [
            {"rename": {"diagram": "agent_diagram"}, "retain": []}
        ],
    },
]
response = client.update_examples(dataset_name="agent-qa", updates=updates)
# -> {"example_ids": [...

delete_example

delete_example(example_id: ID_TYPE) -> None

Delete an example by ID.

PARAMETER DESCRIPTION
example_id

The ID of the example to delete.

TYPE: UUID | str

RETURNS DESCRIPTION
None

None

delete_examples

delete_examples(example_ids: Sequence[ID_TYPE]) -> None

Delete multiple examples by ID.

PARAMETER DESCRIPTION
example_ids

The IDs of the examples to delete.

TYPE: Sequence[UUID | str]

list_dataset_splits

list_dataset_splits(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    as_of: str | datetime | None = None,
) -> list[str]

Get the splits for a dataset.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset.

TYPE: str | None DEFAULT: None

as_of

The version of the dataset to retrieve splits for. Can be a timestamp or a string tag. Defaults to "latest".

TYPE: str | datetime | None DEFAULT: None

RETURNS DESCRIPTION
list[str]

List[str]: The names of this dataset's splits.

update_dataset_splits

update_dataset_splits(
    *,
    dataset_id: ID_TYPE | None = None,
    dataset_name: str | None = None,
    split_name: str,
    example_ids: list[ID_TYPE],
    remove: bool = False,
) -> None

Update the splits for a dataset.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset to update.

TYPE: UUID | str | None DEFAULT: None

dataset_name

The name of the dataset to update.

TYPE: str | None DEFAULT: None

split_name

The name of the split to update.

TYPE: str

example_ids

The IDs of the examples to add to or remove from the split.

TYPE: List[UUID | str]

remove

If True, remove the examples from the split. If False, add the examples to the split.

TYPE: bool | None DEFAULT: False

RETURNS DESCRIPTION
None

None
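
Example (illustrative sketch; example IDs are placeholders and "my-dataset" is assumed to exist):

from langsmith import Client

client = Client()

# Add two examples to the 'test' split; pass remove=True to take them out instead.
client.update_dataset_splits(
    dataset_name="my-dataset",
    split_name="test",
    example_ids=["<example-id-1>", "<example-id-2>"],
)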

evaluate_run

evaluate_run(
    run: Run | RunBase | str | UUID,
    evaluator: RunEvaluator,
    *,
    source_info: dict[str, Any] | None = None,
    reference_example: Example | str | dict | UUID | None = None,
    load_child_runs: bool = False,
) -> EvaluationResult

Evaluate a run.

PARAMETER DESCRIPTION
run

The run to evaluate.

TYPE: Run | RunBase | str | UUID

evaluator

The evaluator to use.

TYPE: RunEvaluator

source_info

Additional information about the source of the evaluation to log as feedback metadata.

TYPE: Dict[str, Any] | None DEFAULT: None

reference_example

The example to use as a reference for the evaluation. If not provided, the run's reference example will be used.

TYPE: Example | str | dict | UUID | None DEFAULT: None

load_child_runs

Whether to load child runs when resolving the run ID.

TYPE: bool, default=False DEFAULT: False

RETURNS DESCRIPTION
EvaluationResult

The evaluation result object created by the evaluation.

TYPE: EvaluationResult

aevaluate_run async

aevaluate_run(
    run: Run | str | UUID,
    evaluator: RunEvaluator,
    *,
    source_info: dict[str, Any] | None = None,
    reference_example: Example | str | dict | UUID | None = None,
    load_child_runs: bool = False,
) -> EvaluationResult

Evaluate a run asynchronously.

PARAMETER DESCRIPTION
run

The run to evaluate.

TYPE: Run | str | UUID

evaluator

The evaluator to use.

TYPE: RunEvaluator

source_info

Additional information about the source of the evaluation to log as feedback metadata.

TYPE: Dict[str, Any] | None DEFAULT: None

reference_example

The example to use as a reference for the evaluation. If not provided, the run's reference example will be used.

TYPE: Example | str | dict | UUID | None DEFAULT: None

load_child_runs

Whether to load child runs when resolving the run ID.

TYPE: bool, default=False DEFAULT: False

RETURNS DESCRIPTION
EvaluationResult

The evaluation result object created by the evaluation.

TYPE: EvaluationResult

create_feedback

create_feedback(
    run_id: ID_TYPE | None = None,
    key: str = "unnamed",
    *,
    score: float | int | bool | None = None,
    value: str | dict | None = None,
    trace_id: ID_TYPE | None = None,
    correction: dict | None = None,
    comment: str | None = None,
    source_info: dict[str, Any] | None = None,
    feedback_source_type: FeedbackSourceType | str = API,
    source_run_id: ID_TYPE | None = None,
    feedback_id: ID_TYPE | None = None,
    feedback_config: FeedbackConfig | None = None,
    stop_after_attempt: int = 10,
    project_id: ID_TYPE | None = None,
    comparative_experiment_id: ID_TYPE | None = None,
    feedback_group_id: ID_TYPE | None = None,
    extra: dict | None = None,
    error: bool | None = None,
    **kwargs: Any,
) -> Feedback

Create feedback for a run.

Note

To enable feedback to be batch uploaded in the background you must specify trace_id. We highly encourage this for latency-sensitive environments.

PARAMETER DESCRIPTION
key

The name of the feedback metric.

TYPE: str DEFAULT: 'unnamed'

score

The score to rate this run on the metric or aspect.

TYPE: float | int | bool | None DEFAULT: None

value

The display value or non-numeric value for this feedback.

TYPE: float | int | bool | str | dict | None DEFAULT: None

run_id

The ID of the run to provide feedback for. At least one of run_id, trace_id, or project_id must be specified.

TYPE: UUID | str | None DEFAULT: None

trace_id

The ID of the trace (i.e. root parent run) of the run to provide feedback for (specified by run_id). If run_id and trace_id are the same, only trace_id needs to be specified. NOTE: trace_id is required for feedback ingestion to be batched and backgrounded.

TYPE: UUID | str | None DEFAULT: None

correction

The proper ground truth for this run.

TYPE: dict | None DEFAULT: None

comment

A comment about this feedback, such as a justification for the score or chain-of-thought trajectory for an LLM judge.

TYPE: str | None DEFAULT: None

source_info

Information about the source of this feedback.

TYPE: Dict[str, Any] | None DEFAULT: None

feedback_source_type

The type of feedback source, such as model (for model-generated feedback) or API.

TYPE: FeedbackSourceType | str DEFAULT: API

source_run_id

The ID of the run that generated this feedback, if a "model" type.

TYPE: UUID | str | None DEFAULT: None

feedback_id

The ID of the feedback to create. If not provided, a random UUID will be generated.

TYPE: UUID | str | None DEFAULT: None

feedback_config

The configuration specifying how to interpret feedback with this key. Examples include continuous (with min/max bounds), categorical, or freeform.

TYPE: FeedbackConfig | None DEFAULT: None

stop_after_attempt

The number of times to retry the request before giving up.

TYPE: int, default=10 DEFAULT: 10

project_id

The ID of the project (or experiment) to provide feedback on. This is used for creating summary metrics for experiments. Cannot specify run_id or trace_id if project_id is specified, and vice versa.

TYPE: UUID | str | None DEFAULT: None

comparative_experiment_id

If this feedback was logged as a part of a comparative experiment, this associates the feedback with that experiment.

TYPE: UUID | str | None DEFAULT: None

feedback_group_id

When logging preferences, ranking runs, or other comparative feedback, this is used to group feedback together.

TYPE: UUID | str | None DEFAULT: None

extra

Metadata for the feedback.

TYPE: Dict | None DEFAULT: None

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Feedback

The created feedback object.

TYPE: Feedback

Example
from langsmith import trace, traceable, Client


@traceable
def foo(x):
    return {"y": x * 2}


@traceable
def bar(y):
    return {"z": y - 1}


client = Client()

inputs = {"x": 1}
with trace(name="foobar", inputs=inputs) as root_run:
    result = foo(**inputs)
    result = bar(**result)
    root_run.outputs = result
    trace_id = root_run.id
    child_runs = root_run.child_runs

# Provide feedback for a trace (a.k.a. a root run)
client.create_feedback(
    key="user_feedback",
    score=1,
    trace_id=trace_id,
)

# Provide feedback for a child run
foo_run_id = [run for run in child_runs if run.name == "foo"][0].id
client.create_feedback(
    key="correctness",
    score=0,
    run_id=foo_run_id,
    # trace_id= is optional but recommended to enable batched and backgrounded
    # feedback ingestion.
    trace_id=trace_id,
)

update_feedback

update_feedback(
    feedback_id: ID_TYPE,
    *,
    score: float | int | bool | None = None,
    value: float | int | bool | str | dict | None = None,
    correction: dict | None = None,
    comment: str | None = None,
) -> None

Update a feedback in the LangSmith API.

PARAMETER DESCRIPTION
feedback_id

The ID of the feedback to update.

TYPE: UUID | str

score

The score to update the feedback with.

TYPE: float | int | bool | None DEFAULT: None

value

The value to update the feedback with.

TYPE: float | int | bool | str | dict | None DEFAULT: None

correction

The correction to update the feedback with.

TYPE: dict | None DEFAULT: None

comment

The comment to update the feedback with.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
None

None
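
Example (illustrative sketch; the feedback ID is a placeholder):

from langsmith import Client

client = Client()

# Adjust the score and add a comment and correction to existing feedback.
client.update_feedback(
    "<feedback-id>",
    score=0.5,
    comment="Partially correct; see correction.",
    correction={"answer": "Paris"},
)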

read_feedback

read_feedback(feedback_id: ID_TYPE) -> Feedback

Read a feedback from the LangSmith API.

PARAMETER DESCRIPTION
feedback_id

The ID of the feedback to read.

TYPE: UUID | str

RETURNS DESCRIPTION
Feedback

The feedback.

TYPE: Feedback

list_feedback

list_feedback(
    *,
    run_ids: Sequence[ID_TYPE] | None = None,
    feedback_key: Sequence[str] | None = None,
    feedback_source_type: Sequence[FeedbackSourceType] | None = None,
    limit: int | None = None,
    **kwargs: Any,
) -> Iterator[Feedback]

List the feedback objects on the LangSmith API.

PARAMETER DESCRIPTION
run_ids

The IDs of the runs to filter by.

TYPE: Sequence[UUID | str] | None DEFAULT: None

feedback_key

The feedback key(s) to filter by, e.g. 'correctness'. The query performs a union of all feedback keys.

TYPE: Sequence[str] | None DEFAULT: None

feedback_source_type

The type of feedback source, such as model or API.

TYPE: Sequence[FeedbackSourceType] | None DEFAULT: None

limit

The maximum number of feedback to return.

TYPE: int | None DEFAULT: None

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

YIELDS DESCRIPTION
Feedback

The feedback objects.

delete_feedback

delete_feedback(feedback_id: ID_TYPE) -> None

Delete a feedback by ID.

PARAMETER DESCRIPTION
feedback_id

The ID of the feedback to delete.

TYPE: UUID | str

RETURNS DESCRIPTION
None

None

create_feedback_from_token

create_feedback_from_token(
    token_or_url: str | UUID,
    score: float | int | bool | None = None,
    *,
    value: float | int | bool | str | dict | None = None,
    correction: dict | None = None,
    comment: str | None = None,
    metadata: dict | None = None,
) -> None

Create feedback from a presigned token or URL.

PARAMETER DESCRIPTION
token_or_url

The token or URL from which to create feedback.

TYPE: str | UUID

score

The score of the feedback.

TYPE: float | int | bool | None DEFAULT: None

value

The value of the feedback.

TYPE: float | int | bool | str | dict | None DEFAULT: None

correction

The correction of the feedback.

TYPE: dict | None DEFAULT: None

comment

The comment of the feedback.

TYPE: str | None DEFAULT: None

metadata

Additional metadata for the feedback.

TYPE: dict | None DEFAULT: None

RAISES DESCRIPTION
ValueError

If the source API URL is invalid.

RETURNS DESCRIPTION
None

None

create_presigned_feedback_token

create_presigned_feedback_token(
    run_id: ID_TYPE,
    feedback_key: str,
    *,
    expiration: datetime | timedelta | None = None,
    feedback_config: FeedbackConfig | None = None,
    feedback_id: ID_TYPE | None = None,
) -> FeedbackIngestToken

Create a pre-signed URL to send feedback data to.

This is useful for giving browser-based clients a way to upload feedback data directly to LangSmith without accessing the API key.

PARAMETER DESCRIPTION
run_id

The ID of the run.

TYPE: UUID | str

feedback_key

The key of the feedback to create.

TYPE: str

expiration

The expiration time of the pre-signed URL. Either a datetime or a timedelta offset from now. Defaults to 3 hours.

TYPE: datetime | timedelta | None DEFAULT: None

feedback_config

If creating a feedback_key for the first time, this defines how the metric should be interpreted, such as a continuous score (w/ optional bounds), or distribution over categorical values.

TYPE: FeedbackConfig | None DEFAULT: None

feedback_id

The ID of the feedback to create. If not provided, a new feedback will be created.

TYPE: UUID | str | None DEFAULT: None

RETURNS DESCRIPTION
FeedbackIngestToken

The pre-signed URL for uploading feedback data.

TYPE: FeedbackIngestToken
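
Example (illustrative sketch; the run ID is a placeholder):

from datetime import timedelta

from langsmith import Client

client = Client()

# Mint a URL, valid for one hour, that a browser client can post feedback to.
token = client.create_presigned_feedback_token(
    "<run-id>",
    "user_score",
    expiration=timedelta(hours=1),
)
print(token.url)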

create_presigned_feedback_tokens

create_presigned_feedback_tokens(
    run_id: ID_TYPE,
    feedback_keys: Sequence[str],
    *,
    expiration: datetime | timedelta | None = None,
    feedback_configs: Sequence[FeedbackConfig | None] | None = None,
) -> Sequence[FeedbackIngestToken]

Create a pre-signed URL to send feedback data to.

This is useful for giving browser-based clients a way to upload feedback data directly to LangSmith without accessing the API key.

PARAMETER DESCRIPTION
run_id

The ID of the run.

TYPE: UUID | str

feedback_keys

The keys of the feedback to create.

TYPE: Sequence[str]

expiration

The expiration time of the pre-signed URLs. Either a datetime or a timedelta offset from now. Defaults to 3 hours.

TYPE: datetime | timedelta | None DEFAULT: None

feedback_configs

If creating a feedback_key for the first time, this defines how the metric should be interpreted, such as a continuous score (w/ optional bounds), or distribution over categorical values.

TYPE: Sequence[FeedbackConfig | None] | None DEFAULT: None

RETURNS DESCRIPTION
Sequence[FeedbackIngestToken]

Sequence[FeedbackIngestToken]: The pre-signed URLs for uploading feedback data.

list_presigned_feedback_tokens

list_presigned_feedback_tokens(
    run_id: ID_TYPE, *, limit: int | None = None
) -> Iterator[FeedbackIngestToken]

List the feedback ingest tokens for a run.

PARAMETER DESCRIPTION
run_id

The ID of the run to filter by.

TYPE: UUID | str

limit

The maximum number of tokens to return.

TYPE: int | None DEFAULT: None

YIELDS DESCRIPTION
FeedbackIngestToken

The feedback ingest tokens.

list_feedback_formulas

list_feedback_formulas(
    *,
    dataset_id: ID_TYPE | None = None,
    session_id: ID_TYPE | None = None,
    limit: int | None = None,
    offset: int = 0,
) -> Iterator[FeedbackFormula]

List feedback formulas.

PARAMETER DESCRIPTION
dataset_id

The ID of the dataset to filter by.

TYPE: UUID | str | None DEFAULT: None

session_id

The ID of the session to filter by.

TYPE: UUID | str | None DEFAULT: None

limit

The maximum number of feedback formulas to return.

TYPE: int | None DEFAULT: None

offset

The starting offset for pagination.

TYPE: int DEFAULT: 0

YIELDS DESCRIPTION
FeedbackFormula

The feedback formulas.

get_feedback_formula_by_id

get_feedback_formula_by_id(feedback_formula_id: ID_TYPE) -> FeedbackFormula

Get a feedback formula by ID.

PARAMETER DESCRIPTION
feedback_formula_id

The ID of the feedback formula to retrieve.

TYPE: UUID | str

RETURNS DESCRIPTION
FeedbackFormula

The requested feedback formula.

create_feedback_formula

create_feedback_formula(
    *,
    feedback_key: str,
    aggregation_type: Literal["sum", "avg"],
    formula_parts: Sequence[FeedbackFormulaWeightedVariable | dict],
    dataset_id: ID_TYPE | None = None,
    session_id: ID_TYPE | None = None,
) -> FeedbackFormula

Create a feedback formula.

PARAMETER DESCRIPTION
feedback_key

The feedback key for the formula.

TYPE: str

aggregation_type

The aggregation type to use when combining parts.

TYPE: Literal['sum', 'avg']

formula_parts

The weighted feedback keys included in the formula.

TYPE: Sequence[FeedbackFormulaWeightedVariable | dict]

dataset_id

The dataset to scope the formula to.

TYPE: UUID | str | None DEFAULT: None

session_id

The session to scope the formula to.

TYPE: UUID | str | None DEFAULT: None

RETURNS DESCRIPTION
FeedbackFormula

The created feedback formula.

update_feedback_formula

update_feedback_formula(
    feedback_formula_id: ID_TYPE,
    *,
    feedback_key: str,
    aggregation_type: Literal["sum", "avg"],
    formula_parts: Sequence[FeedbackFormulaWeightedVariable | dict],
) -> FeedbackFormula

Update a feedback formula.

PARAMETER DESCRIPTION
feedback_formula_id

The ID of the feedback formula to update.

TYPE: UUID | str

feedback_key

The feedback key for the formula.

TYPE: str

aggregation_type

The aggregation type to use when combining parts.

TYPE: Literal['sum', 'avg']

formula_parts

The weighted feedback keys included in the formula.

TYPE: Sequence[FeedbackFormulaWeightedVariable | dict]

RETURNS DESCRIPTION
FeedbackFormula

The updated feedback formula.

delete_feedback_formula

delete_feedback_formula(feedback_formula_id: ID_TYPE) -> None

Delete a feedback formula by ID.

PARAMETER DESCRIPTION
feedback_formula_id

The ID of the feedback formula to delete.

TYPE: UUID | str

list_annotation_queues

list_annotation_queues(
    *,
    queue_ids: list[ID_TYPE] | None = None,
    name: str | None = None,
    name_contains: str | None = None,
    limit: int | None = None,
) -> Iterator[AnnotationQueue]

List the annotation queues on the LangSmith API.

PARAMETER DESCRIPTION
queue_ids

The IDs of the queues to filter by.

TYPE: List[UUID | str] | None DEFAULT: None

name

The name of the queue to filter by.

TYPE: str | None DEFAULT: None

name_contains

The substring that the queue name should contain.

TYPE: str | None DEFAULT: None

limit

The maximum number of queues to return.

TYPE: int | None DEFAULT: None

YIELDS DESCRIPTION
AnnotationQueue

The annotation queues.

create_annotation_queue

create_annotation_queue(
    *,
    name: str,
    description: str | None = None,
    queue_id: ID_TYPE | None = None,
    rubric_instructions: str | None = None,
) -> AnnotationQueueWithDetails

Create an annotation queue on the LangSmith API.

PARAMETER DESCRIPTION
name

The name of the annotation queue.

TYPE: str

description

The description of the annotation queue.

TYPE: str | None DEFAULT: None

queue_id

The ID of the annotation queue.

TYPE: UUID | str | None DEFAULT: None

rubric_instructions

The rubric instructions for the annotation queue.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
AnnotationQueue

The created annotation queue object.

TYPE: AnnotationQueueWithDetails

read_annotation_queue

read_annotation_queue(queue_id: ID_TYPE) -> AnnotationQueue

Read an annotation queue with the specified queue_id.

PARAMETER DESCRIPTION
queue_id

The ID of the annotation queue to read.

TYPE: UUID | str

RETURNS DESCRIPTION
AnnotationQueue

The annotation queue object.

TYPE: AnnotationQueue

update_annotation_queue

update_annotation_queue(
    queue_id: ID_TYPE,
    *,
    name: str,
    description: str | None = None,
    rubric_instructions: str | None = None,
) -> None

Update an annotation queue with the specified queue_id.

PARAMETER DESCRIPTION
queue_id

The ID of the annotation queue to update.

TYPE: UUID | str

name

The new name for the annotation queue.

TYPE: str

description

The new description for the annotation queue.

TYPE: str | None DEFAULT: None

rubric_instructions

The new rubric instructions for the annotation queue.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
None

None

delete_annotation_queue

delete_annotation_queue(queue_id: ID_TYPE) -> None

Delete an annotation queue with the specified queue_id.

PARAMETER DESCRIPTION
queue_id

The ID of the annotation queue to delete.

TYPE: UUID | str

RETURNS DESCRIPTION
None

None

add_runs_to_annotation_queue

add_runs_to_annotation_queue(queue_id: ID_TYPE, *, run_ids: list[ID_TYPE]) -> None

Add runs to an annotation queue with the specified queue_id.

PARAMETER DESCRIPTION
queue_id

The ID of the annotation queue.

TYPE: UUID | str

run_ids

The IDs of the runs to be added to the annotation queue.

TYPE: List[UUID | str]

RETURNS DESCRIPTION
None

None
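
Example (illustrative sketch tying the queue methods together; run IDs are placeholders):

from langsmith import Client

client = Client()

# Create a queue and enqueue a couple of runs for human review.
queue = client.create_annotation_queue(
    name="review-queue",
    description="Runs awaiting human review",
)
client.add_runs_to_annotation_queue(
    queue.id, run_ids=["<run-id-1>", "<run-id-2>"]
)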

delete_run_from_annotation_queue

delete_run_from_annotation_queue(queue_id: ID_TYPE, *, run_id: ID_TYPE) -> None

Delete a run from an annotation queue with the specified queue_id and run_id.

PARAMETER DESCRIPTION
queue_id

The ID of the annotation queue.

TYPE: UUID | str

run_id

The ID of the run to be removed from the annotation queue.

TYPE: UUID | str

RETURNS DESCRIPTION
None

None

get_run_from_annotation_queue

get_run_from_annotation_queue(
    queue_id: ID_TYPE, *, index: int
) -> RunWithAnnotationQueueInfo

Get a run from an annotation queue at the specified index.

PARAMETER DESCRIPTION
queue_id

The ID of the annotation queue.

TYPE: UUID | str

index

The index of the run to retrieve.

TYPE: int

RETURNS DESCRIPTION
RunWithAnnotationQueueInfo

The run at the specified index.

TYPE: RunWithAnnotationQueueInfo

RAISES DESCRIPTION
LangSmithNotFoundError

If the run is not found at the given index.

LangSmithError

For other API-related errors.

create_comparative_experiment

create_comparative_experiment(
    name: str,
    experiments: Sequence[ID_TYPE],
    *,
    reference_dataset: ID_TYPE | None = None,
    description: str | None = None,
    created_at: datetime | None = None,
    metadata: dict[str, Any] | None = None,
    id: ID_TYPE | None = None,
) -> ComparativeExperiment

Create a comparative experiment on the LangSmith API.

These experiments compare 2 or more experiment results over a shared dataset.

PARAMETER DESCRIPTION
name

The name of the comparative experiment.

TYPE: str

experiments

The IDs of the experiments to compare.

TYPE: Sequence[UUID | str]

reference_dataset

The ID of the dataset these experiments are compared on.

TYPE: UUID | str | None DEFAULT: None

description

The description of the comparative experiment.

TYPE: str | None DEFAULT: None

created_at

The creation time of the comparative experiment.

TYPE: datetime | None DEFAULT: None

metadata

Additional metadata for the comparative experiment.

TYPE: Dict[str, Any] | None DEFAULT: None

id

The ID of the comparative experiment.

TYPE: UUID | str | None DEFAULT: None

RETURNS DESCRIPTION
ComparativeExperiment

The created comparative experiment object.

TYPE: ComparativeExperiment
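
Example (illustrative sketch; the experiment and dataset IDs are placeholders):

from langsmith import Client

client = Client()

comparative = client.create_comparative_experiment(
    "model-a vs model-b",
    experiments=["<experiment-id-1>", "<experiment-id-2>"],
    reference_dataset="<dataset-id>",
    description="Head-to-head comparison on the shared QA dataset.",
)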

arun_on_dataset async

arun_on_dataset(
    dataset_name: str,
    llm_or_chain_factory: Any,
    *,
    evaluation: Any | None = None,
    concurrency_level: int = 5,
    project_name: str | None = None,
    project_metadata: dict[str, Any] | None = None,
    dataset_version: datetime | str | None = None,
    verbose: bool = False,
    input_mapper: Callable[[dict], Any] | None = None,
    revision_id: str | None = None,
    **kwargs: Any,
) -> dict[str, Any]

Asynchronously run the Chain or language model on a dataset.

.. deprecated:: 0.1.0

This method is deprecated. Use :func:langsmith.aevaluate instead.

run_on_dataset

run_on_dataset(
    dataset_name: str,
    llm_or_chain_factory: Any,
    *,
    evaluation: Any | None = None,
    concurrency_level: int = 5,
    project_name: str | None = None,
    project_metadata: dict[str, Any] | None = None,
    dataset_version: datetime | str | None = None,
    verbose: bool = False,
    input_mapper: Callable[[dict], Any] | None = None,
    revision_id: str | None = None,
    **kwargs: Any,
) -> dict[str, Any]

Run the Chain or language model on a dataset.

.. deprecated:: 0.1.0

This method is deprecated. Use :func:langsmith.aevaluate instead.

like_prompt

like_prompt(prompt_identifier: str) -> dict[str, int]

Like a prompt.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt.

TYPE: str

RETURNS DESCRIPTION
dict[str, int]

Dict[str, int]: A dictionary with the key 'likes' and the count of likes as the value.

unlike_prompt

unlike_prompt(prompt_identifier: str) -> dict[str, int]

Unlike a prompt.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt.

TYPE: str

RETURNS DESCRIPTION
dict[str, int]

Dict[str, int]: A dictionary with the key 'likes' and the count of likes as the value.

list_prompts

list_prompts(
    *,
    limit: int = 100,
    offset: int = 0,
    is_public: bool | None = None,
    is_archived: bool | None = False,
    sort_field: PromptSortField = updated_at,
    sort_direction: Literal["desc", "asc"] = "desc",
    query: str | None = None,
) -> ListPromptsResponse

List prompts with pagination.

PARAMETER DESCRIPTION
limit

The maximum number of prompts to return. Defaults to 100.

TYPE: int, default=100 DEFAULT: 100

offset

The number of prompts to skip. Defaults to 0.

TYPE: int, default=0 DEFAULT: 0

is_public

Filter prompts by whether they are public.

TYPE: bool | None DEFAULT: None

is_archived

Filter prompts by whether they are archived.

TYPE: bool | None DEFAULT: False

sort_field

The field to sort by. Defaults to "updated_at".

TYPE: PromptSortField DEFAULT: updated_at

sort_direction

The order to sort by. Defaults to "desc".

TYPE: Literal["desc", "asc"], default="desc" DEFAULT: 'desc'

query

Filter prompts by a search query.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
ListPromptsResponse

A response object containing the list of prompts.

TYPE: ListPromptsResponse

get_prompt

get_prompt(prompt_identifier: str) -> Prompt | None

Get a specific prompt by its identifier.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt. The identifier should be in the format "prompt_name" or "owner/prompt_name".

TYPE: str

RETURNS DESCRIPTION
Prompt | None

Optional[Prompt]: The prompt object.

RAISES DESCRIPTION
HTTPError

If the prompt is not found or another error occurs.

create_prompt

create_prompt(
    prompt_identifier: str,
    *,
    description: str | None = None,
    readme: str | None = None,
    tags: Sequence[str] | None = None,
    is_public: bool = False,
) -> Prompt

Create a new prompt.

Does not attach a prompt object; it just creates an empty prompt.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt. The identifier should be in the format of owner/name:hash, name:hash, owner/name, or name.

TYPE: str

description

A description of the prompt.

TYPE: str | None DEFAULT: None

readme

A readme for the prompt.

TYPE: str | None DEFAULT: None

tags

A list of tags for the prompt.

TYPE: Sequence[str] | None DEFAULT: None

is_public

Whether the prompt should be public.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Prompt

The created prompt object.

TYPE: Prompt

RAISES DESCRIPTION
ValueError

If the current tenant is not the owner.

HTTPError

If the server request fails.

create_commit

create_commit(
    prompt_identifier: str, object: Any, *, parent_commit_hash: str | None = None
) -> str

Create a commit for an existing prompt.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt.

TYPE: str

object

The LangChain object to commit.

TYPE: Any

parent_commit_hash

The hash of the parent commit. Defaults to latest commit.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
str

The URL of the prompt commit.

TYPE: str

RAISES DESCRIPTION
HTTPError

If the server request fails.

ValueError

If the prompt does not exist.

update_prompt

update_prompt(
    prompt_identifier: str,
    *,
    description: str | None = None,
    readme: str | None = None,
    tags: Sequence[str] | None = None,
    is_public: bool | None = None,
    is_archived: bool | None = None,
) -> dict[str, Any]

Update a prompt's metadata.

To update the content of a prompt, use push_prompt or create_commit instead.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt to update.

TYPE: str

description

New description for the prompt.

TYPE: str | None DEFAULT: None

readme

New readme for the prompt.

TYPE: str | None DEFAULT: None

tags

New list of tags for the prompt.

TYPE: Sequence[str] | None DEFAULT: None

is_public

New public status for the prompt.

TYPE: bool | None DEFAULT: None

is_archived

New archived status for the prompt.

TYPE: bool | None DEFAULT: None

RETURNS DESCRIPTION
dict[str, Any]

Dict[str, Any]: The updated prompt data as returned by the server.

RAISES DESCRIPTION
ValueError

If the prompt_identifier is empty.

HTTPError

If the server request fails.

delete_prompt

delete_prompt(prompt_identifier: str) -> None

Delete a prompt.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt to delete.

TYPE: str

RETURNS DESCRIPTION
bool

True if the prompt was successfully deleted, False otherwise.

TYPE: None

RAISES DESCRIPTION
ValueError

If the current tenant is not the owner of the prompt.

pull_prompt_commit

pull_prompt_commit(
    prompt_identifier: str, *, include_model: bool | None = False
) -> PromptCommit

Pull a prompt object from the LangSmith API.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt.

TYPE: str

RETURNS DESCRIPTION
PromptCommit

The prompt object.

TYPE: PromptCommit

RAISES DESCRIPTION
ValueError

If no commits are found for the prompt.

list_prompt_commits

list_prompt_commits(
    prompt_identifier: str,
    *,
    limit: int | None = None,
    offset: int = 0,
    include_model: bool = False,
) -> Iterator[ListedPromptCommit]

List commits for a given prompt.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt in the format 'owner/repo_name'.

TYPE: str

limit

The maximum number of commits to return. If None, returns all commits.

TYPE: int | None DEFAULT: None

offset

The number of commits to skip before starting to return results.

TYPE: int, default=0 DEFAULT: 0

include_model

Whether to include the model information in the commit data.

TYPE: bool, default=False DEFAULT: False

YIELDS DESCRIPTION
ListedPromptCommit

A ListedPromptCommit object for each commit.

Note

This method uses pagination to retrieve commits. It will make multiple API calls if necessary to retrieve all commits or up to the specified limit.

pull_prompt

pull_prompt(prompt_identifier: str, *, include_model: bool | None = False) -> Any

Pull a prompt and return it as a LangChain PromptTemplate.

This method requires langchain-core.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt.

TYPE: str

include_model

Whether to include the model information in the prompt data.

TYPE: bool | None DEFAULT: False

RETURNS DESCRIPTION
Any

The prompt object in the specified format.

TYPE: Any

push_prompt

push_prompt(
    prompt_identifier: str,
    *,
    object: Any | None = None,
    parent_commit_hash: str = "latest",
    is_public: bool | None = None,
    description: str | None = None,
    readme: str | None = None,
    tags: Sequence[str] | None = None,
) -> str

Push a prompt to the LangSmith API.

Can be used to update prompt metadata or prompt content.

If the prompt does not exist, it will be created. If the prompt exists, it will be updated.

PARAMETER DESCRIPTION
prompt_identifier

The identifier of the prompt.

TYPE: str

object

The LangChain object to push.

TYPE: Any | None DEFAULT: None

parent_commit_hash

The parent commit hash. Defaults to "latest".

TYPE: str DEFAULT: 'latest'

is_public

Whether the prompt should be public. If None (default), the current visibility status is maintained for existing prompts. For new prompts, None defaults to private. Set to True to make public, or False to make private.

TYPE: bool | None DEFAULT: None

description

A description of the prompt. Defaults to an empty string.

TYPE: str | None DEFAULT: None

readme

A readme for the prompt. Defaults to an empty string.

TYPE: str | None DEFAULT: None

tags

A list of tags for the prompt. Defaults to an empty list.

TYPE: Sequence[str] | None DEFAULT: None

RETURNS DESCRIPTION
str

The URL of the prompt.

TYPE: str
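
Example (illustrative sketch; assumes langchain-core is installed and "my-qa-prompt" is a hypothetical prompt name you own):

from langchain_core.prompts import ChatPromptTemplate

from langsmith import Client

client = Client()

prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a helpful assistant."), ("user", "{question}")]
)

# Push the prompt content; creates the prompt if it doesn't exist yet.
url = client.push_prompt("my-qa-prompt", object=prompt, tags=["qa"])

# Pull it back later as a LangChain object.
pulled = client.pull_prompt("my-qa-prompt")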

cleanup

cleanup() -> None

Manually trigger cleanup of the background thread.

evaluate

evaluate(
    target: TARGET_T | Runnable | EXPERIMENT_T | tuple[EXPERIMENT_T, EXPERIMENT_T],
    /,
    data: DATA_T | None = None,
    evaluators: Sequence[EVALUATOR_T] | Sequence[COMPARATIVE_EVALUATOR_T] | None = None,
    summary_evaluators: Sequence[SUMMARY_EVALUATOR_T] | None = None,
    metadata: dict | None = None,
    experiment_prefix: str | None = None,
    description: str | None = None,
    max_concurrency: int | None = 0,
    num_repetitions: int = 1,
    blocking: bool = True,
    experiment: EXPERIMENT_T | None = None,
    upload_results: bool = True,
    error_handling: Literal["log", "ignore"] = "log",
    **kwargs: Any,
) -> ExperimentResults | ComparativeExperimentResults

Evaluate a target system on a given dataset.

PARAMETER DESCRIPTION
target

The target system or experiment(s) to evaluate.

Can be a function that takes a dict and returns a dict, a langchain Runnable, an existing experiment ID, or a two-tuple of experiment IDs.

TYPE: TARGET_T | Runnable | EXPERIMENT_T | Tuple[EXPERIMENT_T, EXPERIMENT_T]

data

The dataset to evaluate on.

Can be a dataset name, a list of examples, or a generator of examples.

TYPE: DATA_T DEFAULT: None

evaluators

A list of evaluators to run on each example. The evaluator signature depends on the target type. Default to None.

TYPE: Sequence[EVALUATOR_T] | Sequence[COMPARATIVE_EVALUATOR_T] | None DEFAULT: None

summary_evaluators

A list of summary evaluators to run on the entire dataset. Should not be specified if comparing two existing experiments.

TYPE: Sequence[SUMMARY_EVALUATOR_T] | None DEFAULT: None

metadata

Metadata to attach to the experiment.

TYPE: dict | None DEFAULT: None

experiment_prefix

A prefix to provide for your experiment name.

TYPE: str | None DEFAULT: None

description

A free-form text description for the experiment.

TYPE: str | None DEFAULT: None

max_concurrency

The maximum number of concurrent evaluations to run.

If None then no limit is set. If 0 then no concurrency.

TYPE: Optional[int], default=0 DEFAULT: 0

blocking

Whether to block until the evaluation is complete.

TYPE: bool, default=True DEFAULT: True

num_repetitions

The number of times to run the evaluation. Each item in the dataset will be run and evaluated this many times. Defaults to 1.

TYPE: int, default=1 DEFAULT: 1

experiment

An existing experiment to extend.

If provided, experiment_prefix is ignored.

For advanced usage only. Should not be specified if target is an existing experiment or two-tuple of experiments.

TYPE: EXPERIMENT_T | None DEFAULT: None

upload_results

Whether to upload the results to LangSmith.

TYPE: bool, default=True DEFAULT: True

error_handling

How to handle individual run errors.

'log' will trace the runs with the error message as part of the experiment, 'ignore' will not count the run as part of the experiment at all.

TYPE: str, default="log" DEFAULT: 'log'

**kwargs

Additional keyword arguments to pass to the evaluator.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
ExperimentResults

If target is a function, Runnable, or existing experiment.

TYPE: ExperimentResults | ComparativeExperimentResults

ComparativeExperimentResults

If target is a two-tuple of existing experiments.

TYPE: ExperimentResults | ComparativeExperimentResults

Examples:

Prepare the dataset:

from langsmith import Client

client = Client()
dataset = client.clone_public_dataset(
    "https://smith.langchain.com/public/419dcab2-1d66-4b94-8901-0357ead390df/d"
)
dataset_name = "Evaluate Examples"

Basic usage:

def accuracy(outputs: dict, reference_outputs: dict) -> dict:
    # Row-level evaluator for accuracy.
    pred = outputs["response"]
    expected = reference_outputs["answer"]
    return {"score": expected.lower() == pred.lower()}
def precision(outputs: list[dict], reference_outputs: list[dict]) -> dict:
    # Experiment-level evaluator for precision.
    # TP / (TP + FP)
    predictions = [out["response"].lower() for out in outputs]
    expected = [ref["answer"].lower() for ref in reference_outputs]
    # yes and no are the only possible answers
    tp = sum([p == e for p, e in zip(predictions, expected) if p == "yes"])
    fp = sum([p == "yes" and e == "no" for p, e in zip(predictions, expected)])
    return {"score": tp / (tp + fp)}


def predict(inputs: dict) -> dict:
    # This can be any function or just an API call to your app.
    return {"response": "Yes"}


results = client.evaluate(
    predict,
    data=dataset_name,
    evaluators=[accuracy],
    summary_evaluators=[precision],
    experiment_prefix="My Experiment",
    description="Evaluating the accuracy of a simple prediction model.",
    metadata={
        "my-prompt-version": "abcd-1234",
    },
)

Evaluating over only a subset of the examples

experiment_name = results.experiment_name
examples = client.list_examples(dataset_name=dataset_name, limit=5)
results = client.evaluate(
    predict,
    data=examples,
    evaluators=[accuracy],
    summary_evaluators=[precision],
    experiment_prefix="My Experiment",
    description="Just testing a subset synchronously.",
)

Streaming each prediction to more easily + eagerly debug.

results = client.evaluate(
    predict,
    data=dataset_name,
    evaluators=[accuracy],
    summary_evaluators=[precision],
    description="I don't even have to block!",
    blocking=False,
)
for i, result in enumerate(results):  # doctest: +ELLIPSIS
    pass

Using the evaluate API with an off-the-shelf LangChain evaluator:

from langsmith.evaluation import LangChainStringEvaluator
from langchain.chat_models import init_chat_model


def prepare_criteria_data(run: Run, example: Example):
    return {
        "prediction": run.outputs["output"],
        "reference": example.outputs["answer"],
        "input": str(example.inputs),
    }


results = client.evaluate(
    predict,
    data=dataset_name,
    evaluators=[
        accuracy,
        LangChainStringEvaluator("embedding_distance"),
        LangChainStringEvaluator(
            "labeled_criteria",
            config={
                "criteria": {
                    "usefulness": "The prediction is useful if it is correct"
                    " and/or asks a useful followup question."
                },
                "llm": init_chat_model("gpt-4o"),
            },
            prepare_data=prepare_criteria_data,
        ),
    ],
    description="Evaluating with off-the-shelf LangChain evaluators.",
    summary_evaluators=[precision],
)

Evaluating a LangChain object:

from langchain_core.runnables import chain as as_runnable


@as_runnable
def nested_predict(inputs):
    return {"response": "Yes"}


@as_runnable
def lc_predict(inputs):
    return nested_predict.invoke(inputs)


results = client.evaluate(
    lc_predict,
    data=dataset_name,
    evaluators=[accuracy],
    description="This time we're evaluating a LangChain object.",
    summary_evaluators=[precision],
)

Comparative evaluation:

results = client.evaluate(
    # The target is a tuple of the experiment IDs to compare
    target=(
        "12345678-1234-1234-1234-123456789012",
        "98765432-1234-1234-1234-123456789012",
    ),
    evaluators=[accuracy],
    summary_evaluators=[precision],
)

Evaluate an existing experiment:

results = client.evaluate(
    # The target is the ID of the experiment we are evaluating
    target="12345678-1234-1234-1234-123456789012",
    evaluators=[accuracy],
    summary_evaluators=[precision],
)
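
Repeating each example and ignoring failed runs (a minimal sketch that reuses the predict and accuracy functions defined above; the parameter values are illustrative):

results = client.evaluate(
    predict,
    data=dataset_name,
    evaluators=[accuracy],
    num_repetitions=3,  # run and evaluate every example three times
    error_handling="ignore",  # runs that raise are dropped from the experiment
)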

Added in langsmith 0.2.0

aevaluate async

aevaluate(
    target: ATARGET_T | AsyncIterable[dict] | Runnable | str | UUID | TracerSession,
    /,
    data: DATA_T | AsyncIterable[Example] | Iterable[Example] | None = None,
    evaluators: Sequence[EVALUATOR_T | AEVALUATOR_T] | None = None,
    summary_evaluators: Sequence[SUMMARY_EVALUATOR_T] | None = None,
    metadata: dict | None = None,
    experiment_prefix: str | None = None,
    description: str | None = None,
    max_concurrency: int | None = 0,
    num_repetitions: int = 1,
    blocking: bool = True,
    experiment: TracerSession | str | UUID | None = None,
    upload_results: bool = True,
    error_handling: Literal["log", "ignore"] = "log",
    **kwargs: Any,
) -> AsyncExperimentResults

Evaluate an async target system on a given dataset.

PARAMETER DESCRIPTION
target

The target system or experiment(s) to evaluate.

Can be an async function that takes a dict and returns a dict, a langchain Runnable, an existing experiment ID, or a two-tuple of experiment IDs.

TYPE: ATARGET_T | AsyncIterable[dict] | Runnable | str | UUID | TracerSession

data

The dataset to evaluate on.

Can be a dataset name, a list of examples, an async generator of examples, or an async iterable of examples.

TYPE: DATA_T | AsyncIterable[Example] DEFAULT: None

evaluators

A list of evaluators to run on each example.

TYPE: Sequence[EVALUATOR_T] | None DEFAULT: None

summary_evaluators

A list of summary evaluators to run on the entire dataset.

TYPE: Sequence[SUMMARY_EVALUATOR_T] | None DEFAULT: None

metadata

Metadata to attach to the experiment.

TYPE: dict | None DEFAULT: None

experiment_prefix

A prefix to provide for your experiment name.

TYPE: str | None DEFAULT: None

description

A description of the experiment.

TYPE: str | None DEFAULT: None

max_concurrency

The maximum number of concurrent evaluations to run.

If None then no limit is set. If 0 then no concurrency.

TYPE: Optional[int], default=0 DEFAULT: 0

num_repetitions

The number of times to run the evaluation. Each item in the dataset will be run and evaluated this many times. Defaults to 1.

TYPE: int, default=1 DEFAULT: 1

blocking

Whether to block until the evaluation is complete.

TYPE: bool, default=True DEFAULT: True

experiment

An existing experiment to extend.

If provided, experiment_prefix is ignored.

For advanced usage only.

TYPE: TracerSession | None DEFAULT: None

upload_results

Whether to upload the results to LangSmith.

TYPE: bool, default=True DEFAULT: True

error_handling

How to handle individual run errors.

'log' will trace the runs with the error message as part of the experiment, 'ignore' will not count the run as part of the experiment at all.

TYPE: str, default="log" DEFAULT: 'log'

**kwargs

Additional keyword arguments to pass to the evaluator.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
AsyncExperimentResults

An async iterator over the experiment results.

Environment
  • LANGSMITH_TEST_CACHE: If set, API calls will be cached to disk to save time and cost during testing.

    Recommended to commit the cache files to your repository for faster CI/CD runs.

    Requires the 'langsmith[vcr]' package to be installed.

Examples:

Prepare the dataset:

import asyncio
from langsmith import Client

client = Client()
dataset = client.clone_public_dataset(
    "https://smith.langchain.com/public/419dcab2-1d66-4b94-8901-0357ead390df/d"
)
dataset_name = "Evaluate Examples"

Basic usage:

def accuracy(outputs: dict, reference_outputs: dict) -> dict:
    # Row-level evaluator for accuracy.
    pred = outputs["response"]
    expected = reference_outputs["answer"]
    return {"score": expected.lower() == pred.lower()}


def precision(outputs: list[dict], reference_outputs: list[dict]) -> dict:
    # Experiment-level evaluator for precision.
    # TP / (TP + FP)
    predictions = [out["response"].lower() for out in outputs]
    expected = [ref["answer"].lower() for ref in reference_outputs]
    # yes and no are the only possible answers
    tp = sum([p == e for p, e in zip(predictions, expected) if p == "yes"])
    fp = sum([p == "yes" and e == "no" for p, e in zip(predictions, expected)])
    return {"score": tp / (tp + fp)}


async def apredict(inputs: dict) -> dict:
    # This can be any async function or just an API call to your app.
    await asyncio.sleep(0.1)
    return {"response": "Yes"}


results = asyncio.run(
    client.aevaluate(
        apredict,
        data=dataset_name,
        evaluators=[accuracy],
        summary_evaluators=[precision],
        experiment_prefix="My Experiment",
        description="Evaluate the accuracy of the model asynchronously.",
        metadata={
            "my-prompt-version": "abcd-1234",
        },
    )
)

Evaluating over only a subset of the examples using an async generator:

async def example_generator():
    examples = client.list_examples(dataset_name=dataset_name, limit=5)
    for example in examples:
        yield example


results = asyncio.run(
    client.aevaluate(
        apredict,
        data=example_generator(),
        evaluators=[accuracy],
        summary_evaluators=[precision],
        experiment_prefix="My Subset Experiment",
        description="Evaluate a subset of examples asynchronously.",
    )
)

Streaming each prediction to debug more easily and eagerly:

results = asyncio.run(
    client.aevaluate(
        apredict,
        data=dataset_name,
        evaluators=[accuracy],
        summary_evaluators=[precision],
        experiment_prefix="My Streaming Experiment",
        description="Streaming predictions for debugging.",
        blocking=False,
    )
)


async def aenumerate(iterable):
    # Consume and print each result as it becomes available.
    async for elem in iterable:
        print(elem)


asyncio.run(aenumerate(results))

Running without concurrency:

results = asyncio.run(
    client.aevaluate(
        apredict,
        data=dataset_name,
        evaluators=[accuracy],
        summary_evaluators=[precision],
        experiment_prefix="My Experiment Without Concurrency",
        description="This was run without concurrency.",
        max_concurrency=0,
    )
)
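
Capping concurrency at a fixed number of workers (a minimal sketch reusing the functions defined above; the value 4 is illustrative):

results = asyncio.run(
    client.aevaluate(
        apredict,
        data=dataset_name,
        evaluators=[accuracy],
        summary_evaluators=[precision],
        experiment_prefix="My Bounded Concurrency Experiment",
        description="At most four examples evaluated at once.",
        max_concurrency=4,
    )
)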

Using Async evaluators:

async def helpfulness(outputs: dict) -> dict:
    # Row-level evaluator for helpfulness.
    await asyncio.sleep(5)  # Replace with your LLM API call
    return {"score": outputs["output"] == "Yes"}


results = asyncio.run(
    client.aevaluate(
        apredict,
        data=dataset_name,
        evaluators=[helpfulness],
        summary_evaluators=[precision],
        experiment_prefix="My Helpful Experiment",
        description="Applying async evaluators example.",
    )
)

Evaluate an existing experiment:

results = asyncio.run(
    client.aevaluate(
        # The target is the ID of the experiment we are evaluating
        target="419dcab2-1d66-4b94-8901-0357ead390df",
        evaluators=[accuracy, helpfulness],
        summary_evaluators=[precision],
    )
)

Added in langsmith 0.2.0

get_experiment_results

get_experiment_results(
    name: str | None = None,
    project_id: UUID | None = None,
    preview: bool = False,
    comparative_experiment_id: UUID | None = None,
    filters: dict[UUID, list[str]] | None = None,
    limit: int | None = None,
) -> ExperimentResults

Get results for an experiment, including experiment session aggregated stats and experiment runs for each dataset example.

Experiment results may not be available immediately after the experiment is created.

PARAMETER DESCRIPTION
name

The experiment name.

TYPE: str | None DEFAULT: None

project_id

The experiment's tracing project ID (also called session_id), which can be found in the URL of the LangSmith experiment page.

TYPE: UUID | None DEFAULT: None

preview

Whether to return lightweight preview data only. When True, fetches inputs_preview/outputs_preview summaries instead of full inputs/outputs from S3 storage. This is faster and uses less bandwidth.

TYPE: bool DEFAULT: False

comparative_experiment_id

Optional comparative experiment UUID for pairwise comparison experiment results.

TYPE: UUID | None DEFAULT: None

filters

Optional filters to apply to the results.

TYPE: dict[UUID, list[str]] | None DEFAULT: None

limit

Maximum number of results to return.

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
ExperimentResults

ExperimentResults with:

  • feedback_stats: Combined feedback statistics, including session-level feedback.
  • run_stats: Aggregated run statistics (latency, tokens, cost, etc.).
  • examples_with_runs: Iterator of ExampleWithRuns.

RAISES DESCRIPTION
ValueError

If no project is found for the given session_id.

Example
client = Client()
results = client.get_experiment_results(
    project_id="037ae90f-f297-4926-b93c-37d8abf6899f",
)
for example_with_runs in results["examples_with_runs"]:
    print(example_with_runs.dict())

# Access aggregated experiment statistics
print(f"Total runs: {results['run_stats']['run_count']}")
print(f"Total cost: {results['run_stats']['total_cost']}")
print(f"P50 latency: {results['run_stats']['latency_p50']}")

# Access feedback statistics
print(f"Feedback stats: {results['feedback_stats']}")

generate_insights

generate_insights(
    *,
    chat_histories: list[list[dict]],
    instructions: str = DEFAULT_INSTRUCTIONS,
    name: str | None = None,
    model: Literal["openai", "anthropic"] | None = None,
    openai_api_key: str | None = None,
    anthropic_api_key: str | None = None,
) -> InsightsReport

Generate Insights over your agent chat histories.

Note

  • Only available to Plus and higher tier LangSmith users.
  • The Insights Agent uses your model API key. The cost of the report grows linearly with the number of chat histories you upload and with the size of each history. For more, see the Insights documentation.
  • This method will upload your chat histories as traces to LangSmith.
  • If you pass in a model API key, it will be stored as a workspace secret, meaning it will also be used for evaluators and the playground.
PARAMETER DESCRIPTION
chat_histories

A list of chat histories. Each chat history should be a list of messages. We recommend formatting these as OpenAI messages with a "role" and "content" key. Max length 1000 items.

TYPE: list[list[dict]]

instructions

Instructions for the Insights agent. Should focus on what your agent does and what types of insights you want to generate.

TYPE: str DEFAULT: DEFAULT_INSTRUCTIONS

name

Name for the generated Insights report.

TYPE: str | None DEFAULT: None

model

Whether to use OpenAI or Anthropic models. This will impact the cost of generating the Insights Report.

TYPE: Literal['openai', 'anthropic'] | None DEFAULT: None

openai_api_key

OpenAI API key to use. Only needed if you have not already stored this in LangSmith as a workspace secret.

TYPE: str | None DEFAULT: None

anthropic_api_key

Anthropic API key to use. Only needed if you have not already stored this in LangSmith as a workspace secret.

TYPE: str | None DEFAULT: None

Example
import os
from langsmith import Client

client = Client()

chat_histories = [
    [
        {"role": "user", "content": "how are you"},
        {"role": "assistant", "content": "good!"},
    ],
    [
        {"role": "user", "content": "do you like art"},
        {"role": "assistant", "content": "only Tarkovsky"},
    ],
]

report = client.generate_insights(
    chat_histories=chat_histories,
    name="Conversation Topics",
    instructions="What are the high-level topics of conversations users are having with the assistant?",
    openai_api_key=os.environ["OPENAI_API_KEY"],
)

# client.poll_insights(report=report)

poll_insights

poll_insights(
    *,
    report: InsightsReport | None = None,
    id: str | UUID | None = None,
    project_id: str | UUID | None = None,
    rate: int = 30,
    timeout: int = 30 * 60,
    verbose: bool = False,
) -> InsightsReport

Poll the status of an Insights report.

PARAMETER DESCRIPTION
report

The InsightsReport.

TYPE: InsightsReport | None DEFAULT: None

id

The Insights report ID. Should only be specified if 'report' is not provided.

TYPE: str | UUID | None DEFAULT: None

project_id

The tracing project ID. Should only be specified if 'report' is not provided.

TYPE: str | UUID | None DEFAULT: None
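
Example (a minimal sketch that reuses the report returned by generate_insights above; the polling settings are illustrative):

finished = client.poll_insights(
    report=report,
    rate=60,  # check the report status every 60 seconds
    timeout=45 * 60,  # stop waiting after 45 minutes
    verbose=True,  # print progress while polling
)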

close_session

close_session(session: Session) -> None

Close the session.

PARAMETER DESCRIPTION
session

The session to close.

TYPE: Session

convert_prompt_to_openai_format

convert_prompt_to_openai_format(
    messages: Any, model_kwargs: dict[str, Any] | None = None
) -> dict

Convert a prompt to OpenAI format.

Requires the langchain_openai package to be installed.

PARAMETER DESCRIPTION
messages

The messages to convert.

TYPE: Any

model_kwargs

Model configuration arguments including stop and any other required arguments.

TYPE: Dict[str, Any] | None DEFAULT: None

RETURNS DESCRIPTION
dict

The prompt in OpenAI format.

TYPE: dict

RAISES DESCRIPTION
ImportError

If the langchain_openai package is not installed.

LangSmithError

If there is an error during the conversion process.
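
Example (a minimal sketch; the import path and the exact message shape accepted are assumptions, and the stop sequence is illustrative):

from langsmith.client import convert_prompt_to_openai_format
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is LangSmith?"),
]
payload = convert_prompt_to_openai_format(messages, model_kwargs={"stop": ["\n\n"]})
print(payload)  # a dict in OpenAI chat-completions format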

convert_prompt_to_anthropic_format

convert_prompt_to_anthropic_format(
    messages: Any, model_kwargs: dict[str, Any] | None = None
) -> dict

Convert a prompt to Anthropic format.

Requires the langchain_anthropic package to be installed.

PARAMETER DESCRIPTION
messages

The messages to convert.

TYPE: Any

model_kwargs

Model configuration arguments including model_name and stop.

TYPE: Dict[str, Any] | None DEFAULT: None

RETURNS DESCRIPTION
dict

The prompt in Anthropic format.

TYPE: dict
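
Example (a minimal sketch; the import path, message shape, and model name are assumptions):

from langsmith.client import convert_prompt_to_anthropic_format
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is LangSmith?"),
]
payload = convert_prompt_to_anthropic_format(
    messages, model_kwargs={"model_name": "claude-3-5-sonnet-latest"}
)
print(payload)  # a dict in Anthropic Messages API format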

dump_model

dump_model(model) -> dict[str, Any]

Dump model depending on pydantic version.

prep_obj_for_push

prep_obj_for_push(obj: Any) -> Any

Format the object so it's Prompt Hub compatible.