| Name | Type | Default |
|---|---|---|
| `target` | `Union[ATARGET_T, AsyncIterable[dict], Runnable, str, uuid.UUID, TracerSession]` | Required |
| `data` | `Union[DATA_T, AsyncIterable[Example]]` | `None` |
| `evaluators` | `Optional[Sequence[EVALUATOR_T]]` | `None` |
| `summary_evaluators` | `Optional[Sequence[SUMMARY_EVALUATOR_T]]` | `None` |
| `metadata` | `Optional[dict]` | `None` |
| `experiment_prefix` | `Optional[str]` | `None` |
| `description` | `Optional[str]` | `None` |
| `max_concurrency` | `Optional[int]` | `0` |
| `num_repetitions` | `int` | `1` |
| `blocking` | `bool` | `True` |
| `experiment` | `Optional[TracerSession]` | `None` |
| `upload_results` | `bool` | `True` |
| `error_handling` | `str` | `"log"` |
| `**kwargs` | `Any` | `{}` |

Evaluate an async target system on a given dataset.

Environment:

- `LANGSMITH_TEST_CACHE`: If set, API calls are cached to disk to save time and cost during testing. Committing the cache files to your repository is recommended for faster CI/CD runs. Requires the `langsmith[vcr]` package to be installed.

`target`: The target system or experiment(s) to evaluate. Can be an async function that takes a dict and returns a dict, a LangChain `Runnable`, an existing experiment ID, or a two-tuple of experiment IDs.
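
A minimal sketch of the first form of `target`: an async function that maps an example's inputs dict to an outputs dict. The function name and the `question`/`answer` keys are hypothetical, and `asyncio.sleep(0)` stands in for a real awaited model or API call.

```python
import asyncio


async def answer_question(inputs: dict) -> dict:
    """Hypothetical async target: receives an example's inputs dict and
    returns an outputs dict. The "question"/"answer" keys are assumptions."""
    await asyncio.sleep(0)  # stand-in for an awaited model or API call
    question = inputs["question"]
    return {"answer": f"You asked: {question}"}


result = asyncio.run(answer_question({"question": "What is 2 + 2?"}))
print(result)  # {'answer': 'You asked: What is 2 + 2?'}
```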

`data`: The dataset to evaluate on. Can be a dataset name, a list of examples, an async generator of examples, or an async iterable of examples.
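
One way to produce the async-iterable form of `data` is to wrap an ordinary iterable in an async generator. The helper names `as_async_iter` and `collect` are hypothetical, and plain strings stand in for the `Example` objects you would pass in practice (for instance, examples fetched with `Client.list_examples`).

```python
import asyncio
from typing import AsyncIterator, Iterable, List, TypeVar

T = TypeVar("T")


async def as_async_iter(items: Iterable[T]) -> AsyncIterator[T]:
    """Hypothetical helper: wrap any synchronous iterable as an async generator."""
    for item in items:
        yield item


async def collect(source: AsyncIterator[T]) -> List[T]:
    # Consume the async iterable with `async for`, as an async evaluation loop would.
    return [item async for item in source]


collected = asyncio.run(collect(as_async_iter(["ex-1", "ex-2", "ex-3"])))
print(collected)  # ['ex-1', 'ex-2', 'ex-3']
```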

`evaluators`: A list of evaluators to run on each example.
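
A sketch of a row-level evaluator, assuming a recent `langsmith` release that passes evaluator arguments by name (`outputs`, `reference_outputs`) and accepts a dict with `key` and `score` as the result. The `exact_match` name and the `answer` key are hypothetical; check which evaluator signatures your installed version supports.

```python
def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Hypothetical evaluator: compare the target's answer with the reference
    answer. Argument names follow the assumed named-argument convention."""
    matched = outputs.get("answer") == reference_outputs.get("answer")
    # Score 1 for an exact match, 0 otherwise.
    return {"key": "exact_match", "score": int(matched)}


print(exact_match({"answer": "4"}, {"answer": "4"}))  # {'key': 'exact_match', 'score': 1}
```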

`summary_evaluators`: A list of summary evaluators to run on the entire dataset.
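
A summary evaluator scores the experiment as a whole rather than one example at a time. This sketch assumes the same named-argument convention, with `outputs` and `reference_outputs` as lists aligned by example; the `accuracy` name and `answer` key are hypothetical.

```python
from typing import List


def accuracy(outputs: List[dict], reference_outputs: List[dict]) -> dict:
    """Hypothetical summary evaluator: fraction of examples whose answer
    matches the reference across the whole dataset."""
    if not outputs:
        return {"key": "accuracy", "score": 0.0}
    hits = sum(
        out.get("answer") == ref.get("answer")
        for out, ref in zip(outputs, reference_outputs)
    )
    return {"key": "accuracy", "score": hits / len(outputs)}


print(accuracy([{"answer": "4"}, {"answer": "5"}], [{"answer": "4"}, {"answer": "6"}]))
# {'key': 'accuracy', 'score': 0.5}
```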

`metadata`: Metadata to attach to the experiment.

`experiment_prefix`: A prefix for the experiment name.

`description`: A description of the experiment.

`max_concurrency`: The maximum number of concurrent evaluations to run. If `None`, no limit is set; if `0`, evaluations run without concurrency.
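
The `None`/`0`/positive-integer semantics of `max_concurrency` can be modeled with `asyncio`. This is an illustrative model, not LangSmith's implementation: `None` starts every job at once, `0` awaits jobs one at a time, and a positive `n` caps in-flight jobs with an `asyncio.Semaphore`. The helper names are hypothetical; the sketch records the peak number of jobs in flight for each setting.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

Job = Callable[[], Awaitable[int]]


async def run_with_limit(jobs: List[Job], max_concurrency: Optional[int]) -> List[int]:
    """Illustrative model of the documented max_concurrency semantics."""
    if max_concurrency is None:
        # No limit: start every job at once.
        return list(await asyncio.gather(*(job() for job in jobs)))
    if max_concurrency == 0:
        # No concurrency: await each job before starting the next.
        return [await job() for job in jobs]
    # Positive limit: a semaphore caps how many jobs are in flight.
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(job: Job) -> int:
        async with sem:
            return await job()

    return list(await asyncio.gather(*(guarded(job) for job in jobs)))


def make_jobs(n: int, state: dict) -> List[Job]:
    # Each job bumps a shared counter so peak concurrency can be observed.
    async def job() -> int:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)
        state["active"] -= 1
        return 1

    return [job] * n


peaks = {}
for limit in (None, 0, 2):
    state = {"active": 0, "peak": 0}
    asyncio.run(run_with_limit(make_jobs(5, state), limit))
    peaks[limit] = state["peak"]
print(peaks)  # {None: 5, 0: 1, 2: 2}
```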

`num_repetitions`: The number of times to run the evaluation. Each example in the dataset is run and evaluated this many times. Defaults to 1.
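
The number of target runs implied by `num_repetitions` is a simple product: with 3 examples and `num_repetitions=2`, the target runs 3 × 2 = 6 times. A hypothetical helper that enumerates those runs (the ordering here is arbitrary and not necessarily LangSmith's):

```python
from itertools import product
from typing import List, Tuple


def planned_runs(example_ids: List[str], num_repetitions: int = 1) -> List[Tuple[str, int]]:
    """Hypothetical helper: pair each example with each repetition index."""
    return list(product(example_ids, range(num_repetitions)))


runs = planned_runs(["ex-1", "ex-2", "ex-3"], num_repetitions=2)
print(len(runs))  # 6
```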

`blocking`: Whether to block until the evaluation is complete.

`experiment`: An existing experiment to extend. If provided, `experiment_prefix` is ignored. For advanced usage only.

`upload_results`: Whether to upload the results to LangSmith.

`error_handling`: How to handle individual run errors. `'log'` traces failed runs, with their error messages, as part of the experiment; `'ignore'` excludes failed runs from the experiment entirely.
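
An illustrative model of the two `error_handling` modes, not LangSmith's implementation: under `'log'` a failed run stays in the results with its error message, while under `'ignore'` it is dropped. `run_examples` and `flaky` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_examples(
    target: Callable[[dict], Awaitable[dict]],
    examples: List[dict],
    error_handling: str = "log",
) -> List[dict]:
    """Illustrative model of the 'log' and 'ignore' error-handling modes."""
    if error_handling not in ("log", "ignore"):
        raise ValueError(f"unknown error_handling: {error_handling!r}")
    results = []
    for inputs in examples:
        try:
            outputs = await target(inputs)
            results.append({"inputs": inputs, "outputs": outputs, "error": None})
        except Exception as exc:
            if error_handling == "log":
                # Keep the failed run and record its error message.
                results.append({"inputs": inputs, "outputs": None, "error": repr(exc)})
            # Under "ignore", the failed run is simply not recorded.
    return results


async def flaky(inputs: dict) -> dict:
    # Hypothetical target that fails on one specific input.
    if inputs["x"] == 2:
        raise RuntimeError("boom")
    return {"y": inputs["x"] * 10}


logged = asyncio.run(run_examples(flaky, [{"x": 1}, {"x": 2}], "log"))
ignored = asyncio.run(run_examples(flaky, [{"x": 1}, {"x": 2}], "ignore"))
print(len(logged), len(ignored))  # 2 1
```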

`**kwargs`: Additional keyword arguments to pass to the evaluator.
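
Putting several parameters together, a hedged end-to-end sketch. It assumes a recent `langsmith` release that exports `aevaluate` at the top level and accepts evaluators with `outputs`/`reference_outputs` arguments returning a bool; the dataset name `my-dataset`, the experiment prefix, and the helper names are hypothetical. The `langsmith` import is deferred into `main()` so the file loads without the package; running it with `asyncio.run(main())` requires LangSmith credentials (for example `LANGSMITH_API_KEY`) and an existing dataset with that name.

```python
import asyncio


async def my_target(inputs: dict) -> dict:
    # Hypothetical async system under test; "question"/"answer" keys are assumptions.
    await asyncio.sleep(0)  # stand-in for an awaited model or API call
    return {"answer": inputs["question"].strip().lower()}


def correct(outputs: dict, reference_outputs: dict) -> bool:
    # Hypothetical evaluator, assuming the (outputs, reference_outputs) convention.
    return outputs["answer"] == reference_outputs["answer"]


async def main():
    # Imported lazily so this module can be loaded without langsmith installed.
    from langsmith import aevaluate

    return await aevaluate(
        my_target,
        data="my-dataset",  # hypothetical dataset name
        evaluators=[correct],
        experiment_prefix="async-demo",  # hypothetical prefix
        max_concurrency=4,
        num_repetitions=1,
    )
```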