Evaluate an async target system on a given dataset.
aevaluate(
target: Union[ATARGET_T, AsyncIterable[dict], Runnable, str, uuid.UUID, schemas.TracerSession],
/,
data: Union[DATA_T, AsyncIterable[schemas.Example], Iterable[schemas.Example], None] = None,
evaluators: Optional[Sequence[Union[EVALUATOR_T, AEVALUATOR_T]]] = None,
summary_evaluators: Optional[Sequence[SUMMARY_EVALUATOR_T]] = None,
metadata: Optional[dict] = None,
experiment_prefix: Optional[str] = None,
description: Optional[str] = None,
max_concurrency: Optional[int] = 0,
num_repetitions: int = 1,
client: Optional[langsmith.Client] = None,
blocking: bool = True,
experiment: Optional[Union[schemas.TracerSession, str, uuid.UUID]] = None,
upload_results: bool = True,
error_handling: Literal['log', 'ignore'] = 'log',
**kwargs: Any
) -> AsyncExperimentResults:
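A minimal sketch of a typical call, assuming `aevaluate` is importable from the top-level `langsmith` package and that a dataset named "my-dataset" already exists in LangSmith; the target function and evaluator below are illustrative stand-ins, not part of the SDK:

```python
import asyncio

from langsmith import aevaluate

# Illustrative async target: maps example inputs to outputs.
async def my_target(inputs: dict) -> dict:
    return {"answer": "stub answer for " + inputs["question"]}

# Illustrative row-level evaluator: compares outputs to the example's reference outputs.
def exact_match(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["answer"] == reference_outputs["answer"]

async def main() -> None:
    await aevaluate(
        my_target,                 # target is positional-only
        data="my-dataset",         # dataset name, list of examples, or async iterable
        evaluators=[exact_match],
        experiment_prefix="quickstart",
    )

asyncio.run(main())
```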
LANGSMITH_TEST_CACHE: If set, API calls will be cached to disk to save time and
cost during testing. Recommended to commit the cache files to your repository for faster CI/CD runs.
Requires the 'langsmith[vcr]' package to be installed.
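For example, a test suite might point the cache at a directory committed with the repository before any evaluations run (the path below is an assumption, not a required location):

```python
import os

# Hypothetical cache directory; commit it so CI re-uses recorded API calls.
os.environ["LANGSMITH_TEST_CACHE"] = "tests/cassettes"
```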
The 'max_concurrency' default was updated from None (no limit on concurrency) to 0 (no concurrency at all).
| Name | Type | Description |
|---|---|---|
| target* | AsyncCallable[[dict], dict] \| AsyncIterable[dict] \| Runnable \| EXPERIMENT_T \| Tuple[EXPERIMENT_T, EXPERIMENT_T] | The target system or experiment(s) to evaluate. Can be an async function that takes a dict and returns a dict, a Runnable, an existing experiment, or a two-tuple of experiments to compare. |
| data | Union[DATA_T, AsyncIterable[schemas.Example]] | Default: None. The dataset to evaluate on. Can be a dataset name, a list of examples, an async generator of examples, or an async iterable of examples. |
| evaluators | Optional[Sequence[EVALUATOR_T]] | Default: None. A list of evaluators to run on each example. |
| summary_evaluators | Optional[Sequence[SUMMARY_EVALUATOR_T]] | Default: None. A list of summary evaluators to run on the entire dataset. |
| metadata | Optional[dict] | Default: None. Metadata to attach to the experiment. |
| experiment_prefix | Optional[str] | Default: None. A prefix to provide for your experiment name. |
| description | Optional[str] | Default: None. A description of the experiment. |
| max_concurrency | int \| None | Default: 0. The maximum number of concurrent evaluations to run. If None, there is no limit on concurrency; if 0, evaluations run sequentially with no concurrency. |
| num_repetitions | int | Default: 1. The number of times to run the evaluation. Each item in the dataset will be run and evaluated this many times. |
| client | Optional[langsmith.Client] | Default: None. The LangSmith client to use. |
| blocking | bool | Default: True. Whether to block until the evaluation is complete. |
| experiment | Optional[schemas.TracerSession] | Default: None. An existing experiment to extend. If provided, experiment_prefix is ignored. |
| error_handling | Literal['log', 'ignore'] | Default: 'log'. How to handle individual run errors. |
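The sketch below shows how several of these parameters combine: a summary evaluator aggregates over all runs, `max_concurrency` bounds in-flight evaluations, and `num_repetitions` re-runs each example. The target, evaluators, and aggregation logic are illustrative assumptions, and "my-dataset" is presumed to already exist in LangSmith:

```python
import asyncio

from langsmith import aevaluate

# Illustrative target and row-level evaluator (same stand-ins as the quickstart sketch).
async def my_target(inputs: dict) -> dict:
    return {"answer": "stub answer for " + inputs["question"]}

def exact_match(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["answer"] == reference_outputs["answer"]

def answered_fraction(runs: list, examples: list) -> dict:
    # Summary evaluators receive every run and example in the experiment at once.
    answered = sum(1 for run in runs if run.outputs and run.outputs.get("answer"))
    return {"key": "answered_fraction", "score": answered / max(len(runs), 1)}

async def main() -> None:
    await aevaluate(
        my_target,
        data="my-dataset",
        evaluators=[exact_match],
        summary_evaluators=[answered_fraction],
        max_concurrency=4,        # allow up to 4 evaluations in flight at once
        num_repetitions=2,        # run and score every example twice
        error_handling="log",     # record per-run errors instead of raising
        experiment_prefix="bounded-concurrency",
    )

asyncio.run(main())
```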