Evaluate an async target system on a given dataset.
aevaluate(
    self,
    target: Union[ATARGET_T, AsyncIterable[dict], Runnable, str, uuid.UUID, schemas.TracerSession],
    data: Union[DATA_T, AsyncIterable[schemas.Example], Iterable[schemas.Example], None] = None,
    evaluators: Optional[Sequence[Union[EVALUATOR_T, AEVALUATOR_T]]] = None,
    summary_evaluators: Optional[Sequence[SUMMARY_EVALUATOR_T]] = None,
    metadata: Optional[dict] = None,
    experiment_prefix: Optional[str] = None,
    description: Optional[str] = None,
    max_concurrency: Optional[int] = 0,
    num_repetitions: int = 1,
    blocking: bool = True,
    experiment: Optional[Union[schemas.TracerSession, str, uuid.UUID]] = None,
    upload_results: bool = True,
    error_handling: Literal['log', 'ignore'] = 'log',
    **kwargs: Any,
) -> AsyncExperimentResults
Environment:
LANGSMITH_TEST_CACHE: If set, API calls will be cached to disk to save time and cost during testing. Recommended to commit the cache files to your repository for faster CI/CD runs. Requires the 'langsmith[vcr]' package to be installed.
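For orientation, here is a minimal usage sketch. The dataset name `my-dataset`, the `my_target` coroutine, and the `exact_match` evaluator are illustrative placeholders, not part of the API.

```python
import asyncio

from langsmith import Client
from langsmith.schemas import Example, Run

client = Client()

# Hypothetical async target: receives an example's inputs dict, returns an outputs dict.
async def my_target(inputs: dict) -> dict:
    return {"answer": inputs["question"].strip().lower()}

# Hypothetical row-level evaluator using the (run, example) calling convention.
def exact_match(run: Run, example: Example) -> dict:
    predicted = (run.outputs or {}).get("answer")
    expected = (example.outputs or {}).get("answer")
    return {"key": "exact_match", "score": int(predicted == expected)}

async def main() -> None:
    # Runs my_target over every example in the dataset and scores each run.
    results = await client.aevaluate(
        my_target,
        data="my-dataset",            # assumed dataset name in your workspace
        evaluators=[exact_match],
        experiment_prefix="baseline",
        max_concurrency=4,            # evaluate up to 4 examples concurrently
    )

asyncio.run(main())
```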
| Name | Type | Default | Description |
|---|---|---|---|
| target* | Union[ATARGET_T, AsyncIterable[dict], Runnable, str, uuid.UUID, TracerSession] | — | The target system or experiment(s) to evaluate. Can be an async function that takes a dict and returns a dict, a langchain Runnable, or a reference to an existing experiment (str, uuid.UUID, or TracerSession). |
| data | Union[DATA_T, AsyncIterable[Example], Iterable[Example], None] | None | The dataset to evaluate on. Can be a dataset name, a list of examples, an async generator of examples, or an async iterable of examples. |
| evaluators | Optional[Sequence[Union[EVALUATOR_T, AEVALUATOR_T]]] | None | A list of evaluators to run on each example. |
| summary_evaluators | Optional[Sequence[SUMMARY_EVALUATOR_T]] | None | A list of summary evaluators to run on the entire dataset. |
| metadata | Optional[dict] | None | Metadata to attach to the experiment. |
| experiment_prefix | Optional[str] | None | A prefix to provide for your experiment name. |
| description | Optional[str] | None | A description of the experiment. |
| max_concurrency | Optional[int] | 0 | The maximum number of concurrent evaluations to run. If None, no limit is set; if 0, no concurrency (evaluations run sequentially). |
| num_repetitions | int | 1 | The number of times to run the evaluation. Each item in the dataset will be run and evaluated this many times. |
| blocking | bool | True | Whether to block until the evaluation is complete. |
| experiment | Optional[Union[TracerSession, str, uuid.UUID]] | None | An existing experiment to extend. If provided, experiment_prefix is ignored. For advanced usage only. |
| upload_results | bool | True | Whether to upload the results to LangSmith. |
| error_handling | Literal['log', 'ignore'] | 'log' | How to handle individual run errors. |
| **kwargs | Any | — | Additional keyword arguments to pass to the evaluator. |
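Building on the names from the sketch above (`client`, `my_target`, `exact_match`), the following hedged example exercises the dataset-level options. The `pass_rate` summary evaluator, the metadata values, and the dataset name are illustrative assumptions, not part of the API.

```python
from langsmith.schemas import Example, Run

# Hypothetical summary evaluator: receives every run and example in the experiment
# and returns a single aggregate metric for the whole dataset.
def pass_rate(runs: list[Run], examples: list[Example]) -> dict:
    passed = sum(1 for run in runs if run.error is None)
    return {"key": "pass_rate", "score": passed / max(len(runs), 1)}

async def run_repeated_experiment() -> None:
    await client.aevaluate(
        my_target,
        data="my-dataset",                  # assumed dataset name
        evaluators=[exact_match],
        summary_evaluators=[pass_rate],
        num_repetitions=3,                  # run and score each example 3 times
        max_concurrency=8,
        metadata={"revision": "baseline"},  # illustrative experiment metadata
        description="Repeated baseline run with an aggregate pass rate.",
    )
```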