langsmith.evaluation._arunner.aevaluate
Function · Since v0.1

aevaluate

Evaluate an async target system on a given dataset.

aevaluate(
  target: Union[ATARGET_T, AsyncIterable[dict], Runnable, str, uuid.UUID, schemas.TracerSession],
  /,
  data: Union[DATA_T, AsyncIterable[schemas.Example], Iterable[schemas.Example], None] = None,
  evaluators: Optional[Sequence[Union[EVALUATOR_T, AEVALUATOR_T]]] = None,
  summary_evaluators: Optional[Sequence[SUMMARY_EVALUATOR_T]] = None,
  metadata: Optional[dict] = None,
  experiment_prefix: Optional[str] = None,
  description: Optional[str] = None,
  max_concurrency: Optional[int] = 0,
  num_repetitions: int = 1,
  client: Optional[langsmith.Client] = None,
  blocking: bool = True,
  experiment: Optional[Union[schemas.TracerSession, str, uuid.UUID]] = None,
  upload_results: bool = True,
  error_handling: Literal['log', 'ignore'] = 'log',
  **kwargs: Any
) -> AsyncExperimentResults

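A minimal usage sketch, not taken verbatim from the LangSmith docs: it assumes LANGSMITH_API_KEY is set, a dataset named "my-dataset" already exists with a "question" input and an "answer" reference output, and the classic (run, example) evaluator calling convention.

import asyncio

from langsmith import aevaluate


async def my_target(inputs: dict) -> dict:
    # Stand-in for your real async system (chain, agent, API call, ...).
    return {"answer": inputs["question"].strip().lower()}


def exact_match(run, example) -> dict:
    # Row-level evaluator: compare the target's output to the reference output.
    return {
        "key": "exact_match",
        "score": run.outputs["answer"] == example.outputs["answer"],
    }


async def main() -> None:
    results = await aevaluate(
        my_target,
        data="my-dataset",              # dataset name is an assumption
        evaluators=[exact_match],
        experiment_prefix="aevaluate-sketch",
        max_concurrency=4,              # the default of 0 evaluates rows one at a time
    )
    # Each row bundles the run, the example, and its evaluation results.
    async for row in results:
        print(row)


asyncio.run(main())
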
Environment:

  • LANGSMITH_TEST_CACHE: If set, API calls will be cached to disk to save time and cost during testing. Committing the cache files to your repository is recommended for faster CI/CD runs. Requires the 'langsmith[vcr]' extra to be installed.
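A small sketch of enabling the cache in a test setup; the cache directory path is illustrative, not required by the SDK.

import os

# Requires the extra:  pip install "langsmith[vcr]"
os.environ["LANGSMITH_TEST_CACHE"] = "tests/cassettes"  # illustrative path; any writable directory works
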

The default value of 'max_concurrency' was changed from None (no limit on concurrency) to 0 (no concurrency at all).

Used in Docs

  • How to evaluate a graph
  • How to evaluate a runnable
  • How to handle model rate limits
  • How to run an evaluation asynchronously

Parameters

target*: AsyncCallable[[dict], dict] | AsyncIterable[dict] | Runnable | EXPERIMENT_T | Tuple[EXPERIMENT_T, EXPERIMENT_T]

The target system or experiment(s) to evaluate.

Can be an async function that takes a dict and returns a dict, a LangChain Runnable, an existing experiment ID, or a two-tuple of experiment IDs.

data: Union[DATA_T, AsyncIterable[schemas.Example]]
Default: None

The dataset to evaluate on.

Can be a dataset name, a list of examples, an async generator of examples, or an async iterable of examples.

evaluators: Optional[Sequence[EVALUATOR_T]]
Default: None

A list of evaluators to run on each example.

summary_evaluators: Optional[Sequence[SUMMARY_EVALUATOR_T]]
Default: None

A list of summary evaluators to run on the entire dataset (see the sketch after this parameter list).

metadata: Optional[dict]
Default: None

Metadata to attach to the experiment.

experiment_prefix: Optional[str]
Default: None

A prefix to provide for your experiment name.

description: Optional[str]
Default: None

A description of the experiment.

max_concurrency: int | None
Default: 0

The maximum number of concurrent evaluations to run.

If None, no limit is set; if 0, no concurrency is used.

num_repetitions: int
Default: 1

The number of times to run the evaluation. Each item in the dataset will be run and evaluated this many times.

client: Optional[langsmith.Client]
Default: None

The LangSmith client to use.

blocking: bool
Default: True

Whether to block until the evaluation is complete.

experiment: Optional[schemas.TracerSession]
Default: None

An existing experiment to extend.

If provided, experiment_prefix is ignored. For advanced usage only.

error_handling: Literal['log', 'ignore']
Default: 'log'

How to handle individual run errors.

'log' traces the runs, including the error message, as part of the experiment; 'ignore' excludes the failed run from the experiment entirely.
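As referenced above, a sketch of a summary evaluator, assuming the classic (runs, examples) calling convention; the metric name and output keys are illustrative.

from langsmith.schemas import Example, Run


def accuracy_summary(runs: list[Run], examples: list[Example]) -> dict:
    # Summary evaluator: computes one aggregate score over the whole experiment.
    correct = sum(
        run.outputs["answer"] == example.outputs["answer"]
        for run, example in zip(runs, examples)
    )
    return {"key": "accuracy", "score": correct / len(examples)}


# Passed alongside row-level evaluators, e.g.:
# await aevaluate(my_target, data="my-dataset", summary_evaluators=[accuracy_summary])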
