# evaluate

> **Function** in `langsmith`

📖 [View in docs](https://reference.langchain.com/python/langsmith/evaluation/_runner/evaluate)

Evaluate a target system on a given dataset.

## Signature

```python
evaluate(
    target: Union[TARGET_T, Runnable, EXPERIMENT_T, tuple[EXPERIMENT_T, EXPERIMENT_T]],
    /,
    data: Optional[DATA_T] = None,
    evaluators: Optional[Union[Sequence[EVALUATOR_T], Sequence[COMPARATIVE_EVALUATOR_T]]] = None,
    summary_evaluators: Optional[Sequence[SUMMARY_EVALUATOR_T]] = None,
    metadata: Optional[dict] = None,
    experiment_prefix: Optional[str] = None,
    description: Optional[str] = None,
    max_concurrency: Optional[int] = 0,
    num_repetitions: int = 1,
    client: Optional[langsmith.Client] = None,
    blocking: bool = True,
    experiment: Optional[EXPERIMENT_T] = None,
    upload_results: bool = True,
    error_handling: Literal['log', 'ignore'] = 'log',
    **kwargs: Any,
) -> Union[ExperimentResults, ComparativeExperimentResults]
```

## Description

!!! warning "Behavior changed in `langsmith` 0.2.0"

    The `max_concurrency` default changed from `None` (no limit on
    concurrency) to `0` (no concurrency at all).
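
For orientation, a minimal sketch of a typical call. Everything here is illustrative: the dataset name `"my-dataset"` and the example keys `question`/`answer` are placeholders, and the keyword-based evaluator signature (`outputs`, `reference_outputs`) assumes a recent `langsmith` release.

```python
from langsmith import evaluate

def my_target(inputs: dict) -> dict:
    # The target takes an example's inputs dict and returns an outputs dict.
    return {"answer": inputs["question"].upper()}

def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    # Per-example evaluator: compare the target's outputs to the
    # example's reference outputs.
    return {
        "key": "exact_match",
        "score": outputs["answer"] == reference_outputs["answer"],
    }

results = evaluate(
    my_target,
    data="my-dataset",             # placeholder dataset name
    evaluators=[exact_match],
    experiment_prefix="baseline",  # experiment name will start with this
)
```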

## Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `target` | `TARGET_T \| Runnable \| EXPERIMENT_T \| tuple[EXPERIMENT_T, EXPERIMENT_T]` | Yes | The target system or experiment(s) to evaluate. Can be a function that takes a `dict` and returns a `dict`, a LangChain `Runnable`, an existing experiment ID, or a two-tuple of experiment IDs. |
| `data` | `DATA_T \| None` | No | The dataset to evaluate on. Can be a dataset name, a list of examples, or a generator of examples. (default: `None`) |
| `evaluators` | `Sequence[EVALUATOR_T] \| Sequence[COMPARATIVE_EVALUATOR_T] \| None` | No | A list of evaluators to run on each example. The evaluator signature depends on the target type. (default: `None`) |
| `summary_evaluators` | `Sequence[SUMMARY_EVALUATOR_T] \| None` | No | A list of summary evaluators to run on the entire dataset. Should not be specified if comparing two existing experiments. (default: `None`) |
| `metadata` | `dict \| None` | No | Metadata to attach to the experiment. (default: `None`) |
| `experiment_prefix` | `str \| None` | No | A prefix to provide for your experiment name. (default: `None`) |
| `description` | `str \| None` | No | A free-form text description for the experiment. (default: `None`) |
| `max_concurrency` | `int \| None` | No | The maximum number of concurrent evaluations to run. `None` sets no limit; `0` disables concurrency entirely. See the sketch after this table. (default: `0`) |
| `num_repetitions` | `int` | No | The number of times to run the evaluation. Each example in the dataset is run and evaluated this many times. (default: `1`) |
| `client` | `langsmith.Client \| None` | No | The LangSmith client to use. (default: `None`) |
| `blocking` | `bool` | No | Whether to block until the evaluation is complete. (default: `True`) |
| `experiment` | `schemas.TracerSession \| None` | No | An existing experiment to extend. If provided, `experiment_prefix` is ignored. For advanced usage only. Should not be specified if `target` is an existing experiment or a two-tuple of experiments. (default: `None`) |
| `upload_results` | `bool` | No | Whether to upload the experiment results to LangSmith. (default: `True`) |
| `error_handling` | `Literal['log', 'ignore']` | No | How to handle individual run errors. `'log'` traces failed runs with the error message as part of the experiment; `'ignore'` excludes failed runs from the experiment entirely. (default: `'log'`) |
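
Continuing the sketch above (all names remain placeholders): because `max_concurrency` now defaults to `0`, examples run serially unless concurrency is requested explicitly.

```python
results = evaluate(
    my_target,
    data="my-dataset",
    evaluators=[exact_match],
    max_concurrency=4,   # up to 4 examples in flight; None would remove the cap
    num_repetitions=3,   # run and score each example 3 times
    metadata={"model": "placeholder-model"},  # attached to the experiment
)
```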

## Returns

`Union[ExperimentResults, ComparativeExperimentResults]`

`ExperimentResults` if `target` is a function, `Runnable`, or existing experiment; `ComparativeExperimentResults` if `target` is a two-tuple of existing experiments.
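
A sketch of the comparative form, which yields `ComparativeExperimentResults`. The experiment names and the `prefer_shorter` evaluator are hypothetical; a comparative evaluator receives the matched runs from both experiments plus the shared example, and returns a score per run.

```python
from langsmith import evaluate

def prefer_shorter(runs, example) -> dict:
    # Hypothetical pairwise evaluator: award 1.0 to the run with the
    # shorter output and 0.0 to the other.
    shortest = min(runs, key=lambda run: len(str(run.outputs)))
    return {
        "key": "prefer_shorter",
        "scores": {run.id: float(run is shortest) for run in runs},
    }

comparison = evaluate(
    ("experiment-a", "experiment-b"),  # placeholder experiment names or IDs
    evaluators=[prefer_shorter],       # no `data` needed; taken from the experiments
)
```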

---

[View source on GitHub](https://github.com/langchain-ai/langsmith-sdk/blob/6a74bf5af9e542d8065af8edca54b2448f430916/python/langsmith/evaluation/_runner.py#L137)