    langchain_classic.smith.evaluation

    Module · Since v1.0

    LangSmith evaluation utilities.

    This module provides utilities for evaluating Chains and other language model applications using LangChain evaluators and LangSmith.

    For more information on the LangSmith API, see the LangSmith API documentation.

    Example

    from langsmith import Client
    from langchain_openai import ChatOpenAI
    from langchain_classic.chains import LLMChain
    from langchain_classic.smith import EvaluatorType, RunEvalConfig, run_on_dataset
    
    def construct_chain():
        model = ChatOpenAI(temperature=0)
        chain = LLMChain.from_string(model, "What's the answer to {your_input_key}")
        return chain
    
    evaluation_config = RunEvalConfig(
        evaluators=[
            EvaluatorType.QA,  # "Correctness" against a reference answer
            EvaluatorType.EMBEDDING_DISTANCE,
            RunEvalConfig.Criteria("helpfulness"),
            RunEvalConfig.Criteria(
                {
                    "fifth-grader-score": "Do you have to be smarter than a fifth "
                    "grader to answer this question?"
                }
            ),
        ]
    )
    
    client = Client()
    run_on_dataset(
        client, "<my_dataset_name>", construct_chain, evaluation=evaluation_config
    )

    Attributes

    • arun_on_dataset: Asynchronous function to evaluate a chain or other LangChain component over a dataset.
    • run_on_dataset: Function to evaluate a chain or other LangChain component over a dataset.
    • RunEvalConfig: Class representing the configuration for running evaluation.
    • StringRunEvaluatorChain: Class representing a string run evaluator chain.
    • InputFormatError: Exception raised when the input format is incorrect.

    Functions

    function
    arun_on_dataset

    Run on dataset.

    Run the Chain or language model on a dataset and store traces to the specified project name.

    This is the asynchronous counterpart of run_on_dataset.
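    A minimal async sketch, reusing construct_chain and evaluation_config from the module example above; the call mirrors the synchronous example and any other details are assumptions:

    import asyncio

    from langsmith import Client
    from langchain_classic.smith import arun_on_dataset

    async def main() -> None:
        client = Client()
        # Evaluate the chain factory over the dataset without blocking the event loop;
        # traces and feedback land in a LangSmith project, as with run_on_dataset.
        await arun_on_dataset(
            client,
            "<my_dataset_name>",           # dataset placeholder from the example above
            construct_chain,               # chain factory from the module example
            evaluation=evaluation_config,  # RunEvalConfig from the module example
        )

    asyncio.run(main())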

    function
    run_on_dataset

    Run on dataset.

    Run the Chain or language model on a dataset and store traces to the specified project name.

    For the (usually faster) async version of this function, see arun_on_dataset.
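    A hedged sketch of the same call with an explicit project name and concurrency level; project_name and concurrency_level are assumed keyword arguments, and the project name itself is hypothetical:

    from langsmith import Client
    from langchain_classic.smith import run_on_dataset

    client = Client()
    # Reuses construct_chain and evaluation_config from the module example above.
    results = run_on_dataset(
        client,
        "<my_dataset_name>",
        construct_chain,
        evaluation=evaluation_config,
        project_name="qa-eval-baseline",  # hypothetical name; traces are grouped under this project
        concurrency_level=5,              # assumed keyword: how many examples run in parallel
    )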

    Classes

    class
    RunEvalConfig

    Configuration for a run evaluation.
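    A small configuration sketch; the string shorthand for built-in evaluators, the eval_llm field, and the reference_key field are assumptions about this class and may differ in your version:

    from langchain_openai import ChatOpenAI
    from langchain_classic.smith import RunEvalConfig

    eval_config = RunEvalConfig(
        evaluators=[
            "qa",                                          # assumed shorthand for EvaluatorType.QA
            RunEvalConfig.LabeledCriteria("correctness"),  # criterion graded against the reference answer
        ],
        eval_llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),  # assumed field: model the evaluators use
        reference_key="answer",  # assumed field: dataset column holding the reference output
    )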

    class
    InputFormatError

    Raised when the input format is invalid.

    class
    StringRunEvaluatorChain

    Evaluate Run and optional examples.

    Modules

    module
    string_run_evaluator

    Run evaluator wrapper for string evaluators.

    module
    runner_utils

    Utilities for running language models or Chains over datasets.

    module
    name_generation

    module
    config

    Configuration for run evaluators.

    module
    progress

    A simple progress bar for the console.
