    runner_utils

    Module langchain_classic.smith.evaluation.runner_utils (since v1.0)

    Attributes

    • logger

    Functions

    • load_evaluator
    • arun_on_dataset
    • run_on_dataset

    Classes

    • Chain
    • EvaluatorType
    • PairwiseStringEvaluator
    • StringEvaluator
    • InputFormatError
    • TestResult
    • EvalError
    • ChatModelInput

    Type Aliases

    • MODEL_OR_CHAIN_FACTORY: Callable[[], Chain | Runnable] | BaseLanguageModel | Callable[[dict], Any] | Runnable | Chain
    • MCF: Callable[[], Chain | Runnable] | BaseLanguageModel

    Modules

    • smith_eval
    • smith_eval_config
    • name_generation
    • progress
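The MODEL_OR_CHAIN_FACTORY alias accepts several shapes: a zero-argument factory, a ready-made model or runnable, or a plain function over an input dict. As a rough illustration only, the stdlib sketch below normalizes such a union into one zero-argument factory. EchoModel, FunctionRunner, and normalize are hypothetical names, duck typing on an .invoke() method is an assumption, and none of this reproduces langchain_classic's own dispatch logic.

```python
import inspect
from typing import Any, Callable


class EchoModel:
    """Hypothetical stand-in for a model or runnable: any object with .invoke()."""

    def invoke(self, inputs: dict) -> str:
        return f"echo: {inputs['question']}"


class FunctionRunner:
    """Hypothetical adapter giving a plain f(inputs) function an .invoke() method."""

    def __init__(self, fn: Callable[[dict], Any]) -> None:
        self.fn = fn

    def invoke(self, inputs: dict) -> Any:
        return self.fn(inputs)


def normalize(model_or_factory: Any) -> Callable[[], Any]:
    """Turn any accepted form into a zero-argument factory (hypothetical helper)."""
    if not isinstance(model_or_factory, type) and hasattr(model_or_factory, "invoke"):
        # A ready-made instance: every example reuses the same object.
        return lambda: model_or_factory
    if callable(model_or_factory):
        n_params = len(inspect.signature(model_or_factory).parameters)
        if n_params == 0:
            # A constructor: calling it per example yields a fresh object each time.
            return model_or_factory
        if n_params == 1:
            # A plain function over the example's input dict: wrap it.
            return lambda: FunctionRunner(model_or_factory)
    raise TypeError("expected an object with .invoke() or a 0- or 1-argument callable")
```

Returning a factory in every case lets the caller treat all three shapes uniformly: it simply calls the factory once per example.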

    Utilities for running language models or Chains over datasets.

    load_evaluator

    Load the requested evaluation chain specified by a string.

    Parameters:

    • evaluator (EvaluatorType): The type of evaluator to load.
    • llm (BaseLanguageModel, optional): The language model to use for evaluation. Defaults to None.
    • **kwargs (Any): Additional keyword arguments to pass to the evaluator.

    Returns:

    Chain: The loaded evaluation chain.

    Examples:

    from langchain_classic.evaluation import load_evaluator, EvaluatorType

    evaluator = load_evaluator(EvaluatorType.QA)
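Because the evaluator is specified by a string-valued type, loading one can be pictured as a lookup from that value to an implementation, with llm and **kwargs forwarded to the result. The sketch below shows that pattern with a hypothetical two-entry registry. ToyEvaluatorType, ExactMatch, Contains, and toy_load_evaluator are invented for illustration and do not describe how the library is implemented.

```python
from enum import Enum
from typing import Any, Callable, Dict, Optional


class ToyEvaluatorType(str, Enum):
    """Hypothetical subset of evaluator names, standing in for EvaluatorType."""

    EXACT_MATCH = "exact_match"
    CONTAINS = "contains"


class ExactMatch:
    def __init__(self, ignore_case: bool = False) -> None:
        self.ignore_case = ignore_case

    def evaluate(self, prediction: str, reference: str) -> Dict[str, int]:
        if self.ignore_case:
            prediction, reference = prediction.lower(), reference.lower()
        return {"score": int(prediction == reference)}


class Contains:
    def evaluate(self, prediction: str, reference: str) -> Dict[str, int]:
        return {"score": int(reference in prediction)}


# Maps each enum member to the class that implements it.
_REGISTRY: Dict[ToyEvaluatorType, Callable[..., Any]] = {
    ToyEvaluatorType.EXACT_MATCH: ExactMatch,
    ToyEvaluatorType.CONTAINS: Contains,
}


def toy_load_evaluator(evaluator: str, *, llm: Optional[Any] = None, **kwargs: Any) -> Any:
    """Hypothetical loader: resolve the name, then build the evaluator with kwargs."""
    key = ToyEvaluatorType(evaluator)  # accepts a member or its string value; else ValueError
    # LLM-graded evaluators would receive `llm`; these string checks do not need it.
    return _REGISTRY[key](**kwargs)
```

Using a str-based Enum means callers can pass either the member or its plain string value and get the same evaluator back.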

    arun_on_dataset

    Asynchronously run the Chain or language model on a dataset and store traces to the specified project name.

    This is the async counterpart of run_on_dataset and is usually the faster of the two.

    run_on_dataset

    Run the Chain or language model on a dataset and store traces to the specified project name.

    For the (usually faster) async version of this function, see arun_on_dataset.
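The async variant is usually faster, presumably because model calls spend most of their time waiting on the network and those waits can overlap. The stdlib sketch below simulates that with a hypothetical SlowModel. run_on_examples and arun_on_examples are invented names, and the tracing and evaluation performed by the real functions are omitted.

```python
import asyncio
import time
from typing import Callable, List


class SlowModel:
    """Hypothetical model whose every call waits `delay` seconds (simulated network I/O)."""

    def __init__(self, delay: float = 0.05) -> None:
        self.delay = delay

    def invoke(self, inputs: dict) -> str:
        time.sleep(self.delay)
        return inputs["question"].upper()

    async def ainvoke(self, inputs: dict) -> str:
        await asyncio.sleep(self.delay)
        return inputs["question"].upper()


def run_on_examples(factory: Callable[[], SlowModel], examples: List[dict]) -> List[str]:
    # Sequential: each call must finish before the next one starts.
    return [factory().invoke(ex) for ex in examples]


async def arun_on_examples(
    factory: Callable[[], SlowModel], examples: List[dict], max_concurrency: int = 4
) -> List[str]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(ex: dict) -> str:
        async with semaphore:  # cap the number of in-flight calls
            return await factory().ainvoke(ex)

    # gather() overlaps the waits and returns results in input order.
    return list(await asyncio.gather(*(run_one(ex) for ex in examples)))
```

With four examples and a 0.05 s delay, the sequential loop takes roughly 0.2 s while the concurrent one finishes in about one delay, and both return the same outputs in the same order.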

    Chain

    Abstract base class for creating structured sequences of calls to components.

    Chains encode a sequence of calls to components such as models, document retrievers, and other chains, and provide a simple interface to that sequence.
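A chain, in the sense described above, hides several component calls behind one entry point. Here is a minimal sketch, assuming each step reads a shared state dict and returns new keys to merge into it. ToyChain, retrieve, and answer are hypothetical stand-ins, not the Chain API.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], State]


class ToyChain:
    """Hypothetical chain: runs steps in order behind a single invoke() call."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps

    def invoke(self, inputs: State) -> State:
        state = dict(inputs)  # copy so the caller's dict is left untouched
        for step in self.steps:
            state.update(step(state))  # each step adds its outputs to the shared state
        return state


def retrieve(state: State) -> State:
    """Stand-in for a document retriever backed by a tiny in-memory corpus."""
    corpus = {"capital of France": "Paris is the capital of France."}
    return {"context": corpus.get(state["question"], "")}


def answer(state: State) -> State:
    """Stand-in for a model call: take the subject of the retrieved sentence."""
    return {"answer": state["context"].split(" is ")[0] or "unknown"}
```

The caller only sees invoke(); swapping the retriever or the answering step does not change that interface.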

    EvaluatorType

    Enumeration of the available evaluator types.

    PairwiseStringEvaluator

    Compare the output of two models (or two outputs of the same model).
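One way to picture a pairwise comparison is a function that receives two predictions and returns a verdict. The sketch below judges by plain word overlap with a reference, a deliberately simple substitute for the model-based judgment a real pairwise evaluator would typically use. compare_pair and the value/score keys of its result are assumptions for illustration.

```python
import re
from typing import Any, Dict, Set


def _words(text: str) -> Set[str]:
    # Lowercased word tokens, ignoring punctuation.
    return set(re.findall(r"\w+", text.lower()))


def compare_pair(prediction: str, prediction_b: str, reference: str) -> Dict[str, Any]:
    """Hypothetical pairwise evaluator: prefer the output sharing more words with the reference."""
    ref = _words(reference)
    overlap_a = len(_words(prediction) & ref)
    overlap_b = len(_words(prediction_b) & ref)
    if overlap_a > overlap_b:
        return {"value": "A", "score": 1.0}
    if overlap_b > overlap_a:
        return {"value": "B", "score": 0.0}
    return {"value": None, "score": 0.5}  # tie: no preference
```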

    StringEvaluator

    String evaluator interface: grade, tag, or otherwise evaluate predictions relative to their inputs and/or reference labels.
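The interface above covers both grading (a score) and tagging (a label). The sketch below is a hypothetical, stdlib-only analogue. ToyStringEvaluator, ExactMatchGrader, and LengthTagger are invented names, and the keyword-only prediction/reference/input signature and dict return value are assumptions chosen for illustration, not a copy of the library class.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class ToyStringEvaluator(ABC):
    """Hypothetical string-evaluator interface; not langchain_classic's StringEvaluator."""

    requires_reference: bool = False

    def evaluate_strings(
        self,
        *,
        prediction: str,
        reference: Optional[str] = None,
        input: Optional[str] = None,  # mirrors the "inputs" above; shadows the builtin
    ) -> Dict[str, Any]:
        # Validate once in the public method so subclasses only implement the logic.
        if self.requires_reference and reference is None:
            raise ValueError(f"{type(self).__name__} requires a reference")
        return self._evaluate(prediction, reference, input)

    @abstractmethod
    def _evaluate(
        self, prediction: str, reference: Optional[str], input: Optional[str]
    ) -> Dict[str, Any]:
        ...


class ExactMatchGrader(ToyStringEvaluator):
    """Grades: 1 if the prediction equals the reference after trimming whitespace."""

    requires_reference = True

    def _evaluate(self, prediction, reference, input):
        return {"score": int(prediction.strip() == reference.strip())}


class LengthTagger(ToyStringEvaluator):
    """Tags: labels the prediction concise or verbose; no reference needed."""

    def _evaluate(self, prediction, reference, input):
        return {"value": "concise" if len(prediction.split()) <= 10 else "verbose"}
```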

    InputFormatError

    Raised when the input format is invalid.
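Dataset examples do not always carry the keys a chain expects, which is where an input-format check helps. Here is a minimal sketch under a hypothetical policy: keep the expected keys when present, remap a lone value onto a lone expected key, and raise otherwise. The InputFormatError below is a local stand-in, and select_chain_input is an invented helper.

```python
from typing import Any, Dict, List


class InputFormatError(Exception):
    """Local stand-in for the library exception of the same name."""


def select_chain_input(example_inputs: Dict[str, Any], expected_keys: List[str]) -> Dict[str, Any]:
    """Hypothetical check that a dataset example can feed a chain's input keys."""
    missing = [key for key in expected_keys if key not in example_inputs]
    if not missing:
        # Every expected key is present: drop any extras.
        return {key: example_inputs[key] for key in expected_keys}
    if len(expected_keys) == 1 and len(example_inputs) == 1:
        # One value, one slot: remap it regardless of its original key.
        (value,) = example_inputs.values()
        return {expected_keys[0]: value}
    raise InputFormatError(
        f"Example is missing keys {missing}; it has {sorted(example_inputs)}"
    )
```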

    TestResult

    A dictionary of the results of a single test run.
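Since a test result is a dictionary, summarizing it is ordinary dict processing. The sketch below assumes a hypothetical layout in which each example id maps to an entry with a feedback dict of numeric scores, and it averages each feedback key across examples. ToyTestResult is invented, and the real class's own summary helpers are not reproduced.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class ToyTestResult(dict):
    """Hypothetical: maps example id -> {"output": ..., "feedback": {key: score}}."""

    def aggregate_feedback(self) -> Dict[str, float]:
        scores: Dict[str, List[float]] = defaultdict(list)
        for entry in self.values():
            for key, score in entry.get("feedback", {}).items():
                scores[key].append(score)
        # Mean score per feedback key across all examples that reported it.
        return {key: mean(values) for key, values in scores.items()}
```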

    EvalError

    Records an error raised by your architecture (the chain or model under evaluation).
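Capturing an error as data, rather than letting it propagate, lets a dataset run keep going after one example fails. Here is a minimal sketch of that pattern, assuming a dict subclass that stores the exception alongside the failing example. ToyEvalError and run_all are hypothetical names.

```python
from typing import Any, Callable, List


class ToyEvalError(dict):
    """Hypothetical record of an exception raised while running one example."""

    def __init__(self, error: BaseException, **context: Any) -> None:
        super().__init__(error=error, **context)


def run_all(fn: Callable[[dict], Any], examples: List[dict]) -> List[Any]:
    outputs: List[Any] = []
    for example in examples:
        try:
            outputs.append(fn(example))
        except Exception as exc:
            # Record the failure and move on instead of aborting the whole run.
            outputs.append(ToyEvalError(exc, example=example))
    return outputs
```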

    ChatModelInput

    Input for a chat model.
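The page describes ChatModelInput only as input for a chat model. The sketch below assumes it is a dictionary with a single messages key holding the conversation, and it uses plain role/content dicts in place of LangChain message objects. ToyChatModelInput, Message, and to_chat_input are hypothetical.

```python
from typing import List, TypedDict


class Message(TypedDict):
    role: str  # e.g. "system", "human", or "ai"
    content: str


class ToyChatModelInput(TypedDict):
    """Hypothetical analogue of ChatModelInput: the whole conversation under one key."""

    messages: List[Message]


def to_chat_input(example_inputs: dict) -> ToyChatModelInput:
    """Hypothetical policy: pass a message list through, or wrap a lone prompt."""
    if "messages" in example_inputs:
        return {"messages": list(example_inputs["messages"])}
    (prompt,) = example_inputs.values()  # ValueError unless exactly one input
    return {"messages": [{"role": "human", "content": str(prompt)}]}
```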

    smith_eval_config

    Configuration for run evaluators.

    progress

    A simple progress bar for the console.
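Here is a minimal sketch of a console progress bar, assuming it redraws a single line with a carriage return each time an example finishes. ToyProgressBar and its parameters are hypothetical. Writing to an injectable stream, rather than always to the terminal, keeps the bar testable.

```python
import sys
from typing import TextIO


class ToyProgressBar:
    """Hypothetical console progress bar that redraws one line in place."""

    def __init__(self, total: int, width: int = 20, stream: TextIO = sys.stdout) -> None:
        self.total = total
        self.width = width
        self.stream = stream
        self.done = 0

    def render(self) -> str:
        # Number of filled cells, proportional to completed items.
        filled = self.width * self.done // self.total if self.total else self.width
        return "[" + "#" * filled + "-" * (self.width - filled) + f"] {self.done}/{self.total}"

    def increment(self) -> None:
        self.done = min(self.done + 1, self.total)
        # "\r" returns the cursor to the start of the line so the bar overwrites itself.
        self.stream.write("\r" + self.render())
        if self.done == self.total:
            self.stream.write("\n")  # finish the line once all items are done
        self.stream.flush()
```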

    smith_eval

    LangSmith evaluation utilities.

    This module provides utilities for evaluating Chains and other language model applications using LangChain evaluators and LangSmith.

    For more information on the LangSmith API, see the LangSmith API documentation.

    Example

    from langsmith import Client
    from langchain_openai import ChatOpenAI
    from langchain_classic.chains import LLMChain
    from langchain_classic.smith import EvaluatorType, RunEvalConfig, run_on_dataset
    
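    def construct_chain():
        # Zero-argument factory: run_on_dataset can call it to build an independent
        # chain for each example, so no state carries over between examples.
        model = ChatOpenAI(temperature=0)
        # {your_input_key} is a placeholder for the input key used in your dataset.
        chain = LLMChain.from_string(model, "What's the answer to {your_input_key}")
        return chain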
    
    evaluation_config = RunEvalConfig(
        evaluators=[
            EvaluatorType.QA,  # "Correctness" against a reference answer
            EvaluatorType.EMBEDDING_DISTANCE,
            RunEvalConfig.Criteria("helpfulness"),
            RunEvalConfig.Criteria(
                {
                    "fifth-grader-score": "Do you have to be smarter than a fifth "
                    "grader to answer this question?"
                }
            ),
        ]
    )
    
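    client = Client()  # reads LangSmith credentials such as LANGSMITH_API_KEY from the environment
    run_on_dataset(
        client,
        "<my_dataset_name>",  # name of an existing LangSmith dataset
        construct_chain,
        evaluation=evaluation_config,
    )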

    Attributes

    • arun_on_dataset: Asynchronous function to evaluate a chain or other LangChain component over a dataset.
    • run_on_dataset: Function to evaluate a chain or other LangChain component over a dataset.
    • RunEvalConfig: Class representing the configuration for running evaluation.
    • StringRunEvaluatorChain: Class representing a string run evaluator chain.
    • InputFormatError: Exception raised when the input format is incorrect.