Utilities for running language models or Chains over datasets.
Load the requested evaluation chain specified by a string.
Parameters
evaluator : EvaluatorType
    The type of evaluator to load.
llm : BaseLanguageModel, optional
    The language model to use for evaluation, by default None.
**kwargs : Any
    Additional keyword arguments to pass to the evaluator.

Returns
Chain
    The loaded evaluation chain.

Example
from langchain_classic.evaluation import load_evaluator, EvaluatorType

evaluator = load_evaluator(EvaluatorType.QA)
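As a further illustration only, a sketch of supplying an explicit grading model together with an evaluator-specific keyword (the `criteria` keyword shown here is assumed to be forwarded via **kwargs to the underlying evaluation chain):

from langchain_classic.evaluation import load_evaluator, EvaluatorType
from langchain_openai import ChatOpenAI

# Sketch: explicit grading model plus an evaluator-specific keyword
# ("criteria" is assumed to be passed through **kwargs).
evaluator = load_evaluator(
    EvaluatorType.CRITERIA,
    llm=ChatOpenAI(temperature=0),
    criteria="conciseness",
)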
Run on dataset.
Run the Chain or language model on a dataset and store traces to the specified project name.
For the (usually faster) async version of this function,
see arun_on_dataset.
Run on dataset (async).
Asynchronously run the Chain or language model on a dataset and store traces to the specified project name.
For the synchronous version of this function, see run_on_dataset.
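A sketch of calling the async entry point, assuming arun_on_dataset accepts the same arguments as run_on_dataset and must be awaited inside an event loop:

import asyncio

from langsmith import Client
from langchain_openai import ChatOpenAI
from langchain_classic.smith import EvaluatorType, RunEvalConfig, arun_on_dataset

async def main() -> None:
    # Assumed to mirror run_on_dataset: client, dataset name,
    # chain/model factory, and an optional evaluation configuration.
    await arun_on_dataset(
        Client(),
        "<my_dataset_name>",
        lambda: ChatOpenAI(temperature=0),
        evaluation=RunEvalConfig(evaluators=[EvaluatorType.QA]),
    )

asyncio.run(main())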
Abstract base class for creating structured sequences of calls to components.
Chains should be used to encode a sequence of calls to components like models, document retrievers, other chains, etc., and provide a simple interface to this sequence.
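To make the contract concrete, here is a minimal sketch of a custom subclass, assuming Chain is importable from langchain_classic.chains.base and exposes the usual input_keys/output_keys/_call hooks; the class name and keys below are illustrative only:

from typing import Any, Dict, List, Optional

from langchain_classic.chains.base import Chain  # assumed import path
from langchain_core.callbacks import CallbackManagerForChainRun

class UppercaseChain(Chain):
    """Toy chain that uppercases its single text input (illustrative only)."""

    @property
    def input_keys(self) -> List[str]:
        # Keys this chain expects in its input dict.
        return ["text"]

    @property
    def output_keys(self) -> List[str]:
        # Keys this chain contributes to its output dict.
        return ["shouted"]

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        # The "sequence of calls" here is a single transformation step.
        return {"shouted": inputs["text"].upper()}

UppercaseChain().invoke({"text": "hello"})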
The types of evaluators.
Compare the output of two models (or two outputs of the same model).
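A sketch of a pairwise comparison, assuming the pairwise string evaluator is loaded via load_evaluator and exposes an evaluate_string_pairs method:

from langchain_classic.evaluation import load_evaluator, EvaluatorType

# Sketch: compare two candidate outputs for the same input.
pairwise = load_evaluator(EvaluatorType.PAIRWISE_STRING)
pairwise.evaluate_string_pairs(
    input="What is 2 + 2?",
    prediction="4",        # output of model A
    prediction_b="Five.",  # output of model B
)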
String evaluator interface.
Grade, tag, or otherwise evaluate predictions relative to their inputs and/or reference labels.
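For example, a loaded string evaluator such as the QA evaluator can be invoked through evaluate_strings (a sketch; the exact keys of the returned feedback dict are not guaranteed here):

from langchain_classic.evaluation import load_evaluator, EvaluatorType

qa_evaluator = load_evaluator(EvaluatorType.QA)
result = qa_evaluator.evaluate_strings(
    input="What is the capital of France?",  # the original question
    prediction="Paris",                      # the answer being graded
    reference="Paris",                       # the reference label
)
# `result` is a dict of feedback, typically including a score and reasoning.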
Raised when the input format is invalid.
A dictionary of the results of a single test run.
Your architecture (the chain or model under evaluation) raised an error.
Input for a chat model.
LangSmith evaluation utilities.
This module provides utilities for evaluating Chains and other language model applications using LangChain evaluators and LangSmith.
For more information on the LangSmith API, see the LangSmith API documentation.
Example
from langsmith import Client
from langchain_openai import ChatOpenAI
from langchain_classic.chains import LLMChain
from langchain_classic.smith import EvaluatorType, RunEvalConfig, run_on_dataset
def construct_chain():
    model = ChatOpenAI(temperature=0)
    chain = LLMChain.from_string(model, "What's the answer to {your_input_key}")
    return chain

evaluation_config = RunEvalConfig(
    evaluators=[
        EvaluatorType.QA,  # "Correctness" against a reference answer
        EvaluatorType.EMBEDDING_DISTANCE,
        RunEvalConfig.Criteria("helpfulness"),
        RunEvalConfig.Criteria(
            {
                "fifth-grader-score": "Do you have to be smarter than a fifth "
                "grader to answer this question?"
            }
        ),
    ]
)

client = Client()
run_on_dataset(
    client, "<my_dataset_name>", construct_chain, evaluation=evaluation_config
)
Attributes
arun_on_dataset: Asynchronous function to evaluate a chain or other LangChain component over a dataset.
run_on_dataset: Function to evaluate a chain or other LangChain component over a dataset.
RunEvalConfig: Class representing the configuration for running evaluation.
StringRunEvaluatorChain: Class representing a string run evaluator chain.
InputFormatError: Exception raised when the input format is incorrect.

Configuration for run evaluators.
A simple progress bar for the console.