Comparison evaluators.
This module contains evaluators for comparing the output of two models, be they LLMs, Chains, or otherwise. This can be used for scoring preferences, measuring similarity / semantic equivalence between outputs, or any other comparison task.
Example:
    from langchain_openai import ChatOpenAI
    from langchain_classic.evaluation.comparison import PairwiseStringEvalChain

    llm = ChatOpenAI(temperature=0)
    chain = PairwiseStringEvalChain.from_llm(llm=llm)
    result = chain.evaluate_string_pairs(
        input="What is the chemical formula for water?",
        prediction="H2O",
        prediction_b=(
            "The chemical formula for water is H2O, which means"
            " there are two hydrogen atoms and one oxygen atom."
        ),
        reference="The chemical formula for water is H2O.",
    )
    print(result)
Labeled Pairwise String Evaluation Chain.
A chain for comparing two outputs, such as the outputs of two models, prompts, or outputs of a single model on similar inputs, with labeled preferences.
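A minimal sketch of labeled use, assuming LabeledPairwiseStringEvalChain is exported from the same langchain_classic.evaluation.comparison package used in the example above; the question, answers, and reference string are illustrative only:

    from langchain_openai import ChatOpenAI
    from langchain_classic.evaluation.comparison import LabeledPairwiseStringEvalChain

    llm = ChatOpenAI(temperature=0)
    chain = LabeledPairwiseStringEvalChain.from_llm(llm=llm)
    result = chain.evaluate_string_pairs(
        input="Who wrote 'Pride and Prejudice'?",
        prediction="Jane Austen",
        prediction_b="It was written by Charlotte Bronte.",
        reference="Jane Austen wrote 'Pride and Prejudice'.",
    )
    # result is a dict with "value" (typically "A" or "B"), "score"
    # (1.0, 0.0, or 0.5 for a tie), and "reasoning" (the grader's explanation).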
Pairwise String Evaluation Chain.
A chain for comparing two outputs, such as the outputs of two models, prompts, or outputs of a single model on similar inputs.
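The same chain can also be obtained through the evaluator loader; a sketch assuming load_evaluator is exposed under langchain_classic.evaluation as it was under langchain.evaluation, with illustrative inputs:

    from langchain_openai import ChatOpenAI
    from langchain_classic.evaluation import load_evaluator

    evaluator = load_evaluator("pairwise_string", llm=ChatOpenAI(temperature=0))
    result = evaluator.evaluate_string_pairs(
        input="Summarize photosynthesis in one sentence.",
        prediction="Plants use light, water, and CO2 to make glucose and oxygen.",
        prediction_b="Photosynthesis is a thing plants do.",
    )
    # The preferred output is reported under result["value"], with the
    # grader's rationale under result["reasoning"].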
Prompts for comparing the outputs of two models for a given question.
This prompt is used to compare two responses and evaluate which one best follows the instructions and answers the question. The prompt is based on the paper by Zheng et al.: https://arxiv.org/abs/2306.05685
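To make the verdict convention concrete: prompts in this style ask the grader to explain its reasoning and then end with a bracketed verdict, "[[A]]", "[[B]]", or "[[C]]" for a tie. The following is a hypothetical illustration of how such a verdict string maps to a preference score, not the module's own output parser:

    def parse_verdict(grader_output: str) -> dict:
        # Take the text inside the final [[...]] marker as the verdict.
        tail = grader_output.rsplit("[[", 1)[-1]
        verdict = tail.split("]]", 1)[0].strip()
        # Score from the perspective of prediction A; "C" denotes a tie.
        score = {"A": 1.0, "B": 0.0, "C": 0.5}.get(verdict)
        return {"value": verdict, "score": score}

    print(parse_verdict("Assistant A answers the question directly. [[A]]"))
    # {'value': 'A', 'score': 1.0}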
Base classes for comparing the output of two models.
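Custom comparison logic can subclass this interface; a toy sketch assuming PairwiseStringEvaluator is importable from langchain_classic.evaluation and that the abstract hook is named _evaluate_string_pairs, as in earlier langchain releases (the class name and its length-based preference rule are made up for illustration):

    from typing import Any, Optional

    from langchain_classic.evaluation import PairwiseStringEvaluator

    class LengthPreferenceEvaluator(PairwiseStringEvaluator):
        """Toy evaluator that prefers the shorter of two outputs."""

        def _evaluate_string_pairs(
            self,
            *,
            prediction: str,
            prediction_b: str,
            reference: Optional[str] = None,
            input: Optional[str] = None,
            **kwargs: Any,
        ) -> dict:
            value = "A" if len(prediction) <= len(prediction_b) else "B"
            return {"value": value, "score": 1.0 if value == "A" else 0.0}

    evaluator = LengthPreferenceEvaluator()
    print(evaluator.evaluate_string_pairs(prediction="H2O", prediction_b="Water is H2O."))
    # {'value': 'A', 'score': 1.0}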