Scoring evaluators.
This module contains evaluators that score the output of models — be they LLMs, chains, or otherwise — on a scale of 1-10. Scoring can be based on a variety of criteria and/or a reference answer.
Example:
    >>> from langchain_openai import ChatOpenAI
    >>> from langchain_classic.evaluation.scoring import ScoreStringEvalChain
    >>> model = ChatOpenAI(temperature=0, model_name="gpt-4")
    >>> chain = ScoreStringEvalChain.from_llm(llm=model)
    >>> result = chain.evaluate_strings(
    ...     input="What is the chemical formula for water?",
    ...     prediction="H2O",
    ...     reference="The chemical formula for water is H2O.",
    ... )
    >>> print(result)
Prompts for scoring the outputs of a model for a given question.
This prompt is used to score a response, evaluating how well it follows the instructions and answers the question. The prompt is based on the paper by Zheng et al., https://arxiv.org/abs/2306.05685
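In the Zheng et al. judging setup, the model is asked to emit its verdict in a delimited form such as `Rating: [[7]]` so the numeric score can be extracted from the surrounding free-text reasoning. As a minimal sketch (not the library's actual parser), extraction can be done with a regular expression:

```python
import re

def parse_score(text: str) -> int:
    """Extract a 1-10 rating of the form ``[[7]]`` from judge output.

    This is a hypothetical helper for illustration; the real evaluator
    chain ships with its own output parser.
    """
    match = re.search(r"\[\[(\d+)\]\]", text)
    if match is None:
        raise ValueError(f"No rating found in output: {text!r}")
    score = int(match.group(1))
    if not 1 <= score <= 10:
        raise ValueError(f"Rating {score} outside expected 1-10 range")
    return score

print(parse_score("The answer is correct and concise. Rating: [[9]]"))  # 9
```

Delimiting the score this way keeps the model's chain-of-thought reasoning separate from the machine-readable verdict, which makes parsing robust even when the explanation itself mentions other numbers.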
Base classes for scoring the output of a model on a scale of 1-10.