    Module · Since v1.0

    scoring

    Scoring evaluators.

    This module contains evaluators that score the output of models, whether they are LLMs, Chains, or otherwise, on a scale of 1-10. Scoring can be based on a variety of criteria and/or a reference answer.

    Example:

        from langchain_openai import ChatOpenAI
        from langchain_classic.evaluation.scoring import ScoreStringEvalChain

        model = ChatOpenAI(temperature=0, model_name="gpt-4")
        chain = ScoreStringEvalChain.from_llm(llm=model)
        result = chain.evaluate_strings(
            input="What is the chemical formula for water?",
            prediction="H2O",
            reference="The chemical formula for water is H2O.",
        )
        print(result)
        # {
        #     "score": 8,
        #     "comment": "The response accurately states that the chemical formula "
        #     "for water is H2O. However, it does not provide an explanation of "
        #     "what the formula means.",
        # }

    Classes

    class
    LabeledScoreStringEvalChain

    A chain for scoring the output of a model on a scale of 1-10 against a reference answer.

    class
    ScoreStringEvalChain

    A chain for scoring the output of a model on a scale of 1-10.
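
    For the labeled variant, a reference answer is required. A minimal usage sketch, assuming LabeledScoreStringEvalChain exposes the same from_llm and evaluate_strings interface as ScoreStringEvalChain above, and that from_llm accepts a normalize_by keyword for rescaling the 1-10 score:

        from langchain_openai import ChatOpenAI
        from langchain_classic.evaluation.scoring import LabeledScoreStringEvalChain

        model = ChatOpenAI(temperature=0, model_name="gpt-4")

        # Assumed interface: normalize_by=10 divides the raw 1-10 score,
        # so "score" comes back in the range (0, 1].
        chain = LabeledScoreStringEvalChain.from_llm(llm=model, normalize_by=10)
        result = chain.evaluate_strings(
            input="What is the boiling point of water at sea level?",
            prediction="100 degrees Celsius",
            reference="Water boils at 100 °C (212 °F) at sea level.",
        )
        print(result["score"])  # e.g. 1.0 for a fully correct answer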

    Modules

    module
    prompt

    Prompts for scoring the outputs of models for a given question.

    The prompt is used to score responses and evaluate how well they follow the instructions and answer the question. It is based on the paper by Zheng et al.: https://arxiv.org/abs/2306.05685

    module
    eval_chain

    Base classes for scoring the output of a model on a scale of 1-10.
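
    These chains can also be constructed through the evaluator loader instead of being instantiated directly. A minimal sketch, assuming langchain_classic re-exports load_evaluator the way langchain did, with "score_string" resolving to ScoreStringEvalChain and "labeled_score_string" to LabeledScoreStringEvalChain:

        from langchain_openai import ChatOpenAI
        from langchain_classic.evaluation import load_evaluator

        model = ChatOpenAI(temperature=0, model_name="gpt-4")

        # Assumed loader keys: "labeled_score_string" (scores against a
        # reference) and "score_string" (no reference required).
        evaluator = load_evaluator("labeled_score_string", llm=model)
        result = evaluator.evaluate_strings(
            input="What is the chemical formula for water?",
            prediction="H2O",
            reference="The chemical formula for water is H2O.",
        )
        print(result)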
