rubric

Rubric middleware for self-evaluated agent iteration.

RubricMiddleware lets a caller declare what done looks like via a rubric. Each time the agent would otherwise finish (that is, the model returns a response with no further tool calls), the middleware invokes a separate grader sub-agent against the transcript. If the grader returns needs_revision, its feedback is injected as a HumanMessage and the agent loop resumes. Grading repeats until the grader returns satisfied or failed, the grader itself raises an error, or max_iterations is reached.
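
The control flow described above can be sketched in plain Python. This is a minimal illustration and not the middleware's implementation: `run_with_rubric`, `run_agent`, `grade`, and the `(role, content)` message tuples are hypothetical stand-ins, and only the status names come from this reference.

```python
from typing import Callable, List, Tuple

# Hypothetical message shape for this sketch: (role, content).
Message = Tuple[str, str]


def run_with_rubric(
    run_agent: Callable[[List[Message]], str],
    grade: Callable[[str, List[Message]], Tuple[str, str]],
    rubric: str,
    messages: List[Message],
    max_iterations: int = 3,
) -> Tuple[str, List[Message]]:
    """Return (final_status, messages) once grading stops."""
    iteration = 0
    while True:
        # The agent runs until it would finish (no further tool calls).
        answer = run_agent(messages)
        messages = messages + [("ai", answer)]
        try:
            verdict, feedback = grade(rubric, messages)
        except Exception:
            # Grader machinery failed; the last response is left intact.
            return "grader_error", messages
        iteration += 1
        if verdict != "needs_revision":
            return verdict, messages  # "satisfied" or "failed"
        if iteration >= max_iterations:
            # Cap fired on a needs_revision verdict; no feedback is injected.
            return "max_iterations_reached", messages
        # Feedback goes back in as a human-role message and the loop resumes.
        messages = messages + [("human", feedback)]
```

Checking the cap before injecting feedback keeps the agent's last response as the final message, which matches the behavior described for max_iterations_reached.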

Verdict the grader sub-agent emits via structured output.

satisfied: every criterion passes.
needs_revision: at least one criterion fails; loop continues.
failed: the rubric itself is malformed or impossible to evaluate against the transcript.

Tag stored on synthetic revision messages this middleware injects.

The revision message is injected as a HumanMessage (the role the model follows most reliably), but it carries:

name="rubric_grader" -- visible at the wire on providers that round-trip the name field; ignored elsewhere.
additional_kwargs={"lc_source": RUBRIC_GRADER_MESSAGE_SOURCE} -- visible to in-process consumers (evals, UIs, observability) so they can attribute the turn to the grader instead of treating it as a real user message.

This follows the same convention as SummarizationMiddleware, which tags its synthetic summary messages with lc_source="summarization".
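
An in-process consumer might use the tag to separate grader turns from real user turns. The sketch below is illustrative only: `Msg` is a hypothetical stand-in for a message object with LangChain-style fields, the helper names are invented here, and the value assigned to `RUBRIC_GRADER_MESSAGE_SOURCE` is a placeholder because the real constant's value is not shown in this reference.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Placeholder value (assumption): import the real constant in practice.
RUBRIC_GRADER_MESSAGE_SOURCE = "rubric_grader"


@dataclass
class Msg:
    """Hypothetical stand-in for a message with LangChain-style fields."""
    role: str
    content: str
    name: Optional[str] = None
    additional_kwargs: Dict[str, Any] = field(default_factory=dict)


def is_grader_message(msg: Msg) -> bool:
    """True when the message carries the grader's lc_source tag."""
    return msg.additional_kwargs.get("lc_source") == RUBRIC_GRADER_MESSAGE_SOURCE


def real_user_turns(messages: List[Msg]) -> List[Msg]:
    """Human-role messages that were not injected by the grader."""
    return [m for m in messages if m.role == "human" and not is_grader_message(m)]
```

Filtering on additional_kwargs rather than on name is the safer choice for in-process code, since the name field is only round-tripped by some providers.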

System prompt for the grader sub-agent.

Establishes the grader's role, the <rubric> / <transcript> payload contract, prompt-injection defenses (transcript content is untrusted observation, not instructions), and the semantics of each RubricResult value. Paired with the structured-output GraderResponse schema, which constrains the grader to one of the allowed result values.

Per-criterion verdict.

Discriminated union on passed: pass-verdicts have no gap; fail-verdicts require one. GraderResponse.model_validate enforces the shape at the trust boundary so a grader cannot emit {passed: True, gap: ...} or {passed: False} with no gap.
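
The shape rule can be approximated with the standard library alone. The sketch below is not the GraderResponse schema: `CriterionVerdict`, `parse_criterion`, and the `criterion` field name are hypothetical, and a hand-written check stands in for the discriminated-union validation described above.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CriterionVerdict:
    criterion: str  # hypothetical field name for this sketch
    passed: bool
    gap: Optional[str] = None


def parse_criterion(raw: Mapping[str, Any]) -> CriterionVerdict:
    """Validate untrusted grader output before it enters trusted state."""
    passed = raw.get("passed")
    if not isinstance(passed, bool):
        raise ValueError("'passed' must be a bool")
    gap = raw.get("gap")
    if passed and gap is not None:
        # Pass-verdicts carry no gap.
        raise ValueError("a passing verdict must not include a gap")
    if not passed and not (isinstance(gap, str) and gap.strip()):
        # Fail-verdicts must say what is missing.
        raise ValueError("a failing verdict requires a non-empty gap")
    return CriterionVerdict(
        criterion=str(raw.get("criterion", "")),
        passed=passed,
        gap=gap,
    )
```

Rejecting malformed verdicts at parse time means downstream code never has to handle a failing criterion with no explanation.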

One grader evaluation, appended to _rubric_evaluations each iteration.

Every field is always populated by _build_evaluation and _handle_grader_exception, so consumers can read any field without guarding against absence.

State schema for RubricMiddleware.

Only rubric is part of the public I/O schema -- callers write a rubric and read the improved agent response back from messages.

Everything else is bookkeeping: status, iteration count, accumulated evaluations, and rubric-attempt tracking are annotated with PrivateStateAttr so they are omitted from input/output schemas. Tests, evals, and observability consumers can still reach them via the on_evaluation callback, the rubric_evaluation_* stream events, or agent.get_state(config).values on a checkpointed thread.

Middleware that drives self-evaluated iteration against a rubric.

The middleware activates only when a caller passes a rubric on invocation state. With no rubric, both before_agent and after_agent return without modifying state, so the middleware is safe to include unconditionally in a create_deep_agent stack.

Observing non-satisfied terminations

When grading ends with failed, max_iterations_reached, or grader_error, the middleware does not mutate the response messages. The last AIMessage in the agent's output is whatever the model produced just before grading stopped. Callers who need to branch on non-satisfied termination must inspect one of:

_rubric_status on the returned state (or agent.get_state(...) on a checkpointed thread),
the on_evaluation callback,
the rubric_evaluation_end stream event.
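
Branching on the first of these options could look like the sketch below. It assumes the returned state behaves like a mapping and that _rubric_status is absent when no rubric was supplied; `ended_unsatisfied` and `NON_SATISFIED` are hypothetical names introduced for illustration.

```python
from typing import Any, Mapping

# The three non-satisfied terminal statuses named above.
NON_SATISFIED = frozenset({"failed", "max_iterations_reached", "grader_error"})


def ended_unsatisfied(state: Mapping[str, Any]) -> bool:
    """True when grading ran and stopped without a satisfied verdict."""
    # Assumption: a missing key means no rubric was passed, so grading never ran.
    return state.get("_rubric_status") in NON_SATISFIED
```

A caller might use this to flag the response for human review, since the message list alone does not show that the rubric went unmet.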

A logger.warning is also emitted when max_iterations_reached fires.

Status recorded on each evaluation.

Superset of GraderVerdict with two middleware-synthesized terminal statuses the grader cannot emit itself:

max_iterations_reached: the iteration cap fired on a needs_revision verdict; the agent terminates with its last response intact.
grader_error: the grader sub-agent raised an exception (provider timeout, missing credentials, malformed structured response, etc.).

Distinct from failed, which the grader returns about the rubric, not about its own machinery.

Only needs_revision continues the loop; every other status ends the grading run.
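
The relationship between the two value sets can be written down with typing.Literal. The status strings below come from this reference; the alias name `EvaluationStatus` and the helper `continues_loop` are illustrative assumptions and may not match the library's own definitions.

```python
from typing import Literal, get_args

# Values the grader itself can emit.
GraderVerdict = Literal["satisfied", "needs_revision", "failed"]

# Hypothetical alias name: grader verdicts plus the two statuses
# that only the middleware can synthesize.
EvaluationStatus = Literal[
    "satisfied",
    "needs_revision",
    "failed",
    "max_iterations_reached",
    "grader_error",
]


def continues_loop(status: str) -> bool:
    """Only needs_revision resumes the agent; every other status is terminal."""
    return status == "needs_revision"


# Statuses that end the grading run.
TERMINAL_STATUSES = {s for s in get_args(EvaluationStatus) if not continues_loop(s)}
```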
