Rubric middleware for self-evaluated agent iteration.
RubricMiddleware lets a caller declare what done looks like via a
rubric. Each time the agent would otherwise finish — i.e. the model
returns a response with no further tool calls — the middleware invokes a
separate grader sub-agent against the transcript. If the grader returns
needs_revision, its feedback is injected as a HumanMessage and the
agent loop resumes. Grading repeats until the grader returns satisfied
or failed, or max_iterations is reached.
Verdict the grader sub-agent emits via structured output.
satisfied: every criterion passes.needs_revision: at least one criterion fails; loop continues.failed: the rubric itself is malformed or impossible to evaluate
against the transcript.Tag stored on synthetic revision messages this middleware injects.
The revision message is injected as a HumanMessage (the role the model
follows most reliably), but it carries:
name="rubric_grader" -- visible at the wire on providers that round-trip
the name field; ignored elsewhere.additional_kwargs={"lc_source": RUBRIC_GRADER_MESSAGE_SOURCE} -- visible
to in-process consumers (evals, UIs, observability) so they can attribute
the turn to the grader instead of treating it as a real user message.This follows the same convention as SummarizationMiddleware, which tags
its synthetic summary messages with lc_source="summarization".
System prompt for the grader sub-agent.
Establishes the grader's role, the <rubric> / <transcript> payload
contract, prompt-injection defenses (transcript content is untrusted
observation, not instructions), and the semantics of each RubricResult
value. Paired with the structured-output GraderResponse schema, which
constrains the grader to one of the allowed result values.
Per-criterion verdict.
Discriminated union on passed: pass-verdicts have no gap; fail-verdicts
require one. GraderResponse.model_validate enforces the shape at the
trust boundary so a grader cannot emit {passed: True, gap: ...} or
{passed: False} with no gap.
Per-criterion grader verdict when the criterion passes.
Per-criterion grader verdict when the criterion fails.
One grader evaluation, appended to _rubric_evaluations each iteration.
Consumers can read any field without guarding against absence since all
fields are always populated by _build_evaluation and
_handle_grader_exception.
State schema for RubricMiddleware.
Only rubric is part of the public I/O schema -- callers write a
rubric and read the improved agent response back from messages.
Everything else is bookkeeping: status, iteration count, accumulated
evaluations, and rubric-attempt tracking are annotated with
PrivateStateAttr
so they are omitted from input/output schemas. Tests, evals, and
observability consumers can still reach them via the on_evaluation
callback, the rubric_evaluation_* stream events, or
agent.get_state(config).values on a checkpointed thread.
Structured output the grader sub-agent must emit.
Passed as response_format=GraderResponse to create_agent so the
underlying provider's structured output strategy is auto-selected.
Middleware that drives self-evaluated iteration against a rubric.
The middleware activates only when a caller passes a rubric on
invocation state. With no rubric, both before_agent and after_agent
return without modifying state, so the middleware is safe to include
unconditionally in a create_deep_agent stack.
When grading ends with failed, max_iterations_reached, or
grader_error, the middleware does not mutate the response
messages. The last AIMessage in the agent's output is whatever
the model produced just before the grader gave up. Callers who
need to branch on non-satisfied termination must inspect one of:
_rubric_status on the returned state (or agent.get_state(...)
on a checkpointed thread),on_evaluation callback,rubric_evaluation_end stream event.A logger.warning is also emitted when max_iterations_reached
fires.
Status recorded on each evaluation.
Superset of GraderVerdict with two middleware-synthesized terminal
statuses the grader cannot emit itself:
max_iterations_reached: the iteration cap fired on a needs_revision
verdict; the agent terminates with its last response intact.grader_error: the grader sub-agent raised an exception (provider
timeout, missing credentials, malformed structured response, etc.).
Distinct from failed, which the grader returns about the rubric,
not about its own machinery.Only needs_revision continues the loop; every other status ends the
grading run.