stream_chunk_timeout: Per-chunk wall-clock timeout (seconds) on async
streaming responses. Applies to async invocations only (astream, ainvoke
with streaming, etc.). Sync streaming (stream) is not affected.
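For example, a minimal async usage sketch (the class and model names below are
placeholders; only the stream_chunk_timeout constructor kwarg itself is taken
from this description):

```python
import asyncio

from langchain_openai import ChatOpenAI

# Hypothetical wiring: tighten the per-chunk timeout to 30s for this client.
llm = ChatOpenAI(model="gpt-4o-mini", stream_chunk_timeout=30)


async def main() -> None:
    # Only the async streaming path is guarded; sync .stream() is untouched.
    async for chunk in llm.astream("Write a haiku about timeouts"):
        print(chunk.content, end="", flush=True)


asyncio.run(main())
```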
Fires between content chunks yielded by the openai SDK's streaming iterator
(i.e., each call to __anext__ on the response). Crucially, this is not the
same as httpx's timeout.read: httpx's read timeout resets on any bytes
arriving on the socket, including the SSE comment lines (": keepalive")
that trickle down during long model generations. A stream that's silent on
content but still producing keepalives looks alive forever to httpx.
stream_chunk_timeout, by contrast, measures the gap between parsed chunks.
The openai SDK's SSE parser consumes keepalive comments internally and does
not emit them as chunks, so keepalives do not reset this timer. It fires on
genuine content silence.

When it fires, a StreamChunkTimeoutError
(subclass of asyncio.TimeoutError) is raised with a self-describing
message naming this knob, the env-var override, the model, and the
number of chunks received before the stall. A WARNING log with
extra={"source": "stream_chunk_timeout", "timeout_s": <value>,
"model_name": <value>, "chunks_received": <value>} also fires so
aggregate logging can distinguish app-layer timeouts from
transport-layer failures.
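Conceptually, the wrapper amounts to bounding each __anext__ call with
asyncio.wait_for. The sketch below is illustrative, not the library's actual
internals; only the exception class, the env var, and the log fields come
from this description:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class StreamChunkTimeoutError(asyncio.TimeoutError):
    """Raised when no parsed chunk arrives within stream_chunk_timeout seconds."""


async def _iter_with_chunk_timeout(stream, timeout_s, model_name):
    """Yield chunks from an async iterator, bounding the gap between chunks."""
    if not timeout_s:  # None or 0 disables the wrapper entirely
        async for chunk in stream:
            yield chunk
        return
    iterator = stream.__aiter__()
    chunks_received = 0
    while True:
        try:
            # Each wait covers one __anext__ call, i.e. one parsed chunk.
            # Keepalive comments never surface here, so they cannot reset it.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout_s)
        except StopAsyncIteration:
            return
        except asyncio.TimeoutError:
            logger.warning(
                "No chunk received for %ss",
                timeout_s,
                extra={
                    "source": "stream_chunk_timeout",
                    "timeout_s": timeout_s,
                    "model_name": model_name,
                    "chunks_received": chunks_received,
                },
            )
            raise StreamChunkTimeoutError(
                f"stream_chunk_timeout ({timeout_s}s) expired for {model_name} "
                f"after {chunks_received} chunks; override via "
                "LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S or the constructor kwarg."
            ) from None
        chunks_received += 1
        yield chunk
```

Callers that want to retry or fall back on a stall can catch
StreamChunkTimeoutError specifically while letting transport-level exceptions
propagate unchanged.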
Defaults to 120s. Set to None or 0 to disable. Overridable via the
LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S env var. Negative values
(from either the env var or the constructor kwarg — e.g., hydrated
from YAML/JSON configs) fall back to the default with a WARNING log
rather than silently disabling the wrapper, so a misconfigured value
still boots safely and the fallback is visible.
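The precedence and the negative-value fallback could be resolved along these
lines (a sketch assuming the env var takes precedence over the kwarg, as
"override" suggests; helper names are made up, and a non-numeric env value is
not handled here):

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

_DEFAULT_TIMEOUT_S = 120.0
_ENV_VAR = "LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S"


def _resolve_stream_chunk_timeout(
    kwarg_value: Optional[float] = _DEFAULT_TIMEOUT_S,
) -> Optional[float]:
    """Effective timeout: the env var (if set) overrides the kwarg; None/0 disables."""
    raw = os.getenv(_ENV_VAR)
    value = float(raw) if raw is not None else kwarg_value
    if value is not None and value < 0:
        # Misconfigured (e.g. hydrated from a YAML/JSON config): fall back to the
        # default with a visible warning instead of silently disabling the wrapper.
        logger.warning(
            "Negative stream_chunk_timeout %r ignored; using default %ss",
            value,
            _DEFAULT_TIMEOUT_S,
        )
        return _DEFAULT_TIMEOUT_S
    return value or None  # 0 (or None) disables the per-chunk timer
```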