ChatModelIntegrationTests
- class langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests
Base class for chat model integration tests.
Test subclasses must implement the chat_model_class and chat_model_params properties to specify what model to test and its initialization parameters.

Example:

from typing import Type

from langchain_tests.integration_tests import ChatModelIntegrationTests
from my_package.chat_models import MyChatModel


class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def chat_model_class(self) -> Type[MyChatModel]:
        # Return the chat model class to test here
        return MyChatModel

    @property
    def chat_model_params(self) -> dict:
        # Return initialization parameters for the model.
        return {"model": "model-001", "temperature": 0}
Note
API references for individual test methods include troubleshooting tips.
Test subclasses must implement the following two properties:
- chat_model_class
The chat model class to test, e.g., ChatParrotLink.

Example:

@property
def chat_model_class(self) -> Type[ChatParrotLink]:
    return ChatParrotLink
- chat_model_params
Initialization parameters for the chat model.
Example:
@property
def chat_model_params(self) -> dict:
    return {"model": "bird-brain-001", "temperature": 0}
In addition, test subclasses can control what features are tested (such as tool calling or multi-modality) by selectively overriding the following properties:
has_tool_calling
Boolean property indicating whether the chat model supports tool calling.
By default, this is determined by whether the chat model’s bind_tools method is overridden. It typically does not need to be overridden on the test class.
Example override:
@property
def has_tool_calling(self) -> bool:
    return True
tool_choice_value
Value to use for tool choice when used in tests.
Warning
Deprecated since version 0.3.15. This property will be removed in version 0.3.20. If a model supports tool_choice, it should accept tool_choice="any" and tool_choice=<string name of tool>. If a model does not support forcing tool calling, override the has_tool_choice property to return False.

Example:

@property
def tool_choice_value(self) -> Optional[str]:
    return "any"
has_tool_choice
Boolean property indicating whether the chat model supports forcing tool calling via a tool_choice parameter.

By default, this is determined by whether the parameter is included in the signature for the corresponding bind_tools method.

If True, the minimum requirement for this feature is that tool_choice="any" will force a tool call, and tool_choice=<tool name> will force a call to a specific tool.

Example override:

@property
def has_tool_choice(self) -> bool:
    return False
has_structured_output
Boolean property indicating whether the chat model supports structured output.
By default, this is determined by whether the chat model's with_structured_output method is overridden. If the base implementation is intended to be used, this property should be overridden.

See: https://python.langchain.com/docs/concepts/structured_outputs/

Example:

@property
def has_structured_output(self) -> bool:
    return True
structured_output_kwargs
Dict property that can be used to specify additional kwargs for with_structured_output. Useful for testing different models.

Example:

@property
def structured_output_kwargs(self) -> dict:
    return {"method": "function_calling"}
supports_json_mode
Boolean property indicating whether the chat model supports JSON mode in with_structured_output.

See: https://python.langchain.com/docs/concepts/structured_outputs/#json-mode

Example:

@property
def supports_json_mode(self) -> bool:
    return True
supports_image_inputs
Boolean property indicating whether the chat model supports image inputs. Defaults to False.

If set to True, the chat model will be tested using content blocks of the form

{
    "type": "image",
    "source_type": "base64",
    "data": "<base64 image data>",
    "mime_type": "image/jpeg",  # or appropriate mime-type
}

in addition to OpenAI-style content blocks:

{
    "type": "image_url",
    "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
}
See https://python.langchain.com/docs/concepts/multimodality/
Example:
@property
def supports_image_inputs(self) -> bool:
    return True
supports_image_urls
Boolean property indicating whether the chat model supports image inputs from URLs. Defaults to False.

If set to True, the chat model will be tested using content blocks of the form

{
    "type": "image",
    "source_type": "url",
    "url": "https://...",
}
See https://python.langchain.com/docs/concepts/multimodality/
Example:
@property
def supports_image_urls(self) -> bool:
    return True
supports_pdf_inputs
Boolean property indicating whether the chat model supports PDF inputs. Defaults to False.

If set to True, the chat model will be tested using content blocks of the form

{
    "type": "file",
    "source_type": "base64",
    "data": "<base64 file data>",
    "mime_type": "application/pdf",
}
See https://python.langchain.com/docs/concepts/multimodality/
Example:
@property
def supports_pdf_inputs(self) -> bool:
    return True
supports_audio_inputs
Boolean property indicating whether the chat model supports audio inputs. Defaults to False.

If set to True, the chat model will be tested using content blocks of the form

{
    "type": "audio",
    "source_type": "base64",
    "data": "<base64 audio data>",
    "mime_type": "audio/wav",  # or appropriate mime-type
}
See https://python.langchain.com/docs/concepts/multimodality/
Example:
@property
def supports_audio_inputs(self) -> bool:
    return True
supports_video_inputs
Boolean property indicating whether the chat model supports video inputs. Defaults to False. No current tests are written for this feature.

returns_usage_metadata

Boolean property indicating whether the chat model returns usage metadata on invoke and streaming responses. Defaults to True.

usage_metadata is an optional dict attribute on AIMessage objects that tracks input and output tokens.

Example:

@property
def returns_usage_metadata(self) -> bool:
    return False
Models supporting usage_metadata should also return the name of the underlying model in the response_metadata of the AIMessage.

supports_anthropic_inputs
Boolean property indicating whether the chat model supports Anthropic-style inputs.
These inputs might feature “tool use” and “tool result” content blocks, e.g.,
[ {"type": "text", "text": "Hmm let me think about that"}, { "type": "tool_use", "input": {"fav_color": "green"}, "id": "foo", "name": "color_picker", }, ]
If set to
True, the chat model will be tested using content blocks of this form.Example:
@property def supports_anthropic_inputs(self) -> bool: return False
supports_image_tool_message
Boolean property indicating whether the chat model supports ToolMessages that include image content, e.g.,
ToolMessage( content=[ { "type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}, }, ], tool_call_id="1", name="random_image", )
(OpenAI Chat Completions format), as well as
ToolMessage( content=[ { "type": "image", "source_type": "base64", "data": image_data, "mime_type": "image/jpeg", }, ], tool_call_id="1", name="random_image", )
(standard format).
If set to True, the chat model will be tested with message sequences that include ToolMessages of this form.

Example:

@property
def supports_image_tool_message(self) -> bool:
    return False
supports_pdf_tool_message
Boolean property indicating whether the chat model supports ToolMessages that include PDF content, i.e.,
ToolMessage( content=[ { "type": "file", "source_type": "base64", "data": pdf_data, "mime_type": "application/pdf", }, ], tool_call_id="1", name="random_pdf", )
(standard format).
If set to True, the chat model will be tested with message sequences that include ToolMessages of this form.

Example:

@property
def supports_pdf_tool_message(self) -> bool:
    return False
supported_usage_metadata_details
Property controlling what usage metadata details are emitted in both invoke and stream.
usage_metadata is an optional dict attribute on AIMessage objects that tracks input and output tokens. It includes optional keys input_token_details and output_token_details that can track usage details associated with special types of tokens, such as cached, audio, or reasoning.

Only needs to be overridden if these details are supplied.
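Example override (a sketch mirroring the configuration shown for test_usage_metadata below; include only the token types your model actually reports):

@property
def supported_usage_metadata_details(self) -> dict:
    return {
        "invoke": ["cache_read_input", "cache_creation_input"],
        "stream": ["cache_read_input", "cache_creation_input"],
    }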
enable_vcr_tests
Property controlling whether to enable select tests that rely on VCR caching of HTTP calls, such as benchmarking tests.
To enable these tests, follow these steps:
Override the enable_vcr_tests property to return True:

@property
def enable_vcr_tests(self) -> bool:
    return True
Configure VCR to exclude sensitive headers and other information from cassettes.
Important
VCR will by default record authentication headers and other sensitive information in cassettes. Read below for how to configure what information is recorded in cassettes.
To add configuration to VCR, add a conftest.py file to the tests/ directory and implement the vcr_config fixture there.

langchain-tests excludes the headers 'authorization', 'x-api-key', and 'api-key' from VCR cassettes. To pick up this configuration, you will need to add conftest.py as shown below. You can also exclude additional headers, override the default exclusions, or apply other customizations to the VCR configuration. See example below:

tests/conftest.py

import pytest
from langchain_tests.conftest import (
    _base_vcr_config as _base_vcr_config,
)

_EXTRA_HEADERS = [
    # Specify additional headers to redact
    ("user-agent", "PLACEHOLDER"),
]


def remove_response_headers(response: dict) -> dict:
    # If desired, remove or modify headers in the response.
    response["headers"] = {}
    return response


@pytest.fixture(scope="session")
def vcr_config(_base_vcr_config: dict) -> dict:  # noqa: F811
    """Extend the default configuration from langchain_tests."""
    config = _base_vcr_config.copy()
    config.setdefault("filter_headers", []).extend(_EXTRA_HEADERS)
    config["before_record_response"] = remove_response_headers

    return config
Compressing cassettes
langchain-tests includes a custom VCR serializer that compresses cassettes using gzip. To use it, register the yaml.gz serializer to your VCR fixture and enable this serializer in the config. See example below:

tests/conftest.py

import pytest
from langchain_tests.conftest import (
    CustomPersister,
    CustomSerializer,
)
from langchain_tests.conftest import (
    _base_vcr_config as _base_vcr_config,
)
from vcr import VCR

_EXTRA_HEADERS = [
    # Specify additional headers to redact
    ("user-agent", "PLACEHOLDER"),
]


def remove_response_headers(response: dict) -> dict:
    # If desired, remove or modify headers in the response.
    response["headers"] = {}
    return response


@pytest.fixture(scope="session")
def vcr_config(_base_vcr_config: dict) -> dict:  # noqa: F811
    """Extend the default configuration from langchain_tests."""
    config = _base_vcr_config.copy()
    config.setdefault("filter_headers", []).extend(_EXTRA_HEADERS)
    config["before_record_response"] = remove_response_headers

    # New: enable serializer and set file extension
    config["serializer"] = "yaml.gz"
    config["path_transformer"] = VCR.ensure_suffix(".yaml.gz")

    return config


def pytest_recording_configure(config: dict, vcr: VCR) -> None:
    vcr.register_persister(CustomPersister())
    vcr.register_serializer("yaml.gz", CustomSerializer())
You can inspect the contents of the compressed cassettes (e.g., to ensure no sensitive information is recorded) using
gunzip -k /path/to/tests/cassettes/TestClass_test.yaml.gz
or by using the serializer:
from langchain_tests.conftest import (
    CustomPersister,
    CustomSerializer,
)

cassette_path = "/path/to/tests/cassettes/TestClass_test.yaml.gz"
requests, responses = CustomPersister().load_cassette(
    cassette_path, CustomSerializer()
)
Run tests to generate VCR cassettes.
Example:
uv run python -m pytest tests/integration_tests/test_chat_models.py::TestMyModel::test_stream_time
This will generate a VCR cassette for the test in tests/integration_tests/cassettes/.

Important
You should inspect the generated cassette to ensure that it does not contain sensitive information. If it does, you can modify the vcr_config fixture to exclude headers or modify the response before it is recorded.

You can then commit the cassette to your repository. Subsequent test runs will use the cassette instead of making HTTP calls.
Attributes
chat_model_class: The chat model class to test, e.g., ChatParrotLink.
chat_model_params: Initialization parameters for the chat model.
enable_vcr_tests: (bool) Whether to enable VCR tests for the chat model.
has_structured_output: (bool) Whether the chat model supports structured output.
has_tool_calling: (bool) Whether the model supports tool calling.
has_tool_choice: (bool) Whether the model supports forcing tool calls via a tool_choice parameter.
returns_usage_metadata: (bool) Whether the model returns usage metadata.
structured_output_kwargs: If specified, additional kwargs for with_structured_output.
supported_usage_metadata_details: Supported usage metadata details.
supports_anthropic_inputs: (bool) Whether the chat model supports Anthropic-style inputs.
supports_audio_inputs: (bool) Whether the chat model supports audio inputs.
supports_image_inputs: (bool) Whether the chat model supports image inputs.
supports_image_tool_message: (bool) Whether the chat model supports ToolMessages with image content.
supports_image_urls: (bool) Whether the chat model supports image inputs from URLs.
supports_json_mode: (bool) Whether the chat model supports JSON mode.
supports_pdf_inputs: (bool) Whether the chat model supports PDF inputs; defaults to False.
supports_pdf_tool_message: (bool) Whether the chat model supports ToolMessages with PDF content.
supports_video_inputs: (bool) Whether the chat model supports video inputs.
tool_choice_value: (None or str) Value to use for tool choice when used in tests.
Methods
test_abatch(model): Test to verify that await model.abatch([messages]) works.
test_agent_loop(model): Test that the model supports a simple ReAct agent loop.
test_ainvoke(model): Test to verify that await model.ainvoke(simple_message) works.
test_anthropic_inputs(model): Test that model can process Anthropic-style message histories.
test_astream(model): Test to verify that await model.astream(simple_message) works.
test_audio_inputs(model): Test that the model can process audio inputs.
test_batch(model): Test to verify that model.batch([messages]) works.
test_bind_runnables_as_tools(model): Test bind runnables as tools.
test_conversation(model): Test to verify that the model can handle multi-turn conversations.
test_double_messages_conversation(model): Test to verify that the model can handle double-message conversations.
test_image_inputs(model): Test that the model can process image inputs.
test_image_tool_message(model): Test that the model can process ToolMessages with image inputs.
test_invoke(model): Test to verify that model.invoke(simple_message) works.
test_json_mode(model): Test structured output via JSON mode.
test_message_with_name(model): Test that HumanMessage with values for the name field can be handled.
test_pdf_inputs(model): Test that the model can process PDF inputs.
test_pdf_tool_message(model): Test that the model can process ToolMessages with PDF inputs.
test_stop_sequence(model): Test that model does not fail when invoked with the stop parameter.
test_stream(model): Test to verify that model.stream(simple_message) works.
test_stream_time(model, benchmark, vcr): Test that streaming does not introduce undue overhead.
test_structured_few_shot_examples(model, ...): Test that the model can process few-shot examples with tool calls.
test_structured_output(model, schema_type): Test to verify structured output is generated both on invoke and stream.
test_structured_output_async(model, schema_type): Test to verify structured output is generated both on invoke and stream.
test_structured_output_optional_param(model): Test structured output with optional parameters.
test_structured_output_pydantic_2_v1(model): Test structured output using pydantic.v1.BaseModel.
test_tool_calling(model): Test that the model generates tool calls.
test_tool_calling_async(model): Test that the model generates tool calls.
test_tool_calling_with_no_arguments(model): Test that the model generates tool calls for tools with no arguments.
test_tool_choice(model): Test tool_choice parameter.
test_tool_message_error_status(model, ...): Test that ToolMessage with status="error" can be handled.
test_tool_message_histories_list_content(model, ...): Test that message histories are compatible with list tool contents.
test_tool_message_histories_string_content(model, ...): Test that message histories are compatible with string tool contents.
test_unicode_tool_call_integration(model, *): Generic integration test for Unicode characters in tool calls.
test_usage_metadata(model): Test to verify that the model returns correct usage metadata.
test_usage_metadata_streaming(model): Test usage metadata in streaming mode.
- async test_abatch(
- model: BaseChatModel,
Test to verify that await model.abatch([messages]) works.

This should pass for all integrations. Tests the model's ability to process multiple prompts in a single batch asynchronously.
Troubleshooting
First, debug test_batch() and test_ainvoke() because abatch has a default implementation that calls ainvoke for each message in the batch.

If those tests pass but not this one, you should make sure your abatch method does not raise any exceptions, and that it returns a list of valid AIMessage objects.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_agent_loop(
- model: BaseChatModel,
Test that the model supports a simple ReAct agent loop.
This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
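A rough sketch of the loop this test exercises, assuming a model instance and an illustrative tool (the actual test defines its own tool and prompt):

from langchain_core.messages import HumanMessage, ToolMessage
from langchain_core.tools import tool


@tool
def get_weather(city: str) -> str:
    """Get the weather for a city."""  # illustrative tool
    return "Sunny and 22 degrees."


model_with_tools = model.bind_tools([get_weather])
messages = [HumanMessage("What is the weather in San Francisco?")]
ai_msg = model_with_tools.invoke(messages)
messages.append(ai_msg)
for tool_call in ai_msg.tool_calls:
    # Execute the tool and feed the result back to the model.
    result = get_weather.invoke(tool_call["args"])
    messages.append(ToolMessage(result, tool_call_id=tool_call["id"]))
final = model_with_tools.invoke(messages)  # should answer using the tool result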
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

Check also that all required information (e.g., tool calling identifiers) from AIMessage objects is propagated correctly to model payloads.

This test may fail if the chat model does not consistently generate tool calls in response to an appropriate query. In these cases you can xfail the test:

@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_agent_loop(self, model: BaseChatModel) -> None:
    super().test_agent_loop(model)
- Parameters:
model (BaseChatModel)
- Return type:
None
- async test_ainvoke(
- model: BaseChatModel,
Test to verify that await model.ainvoke(simple_message) works.

This should pass for all integrations. Passing this test does not indicate a "natively async" implementation, but rather that the model can be used in an async context.
Troubleshooting
First, debug test_invoke() because ainvoke has a default implementation that calls invoke in an async context.

If that test passes but not this one, you should make sure your _agenerate method does not raise any exceptions, and that it returns a valid ChatResult like so:

return ChatResult(
    generations=[
        ChatGeneration(message=AIMessage(content="Output text"))
    ]
)
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_anthropic_inputs(
- model: BaseChatModel,
Test that model can process Anthropic-style message histories.
These message histories will include AIMessage objects with tool_use content blocks, e.g.,

AIMessage(
    [
        {"type": "text", "text": "Hmm let me think about that"},
        {
            "type": "tool_use",
            "input": {"fav_color": "green"},
            "id": "foo",
            "name": "color_picker",
        },
    ]
)
as well as
HumanMessage objects containing tool_result content blocks:

HumanMessage(
    [
        {
            "type": "tool_result",
            "tool_use_id": "foo",
            "content": [
                {
                    "type": "text",
                    "text": "green is a great pick! "
                    "that's my sister's favorite color",
                }
            ],
            "is_error": False,
        },
        {"type": "text", "text": "what's my sister's favorite color"},
    ]
)
This test should be skipped if the model does not support messages of this form (or doesn’t support tool calling generally). See Configuration below.
Configuration
To disable this test, set supports_anthropic_inputs to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_anthropic_inputs(self) -> bool:
        return False
Troubleshooting
If this test fails, check that:
The model can correctly handle message histories that include message objects with list content.
The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.

HumanMessage objects with "tool_result" content blocks are correctly handled.
Otherwise, if Anthropic tool call and result formats are not supported, set the supports_anthropic_inputs property to False.

- Parameters:
model (BaseChatModel)
- Return type:
None
- async test_astream(
- model: BaseChatModel,
Test to verify that await model.astream(simple_message) works.

This should pass for all integrations. Passing this test does not indicate a "natively async" or "streaming" implementation, but rather that the model can be used in an async streaming context.
Troubleshooting
First, debug test_stream() and test_ainvoke() because astream has a default implementation that calls _stream in an async context if it is implemented, or ainvoke and yields the result as a single chunk if not.

If those tests pass but not this one, you should make sure your _astream method does not raise any exceptions, and that it yields valid ChatGenerationChunk objects like so:

yield ChatGenerationChunk(message=AIMessageChunk(content="chunk text"))
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_audio_inputs(
- model: BaseChatModel,
Test that the model can process audio inputs.
This test should be skipped (see Configuration below) if the model does not support audio inputs. These will take the form:
{ "type": "audio", "source_type": "base64", "data": "<base64 audio data>", "mime_type": "audio/wav", # or appropriate mime-type }
See https://python.langchain.com/docs/concepts/multimodality/
Configuration
To disable this test, set supports_audio_inputs to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_audio_inputs(self) -> bool:
        return False
Troubleshooting
If this test fails, check that the model can correctly handle messages with audio content blocks, specifically base64-encoded files. Otherwise, set the supports_audio_inputs property to False.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_batch(
- model: BaseChatModel,
Test to verify that model.batch([messages]) works.

This should pass for all integrations. Tests the model's ability to process multiple prompts in a single batch.
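For reference, the batch API can be exercised directly like this (prompts are illustrative):

responses = model.batch(["Hello, how are you?", "What is 2 + 2?"])
for response in responses:
    print(response.content)  # each item should be a valid AIMessage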
Troubleshooting
First, debug test_invoke() because batch has a default implementation that calls invoke for each message in the batch.

If that test passes but not this one, you should make sure your batch method does not raise any exceptions, and that it returns a list of valid AIMessage objects.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_bind_runnables_as_tools(
- model: BaseChatModel,
Test bind runnables as tools.
Test that the model generates tool calls for tools that are derived from LangChain runnables. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
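As a rough sketch of what "runnables as tools" means (assuming Runnable.as_tool, available in recent langchain-core releases; the runnable, schema, and names here are illustrative, and the test constructs its own runnable internally):

from langchain_core.runnables import RunnableLambda
from pydantic import BaseModel, Field


class GreetingArgs(BaseModel):
    name: str = Field(description="Name of the person to greet")


greet = RunnableLambda(lambda args: f"Hello, {args['name']}!").as_tool(
    args_schema=GreetingArgs,
    name="greet",
    description="Greet a person by name",
)
model_with_tools = model.bind_tools([greet])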
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. If tool_choice is not supported and the model consistently fails this test, you can xfail the test:

@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_bind_runnables_as_tools(self, model: BaseChatModel) -> None:
    super().test_bind_runnables_as_tools(model)
Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_conversation(
- model: BaseChatModel,
Test to verify that the model can handle multi-turn conversations.
This should pass for all integrations. Tests the model’s ability to process a sequence of alternating human and AI messages as context for generating the next response.
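The message history this test sends looks roughly like the following (exact wording may differ):

from langchain_core.messages import AIMessage, HumanMessage

messages = [
    HumanMessage("hello"),
    AIMessage("Hello! How can I help you today?"),
    HumanMessage("how are you"),
]
result = model.invoke(messages)  # should return a non-empty AIMessage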
Troubleshooting
First, debug test_invoke() because this test also uses model.invoke().

If that test passes but not this one, you should verify that:
1. Your model correctly processes the message history
2. The model maintains appropriate context from previous messages
3. The response is a valid AIMessage

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_double_messages_conversation(
- model: BaseChatModel,
Test to verify that the model can handle double-message conversations.
This should pass for all integrations. Tests the model’s ability to process a sequence of double-system, double-human, and double-ai messages as context for generating the next response.
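That is, the history contains consecutive messages of the same type, roughly like this (wording is illustrative):

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

messages = [
    SystemMessage("hello"),
    SystemMessage("hello"),
    HumanMessage("hello"),
    HumanMessage("hello"),
    AIMessage("hello"),
    AIMessage("hello"),
    HumanMessage("how are you"),
]
result = model.invoke(messages)  # should return a non-empty AIMessage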
Troubleshooting
First, debug test_invoke() because this test also uses model.invoke().

Second, debug test_conversation() because this test is the "basic case" without double messages.

If those tests pass but not this one, you should verify that:
1. Your model API can handle double messages, or the integration should merge messages before sending them to the API.
2. The response is a valid AIMessage
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_image_inputs(
- model: BaseChatModel,
Test that the model can process image inputs.
This test should be skipped (see Configuration below) if the model does not support image inputs. These will take the form:
{ "type": "image", "source_type": "base64", "data": "<base64 image data>", "mime_type": "image/jpeg", # or appropriate mime-type }
For backward-compatibility, we must also support OpenAI-style image content blocks:
[ {"type": "text", "text": "describe the weather in this image"}, { "type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}, }, ]
See https://python.langchain.com/docs/concepts/multimodality/
If the property supports_image_urls is set to True, the test will also check that we can process content blocks of the form:

{
    "type": "image",
    "source_type": "url",
    "url": "<url>",
}
Configuration
To disable this test, set supports_image_inputs to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_image_inputs(self) -> bool:
        return False

    # Can also explicitly disable testing image URLs:
    @property
    def supports_image_urls(self) -> bool:
        return False
Troubleshooting
If this test fails, check that the model can correctly handle messages with image content blocks, including base64-encoded images. Otherwise, set the supports_image_inputs property to False.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_image_tool_message(
- model: BaseChatModel,
Test that the model can process ToolMessages with image inputs.
This test should be skipped if the model does not support messages of the form:
ToolMessage( content=[ { "type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}, }, ], tool_call_id="1", name="random_image", )
containing image content blocks in OpenAI Chat Completions format, in addition to messages of the form:
ToolMessage( content=[ { "type": "image", "source_type": "base64", "data": image_data, "mime_type": "image/jpeg", }, ], tool_call_id="1", name="random_image", )
containing image content blocks in standard format.
This test can be skipped by setting the supports_image_tool_message property to False (see Configuration below).

Configuration
To disable this test, set supports_image_tool_message to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_image_tool_message(self) -> bool:
        return False
Troubleshooting
If this test fails, check that the model can correctly handle messages with image content blocks in ToolMessages, including base64-encoded images. Otherwise, set the supports_image_tool_message property to False.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_invoke(
- model: BaseChatModel,
Test to verify that model.invoke(simple_message) works.

This should pass for all integrations.
Troubleshooting
If this test fails, you should make sure your _generate method does not raise any exceptions, and that it returns a valid ChatResult like so:

return ChatResult(
    generations=[
        ChatGeneration(message=AIMessage(content="Output text"))
    ]
)
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_json_mode(
- model: BaseChatModel,
Test structured output via JSON mode.
This test is optional and should be skipped if the model does not support the JSON mode feature (see Configuration below).
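Roughly what the test exercises, sketched under the assumption that your with_structured_output implementation accepts method="json_mode" (the schema and prompt are illustrative):

from pydantic import BaseModel


class Joke(BaseModel):
    setup: str
    punchline: str


structured_model = model.with_structured_output(Joke, method="json_mode")
structured_model.invoke(
    "Tell me a joke about cats. Respond with JSON containing 'setup' and 'punchline' keys."
)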
Configuration
To disable this test, set supports_json_mode to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_json_mode(self) -> bool:
        return False
Troubleshooting
See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_message_with_name(
- model: BaseChatModel,
Test that HumanMessage with values for the name field can be handled.

These messages may take the form:
HumanMessage("hello", name="example_user")
If possible, the name field should be parsed and passed appropriately to the model. Otherwise, it should be ignored.

Troubleshooting
If this test fails, check that the name field on HumanMessage objects is either ignored or passed to the model appropriately.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_pdf_inputs(
- model: BaseChatModel,
Test that the model can process PDF inputs.
This test should be skipped (see Configuration below) if the model does not support PDF inputs. These will take the form:
{ "type": "image", "source_type": "base64", "data": "<base64 image data>", "mime_type": "application/pdf", }
See https://python.langchain.com/docs/concepts/multimodality/
Configuration
To disable this test, set supports_pdf_inputs to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_pdf_inputs(self) -> bool:
        return False
Troubleshooting
If this test fails, check that the model can correctly handle messages with PDF content blocks, including base64-encoded files. Otherwise, set the supports_pdf_inputs property to False.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_pdf_tool_message(
- model: BaseChatModel,
Test that the model can process ToolMessages with PDF inputs.
This test should be skipped if the model does not support messages of the form:
ToolMessage( content=[ { "type": "file", "source_type": "base64", "data": pdf_data, "mime_type": "application/pdf", }, ], tool_call_id="1", name="random_pdf", )
containing PDF content blocks in standard format.
This test can be skipped by setting the supports_pdf_tool_message property to False (see Configuration below).

Configuration
To disable this test, set supports_pdf_tool_message to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_pdf_tool_message(self) -> bool:
        return False
Troubleshooting
If this test fails, check that the model can correctly handle messages with PDF content blocks in ToolMessages, specifically base64-encoded PDFs. Otherwise, set the supports_pdf_tool_message property to False.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_stop_sequence(
- model: BaseChatModel,
Test that model does not fail when invoked with the stop parameter.

The stop parameter is a standard parameter for stopping generation at a certain token.

This should pass for all integrations.
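For reference, this is roughly the call the test makes (the stop token here is illustrative):

from langchain_core.messages import AIMessage

result = model.invoke("hi", stop=["you"])
assert isinstance(result, AIMessage)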
Troubleshooting
If this test fails, check that the function signature for _generate (as well as _stream and async variants) accepts the stop parameter:

def _generate(
    self,
    messages: List[BaseMessage],
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> ChatResult:
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_stream(
- model: BaseChatModel,
Test to verify that model.stream(simple_message) works.

This should pass for all integrations. Passing this test does not indicate a "streaming" implementation, but rather that the model can be used in a streaming context.
Troubleshooting
First, debug test_invoke() because stream has a default implementation that calls invoke and yields the result as a single chunk.

If that test passes but not this one, you should make sure your _stream method does not raise any exceptions, and that it yields valid ChatGenerationChunk objects like so:

yield ChatGenerationChunk(message=AIMessageChunk(content="chunk text"))
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_stream_time(
- model: BaseChatModel,
- benchmark: BenchmarkFixture,
- vcr: Cassette,
Test that streaming does not introduce undue overhead.
See the enable_vcr_tests dropdown above for more information.

Configuration
This test can be enabled or disabled using the enable_vcr_tests property. For example, to disable the test, set this property to False:

@property
def enable_vcr_tests(self) -> bool:
    return False
Important
VCR will by default record authentication headers and other sensitive information in cassettes. See the enable_vcr_tests dropdown above for how to configure what information is recorded in cassettes.

- Parameters:
model (BaseChatModel)
benchmark (BenchmarkFixture)
vcr (Cassette)
- Return type:
None
- test_structured_few_shot_examples(
- model: BaseChatModel,
- my_adder_tool: BaseTool,
Test that the model can process few-shot examples with tool calls.
These are represented as a sequence of messages of the following form (a sketch of such a sequence is shown below):
- HumanMessage with string content;
- AIMessage with the tool_calls attribute populated;
- ToolMessage with string content;
- AIMessage with string content (an answer);
- HumanMessage with string content (a follow-up question).
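As a rough sketch, using the my_adder_tool fixture passed to the test (tool arguments, IDs, and wording are illustrative, not the exact values the test uses):

from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

few_shot_messages = [
    HumanMessage("What is 1 + 2?"),
    AIMessage(
        "",
        tool_calls=[
            {
                "name": "my_adder_tool",
                "args": {"a": 1, "b": 2},
                "id": "call_1",
                "type": "tool_call",
            }
        ],
    ),
    ToolMessage("3", tool_call_id="call_1"),
    AIMessage("1 + 2 is 3."),
    HumanMessage("What is 3 + 4?"),
]
result = model.bind_tools([my_adder_tool]).invoke(few_shot_messages)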
This test should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting
This test uses a utility function in langchain_core to generate a sequence of messages representing "few-shot" examples.

If this test fails, check that the model can correctly handle this sequence of messages.
You can xfail the test if tool calling is implemented but this format is not supported.

@pytest.mark.xfail(reason=("Not implemented."))
def test_structured_few_shot_examples(self, *args: Any) -> None:
    super().test_structured_few_shot_examples(*args)
- Parameters:
model (BaseChatModel)
my_adder_tool (BaseTool)
- Return type:
None
- test_structured_output(
- model: BaseChatModel,
- schema_type: Literal['pydantic', 'typeddict', 'json_schema'],
Test to verify structured output is generated both on invoke and stream.
This test is optional and should be skipped if the model does not support structured output (see Configuration below).
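The core pattern the test exercises looks like this (the schema and prompt are illustrative):

from pydantic import BaseModel, Field


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")


structured_model = model.with_structured_output(Joke)
result = structured_model.invoke("Tell me a joke about cats.")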
Configuration
To disable structured output tests, set has_structured_output to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_structured_output(self) -> bool:
        return False
By default, has_structured_output is True if a model overrides the with_structured_output or bind_tools methods.

Troubleshooting
If this test fails, ensure that the model's bind_tools method properly handles both JSON Schema and Pydantic V2 models. langchain_core implements a utility function that will accommodate most formats.

See example implementation of with_structured_output.

- Parameters:
model (BaseChatModel)
schema_type (Literal['pydantic', 'typeddict', 'json_schema'])
- Return type:
None
- async test_structured_output_async(
- model: BaseChatModel,
- schema_type: Literal['pydantic', 'typeddict', 'json_schema'],
Test to verify structured output is generated both on invoke and stream.
This test is optional and should be skipped if the model does not support structured output (see Configuration below).
Configuration
To disable structured output tests, set has_structured_output to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_structured_output(self) -> bool:
        return False
By default, has_structured_output is True if a model overrides the with_structured_output or bind_tools methods.

Troubleshooting
If this test fails, ensure that the model's bind_tools method properly handles both JSON Schema and Pydantic V2 models. langchain_core implements a utility function that will accommodate most formats.

See example implementation of with_structured_output.

- Parameters:
model (BaseChatModel)
schema_type (Literal['pydantic', 'typeddict', 'json_schema'])
- Return type:
None
- test_structured_output_optional_param(
- model: BaseChatModel,
Test structured output with optional parameters.
Test to verify we can generate structured output that includes optional parameters.
This test is optional and should be skipped if the model does not support structured output (see Configuration below).
Configuration
To disable structured output tests, set has_structured_output to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_structured_output(self) -> bool:
        return False
By default, has_structured_output is True if a model overrides the with_structured_output or bind_tools methods.

Troubleshooting
If this test fails, ensure that the model's bind_tools method properly handles Pydantic V2 models with optional parameters. langchain_core implements a utility function that will accommodate most formats.

See example implementation of with_structured_output.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_structured_output_pydantic_2_v1(
- model: BaseChatModel,
Test structured output using pydantic.v1.BaseModel.
Verify we can generate structured output using pydantic.v1.BaseModel. pydantic.v1.BaseModel is available in the Pydantic 2 package.

This test is optional and should be skipped if the model does not support structured output (see Configuration below).
Configuration
To disable structured output tests, set has_structured_output to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_structured_output(self) -> bool:
        return False
By default, has_structured_output is True if a model overrides the with_structured_output or bind_tools methods.

Troubleshooting
If this test fails, ensure that the model's bind_tools method properly handles both JSON Schema and Pydantic V1 models. langchain_core implements a utility function that will accommodate most formats.

See example implementation of with_structured_output.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_tool_calling(
- model: BaseChatModel,
Test that the model generates tool calls.
This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. If tool_choice is not supported and the model consistently fails this test, you can xfail the test:

@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_tool_calling(self, model: BaseChatModel) -> None:
    super().test_tool_calling(model)
Otherwise, in the case that only one tool is bound, ensure that tool_choice supports the string 'any' to force calling that tool.

- Parameters:
model (BaseChatModel)
- Return type:
None
- async test_tool_calling_async(
- model: BaseChatModel,
Test that the model generates tool calls.
This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. If tool_choice is not supported and the model consistently fails this test, you can xfail the test:

@pytest.mark.xfail(reason=("Does not support tool_choice."))
async def test_tool_calling_async(self, model: BaseChatModel) -> None:
    await super().test_tool_calling_async(model)
Otherwise, in the case that only one tool is bound, ensure that tool_choice supports the string 'any' to force calling that tool.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_tool_calling_with_no_arguments(
- model: BaseChatModel,
Test that the model generates tool calls for tools with no arguments.
This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model. It should correctly handle the case where a tool has no arguments.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. It may also fail if a provider does not support this form of tool. In these cases, you can xfail the test:

@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_tool_calling_with_no_arguments(
    self, model: BaseChatModel
) -> None:
    super().test_tool_calling_with_no_arguments(model)
Otherwise, in the case that only one tool is bound, ensure that tool_choice supports the string 'any' to force calling that tool.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_tool_choice(
- model: BaseChatModel,
Test tool_choice parameter.

Test that the model can force tool calling via the tool_choice parameter. This test is skipped if the has_tool_choice property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
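For reference, forcing a tool call looks roughly like this (the tool name is illustrative): passing "any" forces some tool call, while passing a tool's name forces that specific tool.

forced_any = model.bind_tools([magic_function], tool_choice="any")
forced_specific = model.bind_tools([magic_function], tool_choice="magic_function")
ai_msg = forced_any.invoke("What is the magic function of 3?")
assert ai_msg.tool_calls  # a tool call should be present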
Configuration
To disable tool calling tests, set has_tool_choice to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_choice(self) -> bool:
        return False
Troubleshooting
If this test fails, check whether the test_tool_calling test is passing. If it is not, refer to the troubleshooting steps in that test first.

If test_tool_calling is passing, check that the underlying model supports forced tool calling. If it does, bind_tools should accept a tool_choice parameter that can be used to force a tool call.

It should accept (1) the string 'any' to force calling the bound tool, and (2) the string name of the tool to force calling that tool.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_tool_message_error_status(
- model: BaseChatModel,
- my_adder_tool: BaseTool,
Test that ToolMessage with status="error" can be handled.

These messages may take the form:
ToolMessage( "Error: Missing required argument 'b'.", name="my_adder_tool", tool_call_id="abc123", status="error", )
If possible, the status field should be parsed and passed appropriately to the model.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting
If this test fails, check that the status field on ToolMessage objects is either ignored or passed to the model appropriately.

- Parameters:
model (BaseChatModel)
my_adder_tool (BaseTool)
- Return type:
None
- test_tool_message_histories_list_content(
- model: BaseChatModel,
- my_adder_tool: BaseTool,
Test that message histories are compatible with list tool contents.
For instance with Anthropic format contents.
These message histories will include AIMessage objects with "tool use" content blocks, e.g.,

[
    {"type": "text", "text": "Hmm let me think about that"},
    {
        "type": "tool_use",
        "input": {"fav_color": "green"},
        "id": "foo",
        "name": "color_picker",
    },
]
This test should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting
If this test fails, check that:
The model can correctly handle message histories that include AIMessage objects with list content.

The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.

The model can correctly handle ToolMessage objects with string content and arbitrary string values for tool_call_id.
You can xfail the test if tool calling is implemented but this format is not supported.

@pytest.mark.xfail(reason=("Not implemented."))
def test_tool_message_histories_list_content(self, *args: Any) -> None:
    super().test_tool_message_histories_list_content(*args)
- Parameters:
model (BaseChatModel)
my_adder_tool (BaseTool)
- Return type:
None
- test_tool_message_histories_string_content(
- model: BaseChatModel,
- my_adder_tool: BaseTool,
Test that message histories are compatible with string tool contents.
For instance with OpenAI format contents. If a model passes this test, it should be compatible with messages generated from providers following OpenAI format.
This test should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting
If this test fails, check that:
The model can correctly handle message histories that include AIMessage objects with "" content.

The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.

The model can correctly handle ToolMessage objects with string content and arbitrary string values for tool_call_id.
You can xfail the test if tool calling is implemented but this format is not supported.

@pytest.mark.xfail(reason=("Not implemented."))
def test_tool_message_histories_string_content(
    self, *args: Any
) -> None:
    super().test_tool_message_histories_string_content(*args)
- Parameters:
model (BaseChatModel)
my_adder_tool (BaseTool)
- Return type:
None
- test_unicode_tool_call_integration(
- model: BaseChatModel,
- *,
- tool_choice: str | None = None,
- force_tool_call: bool = True,
Generic integration test for Unicode characters in tool calls.
- Parameters:
model (BaseChatModel) – The chat model to test
tool_choice (str | None) – Tool choice parameter to pass to bind_tools() (provider-specific)

force_tool_call (bool) – Whether to force a tool call (use tool_choice=True if None)
- Return type:
None
Tests that Unicode characters in tool call arguments are preserved correctly, not escaped as \\uXXXX sequences.
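If your provider needs a specific tool_choice value, one way to wire it in is to override the test in your subclass (the values here are illustrative; parameter names follow the signature above):

def test_unicode_tool_call_integration(self, model: BaseChatModel) -> None:
    super().test_unicode_tool_call_integration(
        model, tool_choice="any", force_tool_call=True
    )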
- test_usage_metadata(
- model: BaseChatModel,
Test to verify that the model returns correct usage metadata.
This test is optional and should be skipped if the model does not return usage metadata (see Configuration below).
Changed in version 0.3.17: Additionally check for the presence of model_name in the response metadata, which is needed for usage tracking in callback handlers.

Configuration
By default, this test is run.
To disable this feature, set returns_usage_metadata to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def returns_usage_metadata(self) -> bool:
        return False
This test can also check the format of specific kinds of usage metadata based on the supported_usage_metadata_details property. This property should be configured as follows with the types of tokens that the model supports tracking:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supported_usage_metadata_details(self) -> dict:
        return {
            "invoke": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
            "stream": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
        }
Troubleshooting
If this test fails, first verify that your model returns UsageMetadata dicts attached to the returned AIMessage object in _generate:

return ChatResult(
    generations=[
        ChatGeneration(
            message=AIMessage(
                content="Output text",
                usage_metadata={
                    "input_tokens": 350,
                    "output_tokens": 240,
                    "total_tokens": 590,
                    "input_token_details": {
                        "audio": 10,
                        "cache_creation": 200,
                        "cache_read": 100,
                    },
                    "output_token_details": {
                        "audio": 10,
                        "reasoning": 200,
                    },
                },
            )
        )
    ]
)
Check also that the response includes a 'model_name' key in its response_metadata.

- Parameters:
model (BaseChatModel)
- Return type:
None
- test_usage_metadata_streaming(
- model: BaseChatModel,
Test usage metadata in streaming mode.
Test to verify that the model returns correct usage metadata in streaming mode.
Changed in version 0.3.17: Additionally check for the presence of model_name in the response metadata, which is needed for usage tracking in callback handlers.

Configuration
By default, this test is run. To disable this feature, set returns_usage_metadata to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def returns_usage_metadata(self) -> bool:
        return False
This test can also check the format of specific kinds of usage metadata based on the supported_usage_metadata_details property. This property should be configured as follows with the types of tokens that the model supports tracking:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supported_usage_metadata_details(self) -> dict:
        return {
            "invoke": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
            "stream": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
        }
Troubleshooting
If this test fails, first verify that your model yields UsageMetadata dicts attached to the returned AIMessage object in _stream that sum up to the total usage metadata.

Note that input_tokens should only be included on one of the chunks (typically the first or the last chunk), and the rest should have 0 or None to avoid counting input tokens multiple times. output_tokens typically count the number of tokens in each chunk, not the sum. This test will pass as long as the sum of output_tokens across all chunks is not 0.

yield ChatGenerationChunk(
    message=AIMessageChunk(
        content="Output text",
        usage_metadata={
            "input_tokens": (
                num_input_tokens if is_first_chunk else 0
            ),
            "output_tokens": 11,
            "total_tokens": (
                11 + num_input_tokens if is_first_chunk else 11
            ),
            "input_token_details": {
                "audio": 10,
                "cache_creation": 200,
                "cache_read": 100,
            },
            "output_token_details": {
                "audio": 10,
                "reasoning": 200,
            },
        },
    )
)
Check also that the aggregated response includes a 'model_name' key in its response_metadata.

- Parameters:
model (BaseChatModel)
- Return type:
None