Integration tests¶
langchain_tests.integration_tests
¶
Integration tests for LangChain components.
BaseStoreAsyncTests
¶
Bases: BaseStandardTests, Generic[V]
Test suite for checking the key-value API of a BaseStore.
This test suite verifies the basic key-value API of a BaseStore.
The test suite is designed for asynchronous key-value stores.
Implementers should subclass this test suite and provide a fixture that returns an empty key-value store for each test.
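For illustration, a minimal subclass might look like the sketch below. It assumes langchain_core's InMemoryStore as the store under test and a pytest setup that supports async fixtures (e.g., pytest-asyncio); the class name and example values are placeholders.

import pytest
from langchain_core.stores import InMemoryStore
from langchain_tests.integration_tests import BaseStoreAsyncTests


class TestInMemoryStoreAsync(BaseStoreAsyncTests[str]):
    @pytest.fixture
    async def kv_store(self) -> InMemoryStore:
        # Return a fresh, EMPTY store for each test.
        return InMemoryStore()

    @pytest.fixture
    def three_values(self) -> tuple[str, str, str]:
        # Example values exercised by the standard tests.
        return "foo", "bar", "baz"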
| METHOD | DESCRIPTION |
|---|---|
| test_no_overrides_DO_NOT_OVERRIDE | Test that no standard tests are overridden. |
| kv_store | Get the key-value store class to test. |
| three_values | Three example values that will be used in the tests. |
| test_three_values | Test that the fixture provides three values. |
| test_kv_store_is_empty | Test that the key-value store is empty. |
| test_set_and_get_values | Test setting and getting values in the key-value store. |
| test_store_still_empty | Test that the store is still empty. |
| test_delete_values | Test deleting values from the key-value store. |
| test_delete_bulk_values | Test that we can delete several values at once. |
| test_delete_missing_keys | Deleting missing keys should not raise an exception. |
| test_set_values_is_idempotent | Setting values by key should be idempotent. |
| test_get_can_get_same_value | Test that the same value can be retrieved multiple times. |
| test_overwrite_values_by_key | Test that we can overwrite values by key using mset. |
| test_yield_keys | Test that we can yield keys from the store. |
test_no_overrides_DO_NOT_OVERRIDE
¶
Test that no standard tests are overridden.
kv_store
abstractmethod
async
¶
kv_store() -> BaseStore[str, V]
Get the key-value store class to test.
The returned key-value store should be EMPTY.
three_values
abstractmethod
¶
three_values() -> tuple[V, V, V]
Three example values that will be used in the tests.
test_three_values
async
¶
test_three_values(three_values: tuple[V, V, V]) -> None
Test that the fixture provides three values.
test_kv_store_is_empty
async
¶
test_kv_store_is_empty(kv_store: BaseStore[str, V]) -> None
Test that the key-value store is empty.
test_set_and_get_values
async
¶
Test setting and getting values in the key-value store.
test_store_still_empty
async
¶
test_store_still_empty(kv_store: BaseStore[str, V]) -> None
Test that the store is still empty.
This test should follow a test that sets values.
This just verifies that the fixture is set up properly to be empty after each test.
test_delete_values
async
¶
Test deleting values from the key-value store.
test_delete_bulk_values
async
¶
Test that we can delete several values at once.
test_delete_missing_keys
async
¶
test_delete_missing_keys(kv_store: BaseStore[str, V]) -> None
Deleting missing keys should not raise an exception.
test_set_values_is_idempotent
async
¶
Setting values by key should be idempotent.
test_get_can_get_same_value
async
¶
Test that the same value can be retrieved multiple times.
test_overwrite_values_by_key
async
¶
Test that we can overwrite values by key using mset.
BaseStoreSyncTests
¶
Bases: BaseStandardTests, Generic[V]
Test suite for checking the key-value API of a BaseStore.
This test suite verifies the basic key-value API of a BaseStore.
The test suite is designed for synchronous key-value stores.
Implementers should subclass this test suite and provide a fixture that returns an empty key-value store for each test.
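For illustration, a minimal subclass might look like the sketch below. It assumes langchain_core's InMemoryStore as the store under test; the class name and example values are placeholders.

import pytest
from langchain_core.stores import InMemoryStore
from langchain_tests.integration_tests import BaseStoreSyncTests


class TestInMemoryStore(BaseStoreSyncTests[str]):
    @pytest.fixture
    def kv_store(self) -> InMemoryStore:
        # Return a fresh, EMPTY store for each test.
        return InMemoryStore()

    @pytest.fixture
    def three_values(self) -> tuple[str, str, str]:
        # Example values exercised by the standard tests.
        return "foo", "bar", "baz"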
| METHOD | DESCRIPTION |
|---|---|
| test_no_overrides_DO_NOT_OVERRIDE | Test that no standard tests are overridden. |
| kv_store | Get the key-value store class to test. |
| three_values | Three example values that will be used in the tests. |
| test_three_values | Test that the fixture provides three values. |
| test_kv_store_is_empty | Test that the key-value store is empty. |
| test_set_and_get_values | Test setting and getting values in the key-value store. |
| test_store_still_empty | Test that the store is still empty. |
| test_delete_values | Test deleting values from the key-value store. |
| test_delete_bulk_values | Test that we can delete several values at once. |
| test_delete_missing_keys | Deleting missing keys should not raise an exception. |
| test_set_values_is_idempotent | Setting values by key should be idempotent. |
| test_get_can_get_same_value | Test that the same value can be retrieved multiple times. |
| test_overwrite_values_by_key | Test that we can overwrite values by key using mset. |
| test_yield_keys | Test that we can yield keys from the store. |
test_no_overrides_DO_NOT_OVERRIDE
¶
Test that no standard tests are overridden.
kv_store
abstractmethod
¶
kv_store() -> BaseStore[str, V]
Get the key-value store class to test.
The returned key-value store should be EMPTY.
three_values
abstractmethod
¶
three_values() -> tuple[V, V, V]
Three example values that will be used in the tests.
test_three_values
¶
test_three_values(three_values: tuple[V, V, V]) -> None
Test that the fixture provides three values.
test_kv_store_is_empty
¶
test_kv_store_is_empty(kv_store: BaseStore[str, V]) -> None
Test that the key-value store is empty.
test_set_and_get_values
¶
Test setting and getting values in the key-value store.
test_store_still_empty
¶
test_store_still_empty(kv_store: BaseStore[str, V]) -> None
Test that the store is still empty.
This test should follow a test that sets values.
This just verifies that the fixture is set up properly to be empty after each test.
test_delete_values
¶
Test deleting values from the key-value store.
test_delete_bulk_values
¶
Test that we can delete several values at once.
test_delete_missing_keys
¶
test_delete_missing_keys(kv_store: BaseStore[str, V]) -> None
Deleting missing keys should not raise an exception.
test_set_values_is_idempotent
¶
Setting values by key should be idempotent.
test_get_can_get_same_value
¶
Test that the same value can be retrieved multiple times.
test_overwrite_values_by_key
¶
Test that we can overwrite values by key using mset.
AsyncCacheTestSuite
¶
Bases: BaseStandardTests
Test suite for checking the BaseCache API of a caching layer for LLMs.
This test suite verifies the basic caching API of a caching layer for LLMs.
The test suite is designed for asynchronous caching layers.
Implementers should subclass this test suite and provide a fixture that returns an empty cache for each test.
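For illustration, a minimal subclass might look like the sketch below. It assumes langchain_core's InMemoryCache as the cache under test and a pytest setup that supports async fixtures; the class name is a placeholder.

import pytest
from langchain_core.caches import InMemoryCache
from langchain_tests.integration_tests import AsyncCacheTestSuite


class TestInMemoryCacheAsync(AsyncCacheTestSuite):
    @pytest.fixture
    async def cache(self) -> InMemoryCache:
        # Return a fresh, EMPTY cache for each test.
        return InMemoryCache()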
| METHOD | DESCRIPTION |
|---|---|
| test_no_overrides_DO_NOT_OVERRIDE | Test that no standard tests are overridden. |
| cache | Get the cache class to test. |
| get_sample_prompt | Return a sample prompt for testing. |
| get_sample_llm_string | Return a sample LLM string for testing. |
| get_sample_generation | Return a sample Generation object for testing. |
| test_cache_is_empty | Test that the cache is empty. |
| test_update_cache | Test updating the cache. |
| test_cache_still_empty | Test that the cache is still empty. |
| test_clear_cache | Test clearing the cache. |
| test_cache_miss | Test cache miss. |
| test_cache_hit | Test cache hit. |
| test_update_cache_with_multiple_generations | Test updating the cache with multiple Generation objects. |
test_no_overrides_DO_NOT_OVERRIDE
¶
Test that no standard tests are overridden.
cache
abstractmethod
async
¶
cache() -> BaseCache
Get the cache class to test.
The returned cache should be EMPTY.
get_sample_generation
¶
Return a sample Generation object for testing.
test_cache_is_empty
async
¶
test_cache_is_empty(cache: BaseCache) -> None
Test that the cache is empty.
SyncCacheTestSuite
¶
Bases: BaseStandardTests
Test suite for checking the BaseCache API of a caching layer for LLMs.
This test suite verifies the basic caching API of a caching layer for LLMs.
The test suite is designed for synchronous caching layers.
Implementers should subclass this test suite and provide a fixture that returns an empty cache for each test.
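For illustration, a minimal subclass might look like the sketch below, assuming langchain_core's InMemoryCache as the cache under test; the class name is a placeholder.

import pytest
from langchain_core.caches import InMemoryCache
from langchain_tests.integration_tests import SyncCacheTestSuite


class TestInMemoryCache(SyncCacheTestSuite):
    @pytest.fixture
    def cache(self) -> InMemoryCache:
        # Return a fresh, EMPTY cache for each test.
        return InMemoryCache()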
| METHOD | DESCRIPTION |
|---|---|
| test_no_overrides_DO_NOT_OVERRIDE | Test that no standard tests are overridden. |
| cache | Get the cache class to test. |
| get_sample_prompt | Return a sample prompt for testing. |
| get_sample_llm_string | Return a sample LLM string for testing. |
| get_sample_generation | Return a sample Generation object for testing. |
| test_cache_is_empty | Test that the cache is empty. |
| test_update_cache | Test updating the cache. |
| test_cache_still_empty | Test that the cache is still empty. |
| test_clear_cache | Test clearing the cache. |
| test_cache_miss | Test cache miss. |
| test_cache_hit | Test cache hit. |
| test_update_cache_with_multiple_generations | Test updating the cache with multiple Generation objects. |
test_no_overrides_DO_NOT_OVERRIDE
¶
Test that no standard tests are overridden.
cache
abstractmethod
¶
cache() -> BaseCache
Get the cache class to test.
The returned cache should be EMPTY.
get_sample_generation
¶
Return a sample Generation object for testing.
ChatModelIntegrationTests
¶
Bases: ChatModelTests
Base class for chat model integration tests.
Test subclasses must implement the chat_model_class and
chat_model_params properties to specify what model to test and its
initialization parameters.
from typing import Type
from langchain_tests.integration_tests import ChatModelIntegrationTests
from my_package.chat_models import MyChatModel
class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def chat_model_class(self) -> Type[MyChatModel]:
        # Return the chat model class to test here
        return MyChatModel

    @property
    def chat_model_params(self) -> dict:
        # Return initialization parameters for the model.
        return {"model": "model-001", "temperature": 0}
Note
API references for individual test methods include troubleshooting tips.
Test subclasses must implement the following two properties:
chat_model_class: The chat model class to test, e.g., ChatParrotLink.
chat_model_params: Initialization parameters for the chat model.
In addition, test subclasses can control what features are tested (such as tool calling or multi-modality) by selectively overriding the following properties:
has_tool_calling
Boolean property indicating whether the chat model supports tool calling.
By default, this is determined by whether the chat model's bind_tools method
is overridden. It typically does not need to be overridden on the test class.
Example override:
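A sketch of such an override (following the same pattern as the other property overrides in this reference):

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return True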
tool_choice_value
Value to use for tool choice when used in tests.
Warning
Deprecated since version 0.3.15.
This property will be removed in version 0.3.20. If a model supports
tool_choice, it should accept tool_choice="any" and
tool_choice=<string name of tool>. If a model does not
support forcing tool calling, override the has_tool_choice property to
return False.
has_tool_choice
Boolean property indicating whether the chat model supports forcing tool
calling via a tool_choice parameter.
By default, this is determined by whether the parameter is included in the
signature for the corresponding bind_tools method.
If True, the minimum requirement for this feature is that
tool_choice="any" will force a tool call, and tool_choice=<tool name>
will force a call to a specific tool.
Example override:
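A sketch of such an override (same pattern as above):

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_choice(self) -> bool:
        return False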
has_structured_output
Boolean property indicating whether the chat model supports structured output.
By default, this is determined by whether the chat model's
with_structured_output method is overridden. If the base implementation is
intended to be used, this method should be overridden.
See: https://docs.langchain.com/oss/python/langchain/structured-output
structured_output_kwargs
Dict property that can be used to specify additional kwargs for
with_structured_output. Useful for testing different models.
supports_json_mode
Boolean property indicating whether the chat model supports JSON mode in
with_structured_output.
See: https://docs.langchain.com/oss/python/langchain/structured-output
supports_image_inputs
Boolean property indicating whether the chat model supports image inputs.
Defaults to False.
If set to True, the chat model will be tested by inputting an
ImageContentBlock with the shape:
{
"type": "image",
"base64": "<base64 image data>",
"mime_type": "image/jpeg", # or appropriate mime-type
}
In addition to OpenAI-style content blocks:
See https://docs.langchain.com/oss/python/langchain/models#multimodal
supports_image_urls
Boolean property indicating whether the chat model supports image inputs from URLs.
Defaults to False.
If set to True, the chat model will be tested using content blocks of the
form
See https://docs.langchain.com/oss/python/langchain/models#multimodal
supports_pdf_inputs
Boolean property indicating whether the chat model supports PDF inputs.
Defaults to False.
If set to True, the chat model will be tested by inputting a
FileContentBlock with the shape:
See https://docs.langchain.com/oss/python/langchain/models#multimodal
supports_audio_inputs
Boolean property indicating whether the chat model supports audio inputs.
Defaults to False.
If set to True, the chat model will be tested by inputting an
AudioContentBlock with the shape:
{
"type": "audio",
"base64": "<base64 audio data>",
"mime_type": "audio/wav", # or appropriate mime-type
}
See https://docs.langchain.com/oss/python/langchain/models#multimodal
supports_video_inputs
Boolean property indicating whether the chat model supports video inputs.
Defaults to False. No current tests are written for this feature.
returns_usage_metadata
Boolean property indicating whether the chat model returns usage metadata on invoke and streaming responses.
Defaults to True.
usage_metadata is an optional dict attribute on AIMessage objects that tracks
input and output tokens.
See more.
Models supporting usage_metadata should also return the name of the
underlying model in the response_metadata of the AIMessage.
supports_anthropic_inputs
Boolean property indicating whether the chat model supports Anthropic-style inputs.
These inputs might feature "tool use" and "tool result" content blocks, e.g.,
[
{"type": "text", "text": "Hmm let me think about that"},
{
"type": "tool_use",
"input": {"fav_color": "green"},
"id": "foo",
"name": "color_picker",
},
]
If set to True, the chat model will be tested using content blocks of this
form.
supports_image_tool_message
Boolean property indicating whether the chat model supports a ToolMessage
that includes image content, e.g. in the OpenAI Chat Completions format:
ToolMessage(
content=[
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
],
tool_call_id="1",
name="random_image",
)
as well as the LangChain ImageContentBlock format:
ToolMessage(
content=[
{
"type": "image",
"base64": image_data,
"mime_type": "image/jpeg",
},
],
tool_call_id="1",
name="random_image",
)
If set to True, the chat model will be tested with message sequences that
include ToolMessage objects of this form.
supports_pdf_tool_message
Boolean property indicating whether the chat model supports a ToolMessage
that includes PDF content using the LangChain FileContentBlock format:
ToolMessage(
content=[
{
"type": "file",
"base64": pdf_data,
"mime_type": "application/pdf",
},
],
tool_call_id="1",
name="random_pdf",
)
If set to True, the chat model will be tested with message sequences that
include ToolMessage objects of this form.
supported_usage_metadata_details
Property controlling what usage metadata details are emitted in both invoke and stream.
usage_metadata is an optional dict attribute on AIMessage objects that tracks
input and output tokens.
See more.
It includes optional keys input_token_details and output_token_details
that can track usage details associated with special types of tokens, such as
cached, audio, or reasoning.
Only needs to be overridden if these details are supplied.
enable_vcr_tests
Property controlling whether to enable select tests that rely on VCR caching of HTTP calls, such as benchmarking tests.
To enable these tests, follow these steps:
1. Override the enable_vcr_tests property to return True:

2. Configure VCR to exclude sensitive headers and other information from cassettes.

Warning
VCR will by default record authentication headers and other sensitive information in cassettes. Read below for how to configure what information is recorded in cassettes.

To add configuration to VCR, add a conftest.py file to the tests/ directory and implement the vcr_config fixture there.

langchain-tests excludes the headers 'authorization', 'x-api-key', and 'api-key' from VCR cassettes. To pick up this configuration, you will need to add conftest.py as shown below. You can also exclude additional headers, override the default exclusions, or apply other customizations to the VCR configuration. See example below:

tests/conftest.py

import pytest
from langchain_tests.conftest import (
    _base_vcr_config as _base_vcr_config,
)

_EXTRA_HEADERS = [
    # Specify additional headers to redact
    ("user-agent", "PLACEHOLDER"),
]


def remove_response_headers(response: dict) -> dict:
    # If desired, remove or modify headers in the response.
    response["headers"] = {}
    return response


@pytest.fixture(scope="session")
def vcr_config(_base_vcr_config: dict) -> dict:  # noqa: F811
    """Extend the default configuration from langchain_tests."""
    config = _base_vcr_config.copy()
    config.setdefault("filter_headers", []).extend(_EXTRA_HEADERS)
    config["before_record_response"] = remove_response_headers
    return config

Compressing cassettes

langchain-tests includes a custom VCR serializer that compresses cassettes using gzip. To use it, register the yaml.gz serializer to your VCR fixture and enable this serializer in the config. See example below:

tests/conftest.py

import pytest
from langchain_tests.conftest import (
    CustomPersister,
    CustomSerializer,
)
from langchain_tests.conftest import (
    _base_vcr_config as _base_vcr_config,
)
from vcr import VCR

_EXTRA_HEADERS = [
    # Specify additional headers to redact
    ("user-agent", "PLACEHOLDER"),
]


def remove_response_headers(response: dict) -> dict:
    # If desired, remove or modify headers in the response.
    response["headers"] = {}
    return response


@pytest.fixture(scope="session")
def vcr_config(_base_vcr_config: dict) -> dict:  # noqa: F811
    """Extend the default configuration from langchain_tests."""
    config = _base_vcr_config.copy()
    config.setdefault("filter_headers", []).extend(_EXTRA_HEADERS)
    config["before_record_response"] = remove_response_headers
    # New: enable serializer and set file extension
    config["serializer"] = "yaml.gz"
    config["path_transformer"] = VCR.ensure_suffix(".yaml.gz")
    return config


def pytest_recording_configure(config: dict, vcr: VCR) -> None:
    vcr.register_persister(CustomPersister())
    vcr.register_serializer("yaml.gz", CustomSerializer())

You can inspect the contents of the compressed cassettes (e.g., to ensure no sensitive information is recorded) by decompressing them, or by using the serializer:

3. Run tests to generate VCR cassettes.

Example:

This will generate a VCR cassette for the test in tests/integration_tests/cassettes/.

Warning
You should inspect the generated cassette to ensure that it does not contain sensitive information. If it does, you can modify the vcr_config fixture to exclude headers or modify the response before it is recorded.

You can then commit the cassette to your repository. Subsequent test runs will use the cassette instead of making HTTP calls.
| METHOD | DESCRIPTION |
|---|---|
| test_no_overrides_DO_NOT_OVERRIDE | Test that no standard tests are overridden. |
| model | Model fixture. |
| my_adder_tool | Adder tool fixture. |
| test_invoke | Test to verify that model.invoke(simple_message) works. |
| test_ainvoke | Test to verify that await model.ainvoke(simple_message) works. |
| test_stream | Test to verify that model.stream(simple_message) works. |
| test_astream | Test to verify that await model.astream(simple_message) works. |
| test_batch | Test to verify that model.batch([messages]) works. |
| test_abatch | Test to verify that await model.abatch([messages]) works. |
| test_conversation | Test to verify that the model can handle multi-turn conversations. |
| test_double_messages_conversation | Test to verify that the model can handle double-message conversations. |
| test_usage_metadata | Test to verify that the model returns correct usage metadata. |
| test_usage_metadata_streaming | Test usage metadata in streaming mode. |
| test_stop_sequence | Test that model does not fail when invoked with the stop parameter. |
| test_tool_calling | Test that the model generates tool calls. |
| test_tool_calling_async | Test that the model generates tool calls. |
| test_bind_runnables_as_tools | Test bind runnables as tools. |
| test_tool_message_histories_string_content | Test that message histories are compatible with string tool contents. |
| test_tool_message_histories_list_content | Test that message histories are compatible with list tool contents. |
| test_tool_choice | Test tool_choice parameter. |
| test_tool_calling_with_no_arguments | Test that the model generates tool calls for tools with no arguments. |
| test_tool_message_error_status | Test that ToolMessage with status="error" can be handled. |
| test_structured_few_shot_examples | Test that the model can process few-shot examples with tool calls. |
| test_structured_output | Test to verify structured output is generated both on invoke and stream. |
| test_structured_output_async | Test to verify structured output is generated both on invoke and stream. |
| test_structured_output_pydantic_2_v1 | Test structured output using pydantic.v1.BaseModel. |
| test_structured_output_optional_param | Test structured output with optional parameters. |
| test_json_mode | Test structured output via JSON mode. |
| test_pdf_inputs | Test that the model can process PDF inputs. |
| test_audio_inputs | Test that the model can process audio inputs. |
| test_image_inputs | Test that the model can process image inputs. |
| test_image_tool_message | Test that the model can process ToolMessage objects with image inputs. |
| test_pdf_tool_message | Test that the model can process ToolMessage objects with PDF inputs. |
| test_anthropic_inputs | Test that model can process Anthropic-style message histories. |
| test_message_with_name | Test that HumanMessage with values for the name field can be handled. |
| test_agent_loop | Test that the model supports a simple ReAct agent loop. |
| test_stream_time | Test that streaming does not introduce undue overhead. |
| invoke_with_audio_input | Invoke with audio input. |
| invoke_with_audio_output | Invoke with audio output. |
| invoke_with_reasoning_output | Invoke with reasoning output. |
| invoke_with_cache_read_input | Invoke with cache read input. |
| invoke_with_cache_creation_input | Invoke with cache creation input. |
| test_unicode_tool_call_integration | Generic integration test for Unicode characters in tool calls. |
chat_model_class
abstractmethod
property
¶
chat_model_class: type[BaseChatModel]
The chat model class to test, e.g., ChatParrotLink.
tool_choice_value
property
¶
tool_choice_value: str | None
Value (None or str) to use for tool choice when used in tests.
has_structured_output
property
¶
has_structured_output: bool
Whether the chat model supports structured output.
structured_output_kwargs
property
¶
structured_output_kwargs: dict
If specified, additional kwargs for with_structured_output.
supports_image_inputs
property
¶
supports_image_inputs: bool
Supports image inputs.
Whether the chat model supports image inputs, defaults to
False.
supports_image_urls
property
¶
supports_image_urls: bool
Supports image inputs from URLs.
Whether the chat model supports image inputs from URLs, defaults to
False.
supports_pdf_inputs
property
¶
supports_pdf_inputs: bool
Whether the chat model supports PDF inputs, defaults to False.
supports_audio_inputs
property
¶
supports_audio_inputs: bool
Supports audio inputs.
Whether the chat model supports audio inputs, defaults to False.
supports_video_inputs
property
¶
supports_video_inputs: bool
Supports video inputs.
Whether the chat model supports video inputs, defaults to False.
No current tests are written for this feature.
returns_usage_metadata
property
¶
returns_usage_metadata: bool
Returns usage metadata.
Whether the chat model returns usage metadata on invoke and streaming responses.
supports_anthropic_inputs
property
¶
supports_anthropic_inputs: bool
Whether the chat model supports Anthropic-style inputs.
supports_image_tool_message
property
¶
supports_image_tool_message: bool
Supports image ToolMessage objects.
Whether the chat model supports ToolMessage objects that include image
content.
supports_pdf_tool_message
property
¶
supports_pdf_tool_message: bool
Supports PDF ToolMessage objects.
Whether the chat model supports ToolMessage objects that include PDF
content.
enable_vcr_tests
property
¶
enable_vcr_tests: bool
Whether to enable VCR tests for the chat model.
Warning
See the enable_vcr_tests dropdown under ChatModelTests above for more
information.
supported_usage_metadata_details
property
¶
supported_usage_metadata_details: dict[
Literal["invoke", "stream"],
list[
Literal[
"audio_input",
"audio_output",
"reasoning_output",
"cache_read_input",
"cache_creation_input",
]
],
]
Supported usage metadata details.
What usage metadata details are emitted in invoke and stream. Only needs to be overridden if these details are returned by the model.
standard_chat_model_params
property
¶
standard_chat_model_params: dict
Standard parameters for chat model.
test_no_overrides_DO_NOT_OVERRIDE
¶
Test that no standard tests are overridden.
test_invoke
¶
test_invoke(model: BaseChatModel) -> None
Test to verify that model.invoke(simple_message) works.
This should pass for all integrations.
test_ainvoke
async
¶
test_ainvoke(model: BaseChatModel) -> None
Test to verify that await model.ainvoke(simple_message) works.
This should pass for all integrations. Passing this test does not indicate a "natively async" implementation, but rather that the model can be used in an async context.
Troubleshooting
First, debug
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_invoke,
because ainvoke has a default implementation that calls invoke in an
async context.
If that test passes but not this one, you should make sure your _agenerate
method does not raise any exceptions, and that it returns a valid
langchain_core.outputs.chat_result.ChatResult like so:
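A minimal sketch of such a return value (the content string is illustrative):

return ChatResult(
    generations=[ChatGeneration(message=AIMessage(content="Output text"))]
)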
test_stream
¶
test_stream(model: BaseChatModel) -> None
Test to verify that model.stream(simple_message) works.
This should pass for all integrations. Passing this test does not indicate a "streaming" implementation, but rather that the model can be used in a streaming context.
Troubleshooting
First, debug
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_invoke,
because stream has a default implementation that calls invoke and
yields the result as a single chunk.
If that test passes but not this one, you should make sure your _stream
method does not raise any exceptions, and that it yields valid
langchain_core.outputs.chat_generation.ChatGenerationChunk
objects like so:
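A minimal sketch of such a chunk (the content string is illustrative):

yield ChatGenerationChunk(
    message=AIMessageChunk(content="chunk text")
)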
test_astream
async
¶
test_astream(model: BaseChatModel) -> None
Test to verify that await model.astream(simple_message) works.
This should pass for all integrations. Passing this test does not indicate a "natively async" or "streaming" implementation, but rather that the model can be used in an async streaming context.
Troubleshooting
First, debug
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_stream
and
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_ainvoke,
because astream has a default implementation that calls _stream in
an async context if it is implemented, or ainvoke and yields the result
as a single chunk if not.
If those tests pass but not this one, you should make sure your _astream
method does not raise any exceptions, and that it yields valid
langchain_core.outputs.chat_generation.ChatGenerationChunk
objects like so:
test_batch
¶
test_batch(model: BaseChatModel) -> None
Test to verify that model.batch([messages]) works.
This should pass for all integrations. Tests the model's ability to process multiple prompts in a single batch.
Troubleshooting
First, debug
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_invoke
because batch has a default implementation that calls invoke for
each message in the batch.
If that test passes but not this one, you should make sure your batch
method does not raise any exceptions, and that it returns a list of valid
langchain_core.messages.AIMessage objects.
test_abatch
async
¶
test_abatch(model: BaseChatModel) -> None
Test to verify that await model.abatch([messages]) works.
This should pass for all integrations. Tests the model's ability to process multiple prompts in a single batch asynchronously.
Troubleshooting
First, debug
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_batch
and
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_ainvoke
because abatch has a default implementation that calls ainvoke for
each message in the batch.
If those tests pass but not this one, you should make sure your abatch
method does not raise any exceptions, and that it returns a list of valid
langchain_core.messages.AIMessage objects.
test_conversation
¶
test_conversation(model: BaseChatModel) -> None
Test to verify that the model can handle multi-turn conversations.
This should pass for all integrations. Tests the model's ability to process a sequence of alternating human and AI messages as context for generating the next response.
Troubleshooting
First, debug
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_invoke
because this test also uses model.invoke.
If that test passes but not this one, you should verify that:
1. Your model correctly processes the message history
2. The model maintains appropriate context from previous messages
3. The response is a valid langchain_core.messages.AIMessage
test_double_messages_conversation
¶
test_double_messages_conversation(model: BaseChatModel) -> None
Test to verify that the model can handle double-message conversations.
This should pass for all integrations. Tests the model's ability to process a sequence of double-system, double-human, and double-ai messages as context for generating the next response.
Troubleshooting
First, debug
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_invoke
because this test also uses model.invoke.
Second, debug
langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_conversation
because this test is the "basic case" without double messages.
If those tests pass but not this one, you should verify that:
1. Your model API can handle double messages, or the integration should
merge messages before sending them to the API.
2. The response is a valid langchain_core.messages.AIMessage
test_usage_metadata
¶
test_usage_metadata(model: BaseChatModel) -> None
Test to verify that the model returns correct usage metadata.
This test is optional and should be skipped if the model does not return usage metadata (see Configuration below).
Behavior changed in 0.3.17
Additionally check for the presence of model_name in the response
metadata, which is needed for usage tracking in callback handlers.
Configuration
By default, this test is run.
To disable this feature, set returns_usage_metadata to False in your
test class:
class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def returns_usage_metadata(self) -> bool:
        return False
This test can also check the format of specific kinds of usage metadata
based on the supported_usage_metadata_details property. This property
should be configured as follows with the types of tokens that the model
supports tracking:
class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supported_usage_metadata_details(self) -> dict:
        return {
            "invoke": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
            "stream": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
        }
Troubleshooting
If this test fails, first verify that your model returns
langchain_core.messages.ai.UsageMetadata dicts
attached to the returned AIMessage object in _generate:
return ChatResult(
    generations=[
        ChatGeneration(
            message=AIMessage(
                content="Output text",
                usage_metadata={
                    "input_tokens": 350,
                    "output_tokens": 240,
                    "total_tokens": 590,
                    "input_token_details": {
                        "audio": 10,
                        "cache_creation": 200,
                        "cache_read": 100,
                    },
                    "output_token_details": {
                        "audio": 10,
                        "reasoning": 200,
                    },
                },
            )
        )
    ]
)
Check also that the response includes a model_name key in its
usage_metadata.
test_usage_metadata_streaming
¶
test_usage_metadata_streaming(model: BaseChatModel) -> None
Test usage metadata in streaming mode.
Test to verify that the model returns correct usage metadata in streaming mode.
Behavior changed in 0.3.17
Additionally check for the presence of model_name in the response
metadata, which is needed for usage tracking in callback handlers.
Configuration
By default, this test is run.
To disable this feature, set returns_usage_metadata to False in your
test class:
class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def returns_usage_metadata(self) -> bool:
        return False
This test can also check the format of specific kinds of usage metadata
based on the supported_usage_metadata_details property. This property
should be configured as follows with the types of tokens that the model
supports tracking:
class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supported_usage_metadata_details(self) -> dict:
        return {
            "invoke": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
            "stream": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
        }
Troubleshooting
If this test fails, first verify that your model yields
langchain_core.messages.ai.UsageMetadata dicts
attached to the returned AIMessage object in _stream
that sum up to the total usage metadata.
Note that input_tokens should only be included on one of the chunks
(typically the first or the last chunk), and the rest should have 0 or
None to avoid counting input tokens multiple times.
output_tokens typically count the number of tokens in each chunk, not
the sum. This test will pass as long as the sum of output_tokens across
all chunks is not 0.
yield ChatResult(
    generations=[
        ChatGeneration(
            message=AIMessage(
                content="Output text",
                usage_metadata={
                    "input_tokens": (
                        num_input_tokens if is_first_chunk else 0
                    ),
                    "output_tokens": 11,
                    "total_tokens": (
                        11 + num_input_tokens if is_first_chunk else 11
                    ),
                    "input_token_details": {
                        "audio": 10,
                        "cache_creation": 200,
                        "cache_read": 100,
                    },
                    "output_token_details": {
                        "audio": 10,
                        "reasoning": 200,
                    },
                },
            )
        )
    ]
)
Check also that the aggregated response includes a model_name key
in its usage_metadata.
test_stop_sequence
¶
test_stop_sequence(model: BaseChatModel) -> None
Test that model does not fail when invoked with the stop parameter.
The stop parameter is a standard parameter for stopping generation at a
certain token.
This should pass for all integrations.
Troubleshooting
If this test fails, check that the function signature for _generate
(as well as _stream and async variants) accepts the stop parameter:
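A sketch of a compatible signature, mirroring BaseChatModel._generate (the exact parameter list may vary by integration):

def _generate(
    self,
    messages: list[BaseMessage],
    stop: list[str] | None = None,
    run_manager: CallbackManagerForLLMRun | None = None,
    **kwargs: Any,
) -> ChatResult:
    ...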
test_tool_calling
¶
test_tool_calling(model: BaseChatModel) -> None
Test that the model generates tool calls.
This test is skipped if the has_tool_calling property on the test class is
set to False.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your
test class:
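For example (a sketch following the same pattern as the has_structured_output override shown later in this reference):

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False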
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly
translate LangChain tool objects into the appropriate schema for your
chat model.
This test may fail if the chat model does not support a tool_choice
parameter. This parameter can be used to force a tool call. If
tool_choice is not supported and the model consistently fails this
test, you can xfail the test:
@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_tool_calling(self, model: BaseChatModel) -> None:
    super().test_tool_calling(model)
Otherwise, in the case that only one tool is bound, ensure that
tool_choice supports the string 'any' to force calling that tool.
test_tool_calling_async
async
¶
test_tool_calling_async(model: BaseChatModel) -> None
Test that the model generates tool calls.
This test is skipped if the has_tool_calling property on the test class is
set to False.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your
test class:
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly
translate LangChain tool objects into the appropriate schema for your
chat model.
This test may fail if the chat model does not support a tool_choice
parameter. This parameter can be used to force a tool call. If
tool_choice is not supported and the model consistently fails this
test, you can xfail the test:
@pytest.mark.xfail(reason=("Does not support tool_choice."))
async def test_tool_calling_async(self, model: BaseChatModel) -> None:
    await super().test_tool_calling_async(model)
Otherwise, in the case that only one tool is bound, ensure that
tool_choice supports the string 'any' to force calling that tool.
test_bind_runnables_as_tools
¶
test_bind_runnables_as_tools(model: BaseChatModel) -> None
Test bind runnables as tools.
Test that the model generates tool calls for tools that are derived from
LangChain runnables. This test is skipped if the has_tool_calling property
on the test class is set to False.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your
test class:
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly
translate LangChain tool objects into the appropriate schema for your
chat model.
This test may fail if the chat model does not support a tool_choice
parameter. This parameter can be used to force a tool call. If
tool_choice is not supported and the model consistently fails this
test, you can xfail the test:
@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_bind_runnables_as_tools(self, model: BaseChatModel) -> None:
    super().test_bind_runnables_as_tools(model)
Otherwise, ensure that the tool_choice_value property is correctly
specified on the test class.
test_tool_message_histories_string_content
¶
test_tool_message_histories_string_content(
model: BaseChatModel, my_adder_tool: BaseTool
) -> None
Test that message histories are compatible with string tool contents.
For instance with OpenAI format contents. If a model passes this test, it should be compatible with messages generated from providers following OpenAI format.
This test should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your
test class:
Troubleshooting
If this test fails, check that:
- The model can correctly handle message histories that include AIMessage objects with "" content.
- The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.
- The model can correctly handle ToolMessage objects with string content and arbitrary string values for tool_call_id.
You can xfail the test if tool calling is implemented but this format
is not supported.
test_tool_message_histories_list_content
¶
test_tool_message_histories_list_content(
model: BaseChatModel, my_adder_tool: BaseTool
) -> None
Test that message histories are compatible with list tool contents.
For instance with Anthropic format contents.
These message histories will include AIMessage objects with "tool use" and
content blocks, e.g.,
[
{"type": "text", "text": "Hmm let me think about that"},
{
"type": "tool_use",
"input": {"fav_color": "green"},
"id": "foo",
"name": "color_picker",
},
]
This test should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your
test class:
Troubleshooting
If this test fails, check that:
- The model can correctly handle message histories that include AIMessage objects with list content.
- The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.
- The model can correctly handle ToolMessage objects with string content and arbitrary string values for tool_call_id.
You can xfail the test if tool calling is implemented but this format
is not supported.
test_tool_choice
¶
test_tool_choice(model: BaseChatModel) -> None
Test tool_choice parameter.
Test that the model can force tool calling via the tool_choice
parameter. This test is skipped if the has_tool_choice property on the
test class is set to False.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_choice to False in your
test class:
Troubleshooting
If this test fails, check whether the test_tool_calling test is passing.
If it is not, refer to the troubleshooting steps in that test first.
If test_tool_calling is passing, check that the underlying model
supports forced tool calling. If it does, bind_tools should accept a
tool_choice parameter that can be used to force a tool call.
It should accept (1) the string 'any' to force calling the bound tool,
and (2) the string name of the tool to force calling that tool.
test_tool_calling_with_no_arguments
¶
test_tool_calling_with_no_arguments(model: BaseChatModel) -> None
Test that the model generates tool calls for tools with no arguments.
This test is skipped if the has_tool_calling property on the test class
is set to False.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your
test class:
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly
translate LangChain tool objects into the appropriate schema for your
chat model. It should correctly handle the case where a tool has no
arguments.
This test may fail if the chat model does not support a tool_choice
parameter. This parameter can be used to force a tool call. It may also
fail if a provider does not support this form of tool. In these cases,
you can xfail the test:
@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_tool_calling_with_no_arguments(self, model: BaseChatModel) -> None:
    super().test_tool_calling_with_no_arguments(model)
Otherwise, in the case that only one tool is bound, ensure that
tool_choice supports the string 'any' to force calling that tool.
test_tool_message_error_status
¶
test_tool_message_error_status(model: BaseChatModel, my_adder_tool: BaseTool) -> None
Test that ToolMessage with status="error" can be handled.
These messages may take the form:
ToolMessage(
"Error: Missing required argument 'b'.",
name="my_adder_tool",
tool_call_id="abc123",
status="error",
)
If possible, the status field should be parsed and passed appropriately
to the model.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your
test class:
Troubleshooting
If this test fails, check that the status field on ToolMessage
objects is either ignored or passed to the model appropriately.
test_structured_few_shot_examples
¶
test_structured_few_shot_examples(
model: BaseChatModel, my_adder_tool: BaseTool
) -> None
Test that the model can process few-shot examples with tool calls.
These are represented as a sequence of messages of the following form:
1. HumanMessage with string content;
2. AIMessage with the tool_calls attribute populated;
3. ToolMessage with string content;
4. AIMessage with string content (an answer);
5. HumanMessage with string content (a follow-up question).
This test should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your
test class:
Troubleshooting
This test uses a utility function in langchain_core to generate a sequence
of messages representing "few-shot" examples.
If this test fails, check that the model can correctly handle this sequence of messages.
You can xfail the test if tool calling is implemented but this format
is not supported.
test_structured_output
¶
test_structured_output(
model: BaseChatModel, schema_type: Literal["pydantic", "typeddict", "json_schema"]
) -> None
Test to verify structured output is generated both on invoke and stream.
This test is optional and should be skipped if the model does not support structured output (see Configuration below).
Configuration
To disable structured output tests, set has_structured_output to False
in your test class:
class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_structured_output(self) -> bool:
        return False
By default, has_structured_output is True if a model overrides the
with_structured_output or bind_tools methods.
Troubleshooting
If this test fails, ensure that the model's bind_tools method
properly handles both JSON Schema and Pydantic V2 models.
langchain_core implements a utility function that will accommodate most formats.
See the example implementation of with_structured_output.
test_structured_output_async
async
¶
test_structured_output_async(
model: BaseChatModel, schema_type: Literal["pydantic", "typeddict", "json_schema"]
) -> None
Test to verify structured output is generated both on invoke and stream.
This test is optional and should be skipped if the model does not support structured output (see Configuration below).
Configuration
To disable structured output tests, set has_structured_output to False
in your test class:
class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_structured_output(self) -> bool:
        return False
By default, has_structured_output is True if a model overrides the
with_structured_output or bind_tools methods.
Troubleshooting
If this test fails, ensure that the model's bind_tools method
properly handles both JSON Schema and Pydantic V2 models.
langchain_core implements a utility function that will accommodate most formats.
See the example implementation of with_structured_output.
test_structured_output_pydantic_2_v1
¶
test_structured_output_pydantic_2_v1(model: BaseChatModel) -> None
Test structured output using pydantic.v1.BaseModel.
Verify we can generate structured output using pydantic.v1.BaseModel.
pydantic.v1.BaseModel is available in the Pydantic 2 package.
This test is optional and should be skipped if the model does not support structured output (see Configuration below).
Configuration
To disable structured output tests, set has_structured_output to False
in your test class:
class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_structured_output(self) -> bool:
        return False
By default, has_structured_output is True if a model overrides the
with_structured_output or bind_tools methods.
Troubleshooting
If this test fails, ensure that the model's bind_tools method
properly handles both JSON Schema and Pydantic V1 models.
langchain_core implements a utility function that will accommodate most formats.
See the example implementation of with_structured_output.
test_structured_output_optional_param
¶
test_structured_output_optional_param(model: BaseChatModel) -> None
Test structured output with optional parameters.
Test to verify we can generate structured output that includes optional parameters.
This test is optional and should be skipped if the model does not support structured output (see Configuration below).
Configuration
To disable structured output tests, set has_structured_output to False
in your test class:
class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_structured_output(self) -> bool:
        return False
By default, has_structured_output is True if a model overrides the
with_structured_output or bind_tools methods.
Troubleshooting
If this test fails, ensure that the model's bind_tools method
properly handles Pydantic V2 models with optional parameters.
langchain_core implements a utility function that will accommodate most formats.
See the example implementation of with_structured_output.
test_json_mode
¶
test_json_mode(model: BaseChatModel) -> None
Test structured output via JSON mode.
This test is optional and should be skipped if the model does not support the JSON mode feature (see Configuration below).
Configuration
To disable this test, set supports_json_mode to False in your
test class:
Troubleshooting
See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output
test_pdf_inputs
¶
test_pdf_inputs(model: BaseChatModel) -> None
Test that the model can process PDF inputs.
This test should be skipped (see Configuration below) if the model does not
support PDF inputs. These will take the shape of the LangChain
FileContentBlock:
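That is, a content block of the following shape (mirroring the FileContentBlock format shown for supports_pdf_tool_message above; the base64 payload is a placeholder):

{
    "type": "file",
    "base64": "<base64 file data>",
    "mime_type": "application/pdf",
}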
Furthermore, for backward-compatibility, we must also support OpenAI chat completions file content blocks:
(
{
"type": "file",
"file": {
"filename": "test_file.pdf",
"file_data": f"data:application/pdf;base64,{pdf_data}",
},
},
)
Configuration
To disable this test, set supports_pdf_inputs to False in your
test class:
Troubleshooting
If this test fails, check that the model can correctly handle messages
with pdf content blocks, including base64-encoded files. Otherwise, set
the supports_pdf_inputs property to False.
test_audio_inputs
¶
test_audio_inputs(model: BaseChatModel) -> None
Test that the model can process audio inputs.
This test should be skipped (see Configuration below) if the model does not
support audio inputs. These will take the shape of the LangChain
AudioContentBlock:
{
"type": "audio",
"base64": "<base64 audio data>",
"mime_type": "audio/wav", # or appropriate mime-type
}
Furthermore, for backward-compatibility, we must also support OpenAI chat completions audio content blocks:
{
"type": "input_audio",
"input_audio": {
"data": "<base64 audio data>",
"format": "wav", # or appropriate format
},
}
Configuration
To disable this test, set supports_audio_inputs to False in your
test class:
Troubleshooting
If this test fails, check that the model can correctly handle messages
with audio content blocks, specifically base64-encoded files. Otherwise,
set the supports_audio_inputs property to False.
test_image_inputs
¶
test_image_inputs(model: BaseChatModel) -> None
Test that the model can process image inputs.
This test should be skipped (see Configuration below) if the model does not
support image inputs. These will take the shape of the LangChain
ImageContentBlock:
{
"type": "image",
"base64": "<base64 image data>",
"mime_type": "image/jpeg", # or appropriate mime-type
}
For backward-compatibility, we must also support OpenAI chat completions image content blocks containing base64-encoded images:
[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
]
See https://python.langchain.com/docs/concepts/multimodality/
If the property supports_image_urls is set to True, the test will also
check that we can process content blocks of the form:
Configuration
To disable this test, set supports_image_inputs to False in your
test class:
Troubleshooting
If this test fails, check that the model can correctly handle messages
with image content blocks, including base64-encoded images. Otherwise, set
the supports_image_inputs property to False.
test_image_tool_message
¶
test_image_tool_message(model: BaseChatModel) -> None
Test that the model can process ToolMessage objects with image inputs.
This test should be skipped if the model does not support messages of the
Chat Completions image_url format:
ToolMessage(
content=[
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
],
tool_call_id="1",
name="random_image",
)
In addition, models should support the standard LangChain ImageContentBlock
format:
ToolMessage(
content=[
{
"type": "image",
"base64": image_data,
"mime_type": "image/jpeg",
},
],
tool_call_id="1",
name="random_image",
)
This test can be skipped by setting the supports_image_tool_message property
to False (see Configuration below).
Configuration
To disable this test, set supports_image_tool_message to False in your
test class:
Troubleshooting
If this test fails, check that the model can correctly handle messages
with image content blocks in ToolMessage objects, including base64-encoded
images. Otherwise, set the supports_image_tool_message property to
False.
test_pdf_tool_message
¶
test_pdf_tool_message(model: BaseChatModel) -> None
Test that the model can process ToolMessage objects with PDF inputs.
This test should be skipped if the model does not support messages of the
LangChain FileContentBlock format:
ToolMessage(
content=[
{
"type": "file",
"base64": pdf_data,
"mime_type": "application/pdf",
},
],
tool_call_id="1",
name="random_pdf",
)
This test can be skipped by setting the supports_pdf_tool_message property
to False (see Configuration below).
Configuration
To disable this test, set supports_pdf_tool_message to False in your
test class:
Troubleshooting
If this test fails, check that the model can correctly handle messages
with PDF content blocks in ToolMessage objects, specifically
base64-encoded PDFs. Otherwise, set the supports_pdf_tool_message property
to False.
test_anthropic_inputs
¶
test_anthropic_inputs(model: BaseChatModel) -> None
Test that model can process Anthropic-style message histories.
These message histories will include AIMessage objects with tool_use
content blocks, e.g.,
AIMessage(
[
{"type": "text", "text": "Hmm let me think about that"},
{
"type": "tool_use",
"input": {"fav_color": "green"},
"id": "foo",
"name": "color_picker",
},
]
)
as well as HumanMessage objects containing tool_result content blocks:
HumanMessage(
[
{
"type": "tool_result",
"tool_use_id": "foo",
"content": [
{
"type": "text",
"text": "green is a great pick! "
"that's my sister's favorite color",
}
],
"is_error": False,
},
{"type": "text", "text": "what's my sister's favorite color"},
]
)
This test should be skipped if the model does not support messages of this form (or doesn't support tool calling generally). See Configuration below.
Configuration
To disable this test, set supports_anthropic_inputs to False in your
test class:
Troubleshooting
If this test fails, check that:
- The model can correctly handle message histories that include message objects with list content.
- The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.
- HumanMessage objects with "tool_result" content blocks are correctly handled.
Otherwise, if Anthropic tool call and result formats are not supported,
set the supports_anthropic_inputs property to False.
test_message_with_name
¶
test_message_with_name(model: BaseChatModel) -> None
Test that HumanMessage with values for the name field can be handled.
These messages may take the form:
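For example (an illustrative message; the name value is a placeholder):

HumanMessage("hello", name="example_user")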
If possible, the name field should be parsed and passed appropriately
to the model. Otherwise, it should be ignored.
Troubleshooting
If this test fails, check that the name field on HumanMessage
objects is either ignored or passed to the model appropriately.
test_agent_loop
¶
test_agent_loop(model: BaseChatModel) -> None
Test that the model supports a simple ReAct agent loop.
This test is skipped if the has_tool_calling property on the test class is
set to False.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your
test class:
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly
translate LangChain tool objects into the appropriate schema for your
chat model.
Check also that all required information (e.g., tool calling identifiers)
from AIMessage objects is propagated correctly to model payloads.
This test may fail if the chat model does not consistently generate tool
calls in response to an appropriate query. In these cases you can xfail
the test:
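For example (mirroring the xfail pattern used for the other tool-calling tests):

@pytest.mark.xfail(reason=("Does not consistently generate tool calls."))
def test_agent_loop(self, model: BaseChatModel) -> None:
    super().test_agent_loop(model)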
test_stream_time
¶
test_stream_time(
model: BaseChatModel, benchmark: BenchmarkFixture, vcr: Cassette
) -> None
Test that streaming does not introduce undue overhead.
See the enable_vcr_tests dropdown under ChatModelIntegrationTests above
for more information.
Configuration
This test can be enabled or disabled using the enable_vcr_tests
property. For example, to disable the test, set this property to False:
Warning
VCR will by default record authentication headers and other sensitive
information in cassettes. See the enable_vcr_tests dropdown under
ChatModelIntegrationTests above for how to configure what information is
recorded in cassettes.
invoke_with_audio_input
¶
Invoke with audio input.
invoke_with_audio_output
¶
Invoke with audio output.
invoke_with_reasoning_output
¶
Invoke with reasoning output.
invoke_with_cache_read_input
¶
Invoke with cache read input.
invoke_with_cache_creation_input
¶
Invoke with cache creation input.
test_unicode_tool_call_integration
¶
test_unicode_tool_call_integration(
model: BaseChatModel,
*,
tool_choice: str | None = None,
force_tool_call: bool = True,
) -> None
Generic integration test for Unicode characters in tool calls.
| PARAMETER | DESCRIPTION |
|---|---|
| model | The chat model to test. TYPE: BaseChatModel |
| tool_choice | Tool choice parameter to pass to the model, if supported. TYPE: str or None. DEFAULT: None |
| force_tool_call | Whether to force a tool call. TYPE: bool. DEFAULT: True |
Tests that Unicode characters in tool call arguments are preserved correctly,
not escaped as \uXXXX sequences.
EmbeddingsIntegrationTests
¶
Bases: EmbeddingsTests
Base class for embeddings integration tests.
Test subclasses must implement the embeddings_class property to specify the
embeddings model to be tested. You can also override the
embedding_model_params property to specify initialization parameters.
from typing import Type
from langchain_tests.integration_tests import EmbeddingsIntegrationTests
from my_package.embeddings import MyEmbeddingsModel
class TestMyEmbeddingsModelIntegration(EmbeddingsIntegrationTests):
@property
def embeddings_class(self) -> Type[MyEmbeddingsModel]:
# Return the embeddings model class to test here
return MyEmbeddingsModel
@property
def embedding_model_params(self) -> dict:
# Return initialization parameters for the model.
return {"model": "model-001"}
Note
API references for individual test methods include troubleshooting tips.
| METHOD | DESCRIPTION |
|---|---|
| test_no_overrides_DO_NOT_OVERRIDE | Test that no standard tests are overridden. |
| model | Embeddings model fixture. |
| test_embed_query | Test embedding a string query. |
| test_embed_documents | Test embedding a list of strings. |
| test_aembed_query | Test embedding a string query async. |
| test_aembed_documents | Test embedding a list of strings async. |
test_no_overrides_DO_NOT_OVERRIDE
¶
Test that no standard tests are overridden.
test_embed_query
¶
test_embed_query(model: Embeddings) -> None
Test embedding a string query.
Troubleshooting
If this test fails, check that:
- The model will generate a list of floats when calling .embed_query on a string.
- The length of the list is consistent across different inputs.
test_embed_documents
¶
test_embed_documents(model: Embeddings) -> None
Test embedding a list of strings.
Troubleshooting
If this test fails, check that:
- The model will generate a list of lists of floats when calling .embed_documents on a list of strings.
- The length of each list is the same.
test_aembed_query
async
¶
test_aembed_query(model: Embeddings) -> None
Test embedding a string query async.
Troubleshooting
If this test fails, check that:
- The model will generate a list of floats when calling .aembed_query on a string.
- The length of the list is consistent across different inputs.
test_aembed_documents
async
¶
test_aembed_documents(model: Embeddings) -> None
Test embedding a list of strings async.
Troubleshooting
If this test fails, check that:
- The model will generate a list of lists of floats when calling .aembed_documents on a list of strings.
- The length of each list is the same.
RetrieversIntegrationTests
¶
Bases: BaseStandardTests
Base class for retrievers integration tests.
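Test subclasses implement the retriever_constructor and retriever_query_example properties described below. A template is sketched here; MyRetriever, its constructor parameters, and the example query are hypothetical and should be adapted to your integration:
from langchain_core.retrievers import BaseRetriever
from langchain_tests.integration_tests import RetrieversIntegrationTests
from my_package.retrievers import MyRetriever


class TestMyRetrieverIntegration(RetrieversIntegrationTests):
    @property
    def retriever_constructor(self) -> type[BaseRetriever]:
        # Return the retriever class to test here
        return MyRetriever

    @property
    def retriever_constructor_params(self) -> dict:
        # Constructor parameters, e.g. the number of results to return
        return {"k": 3}

    @property
    def retriever_query_example(self) -> str:
        # An example query to use when invoking the retriever
        return "example query"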
| METHOD | DESCRIPTION |
|---|---|
| test_no_overrides_DO_NOT_OVERRIDE | Test that no standard tests are overridden. |
| retriever | Return retriever fixture. |
| test_k_constructor_param | Test the number of results constructor parameter. |
| test_invoke_with_k_kwarg | Test the number of results parameter in invoke. |
| test_invoke_returns_documents | Test invoke returns documents. |
| test_ainvoke_returns_documents | Test ainvoke returns documents. |
retriever_constructor
abstractmethod
property
¶
retriever_constructor: type[BaseRetriever]
A BaseRetriever subclass to be tested.
retriever_constructor_params
property
¶
retriever_constructor_params: dict
Returns a dictionary of parameters to pass to the retriever constructor.
retriever_query_example
abstractmethod
property
¶
retriever_query_example: str
Returns a str representing the query of an example retriever call.
num_results_arg_name
property
¶
num_results_arg_name: str
Returns the name of the parameter for the number of results returned.
Usually something like k or top_k.
test_no_overrides_DO_NOT_OVERRIDE
¶
Test that no standard tests are overridden.
test_k_constructor_param
¶
Test the number of results constructor parameter.
Test that the retriever constructor accepts a parameter representing the number of documents to return.
By default, the parameter tested is named k, but it can be overridden by
setting the num_results_arg_name property.
Note
If the retriever doesn't support configuring the number of results returned
via the constructor, this test can be skipped using a pytest xfail on
the test class:
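One way to do this is to override the test and mark it xfail (a sketch; the reason string is illustrative):
import pytest


class TestMyRetrieverIntegration(RetrieversIntegrationTests):
    @pytest.mark.xfail(reason="Number of results is not configurable via the constructor.")
    def test_k_constructor_param(self, *args, **kwargs) -> None:
        super().test_k_constructor_param(*args, **kwargs)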
Troubleshooting
If this test fails, the retriever constructor does not accept a number
of results parameter, or the retriever does not return the correct number
of documents (k, or the parameter named by num_results_arg_name) when it is
set.
For example, a retriever constructed with k=3 (see the sketch below) should
return 3 documents when invoked with a query.
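A sketch of the expected behavior (MyRetriever is hypothetical):
retriever = MyRetriever(k=3)
docs = retriever.invoke("example query")
assert len(docs) == 3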
test_invoke_with_k_kwarg
¶
test_invoke_with_k_kwarg(retriever: BaseRetriever) -> None
Test the number of results parameter in invoke.
Test that the invoke method accepts a parameter representing the number of documents to return.
By default, the parameter is named k, but it can be overridden by
setting the num_results_arg_name property.
Note
If the retriever doesn't support configuring the number of results returned
via the invoke method, this test can be skipped using a pytest xfail on
the test class:
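One way to do this is to override the test and mark it xfail (a sketch; the reason string is illustrative):
import pytest
from langchain_core.retrievers import BaseRetriever


class TestMyRetrieverIntegration(RetrieversIntegrationTests):
    @pytest.mark.xfail(reason="Number of results is not configurable via invoke.")
    def test_invoke_with_k_kwarg(self, retriever: BaseRetriever) -> None:
        super().test_invoke_with_k_kwarg(retriever)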
Troubleshooting
If this test fails, the retriever's invoke method does not accept a number
of results parameter, or the retriever does not return the correct number
of documents (k, or the parameter named by num_results_arg_name) when it is
set.
For example, a retriever invoked with k=3 (see the sketch below) should
return 3 documents.
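A sketch of the expected behavior (MyRetriever is hypothetical):
retriever = MyRetriever()
docs = retriever.invoke("example query", k=3)
assert len(docs) == 3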
test_invoke_returns_documents
¶
test_invoke_returns_documents(retriever: BaseRetriever) -> None
Test invoke returns documents.
If invoked with the example params, the retriever should return a list of Documents.
Troubleshooting
If this test fails, the retriever's invoke method does not return a list of
Document objects. Please confirm that your
_get_relevant_documents method returns a list of Document objects.
test_ainvoke_returns_documents
async
¶
test_ainvoke_returns_documents(retriever: BaseRetriever) -> None
Test ainvoke returns documents.
If ainvoked with the example params, the retriever should return a list of Documents.
See test_invoke_returns_documents for more information on
troubleshooting.
ToolsIntegrationTests
¶
Bases: ToolsTests
Base class for tools integration tests.
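Test subclasses implement the tool_constructor and tool_invoke_params_example properties described below. A template is sketched here; MyTool and the example parameters are hypothetical and should be adapted to your integration:
from langchain_tests.integration_tests import ToolsIntegrationTests
from my_package.tools import MyTool


class TestMyToolIntegration(ToolsIntegrationTests):
    @property
    def tool_constructor(self) -> type[MyTool]:
        # Return the tool class (or an instance) to test here
        return MyTool

    @property
    def tool_constructor_params(self) -> dict:
        # Initialization parameters for the tool, if any
        return {}

    @property
    def tool_invoke_params_example(self) -> dict:
        # The "args" of an example tool call (not a full ToolCall dict)
        return {"query": "hello"}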
| METHOD | DESCRIPTION |
|---|---|
| test_no_overrides_DO_NOT_OVERRIDE | Test that no standard tests are overridden. |
| tool | Tool fixture. |
| test_invoke_matches_output_schema | Test invoke matches output schema. |
| test_async_invoke_matches_output_schema | Test async invoke matches output schema. |
| test_invoke_no_tool_call | Test invoke without ToolCall. |
| test_async_invoke_no_tool_call | Test async invoke without ToolCall. |
tool_constructor
abstractmethod
property
¶
Returns a class or instance of a tool to be tested.
tool_constructor_params
property
¶
tool_constructor_params: dict
Returns a dictionary of parameters to pass to the tool constructor.
tool_invoke_params_example
property
¶
tool_invoke_params_example: dict
Returns a dictionary representing the "args" of an example tool call.
This should NOT be a ToolCall dict - it should not have
{"name", "id", "args"} keys.
test_no_overrides_DO_NOT_OVERRIDE
¶
Test that no standard tests are overridden.
test_invoke_matches_output_schema
¶
test_invoke_matches_output_schema(tool: BaseTool) -> None
Test invoke matches output schema.
If invoked with a ToolCall, the tool should return a valid ToolMessage
content.
If you have followed the custom tool guide,
this test should always pass because ToolCall inputs are handled by the
langchain_core.tools.BaseTool class.
If you have not followed this guide, you should ensure that your tool's
invoke method returns a valid ToolMessage content when it receives
a dict representing a ToolCall as input (as opposed to distinct args).
test_async_invoke_matches_output_schema
async
¶
test_async_invoke_matches_output_schema(tool: BaseTool) -> None
Test async invoke matches output schema.
If ainvoked with a ToolCall, the tool should return a valid ToolMessage
content.
For debugging tips, see test_invoke_matches_output_schema.
test_invoke_no_tool_call
¶
test_invoke_no_tool_call(tool: BaseTool) -> None
Test invoke without ToolCall.
If invoked without a ToolCall, the tool can return anything
but it shouldn't throw an error.
If this test fails, your tool may not be handling the input you defined
in tool_invoke_params_example correctly, and it's throwing an error.
This test doesn't have any checks. It's just to ensure that the tool doesn't throw an error when invoked with a dictionary of kwargs.
VectorStoreIntegrationTests
¶
Bases: BaseStandardTests
Base class for vector store integration tests.
Implementers should subclass this test suite and provide a fixture that returns an empty vector store for each test.
The fixture should use the get_embeddings method to get a pre-defined
embeddings model that should be used for this test suite.
Here is a template:
from typing import Generator
import pytest
from langchain_core.vectorstores import VectorStore
from langchain_parrot_link.vectorstores import ParrotVectorStore
from langchain_tests.integration_tests.vectorstores import VectorStoreIntegrationTests
class TestParrotVectorStore(VectorStoreIntegrationTests):
@pytest.fixture()
def vectorstore(self) -> Generator[VectorStore, None, None]: # type: ignore
"""Get an empty vectorstore."""
store = ParrotVectorStore(self.get_embeddings())
# note: store should be EMPTY at this point
# if you need to delete data, you may do so here
try:
yield store
finally:
# cleanup operations, or deleting data
pass
In the fixture, before the yield we instantiate an empty vector store. In the
finally block, we call whatever logic is necessary to bring the vector store
to a clean state.
from typing import Generator
import pytest
from langchain_core.vectorstores import VectorStore
from langchain_tests.integration_tests.vectorstores import VectorStoreIntegrationTests
from langchain_chroma import Chroma
class TestChromaStandard(VectorStoreIntegrationTests):
@pytest.fixture()
def vectorstore(self) -> Generator[VectorStore, None, None]: # type: ignore
"""Get an empty vectorstore for unit tests."""
store = Chroma(embedding_function=self.get_embeddings())
try:
yield store
finally:
store.delete_collection()
pass
Note that by default we enable both sync and async tests. To disable either,
override the has_sync or has_async properties to False in the
subclass. For example:
class TestParrotVectorStore(VectorStoreIntegrationTests):
@pytest.fixture()
def vectorstore(self) -> Generator[VectorStore, None, None]: # type: ignore
...
@property
def has_async(self) -> bool:
return False
Note
API references for individual test methods include troubleshooting tips.
| METHOD | DESCRIPTION |
|---|---|
| test_no_overrides_DO_NOT_OVERRIDE | Test that no standard tests are overridden. |
| vectorstore | Get the vectorstore class to test. |
| get_embeddings | Get embeddings. |
| test_vectorstore_is_empty | Test that the vectorstore is empty. |
| test_add_documents | Test adding documents into the VectorStore. |
| test_vectorstore_still_empty | Test that the vectorstore is still empty. |
| test_deleting_documents | Test deleting documents from the vectorstore. |
| test_deleting_bulk_documents | Test that we can delete several documents at once. |
| test_delete_missing_content | Deleting missing content should not raise an exception. |
| test_add_documents_with_ids_is_idempotent | Adding by ID should be idempotent. |
| test_add_documents_by_id_with_mutation | Test that we can overwrite by ID using add_documents. |
| test_get_by_ids | Test get by IDs. |
| test_get_by_ids_missing | Test get by IDs with missing IDs. |
| test_add_documents_documents | Run add_documents tests. |
| test_add_documents_with_existing_ids | Test that add_documents with existing IDs is idempotent. |
| test_vectorstore_is_empty_async | Test that the vectorstore is empty. |
| test_add_documents_async | Test adding documents into the VectorStore. |
| test_vectorstore_still_empty_async | Test that the vectorstore is still empty. |
| test_deleting_documents_async | Test deleting documents from the vectorstore. |
| test_deleting_bulk_documents_async | Test that we can delete several documents at once. |
| test_delete_missing_content_async | Deleting missing content should not raise an exception. |
| test_add_documents_with_ids_is_idempotent_async | Adding by ID should be idempotent. |
| test_add_documents_by_id_with_mutation_async | Test that we can overwrite by ID using add_documents. |
| test_get_by_ids_async | Test get by IDs. |
| test_get_by_ids_missing_async | Test get by IDs with missing IDs. |
| test_add_documents_documents_async | Run add_documents tests. |
| test_add_documents_with_existing_ids_async | Test that add_documents with existing IDs is idempotent. |
test_no_overrides_DO_NOT_OVERRIDE
¶
Test that no standard tests are overridden.
vectorstore
abstractmethod
¶
vectorstore() -> VectorStore
Get the vectorstore class to test.
The returned vectorstore should be empty.
get_embeddings
staticmethod
¶
get_embeddings() -> Embeddings
Get embeddings.
A pre-defined embeddings model that should be used for this test.
This currently uses DeterministicFakeEmbedding from langchain-core,
which uses numpy to generate random numbers based on a hash of the input text.
The resulting embeddings are not meaningful, but they are deterministic.
test_vectorstore_is_empty
¶
test_vectorstore_is_empty(vectorstore: VectorStore) -> None
Test that the vectorstore is empty.
Troubleshooting
If this test fails, check that the test class (i.e., subclass of
VectorStoreIntegrationTests) initializes an empty vector store in the
vectorstore fixture.
test_add_documents
¶
test_add_documents(vectorstore: VectorStore) -> None
Test adding documents into the VectorStore.
Troubleshooting
If this test fails, check that:
- We correctly initialize an empty vector store in the vectorstore fixture.
- Calling .similarity_search for the top k similar documents does not threshold by score.
- We do not mutate the original document object when adding it to the vector store (e.g., by adding an ID).
test_vectorstore_still_empty
¶
test_vectorstore_still_empty(vectorstore: VectorStore) -> None
Test that the vectorstore is still empty.
This test should follow a test that adds documents.
This just verifies that the fixture is set up properly to be empty after each test.
Troubleshooting
If this test fails, check that the test class (i.e., subclass of
VectorStoreIntegrationTests) correctly clears the vector store in the
finally block.
test_deleting_documents
¶
test_deleting_documents(vectorstore: VectorStore) -> None
Test deleting documents from the vectorstore.
Troubleshooting
If this test fails, check that add_documents preserves identifiers
passed in through ids, and that delete correctly removes
documents.
test_deleting_bulk_documents
¶
test_deleting_bulk_documents(vectorstore: VectorStore) -> None
Test that we can delete several documents at once.
Troubleshooting
If this test fails, check that delete correctly removes multiple
documents when given a list of IDs.
test_delete_missing_content
¶
test_delete_missing_content(vectorstore: VectorStore) -> None
Deleting missing content should not raise an exception.
Troubleshooting
If this test fails, check that delete does not raise an exception
when deleting IDs that do not exist.
test_add_documents_with_ids_is_idempotent
¶
test_add_documents_with_ids_is_idempotent(vectorstore: VectorStore) -> None
Adding by ID should be idempotent.
Troubleshooting
If this test fails, check that adding the same document twice with the same IDs has the same effect as adding it once (i.e., it does not duplicate the documents).
test_add_documents_by_id_with_mutation
¶
test_add_documents_by_id_with_mutation(vectorstore: VectorStore) -> None
Test that we can overwrite by ID using add_documents.
Troubleshooting
If this test fails, check that when add_documents is called with an
ID that already exists in the vector store, the content is updated
rather than duplicated.
test_get_by_ids
¶
test_get_by_ids(vectorstore: VectorStore) -> None
Test get by IDs.
This test requires that get_by_ids be implemented on the vector store.
Troubleshooting
If this test fails, check that get_by_ids is implemented and returns
documents in the same order as the IDs passed in.
test_get_by_ids_missing
¶
test_get_by_ids_missing(vectorstore: VectorStore) -> None
Test get by IDs with missing IDs.
Troubleshooting
If this test fails, check that get_by_ids is implemented and does not
raise an exception when given IDs that do not exist.
test_add_documents_documents
¶
test_add_documents_documents(vectorstore: VectorStore) -> None
Run add_documents tests.
Troubleshooting
If this test fails, check that get_by_ids is implemented and returns
documents in the same order as the IDs passed in.
Check also that add_documents will correctly generate string IDs if
none are provided.
test_add_documents_with_existing_ids
¶
test_add_documents_with_existing_ids(vectorstore: VectorStore) -> None
Test that add_documents with existing IDs is idempotent.
Troubleshooting
If this test fails, check that get_by_ids is implemented and returns
documents in the same order as the IDs passed in.
This test also verifies that:
- IDs specified in the Document.id field are assigned when adding documents.
- If some documents include IDs and others don't, string IDs are generated for the latter.
test_vectorstore_is_empty_async
async
¶
test_vectorstore_is_empty_async(vectorstore: VectorStore) -> None
Test that the vectorstore is empty.
Troubleshooting
If this test fails, check that the test class (i.e., subclass of
VectorStoreIntegrationTests) initializes an empty vector store in the
vectorstore fixture.
test_add_documents_async
async
¶
test_add_documents_async(vectorstore: VectorStore) -> None
Test adding documents into the VectorStore.
Troubleshooting
If this test fails, check that:
- We correctly initialize an empty vector store in the vectorstore fixture.
- Calling .asimilarity_search for the top k similar documents does not threshold by score.
- We do not mutate the original document object when adding it to the vector store (e.g., by adding an ID).
test_vectorstore_still_empty_async
async
¶
test_vectorstore_still_empty_async(vectorstore: VectorStore) -> None
Test that the vectorstore is still empty.
This test should follow a test that adds documents.
This just verifies that the fixture is set up properly to be empty after each test.
Troubleshooting
If this test fails, check that the test class (i.e., subclass of
VectorStoreIntegrationTests) correctly clears the vector store in the
finally block.
test_deleting_documents_async
async
¶
test_deleting_documents_async(vectorstore: VectorStore) -> None
Test deleting documents from the vectorstore.
Troubleshooting
If this test fails, check that aadd_documents preserves identifiers
passed in through ids, and that delete correctly removes
documents.
test_deleting_bulk_documents_async
async
¶
test_deleting_bulk_documents_async(vectorstore: VectorStore) -> None
Test that we can delete several documents at once.
Troubleshooting
If this test fails, check that adelete correctly removes multiple
documents when given a list of IDs.
test_delete_missing_content_async
async
¶
test_delete_missing_content_async(vectorstore: VectorStore) -> None
Deleting missing content should not raise an exception.
Troubleshooting
If this test fails, check that adelete does not raise an exception
when deleting IDs that do not exist.
test_add_documents_with_ids_is_idempotent_async
async
¶
test_add_documents_with_ids_is_idempotent_async(vectorstore: VectorStore) -> None
Adding by ID should be idempotent.
Troubleshooting
If this test fails, check that adding the same document twice with the same IDs has the same effect as adding it once (i.e., it does not duplicate the documents).
test_add_documents_by_id_with_mutation_async
async
¶
test_add_documents_by_id_with_mutation_async(vectorstore: VectorStore) -> None
Test that we can overwrite by ID using add_documents.
Troubleshooting
If this test fails, check that when aadd_documents is called with an
ID that already exists in the vector store, the content is updated
rather than duplicated.
test_get_by_ids_async
async
¶
test_get_by_ids_async(vectorstore: VectorStore) -> None
Test get by IDs.
This test requires that get_by_ids be implemented on the vector store.
Troubleshooting
If this test fails, check that get_by_ids is implemented and returns
documents in the same order as the IDs passed in.
test_get_by_ids_missing_async
async
¶
test_get_by_ids_missing_async(vectorstore: VectorStore) -> None
Test get by IDs with missing IDs.
Troubleshooting
If this test fails, check that get_by_ids is implemented and does not
raise an exception when given IDs that do not exist.
test_add_documents_documents_async
async
¶
test_add_documents_documents_async(vectorstore: VectorStore) -> None
Run add_documents tests.
Troubleshooting
If this test fails, check that get_by_ids is implemented and returns
documents in the same order as the IDs passed in.
Check also that aadd_documents will correctly generate string IDs if
none are provided.
test_add_documents_with_existing_ids_async
async
¶
test_add_documents_with_existing_ids_async(vectorstore: VectorStore) -> None
Test that add_documents with existing IDs is idempotent.
Troubleshooting
If this test fails, check that get_by_ids is implemented and returns
documents in the same order as the IDs passed in.
This test also verifies that:
- IDs specified in the Document.id field are assigned when adding documents.
- If some documents include IDs and others don't, string IDs are generated for the latter.