Ollama large language models.
OllamaLLM()

reasoning: bool | None
Controls the reasoning/thinking mode for supported models.
- True: Enables reasoning mode. The model's reasoning process is captured and
  returned separately in the additional_kwargs of the response message, under
  reasoning_content. The main response content does not include the reasoning tags.
- False: Disables reasoning mode. The model does not perform any reasoning, and
  the response does not include any reasoning content.
- None (default): The model uses its default reasoning behavior. If the model
  performs reasoning, the <think> and </think> tags are present directly within
  the main response content.
async_client_kwargs: dict | None
Additional kwargs to merge with client_kwargs before passing to the httpx
AsyncClient. These are kwargs unique to the async client; for shared args use
client_kwargs. For a full list of the params, see the httpx documentation.
sync_client_kwargs: dict | None
Additional kwargs to merge with client_kwargs before passing to the httpx
Client. These are kwargs unique to the sync client; for shared args use
client_kwargs. For a full list of the params, see the httpx documentation.
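For example, reasoning can be enabled at construction time. A minimal sketch,
assuming a reasoning-capable model such as deepseek-r1:8b has already been
pulled locally (ollama pull deepseek-r1:8b):

from langchain_ollama import OllamaLLM

# deepseek-r1:8b is an assumed example of a reasoning-capable model
model = OllamaLLM(model="deepseek-r1:8b", reasoning=True)

# With reasoning=True, the returned text omits the <think>...</think> tags
print(model.invoke("What is 17 * 23?"))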
Setup:
Install langchain-ollama and install/run the Ollama server locally:
pip install -U langchain-ollama
# Visit https://ollama.com/download to download and install Ollama
# (Linux users): start the server with `ollama serve`
Download a model to use:
ollama pull llama3.1
Key init args — generation params:
model: str
Name of the Ollama model to use (e.g. 'llama4').
temperature: float | None
Sampling temperature. Higher values make output more creative.
num_predict: int | None
Maximum number of tokens to predict.
top_k: int | None
Limits the next token selection to the K most probable tokens.
top_p: float | None
Nucleus sampling parameter. Higher values lead to more diverse text.
mirostat: int | None
Enable Mirostat sampling for controlling perplexity.
seed: int | None
Random number seed for generation reproducibility.
Key init args — client params:
base_url: Base URL where the Ollama server is hosted.
keep_alive: How long the model stays loaded into memory.
format: Specify the format of the output.
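A hedged illustration of the client params (the base_url value is the
conventional local Ollama address; the keep_alive and format values are
assumptions for a typical setup):

from langchain_ollama import OllamaLLM

model = OllamaLLM(
    model="llama3.1",
    base_url="http://localhost:11434",  # assumed local Ollama server address
    keep_alive="5m",                    # keep the model loaded for 5 minutes after each call
    format="json",                      # request JSON-formatted output
)

print(model.invoke("Return a JSON object with keys 'city' and 'country' for Paris."))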
See full list of supported init args and their descriptions in the params section.
Instantiate:
from langchain_ollama import OllamaLLM
model = OllamaLLM(
model="llama3.1",
temperature=0.7,
num_predict=256,
# base_url="http://localhost:11434",
# other params...
)
Invoke:
input_text = "The meaning of life is "
response = model.invoke(input_text)
print(response)
"a philosophical question that has been contemplated by humans for
centuries..."
Stream:
for chunk in model.stream(input_text):
print(chunk, end="")
a philosophical question that has been contemplated by humans for
centuries...
Async:
response = await model.ainvoke(input_text)
# stream:
# async for chunk in model.astream(input_text):
# print(chunk, end="")
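A minimal self-contained sketch of the async path, runnable as a script (it
assumes the same locally pulled llama3.1 model as the examples above):

import asyncio

from langchain_ollama import OllamaLLM


async def main() -> None:
    model = OllamaLLM(model="llama3.1")

    # Single async call
    print(await model.ainvoke("The meaning of life is "))

    # Async token-by-token streaming
    async for chunk in model.astream("The meaning of life is "):
        print(chunk, end="")


asyncio.run(main())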