ChatOllama
Ollama allows you to run open-source large language models, such as Llama 2, locally.
Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile.
It optimizes setup and configuration details, including GPU usage.
For a complete list of supported models and model variants, see the Ollama model library.
Setup
First, follow these instructions to set up and run a local Ollama instance:
- Download and install Ollama onto the available supported platforms (including Windows Subsystem for Linux).
- Fetch available LLM models via ollama pull <name-of-model>
  - View a list of available models via the model library
  - e.g., ollama pull llama3
- This will download the default tagged version of the model. Typically, the default points to the latest, smallest-sized parameter model.
  - On Mac, the models will be downloaded to ~/.ollama/models
  - On Linux (or WSL), the models will be stored at /usr/share/ollama/.ollama/models
- Specify the exact version of the model of interest as such: ollama pull vicuna:13b-v1.5-16k-q4_0 (view the various tags for the Vicuna model in this instance)
- To view all pulled models, use ollama list (a Python equivalent is sketched just after this list)
- To chat directly with a model from the command line, use ollama run <name-of-model>
- View the Ollama documentation for more commands, or run ollama help in the terminal to see the available commands.
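If you also want to verify the local instance from Python, here is a rough sketch (assuming the default localhost:11434 address and the third-party requests package, which is not part of Ollama or LangChain):

import requests

# The root endpoint returns a short status string while the server is up
print(requests.get("http://localhost:11434/").text)  # e.g. "Ollama is running"

# /api/tags lists the locally pulled models, similar to `ollama list`
for model in requests.get("http://localhost:11434/api/tags").json().get("models", []):
    print(model["name"])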
Usage
You can see a full list of supported parameters on the API reference page.
If you are using a LLaMA chat model (e.g., ollama pull llama3), then you can use the ChatOllama interface.
This includes special tokens for the system message and user input.
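For example, a minimal sketch of passing a system message and user input through ChatOllama (assuming llama3 has already been pulled locally) looks like this:

from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage

# Minimal sketch: a system message plus user input,
# assuming `ollama pull llama3` has been run beforehand
llm = ChatOllama(model="llama3")
response = llm.invoke(
    [
        SystemMessage(content="You are a concise assistant."),
        HumanMessage(content="Name one open-source large language model."),
    ]
)
print(response.content)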
Interacting with Models
Here are a few ways to interact with pulled local models directly in the terminal:
- All of your local models are automatically served on localhost:11434
- Run ollama run <name-of-model> to start interacting via the command line directly
via an API
Send an application/json request to the API endpoint of Ollama to interact.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
See the Ollama API documentation for all endpoints.
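The same request can also be sent from Python; here is a short, non-streaming sketch using the requests package (an assumption of this example, not something Ollama provides):

import requests

payload = {
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": False,  # ask for a single JSON object instead of a stream of chunks
}
result = requests.post("http://localhost:11434/api/generate", json=payload).json()
print(result["response"])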
via LangChain
Below is a typical, basic example of using Ollama via the ChatOllama chat model in your LangChain application.
# LangChain supports many other chat models. Here, we're using Ollama
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
# ChatOllama supports many more optional parameters. Hover over your `ChatOllama(...)`
# class to view the latest available supported parameters
llm = ChatOllama(model="llama3")
prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
# using LangChain Expression Language (LCEL) chain syntax
# learn more about LCEL on
# /docs/expression_language/why
chain = prompt | llm | StrOutputParser()
# for brevity, response is printed in terminal
# You can use LangServe to deploy your application for
# production
print(chain.invoke({"topic": "Space travel"}))
Why did the astronaut break up with his girlfriend?
Because he needed space!
Out of the box, LCEL chains provide extra functionality, such as streaming of responses and async support.
topic = {"topic": "Space travel"}
for chunks in chain.stream(topic):
    print(chunks)
Why
did
the
astronaut
break
up
with
his
girlfriend
before
going
to
Mars
?
Because
he
needed
space
!
For async streaming support, here's an example - all possible via the single chain created above.
topic = {"topic": "Space travel"}
async for chunks in chain.astream(topic):
    print(chunks)
Take a look at the LangChain Expression Language (LCEL) Interface for the other interfaces available once a chain is created.
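For instance, the same chain also exposes a batch interface; a small sketch:

# Small sketch: the batch interface on the chain defined above
topics = [{"topic": "Space travel"}, {"topic": "Submarines"}]
for joke in chain.batch(topics):
    print(joke)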
Building from source
For up-to-date instructions on building from source, check the Ollama documentation on Building from Source.
Extraction
Use the latest version of Ollama and supply the format flag. The format flag will force the model to produce the response in JSON.
Note: You can also try out the experimental OllamaFunctions wrapper for convenience.
from langchain_community.chat_models import ChatOllama
llm = ChatOllama(model="llama3", format="json", temperature=0)
from langchain_core.messages import HumanMessage
messages = [
HumanMessage(
content="What color is the sky at different times of the day? Respond using JSON"
)
]
chat_model_response = llm.invoke(messages)
print(chat_model_response)
content='{ "morning": "blue", "noon": "clear blue", "afternoon": "hazy yellow", "evening": "orange-red" }' id='run-e893700f-e2d0-4df8-ad86-17525dcee318-0'
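Because the message content is a JSON string, one simple way to consume it is with the standard json module (a sketch, assuming the model returned valid JSON, as above):

import json

# Parse the JSON string returned in the message content
sky_colors = json.loads(chat_model_response.content)
print(sky_colors["morning"])  # e.g. "blue"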
import json
from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
json_schema = {
"title": "Person",
"description": "Identifying information about a person.",
"type": "object",
"properties": {
"name": {"title": "Name", "description": "The person's name", "type": "string"},
"age": {"title": "Age", "description": "The person's age", "type": "integer"},
"fav_food": {
"title": "Fav Food",
"description": "The person's favorite food",
"type": "string",
},
},
"required": ["name", "age"],
}
llm = ChatOllama(model="llama2")
messages = [
    HumanMessage(
        content="Please tell me about a person using the following JSON schema:"
    ),
    # Use a ("human", template) tuple so that {dumps} is actually substituted at invoke time
    ("human", "{dumps}"),
    HumanMessage(
        content="Now, considering the schema, tell me about a person named John who is 35 years old and loves pizza."
    ),
]
prompt = ChatPromptTemplate.from_messages(messages)
dumps = json.dumps(json_schema, indent=2)
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"dumps": dumps}))
API Reference:
Name: John
Age: 35
Likes: Pizza
Multi-modal
Ollama has support for multi-modal LLMs, such as bakllava and llava.
Browse the full set of versions for models with tags, such as Llava.
Download the desired LLM via ollama pull bakllava
Be sure to update Ollama so that you have the most recent version with multi-modal support.
Check out the typical example of how to use ChatOllama's multi-modal support below:
pip install --upgrade --quiet pillow
Note: you may need to restart the kernel to use updated packages.
import base64
from io import BytesIO
from IPython.display import HTML, display
from PIL import Image
def convert_to_base64(pil_image):
    """
    Convert PIL images to Base64 encoded strings

    :param pil_image: PIL image
    :return: Base64-encoded string
    """
    buffered = BytesIO()
    pil_image.save(buffered, format="JPEG")  # You can change the format if needed
    img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return img_str
def plt_img_base64(img_base64):
    """
    Display a base64-encoded string as an image

    :param img_base64: Base64 string
    """
    # Create an HTML img tag with the base64 string as the source
    image_html = f'<img src="data:image/jpeg;base64,{img_base64}" />'
    # Display the image by rendering the HTML
    display(HTML(image_html))
file_path = "../../../static/img/ollama_example_img.jpg"
pil_image = Image.open(file_path)
image_b64 = convert_to_base64(pil_image)
plt_img_base64(image_b64)
from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage
llm = ChatOllama(model="bakllava", temperature=0)
def prompt_func(data):
    # Assemble a single HumanMessage whose content combines the image and the text prompt
    text = data["text"]
    image = data["image"]

    image_part = {
        "type": "image_url",
        "image_url": f"data:image/jpeg;base64,{image}",
    }

    text_part = {"type": "text", "text": text}

    content_parts = [image_part, text_part]

    return [HumanMessage(content=content_parts)]
from langchain_core.output_parsers import StrOutputParser
chain = prompt_func | llm | StrOutputParser()
query_chain = chain.invoke(
{"text": "What is the Dollar-based gross retention rate?", "image": image_b64}
)
print(query_chain)
90%