Ollama
Ollama allows you to run open-source large language models, such as Llama 2, locally.
Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile.
It optimizes setup and configuration details, including GPU usage.
For a complete list of supported models and model variants, see the Ollama model library.
Setup
First, follow these instructions to set up and run a local Ollama instance:
- Download and install Ollama onto the available supported platforms (including Windows Subsystem for Linux).
- Fetch an available LLM model via `ollama pull <name-of-model>`.
  - View a list of available models via the model library.
  - e.g., `ollama pull llama3`
- This will download the default tagged version of the model. Typically, the default points to the latest, smallest-sized parameter model.
  - On Mac, the models will be downloaded to `~/.ollama/models`.
  - On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`.
- Specify the exact version of the model of interest as in `ollama pull vicuna:13b-v1.5-16k-q4_0` (view the various tags for the Vicuna model in this instance).
- To view all pulled models, use `ollama list`.
- To chat directly with a model from the command line, use `ollama run <name-of-model>`.
- View the Ollama documentation for more commands, or run `ollama help` in the terminal to see the available commands.
Usage
You can see a full list of supported parameters on the API reference page.
If you are using a LLaMA chat model (e.g., `ollama pull llama3`), then you can use the `ChatOllama` interface. This includes special tokens for the system message and user input.
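For example, a minimal sketch of chatting through `ChatOllama` (assuming `llama3` has already been pulled and the local Ollama server is running) could look like this:

```python
from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage

# Assumes `ollama pull llama3` has been run and Ollama is serving locally.
chat = ChatOllama(model="llama3")

messages = [
    SystemMessage(content="You are a concise assistant."),
    HumanMessage(content="Tell me a joke"),
]

# ChatOllama formats these messages with the model's chat template,
# including the special tokens for the system message and user input.
response = chat.invoke(messages)
print(response.content)
```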
Interacting with Models
Here are a few ways to interact with pulled local models:

directly in the terminal
- All of your local models are automatically served on `localhost:11434`.
- Run `ollama run <name-of-model>` to start interacting via the command line directly.
via an API
Send an `application/json` request to the API endpoint of Ollama to interact.
```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
```
See the Ollama API documentation for all endpoints.
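If you prefer to call the endpoint from Python rather than curl, a minimal sketch using the `requests` library (assuming the default `localhost:11434` address and a pulled `llama3` model) might look like:

```python
import requests

# Call Ollama's /api/generate endpoint directly.
# Assumes Ollama is serving on the default localhost:11434 and `llama3` is pulled.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Why is the sky blue?",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(response.json()["response"])
```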
via LangChain
See a typical basic example of using an Ollama model in your LangChain application.
```python
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
llm.invoke("Tell me a joke")
```
"Here's one:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!\n\nHope that made you smile! Do you want to hear another one?"
To stream tokens, use the `.stream(...)` method:

```python
query = "Tell me a joke"

for chunks in llm.stream(query):
    print(chunks)
```
Sure, here's one: Why don't scientists trust atoms? Because they make up everything! I hope you found that amusing! Do you want to hear another one?
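If you are working in an async context, the analogous `.astream(...)` method from the Runnable interface can be used; a minimal sketch:

```python
import asyncio

from langchain_community.llms import Ollama

llm = Ollama(model="llama3")


async def main():
    # Stream tokens asynchronously; each chunk is a piece of the generated text.
    async for chunk in llm.astream("Tell me a joke"):
        print(chunk, end="", flush=True)


asyncio.run(main())
```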
To learn more about the LangChain Expression Language (LCEL) and the available methods on an LLM, see the LCEL Interface documentation.
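For instance, a minimal sketch of composing the local model into an LCEL chain (the prompt wording here is illustrative):

```python
from langchain_community.llms import Ollama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# Compose a prompt, the local model, and an output parser into a single chain.
prompt = PromptTemplate.from_template("Tell me a short joke about {topic}")
llm = Ollama(model="llama3")
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"topic": "bears"}))
```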
Multi-modal
Ollama has support for multi-modal LLMs, such as bakllava and llava.
ollama pull bakllava
Be sure to update Ollama so that you have the most recent version with multi-modal support.
```python
from langchain_community.llms import Ollama

bakllava = Ollama(model="bakllava")
```
```python
import base64
from io import BytesIO

from IPython.display import HTML, display
from PIL import Image


def convert_to_base64(pil_image):
    """
    Convert PIL images to Base64 encoded strings

    :param pil_image: PIL image
    :return: Re-sized Base64 string
    """
    buffered = BytesIO()
    pil_image.save(buffered, format="JPEG")  # You can change the format if needed
    img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return img_str


def plt_img_base64(img_base64):
    """
    Display base64 encoded string as image

    :param img_base64: Base64 string
    """
    # Create an HTML img tag with the base64 string as the source
    image_html = f'<img src="data:image/jpeg;base64,{img_base64}" />'
    # Display the image by rendering the HTML
    display(HTML(image_html))


file_path = "../../../static/img/ollama_example_img.jpg"
pil_image = Image.open(file_path)

image_b64 = convert_to_base64(pil_image)
plt_img_base64(image_b64)
```
```python
# Bind the base64-encoded image so it is sent to the model with the prompt.
llm_with_image_context = bakllava.bind(images=[image_b64])
llm_with_image_context.invoke("What is the dollar based gross retention rate:")
```
'90%'