Ollama
Ollama allows you to run open-source large language models, such as Llama 2, locally.
Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile.
It optimizes setup and configuration details, including GPU usage.
For a complete list of supported models and model variants, see the Ollama model library.
Setup
First, follow these instructions to set up and run a local Ollama instance:
- Download and install Ollama on one of the available supported platforms (including Windows Subsystem for Linux).
- Fetch an available LLM model via ollama pull <name-of-model>
  - View the list of available models via the model library
  - e.g., ollama pull llama3
- This will download the default tagged version of the model. Typically, the default points to the latest, smallest-parameter version of the model.
  - On Mac, the models are downloaded to ~/.ollama/models
  - On Linux (or WSL), the models are stored at /usr/share/ollama/.ollama/models
- Specify the exact version of the model of interest with its tag, e.g. ollama pull vicuna:13b-v1.5-16k-q4_0 (view the various tags for the Vicuna model in this instance).
- To view all pulled models, use ollama list (see the Python sketch after this list for the equivalent API call).
- To chat directly with a model from the command line, use ollama run <name-of-model>
- See the Ollama documentation for more commands, or run ollama help in the terminal to list the available commands.
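Once the local instance is running, you can also check which models have been pulled from Python. The sketch below is an illustration rather than part of the official docs: it queries the /api/tags endpoint, which returns the same information as ollama list, and assumes the default localhost:11434 address.
import requests

# Ask the local Ollama server which models have been pulled
# (the programmatic equivalent of `ollama list`).
response = requests.get("http://localhost:11434/api/tags")
response.raise_for_status()

for model in response.json().get("models", []):
    print(model["name"])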
Usage
You can see a full list of supported parameters on the API reference page.
If you are using a LLaMA chat model (e.g., ollama pull llama3), then you can use the ChatOllama interface.
This includes special tokens for the system message and user input.
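As a minimal sketch (assuming the llama3 model has already been pulled), ChatOllama can be invoked with explicit system and user messages:
from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage

# ChatOllama applies the model's chat template, so the special tokens for the
# system message and user input are handled for you.
chat = ChatOllama(model="llama3")
messages = [
    SystemMessage(content="You are a concise assistant."),
    HumanMessage(content="Why is the sky blue?"),
]
print(chat.invoke(messages).content)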
Interacting with Models
Here are a few ways to interact with pulled local models
directly in the terminal:
- All of your local models are automatically served on localhost:11434
- Run ollama run <name-of-model> to start interacting via the command line directly
via an API
Send an application/json request to Ollama's API endpoint to interact.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt":"Why is the sky blue?"
}'
See the Ollama API documentation for all endpoints.
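The same request can be sent from Python. The sketch below assumes the default localhost:11434 address and the llama3 model, and sets "stream": False so the endpoint returns a single JSON object:
import requests

# POST to the /api/generate endpoint; with "stream": False the server returns
# one JSON object whose "response" field holds the full completion.
payload = {"model": "llama3", "prompt": "Why is the sky blue?", "stream": False}
response = requests.post("http://localhost:11434/api/generate", json=payload)
response.raise_for_status()
print(response.json()["response"])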
via LangChain
Here is a typical, basic example of using an Ollama model in your LangChain application.
from langchain_community.llms import Ollama
llm = Ollama(model="llama3")
llm.invoke("Tell me a joke")
"Here's one:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!\n\nHope that made you smile! Do you want to hear another one?"
To stream tokens, use the .stream(...) method:
query = "Tell me a joke"
for chunks in llm.stream(query):
    print(chunks)
S
ure
,
 here
'
s
 one
:
Why
 don
'
t
 scient
ists
 trust
 atoms
?
B
ecause
 they
 make
 up
 everything
!
I
 hope
 you
 found
 that
 am
using
!
 Do
 you
 want
 to
 hear
 another
 one
?
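In asynchronous code, the analogous .astream(...) method can be used. A minimal sketch, assuming the same llama3 model:
import asyncio

from langchain_community.llms import Ollama

llm = Ollama(model="llama3")

async def main():
    # Chunks arrive as they are generated; print them on a single line.
    async for chunk in llm.astream("Tell me a joke"):
        print(chunk, end="", flush=True)
    print()

asyncio.run(main())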
To learn more about the LangChain Expression Language (LCEL) and the available methods on an LLM, see the LCEL Interface page.
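For example, an Ollama model can be composed into a simple LCEL chain with a prompt template and an output parser. This is a sketch; the prompt text and topic are illustrative:
from langchain_community.llms import Ollama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# prompt | llm | parser: the prompt formats the input, the model generates,
# and the parser returns the completion as a plain string.
prompt = PromptTemplate.from_template("Tell me a short joke about {topic}")
llm = Ollama(model="llama3")
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"topic": "bears"}))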
Multi-modal
Ollama has support for multi-modal LLMs, such as bakllava and llava.
ollama pull bakllava
Be sure to update Ollama to the most recent version for multi-modal support.
from langchain_community.llms import Ollama
bakllava = Ollama(model="bakllava")
import base64
from io import BytesIO
from IPython.display import HTML, display
from PIL import Image
def convert_to_base64(pil_image):
    """
    Convert PIL images to Base64 encoded strings
    :param pil_image: PIL image
    :return: Re-sized Base64 string
    """
    buffered = BytesIO()
    pil_image.save(buffered, format="JPEG")  # You can change the format if needed
    img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return img_str
def plt_img_base64(img_base64):
    """
    Display base64 encoded string as image
    :param img_base64:  Base64 string
    """
    # Create an HTML img tag with the base64 string as the source
    image_html = f'<img src="data:image/jpeg;base64,{img_base64}" />'
    # Display the image by rendering the HTML
    display(HTML(image_html))
file_path = "../../../static/img/ollama_example_img.jpg"
pil_image = Image.open(file_path)
image_b64 = convert_to_base64(pil_image)
plt_img_base64(image_b64)
llm_with_image_context = bakllava.bind(images=[image_b64])
llm_with_image_context.invoke("What is the dollar based gross retention rate:")
'90%'