llamafile
Let’s load the llamafile Embeddings class.
Setup
First, the are 3 setup steps:
- Download a llamafile. In this notebook, we use
TinyLlama-1.1B-Chat-v1.0.Q5_K_M
but there are many others available on HuggingFace. - Make the llamafile executable.
- Start the llamafile in server mode.
You can run the following bash script to do all this:
%%bash
# llamafile setup
# Step 1: Download a llamafile. The download may take several minutes.
wget -nv -nc https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile
# Step 2: Make the llamafile executable. Note: if you're on Windows, just append '.exe' to the filename.
chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile
# Step 3: Start llamafile server in background. All the server logs will be written to 'tinyllama.log'.
# Alternatively, you can just open a separate terminal outside this notebook and run:
# ./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding
./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding > tinyllama.log 2>&1 &
pid=$!
echo "${pid}" > .llamafile_pid # write the process pid to a file so we can terminate the server later
Embedding texts using LlamafileEmbeddings
Now, we can use the LlamafileEmbeddings
class to interact with the
llamafile server that’s currently serving our TinyLlama model at
http://localhost:8080.
from langchain_community.embeddings import LlamafileEmbeddings
API Reference:
embedder = LlamafileEmbeddings()
text = "This is a test document."
To generate embeddings, you can either query an invidivual text, or you can query a list of texts.
query_result = embedder.embed_query(text)
query_result[:5]
doc_result = embedder.embed_documents([text])
doc_result[0][:5]
%%bash
# cleanup: kill the llamafile server process
kill $(cat .llamafile_pid)
rm .llamafile_pid