FastEmbed by Qdrant
FastEmbed from Qdrant is a lightweight, fast Python library built for embedding generation.
- Quantized model weights
- ONNX Runtime, no PyTorch dependency
- CPU-first design
- Data parallelism for encoding large datasets
Dependencies
To use FastEmbed with LangChain, install the fastembed Python package.
%pip install --upgrade --quiet fastembed
Imports
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
Instantiating FastEmbed
Parameters
- model_name: str (default: “BAAI/bge-small-en-v1.5”). Name of the FastEmbed model to use. The list of supported models is in the FastEmbed documentation.
- max_length: int (default: 512). The maximum number of tokens. Behavior is unknown for values above 512.
- cache_dir: Optional[str]. The path to the cache directory. Defaults to local_cache in the parent directory.
- threads: Optional[int]. The number of threads a single onnxruntime session can use. Defaults to None.
- doc_embed_type: Literal["default", "passage"] (default: “default”). “default” uses FastEmbed’s default embedding method; “passage” prefixes the text with “passage” before embedding.
embeddings = FastEmbedEmbeddings()
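The call above relies on the defaults. As a minimal sketch of passing the listed parameters explicitly, the snippet below collects them in a dictionary and builds the embedder from it. The names settings and make_embeddings are hypothetical helpers introduced here for illustration, and the values (for example threads=2) are assumptions rather than recommendations.

```python
# Illustrative settings using only the parameters documented above.
# cache_dir is left out so that the library default applies.
settings = {
    "model_name": "BAAI/bge-small-en-v1.5",
    "max_length": 512,            # values above 512 have unknown behavior
    "threads": 2,                 # assumed value; None lets onnxruntime decide
    "doc_embed_type": "passage",  # prefix documents with "passage" before embedding
}


def make_embeddings(**overrides):
    """Hypothetical helper: build a FastEmbedEmbeddings from settings plus overrides."""
    # Imported lazily so the settings can be inspected before fastembed is installed.
    from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

    return FastEmbedEmbeddings(**{**settings, **overrides})
```

Calling make_embeddings() requires the fastembed package to be installed.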
Usage
Generating document embeddings
document_embeddings = embeddings.embed_documents(
["This is a document", "This is some other document"]
)
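embed_documents returns one vector (a list of floats) per input text. A quick sanity check on that shape might look like the sketch below. describe_embeddings is a hypothetical helper, and the short placeholder vectors stand in for real model output so the snippet runs without downloading a model.

```python
def describe_embeddings(texts, vectors):
    """Return (count, dimension) after checking one vector per text and a uniform length."""
    if len(vectors) != len(texts):
        raise ValueError("expected one vector per input text")
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError("vectors have inconsistent lengths")
    return len(vectors), dims.pop()


texts = ["This is a document", "This is some other document"]
# Placeholder vectors standing in for embeddings.embed_documents(texts);
# real vectors from BAAI/bge-small-en-v1.5 have 384 dimensions.
placeholder_vectors = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]

count, dim = describe_embeddings(texts, placeholder_vectors)
print(count, dim)
```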
Generating query embeddings
query_embeddings = embeddings.embed_query("This is a query")
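A common next step is to compare a query embedding against the document embeddings. The sketch below ranks documents by cosine similarity in plain Python. cosine_similarity and rank_documents are hypothetical helpers written for this example, not part of FastEmbed or LangChain, and the three-dimensional vectors are placeholders for the output of embed_query and embed_documents.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Placeholder vectors; in practice use query_embeddings and document_embeddings.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]

ranking = rank_documents(query_vec, doc_vecs)
print(ranking)  # [1, 0]: the second placeholder document is closest to the query
```

For larger collections, a vector store would normally take over this comparison; the loop here is only meant to show what the embeddings are used for.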