LLMs
Features (natively supported)
All LLMs implement the Runnable interface, which comes with default implementations of all methods, ie. ainvoke
, batch
, abatch
, stream
, astream
. This gives all LLMs basic support for async, streaming and batch, which by default is implemented as below:
- Async support defaults to calling the respective sync method in asyncio's default thread pool executor. This lets other async functions in your application make progress while the LLM is being executed, by moving this call to a background thread.
- Streaming support defaults to returning an
Iterator
(orAsyncIterator
in the case of async streaming) of a single value, the final result returned by the underlying LLM provider. This obviously doesn't give you token-by-token streaming, which requires native support from the LLM provider, but ensures your code that expects an iterator of tokens can work for any of our LLM integrations. - Batch support defaults to calling the underlying LLM in parallel for each input by making use of a thread pool executor (in the sync batch case) or
asyncio.gather
(in the async batch case). The concurrency can be controlled with themax_concurrency
key inRunnableConfig
.
Each LLM integration can optionally provide native implementations for async, streaming or batch, which, for providers that support it, can be more efficient. The table shows, for each integration, which features have been implemented with native support.
Model | Invoke | Async invoke | Stream | Async stream | Batch | Async batch | Tool calling |
---|---|---|---|---|---|---|---|
AI21 | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
AlephAlpha | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
AmazonAPIGateway | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Anthropic | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
Anyscale | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
Aphrodite | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
Arcee | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Aviary | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
AzureMLOnlineEndpoint | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
AzureOpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
BaichuanLLM | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Banana | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Baseten | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Beam | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Bedrock | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
CTransformers | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
CTranslate2 | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
CerebriumAI | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
ChatGLM | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Clarifai | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Cohere | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Databricks | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
DeepInfra | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
DeepSparse | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
EdenAI | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Fireworks | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
ForefrontAI | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Friendli | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
GPT4All | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
GigaChat | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
GooglePalm | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ |
GooseAI | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
GradientLLM | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
HuggingFaceEndpoint | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
HuggingFaceHub | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
HuggingFacePipeline | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
HuggingFaceTextGenInference | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
HumanInputLLM | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
IpexLLM | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
JavelinAIGateway | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
KoboldApiLLM | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Konko | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
LlamaCpp | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
Llamafile | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
MLXPipeline | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
ManifestWrapper | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Minimax | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Mlflow | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
MlflowAIGateway | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Modal | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
MosaicML | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
NIBittensorLLM | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
NLPCloud | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Nebula | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
OCIGenAI | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
OCIModelDeploymentTGI | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
OCIModelDeploymentVLLM | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
OctoAIEndpoint | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
Ollama | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
OpaquePrompts | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
OpenLLM | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
OpenLM | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
PaiEasEndpoint | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
Petals | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
PipelineAI | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Predibase | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
PredictionGuard | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
PromptLayerOpenAI | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
QianfanLLMEndpoint | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
RWKV | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Replicate | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
SagemakerEndpoint | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
SelfHostedHuggingFaceLLM | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
SelfHostedPipeline | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
SparkLLM | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
StochasticAI | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
TextGen | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
TitanTakeoff | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
TitanTakeoffPro | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
Together | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Tongyi | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
VLLM | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
VLLMOpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
VertexAI | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
VertexAIModelGarden | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
VolcEngineMaasLLM | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
WatsonxLLM | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ |
WeightOnlyQuantPipeline | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Writer | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Xinference | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
YandexGPT | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Yuan2 | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |