DeepInfra
DeepInfra is a serverless inference-as-a-service platform that provides access to a variety of LLMs and embedding models. This notebook goes over how to use LangChain with DeepInfra for chat models.
Set the Environment API Key
Make sure to get your API key from DeepInfra. You have to log in and get a new token. You are given 1 hour of free serverless GPU compute to test different models (see here). You can print your token with deepctl auth token.
# get a new token: https://deepinfra.com/login?from=%2Fdash
from getpass import getpass
DEEPINFRA_API_TOKEN = getpass()
import os
# or pass deepinfra_api_token parameter to the ChatDeepInfra constructor
os.environ["DEEPINFRA_API_TOKEN"] = DEEPINFRA_API_TOKEN
from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage
chat = ChatDeepInfra(model="meta-llama/Llama-2-7b-chat-hf")
messages = [
    HumanMessage(
        content="Translate this sentence from English to French. I love programming."
    )
]
chat(messages)
AIMessage(content=" J'aime la programmation.", additional_kwargs={}, example=False)
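Instead of setting the environment variable, you can pass the token straight to the constructor via the deepinfra_api_token parameter, as noted in the comment above. A minimal sketch; the token string is a placeholder:

from langchain_community.chat_models import ChatDeepInfra

# Hypothetical placeholder value; substitute your real DeepInfra token
chat = ChatDeepInfra(
    model="meta-llama/Llama-2-7b-chat-hf",
    deepinfra_api_token="YOUR_DEEPINFRA_TOKEN",
)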
ChatDeepInfra also supports async and streaming functionality:
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
await chat.agenerate([messages])
LLMResult(generations=[[ChatGeneration(text=" J'aime programmer.", generation_info=None, message=AIMessage(content=" J'aime programmer.", additional_kwargs={}, example=False))]], llm_output={}, run=[RunInfo(run_id=UUID('8cc8fb68-1c35-439c-96a0-695036a93652'))])
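Top-level await as used above only works in notebooks and other async contexts; in a plain Python script you need an event loop. A minimal sketch using asyncio, assuming the chat and messages objects defined earlier:

import asyncio

# Drive the async generation from synchronous code
result = asyncio.run(chat.agenerate([messages]))
print(result.generations[0][0].text)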
chat = ChatDeepInfra(
    streaming=True,
    verbose=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)
chat(messages)
J'aime la programmation.
AIMessage(content=" J'aime la programmation.", additional_kwargs={}, example=False)
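Chat models in LangChain also expose a standard stream method that yields message chunks as they are generated; a minimal sketch, assuming ChatDeepInfra inherits this interface from BaseChatModel and reusing the messages list from above:

from langchain_community.chat_models import ChatDeepInfra

# Fresh instance without callbacks; print each chunk as it arrives
chat = ChatDeepInfra(model="meta-llama/Llama-2-7b-chat-hf")
for chunk in chat.stream(messages):
    print(chunk.content, end="", flush=True)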