Kinetica SqlAssist LLM Demo
This notebook demonstrates how to use Kinetica to transform natural language into SQL and simplify the process of data retrieval. This demo is intended to show the mechanics of creating and using a chain as opposed to the capabilities of the LLM.
Overviewβ
With the Kinetica LLM workflow you create an LLM context in the database
that provides information needed for infefencing that includes tables,
annotations, rules, and samples. Invoking
ChatKinetica.load_messages_from_context()
will retrieve the context
information from the database so that it can be used to create a chat
prompt.
The chat prompt consists of a SystemMessage
and pairs of
HumanMessage
/AIMessage
that contain the samples which are
question/SQL pairs. You can append pairs samples to this list but it is
not intended to facilitate a typical natural language conversation.
When you create a chain from the chat prompt and execute it, the
Kinetica LLM will generate SQL from the input. Optionally you can use
KineticaSqlOutputParser
to execute the SQL and return the result as a
dataframe.
Currently, 2 LLMβs are supported for SQL generation:
- Kinetica SQL-GPT: This LLM is based on OpenAI ChatGPT API.
- Kinetica SqlAssist: This LLM is purpose built to integrate with the Kinetica database and it can run in a secure customer premise.
For this demo we will be using SqlAssist. See the Kinetica Documentation site for more information.
Prerequisitesβ
To get started you will need a Kinetica DB instance. If you donβt have one you can obtain a free development instance.
You will need to install the following packagesβ¦
# Install Langchain community and core packages
%pip install --upgrade --quiet langchain-core langchain-community
# Install Kineitca DB connection package
%pip install --upgrade --quiet gpudb typeguard
# Install packages needed for this tutorial
%pip install --upgrade --quiet faker
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Database Connectionβ
You must set the database connection in the following environment
variables. If you are using a virtual environment you can set them in
the .env
file of the project: * KINETICA_URL
: Database connection
URL * KINETICA_USER
: Database user * KINETICA_PASSWD
: Secure
password.
If you can create an instance of KineticaChatLLM
then you are
successfully connected.
from langchain_community.chat_models.kinetica import ChatKinetica
kinetica_llm = ChatKinetica()
# Test table we will create
table_name = "demo.user_profiles"
# LLM Context we will create
kinetica_ctx = "demo.test_llm_ctx"
API Reference:
Create test dataβ
Before we can generate SQL we will need to create a Kinetica table and an LLM context that can inference the table.
Create some fake user profilesβ
We will use the faker
package to create a dataframe with 100 fake
profiles.
from typing import Generator
import pandas as pd
from faker import Faker
Faker.seed(5467)
faker = Faker(locale="en-US")
def profile_gen(count: int) -> Generator:
for id in range(0, count):
rec = dict(id=id, **faker.simple_profile())
rec["birthdate"] = pd.Timestamp(rec["birthdate"])
yield rec
load_df = pd.DataFrame.from_records(data=profile_gen(100), index="id")
load_df.head()
username | name | sex | address | birthdate | ||
---|---|---|---|---|---|---|
id | ||||||
0 | eduardo69 | Haley Beck | F | 59836 Carla Causeway Suite 939\nPort Eugene, I... | meltondenise@yahoo.com | 1997-09-09 |
1 | lbarrera | Joshua Stephens | M | 3108 Christina Forges\nPort Timothychester, KY... | erica80@hotmail.com | 1924-05-05 |
2 | bburton | Paula Kaiser | F | Unit 7405 Box 3052\nDPO AE 09858 | timothypotts@gmail.com | 1933-09-06 |
3 | melissa49 | Wendy Reese | F | 6408 Christopher Hill Apt. 459\nNew Benjamin, ... | dadams@gmail.com | 1988-07-28 |
4 | melissacarter | Manuel Rios | M | 2241 Bell Gardens Suite 723\nScottside, CA 38463 | williamayala@gmail.com | 1930-12-19 |
Create a Kinetica table from the Dataframeβ
from gpudb import GPUdbTable
gpudb_table = GPUdbTable.from_df(
load_df,
db=kinetica_llm.kdbc,
table_name=table_name,
clear_table=True,
load_data=True,
)
# See the Kinetica column types
gpudb_table.type_as_df()
name | type | properties | |
---|---|---|---|
0 | username | string | [char32] |
1 | name | string | [char32] |
2 | sex | string | [char1] |
3 | address | string | [char64] |
4 | string | [char32] | |
5 | birthdate | long | [timestamp] |
Create the LLM contextβ
You can create an LLM Context using the Kinetica Workbench UI or you can
manually create it with the CREATE OR REPLACE CONTEXT
syntax.
Here we create a context from the SQL syntax referencing the table we created.
# create an LLM context for the table.
from gpudb import GPUdbException
sql = f"""
CREATE OR REPLACE CONTEXT {kinetica_ctx}
(
TABLE = demo.test_profiles
COMMENT = 'Contains user profiles.'
),
(
SAMPLES = (
'How many male users are there?' =
'select count(1) as num_users
from demo.test_profiles
where sex = ''M'';')
)
"""
def _check_error(response: dict) -> None:
status = response["status_info"]["status"]
if status != "OK":
message = response["status_info"]["message"]
raise GPUdbException("[%s]: %s" % (status, message))
response = kinetica_llm.kdbc.execute_sql(sql)
_check_error(response)
response["status_info"]
{'status': 'OK',
'message': '',
'data_type': 'execute_sql_response',
'response_time': 0.0148}
Use Langchain for inferencingβ
In the example below we will create a chain from the previously created table and LLM context. This chain will generate SQL and return the resulting data as a dataframe.
Load the chat prompt from the Kinetica DBβ
The load_messages_from_context()
function will retrieve a context from
the DB and convert it into a list of chat messages that we use to create
a ChatPromptTemplate
.
from langchain_core.prompts import ChatPromptTemplate
# load the context from the database
ctx_messages = kinetica_llm.load_messages_from_context(kinetica_ctx)
# Add the input prompt. This is where input question will be substituted.
ctx_messages.append(("human", "{input}"))
# Create the prompt template.
prompt_template = ChatPromptTemplate.from_messages(ctx_messages)
prompt_template.pretty_print()
API Reference:
================================ System Message ================================
CREATE TABLE demo.test_profiles AS
(
username VARCHAR (32) NOT NULL,
name VARCHAR (32) NOT NULL,
sex VARCHAR (1) NOT NULL,
address VARCHAR (64) NOT NULL,
mail VARCHAR (32) NOT NULL,
birthdate TIMESTAMP NOT NULL
);
COMMENT ON TABLE demo.test_profiles IS 'Contains user profiles.';
================================ Human Message =================================
How many male users are there?
================================== Ai Message ==================================
select count(1) as num_users
from demo.test_profiles
where sex = 'M';
================================ Human Message =================================
{input}
Create the chainβ
The last element of this chain is KineticaSqlOutputParser
that will
execute the SQL and return a dataframe. This is optional and if we left
it out then only SQL would be returned.
from langchain_community.chat_models.kinetica import (
KineticaSqlOutputParser,
KineticaSqlResponse,
)
chain = prompt_template | kinetica_llm | KineticaSqlOutputParser(kdbc=kinetica_llm.kdbc)
API Reference:
Generate the SQLβ
The chain we created will take a question as input and return a
KineticaSqlResponse
containing the generated SQL and data. The
question must be relevant to the to LLM context we used to create the
prompt.
# Here you must ask a question relevant to the LLM context provided in the prompt template.
response: KineticaSqlResponse = chain.invoke(
{"input": "What are the female users ordered by username?"}
)
print(f"SQL: {response.sql}")
response.dataframe.head()
SQL: SELECT username, name
FROM demo.test_profiles
WHERE sex = 'F'
ORDER BY username;
username | name | |
---|---|---|
0 | alexander40 | Tina Ramirez |
1 | bburton | Paula Kaiser |
2 | brian12 | Stefanie Williams |
3 | brownanna | Jennifer Rowe |
4 | carl19 | Amanda Potts |