# Write Your Own Chat Data With AWEL

In this guide, we will show you how to write your own Chat Data application with AWEL, just like the Chat Data scene in DB-GPT.

This guide is a little advanced and may take some time to understand. If you have any questions, please feel free to ask in the DB-GPT issues.
## Introduction

Chat Data means chatting with your database: the goal is to interact with the database through natural language. It involves the following steps:
- Build knowledge base: parse the database schema and other information to build a knowledge base.
- Chat with database: chat with the database through natural language.
Chatting with the database itself consists of several steps:
- Retrieve relevant information: retrieve the relevant information from the database according to the user's query.
- Generate response: pass relevant information and user query to the LLM, and then generate a response which includes some SQL and other information.
- Execute SQL: execute the SQL to get the final result.
- Visualize result: visualize the result and return it to the user.
In this guide, we mainly focus on the first three of these steps: retrieving relevant information, generating a response, and executing the SQL.
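Before diving into DB-GPT's operators, the whole loop can be sketched framework-free. This is only an illustration of the three steps: `fake_llm` is a hypothetical stand-in for a real model (not a DB-GPT API), and the table mirrors the sample data used later in this guide.

```python
import sqlite3

# A framework-free sketch of the three steps we focus on: retrieve schema,
# generate SQL with an LLM, and execute the SQL.
def fake_llm(question: str, schema: str) -> str:
    # A real LLM would generate SQL from the question and the schema text;
    # here the answer is hard-coded for illustration.
    return "SELECT name, age FROM user WHERE age < 18"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO user VALUES (?, ?, ?)", [(1, "Tom", 10), (2, "Jerry", 16)])

# Step 1: retrieve relevant schema information
schema = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'user'").fetchone()[0]
# Step 2: generate a response (SQL) from the question and the schema
sql = fake_llm("Query the name and age of users younger than 18", schema)
# Step 3: execute the SQL to get the final result
result = conn.execute(sql).fetchall()
print(result)  # [('Tom', 10), ('Jerry', 16)]
```

In DB-GPT, each of these steps becomes an AWEL operator in a DAG, as shown in the rest of this guide.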
## Install Dependencies

First, you need to install the `dbgpt` library.

```shell
pip install "dbgpt[rag]>=0.5.3rc0" -U
```
## Build Knowledge Base

### Prepare Embedding Model

First, you need to prepare the embedding model; you can provide one according to the Prepare Embedding Model section. Here we use OpenAI's embedding model.

```python
from dbgpt.rag.embedding import DefaultEmbeddingFactory

embeddings = DefaultEmbeddingFactory.openai()
```
### Prepare Database

Here we create a simple SQLite database.

```python
from dbgpt.datasource.rdbms.conn_sqlite import SQLiteTempConnector

db_conn = SQLiteTempConnector.create_temporary_db()
db_conn.create_temp_tables(
    {
        "user": {
            "columns": {
                "id": "INTEGER PRIMARY KEY",
                "name": "TEXT",
                "age": "INTEGER",
            },
            "data": [
                (1, "Tom", 10),
                (2, "Jerry", 16),
                (3, "Jack", 18),
                (4, "Alice", 20),
                (5, "Bob", 22),
            ],
        }
    }
)
```
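If you want to sanity-check what this sample table looks like, the same data can be reproduced with Python's built-in `sqlite3` module. This block is only an illustration outside DB-GPT; `SQLiteTempConnector` manages its own temporary database.

```python
import sqlite3

# Reproduce the sample table with the standard library to inspect the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [(1, "Tom", 10), (2, "Jerry", 16), (3, "Jack", 18), (4, "Alice", 20), (5, "Bob", 22)],
)
count, avg_age = conn.execute("SELECT COUNT(*), AVG(age) FROM user").fetchone()
print(count, avg_age)  # 5 17.2
```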
### Store Database Schema To Vector Store

```python
import asyncio
import shutil

from dbgpt.core.awel import DAG, InputOperator
from dbgpt.rag import ChunkParameters
from dbgpt.rag.operators import DBSchemaAssemblerOperator
from dbgpt.storage.vector_store.chroma_store import ChromaVectorConfig, ChromaStore

# Delete the old vector store directory (/tmp/awel_with_data_vector_store)
shutil.rmtree("/tmp/awel_with_data_vector_store", ignore_errors=True)

vector_store = ChromaStore(
    ChromaVectorConfig(
        persist_path="/tmp/awel_with_data_vector_store",
        name="ltm_vector_store",
        embedding_fn=embeddings,
    )
)

with DAG("load_schema_dag") as load_schema_dag:
    input_task = InputOperator.dummy_input()
    # Load the database schema into the vector store
    assembler_task = DBSchemaAssemblerOperator(
        connector=db_conn,
        index_store=vector_store,
        chunk_parameters=ChunkParameters(chunk_strategy="CHUNK_BY_SIZE"),
    )
    input_task >> assembler_task

chunks = asyncio.run(assembler_task.call())
print(chunks)
```
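The `CHUNK_BY_SIZE` strategy splits the assembled schema text into pieces of bounded size so each piece can be embedded separately. A toy sketch of the idea (not DB-GPT's actual chunker, which is more sophisticated):

```python
# Toy illustration of a "chunk by size" strategy: split a schema text into
# fixed-size pieces. Each piece would then be embedded and stored.
schema_text = "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
chunk_size = 40
pieces = [schema_text[i : i + chunk_size] for i in range(0, len(schema_text), chunk_size)]
print(pieces)
```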
### Retrieve Database Schema From Vector Store

```python
from dbgpt.core.awel import InputSource
from dbgpt.rag.operators import DBSchemaRetrieverOperator

with DAG("retrieve_schema_dag") as retrieve_schema_dag:
    input_task = InputOperator(input_source=InputSource.from_callable())
    # Retrieve the database schema from the vector store
    retriever_task = DBSchemaRetrieverOperator(
        top_k=1,
        index_store=vector_store,
    )
    input_task >> retriever_task

chunks = asyncio.run(retriever_task.call("Query the name and age of users younger than 18 years old"))
print("Retrieved schema:\n", chunks)
```
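Conceptually, retrieval embeds the query and ranks the stored schema chunks by vector similarity, returning the top-k matches. A framework-free sketch with toy bag-of-words vectors (the real operator uses the embedding model configured above, and the chunk strings here are made up for illustration):

```python
import math
from collections import Counter

# Toy "embedding": bag-of-words term counts.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

# Cosine similarity between two sparse term-count vectors.
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "table user columns id name age",
    "table orders columns id user_id amount",
]
query = "name and age of users"
top_k = 1
ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
print(ranked[:top_k])  # the `user` table chunk ranks first
```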
## Chat With Database

### Prepare LLM

We use an LLM to generate SQL queries. Here we use OpenAI's model; you can replace it with other models according to the Prepare LLM section.

```python
from dbgpt.model.proxy import OpenAILLMClient

llm_client = OpenAILLMClient()
```

Note that `OpenAILLMClient` reads your OpenAI credentials from the environment (e.g. `OPENAI_API_KEY`).