Write Your Own Chat Data With AWEL
In this guide, we will show you how to write your own Chat Data with AWEL, just
like the Chat Data scene in DB-GPT.
This guide is somewhat advanced and may take some time to understand. If you have any questions, please feel free to ask in the DB-GPT issues.
Introduction
Chat Data means chatting with your database: its goal is to let you interact with
the database through natural language. It includes the following steps:
- Build knowledge base: parse the database schema and other information to build a knowledge base.
- Chat with database: chat with the database through natural language.
Chatting with the database itself consists of several steps:
- Retrieve relevant information: retrieve the relevant information from the database according to the user's query.
- Generate response: pass relevant information and user query to the LLM, and then generate a response which includes some SQL and other information.
- Execute SQL: execute the SQL to get the final result.
- Visualize result: visualize the result and return it to the user.
In this guide, we mainly focus on the first three of these steps: retrieving relevant information, generating a response, and executing the SQL.
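To make the flow concrete before bringing in DB-GPT, here is a minimal sketch of the first three steps using only the Python standard library. Everything in it is an assumption made for illustration: the helper names (retrieve_schema, fake_llm, execute_sql) are hypothetical, the "LLM" is a stub that returns a canned answer, and none of this is the DB-GPT API.

```python
import sqlite3


def retrieve_schema(conn):
    """Step 1 (sketch): collect schema text, here every CREATE TABLE statement."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [row[0] for row in rows]


def fake_llm(question, schema_chunks):
    """Step 2 (sketch): hypothetical stand-in for an LLM call.

    A real model would read a prompt built from the question and the
    schema chunks; this stub ignores both and returns a canned answer.
    """
    return {
        "thoughts": "Count the users older than 18.",
        "sql": "SELECT COUNT(*) FROM user WHERE age > 18",
    }


def execute_sql(conn, sql):
    """Step 3 (sketch): run the generated SQL and return all rows."""
    return conn.execute(sql).fetchall()


# A small in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [(1, "Tom", 10), (2, "Jerry", 16), (3, "Jack", 18), (4, "Alice", 20), (5, "Bob", 22)],
)

schema_chunks = retrieve_schema(conn)
answer = fake_llm("How many users are older than 18?", schema_chunks)
result = execute_sql(conn, answer["sql"])
print(result)
```

The rest of this guide replaces each stub with a real component: a vector store for retrieval, an LLM for generation, and a database connector for execution.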
Install Dependencies
First, you need to install the dbgpt library.
pip install "dbgpt[rag, agent, client, simple_framework]>=0.7.0" "dbgpt_ext>=0.7.0" -U
pip install openai
Build Knowledge Base
Prepare Embedding Model
First, you need to prepare an embedding model. You can set one up by following the Prepare Embedding Model guide.
Here we use OpenAI's embedding model.
from dbgpt.rag.embedding import DefaultEmbeddingFactory
embeddings = DefaultEmbeddingFactory.openai()
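As a rough illustration of what the embedding model is for, the sketch below ranks schema descriptions against a user question by vector similarity. It is a toy under stated assumptions: toy_embed is a hypothetical word-count "embedding" rather than a real model, the two schema strings (including the orders table) are made up, and a real setup would use the embeddings object above together with a vector store.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag of word counts."""
    return Counter(text.lower().replace(",", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up schema descriptions, one per table.
chunks = [
    "table user columns id name age",
    "table orders columns id user_id amount",
]

query = "what is the average age of each user name"
query_vec = toy_embed(query)

# Rank the chunks from most to least similar to the question.
ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
best = ranked[0]
print(best)
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval idea is the same: embed the question, embed the stored schema text, and return the closest matches.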
Prepare Database
Here we create a simple SQLite database.
from dbgpt_ext.datasource.rdbms.conn_sqlite import SQLiteTempConnector
db_conn = SQLiteTempConnector.create_temporary_db()
db_conn.create_temp_tables(
    {
        "user": {
            "columns": {
                "id": "INTEGER PRIMARY KEY",
                "name": "TEXT",
                "age": "INTEGER",
            },
            "data": [
                (1, "Tom", 10),
                (2, "Jerry", 16),
                (3, "Jack", 18),
                (4, "Alice", 20),
                (5, "Bob", 22),
            ],
        }
    }
)
Store Database Schema To Vector Store
import asyncio
import shutil
from dbgpt.core.awel import DAG, InputOperator
from dbgpt_ext.rag import ChunkParameters
from dbgpt_ext.rag.operators.db_schema import DBSchemaAssemblerOperator
from dbgpt_ext.storage.vector_store.chroma_store import ChromaVectorConfig, ChromaStore
# Delete the old vector store directory (/tmp/tmp_ltm_vector_store) so each run starts clean
shutil.rmtree("/tmp/tmp_ltm_vector_store", ignore_errors=True)
vector_store = ChromaStore(
    ChromaVectorConfig(
        persist_path="/tmp/tmp_ltm_vector_store",
    ),
    name="ltm_vector_store",
    embedding_fn=embeddings,
)
with DAG("load_schema_dag") as load_schema_dag:
    input_task = InputOperator.dummy_input()
    # Load database schema to vector store
    assembler_task = DBSchemaAssemblerOperator(
        connector=db_conn,
        table_vector_store_connector=vector_store,
        chunk_parameters=ChunkParameters(chunk_strategy="CHUNK_BY_SIZE"),
    )
    input_task >> assembler_task

# Run the DAG and print the schema chunks that were stored
chunks = asyncio.run(assembler_task.call())
print(chunks)