RAG With AWEL
In this example, we will show how to use the AWEL library to create a RAG program that loads knowledge from a URL and stores it in a vector store.

First, create a Python file named `first_rag_with_awel.py`.
Install Dependencies
First, you need to install the `dbgpt` library with the `rag` extra:

```shell
pip install "dbgpt[rag]>=0.5.2"
```
Prepare Embedding Model
To store the knowledge in a vector store, we need an embedding model. DB-GPT supports many embedding models; here are some of them:
- OpenAI (API)
- text2vec (local)
- Embedding API Server (cluster)
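Whichever backend you choose, the idea is the same: an embedding model maps text to a vector of floats, and the vector store retrieves knowledge by vector similarity. As a rough mental model only (plain Python, not DB-GPT code; the character-frequency `embed` function is a deliberately crude stand-in for a real semantic model):

```python
import math


def embed(text: str) -> list[float]:
    # Toy "embedding": a character-frequency vector over a-z.
    # Real models (OpenAI, text2vec, ...) produce dense semantic vectors.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: the standard similarity measure for embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


# A vector store is, conceptually, a list of (text, vector) pairs
# searched by similarity to the query vector.
docs = ["AWEL is a workflow language", "Chroma is a vector store"]
store = [(d, embed(d)) for d in docs]

query_vec = embed("what is AWEL")
best = max(store, key=lambda item: cosine(query_vec, item[1]))
print(best[0])  # -> AWEL is a workflow language
```

This is only to illustrate why an embedding model is required before anything can be stored or retrieved; DB-GPT handles all of this internally.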
To use OpenAI embeddings, make sure your OpenAI API key is available in the environment:

```python
from dbgpt.rag.embedding import DefaultEmbeddingFactory

embeddings = DefaultEmbeddingFactory.openai()
```
To use a local text2vec model, pass the path of the downloaded model:

```python
from dbgpt.rag.embedding import DefaultEmbeddingFactory

embeddings = DefaultEmbeddingFactory.default("/data/models/text2vec-large-chinese")
```
If you have deployed a DB-GPT cluster and API server, you can connect to the API server to get the embeddings.

```python
from dbgpt.rag.embedding import DefaultEmbeddingFactory

embeddings = DefaultEmbeddingFactory.remote(
    api_url="http://localhost:8100/api/v1/embeddings",
    api_key="{your_api_key}",
    model_name="text2vec",
)
```
Load Knowledge And Store In Vector Store
Then we can create a DAG that loads the knowledge from a URL and stores it in a vector store.
```python
import asyncio
import shutil

from dbgpt.core.awel import DAG
from dbgpt.rag import ChunkParameters
from dbgpt.rag.knowledge import KnowledgeType
from dbgpt.rag.operators import EmbeddingAssemblerOperator, KnowledgeOperator
from dbgpt.storage.vector_store.chroma_store import ChromaStore, ChromaVectorConfig

# Delete the old vector store directory (/tmp/awel_rag_test_vector_store)
shutil.rmtree("/tmp/awel_rag_test_vector_store", ignore_errors=True)

vector_store = ChromaStore(
    vector_store_config=ChromaVectorConfig(
        name="test_vstore",
        persist_path="/tmp/awel_rag_test_vector_store",
        embedding_fn=embeddings,
    )
)

with DAG("load_knowledge_dag") as knowledge_dag:
    # Load knowledge from a URL
    knowledge_task = KnowledgeOperator(knowledge_type=KnowledgeType.URL.name)
    # Split the documents into chunks and store them in the vector store
    assembler_task = EmbeddingAssemblerOperator(
        index_store=vector_store,
        chunk_parameters=ChunkParameters(chunk_strategy="CHUNK_BY_SIZE"),
    )
    knowledge_task >> assembler_task

chunks = asyncio.run(assembler_task.call("https://docs.dbgpt.site/docs/awel/"))
print(f"Chunk length: {len(chunks)}")
```
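The `>>` operator above only declares the edge from `knowledge_task` to `assembler_task`; execution happens later, when a task is `call`ed. The chaining idea can be sketched in plain Python (a toy illustration of operator wiring, not the actual AWEL implementation):

```python
class ToyOperator:
    """Minimal stand-in for an AWEL operator: a function plus its upstream link."""

    def __init__(self, func):
        self.func = func
        self.upstream = None

    def __rshift__(self, downstream: "ToyOperator") -> "ToyOperator":
        # `a >> b` records that b consumes a's output, and returns b
        # so chains like `a >> b >> c` compose left to right.
        downstream.upstream = self
        return downstream

    def call(self, initial_input):
        # Run the upstream chain first, then apply this operator's function.
        value = self.upstream.call(initial_input) if self.upstream else initial_input
        return self.func(value)


load = ToyOperator(lambda url: f"document fetched from {url}")
split = ToyOperator(lambda doc: doc.split())
load >> split  # declare the edge, nothing runs yet

result = split.call("https://docs.dbgpt.site/docs/awel/")
print(result)
```

Calling the last operator in the chain pulls data through the whole pipeline, which is why the example above calls `assembler_task` rather than `knowledge_task`.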
Retrieve Knowledge From Vector Store
Then you can retrieve the knowledge from the vector store.
```python
from dbgpt.core.awel import MapOperator
from dbgpt.rag.operators import EmbeddingRetrieverOperator

with DAG("retriever_dag") as retriever_dag:
    # Retrieve the top 3 most similar chunks from the vector store
    retriever_task = EmbeddingRetrieverOperator(
        top_k=3,
        index_store=vector_store,
    )
    # Join the retrieved chunks into a single string
    content_task = MapOperator(lambda cks: "\n".join(c.content for c in cks))
    retriever_task >> content_task

chunks = asyncio.run(content_task.call("What is the AWEL?"))
print(chunks)
```
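With retrieval working, a RAG program typically stuffs the retrieved text into the LLM prompt alongside the question. A minimal sketch of that prompt assembly (plain Python; the template wording here is our own example, not a DB-GPT API):

```python
def build_rag_prompt(context: str, question: str) -> str:
    # Combine the retrieved chunks (context) and the user question
    # into a single prompt for the LLM.
    return (
        "Answer the question based on the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


retrieved = "AWEL (Agentic Workflow Expression Language) is ..."
prompt = build_rag_prompt(retrieved, "What is the AWEL?")
print(prompt)
```

The next step, then, is preparing the LLM that will receive such a prompt.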
Prepare LLM
To build a RAG program, we need an LLM. Here are some of the LLMs that DB-GPT supports:
- OpenAI (API)
- Yi (API)
- API Server (cluster)
First, you should install the `openai` library:

```shell
pip install openai
```

Then set your API key in the environment variable `OPENAI_API_KEY`.

```python
from dbgpt.model.proxy import OpenAILLMClient

llm_client = OpenAILLMClient()
```
You should have a Yi account and get the API key from the Yi official website.

First, you should install the `openai` library:

```shell
pip install openai
```