Graph RAG User Manual
In this example, we show how to use the Graph RAG framework in DB-GPT. Implementing RAG with a graph database can, to some extent, alleviate the uncertainty and interpretability issues of vector database retrieval.
You can refer to the Python example file DB-GPT/examples/rag/graph_rag_example.py in the source code. This example demonstrates how to load knowledge from a document and store it in a graph store, then recall knowledge relevant to your question by searching for triplets in the graph store.
Install Dependencies
First, you need to install the dbgpt library.

```bash
pip install "dbgpt[graph_rag]>=0.6.1"
```
Prepare Graph Database
To store knowledge in a graph, we need a graph database. TuGraph is the first graph database supported by DB-GPT. Visit the GitHub repository of TuGraph to view the Quick Start document, then follow the instructions to pull the TuGraph docker image (latest, version >= 4.3.2) and launch it.
```bash
docker pull tugraph/tugraph-runtime-centos7:latest
docker run -d -p 7070:7070 -p 7687:7687 -p 9090:9090 --name tugraph_demo \
  tugraph/tugraph-runtime-centos7:latest lgraph_server -d run --enable_plugin true
```
The default port for the bolt protocol is 7687.
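Optionally, you can verify that the bolt endpoint is reachable before wiring it into DB-GPT. Below is a minimal sketch, assuming the neo4j Python driver (TuGraph's bolt endpoint is compatible with it) and TuGraph's default credentials, which also appear in the configuration section later.

```python
# Minimal bolt connectivity check; assumes `pip install neo4j`.
# TuGraph's bolt endpoint is compatible with the neo4j driver.
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://127.0.0.1:7687",
    auth=("admin", "73@TuGraph"),  # TuGraph's default credentials
)
# "default" is TuGraph's default graph name
with driver.session(database="default") as session:
    print(session.run("RETURN 1").single()[0])  # prints 1 if the server is reachable
driver.close()
```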
Prepare LLM
To build a Graph RAG program, we need an LLM. Here are some of the LLMs that DB-GPT supports:
- OpenAI (API)
- Yi (API)
- API Server (cluster)
OpenAI
First, you should install the openai library.

```bash
pip install openai
```

Then set your API key in the environment variable OPENAI_API_KEY.
```python
from dbgpt.model.proxy import OpenAILLMClient

llm_client = OpenAILLMClient()
```
Yi
You should have a Yi account and get the API key from the Yi official website. First, you should install the openai library.

```bash
pip install openai
```

Then set your API key in the environment variable YI_API_KEY.
```python
from dbgpt.model.proxy import YiLLMClient

llm_client = YiLLMClient()
```
API Server
If you have deployed a DB-GPT cluster and the API server, you can connect to the API server to use the LLM. The API is compatible with the OpenAI API, so you can use the OpenAILLMClient to connect to the API server.
First, you should install the openai library.

```bash
pip install openai
```

```python
from dbgpt.model.proxy import OpenAILLMClient

llm_client = OpenAILLMClient(
    api_base="http://localhost:8100/api/v1/", api_key="{your_api_key}"
)
```
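Whichever client you use, it can help to verify the connection with a one-off request before building the graph. Here is a minimal sketch using the same request pattern as the retrieval example later in this document; the prompt and the model name "gpt-4o-mini" are illustrative placeholders.

```python
import asyncio

from dbgpt.core import HumanPromptTemplate, ModelMessage, ModelRequest

async def ping_llm():
    """Send a trivial request to confirm the LLM client works."""
    template = HumanPromptTemplate.from_template("Say hello in one word.")
    messages = ModelMessage.from_base_messages(template.format_messages())
    # "gpt-4o-mini" is a placeholder; use any model your endpoint serves.
    request = ModelRequest(model="gpt-4o-mini", messages=messages)
    response = await llm_client.generate(request=request)
    print(response.text)

asyncio.run(ping_llm())
```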
TuGraph Configuration
Set the variables below in the .env file so that DB-GPT knows how to connect to TuGraph.
```bash
GRAPH_STORE_TYPE=TuGraph
TUGRAPH_HOST=127.0.0.1
TUGRAPH_PORT=7687
TUGRAPH_USERNAME=admin
TUGRAPH_PASSWORD=73@TuGraph
GRAPH_COMMUNITY_SUMMARY_ENABLED=True  # enable the graph community summary
TRIPLET_GRAPH_ENABLED=True  # enable the graph search for triplets
DOCUMENT_GRAPH_ENABLED=True  # enable the graph search for documents and chunks
KNOWLEDGE_GRAPH_CHUNK_SEARCH_TOP_SIZE=5  # the number of chunks returned by a graph search
```
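If you run the example as a standalone script rather than through the DB-GPT server, make sure these variables actually reach the process environment. A minimal sketch, assuming the python-dotenv package:

```python
# Load .env into the process environment before creating the graph store.
# python-dotenv is an assumption here: pip install python-dotenv
from dotenv import load_dotenv

load_dotenv()  # exposes GRAPH_STORE_TYPE, TUGRAPH_HOST, and friends
```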
Load into Knowledge Graph
When using a graph database as the underlying knowledge storage platform, it is necessary to build a knowledge graph to facilitate the archiving and retrieval of documents. DB-GPT leverages the capabilities of large language models to implement an integrated knowledge graph, while still maintaining the flexibility to freely connect to other knowledge graph systems and graph database systems.
We created a knowledge graph with graph community summaries based on CommunitySummaryKnowledgeGraph.
```python
from dbgpt.model.proxy.llms.chatgpt import OpenAILLMClient
from dbgpt.rag.embedding import DefaultEmbeddingFactory
from dbgpt.storage.knowledge_graph.community_summary import (
    CommunitySummaryKnowledgeGraph,
    CommunitySummaryKnowledgeGraphConfig,
)

llm_client = OpenAILLMClient()
model_name = "gpt-4o-mini"

def __create_community_kg_connector():
    """Create community knowledge graph connector."""
    return CommunitySummaryKnowledgeGraph(
        config=CommunitySummaryKnowledgeGraphConfig(
            name="community_graph_rag_test",
            embedding_fn=DefaultEmbeddingFactory.openai(),
            llm_client=llm_client,
            model_name=model_name,
            graph_store_type="TuGraphGraph",
        ),
    )
```
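The source example also includes a plain triplet-based connector without community summaries; a sketch of that variant is below, assuming the BuiltinKnowledgeGraph class from the same storage package.

```python
from dbgpt.storage.knowledge_graph.knowledge_graph import (
    BuiltinKnowledgeGraph,
    BuiltinKnowledgeGraphConfig,
)

def __create_kg_connector():
    """Create a plain (triplet-only) knowledge graph connector."""
    return BuiltinKnowledgeGraph(
        config=BuiltinKnowledgeGraphConfig(
            name="graph_rag_test",
            embedding_fn=None,  # the triplet graph does not need embeddings
            llm_client=llm_client,
            model_name=model_name,
            graph_store_type="TuGraphGraph",
        ),
    )
```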
Retrieve from Knowledge Graph
Then you can retrieve knowledge from the knowledge graph in the same way as from a vector store.
```python
import os

from dbgpt.configs.model_config import ROOT_PATH
from dbgpt.core import Chunk, HumanPromptTemplate, ModelMessage, ModelRequest
from dbgpt.rag import ChunkParameters
from dbgpt.rag.assembler import EmbeddingAssembler
from dbgpt.rag.knowledge import KnowledgeFactory
from dbgpt.rag.retriever import RetrieverStrategy

async def test_community_graph_rag():
    await __run_graph_rag(
        knowledge_file="examples/test_files/graphrag-mini.md",
        chunk_strategy="CHUNK_BY_MARKDOWN_HEADER",
        knowledge_graph=__create_community_kg_connector(),
        question="What's the relationship between TuGraph and DB-GPT ?",
    )

async def __run_graph_rag(knowledge_file, chunk_strategy, knowledge_graph, question):
    file_path = os.path.join(ROOT_PATH, knowledge_file).format()
    knowledge = KnowledgeFactory.from_file_path(file_path)
    try:
        chunk_parameters = ChunkParameters(chunk_strategy=chunk_strategy)

        # Build the embedding assembler that loads knowledge into the graph store.
        assembler = await EmbeddingAssembler.aload_from_knowledge(
            knowledge=knowledge,
            chunk_parameters=chunk_parameters,
            index_store=knowledge_graph,
            retrieve_strategy=RetrieverStrategy.GRAPH,
        )
        await assembler.apersist()

        # Create a retriever that returns the top-1 result.
        retriever = assembler.as_retriever(1)
        chunks = await retriever.aretrieve_with_scores(question, score_threshold=0.3)

        # Chat with the retrieved context.
        print(f"{await ask_chunk(chunks[0], question)}")
    finally:
        knowledge_graph.delete_vector_name(knowledge_graph.get_config().name)

async def ask_chunk(chunk: Chunk, question) -> str:
    rag_template = (
        "Based on the following [Context] {context}, "
        "answer [Question] {question}."
    )
    template = HumanPromptTemplate.from_template(rag_template)
    messages = template.format_messages(context=chunk.content, question=question)
    model_messages = ModelMessage.from_base_messages(messages)
    request = ModelRequest(model=model_name, messages=model_messages)
    response = await llm_client.generate(request=request)
    if not response.success:
        code = str(response.error_code)
        reason = response.text
        raise Exception(f"request llm failed ({code}) {reason}")
    return response.text
```
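Both functions above are coroutines, so drive the test from a synchronous entry point:

```python
import asyncio

asyncio.run(test_community_graph_rag())
```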
Chat Knowledge via GraphRAG
Note: The current test data is in Chinese.
Here we demonstrate how to chat with knowledge through Graph RAG on the web page.
First, create a knowledge base using the Knowledge Graph type.
Then, upload the documents (tugraph.md, osgraph.md, dbgpt.md) and let them be processed automatically (split by markdown header by default).
After indexing, the graph data may look like this.
Start chatting on the knowledge graph.
Performance Testing
Performance testing is based on the gpt-4o-mini model.
Indexing Performance
| | DB-GPT | GraphRAG (Microsoft) |
|---|---|---|
| Document Tokens | 42631 | 42631 |
| Graph Size | 808 nodes, 1170 edges | 779 nodes, 967 edges |
| Prompt Tokens | 452614 | 744990 |
| Completion Tokens | 48325 | 227230 |
| Total Tokens | 500939 | 972220 |