Graph RAG User Manual
In this example, we will show how to use the Graph RAG framework in DB-GPT. Using a graph database to implement RAG can, to some extent, alleviate the uncertainty and interpretability issues that come with vector database retrieval.
You can refer to the Python example file DB-GPT/examples/rag/graph_rag_example.py in the source code. This example demonstrates how to load knowledge from a document and store it in a graph store. It then recalls knowledge relevant to your question by searching for triplets in the graph store.
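The triplet-recall idea can be sketched in a few lines of plain Python. This is a simplified illustration of the concept only, not DB-GPT's actual retrieval code; the function name and the sample triplets are invented for the example:

```python
# Simplified illustration of triplet-based recall: store
# (subject, predicate, object) triplets and return those whose subject
# or object matches a keyword extracted from the question.

def recall_triplets(triplets, keywords):
    """Return triplets whose subject or object contains any keyword."""
    hits = []
    for subj, pred, obj in triplets:
        if any(k.lower() in subj.lower() or k.lower() in obj.lower()
               for k in keywords):
            hits.append((subj, pred, obj))
    return hits

triplets = [
    ("TuGraph", "is_supported_by", "DB-GPT"),
    ("DB-GPT", "uses", "Graph RAG"),
    ("ChromaDB", "is_a", "vector store"),
]
print(recall_triplets(triplets, ["TuGraph"]))
# → [('TuGraph', 'is_supported_by', 'DB-GPT')]
```

In the real system the triplets are extracted by an LLM and stored in TuGraph, and matching is done with graph queries rather than string containment, but the recall shape is the same.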
Install Dependencies
First, you need to install the dbgpt library.
uv sync --all-packages --frozen \
--extra "proxy_openai" \
--extra "rag" \
--extra "storage_chromadb" \
--extra "dbgpts" \
--extra "graph_rag"
Prepare Graph Database
To store the knowledge in a graph, we need a graph database. TuGraph is the first graph database supported by DB-GPT.
Visit the TuGraph GitHub repository to view the Quick Start document, and follow the instructions to pull the TuGraph database Docker image (latest / version >= 4.5.1) and launch it.
docker pull tugraph/tugraph-runtime-centos7:4.5.1
docker run -d -p 7070:7070 -p 7687:7687 -p 9090:9090 --name tugraph_demo tugraph/tugraph-runtime-centos7:latest lgraph_server -d run --enable_plugin true
The default port for the Bolt protocol is 7687.
Download tip: a matching TuGraph Docker image package is also available on OSS, which you can download and import directly.
wget 'https://tugraph-web.oss-cn-beijing.aliyuncs.com/tugraph/tugraph-4.5.1/tugraph-runtime-centos7-4.5.1.tar' -O tugraph-runtime-centos7-4.5.1.tar
docker load -i tugraph-runtime-centos7-4.5.1.tar
Prepare LLM
To build a Graph RAG program, we need an LLM. Here are some of the LLMs that DB-GPT supports:
- OpenAI (API)
- Yi (API)
- API Server (cluster)
For OpenAI, set your API key in the environment variable OPENAI_API_KEY.
from dbgpt.model.proxy import OpenAILLMClient
llm_client = OpenAILLMClient()
For Yi, you should have a Yi account and get the API key from the Yi official website. Then set your API key in the environment variable YI_API_KEY.
If you have deployed a DB-GPT cluster and API server, you can connect to the API server to get the LLM model. The API server is compatible with the OpenAI API, so you can use OpenAILLMClient to connect to it.
First, you should install the openai library.
pip install openai
from dbgpt.model.proxy import OpenAILLMClient
llm_client = OpenAILLMClient(api_base="http://localhost:8100/api/v1/", api_key="{your_api_key}")
TuGraph Configuration
Set the variables below in the .env file so that DB-GPT knows how to connect to TuGraph.
GRAPH_STORE_TYPE=TuGraph
TUGRAPH_HOST=127.0.0.1
TUGRAPH_PORT=7687
TUGRAPH_USERNAME=admin
TUGRAPH_PASSWORD=73@TuGraph
GRAPH_COMMUNITY_SUMMARY_ENABLED=True # enable the graph community summary
TRIPLET_GRAPH_ENABLED=True # enable the graph search for the triplets
DOCUMENT_GRAPH_ENABLED=True # enable the graph search for documents and chunks
KNOWLEDGE_GRAPH_CHUNK_SEARCH_TOP_SIZE=5 # the number of the searched triplets in a retrieval
KNOWLEDGE_GRAPH_EXTRACTION_BATCH_SIZE=20 # the batch size of triplet extraction from the text
COMMUNITY_SUMMARY_BATCH_SIZE=20 # the batch size of parallel community summary process
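For illustration, here is a hypothetical sketch of how these variables could be read in Python. DB-GPT loads them internally; the load_tugraph_settings helper below is not part of the DB-GPT API and only shows how the variables map to connection settings and their defaults:

```python
# Illustrative only: read TuGraph connection settings from the
# environment, falling back to the defaults shown in the .env example.
import os

def load_tugraph_settings():
    """Return a dict of TuGraph connection settings (hypothetical helper)."""
    return {
        "host": os.environ.get("TUGRAPH_HOST", "127.0.0.1"),
        "port": int(os.environ.get("TUGRAPH_PORT", "7687")),
        "username": os.environ.get("TUGRAPH_USERNAME", "admin"),
        "password": os.environ.get("TUGRAPH_PASSWORD", "73@TuGraph"),
        # boolean flags are stored as the strings "True"/"False"
        "community_summary": os.environ.get(
            "GRAPH_COMMUNITY_SUMMARY_ENABLED", "True"
        ).lower() == "true",
    }

settings = load_tugraph_settings()
print(settings["host"], settings["port"])
```

Note that TUGRAPH_PORT should point at the Bolt port (7687 by default), not the HTTP port.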
Load into Knowledge Graph
When using a graph database as the underlying knowledge storage platform, it is necessary to build a knowledge graph to facilitate the archiving and retrieval of documents. DB-GPT leverages the capabilities of large language models to implement an integrated knowledge graph, while still maintaining the flexibility to freely connect to other knowledge graph systems and graph database systems.
We create a knowledge graph with graph community summaries based on CommunitySummaryKnowledgeGraph.
from dbgpt.model.proxy.llms.chatgpt import OpenAILLMClient
from dbgpt.rag.embedding import DefaultEmbeddingFactory
from dbgpt.storage.knowledge_graph.community_summary import (
    CommunitySummaryKnowledgeGraph,
    CommunitySummaryKnowledgeGraphConfig,
)

llm_client = OpenAILLMClient()
model_name = "gpt-4o-mini"


def __create_community_kg_connector():
    """Create a community knowledge graph connector."""
    return CommunitySummaryKnowledgeGraph(
        config=CommunitySummaryKnowledgeGraphConfig(
            name="community_graph_rag_test",
            embedding_fn=DefaultEmbeddingFactory.openai(),
            llm_client=llm_client,
            model_name=model_name,
            graph_store_type="TuGraphGraph",
        ),
    )
Retrieve from Knowledge Graph
Then you can retrieve knowledge from the knowledge graph in the same way as from a vector store.
import os

from dbgpt.configs.model_config import ROOT_PATH
from dbgpt.core import Chunk, HumanPromptTemplate, ModelMessage, ModelRequest
from dbgpt.rag.retriever import RetrieverStrategy
from dbgpt_ext.rag import ChunkParameters
from dbgpt_ext.rag.assembler import EmbeddingAssembler
from dbgpt_ext.rag.knowledge import KnowledgeFactory


async def test_community_graph_rag():
    await __run_graph_rag(
        knowledge_file="examples/test_files/graphrag-mini.md",
        chunk_strategy="CHUNK_BY_MARKDOWN_HEADER",
        knowledge_graph=__create_community_kg_connector(),
        question="What's the relationship between TuGraph and DB-GPT ?",
    )


async def __run_graph_rag(knowledge_file, chunk_strategy, knowledge_graph, question):
    file_path = os.path.join(ROOT_PATH, knowledge_file)
    knowledge = KnowledgeFactory.from_file_path(file_path)
    try:
        chunk_parameters = ChunkParameters(chunk_strategy=chunk_strategy)
        # build the embedding assembler with the graph retrieve strategy
        assembler = await EmbeddingAssembler.aload_from_knowledge(
            knowledge=knowledge,
            chunk_parameters=chunk_parameters,
            index_store=knowledge_graph,
            retrieve_strategy=RetrieverStrategy.GRAPH,
        )
        await assembler.apersist()
        # retrieve the top-1 chunk above the similarity score threshold
        retriever = assembler.as_retriever(1)
        chunks = await retriever.aretrieve_with_scores(question, score_threshold=0.3)
        # chat with the retrieved context
        print(f"{await ask_chunk(chunks[0], question)}")
    finally:
        knowledge_graph.delete_vector_name(knowledge_graph.get_config().name)


async def ask_chunk(chunk: Chunk, question) -> str:
    rag_template = (
        "Based on the following [Context] {context}, "
        "answer [Question] {question}."
    )
    template = HumanPromptTemplate.from_template(rag_template)
    messages = template.format_messages(context=chunk.content, question=question)
    model_messages = ModelMessage.from_base_messages(messages)
    request = ModelRequest(model=model_name, messages=model_messages)
    response = await llm_client.generate(request=request)
    if not response.success:
        code = str(response.error_code)
        reason = response.text
        raise Exception(f"request llm failed ({code}) {reason}")
    return response.text
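The score_threshold passed to aretrieve_with_scores drops weakly matching chunks. Conceptually, the filtering step works like this minimal sketch (ScoredChunk and filter_by_score are illustrative names, not DB-GPT internals):

```python
# Minimal sketch of score-threshold filtering over retrieved chunks.
from dataclasses import dataclass

@dataclass
class ScoredChunk:
    content: str
    score: float

def filter_by_score(chunks, score_threshold):
    """Keep chunks at or above the threshold, best first."""
    kept = [c for c in chunks if c.score >= score_threshold]
    return sorted(kept, key=lambda c: c.score, reverse=True)

chunks = [
    ScoredChunk("TuGraph is supported by DB-GPT.", 0.82),
    ScoredChunk("Unrelated text about vector stores.", 0.12),
]
top = filter_by_score(chunks, score_threshold=0.3)
print(top[0].content)
# → TuGraph is supported by DB-GPT.
```

Raising score_threshold trades recall for precision: fewer, more relevant chunks reach the prompt in ask_chunk.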
Chat Knowledge via GraphRAG
Note: The current test data is in Chinese.
Here we demonstrate how to chat with your knowledge through Graph RAG on the web page.
First, create a knowledge base using the Knowledge Graph type.
Then, upload the document (graphrag-test.md) and let it be processed automatically (split by markdown header by default).
After indexing, the graph data may look like this.
Start chatting with the knowledge graph.
Performance Testing
Performance testing is based on the gpt-4o-mini model.
Indexing Performance
|  | DB-GPT | GraphRAG (microsoft) |
| --- | --- | --- |
| Doc Tokens | 42631 | 42631 |
| Triplets Graph | 734 nodes, 1064 edges | 779 nodes, 967 edges |
| Doc Structure Graph | 76 nodes, 1090 edges | N/A |
| Prompt Tokens | 375768 | 744990 |
| Completion Tokens | 41797 | 227230 |
| Total Tokens | 417565 | 972220 |
| Indexing Time | 170s | 210s |
Querying Performance
Global Search
|  | DB-GPT | GraphRAG (microsoft) |
| --- | --- | --- |
| Time | 8s | 40s |
| Tokens | 7432 | 63317 |
Question
Please summarize the content of the knowledge base and provide a brief overview.
- DB-GPT Answer
· Knowledge graph overview: the knowledge graph presents the many entities related to graph databases and graph processing systems and their interconnections, covering distributed in-memory graph databases, graph computation models, and a variety of graph processing frameworks and technologies, and highlights their applications and mutual influence in the graph processing field.
· TuGraph DB: TuGraph DB is an efficient graph database that supports large data volumes and low-latency lookups and provides ACID guarantees. Its community edition is suitable for learning and small projects, with support for real-time data management and multi-graph structures.
· DB-GPT: DB-GPT is a private-deployment approach to database interaction built on LLM technology, supporting a range of advanced capabilities and aiming to usher in the Data 3.0 era. It offers multi-model management, optimized information retrieval, and agent collaboration.
· Application domains: the graph technologies mentioned in the knowledge base are used in financial risk control, social networks, data analytics, and other fields, reflecting the rapid development of graph databases and their importance across industries.
· Developer ecosystem: the knowledge base also highlights developers' active contributions to open source projects, especially Apache Flink and Spark, indicating a healthy ecosystem of collaboration and development.
Overall, the knowledge base shows the broad application and rapid progress of graph databases and related technologies, and emphasizes the important roles that tools such as DB-GPT and TuGraph play in data intelligence and processing.
- GraphRAG(microsoft) Answer
Knowledge base summary
Overview
The knowledge base covers multiple communities and frameworks related to graph processing, emphasizing the role these entities play in driving technical solutions and collaborative efforts. The main focus areas include graph processing, AI development, and open source contributions.
Key entities
1. Graph processing frameworks: several important frameworks mentioned in the knowledge base, such as GraphChi, X-Stream, GraphQ, PowerLyra, Gemini, and TuGraph, all play significant roles in processing large-scale graph data. The interconnections among these frameworks point to a collaborative environment that fosters innovation.
2. Enterprise contributions: major companies such as Ant Group, Google, Tencent, and Alibaba have played key roles in advancing graph processing and data management technologies. Their collaborations underscore the importance of partnership in driving technological innovation.
3. Open source projects: multiple open source projects under the Apache Software Foundation (such as Apache Hudi, Iceberg, Pulsar, and Kafka) showcase contributions from developers worldwide and strengthen data management and analytics capabilities.
Development trends
The knowledge base also notes emerging trends such as dynamic graph processing, AI-driven database management, and the integration of automation tools, which are crucial for improving the efficiency and effectiveness of modern software development.
Conclusion
Overall, the knowledge base depicts a vibrant technology ecosystem in which the various entities jointly advance graph processing and related technologies through collaboration and innovation. This interconnected network not only drives technological progress but also provides support for tackling complex technical challenges.
Local Search
|  | DB-GPT | GraphRAG (microsoft) |
| --- | --- | --- |
| Time | 15s | 15s |
| Tokens | 9230 | 11619 |
Question
In terms of community contributions, community ecosystem, and developers, what are the connections and the differences between the DB-GPT community and the TuGraph community?
- DB-GPT Answer