Cluster Deployment

Deploy DB-GPT as a distributed cluster — separate the webserver, model workers, and controller for scalability.

Architecture overview

| Component        | Role                         | Default Port |
|------------------|------------------------------|--------------|
| Controller       | Service registry and routing | 8000         |
| LLM Worker       | Serves language models       | 8001+        |
| Embedding Worker | Serves embedding models      | 8003+        |
| Reranker Worker  | Serves reranking models      | 8004+        |
| API Server       | REST API gateway (optional)  | 8100         |
| Webserver        | Web UI + application logic   | 5670         |
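
The glue between these pieces is the controller address: each worker registers itself with the controller via --controller_addr, and the webserver resolves model instances through the same address via MODEL_SERVER. A single-machine sketch of that wiring (the full commands are expanded step by step in Option A below):

# Registry that every other component points at (listens on :8000)
dbgpt start controller

# Each worker registers with the controller via --controller_addr
dbgpt start worker \
  --model_name glm-4-9b-chat \
  --model_path /app/models/glm-4-9b-chat \
  --port 8001 \
  --controller_addr http://127.0.0.1:8000

# The webserver resolves models through the controller via MODEL_SERVER
LLM_MODEL=glm-4-9b-chat \
MODEL_SERVER=http://127.0.0.1:8000 \
dbgpt start webserver --light --remote_embedding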

Option A — Manual cluster (CLI)

Step 1 — Start the controller

dbgpt start controller

The controller starts on port 8000 by default.
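
If port 8000 is already in use, the controller can be bound to a different host and port. The flag names below are an assumption based on the worker options; confirm them with dbgpt start controller --help:

# Assumed flags -- verify with `dbgpt start controller --help`
dbgpt start controller --host 0.0.0.0 --port 8000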

Step 2 — Start LLM workers

dbgpt start worker \
--model_name glm-4-9b-chat \
--model_path /app/models/glm-4-9b-chat \
--port 8001 \
--controller_addr http://127.0.0.1:8000

Add more workers on different ports:

dbgpt start worker \
--model_name vicuna-13b-v1.5 \
--model_path /app/models/vicuna-13b-v1.5 \
--port 8002 \
--controller_addr http://127.0.0.1:8000
Note: Replace model names and paths with your own. Each worker must use a unique port.
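
Workers do not need to run on the same machine as the controller. On another host, point --controller_addr at the machine where the controller runs (192.168.1.10 below is a placeholder for your controller's address):

# Worker on a second machine; 192.168.1.10 is a placeholder controller host
dbgpt start worker \
  --model_name glm-4-9b-chat \
  --model_path /app/models/glm-4-9b-chat \
  --port 8001 \
  --controller_addr http://192.168.1.10:8000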

Step 3 — Start embedding worker

dbgpt start worker \
--model_name text2vec \
--model_path /app/models/text2vec-large-chinese \
--worker_type text2vec \
--port 8003 \
--controller_addr http://127.0.0.1:8000
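
Embedding models are comparatively small, so a GPU is often unnecessary for this worker. The --device option from the CLI reference below lets you pin it to CPU; a sketch with the same model path:

# Same embedding worker, pinned to CPU via --device
dbgpt start worker \
  --model_name text2vec \
  --model_path /app/models/text2vec-large-chinese \
  --worker_type text2vec \
  --device cpu \
  --port 8003 \
  --controller_addr http://127.0.0.1:8000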

Step 4 — Start reranker worker (optional)

dbgpt start worker \
--worker_type text2vec \
--rerank \
--model_name bge-reranker-base \
--model_path /app/models/bge-reranker-base \
--port 8004 \
--controller_addr http://127.0.0.1:8000

Step 5 — Verify deployed models

dbgpt model list

Expected output:

+-------------------+------------+------+---------+
| Model Name        | Model Type | Port | Healthy |
+-------------------+------------+------+---------+
| glm-4-9b-chat     | llm        | 8001 | True    |
| vicuna-13b-v1.5   | llm        | 8002 | True    |
| text2vec          | text2vec   | 8003 | True    |
| bge-reranker-base | text2vec   | 8004 | True    |
+-------------------+------------+------+---------+
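
As a further smoke test, you can talk to a registered model directly from the CLI (the --model_name flag is assumed here; check dbgpt model chat --help):

# Interactive chat against a registered LLM worker
dbgpt model chat --model_name glm-4-9b-chat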

Step 6 — Start the webserver

LLM_MODEL=glm-4-9b-chat \
MODEL_SERVER=http://127.0.0.1:8000 \
dbgpt start webserver --light --remote_embedding

| Flag               | Purpose                            |
|--------------------|------------------------------------|
| --light            | Don't start embedded model service |
| --remote_embedding | Use remote embedding workers       |
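
The webserver can also live on its own machine; it only needs MODEL_SERVER to point at the controller (192.168.1.10 below is a placeholder for the controller host):

# Webserver on a separate machine, resolving models via the remote controller
LLM_MODEL=glm-4-9b-chat \
MODEL_SERVER=http://192.168.1.10:8000 \
dbgpt start webserver --light --remote_embedding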

Option B — Docker Compose cluster

Use the pre-built cluster Compose file:

docker compose -f docker/compose_examples/cluster-docker-compose.yml up -d

This starts:

  • Controller — Service registry
  • LLM Worker — glm-4-9b-chat on GPU
  • Embedding Worker — text2vec-large-chinese on GPU
  • Webserver — Web UI in lightweight mode
Warning: Edit the Compose file to set your model paths before running. The default expects models at /data/models/.
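
Once the stack is up, standard Compose commands show whether every service started and stream the worker logs while models load:

# Check service status and follow logs for the whole cluster
docker compose -f docker/compose_examples/cluster-docker-compose.yml ps
docker compose -f docker/compose_examples/cluster-docker-compose.yml logs -f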

High-availability cluster

For HA deployments with multiple controllers:

docker compose -f docker/compose_examples/ha-cluster-docker-compose.yml up -d

CLI reference

dbgpt start worker --help

Key options:

| Option             | Description                    | Default |
|--------------------|--------------------------------|---------|
| --model_name       | Model name (required)          | —       |
| --model_path       | Path to model files (required) | —       |
| --worker_type      | Worker type (llm, text2vec)    | llm     |
| --port             | Worker port                    | 8001    |
| --controller_addr  | Controller address             | —       |
| --device           | Device (cuda, cpu, mps)        | auto    |
| --num_gpus         | Number of GPUs to use          | all     |
| --load_8bit        | Enable 8-bit quantization      | false   |
| --load_4bit        | Enable 4-bit quantization      | false   |
| --max_context_size | Maximum context window         | 4096    |
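
These options combine as you would expect. For example, a 4-bit quantized LLM worker pinned to one GPU with a larger context window (a sketch reusing the glm-4-9b-chat paths from Option A):

# 4-bit quantized worker on a single GPU with an 8k context window
dbgpt start worker \
  --model_name glm-4-9b-chat \
  --model_path /app/models/glm-4-9b-chat \
  --load_4bit \
  --num_gpus 1 \
  --max_context_size 8192 \
  --port 8001 \
  --controller_addr http://127.0.0.1:8000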

dbgpt model --help

| Command             | Description                         |
|---------------------|-------------------------------------|
| dbgpt model list    | List all registered model instances |
| dbgpt model start   | Start a model instance              |
| dbgpt model stop    | Stop a model instance               |
| dbgpt model restart | Restart a model instance            |
| dbgpt model chat    | Chat with a model from CLI          |
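
For example, to take a worker out of rotation and bring it back (the --model_name flag is an assumption; the exact flags are shown by dbgpt model stop --help):

# Stop and restart a registered model instance (flags assumed; see --help)
dbgpt model stop --model_name vicuna-13b-v1.5
dbgpt model restart --model_name vicuna-13b-v1.5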

Next steps

| Topic                   | Link                   |
|-------------------------|------------------------|
| Docker single-container | Docker                 |
| Docker Compose          | Docker Compose         |
| Source code deployment  | Source Code            |
| SMMF deep dive          | Multi-Model Management |