Version: dev

vLLM

Configure DB-GPT to use vLLM for high-throughput local model inference on NVIDIA GPUs.

Prerequisites​

  • NVIDIA GPU with CUDA 12.1+
  • Sufficient VRAM for your chosen model (8 GB+ for 7B models)
  • DB-GPT installed with the `vllm` extra

Install dependencies​

```bash
uv sync --all-packages \
  --extra "base" \
  --extra "hf" \
  --extra "cuda121" \
  --extra "vllm" \
  --extra "rag" \
  --extra "storage_chromadb" \
  --extra "quant_bnb" \
  --extra "dbgpts"
```

Configuration​

Edit configs/dbgpt-local-vllm.toml:

```toml
[models]
[[models.llms]]
name = "DeepSeek-R1-Distill-Qwen-1.5B"
provider = "vllm"
# Download from HuggingFace automatically, or specify a local path:
# path = "models/DeepSeek-R1-Distill-Qwen-1.5B"

[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"
# path = "models/bge-large-zh-v1.5"
```
**Model download**

If you don't specify a `path`, the model is downloaded from the HuggingFace Hub automatically. For large models, pre-downloading is recommended:

```bash
# Using huggingface-cli
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local-dir models/DeepSeek-R1-Distill-Qwen-1.5B
```
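
The same pre-download can be scripted with the `huggingface_hub` library that `huggingface-cli` wraps; a sketch (setting `HF_ENDPOINT` to a mirror is optional, and the mirror URL shown is only an example):

```python
# Optional: point huggingface_hub at a mirror. It reads HF_ENDPOINT at import
# time, so the environment variable must be set before the import below.
# import os; os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # example URL

from huggingface_hub import snapshot_download

def predownload(repo_id: str, local_dir: str) -> str:
    """Fetch a full model repo from the Hub into the local path used by the config."""
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)

# predownload("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
#             "models/DeepSeek-R1-Distill-Qwen-1.5B")  # requires network access
```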

| Model | VRAM required | Notes |
| --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | ~4 GB | Small, good for testing |
| GLM-4-9B-Chat | ~20 GB | Strong Chinese & English |
| Qwen2.5-7B-Instruct | ~16 GB | Good balance |
| Qwen2.5-Coder-7B-Instruct | ~16 GB | Code-focused |
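
The VRAM figures above roughly follow the fp16 weight size (2 bytes per parameter) plus headroom for the KV cache and CUDA context. A back-of-the-envelope sketch — the 20% headroom factor is an assumption for illustration, not a vLLM constant:

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Rough fp16 weight footprint plus ~20% headroom (assumed) for KV cache/overhead."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes ≈ GB
    return weights_gb * 1.2

print(round(estimate_vram_gb(7), 1))    # 16.8 — in line with the ~16 GB figures above
print(round(estimate_vram_gb(1.5), 1))  # 3.6  — in line with ~4 GB
```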

Start the server​

```bash
uv run dbgpt start webserver --config configs/dbgpt-local-vllm.toml
```

**GPU selection**

To pin the server to a specific GPU, set `CUDA_VISIBLE_DEVICES`:

```bash
CUDA_VISIBLE_DEVICES=0 uv run dbgpt start webserver --config configs/dbgpt-local-vllm.toml
```
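
Once the webserver is up, you can exercise the model over HTTP. A stdlib-only sketch, assuming the default web port 5670 and an OpenAI-style `/api/v2/chat/completions` route — verify both against your deployment's API docs before relying on them:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "DeepSeek-R1-Distill-Qwen-1.5B") -> dict:
    """OpenAI-style chat-completions request body for the model configured above."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, base_url: str = "http://localhost:5670") -> str:
    # Port and route are assumptions; adjust to match your DB-GPT deployment.
    req = urllib.request.Request(
        f"{base_url}/api/v2/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("Say hello in one sentence.")  # requires the server to be running
```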

Troubleshooting​

| Issue | Solution |
| --- | --- |
| CUDA not found | Install CUDA 12.1+ and verify with `nvidia-smi` |
| Out of GPU memory | Use a smaller model or enable quantization (`quant_bnb`) |
| Model download fails | Pre-download the model or configure a HuggingFace mirror |
| Slow first request | vLLM compiles kernels on first run; subsequent requests are fast |

What's next​