Version: dev

SMMF (Service-oriented Multi-Model Management Framework)

SMMF is DB-GPT's model management layer. It provides a unified interface for managing, switching, and deploying multiple LLM and embedding models — whether they are API proxies or locally hosted.

Why SMMF?

Different tasks benefit from different models. SMMF lets you:

  • Run multiple models simultaneously (e.g., one for chat, one for embeddings)
  • Switch models without code changes — just update config
  • Scale independently — deploy models on separate machines in cluster mode
  • Mix providers — use OpenAI for chat and a local model for embeddings

Supported providers

API Proxy

| Provider | Config prefix | Example models |
|---|---|---|
| OpenAI | `proxy/openai` | GPT-4o, GPT-4o-mini |
| DeepSeek | `proxy/deepseek` | DeepSeek-V3, DeepSeek-R1 |
| Qwen (Tongyi) | `proxy/tongyi` | Qwen-Max, Qwen-Plus |
| SiliconFlow | `proxy/siliconflow` | Various hosted models |
| Ollama | `proxy/ollama` | Any Ollama-served model |
| Azure OpenAI | `proxy/openai` | Azure-hosted OpenAI models |

Local Inference

| Provider | Config prefix | Requirements |
|---|---|---|
| HuggingFace | `hf` | GPU recommended |
| vLLM | `vllm` | NVIDIA GPU + CUDA |
| llama.cpp | `llama.cpp` | CPU or GPU |
| MLX | `mlx` | Apple Silicon Mac |

Configuration

Models are configured in TOML files under configs/:

```toml
[models]

# LLM configuration
[[models.llms]]
name = "chatgpt_proxyllm"
provider = "proxy/openai"
api_key = "sk-..."

# Embedding model configuration
[[models.embeddings]]
name = "text-embedding-3-small"
provider = "proxy/openai"
api_key = "sk-..."
```

You can define multiple LLMs and embeddings in the same config file.
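For example, a config that serves two chat LLMs side by side might look like this (the DeepSeek entry is illustrative; the keys follow the snippet above, and the prefixes come from the provider table):

```toml
[models]

[[models.llms]]
name = "chatgpt_proxyllm"
provider = "proxy/openai"
api_key = "sk-..."

[[models.llms]]
name = "deepseek_proxyllm"
provider = "proxy/deepseek"
api_key = "sk-..."
```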

Deployment modes

Standalone

All models run in the same process as the DB-GPT server. Simple and suitable for development or single-machine deployments.

```shell
uv run dbgpt start webserver --config configs/dbgpt-proxy-openai.toml
```

Cluster

Models run on separate worker nodes, managed by a controller. Suitable for production deployments with multiple GPUs or machines.
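A minimal cluster bring-up follows this shape: start the controller, register one worker per model, then point the webserver at the controller. The commands below are a sketch — the model name, paths, ports, and exact flags are illustrative and may differ by version, so consult the cluster deployment docs for authoritative options:

```shell
# Start the model controller (the registry that workers report to)
uv run dbgpt start controller --port 8000

# On each model machine, start a worker and register it with the controller
uv run dbgpt start worker --model_name glm-4-9b-chat \
  --model_path /data/models/glm-4-9b-chat \
  --controller_addr http://127.0.0.1:8000

# Start the webserver, which discovers models through the controller
uv run dbgpt start webserver --config configs/dbgpt-proxy-openai.toml
```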

Learn more: Cluster Deployment

What's next