Version: dev

Model Issues

Common problems with model configuration, loading, and generation.

API key errors

Symptom: 401 Unauthorized, Invalid API key, or Authentication failed.

Fix:

Verify your API key is correctly set in the TOML config:

[[models.llms]]
api_key = "sk-..."  # Must be a valid key

Or reference an environment variable in the config and export it before starting DB-GPT:

[[models.llms]]
api_key = "${env:OPENAI_API_KEY}"

export OPENAI_API_KEY="sk-your-actual-key"

Check that the key has sufficient permissions and credits with the provider.

Model not found

Symptom: Model 'xxx' not found or No model registered.

Fix:

Check that the model name in your config matches the provider's expected format:

Provider	Example Name
OpenAI	`chatgpt_proxyllm`, `gpt-4o`
DeepSeek	`deepseek-chat`, `deepseek-reasoner`
Ollama	`qwen2.5:latest` (must be pulled first)
HuggingFace	`THUDM/glm-4-9b-chat-hf`

For Ollama, ensure the model is downloaded:

ollama pull qwen2.5:latest
ollama list  # Verify it appears
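`ollama list` reports the same models that Ollama's HTTP API serves at `/api/tags`. The Python sketch below compares a configured name against that list. It assumes the default address `http://localhost:11434` and a response shaped like `{"models": [{"name": ...}]}`; `fetch_tags`, `installed_models`, and `is_pulled` are hypothetical helper names.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the locally pulled models from Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def installed_models(tags: dict) -> set[str]:
    """Extract model names such as 'qwen2.5:latest' from an /api/tags response."""
    return {model["name"] for model in tags.get("models", [])}


def is_pulled(name: str, tags: dict) -> bool:
    """Check whether a configured model name has been pulled.

    Ollama treats a name without a tag as ':latest', so normalize before comparing.
    """
    if ":" not in name:
        name = f"{name}:latest"
    return name in installed_models(tags)
```

For example, `is_pulled("qwen2.5:latest", fetch_tags())` returns `False` until `ollama pull` has finished.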

For cluster deployments, verify workers are registered:

dbgpt model list

Ollama connection refused

Symptom: Connection refused when using Ollama provider.

Fix:

Ensure Ollama is running:

ollama serve
# Or check: curl http://localhost:11434/api/tags

If DB-GPT runs in Docker, `localhost` refers to the container itself, so point `api_base` at the host instead:

[[models.llms]]
api_base = "http://host.docker.internal:11434"  # Docker for Mac/Windows
# Or use the host's actual IP address

Out of memory (OOM)

Symptom: CUDA out of memory or RuntimeError: CUDA error.

Fix:

Use a smaller model:

[[models.llms]]
name = "Qwen2.5-Coder-0.5B-Instruct"  # Smaller model

Enable quantization:

dbgpt start worker --model_name ... --load_4bit
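Quantization helps because weight memory scales roughly with the number of bytes stored per parameter. The back-of-the-envelope Python sketch below counts weights only; the KV cache, activations, and framework overhead add more, so treat the result as a lower bound. The 7-billion-parameter figure is an illustrative input, not a measurement of any particular model.

```python
# Approximate bytes stored per parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "8bit": 1.0, "4bit": 0.5}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Estimate memory for model weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"7B params @ {precision}: ~{weight_memory_gib(7e9, precision):.1f} GiB")
```

For 7 billion parameters this gives about 13.0 GiB at fp16, 6.5 GiB at 8-bit, and 3.3 GiB at 4-bit, which is why a 4-bit load can fit on a GPU where fp16 runs out of memory.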

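Pin the process to a specific GPU, such as one with more free memory (`CUDA_VISIBLE_DEVICES` controls which GPUs are visible; it does not cap memory usage):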

CUDA_VISIBLE_DEVICES=0 uv run dbgpt start webserver ...

Or switch to an API proxy (no GPU needed):

[[models.llms]]
provider = "proxy/openai"  # Uses remote API instead of local GPU

Slow model responses

Symptom: Very slow response times or timeouts.

Possible causes and fixes:

Cause	Fix
Model downloading on first run	Wait for download to complete (check logs)
Insufficient GPU VRAM	Use quantization or a smaller model
Slow network to API	Check connectivity to provider endpoint
Large context window	Reduce `max_context_size` in config
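Prompt size is one factor you control directly: longer prompts take longer to process. The Python sketch below estimates tokens with a rough four-characters-per-token heuristic (a common approximation for English text; real tokenizers differ, especially for Chinese) and drops the oldest messages until the history fits a budget. It is a hypothetical illustration of why a smaller context helps, not how DB-GPT applies `max_context_size`.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about four characters per token, at least 1."""
    return max(1, len(text) // 4)


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated total stays within the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # Walk from newest to oldest.
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # Restore chronological order.
    return kept
```

Keeping the newest messages preserves the immediate conversation while dropping older context that is least likely to matter.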

Embedding model errors

Symptom: Embedding model not found or knowledge base operations fail.

Fix:

Ensure an embedding model is configured:

[[models.embeddings]]
name = "text-embedding-3-small"
provider = "proxy/openai"
api_key = "your-key"

For HuggingFace embeddings, ensure the model is downloaded or accessible:

[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"
# path = "/path/to/local/model"  # Optional: local path

Add the HuggingFace extra if using local embeddings:

uv sync --all-packages --extra "hf" --extra "cpu" ...

Reranker not working

Symptom: RAG results not improving with reranker enabled.

Fix:

Ensure a reranker is configured in your TOML config:

[[models.rerankers]]
name = "BAAI/bge-reranker-base"
provider = "hf"

Or for SiliconFlow:

[[models.rerankers]]
name = "BAAI/bge-reranker-v2-m3"
provider = "proxy/siliconflow"
api_key = "${env:SILICONFLOW_API_KEY}"
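It also helps to know what a reranker can and cannot do. It rescores the candidates the retriever already returned and reorders them, so it cannot surface a relevant chunk that retrieval missed. The toy Python sketch below shows that flow. The word-overlap `toy_score` is a hypothetical stand-in for a real cross-encoder such as `BAAI/bge-reranker-base`, chosen only so the example runs without a model.

```python
from typing import Callable


def toy_score(query: str, passage: str) -> float:
    """Hypothetical relevance score: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / max(1, len(query_words))


def rerank(query: str, candidates: list[str], top_k: int,
           score: Callable[[str, str], float] = toy_score) -> list[str]:
    """Rescore retrieved candidates and keep the top_k highest-scoring ones."""
    ranked = sorted(candidates, key=lambda passage: score(query, passage), reverse=True)
    return ranked[:top_k]
```

Because the output is always a subset of `candidates`, poor RAG results with a reranker enabled can also point to retrieval returning too few or the wrong chunks.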

Still stuck?

Check LLM FAQ for more solutions
Review the Model Providers documentation
Search GitHub Issues
