# Model Issues

Common problems with model configuration, loading, and generation.
## API key errors

**Symptom:** `401 Unauthorized`, `Invalid API key`, or `Authentication failed`.

**Fix:**

- Verify your API key is set correctly in the TOML config:

  ```toml
  [[models.llms]]
  api_key = "sk-..." # Must be a valid key
  ```

- Or use environment variables:

  ```toml
  [[models.llms]]
  api_key = "${env:OPENAI_API_KEY}"
  ```

  ```bash
  export OPENAI_API_KEY="sk-your-actual-key"
  ```

- Check that the key has sufficient permissions and credits with the provider.
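Beyond provider-side validation, many "invalid key" reports come down to a key that was pasted with quotes or stray whitespace. A minimal local sanity check, assuming OpenAI-style keys that begin with `sk-` (the helper name is illustrative, not part of DB-GPT):

```python
def looks_like_openai_key(key: str) -> bool:
    """Cheap local sanity check: catches empty, quoted, or whitespace-padded keys.
    It cannot verify that the key is actually accepted by the provider."""
    key = key or ""
    return key.startswith("sk-") and key == key.strip() and '"' not in key and len(key) > 10

print(looks_like_openai_key("sk-abcdefghijklmnop"))    # True
print(looks_like_openai_key(' "sk-abcdefghijklmnop"')) # False: quoted and padded
```

A key that fails this check was almost certainly mangled when it was copied into the config or `.env` file.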
## Model not found

**Symptom:** `Model 'xxx' not found` or `No model registered`.

**Fix:**

- Check that the model name in your config matches the provider's expected format:

  | Provider | Example Name |
  |---|---|
  | OpenAI | `chatgpt_proxyllm`, `gpt-4o` |
  | DeepSeek | `deepseek-chat`, `deepseek-reasoner` |
  | Ollama | `qwen2.5:latest` (must be pulled first) |
  | HuggingFace | `THUDM/glm-4-9b-chat-hf` |

- For Ollama, ensure the model is downloaded:

  ```bash
  ollama pull qwen2.5:latest
  ollama list # Verify it appears
  ```

- For cluster deployments, verify workers are registered:

  ```bash
  dbgpt model list
  ```
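For Ollama specifically, you can confirm the configured name matches an installed tag by inspecting the JSON that `curl http://localhost:11434/api/tags` returns. A small sketch, assuming the `{"models": [{"name": ...}]}` response shape of that endpoint:

```python
import json

def installed_models(tags_json: str) -> list[str]:
    """Extract model tags from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Sample response body standing in for the live endpoint output:
sample = '{"models": [{"name": "qwen2.5:latest"}, {"name": "llama3:8b"}]}'
print("qwen2.5:latest" in installed_models(sample))  # True
```

Note that Ollama names are exact, including the tag: `qwen2.5` and `qwen2.5:latest` are not interchangeable in every context.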
## Ollama connection refused

**Symptom:** `Connection refused` when using the Ollama provider.

**Fix:**

- Ensure Ollama is running:

  ```bash
  ollama serve
  # Or check: curl http://localhost:11434/api/tags
  ```

- If running DB-GPT in Docker, use the host network address instead of `localhost`:

  ```toml
  [[models.llms]]
  api_base = "http://host.docker.internal:11434" # Docker for Mac/Windows
  # Or use the host's actual IP address
  ```
## Out of memory (OOM)

**Symptom:** `CUDA out of memory` or `RuntimeError: CUDA error`.

**Fix:**

- Use a smaller model:

  ```toml
  [[models.llms]]
  name = "Qwen2.5-Coder-0.5B-Instruct" # Smaller model
  ```

- Enable quantization:

  ```bash
  dbgpt start worker --model_name ... --load_4bit
  ```

- Restrict the process to specific GPUs:

  ```bash
  CUDA_VISIBLE_DEVICES=0 uv run dbgpt start webserver ...
  ```

- Or switch to an API proxy (no GPU needed):

  ```toml
  [[models.llms]]
  provider = "proxy/openai" # Uses remote API instead of local GPU
  ```
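To judge whether a model can fit at all before trying each option, a back-of-the-envelope estimate is parameters times bytes per parameter, plus some headroom for activations and KV cache. A rough heuristic sketch (the 20% overhead is an assumption, not a measured figure):

```python
def estimate_vram_gb(params_billion: float, bits: int = 16, overhead: float = 1.2) -> float:
    """Rough VRAM estimate for loading model weights: params * bytes-per-param,
    inflated by ~20% for activations and KV cache. A coarse heuristic only."""
    bytes_per_param = bits / 8
    return params_billion * bytes_per_param * overhead

# A 7B model: ~16.8 GB at fp16, but only ~4.2 GB with 4-bit quantization.
print(round(estimate_vram_gb(7, bits=16), 1))  # 16.8
print(round(estimate_vram_gb(7, bits=4), 1))   # 4.2
```

If the fp16 estimate already exceeds your card's VRAM, go straight to `--load_4bit`, a smaller model, or a proxy provider.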
## Slow model responses

**Symptom:** Very slow response times or timeouts.
Possible causes and fixes:
| Cause | Fix |
|---|---|
| Model downloading on first run | Wait for the download to complete (check logs) |
| Insufficient GPU VRAM | Use quantization or a smaller model |
| Slow network to API | Check connectivity to the provider endpoint |
| Large context window | Reduce `max_context_size` in config |
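For the last row, the context limit is set on the model entry itself. A sketch of what that might look like (the model name and provider here are examples; `max_context_size` is the option named above, and its exact placement may vary by deployment):

```toml
[[models.llms]]
name = "deepseek-chat"
provider = "proxy/deepseek"
max_context_size = 4096 # Smaller window -> less prompt to process per request
```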
## Embedding model errors

**Symptom:** `Embedding model not found` or knowledge base operations fail.

**Fix:**

- Ensure an embedding model is configured:

  ```toml
  [[models.embeddings]]
  name = "text-embedding-3-small"
  provider = "proxy/openai"
  api_key = "your-key"
  ```

- For HuggingFace embeddings, ensure the model is downloaded or accessible:

  ```toml
  [[models.embeddings]]
  name = "BAAI/bge-large-zh-v1.5"
  provider = "hf"
  # path = "/path/to/local/model" # Optional: local path
  ```

- Add the HuggingFace extra if using local embeddings:

  ```bash
  uv sync --all-packages --extra "hf" --extra "cpu" ...
  ```
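When the embedding model loads but retrieval quality looks wrong, a quick sanity check is to embed two related texts and compare them with cosine similarity, which is what vector search ranks by. A minimal scorer sketch using made-up 3-dimensional vectors in place of real embedding output:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors standing in for real embedding output:
v1, v2 = [1.0, 0.0, 1.0], [1.0, 0.0, 1.0]
print(round(cosine(v1, v2), 6))  # 1.0 for identical vectors
```

Embeddings of a query and a passage that answers it should score noticeably higher than unrelated pairs; if they do not, the model is likely mismatched with the language or domain of your documents.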
## Reranker not working

**Symptom:** RAG results do not improve with the reranker enabled.

**Fix:**

Ensure a reranker is configured in your TOML:

```toml
[[models.rerankers]]
name = "BAAI/bge-reranker-base"
provider = "hf"
```

Or, for SiliconFlow:

```toml
[[models.rerankers]]
name = "BAAI/bge-reranker-v2-m3"
provider = "proxy/siliconflow"
api_key = "${env:SILICONFLOW_API_KEY}"
```
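Conceptually, a reranker scores each (query, passage) pair with a cross-encoder and reorders the retrieved candidates by that score. The sketch below mimics that flow with a toy word-overlap scorer standing in for the real BGE model, just to show where reranking slots into the pipeline:

```python
def rerank(query: str, docs: list[str], score) -> list[str]:
    """Order candidate passages by relevance score, highest first.
    `score` stands in for the cross-encoder call a real reranker performs."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

# Toy scorer: count of query words appearing in the passage (illustration only).
def toy_score(q: str, d: str) -> int:
    return len(set(q.lower().split()) & set(d.lower().split()))

docs = ["apples grow on trees", "how to reset an api key", "rerankers reorder results"]
print(rerank("reset api key", docs, toy_score)[0])  # "how to reset an api key"
```

If the top results do not change at all with the reranker enabled, check the logs to confirm the reranker model was actually loaded rather than silently skipped.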
## Still stuck?

- Check the LLM FAQ for more solutions
- Review the Model Providers documentation
- Search GitHub Issues