llama.cpp Server
DB-GPT supports the native llama.cpp server, which handles concurrent requests and continuous batching inference.
Install dependencies
You can pass the extra `--extra "llama_cpp_server"` to install the dependencies needed for the llama.cpp server. If you have an NVIDIA GPU, you can enable CUDA support by setting the environment variable `CMAKE_ARGS="-DGGML_CUDA=ON"`.
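Before running the CUDA build, it is worth confirming that the CUDA toolchain is visible in your build environment. A quick sanity check (this assumes nvcc and nvidia-smi are already installed and on your PATH):

```bash
# Both commands should succeed before building with CUDA:
# nvcc confirms the CUDA compiler is on PATH, and nvidia-smi
# confirms the driver can see your GPU.
nvcc --version
nvidia-smi
```

Then run the install command with CUDA enabled: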
```bash
# Use uv to install dependencies needed for llama-cpp
# Install core dependencies and select desired extensions
CMAKE_ARGS="-DGGML_CUDA=ON" uv sync --all-packages \
--extra "base" \
--extra "hf" \
--extra "cuda121" \
--extra "llama_cpp_server" \
--extra "rag" \
--extra "storage_chromadb" \
--extra "quant_bnb" \
--extra "dbgpts"
```
Otherwise, run the following command to install dependencies without CUDA support.
```bash
# Use uv to install dependencies needed for llama-cpp
# Install core dependencies and select desired extensions
uv sync --all-packages \
--extra "base" \
--extra "hf" \
--extra "llama_cpp_server" \
--extra "rag" \
--extra "storage_chromadb" \
--extra "quant_bnb" \
--extra "dbgpts"
```