
HFLLMDeployModelParameters Configuration

Parameters for deploying a model locally with the Hugging Face (`hf`) provider. An example configuration follows the parameter table below.

Parameters

| Name | Type | Description |
|------|------|-------------|
| `name` | string | The name of the model. |
| `path` | string | The path of the model, if you want to deploy a local model. |
| `backend` | string | The real model name to pass to the provider. Defaults to None; if None, `name` is used as the real model name. |
| `device` | string | The device to run the model on. If None, the device is determined automatically. |
| `provider` | string | The provider of the model. If the model is deployed locally, this is the inference type; if it is deployed on a third-party service, this is the platform name (`proxy/<platform>`). Defaults to `hf`. |
| `verbose` | boolean | Show verbose output. Defaults to `False`. |
| `concurrency` | integer | The model concurrency limit. Defaults to `5`. |
| `prompt_template` | string | The prompt template. If None, it is determined automatically from the model. Only used for local deployment. |
| `context_length` | integer | The context length of the model. If None, it is determined automatically from the model. |
| `trust_remote_code` | boolean | Whether to trust remote code. Defaults to `True`. |
| `quantization` | BaseHFQuantization (bitsandbytes, bitsandbytes_8bits, or bitsandbytes_4bits configuration) | The quantization parameters. |
| `low_cpu_mem_usage` | boolean | Whether to use low CPU memory usage mode, which can reduce memory usage when loading the model. Defaults to `True` if you load the model with quantization. Requires `accelerate` to be installed. |
| `num_gpus` | integer | The number of GPUs you expect to use. If empty, use as many of the available GPUs as possible. |
| `max_gpu_memory` | string | The maximum memory limit for each GPU; only valid in a multi-GPU configuration, e.g. `10GiB`, `24GiB`. |
| `torch_dtype` | string | The dtype of the model. Defaults to None. Valid values: `auto`, `float16`, `bfloat16`, `float`, `float32`. |