
BitsandbytesQuantization4bits Configuration

Bitsandbytes 4-bit quantization parameters.

Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `load_in_4bits` | boolean | | Whether to load the model in 4-bit precision. Default: `True` |
| `bnb_4bit_compute_dtype` | string | | Compute data type. To speed up computation, change it from `float32` (the default) to `bfloat16`. Valid values: `bfloat16`, `float16`, `float32` |
| `bnb_4bit_quant_type` | string | | Quantization data type: `fp4` (four-bit float) or `nf4` (normal four-bit float). Only valid when `load_in_4bits=True`. Valid values: `nf4`, `fp4`. Default: `nf4` |
| `bnb_4bit_use_double_quant` | boolean | | Whether to use nested quantization, a technique that saves additional memory at no additional performance cost by performing a second quantization of the already-quantized weights, saving an additional 0.4 bits/parameter. Default: `True` |
| `load_in_8bits` | boolean | | Whether to load the model in 8-bit precision (the LLM.int8() algorithm). Default: `False` |
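As a sketch of how these parameters fit together, the hypothetical dict and validation helper below use the key names and constraints from the table above; the helper is illustrative, not part of any real API, and the assumption that 4-bit and 8-bit loading are mutually exclusive reflects common bitsandbytes usage rather than anything stated in this table.

```python
# Hypothetical configuration using the parameters documented above.
bnb_4bits_config = {
    "load_in_4bits": True,                 # load weights in 4-bit precision
    "bnb_4bit_compute_dtype": "bfloat16",  # compute in bfloat16 to speed up matmuls
    "bnb_4bit_quant_type": "nf4",          # "nf4" (normal four-bit float) or "fp4"
    "bnb_4bit_use_double_quant": True,     # nested quantization: ~0.4 bits/param saved
    "load_in_8bits": False,                # assumption: exclusive with 4-bit loading
}

def validate(config: dict) -> None:
    """Illustrative sanity checks mirroring the constraints in the table."""
    if config.get("load_in_4bits") and config.get("load_in_8bits"):
        raise ValueError("Choose either 4-bit or 8-bit loading, not both.")
    if config.get("bnb_4bit_quant_type") not in ("nf4", "fp4"):
        raise ValueError("bnb_4bit_quant_type must be 'nf4' or 'fp4'.")
    if config.get("bnb_4bit_compute_dtype") not in ("bfloat16", "float16", "float32"):
        raise ValueError("bnb_4bit_compute_dtype must be bfloat16/float16/float32.")

validate(bnb_4bits_config)  # the defaults above pass all checks
```

Setting `bnb_4bit_quant_type` to anything other than `nf4` or `fp4` would raise a `ValueError`, matching the "Valid values" column above.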