BitsandbytesQuantization4bits Configuration
Parameters for bitsandbytes 4-bit quantization.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| load_in_4bits | boolean | ❌ | Whether to load the model in 4-bit precision. Default: `True` |
| bnb_4bit_compute_dtype | string | ❌ | Compute data type used during 4-bit computation. To speed up computation, change it from `float32` (the default) to `bfloat16`. Valid values: `bfloat16`, `float16`, `float32` |
| bnb_4bit_quant_type | string | ❌ | Quantization data type: `fp4` (4-bit float) or `nf4` (normal 4-bit float). Only takes effect when `load_in_4bits=True`. Valid values: `nf4`, `fp4`. Default: `nf4` |
| bnb_4bit_use_double_quant | boolean | ❌ | Whether to use nested quantization, which performs a second quantization of the already quantized weights, saving an additional 0.4 bits per parameter at no additional performance cost. Default: `True` |
| load_in_8bits | boolean | ❌ | Whether to load the model in 8-bit precision instead (the LLM.int8() algorithm). Default: `False` |
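
The parameters above can be assembled and sanity-checked as a plain dictionary. The following sketch is illustrative only: the `make_4bit_config` helper and its validation rules are hypothetical, not part of any library; the valid values and defaults are taken from the table.

```python
# Hypothetical helper: builds a 4-bit quantization config dict matching the
# parameter table above. Not a real library API.
VALID_COMPUTE_DTYPES = {"bfloat16", "float16", "float32"}
VALID_QUANT_TYPES = {"nf4", "fp4"}

def make_4bit_config(
    load_in_4bits: bool = True,           # default from the table
    bnb_4bit_compute_dtype: str = "float32",
    bnb_4bit_quant_type: str = "nf4",
    bnb_4bit_use_double_quant: bool = True,
    load_in_8bits: bool = False,
) -> dict:
    if bnb_4bit_compute_dtype not in VALID_COMPUTE_DTYPES:
        raise ValueError(
            f"bnb_4bit_compute_dtype must be one of {sorted(VALID_COMPUTE_DTYPES)}"
        )
    if bnb_4bit_quant_type not in VALID_QUANT_TYPES:
        raise ValueError(f"bnb_4bit_quant_type must be one of {sorted(VALID_QUANT_TYPES)}")
    if load_in_4bits and load_in_8bits:
        # 4-bit and 8-bit loading are mutually exclusive choices
        raise ValueError("enable either 4-bit or 8-bit loading, not both")
    # The bnb_4bit_* options only take effect when load_in_4bits=True
    return {
        "load_in_4bits": load_in_4bits,
        "bnb_4bit_compute_dtype": bnb_4bit_compute_dtype,
        "bnb_4bit_quant_type": bnb_4bit_quant_type,
        "bnb_4bit_use_double_quant": bnb_4bit_use_double_quant,
        "load_in_8bits": load_in_8bits,
    }

# Example: speed up computation by switching the compute dtype to bfloat16
config = make_4bit_config(bnb_4bit_compute_dtype="bfloat16")
```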