GradientPTQConfig Class¶
The following API can be used to create a GradientPTQConfig instance, which can be used for post-training quantization using knowledge distillation from a teacher (the float model) to a student (the quantized model).
- class model_compression_toolkit.gptq.GradientPTQConfig(n_epochs, loss, optimizer, optimizer_rest, train_bias, hessian_weights_config, gradual_activation_quantization_config, regularization_factor, rounding_type=RoundingType.SoftQuantizer, optimizer_quantization_parameter=None, optimizer_bias=None, log_function=None, gptq_quantizer_params_override=<factory>)¶
Configuration to use for quantization with GradientPTQ.
- Parameters:
n_epochs – Number of representative dataset epochs to train.
loss – The loss to use. See ‘multiple_tensors_mse_loss’ for the expected interface.
optimizer – Optimizer to use.
optimizer_rest – Default optimizer to use for bias and quantizer parameters.
train_bias – Whether to update the bias during training.
hessian_weights_config – A configuration that includes all necessary arguments to run the computation of Hessian scores for the GPTQ loss.
gradual_activation_quantization_config – A configuration for Gradual Activation Quantization.
regularization_factor – A floating point number that defines the regularization factor.
rounding_type – An enum that defines the rounding type.
optimizer_quantization_parameter – Optimizer to use for quantizer parameters, overriding optimizer_rest.
optimizer_bias – Optimizer to use for the bias, overriding optimizer_rest.
log_function – Function to log information about the GPTQ process.
gptq_quantizer_params_override – A dictionary of parameters to override in GPTQ quantizer instantiation.
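Below is a minimal sketch of constructing a GradientPTQConfig directly from the documented parameters. The Adam optimizers, the placeholder loss, and the hyper-parameter values are illustrative assumptions; any loss following the ‘multiple_tensors_mse_loss’ interface can be passed, and in practice a framework-specific helper (e.g., get_keras_gradient_ptq_config) may build this configuration for you.

```python
import tensorflow as tf
from model_compression_toolkit.gptq import (
    GradientPTQConfig,
    GPTQHessianScoresConfig,
    GradualActivationQuantizationConfig,
    RoundingType,
)

def mse_on_tensor_lists(quantized_tensors, float_tensors):
    # Hypothetical placeholder loss: mean MSE over matching intermediate tensors.
    # See 'multiple_tensors_mse_loss' for the interface GPTQ actually expects.
    return tf.add_n([tf.reduce_mean(tf.square(f - q))
                     for f, q in zip(float_tensors, quantized_tensors)]) / len(float_tensors)

gptq_config = GradientPTQConfig(
    n_epochs=5,                                       # passes over the representative dataset
    loss=mse_on_tensor_lists,                         # placeholder loss (assumption)
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    optimizer_rest=tf.keras.optimizers.Adam(learning_rate=1e-4),  # bias/quantizer parameters
    train_bias=True,
    hessian_weights_config=GPTQHessianScoresConfig(per_sample=True,
                                                   hessians_num_samples=None),
    gradual_activation_quantization_config=GradualActivationQuantizationConfig(),
    regularization_factor=0.01,                       # illustrative value
    rounding_type=RoundingType.SoftQuantizer,         # default rounding method
)
```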
GPTQHessianScoresConfig Class¶
The following API can be used to create a GPTQHessianScoresConfig instance, which defines the parameters needed to compute Hessian scores for the GPTQ loss function.
- class model_compression_toolkit.gptq.GPTQHessianScoresConfig(per_sample, hessians_num_samples, norm_scores=None, log_norm=None, scale_log_norm=False, hessian_batch_size=32)¶
Configuration to use for computing the Hessian-based scores for GPTQ loss metric.
- Parameters:
per_sample (bool) – Whether to use a per-sample attention score.
hessians_num_samples (int|None) – Number of samples to use for computing the Hessian-based scores. If None, compute Hessian for all images.
norm_scores (bool) – Whether to normalize the returned scores of the weighted loss function (to get values between 0 and 1).
log_norm (bool) – Whether to use log normalization for the GPTQ Hessian-based scores.
scale_log_norm (bool) – Whether to scale the final vector of the Hessian-based scores.
hessian_batch_size (int) – The Hessian computation batch size. Used only when running GPTQ with a Hessian-based objective.
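For example, a configuration that computes per-sample scores on a limited number of images could look like the following sketch (the sample count and batch size are illustrative assumptions):

```python
from model_compression_toolkit.gptq import GPTQHessianScoresConfig

hessian_cfg = GPTQHessianScoresConfig(
    per_sample=True,           # per-sample attention scores
    hessians_num_samples=64,   # estimate Hessian scores on 64 images (None = all images)
    hessian_batch_size=16,     # batch size used for the Hessian computation
)
```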
RoundingType¶
- class model_compression_toolkit.gptq.RoundingType(value)¶
An enum for choosing the GPTQ rounding method:
STE - Straight-Through Estimator rounding.
SoftQuantizer - soft quantizer rounding (the default).
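For example, the rounding method passed to GradientPTQConfig can be overridden with a member of this enum (a small illustrative sketch):

```python
from model_compression_toolkit.gptq import RoundingType

# Use straight-through-estimator rounding instead of the default SoftQuantizer.
rounding = RoundingType.STE
```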
GradualActivationQuantizationConfig¶
The following API can be used to configure gradual activation quantization when using GPTQ.
- class model_compression_toolkit.gptq.GradualActivationQuantizationConfig(q_fraction_scheduler_policy=<factory>)¶
Configuration for Gradual Activation Quantization.
By default, the quantized fraction increases linearly from 0 to 1 throughout the training.
- Parameters:
q_fraction_scheduler_policy – Config for scheduling the quantized fraction. Only linear annealing is currently supported.
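Constructed with no arguments, the default linear schedule is used (a minimal sketch):

```python
from model_compression_toolkit.gptq import GradualActivationQuantizationConfig

# Default policy: the quantized fraction grows linearly from 0 to 1 over training.
gradual_cfg = GradualActivationQuantizationConfig()
```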
QFractionLinearAnnealingConfig¶
- class model_compression_toolkit.gptq.QFractionLinearAnnealingConfig(initial_q_fraction, target_q_fraction, start_step, end_step)¶
Config for the quantized fraction linear scheduler of Gradual Activation Quantization.
- Parameters:
initial_q_fraction – Initial quantized fraction.
target_q_fraction – Target quantized fraction.
start_step – Gradient step at which to begin annealing.
end_step – Gradient step at which to complete annealing. None means the last step.
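A custom schedule can be passed as the scheduler policy of GradualActivationQuantizationConfig; the gradient-step values below are illustrative assumptions:

```python
from model_compression_toolkit.gptq import (
    GradualActivationQuantizationConfig,
    QFractionLinearAnnealingConfig,
)

# Anneal the quantized fraction from 0.0 to 1.0 between gradient steps 100 and 500.
annealing = QFractionLinearAnnealingConfig(
    initial_q_fraction=0.0,
    target_q_fraction=1.0,
    start_step=100,
    end_step=500,
)
gradual_cfg = GradualActivationQuantizationConfig(q_fraction_scheduler_policy=annealing)
```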