GradientPTQConfig Class¶
The following API can be used to create a GradientPTQConfig instance, which can be used for post-training quantization using knowledge distillation from a teacher (the float model) to a student (the quantized model).
- class model_compression_toolkit.gptq.GradientPTQConfig(n_epochs, loss, optimizer, optimizer_rest, train_bias, hessian_weights_config, gradual_activation_quantization_config, regularization_factor, rounding_type=RoundingType.SoftQuantizer, optimizer_quantization_parameter=None, optimizer_bias=None, log_function=None, gptq_quantizer_params_override=<factory>)¶
Configuration to use for quantization with GradientPTQ.
- Parameters:
n_epochs – Number of representative dataset epochs to train.
loss – The loss to use. See ‘multiple_tensors_mse_loss’ for the expected interface.
optimizer – Optimizer to use.
optimizer_rest – Default optimizer to use for bias and quantizer parameters.
train_bias – Whether to update the bias during training.
hessian_weights_config – A configuration that includes all necessary arguments to run the computation of Hessian scores for the GPTQ loss.
gradual_activation_quantization_config – A configuration for Gradual Activation Quantization.
regularization_factor – A floating point number that defines the regularization factor.
rounding_type – An enum that defines the rounding type.
optimizer_quantization_parameter – Optimizer to use for quantizer parameters, overriding optimizer_rest.
optimizer_bias – Optimizer to use for the bias, overriding optimizer_rest.
log_function – Function to log information about the GPTQ process.
gptq_quantizer_params_override – A dictionary of parameters to override in GPTQ quantizer instantiation.
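Below is a minimal sketch of constructing a GradientPTQConfig directly from the documented parameters. The Adam optimizers, the placeholder loss, and the hyper-parameter values are illustrative assumptions; any loss following the ‘multiple_tensors_mse_loss’ interface can be passed, and in practice a framework-specific helper (e.g., get_keras_gradient_ptq_config) may build this configuration for you.

```python
import tensorflow as tf
from model_compression_toolkit.gptq import (
    GradientPTQConfig,
    GPTQHessianScoresConfig,
    GradualActivationQuantizationConfig,
    RoundingType,
)

def mse_on_tensor_lists(quantized_tensors, float_tensors):
    # Hypothetical placeholder loss: mean MSE over matching intermediate tensors.
    # See 'multiple_tensors_mse_loss' for the interface GPTQ actually expects.
    return tf.add_n([tf.reduce_mean(tf.square(f - q))
                     for f, q in zip(float_tensors, quantized_tensors)]) / len(float_tensors)

gptq_config = GradientPTQConfig(
    n_epochs=5,                                       # passes over the representative dataset
    loss=mse_on_tensor_lists,                         # placeholder loss (assumption)
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    optimizer_rest=tf.keras.optimizers.Adam(learning_rate=1e-4),  # bias/quantizer parameters
    train_bias=True,
    hessian_weights_config=GPTQHessianScoresConfig(per_sample=True,
                                                   hessians_num_samples=None),
    gradual_activation_quantization_config=GradualActivationQuantizationConfig(),
    regularization_factor=0.01,                       # illustrative value
    rounding_type=RoundingType.SoftQuantizer,         # default rounding method
)
```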
GPTQHessianScoresConfig Class¶
The following API can be used to create a GPTQHessianScoresConfig instance, which defines the parameters needed to compute Hessian scores for the GPTQ loss function.
- class model_compression_toolkit.gptq.GPTQHessianScoresConfig(per_sample, hessians_num_samples, norm_scores=None, log_norm=None, scale_log_norm=False, hessian_batch_size=32)¶
Configuration to use for computing the Hessian-based scores for GPTQ loss metric.
- Parameters:
per_sample (bool) – Whether to use a per-sample attention score.
hessians_num_samples (int|None) – Number of samples to use for computing the Hessian-based scores. If None, compute Hessian for all images.
norm_scores (bool) – Whether to normalize the returned scores of the weighted loss function (to get values between 0 and 1).
log_norm (bool) – Whether to use log normalization for the GPTQ Hessian-based scores.
scale_log_norm (bool) – Whether to scale the final vector of the Hessian-based scores.
hessian_batch_size (int) – The Hessian computation batch size. Used only when running GPTQ with a Hessian-based objective.
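For example, a configuration that computes per-sample scores on a limited number of images could look like the following sketch (the sample count and batch size are illustrative assumptions):

```python
from model_compression_toolkit.gptq import GPTQHessianScoresConfig

hessian_cfg = GPTQHessianScoresConfig(
    per_sample=True,           # per-sample attention scores
    hessians_num_samples=64,   # estimate Hessian scores on 64 images (None = all images)
    hessian_batch_size=16,     # batch size used for the Hessian computation
)
```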
RoundingType¶
- class model_compression_toolkit.gptq.RoundingType(value)¶
An enum for choosing the GPTQ rounding method:
STE - Straight-Through Estimator rounding.
SoftQuantizer - soft quantizer rounding (the default).
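For example, the rounding method passed to GradientPTQConfig can be overridden with a member of this enum (a small illustrative sketch):

```python
from model_compression_toolkit.gptq import RoundingType

# Use straight-through-estimator rounding instead of the default SoftQuantizer.
rounding = RoundingType.STE
```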
GradualActivationQuantizationConfig¶
The following API can be used to configure gradual activation quantization when using GPTQ.
- class model_compression_toolkit.gptq.GradualActivationQuantizationConfig(q_fraction_scheduler_policy=<factory>)¶
Configuration for Gradual Activation Quantization.
By default, the quantized fraction increases linearly from 0 to 1 throughout the training.
- Parameters:
q_fraction_scheduler_policy – Config for scheduling the quantized fraction. Only linear annealing is currently supported.
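Constructed with no arguments, the default linear schedule is used (a minimal sketch):

```python
from model_compression_toolkit.gptq import GradualActivationQuantizationConfig

# Default policy: the quantized fraction grows linearly from 0 to 1 over training.
gradual_cfg = GradualActivationQuantizationConfig()
```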
QFractionLinearAnnealingConfig¶
- class model_compression_toolkit.gptq.QFractionLinearAnnealingConfig(initial_q_fraction, target_q_fraction, start_step, end_step)¶
Config for the quantized fraction linear scheduler of Gradual Activation Quantization.
- Parameters:
initial_q_fraction – Initial quantized fraction.
target_q_fraction – Target quantized fraction.
start_step – Gradient step at which to begin annealing.
end_step – Gradient step at which to complete annealing. None means the last step.
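A custom schedule can be passed as the scheduler policy of GradualActivationQuantizationConfig; the gradient-step values below are illustrative assumptions:

```python
from model_compression_toolkit.gptq import (
    GradualActivationQuantizationConfig,
    QFractionLinearAnnealingConfig,
)

# Anneal the quantized fraction from 0.0 to 1.0 between gradient steps 100 and 500.
annealing = QFractionLinearAnnealingConfig(
    initial_q_fraction=0.0,
    target_q_fraction=1.0,
    start_step=100,
    end_step=500,
)
gradual_cfg = GradualActivationQuantizationConfig(q_fraction_scheduler_policy=annealing)
```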