GradientPTQConfig Class

The following API can be used to create a GradientPTQConfig instance, which configures post-training quantization using knowledge distillation from a teacher (the float model) to a student (the quantized model).

class model_compression_toolkit.gptq.GradientPTQConfig(n_epochs, loss, optimizer, optimizer_rest, train_bias, hessian_weights_config, gradual_activation_quantization_config, regularization_factor, rounding_type=RoundingType.SoftQuantizer, optimizer_quantization_parameter=None, optimizer_bias=None, log_function=None, gptq_quantizer_params_override=<factory>)

Configuration to use for quantization with GradientPTQ.

Parameters:
  • n_epochs – Number of representative dataset epochs to train.

  • loss – The loss to use. See ‘multiple_tensors_mse_loss’ for the expected interface.

  • optimizer – Optimizer to use.

  • optimizer_rest – Default optimizer to use for bias and quantizer parameters.

  • train_bias – Whether to update the bias during training.

  • hessian_weights_config – A configuration that includes all the arguments needed to compute Hessian scores for the GPTQ loss.

  • gradual_activation_quantization_config – A configuration for Gradual Activation Quantization.

  • regularization_factor – A floating point number that defines the regularization factor.

  • rounding_type – An enum that defines the rounding type.

  • optimizer_quantization_parameter – Optimizer that overrides optimizer_rest for quantizer parameters.

  • optimizer_bias – Optimizer that overrides optimizer_rest for the bias.

  • log_function – Function to log information about the GPTQ process.

  • gptq_quantizer_params_override – A dictionary of parameters to override in GPTQ quantizer instantiation.
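
As an illustration only, here is a minimal sketch of constructing a GradientPTQConfig directly from the signature above. The Keras optimizers, epoch count, sample counts, regularization factor, and the placeholder loss are assumptions (use the optimizer type matching your framework); a real loss should follow the 'multiple_tensors_mse_loss' interface mentioned above. GPTQHessianScoresConfig and GradualActivationQuantizationConfig are documented below:

    import tensorflow as tf
    from model_compression_toolkit.gptq import (GradientPTQConfig,
                                                GPTQHessianScoresConfig,
                                                GradualActivationQuantizationConfig)

    # Placeholder loss: a real loss must follow the 'multiple_tensors_mse_loss'
    # interface referenced above; this stub only marks where it plugs in.
    def my_gptq_loss(*args, **kwargs):
        raise NotImplementedError("Provide a loss with the expected interface")

    gptq_config = GradientPTQConfig(
        n_epochs=5,                                  # representative-dataset epochs (illustrative)
        loss=my_gptq_loss,
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),       # main GPTQ optimizer (assumed Keras)
        optimizer_rest=tf.keras.optimizers.Adam(learning_rate=1e-4),  # default optimizer for bias/quantizer params
        train_bias=True,
        hessian_weights_config=GPTQHessianScoresConfig(per_sample=False,
                                                       hessians_num_samples=16),
        gradual_activation_quantization_config=GradualActivationQuantizationConfig(),
        regularization_factor=0.01,                  # illustrative value
    )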

GPTQHessianScoresConfig Class

The following API can be used to create a GPTQHessianScoresConfig instance, which defines the parameters needed to compute Hessian scores for the GPTQ loss function.

class model_compression_toolkit.gptq.GPTQHessianScoresConfig(per_sample, hessians_num_samples, norm_scores=None, log_norm=None, scale_log_norm=False, hessian_batch_size=32)

Configuration to use for computing the Hessian-based scores for GPTQ loss metric.

Parameters:
  • per_sample (bool) – Whether to use per-sample attention scores.

  • hessians_num_samples (int|None) – Number of samples to use for computing the Hessian-based scores. If None, the Hessian is computed for all images.

  • norm_scores (bool) – Whether to normalize the returned scores of the weighted loss function (to get values between 0 and 1).

  • log_norm (bool) – Whether to use log normalization for the GPTQ Hessian-based scores.

  • scale_log_norm (bool) – Whether to scale the final vector of the Hessian-based scores.

  • hessian_batch_size (int) – The Hessian computation batch size. Used only when running GPTQ with a Hessian-based objective.
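
For illustration, a short sketch of a GPTQHessianScoresConfig built from the signature above; the specific values are assumptions, not recommendations:

    from model_compression_toolkit.gptq import GPTQHessianScoresConfig

    # Per-sample attention scores over 32 representative samples, computed in
    # batches of 8 (illustrative values). Passing hessians_num_samples=None
    # would compute the Hessian over all images.
    hessian_cfg = GPTQHessianScoresConfig(per_sample=True,
                                          hessians_num_samples=32,
                                          hessian_batch_size=8)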

RoundingType

class model_compression_toolkit.gptq.RoundingType(value)

An enum for choosing the GPTQ rounding method:

STE - Straight-Through Estimator rounding.

SoftQuantizer - Soft quantizer rounding (the default).
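
The enum is passed to GradientPTQConfig through its rounding_type argument; a brief sketch (SoftQuantizer is the default, per the signature above):

    from model_compression_toolkit.gptq import RoundingType

    # Select straight-through-estimator rounding instead of the default
    # SoftQuantizer, e.g. GradientPTQConfig(..., rounding_type=RoundingType.STE).
    rounding = RoundingType.STE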

GradualActivationQuantizationConfig

The following API can be used to configure gradual activation quantization when using GPTQ.

class model_compression_toolkit.gptq.GradualActivationQuantizationConfig(q_fraction_scheduler_policy=<factory>)

Configuration for Gradual Activation Quantization.

By default, the quantized fraction increases linearly from 0 to 1 throughout the training.

Parameters:
  • q_fraction_scheduler_policy – Config for scheduling of the quantized fraction. Only linear annealing is currently supported.
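
As a minimal illustration, the default configuration can be created with no arguments; a custom schedule built with QFractionLinearAnnealingConfig is sketched after the next class description:

    from model_compression_toolkit.gptq import GradualActivationQuantizationConfig

    # Default policy: the quantized fraction grows linearly from 0 to 1
    # throughout the training.
    gradual_cfg = GradualActivationQuantizationConfig()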

QFractionLinearAnnealingConfig

class model_compression_toolkit.gptq.QFractionLinearAnnealingConfig(initial_q_fraction, target_q_fraction, start_step, end_step)

Config for the quantized fraction linear scheduler of Gradual Activation Quantization.

Parameters:
  • initial_q_fraction – Initial quantized fraction.

  • target_q_fraction – Target quantized fraction.

  • start_step – Gradient step to begin annealing.

  • end_step – Gradient step to complete annealing. None means the last step.
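
A sketch, under illustrative step counts and fractions, of wiring a linear annealing schedule into gradual activation quantization and then into the GPTQ configuration:

    from model_compression_toolkit.gptq import (GradualActivationQuantizationConfig,
                                                QFractionLinearAnnealingConfig)

    # Anneal the quantized fraction of activations from 0 to 1 over the first
    # 1000 gradient steps, then keep it fully quantized (values are illustrative).
    annealing = QFractionLinearAnnealingConfig(initial_q_fraction=0.0,
                                               target_q_fraction=1.0,
                                               start_step=0,
                                               end_step=1000)  # None would anneal until the last step

    gradual_cfg = GradualActivationQuantizationConfig(q_fraction_scheduler_policy=annealing)
    # gradual_cfg is then passed to GradientPTQConfig via its
    # gradual_activation_quantization_config argument.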