trainable_infrastructure Module

The trainable infrastructure is a module containing quantization abstractions and quantizers for hardware-oriented model optimization tools. It provides the abstraction required by trainable quantization methods such as quantization-aware training, and builds on the Inferable Quantizers Infrastructure provided by the MCT Quantizers package, which supplies the abstraction required for emulating inference-time quantization.

When using a trainable quantizer, each layer with quantized weights is wrapped in a “Quantization Wrapper” object, and each activation quantizer is stored in an “Activation Quantization Holder” object. Both components are provided by the MCT Quantizers package.
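To make the roles of these two components concrete, here is a minimal, self-contained sketch of the wrapping pattern in plain Python. The class and function names below are illustrative only and do not match the actual MCT Quantizers API; the fake-quantizer here uses a fixed symmetric threshold for simplicity.

```python
class FakeQuantizer:
    """Toy symmetric quantize-dequantize with a fixed threshold (illustrative only)."""
    def __init__(self, n_bits=8, threshold=1.0):
        self.n_bits = n_bits
        self.threshold = threshold

    def __call__(self, x):
        # Map x onto 2**n_bits symmetric levels in [-threshold, threshold).
        scale = self.threshold / (2 ** (self.n_bits - 1))
        q = round(max(min(x / scale, 2 ** (self.n_bits - 1) - 1),
                      -2 ** (self.n_bits - 1)))
        return q * scale


class QuantizationWrapper:
    """Sketch of a wrapper: the layer's weights are quantized before the layer runs."""
    def __init__(self, layer_fn, weight, weight_quantizer):
        self.layer_fn = layer_fn
        self.weight = weight
        self.weight_quantizer = weight_quantizer

    def __call__(self, x):
        return self.layer_fn(x, self.weight_quantizer(self.weight))


class ActivationQuantizationHolder:
    """Sketch of a holder: stores an activation quantizer and applies it to outputs."""
    def __init__(self, activation_quantizer):
        self.activation_quantizer = activation_quantizer

    def __call__(self, x):
        return self.activation_quantizer(x)


# A "layer" that multiplies its input by a weight, wrapped so the weight is
# quantized, with an activation holder quantizing the result.
wrapped = QuantizationWrapper(lambda x, w: x * w, 0.37, FakeQuantizer())
holder = ActivationQuantizationHolder(FakeQuantizer())
out = holder(wrapped(2.0))
```

The point of the pattern is the separation of concerns: the wrapper owns weight quantization, while the holder owns activation quantization, so each can be swapped or trained independently.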

The quantizers in this module build upon the “Inferable Quantizer” abstraction (from MCT Quantizers) and define the “Trainable Quantizer” framework, which contains learnable quantization parameters that are optimized during training.

Below, we explain how a trainable quantizer is built and used: we start with the basic building block of a trainable quantizer, and then show how to initialize it using a configuration object.
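Before diving into the classes, here is a toy illustration of what "learnable quantization parameters" means, in plain Python (not the MCT API): the quantization threshold is treated as a trainable parameter and tuned with a simple finite-difference gradient step to reduce the quantization error on a small tensor. All names here are hypothetical.

```python
class ToyTrainableQuantizer:
    """Symmetric quantize-dequantize whose threshold is a learnable parameter."""
    def __init__(self, n_bits=4, threshold=1.0):
        self.n_bits = n_bits
        self.threshold = threshold  # learnable quantization parameter

    def __call__(self, xs):
        scale = self.threshold / (2 ** (self.n_bits - 1))
        lo, hi = -2 ** (self.n_bits - 1), 2 ** (self.n_bits - 1) - 1
        return [max(min(round(x / scale), hi), lo) * scale for x in xs]


def quant_error(q, xs):
    # Sum of squared quantization errors over the tensor.
    return sum((x - y) ** 2 for x, y in zip(xs, q(xs)))


def train_threshold(q, xs, lr=0.05, steps=50, eps=1e-3):
    # Crude gradient descent on the threshold via central finite differences;
    # real trainable quantizers would use autograd instead.
    for _ in range(steps):
        t = q.threshold
        q.threshold = t + eps
        e_plus = quant_error(q, xs)
        q.threshold = t - eps
        e_minus = quant_error(q, xs)
        q.threshold = t - lr * (e_plus - e_minus) / (2 * eps)
    return q.threshold
```

Starting from a threshold that is too large for the data (so most of the quantization grid is wasted), training shrinks the threshold and the quantization error drops; this is the kind of optimization the trainable-quantizer framework enables, with the actual gradients supplied by Keras or PyTorch.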

BaseKerasTrainableQuantizer

This class is the base class for trainable Keras quantizers. It validates the provided quantization config, defines an abstract function that any quantizer needs to implement, and adds get_config and from_config functions to the base quantizer to enable saving and loading of the Keras model.

class model_compression_toolkit.trainable_infrastructure.BaseKerasTrainableQuantizer(quantization_config)

This class is a base quantizer which validates the provided quantization config and defines an abstract function which any quantizer needs to implement. It adds get_config and from_config functions to the base quantizer to enable saving and loading of the Keras model.

Parameters:

quantization_config – quantizer config class containing all the information about a quantizer configuration.
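The get_config/from_config pair is the standard Keras mechanism for serializing custom objects. Here is a minimal, hypothetical sketch of that pattern (the names are illustrative, not the exact MCT signatures):

```python
class SketchTrainableQuantizer:
    """Illustrative sketch of the serialization pattern the base class provides."""
    def __init__(self, quantization_config):
        # The real base class also validates the config; here we just store it.
        self.quantization_config = quantization_config

    def get_config(self):
        # Return everything needed to rebuild the quantizer when saving the model.
        return {"quantization_config": self.quantization_config}

    @classmethod
    def from_config(cls, config):
        # Rebuild the quantizer from its serialized form when loading the model.
        return cls(**config)
```

Keras calls get_config when saving a model containing the quantizer and from_config when loading it back, so a subclass that adds its own constructor arguments must extend get_config accordingly.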

BasePytorchTrainableQuantizer

This class is the base class for trainable PyTorch quantizers. It validates the provided quantization config, defines an abstract function that any quantizer needs to implement, and adds get_config and from_config functions to the base quantizer to enable saving and loading of the PyTorch model.

class model_compression_toolkit.trainable_infrastructure.BasePytorchTrainableQuantizer(quantization_config)

This class is a base PyTorch quantizer which validates the provided quantization config and defines an abstract function which any quantizer needs to implement.

Parameters:

quantization_config – quantizer config class containing all the information about the quantizer configuration.
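Both base classes are described as validating the provided quantization config. A plain-Python sketch of what such validation might look like (the class names below are hypothetical, not MCT's actual code): a weights quantizer must receive a weights config, and an activation quantizer an activation config.

```python
class WeightsConfig:
    """Stand-in for a weights quantizer configuration."""


class ActivationConfig:
    """Stand-in for an activation quantizer configuration."""


class SketchBaseQuantizer:
    """Illustrative base quantizer that rejects a config of the wrong kind."""
    EXPECTED_CONFIG = None  # set by subclasses

    def __init__(self, quantization_config):
        if not isinstance(quantization_config, self.EXPECTED_CONFIG):
            raise TypeError(
                f"Expected {self.EXPECTED_CONFIG.__name__}, "
                f"got {type(quantization_config).__name__}")
        self.quantization_config = quantization_config


class SketchWeightsQuantizer(SketchBaseQuantizer):
    EXPECTED_CONFIG = WeightsConfig
```

Validating at construction time means a mismatched configuration fails immediately and loudly, rather than producing a silently mis-quantized model during training.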

TrainableQuantizerWeightsConfig

This configuration object contains the necessary attributes for configuring a weights trainable quantizer.

class model_compression_toolkit.trainable_infrastructure.TrainableQuantizerWeightsConfig(weights_quantization_method, weights_n_bits, weights_quantization_params, enable_weights_quantization, weights_channels_axis, weights_per_channel_threshold, min_threshold, weights_quantization_candidates=None)

Attributes for configuring weights trainable quantizer.

Parameters:
  • weights_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for weights quantization.

  • weights_n_bits (int) – Number of bits to quantize the coefficients.

  • weights_quantization_params (Dict) – Dictionary that contains weights quantization params.

  • enable_weights_quantization (bool) – Whether to quantize the layer’s weights or not.

  • weights_channels_axis (int) – Axis to quantize a node’s kernel when quantizing per-channel.

  • weights_per_channel_threshold (bool) – Whether to quantize the weights per-channel (True) or per-tensor (False).

  • min_threshold (float) – Minimum threshold to use during thresholds selection.

For example, we can set a trainable weights quantizer with the following configuration:

from model_compression_toolkit.target_platform_capabilities.target_platform import QuantizationMethod
from model_compression_toolkit.constants import THRESHOLD, MIN_THRESHOLD

TrainableQuantizerWeightsConfig(weights_quantization_method=QuantizationMethod.SYMMETRIC,
                                weights_n_bits=8,
                                weights_quantization_params={THRESHOLD: 2.0},
                                enable_weights_quantization=True,
                                weights_channels_axis=3,
                                weights_per_channel_threshold=True,
                                min_threshold=MIN_THRESHOLD)

TrainableQuantizerActivationConfig

This configuration object contains the necessary attributes for configuring an activation trainable quantizer.

class model_compression_toolkit.trainable_infrastructure.TrainableQuantizerActivationConfig(activation_quantization_method, activation_n_bits, activation_quantization_params, enable_activation_quantization, min_threshold, activation_quantization_candidates=None)

Attributes for configuring activations trainable quantizer.

Parameters:
  • activation_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for activation quantization.

  • activation_n_bits (int) – Number of bits to quantize the activations.

  • activation_quantization_params (Dict) – Dictionary that contains activation quantization params.

  • enable_activation_quantization (bool) – Whether to quantize the layer’s activations or not.

  • min_threshold (float) – Minimum threshold to use during thresholds selection.

For example, we can set a trainable activation quantizer with the following configuration:

from model_compression_toolkit.target_platform_capabilities.target_platform import QuantizationMethod
from model_compression_toolkit.constants import THRESHOLD, MIN_THRESHOLD

TrainableQuantizerActivationConfig(activation_quantization_method=QuantizationMethod.UNIFORM,
                                   activation_n_bits=8,
                                   activation_quantization_params={THRESHOLD: 2.0},
                                   enable_activation_quantization=True,
                                   min_threshold=MIN_THRESHOLD)