target_platform Module

MCT can be configured to quantize and optimize models for different hardware settings. For example, when using the qnnpack backend for PyTorch model inference, the PyTorch quantization configuration uses per-tensor weights quantization for Conv2d, while TensorFlow's TFLite configuration uses per-channel weights quantization for Conv2D.

This can be addressed in MCT with the target_platform module, which configures hardware-related parameters that the optimization process uses to optimize the model accordingly. Models for IMX500, TFLite and qnnpack can be observed here, and can be retrieved using the get_target_platform_capabilities function.
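
For example, a minimal sketch of retrieving a ready-made TPC (the framework and platform name strings are assumptions; check which names your MCT version exposes):

    import model_compression_toolkit as mct

    # Retrieve the TPC that models the IMX500 platform for TensorFlow/Keras models.
    tpc = mct.get_target_platform_capabilities('tensorflow', 'imx500')

    # The returned object is passed to MCT's quantization entry points, e.g.
    # mct.ptq.keras_post_training_quantization(..., target_platform_capabilities=tpc).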


Note

For now, some fields of OpQuantizationConfig are ignored during the optimization process such as quantization_preserving, fixed_scale, and fixed_zero_point.


The object MCT should receive is called TargetPlatformCapabilities (or TPC for short). This diagram demonstrates its main components:

[Diagram: the main components of the TargetPlatformCapabilities object]

The following sections describe each of these components.

QuantizationMethod

class model_compression_toolkit.target_platform.QuantizationMethod(value)

Method for quantization function selection:

POWER_OF_TWO - Symmetric, uniform quantization with power-of-two thresholds.

KMEANS - k-means quantization.

LUT_POT_QUANTIZER - Quantization using a lookup table and a power-of-two threshold.

SYMMETRIC - Symmetric, uniform quantization.

UNIFORM - Uniform quantization.

LUT_SYM_QUANTIZER - Quantization using a lookup table and a symmetric threshold.
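
These members are plain enum values that are passed wherever a quantization method is expected, for example (a sketch):

    from model_compression_toolkit.target_platform import QuantizationMethod

    # Choose symmetric quantization for weights and
    # power-of-two thresholds for activations.
    weights_method = QuantizationMethod.SYMMETRIC
    activation_method = QuantizationMethod.POWER_OF_TWO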

OpQuantizationConfig

class model_compression_toolkit.target_platform.OpQuantizationConfig(default_weight_attr_config, attr_weights_configs_mapping, activation_quantization_method, activation_n_bits, enable_activation_quantization, quantization_preserving, fixed_scale, fixed_zero_point, simd_size)

OpQuantizationConfig is a class to configure the quantization parameters of an operator.

Parameters:
  • default_weight_attr_config (AttributeQuantizationConfig) – A default attribute quantization configuration for the operation.

  • attr_weights_configs_mapping (Dict[str, AttributeQuantizationConfig]) – A mapping between an op attribute name and its quantization configuration.

  • activation_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for activation quantization.

  • activation_n_bits (int) – Number of bits to quantize the activations.

  • enable_activation_quantization (bool) – Whether to quantize the model activations or not.

  • quantization_preserving (bool) – Whether quantization parameters should be the same for an operator’s input and output.

  • fixed_scale (float) – Scale to use for the operator's quantization parameters.

  • fixed_zero_point (int) – Zero-point to use for the operator's quantization parameters.

  • simd_size (int) – Per op integer representing the Single Instruction, Multiple Data (SIMD) width of an operator. It indicates the number of data elements that can be fetched and processed simultaneously in a single instruction.

AttributeQuantizationConfig

class model_compression_toolkit.target_platform.AttributeQuantizationConfig(weights_quantization_method=QuantizationMethod.POWER_OF_TWO, weights_n_bits=FLOAT_BITWIDTH, weights_per_channel_threshold=False, enable_weights_quantization=False, lut_values_bitwidth=None)

Hold the quantization configuration of a weight attribute of a layer.

Initializes an attribute quantization config.

Parameters:
  • weights_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for weights quantization.

  • weights_n_bits (int) – Number of bits to quantize the coefficients.

  • weights_per_channel_threshold (bool) – Whether to quantize the weights per-channel or not (per-tensor).

  • enable_weights_quantization (bool) – Whether to quantize the model weights or not.

  • lut_values_bitwidth (int) – Number of bits to use when quantizing in look-up-table.
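
The two classes compose as follows; a minimal sketch matching the signatures above (the attribute name 'kernel' and all numeric values are illustrative, not a recommended setting):

    from model_compression_toolkit.target_platform import (
        AttributeQuantizationConfig, OpQuantizationConfig, QuantizationMethod)

    # 8-bit, per-channel, symmetric quantization for the op's kernel attribute.
    kernel_cfg = AttributeQuantizationConfig(
        weights_quantization_method=QuantizationMethod.SYMMETRIC,
        weights_n_bits=8,
        weights_per_channel_threshold=True,
        enable_weights_quantization=True)

    # Attributes not listed in the mapping fall back to this config (unquantized).
    default_attr_cfg = AttributeQuantizationConfig(enable_weights_quantization=False)

    op_cfg = OpQuantizationConfig(
        default_weight_attr_config=default_attr_cfg,
        attr_weights_configs_mapping={'kernel': kernel_cfg},
        activation_quantization_method=QuantizationMethod.POWER_OF_TWO,
        activation_n_bits=8,
        enable_activation_quantization=True,
        quantization_preserving=False,
        fixed_scale=None,
        fixed_zero_point=None,
        simd_size=32)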

QuantizationConfigOptions

class model_compression_toolkit.target_platform.QuantizationConfigOptions(quantization_config_list, base_config=None)

Wrap a set of quantization configurations to consider during the quantization of an operator.

Parameters:
  • quantization_config_list (List[OpQuantizationConfig]) – List of possible OpQuantizationConfig to gather.

  • base_config (OpQuantizationConfig) – Fallback OpQuantizationConfig to use when optimizing the model in a non-mixed-precision manner.
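
For example, to let a mixed-precision search choose between two bit-width candidates while falling back to the 8-bit config otherwise (a sketch; op_cfg is the config from the previous example, and op_cfg_4bit is assumed to be a 4-bit variant built the same way):

    from model_compression_toolkit.target_platform import QuantizationConfigOptions

    mixed_qco = QuantizationConfigOptions(
        [op_cfg, op_cfg_4bit],  # candidates for the mixed-precision search
        base_config=op_cfg)     # used when not running mixed-precision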

TargetPlatformModel

class model_compression_toolkit.target_platform.TargetPlatformModel(default_qco, name='default_tp_model')

Modeling of the hardware the quantized model will use during inference. The model contains the definitions of operators, their quantization configurations, and fusing patterns that combine multiple operators into a single operator.

Parameters:
  • default_qco (QuantizationConfigOptions) – Default QuantizationConfigOptions to use for operators whose QuantizationConfigOptions are not defined in the model.

  • name (str) – Name of the model.
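
Operator sets and fusing patterns (described next) are declared inside the model's context manager. A sketch, reusing op_cfg and mixed_qco from the examples above:

    from model_compression_toolkit import target_platform as tp

    default_qco = tp.QuantizationConfigOptions([op_cfg])
    tp_model = tp.TargetPlatformModel(default_qco, name='my_tp_model')
    with tp_model:
        # Any operator without a dedicated OperatorsSet falls back to default_qco.
        conv_set = tp.OperatorsSet('Conv', mixed_qco)
        relu_set = tp.OperatorsSet('ReLU')
        swish_set = tp.OperatorsSet('Swish')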

OperatorsSet

class model_compression_toolkit.target_platform.OperatorsSet(name, qc_options=None)

Set of operators that are represented by a unique label.

Parameters:
  • name (str) – Set’s label (must be unique in a TargetPlatformModel).

  • qc_options (QuantizationConfigOptions) – Configuration options to use for this set of operations.

Fusing

class model_compression_toolkit.target_platform.Fusing(operator_groups_list, name=None)

Fusing defines a list of operators that should be combined and treated as a single operator, hence no quantization is applied between them.

Parameters:
  • operator_groups_list (List[Union[OperatorsSet, OperatorSetConcat]]) – A list of operator groups, each being either an OperatorSetConcat or an OperatorsSet.

  • name (str) – The name for the Fusing instance. If not provided, it’s generated from the operator groups’ names.
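
Continuing the sketch above, inside the same with tp_model: block:

        # Conv followed by ReLU executes as one fused operator on the target,
        # so no activation quantization is inserted between them.
        tp.Fusing([conv_set, relu_set])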

OperatorSetConcat

class model_compression_toolkit.target_platform.OperatorSetConcat(*opsets)

Concatenate a list of operator sets to treat them similarly in different places (like fusing).

Group a list of operation sets.

Parameters:
  • *opsets (OperatorsSet) – List of operator sets to group.
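
For example, to declare a single fusing pattern for Conv followed by either ReLU or Swish (still inside the model context from the sketches above):

        any_activation = tp.OperatorSetConcat(relu_set, swish_set)
        tp.Fusing([conv_set, any_activation])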

OperationsToLayers

class model_compression_toolkit.target_platform.OperationsToLayers(op_sets_to_layers=None)

Gather multiple OperationsSetToLayers to represent the mapping of a framework's layers to the TargetPlatformModel OperatorsSets.

Parameters:
  • op_sets_to_layers (List[OperationsSetToLayers]) – List of OperationsSetToLayers where each of them maps an OperatorsSet name to a list of layers that represents the OperatorsSet.

OperationsSetToLayers

class model_compression_toolkit.target_platform.OperationsSetToLayers(op_set_name, layers, attr_mapping=None)

Associate an OperatorsSet to a list of framework’s layers.

Parameters:
  • op_set_name (str) – Name of OperatorsSet to associate with layers.

  • layers (List[Any]) – List of layers/FilterLayerParams to associate with OperatorsSet.

  • attr_mapping (Dict[str, DefaultDict]) – A mapping between a general attribute name and a DefaultDict that maps a layer type to the layer's framework-specific name for this attribute.
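
A Keras-flavored sketch (the layer classes are examples; the mapping is declared inside a TargetPlatformCapabilities context, shown in the TargetPlatformCapabilities section below):

    from tensorflow.keras.layers import Conv2D, DepthwiseConv2D

    with tpc:  # a TargetPlatformCapabilities instance, see below
        # Every Conv2D/DepthwiseConv2D layer is matched to the 'Conv'
        # OperatorsSet defined in the TargetPlatformModel.
        tp.OperationsSetToLayers('Conv', [Conv2D, DepthwiseConv2D])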

LayerFilterParams

class model_compression_toolkit.target_platform.LayerFilterParams(layer, *conditions, **kwargs)

Wrap a layer with filters to filter framework’s layers by their attributes.

Parameters:
  • layer – Layer to match when filtering.

  • *conditions (AttributeFilter) – List of conditions to satisfy.

  • **kwargs – Keyword arguments to filter layers according to.

More filters and usage examples are detailed here.
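
For example, matching only ReLU layers that were built with max_value=6 (a sketch; the keyword must match the layer attribute's name):

    from model_compression_toolkit import target_platform as tp
    from tensorflow.keras.layers import ReLU

    relu6_filter = tp.LayerFilterParams(ReLU, max_value=6)
    # relu6_filter can now appear in the layers list of an OperationsSetToLayers.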

TargetPlatformCapabilities

class model_compression_toolkit.target_platform.TargetPlatformCapabilities(tp_model, name='base', version=None)

Attach framework information to a modeled hardware.

Parameters:
  • tp_model (TargetPlatformModel) – Modeled hardware to attach framework information to.

  • name (str) – Name of the TargetPlatformCapabilities.

  • version (str) – TPC version.
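
Putting the pieces together, a sketch that attaches Keras layer information to the model defined in the TargetPlatformModel example above:

    from tensorflow.keras.layers import Conv2D, DepthwiseConv2D, ReLU

    tpc = tp.TargetPlatformCapabilities(tp_model, name='my_keras_tpc', version='v1')
    with tpc:
        tp.OperationsSetToLayers('Conv', [Conv2D, DepthwiseConv2D])
        tp.OperationsSetToLayers('ReLU', [ReLU])

    # tpc can now be passed to MCT's quantization APIs in place of a built-in TPC.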