target_platform_capabilities Module

MCT can be configured to quantize and optimize models for different hardware settings. For example, when using the qnnpack backend for PyTorch model inference, PyTorch's quantization configuration uses per-tensor weights quantization for Conv2d, while when targeting TFLite, TensorFlow uses per-channel weights quantization for Conv2D.

MCT addresses this with the target_platform_capabilities module, which configures hardware-related parameters that the optimization process then uses to optimize the model accordingly. Pre-built models for IMX500, TFLite, and qnnpack are available and can be retrieved with the get_target_platform_capabilities function, as shown below.
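
For example, a pre-built TPC can be fetched and passed to the quantization API as sketched below (a minimal sketch; the framework and platform identifiers and the exact keyword names follow common MCT usage and should be checked against the installed version):

    import model_compression_toolkit as mct

    # Fetch the pre-built capabilities description for quantizing a
    # PyTorch model for the IMX500 platform (identifiers are illustrative).
    tpc = mct.get_target_platform_capabilities('pytorch', 'imx500')

    # The returned TPC is then passed to the quantization entry point, e.g.:
    # quantized_model, _ = mct.ptq.pytorch_post_training_quantization(
    #     model, representative_data_gen, target_platform_capabilities=tpc)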


Note

For now, some fields of OpQuantizationConfig, such as quantization_preserving, fixed_scale, and fixed_zero_point, are ignored during the optimization process.

  • MCT will make use of more information from OpQuantizationConfig in the future.


The object MCT receives is called TargetPlatformCapabilities (or TPC for short). The following diagram shows its main components:

[Diagram: main components of the TargetPlatformCapabilities object (tpc_diagram.png)]

The following sections describe each of these components in detail.

QuantizationMethod

class model_compression_toolkit.target_platform_capabilities.QuantizationMethod(value)

Method for quantization function selection:

POWER_OF_TWO - Symmetric, uniform quantization with a power-of-two threshold.

LUT_POT_QUANTIZER - Quantization using a lookup table and a power-of-two threshold.

SYMMETRIC - Symmetric, uniform quantization.

UNIFORM - Uniform quantization.

LUT_SYM_QUANTIZER - Quantization using a lookup table and a symmetric threshold.
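
To illustrate the difference between these methods, a power-of-two threshold restricts the symmetric quantization range to the nearest power of two that covers the data. The sketch below shows the idea only; it is not MCT's internal implementation:

    import math

    def power_of_two_threshold(max_abs_value: float) -> float:
        # Smallest power of two covering the observed dynamic range,
        # as used by POWER_OF_TWO-style symmetric quantizers.
        return 2.0 ** math.ceil(math.log2(max_abs_value))

    print(power_of_two_threshold(5.3))  # 8.0
    print(power_of_two_threshold(0.4))  # 0.5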

OpQuantizationConfig

class model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.OpQuantizationConfig(**data)

OpQuantizationConfig is a class to configure the quantization parameters of an operator.

Parameters:
  • default_weight_attr_config (AttributeQuantizationConfig) – A default attribute quantization configuration for the operation.

  • attr_weights_configs_mapping (Dict[str, AttributeQuantizationConfig]) – A mapping between an op attribute name and its quantization configuration.

  • activation_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for activation quantization.

  • activation_n_bits (int) – Number of bits to quantize the activations.

  • supported_input_activation_n_bits (Union[int, Tuple[int, ...]]) – Number of bits that the operator accepts as input.

  • enable_activation_quantization (bool) – Whether to quantize the model activations or not.

  • quantization_preserving (bool) – Whether quantization parameters should be the same for an operator’s input and output.

  • fixed_scale (Optional[float]) – Scale to use for the operator's quantization parameters.

  • fixed_zero_point (Optional[int]) – Zero-point to use for the operator's quantization parameters.

  • simd_size (Optional[int]) – Per op integer representing the Single Instruction, Multiple Data (SIMD) width of an operator. It indicates the number of data elements that can be fetched and processed simultaneously in a single instruction.

  • signedness (Signedness) – Set activation quantization signedness.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
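
A minimal construction sketch, assuming the documented parameters map directly to keyword arguments of this Pydantic model (the Signedness import path and its AUTO member are assumptions):

    from model_compression_toolkit.target_platform_capabilities import QuantizationMethod
    from model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema import (
        AttributeQuantizationConfig, OpQuantizationConfig, Signedness)

    # Default configuration for weight attributes that have no dedicated
    # entry in attr_weights_configs_mapping (quantization disabled here).
    default_attr_cfg = AttributeQuantizationConfig(
        weights_quantization_method=QuantizationMethod.POWER_OF_TWO,
        weights_n_bits=8,
        weights_per_channel_threshold=False,
        enable_weights_quantization=False,
        lut_values_bitwidth=None)

    op_cfg = OpQuantizationConfig(
        default_weight_attr_config=default_attr_cfg,
        attr_weights_configs_mapping={},
        activation_quantization_method=QuantizationMethod.POWER_OF_TWO,
        activation_n_bits=8,
        supported_input_activation_n_bits=8,
        enable_activation_quantization=True,
        quantization_preserving=False,
        fixed_scale=None,
        fixed_zero_point=None,
        simd_size=32,
        signedness=Signedness.AUTO)  # assumed enum member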

AttributeQuantizationConfig

class model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.AttributeQuantizationConfig(**data)

Holds the quantization configuration of a weight attribute of a layer.

weights_quantization_method

The method to use from QuantizationMethod for weights quantization.

Type:

QuantizationMethod

weights_n_bits

Number of bits to quantize the coefficients.

Type:

int

weights_per_channel_threshold

Indicates whether to quantize the weights per-channel or per-tensor.

Type:

bool

enable_weights_quantization

Indicates whether to quantize the model weights or not.

Type:

bool

lut_values_bitwidth

Number of bits to use when quantizing with a look-up table. If None, it defaults to 8 in HPTQ; otherwise, the provided value is used.

Type:

Optional[int]

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
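
For example, an 8-bit symmetric, per-channel configuration for a convolution kernel attribute might look like this (a sketch, under the same keyword-argument assumption as above):

    from model_compression_toolkit.target_platform_capabilities import QuantizationMethod
    from model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema import (
        AttributeQuantizationConfig)

    kernel_attr_cfg = AttributeQuantizationConfig(
        weights_quantization_method=QuantizationMethod.SYMMETRIC,
        weights_n_bits=8,
        weights_per_channel_threshold=True,  # one threshold per output channel
        enable_weights_quantization=True,
        lut_values_bitwidth=None)            # not a LUT quantizer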

QuantizationConfigOptions

class model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.QuantizationConfigOptions(**data)

QuantizationConfigOptions wraps a set of quantization configurations to consider during the quantization of an operator.

quantization_configurations

Tuple of possible OpQuantizationConfig to gather.

Type:

Tuple[OpQuantizationConfig, …]

base_config

Fallback OpQuantizationConfig to use when optimizing the model in a non-mixed-precision manner.

Type:

Optional[OpQuantizationConfig]

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
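
For mixed precision, several candidate configurations are wrapped together, with base_config serving as the single-precision fallback. A sketch, continuing the op_cfg example above (clone_and_edit is assumed here as a convenience for deriving lower-bit candidates; constructing the variants explicitly works as well):

    from model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema import (
        QuantizationConfigOptions)

    # op_cfg is the 8-bit OpQuantizationConfig sketched above.
    op_cfg_4bit = op_cfg.clone_and_edit(activation_n_bits=4)
    op_cfg_2bit = op_cfg.clone_and_edit(activation_n_bits=2)

    default_options = QuantizationConfigOptions(
        quantization_configurations=(op_cfg, op_cfg_4bit, op_cfg_2bit),
        base_config=op_cfg)  # used when not running mixed precision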

TargetPlatformCapabilities

class model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.TargetPlatformCapabilities(**data)

Represents the hardware configuration used for quantized model inference.

default_qco

Default quantization configuration options for the model.

Type:

QuantizationConfigOptions

operator_set

Tuple of operator sets within the model.

Type:

Optional[Tuple[OperatorsSet, …]]

fusing_patterns

Tuple of fusing patterns for the model.

Type:

Optional[Tuple[Fusing, …]]

tpc_minor_version

Minor version of the Target Platform Configuration.

Type:

Optional[int]

tpc_patch_version

Patch version of the Target Platform Configuration.

Type:

Optional[int]

tpc_platform_type

Type of the platform for the Target Platform Configuration.

Type:

Optional[str]

add_metadata

Flag to determine if metadata should be added.

Type:

bool

name

Name of the Target Platform Model.

Type:

str

is_simd_padding

Indicates if SIMD padding is applied.

Type:

bool

SCHEMA_VERSION

Version of the schema for the Target Platform Model.

Type:

int

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
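
Putting the pieces together, a minimal TPC could be assembled as follows (a sketch, assuming the documented attributes map to constructor keywords; default_options is the QuantizationConfigOptions sketched above):

    from model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema import (
        Fusing, OperatorsSet, TargetPlatformCapabilities)

    conv_set = OperatorsSet(name='Conv', qc_options=default_options)
    relu_set = OperatorsSet(name='ReLU', qc_options=default_options)

    tpc = TargetPlatformCapabilities(
        default_qco=default_options,
        operator_set=(conv_set, relu_set),
        fusing_patterns=(Fusing(operator_groups=(conv_set, relu_set)),),
        tpc_minor_version=1,
        tpc_patch_version=0,
        tpc_platform_type='example_platform',  # illustrative name
        add_metadata=False,
        name='example_tpc',
        is_simd_padding=False)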

OperatorsSet

class model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.OperatorsSet(**data)

Set of operators that are represented by a unique label.

name

The set’s label (must be unique within a TargetPlatformCapabilities).

Type:

Union[str, OperatorSetNames]

qc_options

Configuration options to use for this set of operations. If None, it represents a fusing set.

Type:

Optional[QuantizationConfigOptions]

type

Fixed type identifier.

Type:

Literal[“OperatorsSet”]

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
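
For example, a set intended only for use inside fusing patterns can be declared without quantization options (a sketch, assuming qc_options defaults to None when omitted):

    from model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema import (
        OperatorsSet)

    # A named set with no quantization options of its own; it can still
    # participate in fusing patterns.
    swish_set = OperatorsSet(name='Swish')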

Fusing

class model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.Fusing(**data)

Fusing defines a tuple of operators that should be combined and treated as a single operator, hence no quantization is applied between them.

operator_groups

A tuple of operator groups, each being either an OperatorSetGroup or an OperatorsSet.

Type:

Tuple[Union[OperatorsSet, OperatorSetGroup], …]

name

The name for the Fusing instance. If not provided, it is generated from the operator groups’ names.

Type:

Optional[str]

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
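
A sketch of a pattern that fuses Conv followed by ReLU, so that no activation quantization is inserted between them (the format of the auto-generated name is an assumption):

    from model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema import (
        Fusing, OperatorsSet)

    conv_set = OperatorsSet(name='Conv')
    relu_set = OperatorsSet(name='ReLU')

    # name is omitted, so it is generated from the group names.
    conv_relu_fusing = Fusing(operator_groups=(conv_set, relu_set))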

OperatorSetGroup

class model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.OperatorSetGroup(**data)

Concatenates a tuple of operator sets so that they can be treated alike in different places (such as fusing).

operators_set

Tuple of operator sets to group.

Type:

Tuple[OperatorsSet, …]

name

Concatenated name generated from the names of the operator sets.

Type:

Optional[str]

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
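
A sketch of grouping several activation sets so that a single fusing pattern covers all of them:

    from model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema import (
        Fusing, OperatorsSet, OperatorSetGroup)

    conv_set = OperatorsSet(name='Conv')
    any_activation = OperatorSetGroup(operators_set=(
        OperatorsSet(name='ReLU'),
        OperatorsSet(name='Swish'),
        OperatorsSet(name='Sigmoid')))

    # One pattern that fuses Conv with any of the grouped activations.
    conv_act_fusing = Fusing(operator_groups=(conv_set, any_activation))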