Keras Post Training Quantization

model_compression_toolkit.ptq.keras_post_training_quantization(in_model, representative_data_gen, target_resource_utilization=None, core_config=CoreConfig(), target_platform_capabilities=DEFAULT_KERAS_TPC)

Quantize a trained Keras model using post-training quantization. The model is quantized with symmetric, power-of-two quantization thresholds. The model is first optimized using several transformations (e.g., folding BatchNormalization layers into their preceding layers). Then, using the given dataset, statistics (e.g., min/max, histogram) are collected for each layer's output (and input, depending on the quantization configuration). For each possible bit-width (per layer), a threshold is computed from the collected statistics. Then, if a mixed-precision configuration is given in core_config, an ILP solver finds a mixed-precision configuration that assigns a bit-width to each layer. The model is then quantized (both coefficients and activations, by default). To limit the maximal model size, a target ResourceUtilization needs to be passed with weights_memory set (in bytes).
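
To illustrate the power-of-two constraint: given a collected statistic such as the maximal absolute value of a layer's output, the smallest no-clipping threshold is the nearest power of two above it. A simplified sketch (MCT's actual threshold search minimizes a configurable error metric over candidate thresholds; max_abs is a hypothetical collected value):

>>> import numpy as np
>>> max_abs = 5.3  # hypothetical maximal absolute value collected for some layer's output
>>> threshold = 2 ** np.ceil(np.log2(max_abs))  # 8.0, the smallest power of two covering the observed range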

Parameters:
  • in_model (Model) – Keras model to quantize.

  • representative_data_gen (Callable) – Dataset used for calibration.

  • target_resource_utilization (ResourceUtilization) – ResourceUtilization object to limit the search of the mixed-precision configuration as desired.

  • core_config (CoreConfig) – Configuration object containing parameters of how the model should be quantized, including mixed precision parameters.

  • target_platform_capabilities (TargetPlatformCapabilities) – TargetPlatformCapabilities to optimize the Keras model according to.

Returns:

A quantized model, along with quantization information the user may need in order to handle it.

Examples

Import MCT:

>>> import model_compression_toolkit as mct

Import a Keras model:

>>> from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
>>> model = MobileNetV2()

Create a random dataset generator that yields the required number of calibration batches (num_calibration_batches). In this example, a random dataset of 10 batches, each containing 4 images, is used:

>>> import numpy as np
>>> num_calibration_batches = 10
>>> def repr_datagen():
>>>     for _ in range(num_calibration_batches):
>>>         yield [np.random.random((4, 224, 224, 3))]
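
In practice, the representative dataset should consist of real, preprocessed inputs rather than random data. A hedged sketch, assuming calibration_images is a hypothetical NumPy array of preprocessed images:

>>> batch_size = 4
>>> def repr_datagen_real():
>>>     for i in range(num_calibration_batches):
>>>         yield [calibration_images[i * batch_size:(i + 1) * batch_size]]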

Create a MCT core config, containing the quantization configuration:

>>> config = mct.core.CoreConfig()
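
The quantization behavior can be customized through the core config. A hedged sketch, assuming mct.core.QuantizationConfig and mct.core.QuantizationErrorMethod are exposed as in the current API (see the API documentation for the exact signature), that selects activation thresholds by minimizing the MSE:

>>> qc = mct.core.QuantizationConfig(activation_error_method=mct.core.QuantizationErrorMethod.MSE)
>>> config = mct.core.CoreConfig(quantization_config=qc)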

If mixed precision is desired, create an MCT core config with a mixed-precision configuration to quantize the model with different bit-widths for different layers. The candidate bit-widths for quantization should be defined in the target platform model. In this example, one image is used to search for a mixed-precision configuration:

>>> config = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(num_of_images=1))

For mixed precision, set a target ResourceUtilization object to limit the size of the returned model. Note that this value affects only the coefficients that should be quantized (for example, the kernel of a Keras Conv2D layer is affected by this value, while its bias is not):

>>> ru = mct.core.ResourceUtilization(model.count_params() * 0.75)  # About 0.75 of the model size when quantized with 8 bits.
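
ResourceUtilization can also express other budgets; assuming the current API accepts them as keyword arguments (verify against the API documentation), a hedged sketch that additionally bounds activation memory:

>>> ru = mct.core.ResourceUtilization(weights_memory=model.count_params() * 0.75, activation_memory=10000000)  # hypothetical activation-memory budget, in bytes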

Pass the model, the representative dataset generator, the configuration and the target resource utilization to get a quantized model:

>>> quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(model, repr_datagen, ru, core_config=config)
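
The returned object is a Keras model, so it can be used directly for a quick sanity check (here on a random batch; real evaluation should use a held-out dataset):

>>> images = np.random.random((4, 224, 224, 3))
>>> predictions = quantized_model(images)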

For more configuration options, please take a look at our API documentation.