# Post-Training Quantization Example of MobileNetV2 Keras Model

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/example_keras_mobilenetv2_for_imx500.ipynb)

## Overview

This tutorial demonstrates how to quantize a pre-trained model using the **Model Compression Toolkit (MCT)**.

The quantization is performed with MCT's **Post-Training Quantization (PTQ)** tool.

As we will see, post-training quantization is a low-complexity yet effective quantization scheme: it requires no retraining, only a small set of unlabeled calibration images.

In this example, we quantize the model and evaluate its accuracy before and after quantization.
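
To make the idea of quantization concrete before using MCT, here is a minimal pure-Python sketch of symmetric 8-bit uniform quantization applied to a handful of made-up weights. It is an illustration only, not MCT's implementation: the function name `quantize_symmetric` and the sample values are assumptions, and the quantizers MCT actually applies are determined by the target platform capabilities.

```python
def quantize_symmetric(values, n_bits=8):
    """Quantize floats to signed n_bits integers with one shared scale."""
    q_max = 2 ** (n_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clip to the signed range.
    q = [max(-q_max - 1, min(q_max, round(v / scale))) for v in values]
    # Map the integers back to floats: these are the values the
    # quantized model effectively computes with.
    dequantized = [qi * scale for qi in q]
    return q, dequantized, scale


# Made-up weights, for illustration only.
weights = [-0.82, -0.11, 0.0, 0.27, 0.64]
q, deq, scale = quantize_symmetric(weights)
max_error = max(abs(w - d) for w, d in zip(weights, deq))
print("integer levels:", q)
print("max abs error:", max_error, "(bounded by scale / 2 =", scale / 2, ")")
```

Each weight is stored as a small integer plus one shared scale, and the rounding error per value is bounded by half the scale as long as no value is clipped.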

## Summary

In this tutorial we cover the following subjects:

1. Post-Training Quantization using MCT.
2. Loading and preprocessing ImageNet's validation dataset.
3. Constructing an unlabeled representative dataset.
4. Accuracy evaluation of the floating-point and the quantized models.

## Setup

Install and import the relevant packages:


In [None]:
!pip install -q tensorflow

import importlib.util
if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit

In [None]:
import tensorflow as tf
import keras
import model_compression_toolkit as mct
import os

## Dataset preparation

Download only the validation split of the ImageNet dataset.

**Note** that for demonstration purposes we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.

This step may take several minutes...

In [None]:
if not os.path.isdir('imagenet'):
    !mkdir imagenet
    !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
    !mv ILSVRC2012_devkit_t12.tar.gz imagenet/
    !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
    !mv ILSVRC2012_img_val.tar imagenet/

Extract the ImageNet validation dataset using the torchvision `datasets` module. This requires `torchvision`, which the Setup cell above does not install.

In [None]:
if not os.path.isdir('imagenet/val'):
    import torchvision
    ds = torchvision.datasets.ImageNet(root='./imagenet', split='val')

Define the preprocessing function required by the pre-trained model:

In [None]:
def imagenet_preprocess_input(images, labels):
    return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels

### Representative dataset construction
We show how to create a generator for the representative dataset, which is required for post-training quantization.

The representative dataset is used for collecting statistics on the inference outputs of all layers in the model.
 
To set the size of the representative dataset, we configure the batch size and the number of calibration iterations.
Their product (`BATCH_SIZE` × `n_iter`) is the total number of samples used during PTQ.
In this example we set `BATCH_SIZE = 50` and `n_iter = 10`, for a total of 500 representative images.

Please ensure that the dataset path has been set correctly.

In [None]:
from typing import Generator

REPRESENTATIVE_DATASET_FOLDER = './imagenet/val'
BATCH_SIZE = 50
n_iter = 10

# Create representative dataset generator
def get_representative_dataset() -> Generator:
    """A function that loads the dataset and returns a representative dataset generator.

    Returns:
        Generator: A generator yielding batches of preprocessed images.
    """

    # Load the dataset from folder
    print('loading dataset, this may take a few minutes ...')
    dataset = tf.keras.utils.image_dataset_from_directory(
        directory=REPRESENTATIVE_DATASET_FOLDER,
        batch_size=BATCH_SIZE,
        image_size=[224, 224],
        shuffle=True,
        crop_to_aspect_ratio=True,
        interpolation='bilinear')  
    # Preprocess the data
    dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)))

    def representative_dataset() -> Generator:
        """A generator function that yields batches of preprocessed images.

        Yields:
            A batch of preprocessed images.
        """
        for _ in range(n_iter):
            yield dataset.take(1).get_single_element()[0].numpy()

    return representative_dataset

# Create a representative dataset generator
representative_dataset_gen = get_representative_dataset()
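
The pattern used above (a function that returns a zero-argument callable, which in turn yields `n_iter` batches) can be tried without downloading ImageNet. The sketch below uses short random vectors as stand-ins for images. The names `make_representative_dataset`, `fake_batches` and `N_ITER` are hypothetical and exist only for this illustration.

```python
import random
from typing import Callable, Iterator, List


def make_representative_dataset(batches: List[list], n_iter: int) -> Callable[[], Iterator[list]]:
    """Return a zero-argument callable that yields n_iter batches per call."""
    def representative_dataset() -> Iterator[list]:
        for i in range(n_iter):
            # Cycle through the available batches if n_iter exceeds their count.
            yield batches[i % len(batches)]
    return representative_dataset


BATCH_SIZE, N_ITER = 50, 10
random.seed(0)
# Stand-in "images": each sample is a short vector of random floats.
fake_batches = [
    [[random.random() for _ in range(4)] for _ in range(BATCH_SIZE)]
    for _ in range(N_ITER)
]

gen = make_representative_dataset(fake_batches, N_ITER)
total_samples = sum(len(batch) for batch in gen())
print(total_samples)  # 50 samples per batch x 10 iterations = 500
```

Returning a callable rather than a single generator object means the data can be iterated more than once, since each call to `gen()` starts a fresh pass.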

## Model Post-Training quantization using MCT

This is the main part in which we quantize our model.

First, we load a pre-trained MobileNetV2 model from Keras, in 32-bit floating-point precision.

In [None]:
from keras.applications.mobilenet_v2 import MobileNetV2
float_model = MobileNetV2()

Next, we need to define a `TargetPlatformCapabilities` object, representing the hardware specifications of the platform on which we eventually wish to deploy our quantized model.

In addition, we need to define the quantization configuration for our PTQ routine.

Here, we demonstrate how to define a quantization configuration with several key arguments that can be controlled by the user.
**Note** that you can skip this part if you prefer to use the default quantization settings.

In [None]:
from model_compression_toolkit.core import QuantizationErrorMethod

# Specify the IMX500-v1 target platform capability (TPC) 
tpc = mct.get_target_platform_capabilities("tensorflow", 'imx500', target_platform_version='v1')

# Set the following quantization configurations:
# Choose the desired QuantizationErrorMethod for the quantization parameters search.
# Enable weights bias correction induced by quantization.
# Enable shift negative corrections for improving 'signed' non-linear functions quantization (such as swish, prelu, etc.) 
# Set the threshold to filter outliers with z_score of 16. 
q_config = mct.core.QuantizationConfig(activation_error_method=QuantizationErrorMethod.MSE,
                                       weights_error_method=QuantizationErrorMethod.MSE,
                                       weights_bias_correction=True,
                                       shift_negative_activation_correction=True,
                                       z_threshold=16)

ptq_config = mct.core.CoreConfig(quantization_config=q_config)
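
To give some intuition for two of the settings above, the following pure-Python sketch shows what an MSE-based threshold search and z-score outlier filtering do on toy data. It is a simplified illustration under stated assumptions, not MCT's algorithm: the helper names, the power-of-two candidate thresholds and the toy activations are all made up for this example.

```python
import statistics


def fake_quant(x, threshold, n_bits=8):
    """Quantize x on a symmetric uniform grid covering [-threshold, threshold)."""
    levels = 2 ** (n_bits - 1)
    scale = threshold / levels
    q = max(-levels, min(levels - 1, round(x / scale)))
    return q * scale


def quantization_mse(values, threshold, n_bits=8):
    """Mean squared error between values and their quantized versions."""
    return sum((v - fake_quant(v, threshold, n_bits)) ** 2 for v in values) / len(values)


def filter_outliers(values, z_threshold):
    """Keep only samples whose absolute z-score is at most z_threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)
    return [v for v in values if abs(v - mean) / std <= z_threshold]


# Toy activations: 401 values in [-1, 1] plus a single extreme outlier.
activations = [0.005 * i for i in range(-200, 201)] + [100.0]
# Candidate thresholds (powers of two, chosen only for illustration).
candidates = [2.0 ** e for e in range(-2, 8)]  # 0.25, 0.5, ..., 128

kept = filter_outliers(activations, z_threshold=16)
best_filtered = min(candidates, key=lambda t: quantization_mse(kept, t))
best_unfiltered = min(candidates, key=lambda t: quantization_mse(activations, t))
print("removed samples:", len(activations) - len(kept))
print("threshold with filtering:", best_filtered)
print("threshold without filtering:", best_unfiltered)
```

In this toy setup, removing the single outlier lets the search settle on a much smaller threshold, so the quantization grid is spent on the range where almost all values lie.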

### Run model Post-Training Quantization
Lastly, we quantize our model using MCT's post-training quantization API.

In [None]:
quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(
    in_model=float_model, 
    representative_data_gen=representative_dataset_gen, 
    core_config=ptq_config, 
    target_platform_capabilities=tpc)

That's it! Our model is now quantized.

## Models evaluation

In order to evaluate our models, we first need to load the validation dataset. As before, please ensure that the dataset path has been set correctly.

In [None]:
TEST_DATASET_FOLDER = './imagenet/val'
def get_validation_dataset() -> tf.data.Dataset:
    """Load the validation dataset for evaluation.

    Returns:
        tf.data.Dataset: The validation dataset.
    """
    dataset = tf.keras.utils.image_dataset_from_directory(
        directory=TEST_DATASET_FOLDER,
        batch_size=BATCH_SIZE,
        image_size=[224, 224],
        shuffle=False,
        crop_to_aspect_ratio=True,
        interpolation='bilinear')
    dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)))
    return dataset

evaluation_dataset = get_validation_dataset()

Let's start with the floating-point model evaluation.

We need to compile the model before evaluation and set the loss and the evaluation metric:

In [None]:
float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
results = float_model.evaluate(evaluation_dataset)
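
With integer class labels, the `accuracy` metric used here is top-1 accuracy: the fraction of images whose highest-scoring class matches the label. The sketch below computes it by hand on made-up predictions for a three-class problem. The function `top1_accuracy` and all numbers are hypothetical and only illustrate the metric and how an accuracy drop would be measured.

```python
def top1_accuracy(probabilities, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Made-up labels and model outputs, for illustration only.
labels = [0, 2, 1, 2]
float_probs = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.3, 0.6, 0.1], [0.2, 0.3, 0.5]]
quant_probs = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5]]

float_acc = top1_accuracy(float_probs, labels)
quant_acc = top1_accuracy(quant_probs, labels)
print(f"float: {float_acc:.2f}, quantized: {quant_acc:.2f}, drop: {float_acc - quant_acc:.2f}")
```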

Finally, let's evaluate the quantized model:

In [None]:
quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
results = quantized_model.evaluate(evaluation_dataset)

You can see that quantization causes only a very small accuracy degradation while compressing the model by a factor of about 4.
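
The factor of 4 follows from simple arithmetic, assuming 32-bit floating-point weights are replaced by 8-bit ones. The sketch below works this out for a parameter count of roughly 3.5 million, used here as an approximate figure for MobileNetV2. Both the 8-bit assumption and the parameter count are assumptions of this example, and the estimate ignores overhead such as quantization parameters.

```python
def model_size_mb(n_params: int, bits_per_param: int) -> float:
    """Size of the parameters alone, in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bits_per_param / 8 / 1e6


N_PARAMS = 3_500_000  # approximate parameter count (assumption)
float_mb = model_size_mb(N_PARAMS, 32)
int8_mb = model_size_mb(N_PARAMS, 8)
print(f"{float_mb:.1f} MB -> {int8_mb:.1f} MB, ratio {float_mb / int8_mb:.0f}x")
```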

Now, we can export the model to Keras and TFLite. Please ensure that the `save_model_path` has been set correctly.

In [None]:
mct.exporter.keras_export_model(
    model=quantized_model,
    save_model_path='./qmodel.tflite',
    serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,
    quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)

mct.exporter.keras_export_model(model=quantized_model, save_model_path='./qmodel.keras')

## Conclusion

In this tutorial, we demonstrated how to quantize a pre-trained model using MCT with a few lines of code. We saw that we can achieve a 4x compression ratio with minimal accuracy degradation.





Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
