Quantization-Aware-Training Tutorial
====================================

What is model Quantization-Aware-Training
-----------------------------------------

In general, the weights and activations of artificial neural networks are represented in float32. Model quantization means using a lower-precision representation for these numbers instead, such as float16, int8, or uint8. Reducing the precision shrinks the model size and memory footprint, and on some devices it also shortens inference time. However, using lower precision instead of float32 introduces quantization error into the model. Quantization-Aware-Training (QAT) mitigates the quantization error by simulating the quantization effect at training time.

Quantization-Aware-Training with NNabla
---------------------------------------

In NNabla, we divide QAT into two stages, RECORDING and TRAINING. In the RECORDING stage, we collect and record the dynamic range of each parameter and buffer. In the TRAINING stage, we insert Quantization/Dequantization nodes to simulate the quantization effect. We provide ``QATScheduler`` to support Quantization-Aware-Training. Here is some sample code using ``QATScheduler``.

Create QATScheduler
~~~~~~~~~~~~~~~~~~~

First, we need to create a ``QATScheduler``. The ``QATScheduler`` automatically converts the network so that it supports Quantization-Aware-Training.

.. code:: python

    from nnabla.utils.qnn import QATScheduler, QATConfig, PrecisionMode

    # Create the training network
    pred = model(image, test=False)
    # Create the validation network
    vpred = model(vimage, test=True)

    # Configure the QATScheduler
    config = QATConfig()
    config.bn_folding = True
    config.bn_self_folding = True
    config.channel_last = False
    config.precision_mode = PrecisionMode.SIM_QNN
    config.skip_bias = True
    config.niter_to_recording = 1
    config.niter_to_training = steps_per_epoch * 2

    # Create a QATScheduler object
    qat_scheduler = QATScheduler(config=config, solver=solver)

    # Register the training network with the QATScheduler
    qat_scheduler(pred)

    # Register the validation network with the QATScheduler
    qat_scheduler(vpred, training=False)

Modify your training loop
~~~~~~~~~~~~~~~~~~~~~~~~~

In general, your training loop looks like this:

.. code:: python

    for step in range(max_step):
        x, y = dataset.next()
        image.d, label.d = x, y
        loss.forward()
        solver.zero_grad()
        loss.backward()
        solver.weight_decay(weight_decay)
        solver.update()

Compared with the training loop above, you only need to insert a single line of code, ``qat_scheduler.step()``. Inside ``step()``, the scheduler counts your training steps and converts the network when the count reaches ``niter_to_recording`` or ``niter_to_training`` (see the sketch after the loop below for how these thresholds are typically derived).

.. code:: python

    for step in range(max_step):
        # Run qat_scheduler step by step
        qat_scheduler.step()
        x, y = dataset.next()
        image.d, label.d = x, y
        loss.forward()
        solver.zero_grad()
        loss.backward()
        solver.weight_decay(weight_decay)
        solver.update()

    # Save the QAT model
    qat_scheduler.save('your_model.nnp', vimage, batch_size=1, deploy=False)
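The stage transitions are driven entirely by the step counter, so the two thresholds are usually derived from the dataset size. Below is a minimal sketch of how ``steps_per_epoch`` (referenced in the configuration above) might be computed; ``dataset.size`` and ``batch_size`` are assumed names from this tutorial's context, not part of the ``QATScheduler`` API.

.. code:: python

    # Hypothetical derivation of the step thresholds used in QATConfig above.
    # `dataset.size` and `batch_size` are assumed names, not part of the API.
    steps_per_epoch = dataset.size // batch_size

    config.niter_to_recording = 1                   # start recording on the first step
    config.niter_to_training = steps_per_epoch * 2  # switch to TRAINING after ~2 epochs of recording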
Performance of the quantized model
----------------------------------

Below are some Quantization-Aware-Training experimental results on the ImageNet dataset.

=========== =============== ================================= =============================
Model       with BN-folding float32 model validation error(%) QAT model validation error(%)
=========== =============== ================================= =============================
Mobilenetv1 NO              28.05                             27.53
Resnet18    NO              29.66                             28.71
Resnet50    NO              23.46                             23.19
Mobilenetv1 YES             28.05                             27.92
Resnet18    YES             29.66                             28.63
Resnet50    YES             23.46                             23.16
=========== =============== ================================= =============================

Deploy the model with TensorRT
------------------------------

Below is a comparison between the output of NNabla and the output of TensorRT. The model used for the comparison is mobilenetv1.

========= =============== ====================== =================== ====================== ===================
with pow2 with BN-folding maximum absolute error mean absolute error maximum relative error mean relative error
========= =============== ====================== =================== ====================== ===================
NO        NO              0.0144                 0.0094              9.9737                 0.0907
NO        YES             4.639e-04              1.577e-04           13.3496                0.0151
YES       NO              0.0378                 0.0287              21.7627                0.0433
YES       YES             4.673e-07              3.465e-07           0.00069                2.588e-08
========= =============== ====================== =================== ====================== ===================

If you want to deploy the model with TensorRT, we recommend enabling the ``bn_folding``, ``bn_self_folding``, and ``pow2`` options in ``QATConfig``; otherwise, the error between NNabla and TensorRT may become large. See also ``QATTensorRTConfig``. You can also deploy the NNabla QAT model with other frameworks by using our :ref:`File_Format_Converter` to convert the .nnp file to another format.
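For TensorRT deployment, the recommended options can be enabled on the configuration object before the networks are registered. A minimal sketch, assuming ``pow2`` is a plain boolean attribute of ``QATConfig`` like the other options named above:

.. code:: python

    from nnabla.utils.qnn import QATConfig

    # Minimal sketch of a TensorRT-friendly configuration, based on the
    # options recommended above. The `pow2` attribute name is taken from
    # the text; check the QATConfig definition in nnabla.utils.qnn.
    config = QATConfig()
    config.bn_folding = True
    config.bn_self_folding = True
    config.pow2 = True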