Training non-quantized models and quantizing them later

In your presentation you said that the model needs to be quantized in order to be run in FHE.
FHE also slows down execution of the NN, which would make training much longer, since there are many more FLOPs there than in other parts of the pipeline.
Would it be possible to execute the training in floating point and then quantize the model? Assuming the training is done in a completely isolated environment, which is possible in SageMaker for example, one could simply quantize the model afterwards.
Would this be possible using your libraries?

Concrete-ML allows the user to train a model on clear (non-encrypted) data and then compile it to FHE.

Currently, in Concrete-ML 0.2, the approach is to perform pure float32 training and then quantize the weights and activations post-training. This can work well for linear models or very small neural networks, but does not scale well to bigger ones. I think this is the approach that you suggest, and indeed you could train the model with SageMaker in this case; if you can export it to ONNX, Concrete-ML will be able to import it (in the next release, coming soon).
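For reference, a minimal sketch of this post-training-quantization workflow with one of Concrete-ML's built-in linear models is shown below. The argument names (`n_bits`, the `fhe` argument of `predict`) follow the current documentation and may differ between releases, so treat it as illustrative rather than as the exact v0.2 API:

```python
# Minimal sketch: train in float on clear data, then quantize post-training and
# compile to FHE. Argument names may vary between Concrete-ML releases.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pure float training on clear data (this could equally run inside SageMaker)
model = LogisticRegression(n_bits=8)
model.fit(X_train, y_train)

# Post-training quantization + compilation of the inference circuit to FHE
model.compile(X_train)

# Encrypted inference
y_pred = model.predict(X_test, fhe="execute")
```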

In the next release it will also be possible to use quantization-aware training (QAT). While QAT is a bit slower than pure float32 training, it is at worst twice as slow, which is not a big factor. In exchange, QAT gives excellent accuracy while obeying the FHE constraints.
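As a rough illustration of what QAT looks like on the user side, here is a small network written with Brevitas, a PyTorch QAT library commonly used with Concrete-ML's torch support. The bit widths and layer sizes are arbitrary and only meant to show the pattern:

```python
# Hedged sketch of a quantization-aware-training network with Brevitas.
# Bit widths and layer sizes are illustrative only.
import torch
import brevitas.nn as qnn

class QATNet(torch.nn.Module):
    def __init__(self, n_bits=3):
        super().__init__()
        # Inputs, weights and activations are quantized during training, so the
        # trained network already respects the low bit widths needed for FHE
        self.quant_in = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
        self.fc1 = qnn.QuantLinear(10, 32, bias=True, weight_bit_width=n_bits)
        self.relu = qnn.QuantReLU(bit_width=n_bits, return_quant_tensor=True)
        self.fc2 = qnn.QuantLinear(32, 2, bias=True, weight_bit_width=n_bits)

    def forward(self, x):
        x = self.quant_in(x)
        x = self.relu(self.fc1(x))
        return self.fc2(x)

# The training loop is standard PyTorch: quantization is simulated in float,
# which is why QAT is only roughly 2x slower than pure float32 training.
```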

A new version of Concrete-ML (v0.3) is out: support for QAT has been added, see https://docs.zama.ai/concrete-ml/deep-learning/torch_support. This support will be simplified in the next release (expected in October).
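For illustration, compiling a Brevitas QAT model looks roughly like the sketch below. The entry point `compile_brevitas_qat_model` and the `fhe="execute"` argument follow more recent releases than v0.3, so check the linked documentation for the exact API of your version:

```python
# Hedged sketch: compile a small Brevitas QAT model to FHE with Concrete-ML.
# Function and argument names follow recent releases and may differ in v0.3.
import torch
import brevitas.nn as qnn
from concrete.ml.torch.compile import compile_brevitas_qat_model

model = torch.nn.Sequential(
    qnn.QuantIdentity(bit_width=3, return_quant_tensor=True),
    qnn.QuantLinear(10, 2, bias=True, weight_bit_width=3),
)

# A representative input set is needed to calibrate quantization and the circuit
inputset = torch.randn(100, 10)
quantized_module = compile_brevitas_qat_model(model, inputset)

# Encrypted inference on a single example
y = quantized_module.forward(inputset[:1].numpy(), fhe="execute")
```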
