How does a quantized model work in FHE?

Quantization of Neural Networks for Fully Homomorphic Encryption (zama.ai)

The above link describes the quantization technique for FHE-friendly neural nets. How does inference happen in FHE here?

In this blog post there is no FHE inference. It simply addresses the question of building FHE-friendly neural networks.

There are, however, actual examples where a neural network runs in FHE. You can find some in concrete-ml/use_case_examples at release/0.6.x · zama-ai/concrete-ml · GitHub.

The documentation explains the basics of how this is done; you can start from Key Concepts - Concrete ML.

If anything is unclear, please do raise it and we will do our best to improve it.

How should I compile a manually defined model, say FC_Model in PyTorch, into an FHE model? Predefined models like LinearRegressor have a .compile method; what is the equivalent for user-defined models?

You can use the following:

from concrete.ml.torch.compile import compile_torch_model

# torch_model: your torch.nn.Module instance (e.g. FC_Model)
# torch_input: a representative input tensor, used to calibrate quantization
quantized_numpy_module = compile_torch_model(
    torch_model,
    torch_input,
    n_bits=n_bits,
)
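
For illustration, torch_model and torch_input could look like this (a minimal sketch; FC_Model, its layer sizes, and the calibration data are hypothetical, not from the original question):

import torch
from torch import nn

class FC_Model(nn.Module):
    """Hypothetical fully connected network with one hidden layer."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

torch_model = FC_Model()
# Representative calibration inputs: 100 samples, 10 features each
torch_input = torch.randn(100, 10)
n_bits = 3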

You can then use this quantized_numpy_module to check whether the accuracy of the quantized model still matches your requirements. Note that this will quantize your model to the number of bits you provide: the lower the bitwidth, the faster the FHE inference. However, a low bitwidth without quantization-aware training might result in poor accuracy.
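
As a sketch of such an accuracy check, run the quantized model in the clear (no FHE) and compare it with the float model. This assumes the quantize_input, forward and dequantize_output helpers of the returned QuantizedModule described in the Concrete ML 0.6 documentation, and reuses the hypothetical FC_Model above:

import numpy
import torch

# Hypothetical test data matching the model's input shape
x_test = numpy.random.randn(10, 10).astype(numpy.float32)

# Quantized inference in the clear, without FHE
q_x = quantized_numpy_module.quantize_input(x_test)
q_y = quantized_numpy_module.forward(q_x)
y_pred = quantized_numpy_module.dequantize_output(q_y)

# Compare against the float model's predictions (or your test labels)
y_float = torch_model(torch.tensor(x_test)).detach().numpy()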

You can find more information in Using Torch - Concrete ML and more examples in Deep Learning Examples - Concrete ML.

Thanks, this is working. However, when I try to run inference using

quantized_numpy_module.forward_fhe.encrypt_run_decrypt(x_test_quantized)

with n_bits set to 3, it takes a long time to run. Why is this happening?

Extra info: my PyTorch model has one hidden layer with ReLU activation, and the weight matrices are
2048 x 1024 → 1024 x 10 (2048 input neurons, 1024 hidden, and 10 output).

Hello @divyesh, glad to hear it worked!

When using quantized_numpy_module.forward_fhe.encrypt_run_decrypt(x_test_quantized), you launch an FHE evaluation of your model (i.e. encryption → FHE circuit run → decryption). That can take some time, but your issue most likely comes from the fact that the first time you run this command, it also generates the keys needed for encryption/decryption and the FHE operations.
This key generation can take some time, but it only has to be done once.

If you want to time only the inference, you can explicitly call quantized_numpy_module.circuit.keygen() before running encrypt_run_decrypt.
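
A minimal timing sketch (assuming the circuit.keygen() call mentioned above and the x_test_quantized input from your message):

import time

# Generate keys up front so keygen is excluded from the measurement
quantized_numpy_module.circuit.keygen()

start = time.time()
result = quantized_numpy_module.forward_fhe.encrypt_run_decrypt(x_test_quantized)
print(f"FHE inference time: {time.time() - start:.1f} s")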