Hi everyone,
Maybe I’m not understanding how Concrete works, but I wanted to know if this is possible:
to have a CNN in torch trained with my own samples (trained by myself, without using Concrete-ML) and, after this phase, convert the model to an FHE version without retraining.
I’m asking because I need to convert it to an FHE version while having only a very small subset of the training data, so it’s impossible for me to retrain it with all the samples.
Thank you for your interest in CML, and the answer is YES!
To start with, Concrete-ML uses Fully Homomorphic Encryption (FHE) for privacy-preserving machine learning inference only. The library’s API is very similar to the familiar Scikit-learn and PyTorch APIs, and you don’t need any knowledge of cryptography to use it, so you should not feel out of place.
However, to use FHE, models must be FHE-compatible, in other words:
The model must be quantized beforehand, because FHE only operates on integers. The most popular approaches to quantization are post-training quantization (PTQ) and quantization-aware training (QAT). For custom neural networks, we highly recommend the QAT approach, which is the most efficient method to achieve good results. Tutorials are available in advanced_examples and use_case_examples.
The accumulator bit-width must be kept as low as possible; otherwise, FHE inference becomes very costly.
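To make these two constraints a bit more tangible, here is a minimal pure-Python sketch. It is not Concrete-ML code, and the helper names `quantize` and `accumulator_bits` are made up for illustration. It shows uniform quantization of floats to n-bit integers, and a worst-case estimate of the accumulator bit-width for a dot product of unsigned integers.

```python
def quantize(values, n_bits):
    """Map floats to unsigned integers in [0, 2**n_bits - 1] (uniform quantization)."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    # Guard against a constant input, where hi == lo
    scale = (hi - lo) / levels if hi > lo else 1.0
    q_values = [round((v - lo) / scale) for v in values]
    return q_values, scale, lo


def accumulator_bits(n_terms, weight_bits, input_bits):
    """Worst-case bit-width of a sum of n_terms products of unsigned integers."""
    max_acc = n_terms * (2 ** weight_bits - 1) * (2 ** input_bits - 1)
    return max_acc.bit_length()


q_values, scale, zero = quantize([0.0, 1.0, 3.0], n_bits=2)
print(q_values, scale, zero)

# Same layer size (784 inputs), two different quantization bit-widths
print(accumulator_bits(784, weight_bits=8, input_bits=8))
print(accumulator_bits(784, weight_bits=3, input_bits=3))
```

Under this estimate, a layer with 784 inputs needs 26 accumulator bits with 8-bit weights and inputs, but only 16 bits with 3-bit ones, which is why low quantization bit-widths matter so much here.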
In your case, you need to quantize your model through PTQ or QAT. We have a complete tutorial that shows how to make your custom PyTorch network FHE-compatible using QAT with the Brevitas framework.
In summary, you should create an equivalent Brevitas model, assign the pre-trained weights, and then fine-tune your quantized network.
The compilation of the model and the inference are then performed as follows:
from concrete.ml.torch.compile import compile_brevitas_qat_model

quantized_module = compile_brevitas_qat_model(
    # Quantized PyTorch model using Brevitas
    torch_model=custom_qnn,
    # Representative data-set for compilation and calibration
    torch_inputset=x_calib,
)

# Check the maximum bit-width of your model
print(quantized_module.fhe_circuit.graph.maximum_integer_bit_width())

# Inference using FHE simulation, which is faster than actual FHE execution
# test_data must be a NumPy array
prediction = quantized_module.forward(test_data, fhe="simulate")

# Inference using actual FHE execution
prediction = quantized_module.forward(test_data, fhe="execute")
Please feel free to reach out to us if you encounter any issues.