FHE reproducibility of results

Hello, I’m trying to evaluate the accuracy of several hybrid models in simulation mode.
I’m using CIFAR-10 and evaluating on the whole test set (10,000 images, 10 batches of 1,000 images). Even though I set several seeds (torch, numpy, random, CUDA seed), the resulting accuracy is slightly different at each run.
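For reference, the seeding I do looks roughly like this (a minimal sketch; the seed value is arbitrary):

```python
import random

import numpy as np
import torch

SEED = 42  # arbitrary value, used for all runs

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
```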
Recently, I found a post about seed experiments (I can’t put links, but the title is “seed experiments” if I remember correctly) where someone responded with:
“However, if you want to have consistent FHE execution behavior (i.e., circuit.run(args)), it’s not possible at the moment.”
This was in 2022; is it still the case? Is it still not possible to obtain the same result in every experiment, or did I misunderstand?

Hi @michela_polito,

How are you running CIFAR using the hybrid model?

If you are running CIFAR as we do in our use cases, then you are not running within the hybrid model.

When you use actual FHE execution or the simulation, there is some noise coming from the FHE operations that we cannot seed. In Concrete ML, the main parameter to control that noise is p_error, which you can pass to the compile_torch_model method.

If you force this to a low value, e.g. 2^-40, you should be able to reproduce the same accuracy every time. This could increase the actual FHE execution time.
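Something along these lines (a minimal sketch; `model`, `x_calib` and `x_test` are placeholders for your torch model, a representative calibration set and your test inputs):

```python
import numpy as np
from concrete.ml.torch.compile import compile_torch_model

# Compile with a very low p_error so the noise added by the FHE operations
# almost never flips a result, making simulation effectively deterministic.
quantized_module = compile_torch_model(
    model,          # your torch.nn.Module
    x_calib,        # representative calibration inputs (numpy array)
    n_bits=6,       # example quantization setting
    p_error=2**-40,
)

# Evaluate in simulation mode
y_sim = quantized_module.forward(x_test, fhe="simulate")
predictions = np.argmax(y_sim, axis=1)
```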

Hi @jfrery, thanks for answering!
I actually started from your use case, but I customized the model to adapt it to the hybrid model (2 sequential blocks, 1 for the client and 1 for the server, sketched below).
The values of p_error I’m considering are 10^-3, 10^-4, and 10^-5. Are the results inconsistent because they’re too high?
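The split is roughly like this (a simplified sketch; the actual layers differ):

```python
import torch.nn as nn

class ClientBlock(nn.Module):
    """First part of the network, kept on the client (runs in the clear)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.features(x)

class ServerBlock(nn.Module):
    """Second part of the network, offloaded to the server (FHE / simulation)."""
    def __init__(self):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, 10),  # CIFAR-10: 16 channels of 16x16 -> 10 classes
        )

    def forward(self, x):
        return self.classifier(x)
```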

Noise is always present in FHE operations. However, with a very low p_error I would expect the accuracy to be fairly stable.

A good sanity check would be to set the p_error very low and see if the simulation gives you the same accuracy every time. It should. Then you can increase the p_error gradually until the variance of the accuracy is one that fits your constraints.
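A rough sketch of that procedure (placeholder names `model`, `x_calib`, `x_test`, `y_test`; assuming compilation with compile_torch_model and simulation via forward(..., fhe="simulate")):

```python
import numpy as np
from concrete.ml.torch.compile import compile_torch_model

def simulated_accuracy(p_error, n_runs=3):
    """Compile with a given p_error and measure the accuracy spread over several runs."""
    quantized_module = compile_torch_model(model, x_calib, n_bits=6, p_error=p_error)
    accuracies = []
    for _ in range(n_runs):
        y_sim = quantized_module.forward(x_test, fhe="simulate")
        accuracies.append((np.argmax(y_sim, axis=1) == y_test).mean())
    return np.mean(accuracies), np.max(accuracies) - np.min(accuracies)

# Start very low (should be stable), then relax p_error until the spread
# exceeds what you can tolerate.
for p_error in [2**-40, 1e-5, 1e-4, 1e-3]:
    mean_acc, spread = simulated_accuracy(p_error)
    print(f"p_error={p_error:.1e}  mean accuracy={mean_acc:.4f}  spread={spread:.4f}")
```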


I ran the code again with 10^-10 and the accuracy is now stable (the difference is only 0.01 across 3 runs), thanks!