Large Evaluation Keys [from Discord]

Discord_everyone Hi Concrete-ML Fellows :call_me_hand:
Is it “normal” that generating the private and evaluation keys produces a file of 1.05 GB?

Hello <@734775961542852701> ,
Depending on the complexity of your model the key might be quite heavy indeed.
What could also happen is that you have multiple keys cached.
We are working on making keys smaller for transfer purposes.

Thx <@97934670632595456> :v: , so it has nothing to do with the number of data points (passed as train_input) used in the compilation phase, then?
What counts as a complex model in the context of compiling for FHE? (Outside of the compilation context, my models are simple.)

Indeed, nothing related to data points!! (#tested)

An additional question: where could the “multiple cached keys” be located?

Here is what I’ve got:

├── large_number
│   └── 0_0
│       ├── ksKey_0
│       ├── pbsKey_0
│       ├── pksKey_0
│       ├── secretKey_0
│       └── secretKey_1
└── client.zip

ksKey_0 is 181 MB
pbsKey_0 is 433 MB
pksKey_0 is 453 MB
secretKey_0 is 16 KB
secretKey_1 is 6 KB
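
One way to check whether several key sets are cached is to list every file under the key directory with its size. The sketch below uses only the Python standard library. The function names are made up for this example, and the directory you pass in is whatever location your client writes its keys to (no Concrete ML default path is assumed here).

```python
from pathlib import Path


def list_file_sizes(root):
    """Return (relative path, size in bytes) for every file under root, largest first."""
    root = Path(root)
    entries = [
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    ]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def format_size(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def report(root):
    """Print each file with its size, followed by the total."""
    entries = list_file_sizes(root)
    for name, size in entries:
        print(f"{format_size(size):>12}  {name}")
    print(f"{format_size(sum(size for _, size in entries)):>12}  total")
```

Calling `report("path/to/your/key_dir")` on the tree above should show a single `0_0` sub-directory. Several sibling sub-directories, each with its own ksKey/pbsKey files, would be a hint that more than one key set is cached.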

The reason I am asking all these questions is that FHEModelServer.run(encrypted_input, serialized_evaluation_keys) keeps crashing,
and it appears that inside the run() function it is this line that crashes: result = self.server.run(deserialized_encrypted_quantized_data, deserialized_evaluation_keys)

As you rightly observed the number of data-points does not influence directly the size of the keys.
The size of the input set influences the compile time (more points == longer compile time) but a “big enough” dataset is needed to compile as it is used to determine the bit-width of everything in the FHE circuit (indirectly influencing the crypto-parameters and thus the size of the keys).
There are multiple factors that come into play when considering the complexity of an FHE circuit, but mostly the number of programmable bootstrappings (PBSs) and the bit-width of those PBSs.
I don’t think you have a key-cache issue here indeed, just heavy keys.
Your issue is probably not due to the size of your keys but to the complexity of your model.
Some data created during your FHE computation is probably exhausting the RAM of your machine.
I can’t really say more without having a look at the circuit you are running.
We’ll gladly take a look at what is happening in your situation if you are able to share some code to replicate it! :smile:
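
To illustrate the point about the input set: the compiler needs to see representative values to know how many bits each integer in the circuit requires. The sketch below shows only that principle for non-negative integers. It is plain arithmetic with a made-up function name, not Concrete's actual analysis, which is more involved.

```python
def unsigned_bit_width(values):
    """Smallest number of bits able to represent every non-negative integer in values."""
    if min(values) < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # int.bit_length() gives the number of bits needed for the largest value;
    # a value of 0 still occupies one bit.
    return max(1, max(values).bit_length())


# A tiny input set under-estimates the range of values...
small_set = [3, 5, 7]
# ...while a more representative one reveals a larger value.
larger_set = [3, 5, 7, 130]

print(unsigned_bit_width(small_set))
print(unsigned_bit_width(larger_set))
```

Adding more points that stay within the same range does not change the result, which matches the observation that the number of data points does not directly drive the key size.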

Many thanks <@97934670632595456> !!

I appreciate your summary. I need to dig into the details of compilation and the FHE circuit.
RAM limitation was one of my hypotheses, so it is interesting that we converge on it. I’ll change the architecture, then.
All the code will be public if the Zama Team accepts the app.
Cheers :raised_hands:

Hi <@97934670632595456> Hope you’re okay!??

May I have some quick feedback based on your FHE experience?

An immediate prediction in FHE simulation versus over 10 minutes in FHE production: is that plausible?
I know that at the serialization step (while running FHEModelServer) it never ends.

Hello, <@734775961542852701> I’m doing good and you?
Simulation only checks that, with quantization and p-error, the inference of your model on the test set does not diverge too much; it will indeed always be much faster than real FHE execution.
The serialization step can indeed take some time but it usually isn’t the bottleneck.
What should take time is the FHE computation, and sometimes the network transfer when sending a big cipher-text to the FHE server.
We have teams at Zama working on both making FHE computation faster and methods to compress cipher-texts for transfer!

If you can share the shape of your inputs with us, we could take a look at why the serialization is taking so long on your system.
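
One way to see which step dominates is to time each stage separately. The sketch below is generic Python. The `timed` helper is made up for this example, and the two stages in the usage are stand-ins rather than Concrete ML calls. Printing the label before the stage starts means a stage that never returns is still visible in the output.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Announce the stage, then record its wall-clock duration under label."""
    print(f"starting {label}...", flush=True)
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-in stages: replace the bodies with your own calls.
with timed("serialize"):
    payload = bytes(1_000_000)
with timed("compute"):
    checksum = sum(payload)

for label, seconds in timings.items():
    print(f"{label:>10}: {seconds:.3f} s")
```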

Hi and Thx <@97934670632595456>

I made a mistake: the problem occurs when running on the deserialized encrypted quantized data with the deserialized evaluation keys, not at the serialization step.

Details:
In concrete.ml.deployment.fhe_client_server, in the run() method of the FHEModelServer class:

result = self.server.run(deserialized_encrypted_quantized_data, deserialized_evaluation_keys)

I’m not sure I understand your message. Are you saying that there is an error in your code that you are working on?

Nope, I’ve not been clear, my bad!!

Recap of the problem:

# Run the model over encrypted data
serialized_result = fhemodel_server.run(serialized_qx_new_encrypted, serialized_evaluation_keys)

The machine runs forever.
At first, I thought the bottleneck was at the serialization step, more precisely at

serialized_result = self.server.client_specs.serialize_public_result(result)

It actually happens one step earlier, at

result = self.server.run(deserialized_encrypted_quantized_data, deserialized_evaluation_keys)

I cannot inspect deserialized_encrypted_quantized_data and deserialized_evaluation_keys, but perhaps the initial bytes from serialized_qx_new_encrypted = fhemodel_client.quantize_encrypt_serialize(x_test) are too large, and so are the keys.

My new configuration is Ubuntu 20.04 with 8 CPUs and 32 GB of RAM.

Are you running a deep neural network by any chance?
FHE circuits can still take some time to compute if they are too big.
You would benefit from running it on a bigger machine, such as an m6i.metal from AWS, since PBSs are parallelized in Concrete.
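
Before moving to a bigger machine, it can help to confirm what the current one actually exposes to Python. A minimal sketch, assuming a Linux host such as the Ubuntu setup above (the `os.sysconf` names used here may be unavailable on other platforms, in which case the RAM figure is reported as unknown). The function name is made up for this example.

```python
import os


def machine_summary():
    """Return (logical CPU count, total physical RAM in bytes or None if unknown)."""
    cpus = os.cpu_count()
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are not available on this platform.
        ram_bytes = None
    return cpus, ram_bytes


cpus, ram_bytes = machine_summary()
print(f"logical CPUs: {cpus}")
if ram_bytes is not None:
    print(f"total RAM: {ram_bytes / 1024**3:.1f} GiB")
else:
    print("total RAM: unknown on this platform")
```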

<@97934670632595456> Thx for your reply.
Yes, it is based on a deep neural network.
I’ve found the equivalent at another cloud provider.

Stream closed!