I would like to know precisely what the difference is. I am trying to run a random forest, but it is very, very slow: it hangs at the model.predict() step. My data file has 67,795 rows × 19 columns and my training set is 23,433 rows × 15 columns.
Model compilation means the generation, for a specific ML model, of executable code that computes the model prediction on encrypted data. This includes, among other steps, the automatic analysis of the model computation graph to find the best cryptographic parameters, the conversion of floating-point computation to integer computation, and finally the generation of the executable code.
While it is possible to execute Concrete-ML models on non-encrypted data (without compiling) for development purposes, the model needs to be compiled in order to perform secure computation on encrypted data.
A compiled model can be said to “execute in FHE”. When executed this way on encrypted data, the execution time is much greater than on clear data, due to the complexity of homomorphic computations.
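To illustrate the float-to-integer conversion step mentioned above, here is a minimal sketch of uniform quantization in plain NumPy. This is a simplified illustration of the idea, not Concrete-ML's actual quantizer:

```python
import numpy as np

def quantize(values: np.ndarray, n_bits: int):
    """Uniformly quantize floats to unsigned integers on n_bits.

    Returns the integer codes plus the (scale, offset) needed to map
    them back to approximate float values.
    """
    v_min, v_max = values.min(), values.max()
    levels = 2**n_bits - 1  # e.g. 15 levels for 4 bits
    scale = (v_max - v_min) / levels if v_max > v_min else 1.0
    codes = np.round((values - v_min) / scale).astype(np.int64)
    return codes, scale, v_min

def dequantize(codes, scale, offset):
    return codes * scale + offset

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
codes, scale, offset = quantize(x, n_bits=4)
x_approx = dequantize(codes, scale, offset)
# All integer codes fit in 4 bits; the reconstruction error is
# bounded by the quantization step `scale`.
```

Fewer bits means smaller integers for the encrypted computation (hence faster FHE execution), at the price of a coarser approximation of the original floats.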
I would suggest you first measure the execution time on a single data point, by passing a single row to .predict(..., execute_in_fhe=True).
To improve the FHE runtime performance of RandomForestClassifier (and other tree ensemble methods), experimentation is key; here are a few things you can try:
reduce the depth of the trees (the max_depth parameter) to 2 or 3
reduce the number of estimators in the ensemble
reduce the quantization bit-width (4-5 bits is the best compromise between accuracy and runtime speed)
This way, the execution time for a single data point can be brought down to as low as ~1-2 seconds while maintaining accuracy.
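Putting the suggestions above together, here is a sketch of the workflow. It uses scikit-learn's RandomForestClassifier so it runs anywhere; Concrete-ML mirrors this API, additionally taking an n_bits argument in the constructor and (as mentioned above) an execute_in_fhe flag in predict. The FHE-specific calls are shown as comments, since their availability depends on your Concrete-ML version:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
# With Concrete-ML you would import its drop-in replacement instead:
# from concrete.ml.sklearn import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 15))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Keep the model small: shallow trees and few estimators.
# Concrete-ML also accepts n_bits=4 here (4-5 bits is a good compromise).
model = RandomForestClassifier(max_depth=3, n_estimators=20, random_state=0)
model.fit(X, y)

# model.compile(X)  # Concrete-ML only: find cryptographic parameters

# Time a single-row prediction first, before running the whole set.
start = time.perf_counter()
pred = model.predict(X[:1])  # Concrete-ML: model.predict(X[:1], execute_in_fhe=True)
elapsed = time.perf_counter() - start
print(f"single-row prediction: {elapsed:.4f} s, class {pred[0]}")
```

Measuring one row first gives you a per-sample cost you can extrapolate, instead of waiting on a predict call over tens of thousands of rows.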
After tweaking the parameters, the prediction now terminates, with a reasonably accurate estimate.
Actually, the n_bits thing is quite straightforward, but it has not yet been added to the documentation (or maybe I should look more carefully?), so I didn't know whether it was even in the constructor. I understand that the project is in its early stages, so I sometimes try to hack around.
A few more questions: since I am running from Docker on macOS, I get the following error.
WARNING: You are currently using the software variant of concrete-csprng which does not have access to a hardware source of randomness. To ensure the security of your application, please arrange to provide a secret by using the concrete_csprng::set_soft_rdseed_secret function.
This may be because Docker does not have access to /dev/urandom, but I do not know how to give it access either. Actually a very cool feature; I did not know Apple had a hardware RNG.
Everything else is perfect now. I do have other questions, but I will put them in a separate topic as they are of a different nature.
if you are on macOS, you could use our tools natively by running pip install concrete-ml. I do that, for example, and have found it much faster than Docker.
if you want to continue with Docker: we ran into the same issue, reported it to the developers, and finally got this workaround:
could you enable the “Use the new Virtualization framework” experimental feature in Docker's Preferences and try again? On my Intel laptop I can see rdseed with the new framework.
So yes, your hardware may well be old enough to lack some of the features that the Concrete Core library uses to generate random numbers. Note that this is a warning, not an error. If you want to know more about the details, it might be better to create a separate question about this warning in the #concrete-lib category.
Happy to see you’re making some progress. Yes sure, please create as many questions as you have. And please star our repo!