Training on encrypted data

Hi,

I’m confused. Is it possible to train on encrypted data?
I’m only seeing things about inference.

Hello Vanessa,

Indeed, we train only on non-encrypted data. Currently, Concrete ML mainly aims to protect the privacy of inference. For training on protected data, other technologies are better suited than FHE (differential privacy, multi-party computation).

Do you plan to develop something to train on protected data using FHE in the future? Would that intrinsically give bad results?
Can you explain the main problem with it?

We never know what we’ll do in the future, but I guess training on encrypted data is not Zama’s first priority. It could be very slow, and with only 8 bits of precision it would not be easy. We do have advanced research projects looking at the subject.

Are there any updates on this feature?

Imho training on encrypted data would be an awesome and necessary feature in order to make Concrete ML rock

Hello @Marvin. Thanks for your interest. For now, the feature is still on the roadmap but is not our current priority. Agreed, it would be awesome, but we’re a small team at Zama and have to decide what we focus on.

By the way, if you want to contribute, this would make an awesome bounty! Take a look at our bounty program if you’re interested.

Hey @benoit. Do you have any update on this topic? Thanks

Hi, thanks for your continued interest in Concrete ML!

We can now train Logistic Regression on encrypted data. See how to do it here: Encrypted training | 1.4 | Concrete ML

Thanks. Do you have a development/research roadmap on that?

We have plans to improve this feature, yes. What use-case are you most interested in?

I am interested in the speed benchmarks and the cost of training on encrypted data using the current version of Concrete ML.

The speed and cost will depend a lot on the dataset you’re training on. We tested on several datasets and we report some figures in this paper and in this blog post.

The complexity, and thus the cost of compute, is linear in the number of parameters to train and the dataset size, so it should be fairly easy to extrapolate.
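To make the extrapolation concrete, here is a rough sketch in plain Python. It assumes "linear in both" means the cost scales with the product of dataset size and parameter count, which is my reading and not something the figures confirm. The function name and all numbers are hypothetical placeholders, not measured Concrete ML benchmarks; plug in a timing you measured yourself or one from the reported figures.

```python
def extrapolate_training_time(base_seconds, base_samples, base_params,
                              target_samples, target_params):
    """Scale a measured training time to a new problem size.

    Assumption (not confirmed by the thread): cost is proportional to
    n_samples * n_params, i.e. linear in each factor separately.
    """
    if base_seconds <= 0 or base_samples <= 0 or base_params <= 0:
        raise ValueError("baseline measurements must be positive")
    if target_samples < 0 or target_params < 0:
        raise ValueError("target sizes must be non-negative")

    # Each ratio says how much larger the target is along one axis.
    sample_ratio = target_samples / base_samples
    param_ratio = target_params / base_params
    return base_seconds * sample_ratio * param_ratio


# Hypothetical baseline: 100 s for 1,000 samples and 10 parameters.
# Target: 5,000 samples and 20 parameters (5x the data, 2x the parameters).
estimate = extrapolate_training_time(100.0, 1_000, 10, 5_000, 20)
print(f"Estimated training time: {estimate:.0f} s")
```

Treat the result as a ballpark only: fixed overheads such as key generation and compilation do not scale this way, so they are deliberately left out of the sketch.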

If you’re looking to deploy this for commercial purposes, drop us a line at hello@zama.ai.
