Training on encrypted data

Vanessa · April 15, 2022, 12:36pm

Hi,

I’m confused. Is it possible to train on encrypted data?
I’m only seeing things about inference.

benoit · April 15, 2022, 12:46pm

Hello Vanessa,

Indeed, we train only on non-encrypted data. Currently, Concrete ML is mainly meant to protect the privacy of inference. For training on protected data, other technologies are better suited than FHE (differential privacy, multi-party computation).

Vanessa · April 16, 2022, 1:22pm

Do you plan to develop something to train on protected data using FHE in the future? Would that intrinsically give bad results?
Can you explain the main problem about it?

benoit · April 19, 2022, 7:40am

We never know what we’ll do in the future, but I guess training on encrypted data is not the first priority of Zama. It could be very slow, and with only 8 bits of precision it would not be easy. We have advanced research projects looking at the subject.
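
To give a rough feel for why limited precision makes training hard, here is a toy sketch in plain Python. It is not how Concrete ML quantizes anything: the `quantize` and `sgd_step` helpers, the [-1, 1] range and the learning rates are all assumptions made up for illustration. The point is only that when a weight lives on a grid of 256 levels, a small gradient update can be rounded away entirely.

```python
def quantize(value, lo=-1.0, hi=1.0, bits=8):
    """Round value to the nearest of 2**bits evenly spaced levels in [lo, hi].

    Hypothetical helper for illustration only, not a Concrete ML function.
    """
    step = (hi - lo) / (2 ** bits - 1)      # spacing between adjacent levels
    clipped = min(max(value, lo), hi)       # keep the value inside the range
    index = round((clipped - lo) / step)    # nearest level index
    return lo + index * step


def sgd_step(weight, gradient, learning_rate, bits=8):
    """One gradient step whose result is stored back on the quantized grid."""
    return quantize(weight - learning_rate * gradient, bits=bits)


w = quantize(0.5)

# Update of 0.001, well below half the grid spacing (about 0.0039): lost.
small = sgd_step(w, gradient=0.1, learning_rate=0.01)

# Update of 0.05, several grid steps: survives quantization.
large = sgd_step(w, gradient=1.0, learning_rate=0.05)

print(small == w)  # the small update did not move the weight
print(large < w)   # the large update did
```

In this sketch the first step leaves the weight unchanged, so a run of small gradients would make no progress at all, whereas the same updates in floating point would accumulate.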

Marvin · May 4, 2023, 10:12am

Are there any updates on this feature?

Imho, training on encrypted data would be an awesome and necessary feature to make Concrete ML rock.

benoit · May 4, 2023, 10:23am

Hello @Marvin. Thanks for your interest. For now, the feature is still on the roadmap but is not our current priority. Agreed, it would be awesome, but we have to decide what to focus on at Zama, since we’re a small team.

By the way, if you want to contribute, this would be an awesome bounty! Have a look at our program if you’re interested.

turbofakesmile · April 3, 2024, 1:13pm

Hey @benoit. Do you have any update on this topic? Thanks

andrei-stoian-zama · April 3, 2024, 1:47pm

Hi, thanks for your continued interest in Concrete ML!

We can now train Logistic Regression on encrypted data. See how to do it here: Encrypted training | 1.4 | Concrete ML

turbofakesmile · April 3, 2024, 2:31pm

Thanks. Do you have a development/research roadmap on that?

andrei-stoian-zama · April 3, 2024, 2:51pm

We have plans to improve this feature, yes. What use-case are you most interested in?

turbofakesmile · April 4, 2024, 8:39am

I am interested in the speed benchmarks and the cost of training on encrypted data using the current version of Concrete ML.

andrei-stoian-zama · April 4, 2024, 9:09am

The speed and cost will depend a lot on the dataset you’re training on. We tested on several datasets and we report some figures in this paper and in this blog post.

The complexity, and thus the cost of compute, is linear in the number of parameters to train and the dataset size, so it should be fairly easy to extrapolate.
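
As a rough illustration of that extrapolation, here is a minimal sketch. It assumes the cost scales proportionally with both the number of parameters and the number of training samples, and it ignores any fixed overhead (key generation, compilation, and so on). The function name `extrapolate_cost` and all the figures are hypothetical placeholders, not measurements from the paper or blog post.

```python
def extrapolate_cost(measured_cost, measured_params, measured_samples,
                     target_params, target_samples):
    """Scale a measured training cost to a different problem size.

    Assumes cost is proportional to (number of parameters) x (dataset size),
    with no fixed overhead. Hypothetical helper for illustration only.
    """
    param_ratio = target_params / measured_params
    sample_ratio = target_samples / measured_samples
    return measured_cost * param_ratio * sample_ratio


# Made-up baseline: 10 cost units (e.g. seconds or dollars) for
# 10 parameters trained on 1,000 samples.
estimate = extrapolate_cost(
    measured_cost=10.0,
    measured_params=10,
    measured_samples=1_000,
    target_params=20,       # 2x the parameters
    target_samples=5_000,   # 5x the samples
)
print(estimate)  # 100.0, i.e. 10 x 2 x 5
```

Plug in a figure you measured yourself, or one reported for a dataset close to yours, as the baseline; the further the target is from the baseline, the less you should trust the estimate.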

If you’re looking to deploy this for commercial purposes, drop us a line at hello@zama.ai.