Training with previously encrypted data

Rodrigo_Kruger · August 19, 2024, 1:29pm

Hi!

I trained an SGDClassifier with the parameter fit_encrypted set to True, which, as I understand it, encrypts the data and then runs the training. That worked fine.

But, considering that the data lives on the client side, does Concrete ML support encrypting the data beforehand on the client side and then fitting an SGDClassifier on the server side with the already-encrypted data? I tried several times but could not get it to work.

Thanks

Celia · August 19, 2024, 2:50pm

Hi @Rodrigo_Kruger,

Thanks for your interest in Concrete ML.

Could you please provide more details or share your code so I can better understand your question?

If it’s helpful, we also have an interesting use case related to this topic, which you can check out here: Federated Learning Example.

Looking forward to your response!

Rodrigo_Kruger · August 19, 2024, 5:25pm

Hi @Celia

Below I attach the code that first encrypts the data (client scope) and then trains the model (server scope).

I only want to confirm whether Concrete ML can build a machine learning model that consumes already-encrypted data. Note that I use the Concrete ML APIs to encrypt the data, to ensure compatibility.

This code snippet should run on the client side (encryption)

import pandas as pd
from sklearn.preprocessing import StandardScaler
from concrete.ml.pandas import ClientEngine

# URL and column_names are defined elsewhere
data = pd.read_csv(URL, header=None, names=column_names)

X = data.drop("Class label", axis=1)
y = data["Class label"]

scaler = StandardScaler()
X = scaler.fit_transform(X)

client = ClientEngine(keys_path="my_keys")
X_encrypted = client.encrypt_from_pandas(pd.DataFrame(X))
y_encrypted = client.encrypt_from_pandas(pd.DataFrame(y))

This code snippet should run on the server side (training and prediction)

import time
from sklearn.model_selection import train_test_split
from concrete.ml.common.utils import FheMode
from concrete.ml.sklearn import SGDClassifier

parameters_range = (-1.0, 1.0)

model = SGDClassifier(
    random_state=42,
    max_iter=50,
    fit_encrypted=True,
    parameters_range=parameters_range,
)

X = X_encrypted.encrypted_values
y = y_encrypted.encrypted_values.flatten()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

start_time = time.time()
model.fit(X_train, y_train, fhe=FheMode.SIMULATE)  # => Runtime Error! Wrong Values!
end_time = time.time()
elapsed_time = end_time - start_time
print(f'FHE Model built! Training time: {elapsed_time:.6f} seconds')

Celia · August 20, 2024, 2:45pm

Hi,

I understand that you want to encrypt on the client side, run on the server side, and decrypt on the client side.

If you’d like to perform these steps separately, we have an example available in the Logistic Regression Training notebook.

Thanks

Rodrigo_Kruger · August 20, 2024, 4:31pm

Thanks Celia, thanks for your answer.

The notebook shows how to train a model and deploy it to a server in order to make predictions using FHE.

But even in this notebook, the code that trains the model consumes clear data. Look:

# Fit on encrypted data
model_binary_fhe.fit(X_binary, y_binary, fhe="execute")  # X_binary and y_binary are clear data

I want to know whether it is possible to pass already-encrypted data to the fit() method, even if that means using another library inside Concrete ML or Concrete.

Celia · August 21, 2024, 8:21am

Hello,

By setting fit_encrypted=True and fhe="execute", the training is performed with encrypted weights and encrypted data.
Enable verbose=True to see detailed steps.

parameters_range = (-1.0, 1.0)

model_binary_fhe = SGDClassifier(
    random_state=RANDOM_STATE,
    max_iter=N_ITERATIONS,
    fit_encrypted=True,
    parameters_range=parameters_range,
    verbose=True,
)

# Fit on encrypted data
model_binary_fhe.fit(X_binary, y_binary, fhe="execute")

It will output:

Compiling training circuit ...
Compilation took 1.6765 seconds.
Key Generation...
Key generation took 5.5012 seconds.
Training on encrypted data...
Iteration 0 took 4.01 seconds.
Iteration i took XXXX seconds.

On your side, X_binary and y_binary are in plaintext and in float format.
However, once the .fit method is called, many steps are executed: the data is quantized, an evaluation key is generated, your ML model is converted to an FHE circuit, the data and the weights are encrypted, and then training proceeds.

All these steps are encapsulated within the fit method. If you want more details, you can check the fit method’s code.
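
To make the quantization step concrete, here is a minimal sketch of a uniform quantizer in plain Python. It is only an illustration of the idea, not Concrete ML's actual quantizer: the function names, the 8-bit width, and the [-1, 1] range are assumptions chosen for the example.

```python
def quantize(values, n_bits, lo, hi):
    """Map floats in [lo, hi] to integers in [0, 2**n_bits - 1], clipping outliers."""
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels  # width of one quantization step
    q_values = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return q_values, scale


def dequantize(q_values, scale, lo):
    """Map quantized integers back to approximate floats."""
    return [lo + q * scale for q in q_values]


# Hypothetical input: a few already-scaled feature values
x = [-1.0, -0.25, 0.0, 0.5, 1.0]
q_x, scale = quantize(x, n_bits=8, lo=-1.0, hi=1.0)
x_back = dequantize(q_x, scale, lo=-1.0)

# The round trip loses at most about half a quantization step per value
max_error = max(abs(a - b) for a, b in zip(x, x_back))
print(q_x)
print(f"step={scale:.6f}, max round-trip error={max_error:.6f}")
```

The integers in q_x are what would then be encrypted; the floats never enter the encrypted computation directly.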

Now, the development section of this notebook shows how to separate these steps by using a server and a client. The key methods are:

fhe_client.get_serialized_evaluation_keys()
fhe_client.quantize_encrypt_serialize
fhe_server.run
fhe_client.deserialize_decrypt_dequantize

To start, I recommend checking the README where these methods are demonstrated for inference on encrypted data: concrete-ml/README.md at main · zama-ai/concrete-ml · GitHub
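
The division of roles between client and server can be sketched with a toy stand-in. Everything below is hypothetical: ToyClient and ToyServer only borrow the method names listed above, the "encryption" is a simple additive mask with no real security, and the server computes a plain sum rather than a model. The point is only to show which side sees what.

```python
import json
import secrets

MODULUS = 2 ** 32  # toy ciphertext space
SCALE = 1000       # toy fixed-point quantization (3 decimal places)


class ToyClient:
    """Holds the secret key; the only party that ever sees clear values."""

    def __init__(self, n_features):
        self._key = [secrets.randbelow(MODULUS) for _ in range(n_features)]

    def quantize_encrypt_serialize(self, x):
        q = [round(v * SCALE) for v in x]                             # quantize
        enc = [(qi + ki) % MODULUS for qi, ki in zip(q, self._key)]   # mask
        return json.dumps(enc).encode()                               # serialize

    def deserialize_decrypt_dequantize(self, payload):
        enc_sum = json.loads(payload.decode())                        # deserialize
        q_sum = (enc_sum - sum(self._key)) % MODULUS                  # unmask
        if q_sum >= MODULUS // 2:                                     # recover the sign
            q_sum -= MODULUS
        return q_sum / SCALE                                          # dequantize


class ToyServer:
    """Works only on masked integers; it has no access to the key."""

    def run(self, payload):
        enc = json.loads(payload.decode())
        return json.dumps(sum(enc) % MODULUS).encode()  # result is still masked


x = [0.5, -1.25, 2.0]
client = ToyClient(n_features=len(x))
ciphertext = client.quantize_encrypt_serialize(x)        # client side
encrypted_result = ToyServer().run(ciphertext)           # server side
result = client.deserialize_decrypt_dequantize(encrypted_result)  # client side
print(result)
```

In the real library the evaluation keys are also sent to the server, and the computation is the compiled FHE circuit; the toy skips both.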

Rodrigo_Kruger · August 22, 2024, 11:01pm

Thanks for the clues Celia,

I checked the code of SGDClassifier, and it is pretty clear that the fit() method is implemented to receive plaintext and encrypt it.

I’m checking whether Concrete ML could be used for data streams and concept drift. In this scenario, I need to train the model many times with data that arrives continuously from the customers. But I need to store the data already encrypted in order to comply with FHE.

One possibility is Federated Learning, but I was hoping that there was something ready in Concrete ML.

Thanks!

benoit · August 23, 2024, 7:17am

Hello

Answering while the rest of the team is off.

"I checked the code of SGDClassifier, and it is pretty clear that the fit() method is implemented to receive plaintext and encrypt it."

I am not sure why you believe this. You have

    # If the model should be trained using FHE training
    if self.fit_encrypted:
        if fhe is None:
            fhe = "disable"
            warnings.warn(
                "Parameter 'fhe' isn't set while FHE training is enabled.\n"
                f"Defaulting to '{fhe=}'",
                stacklevel=2,
            )

        # Make sure the `fhe` parameter is correct
        assert FheMode.is_valid(fhe), (
            "`fhe` mode is not supported. Expected one of 'disable' (resp. FheMode.DISABLE), "
            "'simulate' (resp. FheMode.SIMULATE) or 'execute' (resp. FheMode.EXECUTE). Got "
            f"{fhe}",
        )

        if sample_weight is not None:
            raise NotImplementedError(
                "Parameter 'sample_weight' is currently not supported for FHE training."
            )

        return self._fit_encrypted(
            X=X,
            y=y,
            fhe=fhe,
            coef_init=coef_init,
            intercept_init=intercept_init,
        )

and _fit_encrypted really is FHE training. But, as Celia explained, the encryption happens inside _fit_encrypted itself. I am not the author or owner of this code, but I see

        # The underlying quantized module expects (X, y, weight, bias) as inputs. We thus only
        # quantize the input and target values using the first and second positional parameter
        q_X_batch, q_y_batch, _, _ = self.training_quantized_module.quantize_input(
            X_batch, y_batch, None, None
        )

        # If the training is done in FHE, encrypt the input and target values
        if fhe == "execute":

            # Similarly, the underlying FHE circuit expects (X, y, weight, bias) as inputs, and
            # so does the encrypt method
            X_batch_enc, y_batch_enc, _, _ = self.training_quantized_module.fhe_circuit.encrypt(
                q_X_batch, q_y_batch, None, None
            )

        else:
            X_batch_enc, y_batch_enc = q_X_batch, q_y_batch

        X_batches_enc.append(X_batch_enc)
        y_batches_enc.append(y_batch_enc)

That’s why Celia told you to have a look at

fhe_client.get_serialized_evaluation_keys()
fhe_client.quantize_encrypt_serialize
fhe_server.run
fhe_client.deserialize_decrypt_dequantize

Here, when you deploy, you’ll be able to use fhe_server.run, which performs exactly the FHE training on encrypted inputs. Note that the output is also encrypted, and is decrypted and dequantized only in fhe_client.deserialize_decrypt_dequantize.

You can see _fit_encrypted as a packaged function, mainly used for tests, that wraps the whole process; when we deploy, we call the individual functions instead. It is exactly like the

   # Finally we run the inference on encrypted inputs !
   y_pred_fhe = model.predict(X_test, fhe="execute")

that exists for inference: here as well, X_test is in the clear and is encrypted inside the call. When we deploy this, we call the individual

# Quantize an original float input
q_input = model.quantize_input(X_test[[0]])

# Encrypt the input
q_input_enc = model.fhe_circuit.encrypt(q_input)

# Execute the linear product in FHE
q_y_enc = model.fhe_circuit.run(q_input_enc)

# Decrypt the result (integer)
q_y = model.fhe_circuit.decrypt(q_y_enc)

# De-quantize and post-process the result
y0 = model.post_processing(model.dequantize_output(q_y))

parts.

Note that currently, only linear-model training is available in FHE; we are working on training for other kinds of models.
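
For the streaming scenario mentioned earlier, the kind of per-batch update a linear-model SGD training performs can be sketched in the clear as follows. This is the textbook logistic-regression update in plain Python, shown only as an illustration: the function names, learning rate, and data are hypothetical, and the FHE training circuit works on quantized, encrypted values and may differ in its details.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 for a single sample."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sgd_step(weights, bias, X_batch, y_batch, learning_rate):
    """One gradient step of logistic regression on a single batch."""
    n = len(X_batch)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(X_batch, y_batch):
        error = predict(weights, bias, x) - y
        for j, xj in enumerate(x):
            grad_w[j] += error * xj / n
        grad_b += error / n
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - learning_rate * grad_b


# Hypothetical stream: batches arrive one after another and each updates the model
batches = [
    ([[1.0, 2.0], [-1.0, -2.0]], [1, 0]),
    ([[2.0, 1.0], [-2.0, -1.0]], [1, 0]),
]

weights, bias = [0.0, 0.0], 0.0
for _ in range(100):  # replay the stream a few times
    for X_batch, y_batch in batches:
        weights, bias = sgd_step(weights, bias, X_batch, y_batch, learning_rate=0.1)

print(predict(weights, bias, [1.0, 2.0]), predict(weights, bias, [-1.0, -2.0]))
```

Because each step only needs the current weights and one batch, the model can keep being updated as new batches arrive, which is what a drift-aware setup would rely on.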

Yes, FL is an interesting technology. If you want to discuss your use case in more detail, don’t hesitate to go to https://www.zama.ai, in “Contact us”, and drop a message saying you want to speak to me, with a bit of context.

Cheers