Simple linreg model outputs wrong results

Hey,

So I’m having some problems with the Zama Concrete ML linear regression model. I tried to build a minimal failing example here:

from sklearn.linear_model import LinearRegression as SklearnLinearRegression
from concrete.ml.sklearn import LinearRegression as ConcreteLinearRegression
import numpy as np


def generate_dataset(n, t):
    x_train = np.array([np.float64(i + 1) for i in range(n)]).reshape(n, 1)
    y_train = np.random.default_rng().uniform(0.0, 100.0, n)
    x_test = np.array([np.float64(i + n + 1) for i in range(t)]).reshape(t, 1)
    return x_train, y_train, x_test


x_train, y_train, x_test = generate_dataset(2, 2)

print("X train: ", " ".join([str(value[0]) for value in x_train]))
print("Y train: ", " ".join([str(value) for value in y_train]))
print("X test: ", " ".join([str(value[0]) for value in x_test]))

plaintext_model = SklearnLinearRegression()
plaintext_model.fit(x_train, y_train)

concrete_model = ConcreteLinearRegression(n_bits=16)
concrete_model.fit(x_train, y_train)

y_pred_plaintext = plaintext_model.predict(x_test)

additional_values = np.array([0.,1.,2.,3.,4.,5.,6.,10.,12.]).reshape(-1,1)
input_range = np.concatenate((x_test, additional_values), axis=0)
concrete_model.compile(input_range, verbose=True)
y_pred_concrete = concrete_model.predict(x_test, fhe="execute")

print(
    "Plaintext coef and intercept: ",
    plaintext_model.coef_[0],
    plaintext_model.intercept_,
)
print(
    "Concrete coef and intercept: ", concrete_model.coef_[0], concrete_model.intercept_
)

print("Plaintext predictions: ", " ".join([str(value) for value in y_pred_plaintext]))
print("Concrete predictions: ", " ".join([str(value[0]) for value in y_pred_concrete]))

So here we generate just two (X, Y) points for the training dataset, so a linear regression can fit the two points at X=1.0 and 2.0 perfectly. We then train two linear regression models, a regular sklearn one and the one from Concrete ML, and run inference on X=3.0 and 4.0.

The model from sklearn works as expected. The one from Concrete ML just repeats the last known Y, and I can’t figure out why.

I initially thought the issue was not having enough points in X_calibrate when I compile the circuit, but adding more (via additional_values) doesn’t fix it. Different n_bits values have no effect either. Moreover, the issue persists not only with fhe="execute" but also when it is set to "simulate" and even "disable".

I’m out of ideas now, banging my head :slight_smile: Any help would be greatly appreciated!

Here’s the output of the program above:

X train:  1.0 2.0
Y train:  12.691858014601243 60.26780285759041
X test:  3.0 4.0

Computation Graph
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
%0 = q_X                                   # EncryptedTensor<int16, shape=(1, 1)>        ∈ [-32768, 32767]
%1 = [[1]]                                 # ClearTensor<uint1, shape=(1, 1)>            ∈ [1, 1]
%2 = matmul(%0, %1)                        # EncryptedTensor<int16, shape=(1, 1)>        ∈ [-32768, 32767]
%3 = sum(%0, axis=1, keepdims=True)        # EncryptedTensor<int16, shape=(1, 1)>        ∈ [-32768, 32767]
%4 = 0                                     # ClearScalar<uint1>                          ∈ [0, 0]
%5 = multiply(%4, %3)                      # EncryptedTensor<uint1, shape=(1, 1)>        ∈ [0, 0]
%6 = subtract(%2, %5)                      # EncryptedTensor<int16, shape=(1, 1)>        ∈ [-32768, 32767]
%7 = [[-146355]]                           # ClearTensor<int19, shape=(1, 1)>            ∈ [-146355, -146355]
%8 = add(%6, %7)                           # EncryptedTensor<int19, shape=(1, 1)>        ∈ [-179123, -113588]
return %8
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Bit-Width Constraints
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
%0:
    %0 >= 16
%1:
    %1 >= 1
%2:
    %2 >= 16
    %0 == %1
    %1 == %2
%3:
    %3 >= 16
    %0 == %3
%4:
    %4 >= 1
%5:
    %5 >= 1
    %4 == %3
    %3 == %5
%6:
    %6 >= 16
    %2 == %5
    %5 == %6
%7:
    %7 >= 19
%8:
    %8 >= 19
    %6 == %7
    %7 == %8
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Bit-Width Assignments
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 %0 = 19
 %1 = 19
 %2 = 19
 %3 = 19
 %4 = 19
 %5 = 19
 %6 = 19
 %7 = 19
 %8 = 19
max = 19
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Bit-Width Assigned Computation Graph
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
%0 = q_X                                   # EncryptedTensor<int19, shape=(1, 1)>         ∈ [-32768, 32767]
%1 = [[1]]                                 # ClearTensor<uint20, shape=(1, 1)>            ∈ [1, 1]
%2 = matmul(%0, %1)                        # EncryptedTensor<int19, shape=(1, 1)>         ∈ [-32768, 32767]
%3 = sum(%0, axis=1, keepdims=True)        # EncryptedTensor<int19, shape=(1, 1)>         ∈ [-32768, 32767]
%4 = 0                                     # ClearScalar<uint20>                          ∈ [0, 0]
%5 = multiply(%4, %3)                      # EncryptedTensor<uint19, shape=(1, 1)>        ∈ [0, 0]
%6 = subtract(%2, %5)                      # EncryptedTensor<int19, shape=(1, 1)>         ∈ [-32768, 32767]
%7 = [[-146355]]                           # ClearTensor<int20, shape=(1, 1)>             ∈ [-146355, -146355]
%8 = add(%6, %7)                           # EncryptedTensor<int19, shape=(1, 1)>         ∈ [-179123, -113588]
return %8
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Optimizer
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
### Optimizer display
--- Circuit
  19 bits integers
  0 manp (maxi log2 norm2)
--- User config
  9.094947e-13 error per pbs call
  1.000000e+00 error per circuit call
-- Solution correctness
  For each pbs call:  1/2147483647, p_error (4.272044e-13)
  For the full circuit: 1/2147483647 global_p_error(4.272044e-13)
--- Complexity for the full circuit
  1.000000e+00 Millions Operations
-- Circuit Solution
CircuitSolution {
    circuit_keys: CircuitKeys {
        secret_keys: [
            SecretLweKey {
                identifier: 0,
                polynomial_size: 1,
                glwe_dimension: 1009,
                description: "big representation",
            },
        ],
        keyswitch_keys: [],
        bootstrap_keys: [],
        conversion_keyswitch_keys: [],
        circuit_bootstrap_keys: [],
        private_functional_packing_keys: [],
    },
    instructions_keys: [],
    crt_decomposition: [],
    complexity: 1009.0,
    p_error: 4.2720437586522667e-13,
    global_p_error: 4.2720437586522667e-13,
    is_feasible: true,
    error_msg: "",
}###
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
size_of_secret_keys: 8072
size_of_bootstrap_keys: 0
size_of_keyswitch_keys: 0
size_of_inputs: 8080
size_of_outputs: 8080
p_error: 4.2720437586522667e-13
global_p_error: 4.2720437586522667e-13
complexity: 1009.0
programmable_bootstrap_count: 0
key_switch_count: 0
packing_key_switch_count: 0
clear_addition_count: 1
clear_addition_count_per_parameter: {
    LweSecretKeyParam(dimension=1009): 1
}
encrypted_addition_count: 2
encrypted_addition_count_per_parameter: {
    LweSecretKeyParam(dimension=1009): 2
}
clear_multiplication_count: 0
encrypted_negation_count: 1
encrypted_negation_count_per_parameter: {
    LweSecretKeyParam(dimension=1009): 1
}
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Plaintext coef and intercept:  47.57594484298915 -34.88408682838791
Concrete coef and intercept:  47.57594484298915 -34.88408682838791
Plaintext predictions:  107.84374770057956 155.41969254356871
Concrete predictions:  60.26794520447507 60.26794520447507

Hello @bazzilic,
There’s nothing really wrong with the way you use the Concrete ML model; the problem actually lies in the data itself. Indeed, the intersection of the value ranges of the train and test sets is empty, while the Concrete ML model, after fitting in the clear with scikit-learn, builds its quantization parameters from the training data ranges. This means that any input value outside the training range is clipped to its min/max during inference. You can actually observe this behavior in the plots found in our advanced examples on linear models, like LinearRegression.ipynb or LogisticRegression.ipynb :wink:
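
If it helps to picture the clipping, here is a minimal plain-NumPy sketch of the effect (just an illustration with a simple min/max uniform quantizer, not Concrete ML’s actual implementation):

import numpy as np

def make_quantizer(calibration_data, n_bits=16):
    # Derive (scale, zero_point) from the calibration range, min/max uniform quantization
    v_min, v_max = calibration_data.min(), calibration_data.max()
    scale = (v_max - v_min) / (2**n_bits - 1)
    return scale, v_min

def quantize(x, scale, zero_point, n_bits=16):
    # Anything outside the calibration range is clipped to the representable bounds
    q = np.round((x - zero_point) / scale)
    return np.clip(q, 0, 2**n_bits - 1)

def dequantize(q, scale, zero_point):
    return q * scale + zero_point

scale, zp = make_quantizer(np.array([1.0, 2.0]))  # calibrated on X train = {1.0, 2.0}
for x in [1.0, 2.0, 3.0, 4.0]:
    print(x, "->", dequantize(quantize(x, scale, zp), scale, zp))
# 3.0 and 4.0 both come back as (approximately) 2.0, the training-range maximum,
# which is why the model then repeats the prediction for X = 2.0.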

In summary, to properly use Concrete ML models, the training set (or at least its range of values) should be as representative as possible of the inputs expected during inference!

Side note: a good practice is to call compile on x_train rather than x_test, as that better matches a typical development workflow (the model should only be “aware” of the training set).
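
For instance, with the script from your first post, that would simply be something like:

# compile on the training data instead of a hand-built calibration range
concrete_model.compile(x_train, verbose=True)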

Let me know if that solves your issue!

Hi @RomanBredehoft

Thank you for your reply! Unfortunately it doesn’t really solve the issue though; it seems the issue is more by design :slight_smile:

Is there a way to manually expand the range that is used to build the quantization parameters? One workaround I can think of: if I have a sufficiently large training dataset, I can just add a couple of “outliers” whose X values are the left and right limits of the range I want to be able to work with; the bigger the training dataset, the less impact adding these outliers has. I just tried that and it works:

x_train, y_train, x_test = generate_dataset(2, 2)

# repeat x_train 5 times
x_train = np.repeat(x_train, 5, axis=0)
y_train = np.repeat(y_train, 5, axis=0)

# add point (4.0, 50.0) to (x_train, y_train)
x_train = np.concatenate((x_train, np.array([4.0]).reshape(-1,1)), axis=0)
y_train = np.concatenate((y_train, np.array([50.0])), axis=0)

and now the output is more or less the same for both models:

X train:  1.000 1.000 1.000 1.000 1.000 2.000 2.000 2.000 2.000 2.000 4.000
Y train:  28.832 28.832 28.832 28.832 28.832 43.168 43.168 43.168 43.168 43.168 50.000
X test:  3.000 4.000
Plaintext coef and intercept:  8.26944979145337 22.989278967847355
Concrete coef and intercept:  8.26944979145337 22.989278967847355
Plaintext predictions:  47.79762834220747 56.067078133660836
Concrete predictions:  47.79776049063217 56.06721028208554

But that’s just a hack. There has to be a proper way to set the quantization range?

Hello again,
If you are not able to provide a training set that properly represents the inputs expected at inference time, then yes, adding some “outliers” can help. As mentioned above, the range of values found in the training set needs to be as representative as possible of the inputs. So if the training set (and the set of values that you give to the compile method) contains the expected extreme values, that will solve your issue. If not, any test input given at inference that falls outside the training range (a, b) will basically be treated as a if below it (or b if above).
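
You can check this against the numbers in your first post: clipping X=3.0 and X=4.0 to the training maximum of 2.0 reproduces the repeated Concrete prediction (a quick sanity check using the coefficient and intercept you printed):

coef, intercept = 47.57594484298915, -34.88408682838791  # values printed above
x_clipped = 2.0  # 3.0 and 4.0 both fall above the training range and clip to its max
print(coef * x_clipped + intercept)  # ~60.2678, i.e. the repeated "Concrete predictions" value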

Still, adding “outliers” should be done carefully, as uniform quantization does not handle them well when only a few bits of information are used. However, in the case of linear models, the number of quantization bits is not really an issue, so this should be fine. You can find more info on quantization in this documentation section.
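
To give a rough idea of why outliers plus a low bit-width is a problem, here is a toy min/max step-size computation (made-up numbers, not the actual Concrete ML quantizer):

import numpy as np

def step_size(data, n_bits):
    # width of one quantization bin for min/max uniform quantization
    return (data.max() - data.min()) / (2**n_bits - 1)

normal = np.array([1.0, 2.0])
with_outlier = np.array([1.0, 2.0, 100.0])

for n_bits in (4, 16):
    print(n_bits, "bits:",
          "step without outlier =", step_size(normal, n_bits),
          "| step with outlier =", step_size(with_outlier, n_bits))
# With 4 bits the step grows from ~0.07 to ~6.6, so 1.0 and 2.0 land in the same bin;
# with 16 bits both steps stay tiny, which is why linear models tolerate outliers well.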

Could you maybe explain a bit more about the use case you are trying to achieve? That might help us better understand the constraints you are working with and suggest a better approach.

Hope that helps!