Range of difference between scikit-learn and Concrete-ML

Hi everyone! I'm working on differential fuzzers to compare Concrete-ML and scikit-learn, and I have a question about how large the difference between the two can be. For example, running the following code:

import numpy as np
import sys
import atheris
from sklearn.linear_model import GammaRegressor as SklearnGammaRegressor
from sklearn.linear_model import PoissonRegressor as SklearnPoissonRegressor
from sklearn.linear_model import TweedieRegressor as SklearnTweedieRegressor
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import GammaRegressor as ConcreteGammaRegressor
from concrete.ml.sklearn import PoissonRegressor as ConcretePoissonRegressor
from concrete.ml.sklearn import TweedieRegressor as ConcreteTweedieRegressor

# Split the data set into a train and a test set,
# each set split into inputs (x) and targets (y).
x_train_data, x_test_data, y_train_data, y_test_data = train_test_split(x, y, test_size=0.2, random_state=42)

def compare_models(input_bytes):
    
    fdp = atheris.FuzzedDataProvider(input_bytes)
    data = [fdp.ConsumeFloatListInRange(74, 0.0, 1.0) for _ in range(15)]
    
    # Instantiate the models
    sklearn_glm = SklearnGammaRegressor(**init_parameters)
    concrete_glm = ConcreteGammaRegressor(**init_parameters)
    
    # Fit the models
    sklearn_glm.fit(**fit_parameters)
    concrete_glm.fit(**fit_parameters)
    
    # Compute the predictions using sklearn (floating points, in the clear)
    sklearn_predictions = sklearn_glm.predict(data)

    # Compute the predictions using Concrete-ML (in FHE)
    concrete_predictions = concrete_glm.predict(
        data,
        execute_in_fhe=True,
    ).flatten()
    print(sklearn_predictions)
    print(concrete_predictions)
    
    assert np.allclose(sklearn_predictions, concrete_predictions, atol=1), f"Error: The predictions are different, scikit prediction {sklearn_predictions}; concrete prediction {concrete_predictions}"

atheris.Setup(sys.argv, compare_models)
atheris.Fuzz()

When comparing the results, I set a tolerance of 1 (provisional), but the actual difference is much smaller. For example, these two outputs:

sklearn_predictions= [0.01807652 0.01807652 0.01807652 0.01807652 0.01807652 0.01807652
 0.01807652 0.01807652 0.01807652 0.01807652 0.01807652 0.01807652
 0.01807652 0.01807652 0.01807652]
concrete_predictions= [0.01808741 0.01808741 0.01808741 0.01808741 0.01808741 0.01808741
 0.01808741 0.01808741 0.01808741 0.01808741 0.01808741 0.01808741
 0.01808741 0.01808741 0.01808741]
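
For reference, a quick check of how small that gap actually is, using the values above:

# Difference between the two example predictions shown above
sk = 0.01807652   # sklearn prediction
cml = 0.01808741  # Concrete-ML prediction
print(abs(cml - sk))            # absolute difference, about 1.1e-5
print(abs(cml - sk) / abs(sk))  # relative difference, about 0.06%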

Can you give me some guidance on how large a difference should still be considered valid?

Concrete-ML uses quantization to represent floating point operations using FHE-compatible integers.
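
As a rough illustration of what quantization does to the values, here is a toy uniform-quantization sketch (not Concrete-ML's exact implementation; all names here are mine):

import numpy as np

# Map floats to n_bits integer levels with a scale and offset, then map them back.
# The round trip introduces a rounding error that shrinks as n_bits grows.
def quantize_dequantize(values, n_bits):
    values = np.asarray(values, dtype=np.float64)
    v_min, v_max = values.min(), values.max()
    scale = (v_max - v_min) / (2**n_bits - 1) if v_max > v_min else 1.0
    q = np.round((values - v_min) / scale)  # integer representation
    return q * scale + v_min                # back to floats, with rounding error

x = np.array([0.018076, 0.25, 0.5, 0.99])
print(quantize_dequantize(x, 4))   # coarse quantization: visible error
print(quantize_dequantize(x, 12))  # fine quantization: error becomes tiny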

The more bits you use for quantization (up to the limit of FHE constraints), the better. With linear models, especially low-dimensional ones, you can probably get more than 99% agreement with the result of the float model. For low-dimensional linear models you can use up to 11-12 bits for the weights and inputs (the n_bits parameter of the linear models).
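
For example, a sketch of how the n_bits setting changes the agreement with the float model (the toy data and parameter values are my own; predict() without FHE runs the quantized model in the clear):

import numpy as np
from sklearn.linear_model import GammaRegressor as SklearnGammaRegressor
from concrete.ml.sklearn import GammaRegressor as ConcreteGammaRegressor

# Toy positive-valued data (hypothetical, just to make the sketch self-contained)
rng = np.random.RandomState(42)
x = rng.uniform(0.0, 1.0, size=(200, 10))
y = np.exp(x @ rng.uniform(-1.0, 1.0, size=10)) + 0.1  # strictly positive targets

sk_model = SklearnGammaRegressor()
sk_model.fit(x, y)
sk_pred = sk_model.predict(x)

for n_bits in (6, 12):
    cml_model = ConcreteGammaRegressor(n_bits=n_bits)
    cml_model.fit(x, y)
    cml_pred = cml_model.predict(x).flatten()  # quantized inference, in the clear
    rel_err = np.abs(cml_pred - sk_pred) / np.abs(sk_pred)
    print(n_bits, "bits -> max relative error:", rel_err.max())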

For neural networks, when you use quantization-aware training with a low number of bits (2-6), the accuracy you get when running with PyTorch should be within 1% of the accuracy you get with Concrete-ML when you import the model with compile_brevitas_qat_model.
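
As a sketch of that workflow (the tiny Brevitas model, bit widths and calibration data here are hypothetical; the Concrete-ML call shown is compile_brevitas_qat_model):

import numpy as np
import torch
import torch.nn as nn
import brevitas.nn as qnn
from concrete.ml.torch.compile import compile_brevitas_qat_model

# Tiny quantization-aware model (hypothetical architecture, 3-bit weights/activations)
class TinyQATNet(nn.Module):
    def __init__(self, n_inputs, n_bits=3):
        super().__init__()
        self.quant_in = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
        self.fc1 = qnn.QuantLinear(n_inputs, 16, weight_bit_width=n_bits, bias=True)
        self.act = qnn.QuantReLU(bit_width=n_bits, return_quant_tensor=True)
        self.fc2 = qnn.QuantLinear(16, 1, weight_bit_width=n_bits, bias=True)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(self.quant_in(x))))

x_calib = np.random.uniform(0.0, 1.0, size=(100, 10)).astype(np.float32)
torch_model = TinyQATNet(n_inputs=10)
# ... train torch_model with quantization-aware training here ...

# Import the trained Brevitas model into Concrete-ML; the returned quantized
# module can then be used for (simulated or real) FHE inference
quantized_module = compile_brevitas_qat_model(torch_model, x_calib)

# Float (PyTorch) reference output to compare against the quantized module's predictions
torch_out = torch_model(torch.tensor(x_calib)).detach().numpy()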

So I would suggest measuring accuracy on some ML task for sklearn vs Concrete-ML, rather than measuring the difference in decimals between individual prediction results.
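
For instance, something along these lines (toy data and my own variable names; the point is to compare a task-level metric such as R² rather than individual predictions):

import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import GammaRegressor as SklearnGammaRegressor
from concrete.ml.sklearn import GammaRegressor as ConcreteGammaRegressor

# Toy positive-valued regression task (hypothetical data, only to make the sketch runnable)
rng = np.random.RandomState(0)
x = rng.uniform(0.0, 1.0, size=(300, 10))
y = np.exp(x @ rng.uniform(-1.0, 1.0, size=10)) + 0.1
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

sk_model = SklearnGammaRegressor()
sk_model.fit(x_train, y_train)
cml_model = ConcreteGammaRegressor(n_bits=11)
cml_model.fit(x_train, y_train)

sk_pred = sk_model.predict(x_test)
cml_pred = cml_model.predict(x_test).flatten()

# Compare the models on the task metric instead of per-prediction decimals
print("sklearn R2:     ", r2_score(y_test, sk_pred))
print("Concrete-ML R2: ", r2_score(y_test, cml_pred))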
