I’ve been building a fuzzer that compares the concrete-ml FHE models against their scikit-learn counterparts. The goal is to find differences that could point to a logic bug. So far I’ve only tested the logistic regression model: I trained the concrete-ml and scikit-learn implementations on the same dataset and then gave both the same random inputs. On the happy path, both outputs should be exactly the same. However, when the input is essentially the same value repeated across all the samples except one, concrete-ml and scikit-learn give different results. Is this expected? Should there be some tolerance for differences between concrete-ml and scikit-learn?
Here is one of the examples:
100 samples, 5 features each:
input:
[ [-0.7098039215686274, -0.9999999952758206, -1.0, -1.0, -1.0], [-1.0, -1.0, -1.0, -1.0, -1.0], … (all other 99 samples are the same) [-1.0, -1.0, -1.0, -1.0, -1.0] ]
FHE:
[ 0 1 … 1 ]
scikit-learn:
[ 1 1 … 1 ]
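My guess is that quantization alone could flip a prediction like the first one if that sample sits close to the decision boundary. Here is a toy sketch of the effect. It is not concrete-ml's actual quantizer: the `quantize` helper, the 3-bit width, the [-1, 1] range, and the one-feature model are all assumptions I made up for illustration.

```python
import numpy as np

def quantize(x, n_bits, lo=-1.0, hi=1.0):
    """Snap x onto a uniform grid of 2**n_bits levels over [lo, hi] (toy quantizer)."""
    step = (hi - lo) / (2 ** n_bits - 1)
    q = np.round((np.clip(x, lo, hi) - lo) / step)
    return lo + q * step

def linear_predict(x, w, b):
    """Return class 1 where the linear score x.w + b is positive, else class 0."""
    return (x @ w + b > 0).astype(int)

# Made-up one-feature model and a sample close to its decision boundary.
w = np.array([1.0])
b = -0.12
x = np.array([[0.1]])

float_pred = linear_predict(x, w, b)                       # full-precision input
quant_pred = linear_predict(quantize(x, n_bits=3), w, b)   # coarsely quantized input
print(float_pred, quant_pred)
```

With these made-up numbers the two predictions disagree even though the model is the same, only because the input was rounded onto a coarse grid.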
And here is the code I’m testing:
import sys
import atheris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression
from concrete.ml.sklearn import LogisticRegression as ConcreteLogisticRegression
# Dataset to train
x, y = make_classification(n_samples=100, class_sep=2, n_features=5, random_state=42)
# Split the data-set into a train and test set,
# each set is split into input and result.
input_train, _, result_train, _ = train_test_split(
    x, y, test_size=0.2, random_state=42
)
# Start the concrete-ml logistic regression model, train (unencrypted data) and quantize the weights.
concrete_model = ConcreteLogisticRegression()
concrete_model.fit(input_train, result_train)
# Compile FHE
concrete_model.compile(input_train)
# Start the sklearn logistic regression model
scikit_model = SklearnLogisticRegression()
# Train
scikit_model.fit(input_train, result_train)
def compare_models(input_bytes):
    fdp = atheris.FuzzedDataProvider(input_bytes)
    # Build 100 samples of 5 features each, every value in [-1.0, 1.0]
    data = [fdp.ConsumeFloatListInRange(5, -1.0, 1.0) for _ in range(100)]
    # Run the inference, encryption and decryption is done in the background
    fhe_pred = concrete_model.predict(data, execute_in_fhe=True)
    # Get scikit prediction
    prediction = scikit_model.predict(data)
    # Compare both outputs
    assert (fhe_pred == prediction).all()
atheris.Setup(sys.argv, compare_models)
atheris.Fuzz()
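
If some tolerance is expected, one option I am considering is to flag a mismatch only when the scikit-learn model is reasonably far from its decision boundary for that sample. Below is a minimal sketch of that check. The helper name `significant_mismatches` and the `tol` threshold are my own assumptions, and the arrays are illustrative values, not real outputs from either model.

```python
import numpy as np

def significant_mismatches(pred_a, pred_b, margins, tol):
    """Return indices where the predictions differ and |margin| exceeds tol."""
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    margins = np.asarray(margins, dtype=float)
    differs = pred_a != pred_b            # samples where the two models disagree
    confident = np.abs(margins) > tol     # samples far enough from the boundary
    return np.flatnonzero(differs & confident)

# Illustrative values only (hypothetical predictions and margins).
fhe_pred = [0, 1, 1, 0]
sk_pred = [1, 1, 1, 1]
margins = [0.01, 2.3, 1.7, 0.9]

bad = significant_mismatches(fhe_pred, sk_pred, margins, tol=0.05)
print(bad)
```

In the fuzzer, `margins` could come from `scikit_model.decision_function(data)`, and the final check would become `assert bad.size == 0`. A reasonable value for `tol` is something I do not know yet; it presumably depends on how many bits the quantization uses.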