I cannot tell from your documentation how to interpret the output of a model’s predictions. Consider the following code (uninteresting imports omitted for brevity):
from concrete.ml.sklearn import LogisticRegression as ConcreteLogisticRegression
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
# Create the data for classification:
X, y = make_classification(
    n_features=30,
    n_redundant=0,
    n_informative=2,
    random_state=2,
    n_clusters_per_class=1,
    n_samples=250,
)
# Retrieve train and test sets:
X_model_owner, X_client, y_model_owner, y_client = train_test_split(X, y, test_size=0.4, random_state=42)
# Train the model and compile it
concrete_model = ConcreteLogisticRegression()
concrete_model.fit(X_model_owner, y_model_owner)
concrete_model.compile(X_model_owner)
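# Simulate deployment: three temporary directories stand in for the dev, server, and client machines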
server_dir = TemporaryDirectory()
client_dir = TemporaryDirectory()
dev_dir = TemporaryDirectory()
fhemodel_dev = FHEModelDev(dev_dir.name, concrete_model)
fhemodel_dev.save()
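# Distribute the saved artifacts: server.zip to the server, client.zip and serialized_processing.json to the client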
copyfile(dev_dir.name + "/server.zip", server_dir.name + "/server.zip")
copyfile(dev_dir.name + "/client.zip", client_dir.name + "/client.zip")
copyfile(
    dev_dir.name + "/serialized_processing.json",
    client_dir.name + "/serialized_processing.json",
)
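# Client side: load the client artifacts and generate the private and evaluation keys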
fhemodel_client = FHEModelClient(client_dir.name, key_dir=client_dir.name)
fhemodel_client.generate_private_and_evaluation_keys()
serialized_evaluation_keys = fhemodel_client.get_serialized_evaluation_keys()
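# Encrypt a single query on the client, run it on the server under FHE, then decrypt on the client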
clear_input = X_client[[0], :]
encrypted_input = fhemodel_client.quantize_encrypt_serialize(clear_input)
encrypted_prediction = FHEModelServer(server_dir.name).run(encrypted_input, serialized_evaluation_keys)
decrypted_prediction = fhemodel_client.deserialize_decrypt_dequantize(encrypted_prediction)
print("Concrete-ML decrypted prediction:")
print(decrypted_prediction)
direct_prediction = concrete_model.predict_proba(clear_input)
print("Concrete-ML direct prediction (predict_proba):")
print(direct_prediction)
direct_prediction2 = concrete_model.predict(clear_input)
print("Concrete-ML direct prediction (predict):")
print(direct_prediction2)
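# For comparison, train and query a plain scikit-learn model on the same data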
sklearn_model = SkLogisticRegression()
sklearn_model.fit(X_model_owner, y_model_owner)
sklearn_prediction = sklearn_model.predict(clear_input)
print("Sklearn prediction:")
print(sklearn_prediction)
This allows me to see and compare (all for the same query):
- The result of encrypting the query, sending it to the server to process in an FHE fashion, and having the client decrypt it,
- The result of calling model.predict_proba(),
- The result of calling model.predict(), and
- The result of calling a straight sklearn model.
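Concretely, the kind of check I am trying to write on top of these outputs looks like the snippet below. It continues from the script above, and it assumes the decrypted FHE result and the predict_proba() output are meant to be directly comparable probability arrays, which is exactly the part I am unsure about.

import numpy as np

# Continuing from the script above -- my intended check, under the assumption
# that both arrays hold per-class probabilities for the same query:
print(decrypted_prediction.shape, direct_prediction.shape)
print(np.allclose(decrypted_prediction, direct_prediction, atol=1e-6))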
What do I get?
Concrete-ML decrypted prediction:
[[0.03565675 0.96434325]
[0.03565675 0.96434325]]
Concrete-ML direct prediction (predict_proba):
[[0.03565675 0.96434325]
[0.03565675 0.96434325]]
Concrete-ML direct prediction (predict):
[1 1]
Sklearn prediction:
[1]
How the heck do I interpret the first three results? The documentation (concrete.ml.sklearn.base.md - Concrete ML) merely tells me that the result is a ‘numpy ndarray with probabilities (if applicable)’ for predict_proba() and a ‘numpy ndarray with predictions’ for predict(). Can I get some more detail? I can guess what’s going on, but (1) I would prefer not to have to guess, and (2) I sent in one query. Why am I getting two answers?
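For what it’s worth, my current guess at the semantics is below; the per-class layout and the argmax relationship are assumptions on my part, not anything I found in the documentation.

import numpy as np

# Guess: each row of predict_proba() is [P(class 0), P(class 1)] for one query,
# the rows sum to 1, and predict() returns the argmax of each row.
proba = np.array([[0.03565675, 0.96434325]])  # copied from the output above
print(proba.sum(axis=1))         # [1.] -- consistent with probabilities
print(np.argmax(proba, axis=1))  # [1]  -- matches the predict() output

Even if that guess is right, it does not explain why a single query comes back as two identical rows.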