Hello,
Thank you for your answer, it helped a lot. I have tried the two different methods and I have a few questions.
Regarding my quantized model, PyTorch already gives some tips and methods in Introduction to Quantization on PyTorch | PyTorch; it may differ from Brevitas or ONNX, but I haven't checked yet as I am still playing with Concrete Numpy.
I am trying to do a linear regression with 10 labels on a 1D vector of dimension 576, after applying a lookup table to that vector, but I fail to understand where the increase in execution time comes from.
My code:
import time
import numpy as np
import concrete.numpy as cnp
def table_block():
    return np.array([
        [0., 0., 1., 0., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0., 0., 0.],
        [1., 1., 0., 0., 1., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0.],
        [1., 0., 1., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 1., 0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 1., 1., 0., 1., 1., 0., 1., 1., 1., 1., 1., 0.],
        [1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1.],
        [1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1.],
        [1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1.],
        [1., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0.],
        [1., 1., 0., 0., 0., 1., 1., 0., 0., 0., 1., 0., 1., 0., 1., 1.],
        [1., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1.],
        [1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1.],
        [1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1., 1., 0., 1., 1.],
        [1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1.],
        [1., 1., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1.],
        [1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1.]
    ]).astype(np.uint8)
lk = []
block = table_block()
npatches = 36
# transforming my truth table into 576 look up tables
for lk_filter in block.transpose(): # on each filter
    lk.extend([cnp.LookupTable(lk_filter)] * npatches) # npatches per filter
tables = cnp.LookupTable(lk)
nfeat = 576
nclasses = 10
bits = 4
X_train = np.random.randint(0,16, (1000, nfeat)).astype(np.uint8)
y_train = np.random.randint(0,nclasses, (1000,))
w = np.random.randint(0,256,(nfeat,nclasses))
b = np.random.randint(0,256,(nclasses))
@cnp.compiler({"x": "encrypted"})
def f(x):
    y = tables[x]
    res = y @ w + b
    return res
inputset = X_train[0:4,:]
inpt = X_train[5,:]
print("compiling")
t = time.time()
circuit = f.compile(inputset)
print('compiled in ', time.time()-t)
t = time.time()
circuit.keygen()
print("Keygen done in ", time.time()-t)
enc = circuit.encrypt(inpt)
t = time.time()
res_enc = circuit.run(enc)
print(time.time()-t)
dec = circuit.decrypt(res_enc)
print(dec)
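For reference, here is what I intend tables[x] to compute, written in plain numpy (this is just my understanding of the layout, not FHE code: the multi lookup table should apply table i to element i of x, so element i goes through the table of filter i // npatches):

# plain-numpy reference of the intended lookup, outside FHE
filters = block.transpose()  # one 16-entry table per filter, as used to build lk
expected = np.array([filters[i // npatches][inpt[i]] for i in range(nfeat)])
print(expected[:16])  # first few values I expect from tables[inpt]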
When I only apply the lookup table, inference takes approximately 6 s, with the following function f_lu:
@cnp.compiler({"x": "encrypted"})
def f_lu(x):
    y = tables[x]
    return y
When I only do the linear regression, inference takes approximately 0.1 s, with the following function f_lr:
@cnp.compiler({"x": "encrypted"})
def f_lr(x):
    res = x @ w + b
    return res
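For completeness, I measured each reduced variant with the same pattern as the full script, reusing the same inputset and the same encrypted input (a minimal sketch):

# measure each reduced circuit separately, same pattern as the full script
for name, compiler in [("lookup only", f_lu), ("linear only", f_lr)]:
    circuit = compiler.compile(inputset)
    circuit.keygen()
    enc = circuit.encrypt(inpt)
    t = time.time()
    circuit.run(enc)
    print(name, "inference:", time.time() - t, "s")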
But when I chain the two, both my compilation time and my inference time increase dramatically:
compiling
compiled in 4890.356591701508
Keygen done in 51.013827323913574
1253.6375329494476
[70387 70784 11730 46806 41651 11664 21623 51500 18479 6932]
By contrast, each of them alone (lookup table only, or linear regression only) runs in seconds.
I suppose it comes from the bootstrapping, but I expected only one bootstrap per value between the lookup table and the matrix multiplication, i.e. approximately bootstrap time * number of values = 0.014 * 576 ≈ 8 s.
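In code terms, the back-of-the-envelope estimate I have in mind is simply:

bootstrap_time = 0.014  # rough time per programmable bootstrap, in seconds
n_values = 576          # one lookup (hence one bootstrap) per input value
print(bootstrap_time * n_values)  # ~8 s expected on top of the linear part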
Is my intuition correct, or am I totally missing something?
Should I try a bigger inputset for the compilation?
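For instance, by a bigger inputset I mean passing more calibration samples to compile, something like:

inputset = X_train[0:100, :]  # instead of the 4 samples used above
circuit = f.compile(inputset)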
Also, I tried to use the LinearRegression from concrete.ml.sklearn, but the timings are really different compared to the simple x @ w + b. Here is the code:
import time
import numpy as np
from concrete.ml.sklearn import LinearRegression as ConcreteLinearRegression
nfeat = 576
nclasses = 10
bits = 4
X_train = np.random.randint(0,2, (1000, nfeat)).astype(np.float32)
y_train = np.random.randint(0,nclasses, (1000,))
X_test = np.random.randint(0,2, (100, nfeat)).astype(np.float32)
y_test = np.random.randint(0,nclasses, (100,))
print("Creating the model ...")
model_lr = ConcreteLinearRegression(n_bits=bits)  # or n_bits={"model_inputs": 4, "op_inputs": 4, "op_weights": 4, "model_outputs": 4}
print("Fitting the model")
# Fit the model
model_lr.fit(X_train, y_train)
print("Evaluate on some inputs")
# Evaluate the model on the test set in clear
y_pred_clear = model_lr.predict(X_test)
res = np.sum(y_pred_clear == y_test) / len(y_test) * 100
print("Accuracy on test set:", res)
# Compile the model
print("Compiling model ...")
circuit = model_lr.compile(X_train)
print("Compiled")
test_input = X_test[:3]
time_begin = time.time()
circuit.client.keygen(force=False)
print(f"Key generation time: {time.time() - time_begin:.2f} seconds")
# Now predict using the FHE-quantized model on the testing set
time_begin = time.time()
y_pred_fhe = model_lr.predict(test_input, execute_in_fhe=True)
print(f"Execution time: {(time.time() - time_begin) / len(test_input):.2f} seconds per sample")
It took my computer approximately 3600 s per sample, versus the 0.1 s of the plain x @ w + b matrix multiplication above. I suppose there is some hidden logic behind this that I am not aware of.
Thank you !