How to choose the n_bits value in the Concrete ML logistic regression function?

Hi,
I tried using the logistic regression function on the Pima Indian Diabetes dataset.

It only predicts 0 values, as shown below.

Does this have to do with the n_bits value? May I know why I am getting this behavior and how I can solve it?

Thank You!

Hello @vrikam ,

Could you share your complete notebook, please? It will be easier for us to help you. Also, which Concrete-ML version are you using?

Thanks

I am using the latest Concrete ML version. I installed it with the following Docker command:
docker run --rm -it -p 8888:8888 -v /host/path:/data zamafhe/concrete-ml

Source Code

OK, and can we have your notebook?

I sent the link above.
Sending it again:
https://drive.google.com/file/d/1SMW_O5HTao9tg2_EqC1vReYs62pRP5DA/view?usp=sharing

Hi @vrikam,

Yes, you are right. The n_bits parameter defines the precision of your features.

For n_bits = 2, the entire input range is split into 2^n_bits values. So in your case, you are mapping your entire training set to 2^2 = 4 different values, evenly spaced between the min and max values.
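
To make this concrete, here is a minimal NumPy sketch of uniform min-max quantization; it illustrates the idea, not Concrete ML's exact internals:

import numpy as np

# With n_bits = 2, every value is snapped to one of 2^2 = 4 levels
# evenly spaced between the array's min and max.
n_bits = 2
x = np.array([0.0, 1.5, 40.0, 70.0, 100.0])

levels = 2 ** n_bits
scale = (x.max() - x.min()) / (levels - 1)
codes = np.round((x - x.min()) / scale)   # integer codes in {0, 1, 2, 3}
x_seen = codes * scale + x.min()          # values the model actually sees

print(codes)   # [0. 0. 1. 2. 3.]
print(x_seen)  # [  0.    0.   33.33  66.67 100. ]

Note how 1.5 collapses onto the same level as 0.0: with only 4 levels, small variations are wiped out.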

A few solutions that can help:

  • standardize your data: in your notebook, your features have very different scales, so the quantization completely misses the features with a small range. You can try something like this to put your features on the same scale:
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply the same
# transformation to the test set to avoid data leakage.
std_scaler = StandardScaler()
x_train = std_scaler.fit_transform(x_train)
x_test = std_scaler.transform(x_test)

This should give you much better accuracy already.

  • Increase n_bits, as follows (a combined sketch follows this list):
CMLogisticRegression(n_bits=6)

However, this option is risky: when you compile, you might get an error if n_bits is too high for the given dataset/model to run in FHE.
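
For reference, here is a minimal end-to-end sketch combining both suggestions. The concrete.ml.sklearn import path is from the Concrete ML documentation; aliasing it as CMLogisticRegression is an assumption to match the snippet above:

from sklearn.preprocessing import StandardScaler
from concrete.ml.sklearn import LogisticRegression as CMLogisticRegression

# Standardize first, so quantization does not lose small-range features.
std_scaler = StandardScaler()
x_train = std_scaler.fit_transform(x_train)   # x_train/x_test from your notebook
x_test = std_scaler.transform(x_test)

# Higher n_bits means finer quantization, at the risk of a compilation error.
model = CMLogisticRegression(n_bits=6)
model.fit(x_train, y_train)

# Compilation calibrates the FHE circuit on a representative input set;
# this is the step that errors out when n_bits is too large.
model.compile(x_train)

y_pred = model.predict(x_test)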

Hope this helps!

Yes, this works
Thanks for your help!


Great that it works, @vrikam.

More good news: in the release due in early January, there will be no more programmable bootstrapping (PBS) in linear models (like LogisticRegression), so model_fhe.n_bits can be much larger. These models will also be very fast, since the PBS is usually the bottleneck.

You’ll see that in January!
Cheers