Concrete ML XGBRegressor results differ from XGBoost

First of all, some background information:

I want to build an anomaly detector. Since it is not yet possible to use IsolationForest in Concrete ML, I use the following pipeline:

  1. Generating anomaly scores for my training dataset (X) with a scikit-learn IsolationForest, using decision_function
  2. Training a Concrete ML XGBRegressor (in the clear, so no FHE) on the training data and the generated anomaly scores (y); see the sketch after this list
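
Roughly, this is what the pipeline looks like. This is a minimal sketch: the toy dataset and the hyperparameters are just illustrations, not my exact setup.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from xgboost import XGBRegressor
from concrete.ml.sklearn import XGBRegressor as ConcreteXGBRegressor

# Toy data standing in for my real training set X (illustrative only)
X, _ = make_blobs(n_samples=1000, centers=1, random_state=42)

# Step 1: anomaly scores from scikit-learn's IsolationForest
iso = IsolationForest(random_state=42).fit(X)
y = iso.decision_function(X)  # small values, roughly [-0.25, 0.3] in my case

# Step 2: fit both regressors with identical (illustrative) hyperparameters.
# Note that Concrete ML quantizes the model during fit (default n_bits).
params = dict(n_estimators=50, max_depth=4)
xgb_reg = XGBRegressor(**params).fit(X, y)
cml_reg = ConcreteXGBRegressor(**params).fit(X, y)

# Compare prediction deltas against the IsolationForest scores.
# Concrete ML predicts in the clear by default (fhe="disable"), no FHE involved.
print("XGBoost mean delta:    ", np.mean(xgb_reg.predict(X).ravel() - y))
print("Concrete ML mean delta:", np.mean(cml_reg.predict(X).ravel() - y))
```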

Now I want to compare the prediction delta (difference) between the IsolationForest scores and the Concrete ML XGBRegressor predictions. While this works more or less fine for the regressor from the original XGBoost library, I observe a constant right shift for the Concrete ML one.

[Figure "no_scaling": score distributions without scaling; the Concrete ML predictions are shifted right]

I made sure to use the same hyperparameters and checked that both regressors are based on the same XGBoost version (1.6.2). Since the anomaly scores from the IsolationForest are quite small (min = -0.25, max = 0.3), I already tried scaling them, which mostly solves the distribution shift; one way to do this is sketched below.
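
Roughly what I mean by scaling, continuing the sketch above (the target range here is just an illustration, not my exact choice):

```python
from sklearn.preprocessing import MinMaxScaler

# Stretch the narrow score range before fitting the regressor
scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaled = scaler.fit_transform(y.reshape(-1, 1)).ravel()

cml_reg_scaled = ConcreteXGBRegressor(**params).fit(X, y_scaled)

# Undo the scaling after prediction to compare on the original score scale
preds = scaler.inverse_transform(
    cml_reg_scaled.predict(X).reshape(-1, 1)
).ravel()
print("Concrete ML (scaled) mean delta:", np.mean(preds - y))
```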

[Figure "with_scaling": score distributions after scaling the targets; the shift largely disappears]

Anyway, I would like to understand why this is needed and would be happy about any input :slight_smile:

Hello Islk,

Thanks for getting in touch with us!
We have run a few tests and observed the same behavior.
An issue has been opened. We will let you know as soon as it is resolved.

Thanks!


Hi Islk,

The issue has been fixed in Concrete ML 1.2.1!

Thanks!


Thanks for the information!