Large memory usage for xgboost

BVialla · September 15, 2022, 8:22am

Hi,

I’m looking into the concrete-ml version of xgboost, and the impact of the quantization on the functional performance of the model.
During my tests, the Concrete-ML version needs a lot more memory during training than the original algorithm. The original training dataset has ~10M elements, and I had to reduce it to 2M elements, otherwise the system kills the process, even with small hyper-parameters (max_depth=2, n_estimators=25). The original algorithm, by contrast, works fine on the full dataset with max_depth=3, n_estimators=250.
Is this expected behavior?

The notebook is available at https://github.com/BastienVialla/concrete-xgboost/blob/main/debug_concrete.ipynb

jfrery · September 15, 2022, 10:02am

Hello,

Thank you for your feedback! I was able to reproduce your problem (thanks for the notebook). This will be fixed in the next release, in about 4 weeks.

FYI, we convert tree-based models to matrix multiplications, and we naively used the whole training set for this conversion, which in your case created a matrix with an axis of 12M rows.
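
To illustrate why the number of rows matters, here is a minimal sketch of tree inference written as matrix multiplications, assuming the GEMM-style formulation described by the Hummingbird project. The tree, the matrices `A` to `E`, and the function name `predict_tree_gemm` are made up for illustration; this is not Concrete-ML's actual internal code.

```python
import numpy as np

# Hypothetical depth-2 tree over 2 features (illustrative values only):
#   node 0 (root): x[0] < 0.5 ? go to node 1 : go to node 2
#   node 1:        x[1] < 0.5 ? leaf 0 : leaf 1
#   node 2:        x[1] < 0.5 ? leaf 2 : leaf 3

# A[f, n] = 1 if internal node n tests feature f.
A = np.array([[1, 0, 0],
              [0, 1, 1]])
# B[n] = threshold of internal node n.
B = np.array([0.5, 0.5, 0.5])
# C[n, l] = 1 if leaf l is in the left subtree of node n,
#          -1 if it is in the right subtree, 0 if n is not an ancestor.
C = np.array([[1,  1, -1, -1],
              [1, -1,  0,  0],
              [0,  0,  1, -1]])
# D[l] = number of ancestors of leaf l for which l lies in the left subtree.
D = np.array([2, 1, 1, 0])
# E[l] = value stored in leaf l.
E = np.array([10.0, 20.0, 30.0, 40.0])


def predict_tree_gemm(X):
    """Evaluate the tree on every row of X using only matrix operations."""
    # One entry per (sample, internal node): 1 if the test sends it left.
    went_left = (X @ A < B).astype(np.int64)
    # A leaf is selected when all of its ancestors agree with the path taken.
    leaf_hit = (went_left @ C) == D
    preds = leaf_hit.astype(np.float64) @ E
    # The intermediate shape is returned to show how it scales with the input.
    return preds, went_left.shape


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
preds, intermediate_shape = predict_tree_gemm(X)
print(preds)
print(intermediate_shape)
```

In this formulation every intermediate matrix has one row per input sample, so passing millions of rows through the conversion allocates intermediates with millions of rows, regardless of how small max_depth and n_estimators are.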

benoit · October 26, 2022, 4:44pm

Fixed in Concrete-ML 0.4, which was published recently. Thanks for the feedback.