Large memory usage for xgboost


I’m looking into the Concrete-ML version of XGBoost and the impact of quantization on the model's functional performance.
During my tests, the algorithm needs much more memory during training than the original algorithm. The original training dataset has ~10M elements. I had to reduce it to 2M, otherwise the system kills the process, even with small hyper-parameters (max_depth=2, n_estimators=25). The original algorithm, by contrast, works fine with max_depth=3, n_estimators=250.
Is this an expected behavior?

The notebook is available at



Thank you for your feedback! I was able to reproduce your problem (thanks for the notebook). This will be fixed in the next release, in about 4 weeks.

FYI, we convert tree-based models into matrix multiplications, and we naively used the whole training set for this conversion, which in your case created a matrix with an axis of 12M rows :upside_down_face:.
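To illustrate why this blows up memory, here is a minimal NumPy sketch of one common way to express a decision tree as matrix multiplications (the "GEMM" strategy popularized by the Hummingbird project). It is an assumption for illustration, not Concrete-ML's actual implementation. The tree, the helper names (`tree_to_matrices`, `predict_gemm`, `intermediate_bytes`) and the float64 dtype are all hypothetical. The relevant point is that the intermediate matrices `T` and `leaf_mask` have one row per input sample, so building them from the whole training set at once makes memory grow linearly with the number of rows.

```python
import numpy as np

# Hypothetical depth-2 tree: internal node i -> (feature, threshold, left, right).
# Children are ("node", j) or ("leaf", k). Rule: go left when x[feature] < threshold.
NODES = [
    (0, 0.5, ("node", 1), ("node", 2)),
    (1, 0.3, ("leaf", 0), ("leaf", 1)),
    (1, 0.7, ("leaf", 2), ("leaf", 3)),
]
LEAF_VALUES = np.array([10.0, 20.0, 30.0, 40.0])


def leaves_under(child):
    """Return the indices of all leaves reachable from a child reference."""
    kind, idx = child
    if kind == "leaf":
        return [idx]
    _, _, left, right = NODES[idx]
    return leaves_under(left) + leaves_under(right)


def tree_to_matrices(n_features):
    """Encode the tree as the matrices A, B, C, D used by the GEMM strategy."""
    n_internal, n_leaves = len(NODES), len(LEAF_VALUES)
    A = np.zeros((n_features, n_internal))  # selects the feature tested at each node
    B = np.zeros(n_internal)                # threshold of each node
    C = np.zeros((n_internal, n_leaves))    # +1: leaf in left subtree, -1: right subtree
    for i, (feat, thr, left, right) in enumerate(NODES):
        A[feat, i] = 1.0
        B[i] = thr
        C[i, leaves_under(left)] = 1.0
        C[i, leaves_under(right)] = -1.0
    D = (C == 1.0).sum(axis=0)              # number of "go left" edges on each leaf's path
    return A, B, C, D


def predict_gemm(X, A, B, C, D, E):
    """Evaluate the tree on all rows of X with matrix products only."""
    T = (X @ A < B).astype(np.float64)           # (n_samples, n_internal) decision bits
    leaf_mask = (T @ C == D).astype(np.float64)  # (n_samples, n_leaves), one-hot rows
    return leaf_mask @ E                         # value of the leaf each row lands in


def intermediate_bytes(n_rows, max_depth, n_estimators, itemsize=8):
    """Rough size of T plus leaf_mask for full binary trees, all rows at once."""
    n_internal = 2 ** max_depth - 1
    n_leaves = 2 ** max_depth
    return n_rows * n_estimators * (n_internal + n_leaves) * itemsize


X = np.array([[0.2, 0.1], [0.2, 0.9], [0.8, 0.5], [0.8, 0.9]])
A, B, C, D = tree_to_matrices(n_features=2)
print(predict_gemm(X, A, B, C, D, LEAF_VALUES))  # [10. 20. 30. 40.]
print(intermediate_bytes(12_000_000, max_depth=2, n_estimators=25))  # 16800000000
```

Under these illustrative assumptions (float64, every row processed at once, only `T` and `leaf_mask` counted), 12M rows with max_depth=2 and n_estimators=25 already need about 16.8 GB, while the tree parameters themselves are tiny. For a forest, the per-tree outputs would be summed; that step is omitted here.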

Fixed in Concrete-ML 0.4, which was released recently. Thanks for the feedback!