Quantization for Tree-based Models

Hi everyone, from your docs here, why did you choose post-training quantization for linear models, but for trees you only quantize the training/testing data?

For trees, can you train in floating point and then quantize the model after the fact, similar to linear models? Or is it true that when you quantize the training data, the trained model will automatically be quantized?

Thanks

Hello @austin,

why did you choose post-training quantization for linear models, but for trees you only quantize the training/testing data?

Tree-based models are invariant to feature scale. As long as the ordering of values is preserved, changing the scale of the input features has no impact on the final model accuracy. From that observation, we can simply quantize each input feature naively and train a tree on the quantized values just as easily as on the actual feature values.
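To make this concrete, here is a minimal sketch of that naive uniform quantization. The use of scikit-learn, the synthetic dataset, and the 6-bit grid are all illustrative assumptions on my side, not the library's actual implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def make_quantizer(x_train, n_bits=6):
    """Naive uniform quantizer: map each feature to 2**n_bits integer bins.
    Only the ordering of values matters to a tree, so this is harmless."""
    x_min = x_train.min(axis=0)
    scale = (x_train.max(axis=0) - x_min) / (2 ** n_bits - 1)
    def quantize(x):
        return np.clip(np.round((x - x_min) / scale), 0, 2 ** n_bits - 1).astype(np.int64)
    return quantize

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
quantize = make_quantizer(X_train)

# Same tree, trained once on float features and once on quantized features.
tree_float = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
tree_quant = DecisionTreeClassifier(max_depth=4, random_state=0).fit(quantize(X_train), y_train)

print("float accuracy:    ", tree_float.score(X_test, y_test))
print("quantized accuracy:", tree_quant.score(quantize(X_test), y_test))
```

With enough bins, the two accuracies are essentially identical, which is the scale-invariance argument in action.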

And yes, you are right: once the tree is trained over quantized input features, the tree parameters (the decision split thresholds) are quantized as well, which makes this method very easy to implement and very effective with enough quantization precision.
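You can see the model coming out quantized "for free" by inspecting the learned thresholds. This sketch pokes at scikit-learn internals (`tree_.threshold`, with -2 marking leaf nodes), which is my assumption for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_q = rng.integers(0, 64, size=(200, 3))         # features already quantized to 6 bits
y = (X_q[:, 0] + X_q[:, 1] > 60).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_q, y)

# scikit-learn places each split threshold halfway between two observed
# values, so on integer inputs every threshold lands on an x.5 grid:
# the decision splits are quantized along with the data.
thresholds = tree.tree_.threshold
print(thresholds[thresholds != -2])              # split thresholds only (leaves are -2)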

For linear models (and neural networks), in theory we could do the same. Unfortunately, in practice the training would be very unstable: unlike trees, these models are not invariant to feature scale, so the error introduced by the limited quantization range feeds directly into the optimization. That is why we quantize them post-training instead.
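For contrast, here is roughly what post-training quantization looks like for a linear model: train in floating point, then snap the learned weights to a fixed-point grid afterwards. The 8-bit symmetric per-tensor scheme below is an illustrative assumption, not the exact scheme from the docs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression().fit(X, y)           # ordinary float training
print("float accuracy:", model.score(X, y))

# Post-training quantization of the weights: symmetric, per-tensor, 8 bits
# (the intercept is left in float for simplicity).
n_bits = 8
w = model.coef_
scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
w_int = np.round(w / scale)                      # weights snapped to the integer grid
model.coef_ = w_int * scale                      # dequantized weights for inference
print("PTQ accuracy:  ", model.score(X, y))
```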

For trees, can you train in floating point and then quantize the model after the fact, similar to linear models? Or is it true that when you quantize the training data, the trained model will automatically be quantized?

So we took the “naive” approach of training over already-quantized inputs since (1) it is a very simple approach to implement, (2) it makes your model quantized by default, and (3) there is almost no loss in accuracy with sufficient bits of precision. However, you could indeed train on floating-point values instead and then quantize your inputs based on the learned decision splits, which would make the quantization more effective; see the sketch below. But again, with enough bit-width of precision the naive approach is good enough. Also, tree-based models are very flexible: for example, you could simply split one feature into two different features, which would double the precision available for that specific feature.
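If you did want that post-hoc route, a sketch could look like the following: train on floats, then use the thresholds the tree actually learned as (non-uniform) bin edges, so quantizing an input can never move it across a decision boundary. The helper name is hypothetical, and `tree_.threshold` / `tree_.feature` are scikit-learn internals:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)  # float training
t = tree.tree_

def snap_to_splits(x):
    """Hypothetical helper: snap each value to a representative point inside
    the same inter-threshold interval, so every tree decision is unchanged."""
    x_s = x.copy()
    for f in range(x.shape[1]):
        edges = np.sort(np.unique(t.threshold[t.feature == f]))  # splits on feature f
        if edges.size == 0:
            continue                                # feature never used by the tree
        reps = np.concatenate(([edges[0] - 1.0],    # one representative per interval
                               (edges[:-1] + edges[1:]) / 2,
                               [edges[-1] + 1.0]))
        x_s[:, f] = reps[np.searchsorted(edges, x[:, f])]
    return x_s

# Quantizing inputs on the learned split grid leaves predictions untouched.
assert (tree.predict(snap_to_splits(X)) == tree.predict(X)).all()
```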

Hope this helps.
