Suppose one quantizes a model using the Post-Training Quantization method. Are the max and min values of the range of reals chosen with respect to the representative input set used for compiling the model? We often see an offset parameter in quantization: what is its role? How does Concrete ML choose the scale and zero-point parameters for quantization? Once the max and min values of the real interval are selected and the n_bits parameter is set, are the scale and zero point constant, and are they then used to quantize all other values? How is quantization configured for activations, accumulators and inputs coming into the layers?
Hello @Rish ,
Lots of interesting questions here, this is great!
I would first invite you to take a look at our documentation section on quantization. If you don't find all of your answers there, feel free to ask them again.
Thank you @RomanBredehoft
Please help me with these queries:
Let [α, β] be the range of a value to quantize, where α is the minimum and β is the maximum (from the blog post).
How are these max and min values selected? (Are they the maximum and minimum of the representative input set used during compilation?)
If so, the scale and zero point will be constant for every tensor that we quantize afterwards (is that right?)
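For reference, here is a minimal sketch of standard asymmetric (affine) uniform quantization for a range [α, β], assuming an unsigned integer grid [0, 2^n_bits - 1] and round-to-nearest. The zero point is what is often called the offset: it is the integer that represents the real value 0.0, so that x ≈ scale * (q - zero_point). The function names and the choice to widen [α, β] so that it contains 0 are illustrative assumptions; Concrete ML's exact conventions (signedness, symmetric options) may differ.

```python
def compute_qparams(alpha: float, beta: float, n_bits: int) -> tuple:
    """Scale and zero point (offset) for the real range [alpha, beta]."""
    q_min, q_max = 0, 2**n_bits - 1
    # Assumed convention: widen the range so that 0.0 is exactly representable.
    alpha, beta = min(alpha, 0.0), max(beta, 0.0)
    scale = (beta - alpha) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range: any positive scale works
    # The zero point is the integer that maps back to the real value 0.0.
    zero_point = round(q_min - alpha / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, n_bits: int) -> int:
    """Map a real value to the integer grid, clipping values outside [alpha, beta]."""
    q = round(x / scale) + zero_point
    return max(0, min(2**n_bits - 1, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Approximate inverse: x ≈ scale * (q - zero_point)."""
    return scale * (q - zero_point)


scale, zero_point = compute_qparams(alpha=-1.0, beta=3.0, n_bits=2)
print(scale, zero_point)  # 1.3333333333333333 1
print([quantize(v, scale, zero_point, 2) for v in (-1.0, 0.0, 3.0, 10.0)])  # [0, 1, 3, 3]
```

Note how 10.0, which lies outside [α, β], is clipped to the largest integer: the range chosen at calibration time bounds what can be represented later.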
Hello again @Rish ,
Which blog post are you referring to exactly?
Indeed, in Post-Training Quantization, the quantization parameters (scale and zero point) are computed using the min/max values selected during the call to compilation methods such as compile_torch_model. To be exact, we calibrate these parameters using the input set provided by the user and build the underlying quantized model before compiling it to FHE.
So yes, for now values are quantized per tensor, meaning the scale and zero point will indeed be constant for the tensors they are associated with once the model has been compiled.
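The calibrate-then-freeze behaviour described above can be sketched as follows, assuming a plain affine quantizer (unsigned n_bits grid, 0 included in the range). The TensorQuantizer class is hypothetical and is not Concrete ML's API: it only mimics the idea of one (scale, zero_point) pair per tensor, computed once from the min/max over the whole input set and then reused unchanged for every value quantized afterwards.

```python
class TensorQuantizer:
    """Hypothetical per-tensor quantizer: calibrate once, then reuse the same parameters."""

    def __init__(self, n_bits: int):
        self.n_bits = n_bits
        self.scale = None
        self.zero_point = None

    def calibrate(self, inputset):
        # Min/max are taken over *all* samples of the representative input set.
        values = [v for sample in inputset for v in sample]
        alpha, beta = min(min(values), 0.0), max(max(values), 0.0)
        q_max = 2**self.n_bits - 1
        # Fall back to a scale of 1.0 if the whole input set is zero.
        self.scale = (beta - alpha) / q_max or 1.0
        self.zero_point = round(-alpha / self.scale)

    def quantize(self, values):
        # Parameters are frozen: new values outside [alpha, beta] are simply clipped.
        q_max = 2**self.n_bits - 1
        return [max(0, min(q_max, round(v / self.scale) + self.zero_point)) for v in values]


quantizer = TensorQuantizer(n_bits=2)
quantizer.calibrate([[-0.5, 0.2], [1.5, -1.0], [0.0, 3.0]])  # alpha = -1.0, beta = 3.0
print(quantizer.scale, quantizer.zero_point)  # 1.3333333333333333 1
print(quantizer.quantize([0.0, 10.0, -5.0]))  # [1, 3, 0]
```

The design choice this illustrates is the trade-off of a static range: quantizing new data is cheap and deterministic, but the input set must be representative, otherwise values outside the calibrated range saturate.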
I hope that answers your questions!
Thank you @RomanBredehoft
Actually, I was referring to the documentation that you posted (my mistake).
Just to clarify: what is the difference between the rounding threshold bits and the n_bits parameter, which we set when using a quantized module?
If you are more interested in the rounding feature (the rounding_threshold_bits parameter), feel free to check out this documentation section then!
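To make the difference between the two parameters concrete, here is a small arithmetic sketch. Roughly speaking, n_bits sets the bit-width of the quantized inputs, weights and activations, while the integer accumulator of a linear layer (for example a dot product) naturally needs more bits; rounding_threshold_bits then rounds that accumulator down to its most significant bits before the next non-linear step, trading some precision for cheaper FHE computation. The helper functions below are hypothetical, assume unsigned operands and round-half-up, and only illustrate the arithmetic, not Concrete ML's internal implementation.

```python
def accumulator_bits(n_bits_inputs: int, n_bits_weights: int, n_terms: int) -> int:
    """Worst-case bit-width of a dot product of unsigned n-bit inputs and weights."""
    max_value = n_terms * (2**n_bits_inputs - 1) * (2**n_bits_weights - 1)
    return max_value.bit_length()


def round_to_msbs(acc: int, acc_bits: int, keep_bits: int) -> int:
    """Round a non-negative accumulator so that only its `keep_bits` MSBs remain.

    The result stays on the original scale (the dropped low bits become zero).
    Note: round-half-up can carry past acc_bits (e.g. 63 -> 64 with 6 bits).
    """
    drop = acc_bits - keep_bits
    if drop <= 0:
        return acc  # nothing to round away
    half = 1 << (drop - 1)
    return ((acc + half) >> drop) << drop


x = [3, 2, 1, 1]  # 2-bit quantized inputs (n_bits = 2)
w = [3, 3, 1, 2]  # 2-bit quantized weights
acc = sum(xi * wi for xi, wi in zip(x, w))  # 9 + 6 + 1 + 2 = 18
bits = accumulator_bits(2, 2, len(x))  # worst case 4 * 3 * 3 = 36 needs 6 bits
print(acc, bits, round_to_msbs(acc, bits, 3))  # 18 6 16
print(round_to_msbs(27, bits, 3), round_to_msbs(29, bits, 3))  # 24 32
```

In this sketch, raising n_bits makes the accumulator wider, while lowering the rounding threshold shrinks only the accumulator that feeds the next non-linear operation.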