How to export a quantized model

When I use the MNIST example, how can I export the quantized model?
# Export to ONNX
print("\n2. Exporting to ONNX and saving the Brevitas model")
inp = torch.rand((1, img_size * img_size)).to(device)
torch.onnx.export(model, inp, "mnist.qat.onnx", opset_version=14)
torch.save(model.state_dict(), "state_dict.pt")

This code seems to only export the model before quantization. And when I open mnist.qat.onnx with Netron, I cannot match it to the model defined in model.py.

I'm afraid I can't tell from just looking at that code what correspondence cannot be made. Could you post the model class and the ONNX that is exported? ONNX will usually look quite different from the model class.


The ONNX looks like this, and model.py is:
import torch.nn as nn
import brevitas.nn as qnn
from brevitas.nn import QuantIdentity

# CommonActQuant / CommonWeightQuant are the quantizer configuration classes
# used by the example (imported or defined elsewhere in the project).


class MNISTQATModel(nn.Module):
    def __init__(self, a_bits, w_bits):
        super(MNISTQATModel, self).__init__()

        self.a_bits = a_bits
        self.w_bits = w_bits

        self.cfg = [28 * 28, 192, 192, 192, 10]

        self.quant_inp = qnn.QuantIdentity(
            act_quant=CommonActQuant if a_bits is not None else None,
            bit_width=a_bits,
            return_quant_tensor=True,
        )

        self.fc1 = qnn.QuantLinear(
            self.cfg[0],
            self.cfg[1],
            False,
            weight_quant=CommonWeightQuant if w_bits is not None else None,
            weight_bit_width=w_bits,
            bias_quant=None,
        )

        self.bn1 = nn.BatchNorm1d(self.cfg[1], momentum=0.999)
        self.q1 = QuantIdentity(
            act_quant=CommonActQuant, bit_width=a_bits, return_quant_tensor=True
        )

        self.fc2 = qnn.QuantLinear(
            self.cfg[1],
            self.cfg[2],
            False,
            weight_quant=CommonWeightQuant if w_bits is not None else None,
            weight_bit_width=w_bits,
            bias_quant=None,  # FheBiasQuant if w_bits is not None else None,
        )

        self.bn2 = nn.BatchNorm1d(self.cfg[1], momentum=0.999)
        self.q2 = QuantIdentity(
            act_quant=CommonActQuant, bit_width=a_bits, return_quant_tensor=True
        )

        self.fc3 = qnn.QuantLinear(
            self.cfg[2],
            self.cfg[3],
            False,
            weight_quant=CommonWeightQuant if w_bits is not None else None,
            weight_bit_width=w_bits,
            bias_quant=None,
        )

        self.bn3 = nn.BatchNorm1d(self.cfg[1], momentum=0.999)
        self.q3 = QuantIdentity(
            act_quant=CommonActQuant, bit_width=a_bits, return_quant_tensor=True
        )

        self.fc4 = qnn.QuantLinear(
            self.cfg[3],
            self.cfg[4],
            False,
            weight_quant=CommonWeightQuant if w_bits is not None else None,
            weight_bit_width=w_bits,
        )

When I open the ONNX, I find the input and output are all float, so I think this is the model before quantization. I want to export the quantized model.

And I want the ONNX to look like this:


How can I configure the export to get the ONNX of the quantized model?

The model that you export has float inputs and outputs. It contains quantization operations that convert the inputs and activations to integers, but the ONNX graph operates on floating-point "de-quantized" values, which is normal.

Concrete ML takes that model and extracts the integer representation of the weights. It also computes over the integer values produced by the quantizers in the graph. To do this, Concrete ML has internal implementations of the ONNX ops.

So the ONNX looks fine to me. What do you want to do with the ONNX? I'd like to point out that this is not the "official" way to export Brevitas ONNX. You should use the BrevitasONNXManager for that - it will produce much more compact ONNX code for the quantizers.
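For reference, here is a minimal sketch of that export path; the import location is version-dependent (newer Brevitas releases expose an equivalent export_qonnx helper instead), so treat it as illustrative:

from brevitas.export.onnx.generic.manager import BrevitasONNXManager

# Export the trained QAT model with Brevitas' own manager.
# The resulting ONNX keeps compact Quant nodes instead of the long
# chains of ops produced by plain torch.onnx.export.
BrevitasONNXManager.export(
    model,
    input_shape=(1, 28 * 28),
    export_path="mnist.qat.onnx",
)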

Thanks for the reply. I want to know, for the quantized model, how many bits of integer input and how many bits of integer weights each fc layer has. So how can I export an ONNX that contains just the 4 layers? Should I use BrevitasONNXManager to export the model after compilation?

I suggest you call compile_brevitas_qat_model on your model and pass it the output_onnx_file argument. It will produce an ONNX with a simple structure where, in Netron, you can see the n_bits (in the BrevitasQuant layers) and the Gemm layers more clearly.
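For example, a minimal sketch (the random calibration inputset and the mnist_quantized.onnx file name are placeholders; use your own MNIST data and path):

import numpy
from concrete.ml.torch.compile import compile_brevitas_qat_model

# A representative calibration set; replace with real MNIST inputs.
calibration_data = numpy.random.rand(100, 28 * 28).astype(numpy.float32)

quantized_module = compile_brevitas_qat_model(
    model,
    calibration_data,
    output_onnx_file="mnist_quantized.onnx",
)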

Thank you very much!!

Now I find that compile_brevitas_qat_model returns a QuantizedModule object. How could I export this type of network to ONNX?

Please see my previous message about passing an argument to compile_brevitas_qat_model
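To be clear, you do not need to export the QuantizedModule yourself: the ONNX file is already written to output_onnx_file during compilation. You can then inspect it with Netron, or programmatically, for instance (using the placeholder file name from the sketch above):

import onnx

# Load the ONNX written by compile_brevitas_qat_model and list its nodes;
# you should see the quantizer nodes (with their n_bits) and the Gemm layers.
onnx_model = onnx.load("mnist_quantized.onnx")
for node in onnx_model.graph.node:
    print(node.op_type, node.name)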