Why choose not to have the front-end generate built-in MLIR?

Hello everybody,

I am studying Zama and have a question:

Why doesn’t Zama use the following implementation approach: have the front-end emit a standard (built-in) MLIR representation, then use use-define chains to convert that MLIR into FHE MLIR based on the types annotated on the parameters, and finally lower it step by step to the target code?

My understanding is as follows: before converting user code to FHE MLIR, the Zama frontend performs a plaintext evaluation to determine the bit width of each graph node (e.g., 2-bit, 3-bit) from the observed results, in order to optimize circuit and algorithm performance. However, this approach presents a challenge: MLIR’s built-in data types do not support these non-standard bit widths. To support custom bit-width types, Zama would need to extend MLIR’s built-in data types. Therefore, the frontend directly generates FHE MLIR with arbitrary bit-width type definitions, which simplifies the process and meets these specific needs.
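
For instance, I believe the Python frontend works roughly like this (a sketch based on my reading of the concrete-python documentation; I may be wrong about details such as the `circuit.mlir` property):

```python
from concrete import fhe

# A toy function; the frontend traces it into a computation graph.
@fhe.compiler({"x": "encrypted"})
def f(x):
    return (x + 3) * 2

# The input set is evaluated in the clear; the observed range of each
# graph node determines its bit width (e.g., 2-bit, 3-bit, ...).
inputset = range(8)   # x in [0, 7] -> x + 3 in [3, 10] -> output in [6, 20]
circuit = f.compile(inputset)

# The emitted IR already uses FHE-dialect types that carry the inferred
# bit widths, not standard MLIR integer types.
print(circuit.mlir)
```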

Could you confirm if this interpretation is accurate, or provide any additional insights?

Hello,

By built-in MLIR, I assume you mean using one of the standard dialects available in the MLIR repository? In that sense, you are wondering why the frontend emits something like Tensor<FHE.uint<3>> instead of, say, Tensor<uint16> using a standard MLIR integer type?

One of the great strengths of MLIR is the ability to define a custom dialect in which you can carefully craft semantics that best match your problem space. By this, I mean that you can enforce rules that are important to the problem you face.

Dealing with FHE, there are important rules that you need to take care of:

  • Bitwidth is of paramount importance for runtime performance. Every bit saved has a dramatic impact on performance, so we want to encode as much information about it as possible.
  • Levelled operations must have matching bitwidths.
  • Lookup tables can have an output bitwidth that differs from their input bitwidth.

These constraints (and others) are embedded in the FHE dialect, which makes it a good interface for the backend. A small sketch of what this looks like from the frontend is given below.
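
To make this concrete from the user’s side, here is a small example using the Python frontend (treat the exact API names, e.g. `fhe.LookupTable` and `circuit.mlir`, as illustrative of concrete-python rather than as a reference):

```python
from concrete import fhe

# A 5-bit input indexes the table, but the table values only need 2 bits.
table = fhe.LookupTable([v % 4 for v in range(32)])

@fhe.compiler({"x": "encrypted", "y": "encrypted"})
def f(x, y):
    return table[x] + y   # a levelled addition on the table output

# Input set: x in [0, 31] (5 bits), y in [0, 3] (2 bits).
inputset = [(i, i % 4) for i in range(32)]
circuit = f.compile(inputset)

# The printed FHE-dialect IR carries these bitwidths in its types, so the
# rules above (matching bitwidths for levelled ops, narrower table outputs)
# are explicit and checkable by the backend.
print(circuit.mlir)
```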

As for why we don’t perform the bitwidth computation in the backend, I guess the answer is twofold:

  • We would need to compile the plaintext graph, execute it on an input set, and retrieve the intermediate ranges. This is very simple to do in Python, but much less so in C++/MLIR (a toy sketch of what this amounts to is given after this list).
  • Other frontends (for statically typed languages, for instance) may work differently, say with explicit bitwidth type annotations. In that case, a plaintext dialect as the backend interface would probably not be desirable.
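
To illustrate the first point, here is roughly what “retrieve the intermediate ranges” amounts to, in plain Python (a toy sketch, not the actual Concrete implementation):

```python
import math

def f(x):
    y = x * 3
    z = y + 7
    return {"y": y, "z": z}   # expose the intermediate nodes for inspection

# Evaluate the plaintext graph on an input set and record each node's range.
ranges = {}
for x in range(16):                            # calibration input set
    for name, value in f(x).items():
        lo, hi = ranges.get(name, (value, value))
        ranges[name] = (min(lo, value), max(hi, value))

# An unsigned node with maximum value m needs ceil(log2(m + 1)) bits.
bitwidths = {name: max(1, math.ceil(math.log2(hi + 1)))
             for name, (lo, hi) in ranges.items()}
print(ranges)      # {'y': (0, 45), 'z': (7, 52)}
print(bitwidths)   # {'y': 6, 'z': 6}
```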

All things considered, the FHE dialect is a good interface for the compiler.

So, the FHE dialect is used as the interface between the frontend and the backend. If a new FHE scheme is introduced and the interface changes, then both the frontend and the backend need to be modified simultaneously.

On the other hand, if built-in MLIR is used as the interface between the frontend and the backend, then when introducing support for a new FHE scheme, only the backend would need to be modified.

The FHE dialect was designed to be scheme-agnostic. This is why we then lower to a second TFHE dialect which, on its side, enforces the constraints of the TFHE scheme that we use under the hood. Of course, we can’t tell what the future will be like, but since the FHE dialect was designed with the help of researchers with extensive knowledge of other existing schemes, we are confident that it should be flexible enough to accommodate lowering to a different target if needed.

You are right to say that if ever a scheme appeared that needed changes in the FHE dialect, we would have to change the frontend as well. That being said, if such a change occurred, a lot of code would be refactored in the backend, and changing the frontend would likely be a marginal task by comparison.

I understand, thank you very much!