The Zama compiler lowers the FHE dialect to LLVM IR and uses the LLVM infrastructure to generate the corresponding binary files.
So why doesn’t it instead lower the FHE dialect to the EmitC dialect and output C++ source code that calls TFHE library functions? The Zama frontend (concrete-python) could then compile this C++ source file and link it with the TFHE library using clang/gcc, which would also produce the corresponding binary files.
Why did Zama choose the former approach over the latter? Are there any considerations behind this?
I don’t think we considered this approach during the design of the compiler. The goal was to generate a binary file that can be executed, so we naturally went with LLVM/MLIR codegen. That way you can compile and execute directly from your Python interpreter, with Concrete managing all the artifacts for you, and EmitC never came up as an option.
If I understand correctly, with EmitC you could replicate the same features by compiling the generated C++ code (but then, why would that be better than the codegen we currently use?). Or you could change the user experience: return the C++ code and leave the rest to the user. Then I would ask myself: why would that be useful? I think it would make sense if some users wanted C++ code to audit, tweak, and compile themselves, but I’m not sure who would want that. Even if this were a real need, I would expect Concrete to support both lowerings, since the main feature is being able to run everything from Python.
@ayoub has already given the main reason why we currently do not generate C/C++ source code: the ability to generate binary code without invoking additional compilation infrastructure. Also, there is a substantial number of passes that convert to the LLVM dialect, so with LLVM IR as the target, the risk of hitting a dead end on the lowering path when using a new operation from the existing dialects is relatively low. Finally, EmitC wasn’t yet integrated into MLIR when we started working on the compiler, and it didn’t gain much traction until recently.
Although EmitC certainly enables interesting use cases, there don’t seem to be any direct benefits for the current compilation flow. However, if you would like to explore this direction, I would recommend trying to pass the output of `concretecompiler --action=dump-std` to `mlir-opt` with the right options to generate IR in the EmitC dialect (e.g., `--convert-arith-to-emitc`, `--convert-func-to-emitc`, etc.; `mlir-opt --help` provides a list of all EmitC-related conversion passes).
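
For anyone who wants to try this, here is a minimal Python sketch of that two-step pipeline. It is not something Concrete provides. The function names (`build_pipeline`, `run_pipeline`) and the file name `example.mlir` are hypothetical, and it assumes that `concretecompiler` takes the input file as a positional argument, that both tools are on your `PATH`, and that your `mlir-opt` build is compatible with the IR that `concretecompiler` emits. The two passes listed are only the ones mentioned above; the dumped IR may well contain operations that need further conversion passes.

```python
import shutil
import subprocess

# Only the passes named in the thread; more EmitC conversions may be required.
EMITC_PASSES = ["--convert-arith-to-emitc", "--convert-func-to-emitc"]


def build_pipeline(input_mlir, passes=EMITC_PASSES):
    """Return the two commands of the pipeline as argument lists."""
    # Assumption: the input file is given as a positional argument.
    dump_std = ["concretecompiler", "--action=dump-std", input_mlir]
    # With no input file, mlir-opt reads the IR from standard input.
    to_emitc = ["mlir-opt", *passes]
    return [dump_std, to_emitc]


def run_pipeline(input_mlir):
    """Run the commands in order, piping each stdout into the next stdin."""
    commands = build_pipeline(input_mlir)
    missing = [cmd[0] for cmd in commands if shutil.which(cmd[0]) is None]
    if missing:
        raise FileNotFoundError("not on PATH: " + ", ".join(missing))
    data = None
    for cmd in commands:
        # check=True raises CalledProcessError if a tool rejects the IR.
        result = subprocess.run(cmd, input=data, capture_output=True, check=True)
        data = result.stdout
    return data.decode()
```

Calling `run_pipeline("example.mlir")` returns whatever `mlir-opt` prints. If a pass fails, the raised `CalledProcessError` carries the tool's `stderr`, which is usually the quickest way to see which operation has no EmitC conversion yet.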