I tried many ways to set the saving path, even using a resolved path, but they all failed because the process is killed right after compile(). It never reaches the saving part.
I am using it to compile an LLM, and I found that it stops inside a transformer block. As soon as execution reaches that block, it stops immediately, with no error report.
I suspect something is exceeding a limit, perhaps the noise. How can I use the "noise monitor" function to inspect what is happening at runtime?
In file qgpt2_models.py, in the q_attention() method of class QGPT2LMHeadModel(GPT2LMHeadModel):
def q_attention(self) -> GPT2Attention:
    """Get GPT-2's attention module found in the first layer.

    Returns:
        GPT2Attention: The attention module.
    """
    return self.transformer.h[0].attn
Does this example compile only the first attention layer?
Yes. As I said, we can't use FHE on the full LLM, so we've shown that we can compile a single attention layer and get accurate results. In the hybrid model, we show how to run some of the layers on the client side (in the clear) and others in FHE (on the server side). There are many more explanations in the .md files; I recommend reading them.
I see. How do I compile several functions? For instance, if I have two functions, add() and project(), related as y = project(add(x)), how do I compile them?
Can I compile them together by just compiling the project() function?
Or should I compile the two functions separately?
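One common pattern is to wrap the composition in a single entry-point function and compile that once, so the compiler traces the whole call graph project(add(x)) as one circuit. Here is a minimal sketch in plain NumPy: the add(), project(), and composed() functions and the toy weights W are hypothetical names taken from the question, and the actual Concrete compile step is shown only in comments (check the Concrete documentation for the exact API):

```python
import numpy as np

# Toy projection weights, purely illustrative.
W = np.array([[1, 0], [1, 1]])

def add(x):
    # First stage: a simple elementwise addition.
    return x + 1

def project(x):
    # Second stage: a linear projection.
    return x @ W

def composed(x):
    # Single entry point wrapping the whole pipeline y = project(add(x)).
    # Compiling this one function covers both stages, because the
    # compiler traces every operation reached from the entry point.
    return project(add(x))

# With Concrete you would then compile `composed` once, roughly:
#   from concrete import fhe
#   compiler = fhe.Compiler(composed, {"x": "encrypted"})
#   circuit = compiler.compile(inputset)
# (hedged sketch; not run here since it requires the concrete library)

print(composed(np.array([1, 2])))  # → [5 3]
```

Compiling only project() would not be enough, since the compiler would never see the operations inside add(); the composition has to be visible from the compiled entry point.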