I tried many ways to set the saving path, even using a resolved path, but they all failed because the process is killed right after compile(). It never reaches the saving part.
I am using it to compile an LLM, and I found that it stops inside a transformer block. As soon as execution reaches that block, it stops immediately, with no error report.
I suspect something is exceeding a limit, perhaps the noise. How can I use the "noise monitor" function to inspect what is happening at runtime?
In file qgpt2_models.py, in the q_attention() method of class QGPT2LMHeadModel(GPT2LMHeadModel):
def q_attention(self) -> GPT2Attention:
    """Get GPT-2's attention module found in the first layer.

    Returns:
        GPT2Attention: The attention module.
    """
    return self.transformer.h[0].attn
Does this example compile only the first attention layer?
Yes. As I said, we can't use FHE on the full LLM, so we've shown that we can compile a single attention layer and get accurate results. In the hybrid model, we show how to run some of the layers on the client side (in the clear) and others in FHE (on the server side). There are many more explanations in the .md files; I recommend reading them.
I see. How do I compile several functions? For instance, if I have two functions, add() and project(), related as y = project(add(x)), how do I compile them?
Can I compile them together by just compiling the project() function?
Or should I compile the two functions separately?
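One common pattern is to wrap the composition in a single entry-point function and compile that once, so the compiler traces the whole call graph project(add(x)) as one circuit. Here is a minimal sketch in plain NumPy: the add(), project(), and composed() functions and the toy weights W are hypothetical names taken from the question, and the actual Concrete compile step is shown only in comments (check the Concrete documentation for the exact API):

```python
import numpy as np

# Toy projection weights, purely illustrative.
W = np.array([[1, 0], [1, 1]])

def add(x):
    # First stage: a simple elementwise addition.
    return x + 1

def project(x):
    # Second stage: a linear projection.
    return x @ W

def composed(x):
    # Single entry point wrapping the whole pipeline y = project(add(x)).
    # Compiling this one function covers both stages, because the
    # compiler traces every operation reached from the entry point.
    return project(add(x))

# With Concrete you would then compile `composed` once, roughly:
#   from concrete import fhe
#   compiler = fhe.Compiler(composed, {"x": "encrypted"})
#   circuit = compiler.compile(inputset)
# (hedged sketch; not run here since it requires the concrete library)

print(composed(np.array([1, 2])))  # → [5 3]
```

Compiling only project() would not be enough, since the compiler would never see the operations inside add(); the composition has to be visible from the compiled entry point.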